The microprocessor can't understand high-level instructions like the ones used in Pascal or C++, so there is a low-level alternative called assembly language. While one line of code in a high-level language might translate into several machine instructions, one line of assembly language translates into exactly one machine instruction. This means that an assembly language program is almost exactly the same as a machine language program, but written in a form that is easier for a human to read and write than the pure binary form that the machine uses.
The format of an assembly language instruction is as follows:
INST arg1, arg2, arg3
The first thing on the line is the name of the instruction. This is generally a three- or four-letter abbreviation of what the instruction does, known as a mnemonic. In assembly language, whitespace doesn't matter, but there can only be one instruction on a line, and you can't break up an instruction across multiple lines. The Smalltium assembler is not case sensitive (except for character constants in the data section).
The number and type of the arguments depend on which instruction you are using. In general, the arguments can be either registers or values. A register is specified by "r#", where # is the register number (for example r0 or r12). Remember that r0 always contains the value 0, and r15 is the program counter. Because r15 is the program counter, you can also refer to it as "pc". A value is specified by a decimal number or by a label. We will discuss labels a bit later.
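To make the three kinds of argument concrete, here is a minimal Python sketch of how an assembler might classify one argument. It is not the Smalltium assembler's own code: the function name parse_operand is made up for illustration, and the sketch assumes that registers run from r0 to r15 and that a decimal value may carry a leading minus sign, neither of which is stated above.

```python
import re

def parse_operand(text):
    """Classify one argument as a register, a decimal value, or a label."""
    token = text.strip().lower()          # the assembler is not case sensitive
    if token == "pc":
        return ("register", 15)           # pc is another name for r15
    match = re.fullmatch(r"r([0-9]+)", token)
    if match and int(match.group(1)) <= 15:   # assumption: registers are r0..r15
        return ("register", int(match.group(1)))
    if re.fullmatch(r"-?[0-9]+", token):      # assumption: a minus sign is allowed
        return ("value", int(token))
    return ("label", token)               # anything else is treated as a label
```

For example, parse_operand("r12") gives ("register", 12), parse_operand("PC") gives ("register", 15), and parse_operand("FOO") gives ("label", "foo").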
Here are a few examples of assembly language instructions. Don't worry about what they do; this will be described later (in section 5.1).
SET r2, 10
ADD r4, r5, r5
LOAD r2, FOO    ; (FOO is a label)
You can put a comment in an assembly language program by using a semicolon. Everything from the semicolon to the end of the line is ignored by the assembler.
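The rules so far (one instruction per line, whitespace doesn't matter, comments start at a semicolon) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the real assembler: split_line is a hypothetical helper, it handles instruction lines only (no labels), and it would not be safe for a data line that holds a semicolon as a character constant.

```python
def split_line(line):
    """Split one instruction line into (mnemonic, [arguments]).

    Returns None for a line that is blank or holds only a comment.
    """
    code = line.split(";", 1)[0].strip()   # drop everything from the semicolon on
    if not code:
        return None
    parts = code.split(None, 1)            # mnemonic, then the rest of the line
    mnemonic = parts[0].upper()            # mnemonics are not case sensitive
    args = []
    if len(parts) > 1:
        # Arguments are separated by commas; surrounding whitespace is ignored.
        args = [arg.strip() for arg in parts[1].split(",")]
    return mnemonic, args
```

Calling split_line("LOAD r2, FOO ; load a word") returns ("LOAD", ["r2", "FOO"]), while a line containing only a comment returns None.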
When the program is loaded into the machine, each instruction is stored in its own word of memory, so every instruction has a memory address associated with it. The first instruction in the program is put in memory address 0, and each subsequent instruction is put in the next memory address. To find the address of any instruction, count how many instructions come before it; that count is the instruction's memory address.
Most of the time, the program will execute one instruction after another, stepping through memory. Some instructions, however, will cause the program to move to a different part of the program and execute from there. These instructions require you to give them a memory address as one of their arguments. It would be a real pain to count lines of code every time you write one of these instructions, and you would have to change every such address whenever you added a line of code before its target. Luckily, there is a better way.
You can put a label as the first (or only) thing on a line. A label is just a string followed by a colon. When the assembler encounters a label, it determines the address of the next instruction, and anywhere you use that label, it replaces the label with that address. Here is an example using a label (in this case, an infinite loop):
LOOP:
ADD r1, r1, r2
JUMP LOOP
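To see how a label turns into an address, here is a minimal Python sketch that resolves labels in two passes: the first pass counts instructions and records the address each label stands for, and the second pass substitutes those addresses. The two-pass approach and the name resolve_labels are assumptions made for illustration; the text does not say how the Smalltium assembler does this internally.

```python
def resolve_labels(lines):
    """Return (label addresses, instructions with labels replaced)."""
    labels = {}
    instructions = []
    # First pass: a label names the address of the next instruction,
    # which equals the number of instructions seen so far.
    for line in lines:
        code = line.split(";", 1)[0].strip()      # ignore comments
        if ":" in code:
            name, code = code.split(":", 1)
            labels[name.strip().lower()] = len(instructions)
            code = code.strip()
        if code:
            instructions.append(code)
    # Second pass: replace every argument that is a known label.
    resolved = []
    for code in instructions:
        parts = code.split(None, 1)
        mnemonic = parts[0]
        if len(parts) == 1:
            resolved.append(mnemonic)
            continue
        args = [arg.strip() for arg in parts[1].split(",")]
        args = [str(labels.get(arg.lower(), arg)) for arg in args]
        resolved.append(mnemonic + " " + ", ".join(args))
    return labels, resolved
```

Given the three lines of the loop above, the sketch records LOOP as address 0 and rewrites the last instruction as JUMP 0. If one more instruction were added before the label, LOOP would become address 1 without any change to the JUMP line.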
Sometimes you need to tell the assembler to do something without actually translating an instruction. These commands are known as assembler directives. In the Smalltium assembler, there is only one assembler directive: ".DATA".
A program usually won't be composed entirely of instructions. It will also store some data in memory that is used by the program. The Smalltium assembler stores all of its data following all of the instructions. The .DATA directive tells the assembler that we are done with instructions, and the rest of the assembly source code will be data.
In the data section, every line of code holds one word of data. This word can be specified either as a decimal number or as a character constant. A character constant can be any character enclosed in single quotes.
Each data word also has a memory address associated with it. Because you will need these addresses in order to use the data in your program, you can put labels in the data section as well.
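The following Python sketch shows one way the addresses of data words could be worked out, given that data is stored right after the instructions and each line holds one word. It is a sketch under assumptions, not the assembler's actual behaviour: layout_data is a hypothetical name, character constants are assumed to be stored as their ASCII codes, and comments in the data section are not handled.

```python
def layout_data(instruction_count, data_lines):
    """Return (label addresses, data words) for the lines after .DATA."""
    labels = {}
    words = []
    for line in data_lines:
        text = line.strip()
        # A label comes first on the line; a line that starts with a quote
        # is a character constant, so its colon (if any) is not a label.
        if text and not text.startswith("'") and ":" in text:
            name, text = text.split(":", 1)
            # Data follows the instructions, so addresses continue from there.
            labels[name.strip().lower()] = instruction_count + len(words)
            text = text.strip()
        if not text:
            continue                      # blank line or a label on its own
        if len(text) == 3 and text[0] == "'" and text[2] == "'":
            words.append(ord(text[1]))    # assumption: characters use ASCII codes
        else:
            words.append(int(text))       # otherwise a decimal number
    return labels, words
```

For a program with 3 instructions, layout_data(3, ["FOO: 42", "MSG: 'H'", "'i'", "0"]) places FOO at address 3 and MSG at address 4, and produces the words 42, 72, 105 and 0.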
Take a look at the hello.asm file to see what a complete assembly language program looks like.