Assembly language is an extremely low-level programming language that sits right at the level of the hardware: its statements correspond directly to machine instructions. Because each of these instructions does only something very small and simple, assembly is not a convenient language to program in. You have to do everything by hand, and a small mistake can make the program crash, for example with a segmentation fault. Also, since every processor family has its own instruction set, assembly code is notoriously unportable.
On the other hand, writing raw assembly allows you to fine-tune the code and apply maniacal optimizations that would be impossible even in relatively low-level languages such as C. Also, by using macros that generate standard patterns of instructions for you, assembly programmers can work with some higher-level constructs.
Typically, assembly is written as text files containing one instruction per line, with a way to attach labels to lines so they can be used as jump targets. To turn these files into executable code, a program called an assembler is run on them. The assembler translates the instructions and data into their binary form (machine code), which is more or less unreadable to humans; a linker then usually combines the resulting object files into the final executable.
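To make the idea of an assembler a little more concrete, here is a toy sketch in C. It is not how a real assembler works: instead of parsing operands and computing encodings, it only looks up a handful of fixed 32-bit x86 instructions (in AT&T syntax, the same ones that open and close the strcpy listing below) in a table of their machine-code bytes. The function name `assemble_line` and the table are hypothetical, made up for this illustration.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical lookup-table entry: one AT&T-syntax instruction
   and its 32-bit x86 machine-code bytes. */
struct encoding {
    const char *text;
    unsigned char bytes[3];
    size_t len;
};

/* A real assembler parses operands and computes the encoding;
   this toy only knows these fixed instructions. */
static const struct encoding table[] = {
    { "pushl %ebp",      { 0x55 },             1 },  /* push ebp              */
    { "movl %esp, %ebp", { 0x89, 0xE5 },       2 },  /* opcode 89 + ModRM E5  */
    { "subl $4, %esp",   { 0x83, 0xEC, 0x04 }, 3 },  /* sub r/m32, imm8       */
    { "leave",           { 0xC9 },             1 },
    { "ret",             { 0xC3 },             1 },
};

/* Copies the machine code for `line` into `out` (room for at least 3 bytes)
   and returns the number of bytes written, or 0 if the instruction is unknown. */
size_t assemble_line(const char *line, unsigned char *out)
{
    for (size_t i = 0; i < sizeof table / sizeof table[0]; i++) {
        if (strcmp(table[i].text, line) == 0) {
            memcpy(out, table[i].bytes, table[i].len);
            return table[i].len;
        }
    }
    return 0;
}
```

For example, `assemble_line("movl %esp, %ebp", buf)` returns 2 and fills `buf` with the bytes 89 E5, which is what that instruction looks like once assembled.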
Here's assembly code for the C function strcpy for a 32-bit Intel x86 processor in AT&T syntax (Intel has its own syntax, but AT&T is what GCC outputs by default). Note that this is completely unoptimized; it could be made much smaller:
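```asm
_strcpy:
        pushl   %ebp                # set up the stack frame
        movl    %esp, %ebp
        subl    $4, %esp            # room for one local variable
        movl    8(%ebp), %eax       # save the original dest pointer...
        movl    %eax, -4(%ebp)      # ...so it can be returned at the end
L1:
        movl    12(%ebp), %eax      # eax = src
        cmpb    $0, (%eax)          # is *src the terminating zero byte?
        je      L2                  # yes: leave the loop
        movl    8(%ebp), %edx       # edx = dest
        movl    12(%ebp), %eax      # eax = src
        movzbl  (%eax), %eax        # load the byte *src (zero-extended)
        movb    %al, (%edx)         # store it at *dest
        incl    8(%ebp)             # dest++
        leal    12(%ebp), %eax      # eax = address of the src argument
        incl    (%eax)              # src++
        jmp     L1
L2:
        movl    8(%ebp), %eax       # dest now points just past the last copied character
        movb    $0, (%eax)          # write the terminating zero byte
        movl    -4(%ebp), %eax      # return value (the original dest) goes in eax
        leave
        ret
```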
I won't go into details here, since I'm not an expert in this stuff myself. The instructions might make some sense to you if you know that the last letter indicates the size of the operands (b for 8-bit, w for 16-bit and l for 32-bit values), and that the rest of the instruction name is an abbreviation of an English word: movl moves a 32-bit value from one place to another, cmpb compares two bytes, and so on. Literal (immediate) values are prefixed with a $ (like the zero in the cmpb instruction), names starting with % such as %eax are processor registers, and the syntax 8(%ebp) refers to the memory location 8 bytes after the address held in %ebp. ebp is the register that points to the current stack frame; with this calling convention, 8(%ebp) holds the first argument (the destination pointer) and 12(%ebp) the second (the source pointer). Operands are written source first and destination second, which is the reverse of Intel syntax: movl %eax, %ebx copies the value in eax into ebx. Notice how labels are used to construct the loop, the way you would do it with GOTO in ancient BASIC dialects. The program keeps jumping back to L1 until it reaches the end of the source string: the three lines after L1 load the source pointer (12(%ebp)), compare the character it points to against 0, and jump to L2 when that character is 0.
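For comparison, here is a C function that does what the listing does, with comments pointing at the matching instructions. It is a sketch written for this page, not the exact source GCC compiled; the name `my_strcpy` is hypothetical, chosen to avoid clashing with the standard library's `strcpy`. A correct strcpy must also copy the terminating zero byte, which this version does after the loop.

```c
/* Hypothetical name; behaves like the standard strcpy. */
char *my_strcpy(char *dest, const char *src)
{
    char *start = dest;          /* the listing keeps this copy at -4(%ebp)        */

    while (*src != '\0') {       /* L1: cmpb $0, (src); leave the loop on a zero   */
        *dest = *src;            /* movzbl (src), %eax ; movb %al, (dest)          */
        dest++;                  /* incl 8(%ebp)                                   */
        src++;                   /* incl on the argument slot at 12(%ebp)          */
    }
    *dest = '\0';                /* copy the terminator so dest is a valid string  */
    return start;                /* movl -4(%ebp), %eax: return values go in eax   */
}
```

Compiled without optimization, code like this produces a listing of roughly the shape shown above; with optimization turned on, the compiler keeps dest and src in registers instead of reloading them from the stack on every pass.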
Assembly is useful when parts of a program need to be extremely fast. Because assembly maps one-to-one onto machine instructions, a skilled programmer has complete control over what the processor does and can sometimes produce faster code than a C++ compiler, although modern optimizing compilers are hard to beat. One thing to keep in mind is that a single C++ statement can correspond to several assembly instructions, five or more in some cases. ASM is best suited for developing a fast engine from scratch, but it requires a lot of skill. Even something as simple as printing a string to a console window requires knowledge of interrupts on a 16-bit (DOS) system, or of handles and Windows API calls when writing 32-bit code with the MASM assembler. Here is an example of adding two numbers in MASM.
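To illustrate how one high-level statement expands into several instructions, the sketch below splits the C statement `a[i] += b;` into the separate steps the processor actually performs. The function name `add_to_element` is hypothetical, and the instructions in the comments (Intel syntax, as used by MASM) are only a plausible register-level translation, assuming a, i and b are already in eax, ecx and ebx; a real compiler may pick different registers or instructions.

```c
/* Hypothetical helper: the single statement  a[i] += b;  written out
   as the individual load / add / store steps. */
void add_to_element(int *a, int i, int b)
{
    int *addr = &a[i];   /* lea edx, [eax+ecx*4]  compute the element's address    */
    int tmp = *addr;     /* mov esi, [edx]        load the element into a register */
    tmp += b;            /* add esi, ebx          do the arithmetic in a register  */
    *addr = tmp;         /* mov [edx], esi        store the result back to memory  */
}
```

An unoptimized compiler also loads a, i and b from the stack first, which pushes the instruction count higher; an optimizing x86 compiler can instead fold the whole statement into a single add with a memory operand.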
```asm
TITLE FunStuff
; Program Description: Just Does FunStuff

.model medium
.486
; ----------------------------------------------
.data
; Vars
nm1 dw 21
nm2 dw 9
an1 dw ?
; ----------------------------------------------
.stack
.code
main proc far
        mov ax, @data           ; point DS at the data segment
        mov ds, ax
; -----
; Example addition.
; Calculate: an1 = nm1 + nm2
        mov ax, nm1             ; put nm1 into register ax
        add ax, nm2             ; add nm2 to the contents of ax
        mov an1, ax             ; store the contents of ax into an1
; -----
; Done, terminate program.
; DOS function 4Ch terminates the program, with AL as the exit code.
; Every DOS program needs something like this to exit correctly.
last:
        mov ax, 4C00h           ; AH = 4Ch (terminate), AL = 0 (exit code)
        int 21h
main endp
end main
```
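To print the numbers, the programmer would first have to convert them to ASCII digits and then output them through the DOS interrupt 21h, for example with function 02h (print the single character in DL) or function 09h (print a $-terminated string).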
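The conversion step is the part that is easy to overlook: interrupt 21h only prints characters, so the value 30 in an1 has to become the two characters '3' and '0' first. The usual approach is to divide by 10 repeatedly (with a 16-bit divisor, `div` leaves the quotient in AX and the remainder, which is the next digit, in DX) and then output the digits in reverse order. Below is a sketch of that algorithm in C; the function name `word_to_decimal` is hypothetical, and in real DOS code each resulting character would be passed in DL to int 21h function 02h.

```c
#include <stddef.h>

/* Hypothetical helper: converts an unsigned 16-bit value (like a `dw`
   variable) to decimal ASCII text in buf, which must hold at least 6 bytes.
   Returns the number of digits written. */
size_t word_to_decimal(unsigned short value, char *buf)
{
    char digits[5];              /* 65535 has at most 5 digits */
    size_t count = 0;

    do {
        digits[count++] = (char)('0' + value % 10);  /* remainder: like DX after div */
        value /= 10;                                 /* quotient: like AX after div  */
    } while (value != 0);

    /* Digits came out least significant first, so reverse them. */
    for (size_t i = 0; i < count; i++)
        buf[i] = digits[count - 1 - i];
    buf[count] = '\0';
    return count;
}
```

The do/while form guarantees that a value of 0 still produces the single digit "0".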
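Some compilers generate only assembly code and leave the final translation into machine code to an assembler; GCC, for example, writes assembly and then runs the assembler (as) on it. This is usually done because writing a compiler that emits machine code directly is much harder, while targeting assembly is easier and costs little, if any, speed in the resulting program.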
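As a rough sketch of what compiling to assembly means, the C function below is a toy code generator: it turns a sum such as an1 = nm1 + nm2 into MASM-style text matching the addition example above, and leaves the job of encoding the instructions to the assembler. The name `emit_sum` and its interface are hypothetical; a real compiler would also handle register allocation, other operators, and much more.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical toy back end: writes MASM-style code for
   dest = ops[0] + ops[1] + ... + ops[n-1]  (n >= 1) into out.
   Returns the length of the generated text, or -1 if it did not fit. */
int emit_sum(char *out, size_t size, const char *dest,
             const char *const ops[], size_t n)
{
    size_t used = 0;
    int w;

    /* The first operand is loaded into ax; the rest are added to it. */
    for (size_t i = 0; i < n; i++) {
        w = snprintf(out + used, size - used, "%s ax, %s\n",
                     i == 0 ? "mov" : "add", ops[i]);
        if (w < 0 || (size_t)w >= size - used)
            return -1;
        used += (size_t)w;
    }

    /* Finally, store the accumulated result into the destination variable. */
    w = snprintf(out + used, size - used, "mov %s, ax\n", dest);
    if (w < 0 || (size_t)w >= size - used)
        return -1;
    used += (size_t)w;
    return (int)used;
}
```

With operands {"nm1", "nm2"} and destination "an1", it produces the three lines mov ax, nm1 / add ax, nm2 / mov an1, ax and returns 36, the length of that text.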
The source code files listed below do not have accompanying tutorials.
- Paddles: a Pong clone in assembly for TASM 1.0