Assembly language is an extremely low-level programming language that works at the level of the hardware itself: its statements correspond directly to machine instructions. Because each instruction does only something very small and simple, it is not a very convenient language to program in. You have to do everything by hand, and a small mistake can make the program crash, for example with a segmentation fault. Also, since every processor architecture has its own instruction set, assembly code is notoriously unportable.

On the other hand, writing raw assembly lets you fine-tune the code and apply maniacal optimizations that are hard or impossible to express even in relatively low-level languages such as C. Also, with macros that generate standard patterns of instructions for you, assembly programs can use some higher-level constructs.

Typically, assembly is written as text files containing one instruction per line, with some way to attach labels to lines so they can be used as jump targets. To create actual executable code, a program called an assembler is run on these files. The assembler transforms the instructions and data into their binary form (machine code), which is more or less unreadable to humans.
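To illustrate the assembler's job, here is a hypothetical toy "assembler" in C. It only knows a few fixed 32-bit x86 instructions taken from the strcpy example that follows, and it simply looks up their machine-code bytes. A real assembler parses operands, resolves labels and computes each encoding. The byte values used here are the standard x86 encodings of these instructions. The names `assemble_line` and `table` are made up for this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical toy assembler: maps a few complete instructions, written in
   AT&T syntax exactly as in the strcpy listing, to their machine code. */
struct encoding {
    const char *text;        /* instruction as written in the source file */
    unsigned char bytes[3];  /* its machine-code bytes */
    size_t length;           /* how many of those bytes are used */
};

static const struct encoding table[] = {
    { "pushl %ebp",      { 0x55 },             1 },
    { "movl %esp, %ebp", { 0x89, 0xE5 },       2 },
    { "subl $4, %esp",   { 0x83, 0xEC, 0x04 }, 3 },
    { "leave",           { 0xC9 },             1 },
    { "ret",             { 0xC3 },             1 },
};

/* Copies the encoding of `line` into `out` (room for at least 3 bytes) and
   returns the number of bytes written, or 0 if the line is not in the table. */
size_t assemble_line(const char *line, unsigned char *out)
{
    assert(line != NULL && out != NULL);

    for (size_t i = 0; i < sizeof table / sizeof table[0]; i++) {
        if (strcmp(line, table[i].text) == 0) {
            memcpy(out, table[i].bytes, table[i].length);
            return table[i].length;
        }
    }
    return 0;
}
```

Each source line becomes a fixed sequence of bytes, which is the one-to-one translation that separates an assembler from a compiler.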

Here's assembly code for the C function strcpy on a 32-bit Intel x86 processor, in AT&T syntax. Intel has its own syntax, but AT&T is what GCC produces by default. Note that this is completely unoptimized; it could be made much smaller:

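strcpy:
	pushl	%ebp			# save the caller's frame pointer
	movl	%esp, %ebp		# set up our own stack frame
	subl	$4, %esp		# room for one local variable
	movl	8(%ebp), %eax		# eax = dest (first argument)
	movl	%eax, -4(%ebp)		# keep dest as the return value
L1:
	movl	12(%ebp), %eax		# eax = src (second argument)
	cmpb	$0, (%eax)		# is *src the terminating 0 byte?
	je	L2			# yes: leave the loop
	movl	8(%ebp), %edx		# edx = dest
	movl	12(%ebp), %eax		# eax = src
	movzbl	(%eax), %eax		# eax = *src, zero-extended from a byte
	movb	%al, (%edx)		# *dest = *src
	incl	8(%ebp)			# dest++
	leal	12(%ebp), %eax		# eax = address of src
	incl	(%eax)			# src++
	jmp	L1			# test the next character
L2:
	movl	8(%ebp), %eax		# eax = dest
	movb	$0, (%eax)		# copy the terminating 0 byte as well
	movl	-4(%ebp), %eax		# return the original dest in eax
	leave				# tear down the stack frame
	ret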

I won't go into details here, since I'm not an expert in this stuff myself. The instructions might make some sense to you if you know that the last letter of an instruction indicates the size of its operands (b for 8-bit, w for 16-bit and l for 32-bit values), and that the rest is an abbreviation of an English word: movl moves a 32-bit value from one place to another. Literal values are written with a $ (the zero in the cmpb instruction), the %eax-style names are processor registers, and the syntax 8(%ebp) means the value stored in memory 8 bytes past the address held in %ebp. ebp is the register that points to the current stack frame, so 8(%ebp) is the first argument (the destination string) and 12(%ebp) is the second (the source string). Operands are written in source, destination order, the reverse of Intel syntax: movl %eax, %ebx copies the value in eax into ebx.

Notice how labels are used to construct the loop, the way you would do it with GOTO in ancient BASIC dialects. The program keeps jumping back to L1 until it finds the end of the source string. The three instructions after L1 load the source pointer (12(%ebp)), compare the character it points to against 0, and jump to L2 when they find a 0 character.
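For comparison, here is a sketch of C code that does the same job as the listing. It saves dest as the return value, loops while the source character is not 0, and increments both pointers explicitly. The name `my_strcpy` is hypothetical and avoids clashing with the standard library's `strcpy`. The comments point to the matching instructions, but a compiler is free to produce different code from this source.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical C version of the strcpy listing above. */
char *my_strcpy(char *dest, const char *src)
{
    assert(dest != NULL && src != NULL);  /* extra check, not in the listing */

    char *ret = dest;       /* movl 8(%ebp), %eax / movl %eax, -4(%ebp) */

    while (*src != '\0') {  /* cmpb $0, (%eax): stop at the 0 byte */
        *dest = *src;       /* movzbl (%eax), %eax / movb %al, (%edx) */
        dest++;             /* incl 8(%ebp) */
        src++;              /* leal 12(%ebp), %eax / incl (%eax) */
    }                       /* jmp L1 */

    *dest = '\0';           /* strcpy also copies the terminating 0 byte */
    return ret;             /* movl -4(%ebp), %eax: result goes in eax */
}
```

Compiling a function like this with `gcc -S` and optimization turned off gives a listing in the same style as the one above.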

Assembly is useful when parts of a program need to be extremely fast. Because assembly translates to machine code on a one-to-one basis, the programmer controls every instruction. Carefully hand-written assembly can therefore beat the output of a C++ compiler, although modern optimizing compilers are often just as good. Keep in mind that a single C++ statement can compile to five or more assembly instructions. Assembly is best used when a fast engine has to be developed from scratch, but it takes a lot of skill. Even something as simple as printing a string to a console window requires knowledge of interrupts on a 16-bit (DOS) system, and of Windows console handles when writing 32-bit programs with the MASM assembler. Here is an example of adding two numbers in MASM.

TITLE FunStuff
; Program Description:	Just Does FunStuff

.model medium
.stack 100h		; reserve 256 bytes of stack

; ----------------------------------------------
.data

	nm1	dw	21	; first operand (one 16-bit word)
	nm2	dw	9	; second operand
	an1	dw	?	; result, no initial value

; ----------------------------------------------
.code

main proc far
	mov  ax, @data	; address of the data segment
	mov  ds, ax	; DS must point to it before nm1, nm2 and an1 can be used

; -----
;	Example addition.
;	Calculate:   an1 = nm1 + nm2

	mov	ax, nm1  ; put nm1 into register ax
	add	ax, nm2  ; add nm2 to the contents of ax
	mov	an1, ax  ; store the contents of ax in an1

; -----
;	Done, terminate program.

; --- DOS function 4Ch of int 21h ends the program; AL holds the exit code (0 here).
	mov  ax, 4C00h
	int  21h
main endp

end main
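Here is a hypothetical C model of the MASM program. Because dw reserves one 16-bit word, the three variables become `uint16_t` globals, and a local `ax` plays the part of the register. The function name `add_example` is made up for this sketch.

```c
#include <stdint.h>

/* Hypothetical C model of the MASM data: dw = one 16-bit word. */
uint16_t nm1 = 21;
uint16_t nm2 = 9;
uint16_t an1;  /* "dw ?" gives no particular value; C zeroes globals anyway */

void add_example(void)
{
    uint16_t ax = nm1;          /* mov ax, nm1 */
    ax = (uint16_t)(ax + nm2);  /* add ax, nm2: wraps modulo 65536 like a 16-bit register */
    an1 = ax;                   /* mov an1, ax */
}
```

The whole .data/.code program collapses to the single statement `an1 = nm1 + nm2;` in C. This illustrates how one high-level statement stands for several instructions.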

To print the result, the program would first have to convert the number into a string of decimal digits. It would then print that string with a DOS interrupt, such as function 09h of int 21h, which prints a string terminated by a $ character.

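Some compilers do not produce machine code themselves. Instead they emit assembly, which an assembler then turns into machine code (GCC works this way). This is usually because generating binary machine code directly is much harder to get right. Emitting assembly costs little or no speed in the final program, since the assembler translates it one-to-one.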

Source Code

The source code files listed below do not have accompanying tutorials.

  • Paddles - Pong clone in assembly for TASM 1.0