Chapter 1: The First Program

For this chapter, I will give the source code of an example program that runs in DOS, show how to assemble it using FASM or NASM, and finally explain how the program works line by line.

First, here is the source code of a program that looks like nonsense but does something really cool.

org 100h

mov ah,2
mov dl,20h
loop_start:
int 21h
inc dl
cmp dl,7Fh
jne loop_start

mov ax,4C00h
int 21h

You will need an assembler. My first recommendation is FASM, the Flat Assembler.

You can download FASM and install it by putting it in your path. The instructions for doing this depend on your operating system, but FASM runs on Windows, Linux, and even DOS itself. Whichever one you assemble on, you will still need DOS (or an emulator of it) to run the program.

Assemble with FASM

To assemble this program with FASM, place the source in a file named “main.asm” and run this command:

fasm main.asm

FASM will automatically create a “main.com” file because it understands from the context of “org 100h” that you intend to create a DOS executable with a “.com” extension. This directive tells the assembler that the program starts at address 100 hexadecimal (256 decimal). This kind of DOS program is always loaded at that address.

After you have created the “main.com” file, you will need some kind of DOS emulator to run it. I recommend DOSBox because it is easy to set up and has a lot of documentation to help you.

As an example of how to use DOSBox efficiently, I have added the path of the working directory where I test my programs directly to my DOSBox configuration file. Instructions for doing this depend on your host operating system. Consult the DOSBox documentation for the location of the configuration file on your operating system.

[autoexec]
# Lines in this section will be run at startup.
# You can put your MOUNT lines here.
mount d ~/git/Chastity-Code-Cookbook/work/

This mounts a folder on my Linux system as if it were the D drive recognized by DOS. Back in the DOS and early Windows days, there were “drives”, each named with a single letter. A and B were the floppy disk drives, C was the hard disk drive, and sometimes there was a D or E drive for a CD-ROM drive. DOSBox lets you emulate them: you can mount any folder on the host operating system (usually Windows or Linux) and access it as you would a drive in DOS.

To switch to the D drive, I just enter

D:

And then I can type “dir” to see the files, and then I can type

main

and the main.com file will run. This works because DOS treats “.com” and “.exe” files as programs that can be executed.

When you run the program, it will display

 !"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyz{|}~

Basically the program is displaying every printable ASCII character, from the space (20 hex) up to the tilde (7E hex). This is the behavior I expected when I wrote the program.

Assemble with NASM

You can assemble the example program with NASM instead of FASM if you wish:

nasm main.asm -o main.com

Disassembling the Program

If you have a disassembler, it is possible to recover a close approximation of the source code from the main.com binary file! I always use “ndisasm” for this because it usually comes installed along with NASM.

ndisasm -o 0x100 main.com

If you disassemble it like this, you will get this as a result:

00000100  B402              mov ah,0x2
00000102  B220              mov dl,0x20
00000104  CD21              int 0x21
00000106  FEC2              inc dl
00000108  80FA7F            cmp dl,0x7f
0000010B  75F7              jnz 0x104
0000010D  B8004C            mov ax,0x4c00
00000110  CD21              int 0x21

You will see that it is almost identical to the source, except that the loop_start label has been replaced with the number 104h and “jne” is shown as “jnz”, which is another name for the same instruction. The label disappears because at the machine code level, there are only numbers.

The first column in the ndisasm output is the address of the instruction. The second column holds the actual machine code bytes. The third column is an approximation of the original source code. It has small differences, but it is close enough that we can figure out which program was assembled!
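One of those small differences is worth a closer look, because it shows how the label became a number. The listing below is only a sketch and not part of the original program: it replaces the jne line with its two raw bytes, using the db (define byte) directive that both FASM and NASM accept. The byte 75h is the opcode for the jump, and F7h is the signed value -9, counted from the address of the instruction that follows the jump: 10Dh - 9 = 104h.

```asm
org 100h

mov ah,2
mov dl,20h
loop_start:       ; this label sits at address 104h
int 21h
inc dl
cmp dl,7Fh
db 75h,0F7h       ; hand-encoded "jne loop_start": opcode 75h, offset F7h = -9
                  ; the next instruction is at 10Dh, and 10Dh - 9 = 104h

mov ax,4C00h
int 21h
```

Assembled this way, the file should come out byte for byte the same as the original main.com, which you can check by running ndisasm on it again.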

Now let me break down why it works by repeating the source but including comments this time:

org 100h       ; tell assembler that code begins running at this address

mov ah,2       ; move (copy) the number 2 into the ah register
mov dl,20h     ; move the number 20 hex=32 dec into the dl register
loop_start:    ; the loop starts here
int 21h        ; interrupt 21 hex=33 dec for a DOS system call
inc dl         ; add 1 to the dl register
cmp dl,7Fh     ; compare the dl register with 7F hex = 127 dec
jne loop_start ; Jump if Not Equal to loop_start

mov ax,4C00h   ; DOS exit call with ah=4C and return al=0
int 21h        ; DOS system call to complete the exit function
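A good way to check your understanding of the loop is to change its two constants. The variant below is my own illustration rather than a new technique: it starts at 41 hex (the letter A) and stops at 5B hex (one past the letter Z), so it should print only the uppercase alphabet.

```asm
org 100h

mov ah,2          ; DOS function 2: print the character in dl
mov dl,41h        ; start at 41 hex = 65 dec, the letter 'A'
loop_start:
int 21h           ; print the current character
inc dl            ; move on to the next character
cmp dl,5Bh        ; 5B hex = 91 dec, one past the letter 'Z' (5A hex)
jne loop_start    ; keep looping until dl reaches 5Bh

mov ax,4C00h      ; exit with return code 0
int 21h
```

The comparison value is always one more than the last character you want, because the loop prints first and compares afterwards.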

I know you are probably a little bit confused at this point and have many questions such as

  • What is an interrupt?
  • What is a system call?
  • Why do you write your numbers in hexadecimal?
  • What is a register?

I would probably give you the wrong definition if I had to explain what an interrupt is, from a hardware or software perspective. In this case, the interrupt number 21h is something put into memory by DOS which can be called as if it were a function like you would write in any language.

The reason the interrupts and other numbers are in hexadecimal is that most assembly language books and tutorials write them that way. Hexadecimal is also more convenient than decimal because each hexadecimal digit stands for exactly four binary digits, so converting to and from binary is easy. For now just remember that interrupt 21h is actually thirty-three and not twenty-one. I have chosen to stick with hexadecimal for this book because it will be relevant later on when I show you other tools which can be used if you understand hexadecimal!
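The assembler does not care which notation you pick, as long as you mark it. As a small sketch (the choice of the letter A is just mine for illustration), the program below loads the same value into dl three times, written in hexadecimal with an h suffix, in plain decimal, and in binary with a b suffix. Both FASM and NASM accept all three forms.

```asm
org 100h

mov ah,2          ; DOS function 2: print the character in dl

mov dl,41h        ; hexadecimal: 41 hex
int 21h
mov dl,65         ; decimal: the same number, 65
int 21h
mov dl,1000001b   ; binary: the same number again
int 21h

mov ax,4C00h      ; exit with return code 0
int 21h
```

Because all three values are equal, the program should print the letter A three times. Notice how the binary form splits into 100 and 0001, which are the hexadecimal digits 4 and 1.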

A register is a small, named storage location built into the CPU, and which registers exist depends on the CPU type. DOS, Windows, and most Linux systems run on CPUs compatible with the Intel 8086. I will explain that CPU's registers and their functions.

The General Purpose Registers

There are 8 general purpose registers.

  • AX: The Accumulator Register
  • BX: The Base Register
  • CX: The Count Register
  • DX: The Data Register
  • SI: The Source Index
  • DI: The Destination Index
  • BP: The Base Pointer
  • SP: The Stack Pointer

Of those 8 registers, only BX, BP, SI, and DI can be used as index variables, that is, as registers that hold the address of something in memory. This is only a limitation of 16-bit real mode programs like those written in this book. 32-bit and 64-bit programs do not have this limitation, but they have other limitations to worry about, which will be covered in future books.
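Here is a minimal sketch of what that limitation looks like in practice. It goes slightly beyond what this chapter has covered so far, and the label name “letter” is my own invention for the example. The program puts the address of a byte into BX and then reads the byte through that address.

```asm
org 100h

mov bx,letter     ; bx now holds the address of the byte stored at "letter"
mov ah,2          ; DOS function 2: print the character in dl
mov dl,[bx]       ; allowed: bx may be used as an address inside brackets
; mov dl,[cx]     ; not allowed in 16-bit code: the assembler is expected to
                  ; reject this line if you remove the leading semicolon
int 21h

mov ax,4C00h      ; exit with return code 0
int 21h

letter db 'A'     ; one byte of data, placed after the exit so it never runs as code
```

If everything is set up as assumed, this should print a single letter A.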

These registers are “general” in that they can do many things, but they each have more “specific” uses also.

In my source code, I use lowercase for the names of instructions and registers, but for this section, I listed them in capital letters because they are actually abbreviations named for the purpose Intel had in mind when designing the 8086 and later CPUs.

Most of the time, I stick with only AX, BX, CX, and DX when writing my programs. If I need an extra register, I will use BP, SI, or DI. There are special instructions for them, but these are outside the scope of what I am trying to teach with this book.

You might wonder, why isn’t there EX,FX,…YX,ZX? Perhaps in a perfect world there should have been, but they probably didn’t want to spend the extra money on having a register for each of the 26 letters of the alphabet.

In the next chapter I am going to introduce a program that can print any string you give to it. Basically, it will be the proper “Hello World” program that you were expecting.

Interrupt Information

All code in this book depends on different functions of interrupt 21h. I have provided the following link to where you can read about the most common functions of this interrupt which will be used in this book

The function chosen depends on the value of the AH register (the upper half of the AX register). Depending on which function is selected, other registers act as options or arguments to that function. The example I included in this chapter shows only the ah=2 call (the equivalent of the C putchar call) and the exit call of ah=4Ch, with al being the return value.
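To make the pattern of “function number in AH, arguments in other registers” easier to see, here is a short sketch that performs one call of each kind. It is my own rearrangement of the chapter's example rather than anything new: the exit call loads AH and AL separately instead of loading AX in one step.

```asm
org 100h

mov ah,2      ; ah=2 selects the "print character" function
mov dl,21h    ; dl is the argument: 21 hex is the '!' character
int 21h

mov ah,4Ch    ; ah=4Ch selects the "exit" function
mov al,0      ; al is the argument: the return value given back to DOS
int 21h       ; together the two moves do the same job as mov ax,4C00h
```

Loading AX in one step is shorter, which is why the main example does it that way, but splitting it up shows that 4C00h is really two separate values sitting side by side.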