Chapter 3: The intstr and putint functions

In this chapter, I will introduce two new functions designed to work with the putstring function from the last chapter. We can already print a string, but that does not help with numbers. Fortunately, I have written my own functions which convert whatever number is in the ax register into a string that can be displayed.

The source of a complete program is below. Take a good look at it even if you don’t understand it at first because I will be explaining some things about it.

org 100h

main:

mov ax,text
call putstring

mov [radix],word 10
mov [int_width],word 1

mov ax,1

main_loop:
call putint
add ax,ax
cmp ax,0
jnz main_loop

mov ax,4C00h
int 21h

text db 'This program displays numbers!',0Dh,0Ah,0

;This section is for the putstring function I wrote.
;It will print any zero terminated string that register ax points to

stdout dw 1 ; variable for standard output so that it can theoretically be redirected

putstring:

push ax
push bx
push cx
push dx

mov bx,ax                  ;copy ax to bx for use as index register

putstring_strlen_start:    ;this loop finds the length of the string as part of the putstring function

cmp [bx], byte 0           ;compare this byte with 0
jz putstring_strlen_end    ;if comparison was zero, jump to loop end because we have found the length
inc bx                     ;increment bx (add 1)
jmp putstring_strlen_start ;jump to the start of the loop and keep trying until we find a zero

putstring_strlen_end:

sub bx,ax                  ; sub ax from bx to get the difference for number of bytes
mov cx,bx                  ; mov bx to cx
mov dx,ax                  ; dx will have address of string to write

mov ah,40h                 ; select DOS function 40h write
mov bx,[stdout]            ; file handle 1=stdout
int 21h                    ; call the DOS kernel

pop dx
pop cx
pop bx
pop ax

ret

;this is the location in memory where digits are written to by the intstr function
int_string db 16 dup '?' ;enough bytes to hold maximum size 16-bit binary integer
;this is the end of the integer string optional line feed and terminating zero
;clever use of this label can change the ending to be a different character when needed
int_newline db 0Dh,0Ah,0 ;the proper way to end a line in DOS/Windows

radix dw 2 ;radix or base for integer output. 2=binary, 8=octal, 10=decimal, 16=hexadecimal
int_width dw 8 ;minimum number of digits to write (leading zeros are added to reach it)

intstr:

mov bx,int_newline-1 ;find address of lowest digit (the byte just before the newline bytes)
mov cx,1 ;cx counts the digits written; every number has at least one

digits_start:

mov dx,0 ;clear dx because div treats dx:ax as the dividend
div word [radix] ;ax = quotient, dx = remainder (the next digit)
cmp dx,10
jb decimal_digit
jge hexadecimal_digit

decimal_digit: ;we go here if it is only a digit 0 to 9
add dx,'0'
jmp save_digit

hexadecimal_digit: ;we go here if the digit is 10 or more and needs a letter
sub dx,10
add dx,'A'

save_digit:

mov [bx],dl ;store the digit character at the address in bx
cmp ax,0
jz intstr_end ;nothing left to divide, so all digits are written
dec bx ;move one byte to the left for the next digit
inc cx
jmp digits_start

intstr_end:

prefix_zeros:
cmp cx,[int_width]
jnb end_zeros ;stop once cx is not below the requested width
dec bx
mov [bx],byte '0'
inc cx
jmp prefix_zeros
end_zeros:

mov ax,bx ; store string in ax for display later

ret

;function to print string form of whatever integer is in ax
;The radix determines which number base the string form takes.
;Anything from 2 to 36 is a valid radix
;in practice though, only bases 2,8,10,and 16 will make sense to other programmers
;this function does not process anything by itself but calls the combination of my other
;functions in the order I intended them to be used.

putint:

push ax
push bx
push cx
push dx

call intstr
call putstring

pop dx
pop cx
pop bx
pop ax

ret

If you assemble and run this program, you will get the following output.

This program displays numbers!
1
2
4
8
16
32
64
128
256
512
1024
2048
4096
8192
16384
32768

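The program prints ax, adds ax to itself, and repeats until ax “overflows” past the 16-bit limit. Adding 32768 to itself gives 65536, which does not fit in 16 bits, so ax wraps around to zero. The jnz instruction means Jump if Not Zero, so once ax is zero the jump back to main_loop is not taken and the program ends.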
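For readers who want to check this arithmetic outside of DOS, here is a rough Python model of the main loop. It is only a sketch: the function name doubling_sequence is made up for illustration, and the 16-bit register is imitated by masking with 0xFFFF.

```python
def doubling_sequence(start=1, bits=16):
    """Model of main_loop: record ax, double it, stop when it wraps to zero."""
    mask = (1 << bits) - 1          # 0xFFFF for a 16-bit register
    values = []
    ax = start
    while True:
        values.append(ax)           # stands in for "call putint"
        ax = (ax + ax) & mask       # "add ax,ax" keeps only the low 16 bits
        if ax == 0:                 # "cmp ax,0" followed by "jnz main_loop"
            break
    return values


print(doubling_sequence())
```

The list it prints should match the sixteen numbers in the program output above, ending at 32768.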

If you look at the main function you will see that I set the radix to 10 with a mov instruction, even though it defaulted to 2. This is because most humans are used to decimal, AKA base ten. The base can be changed at any time in the program however you like.

Another thing you will notice is that the “putint” function does not process anything at all. It simply backs up the registers, calls the intstr function to create a string, and then calls putstring to display it. In this example, a newline is automatically added for convenience. In my own code, I usually have this done manually by another small function, but for the purposes of this book, this default behavior is fine.

The real power of this program is the intstr function and why it works as it does. I will spend the rest of this chapter explaining why it works, why I designed it this way, and why this function is essential for Assembly language programs to make sense at all.

First, before the function begins, I have defined data using db and dw directives that FASM and NASM both understand.

int_string db 16 dup '?'

This creates sixteen bytes of question marks. The actual bytes used here don’t matter but I used ‘?’ marks to signal that the data that will go here is unknown when the program starts. The actual digits of the number we convert from the ax register will replace these bytes when we call the intstr function.

int_newline db 0Dh,0Ah,0

This line takes care of two problems. First, it makes sure that there is a zero byte after the 16 bytes of data from the int_string variable. It also includes the bytes thirteen and ten in hexadecimal notation. This is how newlines are saved if you have used a text editor in DOS or Windows. When you hit the return key it generates both these bytes to define that a line of text has ended. If you were to change the 0Dh to 0, then the program would print the numbers but without separating them with lines or even spaces. Such a thing would make the numbers hard to read. That is why the default behavior is to print a number and end the line for readability. This works for most simple integer sequence programs I will include in this book.
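The layout of these nineteen bytes can be pictured with a small Python model. This is a sketch under assumptions: the names make_buffer and read_c_string are invented for illustration, and offsets 0 to 18 stand in for the real addresses of int_string and int_newline.

```python
INT_NEWLINE = 16    # offset of the 0Dh byte, right after the 16 digit bytes


def make_buffer():
    """16 placeholder bytes, then carriage return, line feed, and a zero."""
    return bytearray(b"?" * 16 + b"\r\n\x00")


def read_c_string(memory, start):
    """Return the bytes from start up to (not including) the first zero byte."""
    end = memory.index(0, start)
    return bytes(memory[start:end])


buf = make_buffer()
buf[15] = ord("7")                  # pretend intstr wrote a single digit
print(read_c_string(buf, 15))       # digit followed by the newline bytes

buf[INT_NEWLINE] = 0                # change the 0Dh to 0
print(read_c_string(buf, 15))       # now the string ends right after the digit
```

The second print shows why changing 0Dh to 0 removes the line ending: the zero is found before the newline bytes are ever reached.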

radix dw 2
int_width dw 8

These are two variables that define the base/radix of the number string generated and also the “width”, AKA the minimum number of digits written, with leading zeroes added to reach it. The width should be set to one for most programs when decimal integers are expected. However, setting the width to 8 or 16 makes sense for binary integers where seeing the exact bits in their positions lined up is essential.

The defaults I have chosen include radix 2 (the binary numeral system) and a width of 8 (for seeing 8 bits of a byte). But the defaults are irrelevant for what you need to know. See how the main function in my example program for this chapter overwrites them before printing anything.
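The combined effect of the two variables can be previewed with Python's built-in format function. This is only a comparison, not how intstr works internally, and the helper name preview is hypothetical. It covers just the four common bases.

```python
def preview(value, radix, int_width):
    """Show value in the given base with at least int_width digits."""
    spec = {2: "b", 8: "o", 10: "d", 16: "X"}[radix]
    # "0" requests zero padding and int_width is the minimum field width.
    return format(value, "0{}{}".format(int_width, spec))


print(preview(5, 2, 8))        # the defaults: binary, 8 digits
print(preview(5, 10, 1))       # what main selects: decimal, no padding
print(preview(255, 16, 4))     # hexadecimal padded to 4 digits
```

A width of 1 never adds zeros because every number already has at least one digit.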

intstr:

This is the label defining the start of the intstr function. If this label were not present, then the “call intstr” statement inside putint would not know what you mean. Also, keep in mind that “intstr” is just an address in memory, much like “radix” and “int_width” are addresses that tell where bytes of data are. However, the convention I use is that labels ending with a colon are labels that will be called with the “call” instruction or jumped to with a jmp or j?? instruction. There will be more explanation of conditional jumps later in this book.

mov bx,int_newline-1
mov cx,1

Before the loop in the intstr function, we set bx equal to the address of the byte before int_newline. This will be the final ‘?’ we defined earlier. The cx register is set to one because it counts the digits that will exist in the string. Every number, including 0 and 1, has at least one digit no matter which base you use. The cx register will come into use near the end of the function in its own loop.

digits_start:

This label marks the start of the loop where the digits are generated in the string.

mov dx,0
div word [radix]
cmp dx,10
jb decimal_digit
jge hexadecimal_digit

dx is set to 0 because this has to be done before the “div” instruction. Otherwise, it will be mistaken as part of the dividend. This is a quirk of the x86 family of CPUs. The div instruction takes one argument, in this case a word value from the address radix, and divides by it. With a word-sized argument, the number being divided is 32 bits wide: dx supplies the upper 16 bits and ax the lower 16 bits. If we don’t zero dx, whatever was left in it becomes part of the number we are dividing.

For a full explanation of this division behavior, see section “2.1.3 Binary arithmetic instructions” in the FASM documentation. Tomasz Grysztar explains it better than I can and his information greatly helped me when trying to figure out why my function wasn’t working.
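The effect of a stale dx can be seen in a short Python model of the word-sized div. This is a sketch under assumptions: the function name div16 is made up, and Python exceptions stand in for the CPU's divide error.

```python
def div16(dx, ax, divisor):
    """Model of 'div' with a 16-bit operand.

    The dividend is the 32-bit value dx:ax. Returns (quotient, remainder),
    which the CPU would leave in ax and dx.
    """
    if divisor == 0:
        raise ZeroDivisionError("divide error")
    dividend = (dx << 16) | ax              # dx is the upper half, ax the lower
    quotient, remainder = divmod(dividend, divisor)
    if quotient > 0xFFFF:                   # the quotient must fit in ax
        raise OverflowError("divide error: quotient does not fit in ax")
    return quotient, remainder


print(div16(0, 1234, 10))       # dx cleared first: 1234 / 10
print(div16(5, 1234, 10))       # leftover 5 in dx: divides 5*65536 + 1234 instead

try:
    # In intstr, dx still holds an ASCII digit such as '4' (34h) after a pass.
    div16(0x34, 123, 10)
except OverflowError as error:
    print(error)
```

The last case suggests that forgetting to clear dx inside this particular loop would not merely give wrong digits. The quotient would no longer fit in ax, and the CPU would raise a divide error.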

After the division, the dx register contains the remainder and the ax register contains the quotient, meaning whatever ax was divided by the radix. Knowing this, we “cmp dx,10”, which means compare the dx register with 10. If it is below 10, then we know it is a decimal digit in the range of 0 to 9. If it is greater than or equal to 10, the digit has to be written as a letter. Based on these conditions, we jump to one of two sections. One handles decimal digits and the other handles hexadecimal digits. Technically bases 2 to 36 are handled by my program as a consequence of the way I wrote it, but I wrote it with the idea that I would be using this function with only 3 different bases.

base 2 or binary for my personal enjoyment
base 16 or hexadecimal for a short form of binary
base 10 or decimal which is what humans know how to read. It will be used mostly in this book

decimal_digit: ;we go here if it is only a digit 0 to 9
add dx,'0'
jmp save_digit

hexadecimal_digit:
sub dx,10
add dx,'A'

These two sections do the math of converting the byte digit into a character in ASCII representation that is printable. In either case, code moves on to the save_digit label after these.
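The same math can be written as a few lines of Python. This is a sketch for checking the character codes, with digit_char as a made-up name. ord and chr convert between a character and its ASCII code.

```python
def digit_char(remainder):
    """Turn a digit value (0 to 35) into its printable ASCII character."""
    if remainder < 10:
        return chr(remainder + ord("0"))        # decimal_digit: add dx,'0'
    return chr(remainder - 10 + ord("A"))       # hexadecimal_digit: sub 10, add 'A'


print([digit_char(d) for d in (0, 9, 10, 15, 35)])
```

Because the letters A to Z are consecutive in ASCII, the second branch is also what makes every base up to 36 work.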

save_digit:

mov [bx],dl
cmp ax,0
jz intstr_end
dec bx
inc cx
jmp digits_start

intstr_end:

This tiny section saves the digit we obtained from this pass of the loop. The dl register is the lower byte of the dx register so we store this character of the digit into the address pointed to by bx.

Keep in mind that pointers are a primary feature in Assembly Language despite being criticized in C/C++ and excluded entirely from other languages like Java.

Next, we compare ax with zero. If it is zero, there are no more digits to write and we will end this loop by jumping to “intstr_end”. Otherwise we decrement bx (subtract 1 from it) so that it will point to the digit left of the one we saved this time. We also increment cx, which counts the digits, because the loop will happen again and write at least one more. We unconditionally jump to digits_start to process digits and save them until ax equals zero.
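Repeated division produces the lowest digit first, which is why bx starts at the right end of the buffer and moves left. A minimal Python sketch of just that part of the loop (the name digits_lowest_first is hypothetical) makes the order visible:

```python
def digits_lowest_first(ax, radix):
    """Collect the remainders in the order the loop produces them."""
    remainders = []
    while True:
        ax, dx = divmod(ax, radix)      # ax = quotient, dx = remainder
        remainders.append(dx)
        if ax == 0:                     # "cmp ax,0" then "jz intstr_end"
            break
    return remainders


print(digits_lowest_first(4096, 10))    # ones digit comes out first
print(digits_lowest_first(5, 2))
print(digits_lowest_first(0, 10))       # zero still yields one digit
```

The check for zero comes after the digit is saved, so even the number 0 gets written as a single digit.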

After ax is zero, we still have one more job to do in this function. The following loop will prefix the string with extra ‘0’ characters while cx is less than the int_width variable. This will be important for those who need the digits lined up to their place values. This is much more important for binary and hexadecimal than it is for decimal, but it can still be helpful in decimal as I will show in a later chapter.

prefix_zeros:
cmp cx,[int_width]
jnb end_zeros
dec bx
mov [bx],byte '0'
inc cx
jmp prefix_zeros
end_zeros:
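The padding loop by itself can be sketched in Python on an ordinary string. The function name pad_zeros is invented for illustration, and the string stands in for the bytes that bx points at.

```python
def pad_zeros(digits, int_width):
    """Prefix '0' characters until the string has at least int_width digits."""
    cx = len(digits)                # cx already counts the digits written
    while cx < int_width:           # "jnb end_zeros" exits once cx >= int_width
        digits = "0" + digits       # "dec bx" then "mov [bx],byte '0'"
        cx += 1
    return digits


print(pad_zeros("101", 8))
print(pad_zeros("32768", 1))        # already wide enough, nothing is added
```

One thing this string version hides: int_string is only 16 bytes long, so in the assembly version a width above 16 would make bx step back past the start of int_string and write into whatever comes before it.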

There is only one more instruction before we return from this function. We copy the bx register to the ax register because it points to the beginning of the string. This means that my putstring function will accurately display it because it expects ax to contain the address of the string.

mov ax,bx

Last but not least, we still have to end the function by returning to the caller.

ret
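Putting the pieces of this walkthrough together, here is a rough Python model of intstr, putstring and putint working on one shared buffer. It is a sketch under assumptions, not a translation of every instruction: offsets into a bytearray stand in for addresses, radix and int_width are passed as parameters instead of being read from memory, and putstring returns the text instead of calling DOS.

```python
INT_NEWLINE = 16                                    # offset of int_newline
memory = bytearray(b"?" * 16 + b"\r\n\x00")         # int_string, then 0Dh,0Ah,0


def intstr(ax, radix=10, int_width=1):
    """Write the digits of ax right to left; return the offset of the first one."""
    bx = INT_NEWLINE - 1
    cx = 1
    while True:
        ax, dx = divmod(ax, radix)
        if dx < 10:
            memory[bx] = dx + ord("0")
        else:
            memory[bx] = dx - 10 + ord("A")
        if ax == 0:
            break
        bx -= 1
        cx += 1
    while cx < int_width:                           # prefix_zeros loop
        bx -= 1
        memory[bx] = ord("0")
        cx += 1
    return bx                                       # "mov ax,bx"


def putstring(start):
    """Return the zero terminated string that begins at offset start."""
    end = memory.index(0, start)
    return memory[start:end].decode("ascii")


def putint(ax, radix=10, int_width=1):
    return putstring(intstr(ax, radix, int_width))


print(repr(putint(32768)))
print(repr(putint(5, radix=2, int_width=8)))
print(repr(putint(255, radix=16, int_width=4)))
```

Bytes left over from an earlier, longer number stay in the buffer, but they are harmless because the returned offset always points at the first digit of the newest number.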

Before I end this chapter, I want to explain why I chose the register ax as the foundation for the behavior of my Assembly functions. ax is a special register in the sense that multiplication and division instructions use it as the required number we are multiplying or dividing. The Intel architecture treats this register as being more important for this reason.

But it is not just that. The programmers of DOS decided that the ax register, specifically its upper byte ah, selects which function of interrupt 21h will be called.
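In the listing for this chapter, the function number travels in ah, the upper byte of ax: “mov ah,40h” selects the write function, and “mov ax,4C00h” loads ah and al in one instruction. A tiny Python sketch (split_ax is a made-up helper name) shows how one 16-bit value splits into those two bytes:

```python
def split_ax(ax):
    """Return (ah, al): the upper and lower bytes of a 16-bit value."""
    return (ax >> 8) & 0xFF, ax & 0xFF


ah, al = split_ax(0x4C00)
print(hex(ah), hex(al))     # ah holds the function number, al the exit code
```

So 4C00h means function 4Ch (terminate the program) with 0 in al as the return code.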

Therefore, because others already treated register ax as special, and since ‘a’ is the first letter of the alphabet, I decided that it would be the foundation of all my functions in “chastelib”, my DOS Assembly Standard Library. You are not expected to take my functions as the way things must be done. Once you are done with this book, you may continue learning beyond my skills and may decide that using another register makes more sense than ax.

I wrote this book to teach Assembly Language as I understand it, not to force my coding practices on you. However, I add these extra details so that other programmers who have experience in Assembly will have answers before they start emailing me: “Chastity, why didn’t you write the function this way! You can save a few bytes if you use instruction ??? instead or you could achieve faster speed if you avoided this jump here.”

I am letting you know now, I wrote the code for simplicity rather than performance. I use a very limited subset of the instructions available to the Intel 8086 family of CPUs. I firmly believe that all math for programs I want to write can be written using only mov,push,pop,add,sub,mul,div,cmp,jmp (and conditional jumps as well).

Up next

Chapter 4: Chastity’s Intel Assembly Reference