Selfie: Computer Science for Everyone

Selfie: Computer Science for Everyone
Selfie: Computer Science for Everyone
Buy on Leanpub

Acknowledgements

The ideas presented here are the result of many years of teaching and working with students and colleagues around the world. I am grateful to my group of students and faculty in Salzburg who helped me over the years with refining and deepening my understanding of computer science. I am also particularly grateful to my colleague Professor Raja Sengupta at UC Berkeley who challenged me to the point that made me discover what has now become the Selfie Project. The design of the compiler is inspired by the Oberon compiler of Professor Niklaus Wirth from ETH Zurich. The design of the selfie microkernel is inspired by microkernels of Professor Jochen Liedtke from University of Karlsruhe.


Disclaimer:

This is work in progress. Only three chapters are done and a fourth getting ready by mid 2017. Others exist in draft form but will take more time to be done. Also, even existing chapters may change, see the git history for details.

Code License

The code provided in this book is covered by the Simplified BSD License. Please make sure to include it in any code derived from here.

Copyright (c) 2015-2017, the Selfie Project authors. All rights reserved.

Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are met:

1. Redistributions of source code must retain the above copyright notice, this
   list of conditions and the following disclaimer.
2. Redistributions in binary form must reproduce the above copyright notice,
   this list of conditions and the following disclaimer in the documentation
   and/or other materials provided with the distribution.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND
ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR
ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
(INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

1. Introduction

Computer science is a mystery to so many and yet more and more people use computers every day in one form or another. There are increasingly many people with all kinds of backgrounds other than computer science that even code every day. At schools, colleges, universities, and companies around the world there is hardly anyone left who is not confronted with the machine and its code. But even for people just using the machine in their hands, on their desks, and in the clouds there is often that unsatisfactory experience of not understanding what is really going on.

While a book about computer science for everyone may sound appealing it actually requires serious commitment to understand the material even though we tried very hard to simplify everything as much as possible. The reason is that computers and software are so expressive that it is unlikely that any other computational machine in the future will be more expressive. Anything that can be computed can be computed now, if you have enough time, space, and energy.

This book is based on the Selfie Project, an educational software system that has grown out of many years of teaching undergraduate and graduate classes in computer science. The key observation is that understanding computer science and the meaning of software in particular can only be achieved by understanding how software and data translate all the way down to the machine. Selfie shows how to do that. This may sound difficult but can actually be done in a systematic fashion with a strong focus on basic principles.

Selfie is software that can translate itself to code that can be executed by a computer. This is called self-compilation. But selfie even includes software that can mimic a computer that can execute the code of selfie. This is called self-execution. And, selfie can even do that on behalf of other computers but also itself. This is called self-hosting. By now your mind might be spinning but you at least understand why selfie is called selfie.

Why is the self so important? Because it shows how meaning is created systematically on a machine. Selfie is software written in some programming language. However, selfie defines the meaning of that language which implies that selfie defines its own meaning. This is very similar to an English dictionary written in English. The only difference is that selfie defines meaning formally in such a way that a machine can run selfie and thus any software written in the language in which selfie is written.

The book works by explaining selfie step by step while focusing on the basic principles that are necessary to understand each step. In order to follow the book you will need to run selfie either on your computer or in the cloud. To run selfie on your computer you need a terminal and a C compiler. If you have a Mac or a Linux system, terminal and compiler are already there, and easy to get for a Windows machine. If you only have access to a web browser you can still run selfie through a cloud-based development environment, see the README for more details.

Chapter 2 introduces selfie and how to use it. The focus is on developing an idea of the problem of semantics. Computer scientists have developed all these formalisms for coding and many other things. We show how selfie can help to understand the relevant aspects of the problem.

Chapter 3 is an introduction to encoding and decoding basic information such as characters, strings, numbers, and even machine instructions in binary form. We use selfie to show examples and how their encoding and decoding is actually done.

Chapter 4 introduces computation as the evolution of state. Using a simple example of a program we explain what its source code means, how that source code translates to machine code that gives the source code its meaning, and how both source and machine code execute from one state to another.

Chapter 5 will be on regularity and finite state machines. We show how to control the potentially very large state space of code to simplify reasoning about its correctness.

2. Semantics

When it comes to computers a bit is the first thing we may want to know about and understand. The only thing that computers really do on the level of the machine is handle enormous amounts of bits, nothing else.

Bit
The basic unit of information in computing and digital communications. A bit can have only one of two values which are most commonly represented as either a 0 or 1. The term bit is a portmanteau of binary digit.

There are two fundamental reasons why computers use bits and nothing else. The first reason is that the two values of a bit can readily be distinguished by electronic circuits using different levels of voltage, say, low voltage for 0 and high voltage for 1. Distinguishing more values is possible and would be exciting to see but has largely not yet happened for technological reasons. The second reason is that whatever you can say using any number of values per digit or character greater than two you can also say using two values with only an insignificant difference in the number of digits or characters you need to say the same thing. More on that in the next chapter.

Selfie is a computer program that is fully contained in a single file called selfie.c which consists of around one hundred and sixty thousand characters that are encoded in bits.

Character
A unit of information that roughly corresponds to a grapheme, grapheme-like unit, or symbol, such as in an alphabet or syllabary in the written form of a natural language. Examples of characters include letters, numerical digits, common punctuation marks (such as “.” or “-“), and whitespace. The concept also includes control characters, which do not correspond to symbols in a particular natural language, but rather to other bits of information used to process text in one or more languages. Examples of control characters include carriage return, line feed, and tab, as well as instructions to printers or other devices that display or otherwise process text.

You may also see for yourself by downloading selfie and then using the less command in your terminal (use the cd command first to go into the top folder of selfie which is where you unzipped it on your machine):

> less selfie.c

For example, the first few lines of selfie.c are:

// Copyright (c) 2015-2016, the Selfie Project authors. All rights reserved.
// Please see the AUTHORS file for details. Use of this source code is
// governed by a BSD license that can be found in the LICENSE file.
//
// Selfie is a project of the Computational Systems Group at the
// Department of Computer Sciences of the University of Salzburg
// in Austria. For further information and code please refer to:
//
// http://selfie.cs.uni-salzburg.at

What you see here is called source code where the first few lines are in fact comments that will be ignored by the machine.

Source Code
Any collection of computer instructions (possibly with comments) written using some human-readable computer language, usually as text.

The characters of source code like selfie.c are usually encoded in bits according to the ASCII standard. Remember, since computers can only handle bits everything needs to be encoded in bits.

American Standard Code for Information Interchange (ASCII)
7-bit encoding scheme for 128 characters: numbers 0 to 9, lowercase letters a to z, uppercase letters A to Z, basic punctuation symbols, control codes that originated with Teletype machines, and a space.

On machine level, each character is thus represented by seven bits. What we see when we invoke less is merely a human-readable version of these bits. To get a better feel of the size of selfie.c run the command wc -m selfie.c which counts the characters in selfie.c:

> wc -m selfie.c
176408 selfie.c

The output means that selfie.c at the time of invoking the command consisted of 176,408 characters. Your output may not be exactly the same number of characters depending on which version of selfie you have. The same is true for other statistical data shown below. However, the order of magnitude of that data is likely to be the same here and in your version.

The -m part of the command is called an option that directs, in this case, wc to output the number of characters. However, we should mention that the characters in selfie.c are actually encoded according to the newer UTF-8 standard which uses eight rather than seven bits per character.

UTF-8
(Universal Character Set Transformation Format - 8-bit) A character encoding capable of encoding all possible characters (called code points) in Unicode. The encoding is variable-length and uses 8-bit code units. It is designed for backward compatibility with ASCII.

When it comes to selfie we nevertheless speak of ASCII characters because of that backward compatibility. Here this means that the UTF-8 encoding of a given character is exactly the ASCII encoding of that character when simply ignoring the eighth bit. More on encoding characters can be found in the next chapter.

Since a unit of eight bits is very common in computer systems there is a well-known term for that unit called a byte.

Byte
A unit of digital information in computing and telecommunications that most commonly consists of eight bits.

We can easily verify that selfie.c consists of the same number of bytes than characters by using the command wc -c selfie.c with the -c option which directs wc to report the number of bytes in selfie.c:

> wc -c selfie.c
176408 selfie.c

In other words, for a computer selfie.c is in fact a sequence of eight times 176,408 bits, that is, 1,411,264 bits. The key question addressed by this book is where the meaning of these bits comes from.

The source code of selfie is ultimately a sequence of bits. How do they get their meaning? Bits, as is, have no meaning, they are just bits. Characters, by the way, are no different. Characters, as is, have no meaning either, they are just symbols, and can anyway be encoded in bits. When it comes to a machine the meaning of bits and thus characters or any kind of symbol has to be created mechanically. The key insight is that the meaning of bits is in the process of changing bits, not in the bits themselves.

The process of adding 7 to 42, according to the rules of elementary arithmetic, makes 7, 42, and 49 represent numbers rather than something else. But if we just look at 7, 42, and 49 they could mean anything. In this example, elementary arithmetic provides meaning, namely the semantics of natural numbers. This is why it is so important to learn elementary arithmetic in school. It tells us what numbers are, not just how to add, subtract, multiply, and divide them!

Elementary Arithmetic
The simplified portion of arithmetic that includes the operations of addition, subtraction, multiplication, and division.

Virtually all modern computers include circuitry for performing elementary arithmetic but they do so with binary rather than decimal numbers since computers only handle bits. Our example of 7, 42, and 49 in binary is just as simple.

Again, adding 111 to 101010 makes 111, 101010, and 110001 represent numbers while they could otherwise represent anything. More on encoding numbers in binary can be found in the next chapter.

Binary Number
A number expressed in the binary numeral system or base-2 numeral system which represents numeric values using two different symbols: typically 0 (zero) and 1 (one).

It may be hard to believe but determination and knowing elementary arithmetic is enough to understand this book. The source code of selfie, that is, the around one hundred and fifty thousand characters of selfie.c represented by around one million bits get their meaning in pretty much the same way than having bits represent numbers: by changing them. The only difference is that the process of change is a bit more involved than elementary arithmetic.

The Compiler

Let us have a closer look at how this works with selfie. Try the make command:

> make
cc -w -m32 -D'main(a,b)=main(a,char**argv)' selfie.c -o selfie

The make command invokes the cc command which compiles the file selfie.c into a file called selfie (without the .c extension) as directed by the -o option, ignoring the other options for clarity. In other words, the sequence of bits representing selfie.c is changed into another sequence of bits representing selfie. The difference between the two sequences is that the former represents source code whereas the latter represents machine code.

Compiling Selfie
Compiling Selfie
Machine Code
A sequence of instructions executed directly by a computer’s central processing unit (CPU).

The key idea is that both sequences are supposed to have the same semantics. However, selfie is executable by a computer whereas selfie.c is not, at least not purposefully, yet selfie.c is human-readable and writable in particular. The process of changing selfie.c into selfie is called compilation which is done by a compiler such as the above cc compiler.

Compiler
A computer program that transforms source code written in a programming language (the source language) into another computer language (the target language), with the latter often having a binary form known as object or machine code. The most common reason for converting source code is to create an executable program.

This means that we now have a version of selfie that we can run on our machine! Let us try and run selfie using the command ./selfie:

> ./selfie
./selfie: usage: selfie { -c { source } | -o binary | -s assembly | -l binary } \
[ (-m | -d | -y | -min | -mob ) size ... ]

Selfie requires using at least one option to do anything useful and therefore responds with its usage pattern and then terminates without doing anything else. To do something useful, let us try the first -c option on, well, selfie.c itself:

> ./selfie -c selfie.c
./selfie: this is selfie's starc compiling selfie.c
./selfie: 176408 characters read in 7083 lines and 969 comments
./selfie: with 97779(55.55%) characters in 28914 actual symbols
./selfie: 261 global variables, 289 procedures, 450 string literals
./selfie: 1958 calls, 723 assignments, 57 while, 572 if, 243 return
./selfie: 121660 bytes generated with 28779 instructions and 6544 bytes of data

Now, things are taking off. Selfie includes a compiler, just like the cc compiler, that we call starc and invoke with the -c option. The statistics in the output of starc may safely be ignored here. This will become clear later. The starc compiler is capable of compiling all of selfie including itself. By now we all know why selfie is called selfie but the story continues.

After compiling selfie.c starc only stores the machine code internally but does not write it to a file. To do that we need to use the -o option:

> ./selfie -c selfie.c -o selfie.m
./selfie: this is selfie's starc compiling selfie.c
...
./selfie: 121660 bytes with 28779 instructions and 6544 bytes of data written in\
to selfie.m

The three dots ... indicate that we omitted some output that is either identical to or insignificantly different from the previous output. The command produces a file called selfie.m that contains machine code compiled from selfie.c using the starc compiler in selfie rather than the cc compiler. The process is called self-compilation.

Self-Compiling Selfie
Self-Compiling Selfie

By now we have three different sequences of bits in selfie.c, selfie, and selfie.m that are all supposed to have the same semantics. The only difference between selfie and selfie.m is that selfie is executable on our computer whereas selfie.m is executable on a computer that we do not have. This is not the end of the story though.

The Emulator

The reason why starc targets a different machine than cc is because it makes starc a lot simpler. But how do we execute selfie.m? Well, selfie not only includes the starc compiler, it also includes mipster, which is an emulator of the computer that can execute selfie.m and any other machine code generated by starc. That computer is so simple that everyone can understand it in a reasonable amount of time even though it corresponds to parts of a real machine.

Emulator
Software that enables one computer system (called the host) to behave like another computer system (called the guest).

We execute selfie.m by first loading it using the -l option and then running it by invoking mipster using the -m option:

> ./selfie -l selfie.m -m 1
./selfie: 121660 bytes with 28779 instructions and 6544 bytes of data loaded fro\
m selfie.m
./selfie: this is selfie's mipster executing selfie.m with 1MB of physical memory
selfie.m: usage: selfie { -c { source } | -o binary | -s assembly | -l binary } \
[ (-m | -d | -y | -min | -mob ) size ... ]
selfie.m: exiting with exit code 0 and 0.00MB of mallocated memory
./selfie: this is selfie's mipster terminating selfie.m with exit code 0 and 0.1\
2MB of mapped memory
./selfie: profile: total,max(ratio%)@addr,2max(ratio%)@addr,3max(ratio%)@addr
./selfie: calls: 1507,498(33.11%)@0x238C,248(16.47%)@0x23EC,124(8.23%)@0x88
./selfie: loops: 153,123(80.64%)@0x3D2C,30(19.60%)@0x1DC,0(0.00%)
./selfie: loads: 10568,498(4.71%)@0x23A0,248(2.34%)@0x2400,125(1.18%)@0x3D2C
./selfie: stores: 6321,498(7.88%)@0x2390,248(3.92%)@0x23F0,125(1.97%)@0x3D34

After loading selfie.m the -m 1 option directs mipster to emulate a computer with 1 megabyte of memory (abbreviated 1MB, explained below) for executing selfie.m. Since selfie.m is invoked without any options, which could appear after the -m 1 option, it responds, just like selfie without options before, with its usage pattern and then terminates. After that mipster terminates and outputs a summary of its builtin performance profiler.

Profiling
A form of dynamic program analysis that measures, for example, the space (memory) or time complexity of a program, the usage of particular instructions, or the frequency and duration of function calls. Most commonly, profiling information serves to aid program optimization.

We later use profiling to explain performance-related issues of selfie. But now, let us try something cool. Since mipster is part of selfie we can even have mipster execute machine code for selfie generated by starc without writing the code into a file:

> ./selfie -c selfie.c -m 1
./selfie: this is selfie's starc compiling selfie.c
...
./selfie: this is selfie's mipster executing selfie.c with 1MB of physical memory
...
./selfie: profile: total,max(ratio%)@addr(line#),2max(ratio%)@addr(line#),3max(r\
atio%)@addr(line#)
./selfie: calls: 1507,498(33.11%)@0x238C(~1294),248(16.47%)@0x23EC(~1300),124(8.\
23%)@0x88(~1)
./selfie: loops: 153,123(80.64%)@0x3D2C(~1666),30(19.60%)@0x1DC(~219),0(0.00%)
./selfie: loads: 10568,498(4.71%)@0x23A0(~1294),248(2.34%)@0x2400(~1300),125(1.1\
8%)@0x3D2C(~1666)
./selfie: stores: 6321,498(7.88%)@0x2390(~1294),248(3.92%)@0x23F0(~1300),125(1.9\
7%)@0x3D34(~1666)

The output is just like before except for the approximate source code line numbers in the profile. Those are only available if executing machine code generated in the same run rather than loading machine code. Never mind if you do not understand what this means. It will become clear later.

Let us maintain our momentum and do something even cooler than before. We now compile selfie.c with starc and then execute the generated machine code on mipster only to have that code compile selfie.c again, all in the same run. This requires 2MB rather than 1MB for mipster and will take starc a few minutes to complete depending on the speed of our computer because executing starc on mipster is slower than executing starc directly on our computer. However, it does work and this is what counts here:

> ./selfie -c selfie.c -m 2 -c selfie.c
./selfie: this is selfie's starc compiling selfie.c
...
./selfie: this is selfie's mipster executing selfie.c with 2MB of physical memory
selfie.c: this is selfie's starc compiling selfie.c
...
selfie.c: exiting with exit code 0 and 1.05MB of mallocated memory
./selfie: this is selfie's mipster terminating selfie.c with exit code 0 and 1.1\
6MB of mapped memory
...

We can even verify that starc generates the same machine code independently of whether it runs directly on our machine or on mipster. We simply have starc generate machine code into two different files called selfie1.m and selfie2.m:

> ./selfie -c selfie.c -o selfie1.m -m 2 -c selfie.c -o selfie2.m
./selfie: this is selfie's starc compiling selfie.c
...
./selfie: 121660 bytes with 28779 instructions and 6544 bytes of data written in\
to selfie1.m
./selfie: this is selfie's mipster executing selfie1.m with 2MB of physical memo\
ry
selfie1.m: this is selfie's starc compiling selfie.c
...
selfie1.m: 121660 bytes with 28779 instructions and 6544 bytes of data written i\
nto selfie2.m
selfie1.m: exiting with exit code 0 and 1.05MB of mallocated memory
./selfie: this is selfie's mipster terminating selfie1.m with exit code 0 and 1.\
16MB of mapped memory
...
Mipster-Executed Self-Compilation
Mipster-Executed Self-Compilation

Both files generated by starc are indeed identical. To verify that try the diff command as follows:

> diff -s selfie1.m selfie2.m
Files selfie1.m and selfie2.m are identical

We call this phenomenon the fixed point of a self-compiling compiler. If we continue generating machine code for starc using starc the machine code will remain the same. So, we still have only three different sequences of bits in selfie.c, selfie, and selfie.m that are supposed to have the same semantics. In particular, if we run selfie.m on mipster to compile selfie.c the result will again be an exact copy of selfie.m. However, there is a fourth sequence with the same semantics that we can generate with selfie. It represents a human-readable version of the machine code in selfie.m written in what is called assembly.

Assembly
A low-level programming language for a computer, or other programmable device, in which there is a very strong (generally one-to-one) correspondence between the language and the architecture’s machine code instructions.

Try the -s option to have selfie generate assembly as follows:

> ./selfie -c selfie.c -s selfie.s
./selfie: this is selfie's starc compiling selfie.c
...
./selfie: 1210472 characters of assembly with 28779 instructions written into se\
lfie.s

The part of selfie that generates assembly is called a disassembler.

Disassembler
A computer program that translates machine language into assembly language — the inverse operation to that of an assembler.

Selfie does not implement an assembler though. The starc compiler simply generates machine code right away without going through assembly first (which would then require an assembler to generate machine code from that assembly).

Since assembly is human-readable let us check it out:

> less selfie.s

For example, the first few lines of selfie.s are:

0x0(~1): 0x24080007: addiu $t0,$zero,7
0x4(~1): 0x24094000: addiu $t1,$zero,16384
0x8(~1): 0x01090019: multu $t0,$t1
0xC(~1): 0x00004012: mflo $t0
0x10(~1): 0x00000000: nop
0x14(~1): 0x00000000: nop
0x18(~1): 0x25081B38: addiu $t0,$t0,6968
0x1C(~1): 0x251C0000: addiu $gp,$t0,0
0x20(~1): 0x24080FFF: addiu $t0,$zero,4095
0x24(~1): 0x24094000: addiu $t1,$zero,16384
0x28(~1): 0x01090019: multu $t0,$t1
0x2C(~1): 0x00004012: mflo $t0
0x30(~1): 0x00000000: nop
0x34(~1): 0x00000000: nop
0x38(~1): 0x25083FFC: addiu $t0,$t0,16380
0x3C(~1): 0x8D1D0000: lw $sp,0($t0)
0x40(~1): 0x0C007029: jal 0x7029[0x1C0A4]
0x44(~1): 0x00000000: nop

What you see is a human-readable version of the machine code in selfie.m. The purpose of selfie.s is here to study selfie.m and eventually understand its semantics. Selfie can even show the assembly code as it is being executed by mipster which helps debugging the machine code.

Debugging
The process of finding and resolving of defects that prevent correct operation of computer software or a system.

Let us try that using the -d option which invokes mipster in debugging mode. Careful though, there will be a lot of assembly code racing down your screen. We omit most of that here:

> ./selfie -c selfie.c -d 1
./selfie: this is selfie's starc compiling selfie.c
...
./selfie: this is selfie's mipster executing selfie.c with 1MB of physical memory
selfie.c: $pc=0x0(~1): 0x24080007: addiu $t0,$zero,7: $t0=0,$zero=0 -> $t0=7
selfie.c: $pc=0x4(~1): 0x24094000: addiu $t1,$zero,16384: $t1=0,$zero=0 -> $t1=1\
6384
selfie.c: $pc=0x8(~1): 0x01090019: multu $t0,$t1: $t0=7,$t1=16384,$lo=0 -> $lo=1\
14688
selfie.c: $pc=0xC(~1): 0x00004012: mflo $t0: $t0=7,$lo=114688 -> $t0=114688
selfie.c: $pc=0x10(~1): 0x00000000: nop
selfie.c: $pc=0x14(~1): 0x00000000: nop
selfie.c: $pc=0x18(~1): 0x25081B38: addiu $t0,$t0,6968: $t0=114688,$t0=114688 ->\
 $t0=121656
selfie.c: $pc=0x1C(~1): 0x251C0000: addiu $gp,$t0,0: $gp=0,$t0=121656 -> $gp=121\
656
selfie.c: $pc=0x20(~1): 0x24080FFF: addiu $t0,$zero,4095: $t0=121656,$zero=0 -> \
$t0=4095
selfie.c: $pc=0x24(~1): 0x24094000: addiu $t1,$zero,16384: $t1=16384,$zero=0 -> \
$t1=16384
selfie.c: $pc=0x28(~1): 0x01090019: multu $t0,$t1: $t0=4095,$t1=16384,$lo=114688\
 -> $lo=67092480
selfie.c: $pc=0x2C(~1): 0x00004012: mflo $t0: $t0=4095,$lo=67092480 -> $t0=67092\
480
selfie.c: $pc=0x30(~1): 0x00000000: nop
selfie.c: $pc=0x34(~1): 0x00000000: nop
selfie.c: $pc=0x38(~1): 0x25083FFC: addiu $t0,$t0,16380: $t0=67092480,$t0=670924\
80 -> $t0=67108860
selfie.c: $pc=0x3C(~1): 0x8D1D0000: lw $sp,0($t0): $sp=0,$t0=0x3FFFFFC -> $sp=67\
108832=memory[0x3FFFFFC]
selfie.c: $pc=0x40(~1): 0x0C007029: jal 0x7029[0x1C0A4]: $ra=0x0 -> $ra=0x48,$pc\
=0x1C0A4
          |
          |
          |
selfie.c: $pc=0x48(~1): 0x27BDFFFC: addiu $sp,$sp,-4: $sp=67108840,$sp=67108840 \
-> $sp=67108836
selfie.c: $pc=0x4C(~1): 0xAFA20000: sw $v0,0($sp): $v0=0,$sp=0x3FFFFE4 -> memory\
[0x3FFFFE4]=0=$v0
selfie.c: $pc=0x50(~1): 0x8FA40000: lw $a0,0($sp): $a0=1,$sp=0x3FFFFE4 -> $a0=0=\
memory[0x3FFFFE4]
selfie.c: $pc=0x54(~1): 0x27BD0004: addiu $sp,$sp,4: $sp=67108836,$sp=67108836 -\
> $sp=67108840
selfie.c: $pc=0x58(~1): 0x24020FA1: addiu $v0,$zero,4001: $v0=0,$zero=0 -> $v0=4\
001
selfie.c: $pc=0x5C(~1): 0x0000000C: syscall
selfie.c: exiting with exit code 0 and 0.00MB of mallocated memory
./selfie: this is selfie's mipster terminating selfie.c with exit code 0 and 0.1\
2MB of mapped memory
...

What is interesting about that output is the fact that each line on the screen shows a tiny bit of progress the machine is making during the run. In fact, all lines taken together make up what the semantics of selfie.c is during that run. Even though, in this case, it is only the response with its usage pattern there is quite a bit happening as you can see.

By the way, we can also slow down the machine to see what happens in slow motion by leveraging the fact that mipster can execute any machine code generated by starc. In particular, mipster can not only run starc but all of selfie including itself. Let us try that and see what happens if we load selfie.m into mipster and then run mipster on top of mipster to run selfie without any options. We call that self-execution of mipster:

> ./selfie -c selfie.c -o selfie.m -m 1 -l selfie.m -m 1
./selfie: this is selfie's starc compiling selfie.c
...
./selfie: 121660 bytes with 28779 instructions and 6544 bytes of data written in\
to selfie.m
./selfie: this is selfie's mipster executing selfie.m with 1MB of physical memory
selfie.m: 121660 bytes with 28779 instructions and 6544 bytes of data loaded fro\
m selfie.m
selfie.m: this is selfie's mipster executing selfie.m with 1MB of physical memory
selfie.m: usage: selfie { -c { source } | -o binary | -s assembly | -l binary } \
[ (-m | -d | -y | -min | -mob ) size ... ]
selfie.m: exiting with exit code 0 and 0.00MB of mallocated memory
selfie.m: this is selfie's mipster terminating selfie.m with exit code 0 and 0.1\
2MB of mapped memory
...
selfie.m: exiting with exit code 0 and 1.68MB of mallocated memory
./selfie: this is selfie's mipster terminating selfie.m with exit code 0 and 0.8\
7MB of mapped memory
...

Now, we could have starc compile selfie.c while running on top of two mipsters but this would take several hours depending on the speed of our computer. We could even do that on top of three or in fact any number of mipsters but the time for starc to complete would increase exponentially with the number of mipsters running on top of each other. This means three mipsters take days, four take months, and five take years!

The Hypervisor

A way to avoid the slowdown of running emulators is virtualization.

Virtualization
The act of creating a virtual (rather than actual) version of something, including virtual computer hardware platforms, operating systems, storage devices, and computer network resources.

With selfie, instead of having mipster execute mipster to execute starc, we can also have mipster execute hypster to host starc rather than to execute starc. Hypster is a hypervisor that mimics mipster without executing any machine code itself but instead asks the mipster on which it runs to execute the machine code on its behalf. For the machine code there is no difference except that it runs much faster.

Hypervisor
A piece of computer software, firmware or hardware that creates and runs virtual machines. A computer on which a hypervisor runs one or more virtual machines is called a host machine, and each virtual machine is called a guest machine.

Let us try using the same command as above for running starc on mipster but this time running starc on hypster, using the -y option, running on top of mipster:

> ./selfie -c selfie.c -o selfie3.m -m 2 -l selfie3.m -y 2 -c selfie.c -o selfie\
4.m
./selfie: this is selfie's starc compiling selfie.c
...
./selfie: 121660 bytes with 28779 instructions and 6544 bytes of data written in\
to selfie3.m
./selfie: this is selfie's mipster executing selfie3.m with 2MB of physical memo\
ry
selfie3.m: 121660 bytes with 28779 instructions and 6544 bytes of data loaded fr\
om selfie3.m
selfie3.m: this is selfie's hypster executing selfie3.m with 2MB of physical mem\
ory
selfie3.m: this is selfie's starc compiling selfie.c
...
selfie3.m: 121660 bytes with 28779 instructions and 6544 bytes of data written i\
nto selfie4.m
selfie3.m: exiting with exit code 0 and 1.05MB of mallocated memory
selfie3.m: this is selfie's hypster terminating selfie3.m with exit code 0 and 1\
.17MB of mapped memory
selfie3.m: exiting with exit code 0 and 2.68MB of mallocated memory
./selfie: this is selfie's mipster terminating selfie3.m with exit code 0 and 1.\
42MB of mapped memory
...
Hypster-Hosted Self-Compilation
Hypster-Hosted Self-Compilation

Again, selfie3.m and selfie4.m are identical and equal to selfie.m, selfie1.m, and selfie2.m. Since hypster can run on top of hypster, which is why we call hypster self-hosting, we could now run starc on two or in fact any number of hypsters and mipsters on at least one mipster and get the same result. Try four hypsters on one mipster and see for yourself that it will not take years but in fact just a few minutes, not much longer than using just one mipster:

> ./selfie -c selfie.c -o selfie5.m -m 3 -l selfie5.m -y 2 -l selfie5.m -y 2 -l \
selfie5.m -y 2 -l selfie5.m -y 2 -c selfie.c -o selfie6.m
./selfie: this is selfie's starc compiling selfie.c
...
./selfie: 121660 bytes with 28779 instructions and 6544 bytes of data written in\
to selfie5.m
./selfie: this is selfie's mipster executing selfie5.m with 3MB of physical memo\
ry
selfie5.m: 121660 bytes with 28779 instructions and 6544 bytes of data loaded fr\
om selfie5.m
selfie5.m: this is selfie's hypster executing selfie5.m with 2MB of physical mem\
ory
selfie5.m: 121660 bytes with 28779 instructions and 6544 bytes of data loaded fr\
om selfie5.m
selfie5.m: this is selfie's hypster executing selfie5.m with 2MB of physical mem\
ory
selfie5.m: 121660 bytes with 28779 instructions and 6544 bytes of data loaded fr\
om selfie5.m
selfie5.m: this is selfie's hypster executing selfie5.m with 2MB of physical mem\
ory
selfie5.m: 121660 bytes with 28779 instructions and 6544 bytes of data loaded fr\
om selfie5.m
selfie5.m: this is selfie's hypster executing selfie5.m with 2MB of physical mem\
ory
selfie5.m: this is selfie's starc compiling selfie.c
...
selfie5.m: 121660 bytes with 28779 instructions and 6544 bytes of data written i\
nto selfie6.m
selfie5.m: exiting with exit code 0 and 1.05MB of mallocated memory
selfie5.m: this is selfie's hypster terminating selfie5.m with exit code 0 and 1\
.17MB of mapped memory
selfie5.m: exiting with exit code 0 and 2.68MB of mallocated memory
selfie5.m: this is selfie's hypster terminating selfie5.m with exit code 0 and 1\
.42MB of mapped memory
selfie5.m: exiting with exit code 0 and 2.68MB of mallocated memory
selfie5.m: this is selfie's hypster terminating selfie5.m with exit code 0 and 1\
.68MB of mapped memory
selfie5.m: exiting with exit code 0 and 2.68MB of mallocated memory
selfie5.m: this is selfie's hypster terminating selfie5.m with exit code 0 and 1\
.94MB of mapped memory
selfie5.m: exiting with exit code 0 and 2.68MB of mallocated memory
./selfie: this is selfie's mipster terminating selfie5.m with exit code 0 and 2.\
19MB of mapped memory
...

Performance

A significant part of computer science is about computational performance. How much time does a computation take, how much memory does it need, and, increasingly important, how much energy does it consume? We introduce the standard units for quantifying computational performance in terms of time and space, and even in terms of energy.

Before doing so, we should nevertheless point out that in computer science the notion of order of magnitude may be used with its standard meaning as it is common in other disciplines or with a meaning rather unique to the field. Traditionally, something is an order of magnitude bigger or smaller if it is bigger or smaller by a factor of ten. However, in computer science, because of the widespread use of binary codes, an order of magnitude may also refer to a factor of two.

Order of Magnitude
A factor with base 10 (decimal) or with base 2 (binary).

As a consequence and even more confusing is the fact that commonly used prefixes such as kilo, mega, giga, and tera may either refer to factors of 103=1000 or 210=1024 depending on context and unit. Only recently new binary prefixes for kilo, mega, giga, tera, and so on, called kibi, mebi, gibi, tebi, respectively, have been introduced. Adoption is nevertheless slow.

Decimal Prefix Value Binary Prefix Value
nano (n) 10-9=1000-3    
micro (u) 10-6=1000-2    
milli (m) 10-3=1000-1    
kilo (k) 103=10001 kilo (K,Ki,kibi) 210 = 10241
mega (M) 106=10002 mega (M,Mi,mebi) 220 = 10242
giga (G) 109=10003 giga (G,Gi,gibi) 230 = 10243
tera (T) 1012=10004 tera (T,Ti,tebi) 240 = 10244

Memory storage is typically quantified in bytes with base 2, for example, in gigabytes or, more recently and unambiguously in gibibytes. Processor speed, however, may be represented in instructions per second with base 10, for example, in million instructions per second (MIPS). Similarly, special-purpose metrics such as FLOPS, floating-point operations per second, are prefixed with base 10. Also, data rates are often represented in bits per second with base 10, for example, in gigabits per second (gbps), that is, 109 bits per second.

Speed is generally characterized in terms of throughput, the amount of work done per unit of time, and latency, the amount of time to do some work, in particular before some other work can be done. The difference is usually explained with a simple example. Imagine a fiber optic cable connecting, say, New York City and San Francisco and a truck loaded with DVDs driving from New York City to San Francisco. Which one provides higher throughput and which one lower latency? Surprisingly, it may very well be possible that the truck provides higher throughput. However, delivering just a single bit by truck may take days. Thus the truck clearly provides terrible latency not suitable to host, say, a skype call. Got it?

Throughput
Amount of work performed per unit of time.
Latency
Amount of time (or delay) to perform work.

For completeness and because of the increasing importance of energy efficiency in computing, in particular in mobile computing and large-scale data center computing, we also list standard units for quantifying energy here even though there is no further discussion of that topic.

Energy is quantified in Joule and power consumption in Watt, that is, the amount of energy consumed per second. Interestingly, we can quantify the efficiency of computation in number of operations performed by that computation per Joule, or even in MIPS or FLOPS per Watt.

Performance Unit
memory bit (b), kilobit (kb), megabit (mb), gigabit (gb), terabit (tb) with decimal prefix
  byte (B), kilobyte (kB), megabyte (MB), gigabyte (GB), terabyte (TB) with decimal prefix
  byte (B), kibibyte (KB,KiB), mebibyte (MB,MiB), gibibyte (GB,GiB), tebibyte (TiB) with binary prefix
latency nanoseconds (ns), microseconds (us), milliseconds (ms), seconds (s), minutes (m), hours (h)
throughput million instructions per second (MIPS)
  floating point operations per second (FLOPS)
  bytes/second, kB/s, MB/s, GB/s, TB/s
  bits/second, kbps, mbps, gbps, tbps
energy joule
power joule/second (watt)
efficiency operations/joule
  MIPS/watt
  FLOPS/watt

Summary

Let us summarize what we have seen so far and prepare for the next chapter. This chapter is the first step towards understanding selfie by observing that:

In particular, the meaning of bits or characters or any symbols comes from change over time, by being changed (making them data) and by changing others (making them code). Let us have another look at our earlier example of adding 7 to 42.

Bits represent data such as an integer value at the time when they are used as operand of an integer operation. Bits represent code such as an integer operation at the time when they direct a machine to perform an integer operation. At all other times, their meaning could be anything. This also means that bits can sometimes be data and sometimes be code. Selfie is a non-trivial example of that phenomenon.

For now it is important to remember that semantics comes from change and that bits can be data, when changed, and code, when changing others. Selfie shows that through its self-referential nature. Selfie in source code can be data when being compiled and selfie in machine code can be code when being executed or hosted.

Furthermore, the source code of self-compiling compilers such as the starc compiler is written in the programming language that they compile. For example, selfie.c is written in C* which is a tiny subset of the widely used programming language C hence the extension .c in the name of the file.

C
A general-purpose, imperative computer programming language, supporting structured programming, lexical variable scope and recursion, while a static type system prevents many unintended operations.

C* is much easier to learn than all of C. We designed C* to make everything as simple as possible but still realistic. This is similar to using only a tiny but useful subset of the full vocabulary of a foreign language. It also means that all C* programs are C programs but not the other way around.

The starc compiler is part of selfie and thus written in C* but starc not only compiles selfie and in particular itself but all C* programs hence the name starc. In other words, starc is a C* compiler written in C*. We do not need to know in which programming language the cc compiler is written (likely C) but we do know that it compiles all C programs hence the name cc. Since C* is a subset of C the cc compiler also compiles selfie.

We conclude this chapter by pointing out that there was a time when there was no compiler for C and in fact no compiler for any programming language. The first compilers were therefore written in machine code or in some programming language and then compiled into machine code by hand. Thus even selfie’s origins go back to that time!

3. Encoding

Information, whatever it is, needs to be encoded in bits for a computer to handle it. Since selfie.c is a sequence of characters let us first look at how characters are encoded. After that we explain how sequences of characters form larger constructs such as strings, identifiers, and even positive and negative numbers. We also recall parts of elementary arithmetics and explain how that works on a computer.

Interestingly, the encoding of numbers can also be used to represent information on machine level. For example, the encoding of memory addresses and machine instructions can be done in the same or a similar way than the encoding of numbers. We explain that here as well.

Characters

We mention in the previous chapter that the characters in selfie.c are ASCII characters encoded in eight bits, that is, one byte according to the UTF-8 standard.

Both ASCII and UTF-8 are essentially one-to-one mappings from bits to characters written down in a table. Each 8-bit sequence is associated with exactly one character. For a computer that table is only relevant when the machine receives characters from a keyboard and sends characters to a screen. Then, the machine needs to know the ASCII code for each key on the keyboard to remember which key was pressed and it needs to know which shape of character to draw on the screen for each bit sequence. Internally, however, characters are just 8-bit sequences, with selfie and a lot of other software as well.

Let us pause for a moment and reflect about this. The above bit sequence could really still mean anything. The encoding standards for characters are just agreements on how to associate bits and characters but there is still no meaning.

So what is it now, U or 85? The answer is both, and anything else. As mentioned in the previous chapter, meaning comes from change. When the machine draws U for 01010101 on the screen then 01010101 stands for U in that moment but in the next moment the machine could increment 01010101 according to elementary arithmetic making 01010101 represent 85.

But how does selfie and in particular the starc compiler actually read characters from files such as selfie.c? It turns out that all characters are read from left to right using just a single line of source code in selfie.c. Similarly, all characters written to files and the screen are written from left to right using just one line of code in selfie.c. For further details on what the code means refer to the comments in the code.

In general, we only provide links to code with comments so that text explaining code is not duplicated here. You may want read the code in selfie.c as if it was some sort of mechanical English. There are a lot of comments whenever the code is not sufficiently self-explanatory. In other words, reading code and comments is part of the experience of reading this book!

Comments

Now, what is a comment in code anyway? A comment is text that the compiler ignores completely. It is only there for people to read and understand the code. In C*, a comment is all text to the right of two slashes // until the end of the line. There is a lot of that in the beginning of selfie.c. It actually takes a bit of scrolling down to see the first line of code that means something to the machine and is not a comment.

If we were to remove all comments from selfie.c the result would still be semantically equivalent to selfie.c from the perspective of the machine. In fact, we can safely remove even more characters called whitespace without changing any semantics.

Whitespace

Whitespace Character
Any character or series of characters that represent horizontal or vertical space in typography. When rendered, a whitespace character does not correspond to a visible mark, but typically does occupy an area on a page.

Whitespace characters are invisible but still important for formatting source code yet they are typically irrelevant in terms of semantics. While this is true for many programming languages including C and C* it is not true in general. There are programming languages in which whitespace does make a semantical difference. This is important to check when learning new programming languages.

The starc compiler considers the space, the tabulator, the line feed, and the carriage return characters whitespace and ignores them when reading source code:

> ./selfie -c selfie.c
./selfie: this is selfie's starc compiling selfie.c
./selfie: 176408 characters read in 7083 lines and 969 comments
./selfie: with 97779(55.55%) characters in 28914 actual symbols
...

Out of all the characters in selfie.c only a little more than half of the characters are actually considered code. The rest is whitespace and characters in comments. The code in selfie.c that starc uses to ignore whitespace and comments works by reading characters from left to right, one after the other, and discarding them until a character is found that is not whitespace and not occurring in a comment. This may also continue until the end of the file is reached without finding such a character. Important for us here is to understand that the machine really only looks at one character at a time from start to end of the file.

Let us have a look at the following “Hello World!” program written in C*:

A “Hello World!” Program
 1 // global variable for pointing to the "Hello World!" string
 2 int* foo;
 3 
 4 // main procedure for printing "Hello World!" on the console
 5 int* main() {
 6   // point to the "Hello World!" string
 7   foo = "Hello World!";
 8 
 9   // strings are actually stored in chunks of 4 characters in memory,
10   // that is, here as "Hell", "o Wo", and "rld!" which allows us to
11   // print them conveniently in chunks of 4 characters at a time
12 
13   // as long as there are characters print them
14   while (*foo != 0) {
15     // 1 means that we print to the console
16     // foo points to a chunk of 4 characters
17     // 4 means that we print 4 characters
18     write(1, foo, 4);
19 
20     // go to the next chunk of 4 characters
21     foo = foo + 1;
22   }
23 }

and run the code as follows (ignoring the compiler warning):

> ./selfie -c manuscript/code/hello-world.c -m 1
./selfie: this is selfie's starc compiling manuscript/code/hello-world.c
./selfie: warning in manuscript/code/hello-world.c in line 5: type mismatch, int\
 expected but int* found
./selfie: 729 characters read in 22 lines and 11 comments
./selfie: with 80(10.97%) characters in 39 actual symbols
./selfie: 1 global variables, 1 procedures, 1 string literals
./selfie: 1 calls, 2 assignments, 1 while, 0 if, 0 return
./selfie: 600 bytes generated with 144 instructions and 24 bytes of data
./selfie: this is selfie's mipster executing manuscript/code/hello-world.c with \
1MB of physical memory
Hello World!manuscript/code/hello-world.c: exiting with exit code 0 and 0.00MB o\
f mallocated memory
./selfie: this is selfie's mipster terminating manuscript/code/hello-world.c wit\
h exit code 0 and 0.00MB of mapped memory
...

There is a lot of whitespace for increasing the readability of the code as well as comments for explaining what the code does. In fact, only a little more than ten percent of the characters are actual code. Note that "Hello World!" is printed on the console right before the program exits.

Removing all whitespace and comments, also called minification, results in the following version:

Minified “Hello World!” Program
1 int*foo;int*main(){foo="Hello World!";while(*foo!=0){write(1,foo,4);foo=foo+1;}}

with this output when running the code:

> ./selfie -c manuscript/code/hello-world-minified.c -m 1
./selfie: this is selfie's starc compiling manuscript/code/hello-world-minified.c
./selfie: warning in manuscript/code/hello-world-minified.c in line 1: type mism\
atch, int expected but int* found
./selfie: 80 characters read in 0 lines and 0 comments
./selfie: with 80(100.00%) characters in 39 actual symbols
./selfie: 1 global variables, 1 procedures, 1 string literals
./selfie: 1 calls, 2 assignments, 1 while, 0 if, 0 return
./selfie: 600 bytes generated with 144 instructions and 24 bytes of data
./selfie: this is selfie's mipster executing manuscript/code/hello-world-minifie\
d.c with 1MB of physical memory
Hello World!manuscript/code/hello-world-minified.c: exiting with exit code 0 and\
 0.00MB of mallocated memory
./selfie: this is selfie's mipster terminating manuscript/code/hello-world-minif\
ied.c with exit code 0 and 0.00MB of mapped memory
...

This time all characters are actual code but otherwise the behavior is the same with "Hello World!" appearing on the console just like before.

Minification
The process of removing all unnecessary characters from source code without changing its functionality. These unnecessary characters usually include white space characters, new line characters, and comments, which are used to add readability to the code but are not required for it to execute.

Minification is useful for improving performance and saving bandwidth when communicating source code among machines. We only mention it here to show that whitespace and comments have typically no meaning for machines. In fact, both versions of the code are semantically equivalent. How can we check that? We can have selfie generate machine code for both as follows:

> ./selfie -c manuscript/code/hello-world.c -o hello-world.m -c manuscript/code/\
hello-world-minified.c -o hello-world-minified.m
./selfie: this is selfie's starc compiling manuscript/code/hello-world.c
...
./selfie: 600 bytes with 144 instructions and 24 bytes of data written into hell\
o-world.m
./selfie: this is selfie's starc compiling manuscript/code/hello-world-minified.c
...
./selfie: 600 bytes with 144 instructions and 24 bytes of data written into hell\
o-world-minified.m

and then check that both files are indeed identical:

> diff -s hello-world.m hello-world-minified.m
Files hello-world.m and hello-world-minified.m are identical

Strings

In computer science sequences of characters such as Hello World! are called strings.

String
A finite sequence of characters taken from some finite alphabet.

In selfie, for example, Hello World! is a string whose alphabet is in fact the printable ASCII characters UTF-8-encoded in eight bits, that is, one byte per character. However, the question is how such strings are handled and in particular encoded and stored in the memory of a computer.

Memory
Hardware device that stores information for immediate use in a computer; it is synonymous with the term “primary storage”.

Logically, memory is storage for bits as well as addresses for identifying those bits. Memory addresses are usually natural numbers from zero or some positive number to some larger positive number. To save addresses and increase speed of access, most memory is byte-addressed, that is, each address refers to a whole byte and not just a single bit. The size of byte-addressed memory, that is, the number of bytes that can be stored is the difference between the smallest and largest address plus one. The number of bits that can be stored is therefore eight times that value.

In selfie, strings are stored contiguously in memory and null-terminated but what are the alternatives? We could store the number of characters in a string or the address of the last character in front of the string. Some systems do that but not selfie. Also, we could store the string non-contiguously in memory but would then need to remember where the characters are. This would require more space to store that information and more time to find the characters but enable us to store strings even if sufficiently large contiguous memory was not available. These are interesting and fundamental tradeoffs that will become more relevant later. Important for us here is to know that there is a choice.

String Literals

You may have noticed the double quotes enclosing the Hello World! string in the “Hello World!” program. There are other sequences of characters in the program such as foo, for example, that also look like strings but are not enclosed with double quotes. The difference is that "Hello World!" is meant to be literally Hello World! whereas foo is an identifier that provides a name for something. If we were to change foo consistently in the whole program to an unused identifier such as bar, for example, the program would be semantically equivalent to the original version with foo. Changing "Hello World!" on the other hand would really change the output of the program. Try it!

String Literal
The representation of a string value within the source code of a computer program.

String literals in C* such as "Hello World!" make it convenient to read and write source code that needs to output text, for example. We make extensive use of string literals in selfie.c with strings for reporting compiler errors as just one example.

The code in selfie.c that actually recognizes a string literal in source code, after reading a double quote outside of a comment, first allocates memory not used by anyone to store the string. Then it reads one character at a time and stores the characters contiguously in memory until it reads another double quote. It then stores a null to terminate the string. This code ultimately determines how string literals in C* are handled.

Character Literals

There is also the notion of character literals in C* which we use in selfie.c in a number of situations, for example, for identifying characters that represent letters and for identifying characters that represent digits.

Character Literal
The representation of a character value within the source code of a computer program.

A character literal in C* such as 'a', for example, is a single character a enclosed with single quotes. However, character literals are actually quite different from string literals. A character literal represents the ASCII code of the enclosed character whereas a string literal is a sequence of characters which may contain any number of characters including just one or even none denoted by "". Note that '', on the other hand, is meaningless.

The code in selfie.c that identifies characters other than letters and digits is another example which shows how character literals are used. Take '{' as an example. If we were to replace '{' with 123 the semantics of the code would not change because 123 is the ASCII code of {. In other words, '{' stands for 123, that is, '{' is really just a human-readable version of the ASCII code of {.

The code in selfie.c that recognizes a character literal in source code, after reading a single quote outside of a comment, reads the next character and then stores the ASCII code of that character. It then looks for the second single quote and, if it is there, returns the ASCII code. Again, this code ultimately determines how character literals in C* are handled.

Identifiers

Let us now go back to the notion of identifiers and our example of the identifier foo in the “Hello World!” program. An identifier like foo is just a name of something.

Identifier
A name that identifies (that is, labels the identity of) either a unique entity or a unique class of entities. Some of the kinds of entities an identifier might denote include variables, types, and subroutines.

Identifiers in C* can indeed denote different kinds of entities. But, for now, we only need to know that, unlike string literals, identifiers in C* always begin with a letter. After that there may appear letters, digits, and underscores _ in any order but no other characters. Why is that? Because this is how the machine knows when an identifier begins and ends. Remember, identifiers are not enclosed by any special characters like double quotes, for example.

The code in selfie.c that recognizes an identifier in source code, after reading a letter outside of a comment, first allocates memory not used by anyone to store the identifier, just like a string. Then it reads one character at a time and stores the characters contiguously in memory until it reads a character that is neither a letter nor a digit nor an underscore. It then stores a null to terminate the identifier. However, before deciding whether it has just recognized an identifier the code checks if it has actually recognized a reserved identifier or keyword.

Keywords

C* features a number of keywords with special meaning that you can nevertheless safely ignore for now. The “Hello World!” program, for example, uses the int keyword twice and the while keyword once.

Keyword
In a computer language, a reserved word (also known as a reserved identifier) is a word that cannot be used as an identifier, such as the name of a variable, function, or label – it is “reserved from use”. This is a syntactic definition, and a reserved word may have no meaning. A closely related and often conflated notion is a keyword which is a word with special meaning in a particular context. This is a semantic definition. The terms “reserved word” and “keyword” are often used interchangeably – one may say that a reserved word is “reserved for use as a keyword”.

Since the keywords in C* all begin with a letter they should not be mistaken for identifiers. The code in selfie.c that distinguishes keywords from identifiers compares potential identifiers with all keywords to implement that distinction.

Integers

Numbers are important and computers are incredibly good at working with them. Not surprisingly, it is very easy to talk about numbers in C* and compute with them. For now, we look at how numbers are represented in source code and how they are encoded in digital memory. Numbers in selfie are signed integers, that is, whole numbers, positive or negative.

Integer
A number that can be written without a fractional component (from the Latin integer meaning “whole”). For example, 21, 4, 0, and −2048 are integers, while 9.75 and 5.5 are not. The set of integers consists of zero (0), the natural numbers (1, 2, 3, …), also called whole, counting, or positive numbers, and their additive inverses (the negative numbers, that is −1, −2, −3, …).

Beyond signed integers there is no support of, say, fixed-point or even floating-point numbers in C*. However, it is always possible to write code in C* based on integers that would support them. For example, there is code in selfie.c for printing profiling information that computes the fixed-point ratio of two integers as percentage with up to two fractional digits.

Numbers, positive or negative, are encoded, like everything else, in bits. Let us go back to the earlier example of the decimal number 85.

Both representations look different but mean the same thing and are even based on the same numeral system using positional notation. The only difference is their base or radix.

Radix
The number of unique digits, including zero, used to represent numbers in a positional numeral system.

Most importantly, arithmetic operations such as addition and subtraction work the same way for any base greater than one as long as we use the same base for the operands of the operation. For example, manually adding two binary numbers works just like adding two decimal numbers, more on that below.

While 1010101 is a binary number with base 2, the decimal number 85 is obviously encoded with base 10 in the hindu-arabic numeral system.

Hindu-Arabic Numeral System
A positional decimal numeral system and the most common for the symbolic representation of numbers in the world.

Selfie does in fact implement exactly the above computation of a recurrence relation for encoding numbers but only for numbers represented in decimal notation. An extension to other bases is nevertheless easy to do. Think about it and try!

The encoding is done in the procedure atoi which stands for ascii-to-integer. This is a standard procedure that converts a sequence of ASCII characters representing digits in positional notation to an integer value.

Selfie also implements the procedure symmetric to atoi called itoa which obviously stands for integer-to-ascii. It works by dividing the given integer by the desired base repeatedly until the quotient is zero and saving the remainders during the process. At the end, the sequence of remainders is reversed (hindu-arabic) and then returned.

The implementation of itoa in selfie not only supports decimal but also binary, octal (base 8), and hexadecimal (base 16) notation. We prepared a simple program called integer.c that prints the value represented by 85 in all possible ways supported by selfie as follows:

> ./selfie -c manuscript/code/integer.c selfie.c -m 1
./selfie: this is selfie's starc compiling manuscript/code/integer.c
...
./selfie: this is selfie's mipster executing manuscript/code/integer.c with 1MB \
of physical memory
85 in decimal:     85
'U' in ASCII:      85
"85" string:       85
85 in hexadecimal: 0x55
85 in octal:       00125
85 in binary:      1010101
manuscript/code/integer.c: exiting with exit code 0 and 0.00MB of mallocated mem\
ory
./selfie: this is selfie's mipster terminating manuscript/code/integer.c with ex\
it code 0 and 0.12MB of mapped memory
...

You may check for yourself that 0x55 and 00125 do in fact represent the value 85 by computing 5*16+5 and (1*8+2)*8+5.

Hexadecimal and octal notation are popular in computer science because their digits represent blocks of four and three bits, respectively. In other words, their bases are powers of four and three, respectively, of the base 2 of binary notation. Hexadecimal and octal notation are therefore convenient and compact notation for binary numbers. The base 10 of decimal notation is not a power of the base 2. Digits in decimal notation therefore do not evenly correspond to blocks of bits.

The actual encoding of numbers in the memory of a digital computer, however, is not in hexadecimal or octal but usually in binary. Before continuing to negative numbers let us point out that there is a way to represent numbers that is even more basic than binary. It is called unary and works by simply using an alphabet with a single digit for counting the value of a number.

Unary Numeral System
The bijective base-1 numeral system. It is the simplest numeral system to represent natural numbers: in order to represent a number N, an arbitrarily chosen symbol representing the value 1 is repeated N times.

Well, have you noticed the enormous difference in length between unary and other representations with higher bases?

The fundamental reason for the difference in size between different notations is that any representation with a base greater than one is exponentially more succinct than unary. However, representations with base greater than one are only different in size by a constant factor. For example, octal and hexadecimal representations are only three and four times, respectively, more succinct than binary. A larger alphabet makes even less of a difference, that is, logarithmically less in the size of the alphabet. ASCII, for example, is only seven times more succinct than binary although there are 128 ASCII characters.

Encoding Alphabet Base (Radix, Size of Alphabet) # Digits in Values \(n>1\) # Values in Digits \(n>0\)
Unary {1} 1 \(n\) \(n\)
Binary {0,1} 2 \(\lceil\frac{log(n)}{log(2)}\rceil\) \(2^n\)
Octal {0,1,2,3,4,5,6,7} 8 \(\lceil\frac{log(n)}{log(8)}\rceil\) \(8^n\)
Decimal {0,1,2,3,4,5,6,7,8,9} 10 \(\lceil\frac{log(n)}{log(10)}\rceil\) \(10^n\)
Hexadecimal {0,1,2,3,4,5,6,7,8,9, 16 \(\lceil\frac{log(n)}{log(16)}\rceil\) \(16^n\)
  A,B,C,D,E,F}      

Integer Arithmetics

Fortunately, elementary arithmetics with binary numbers works just like it does with decimal numbers or any other representation with base greater than one. For example, adding two numbers in any such representation works as usual by adding their digits from right to left while carrying any overflow to the left. Only unary is different! Elementary arithmetics with unary numbers is obviously done by, imagine, string concatenation.

In binary numbers the leftmost and rightmost bits have a meaning similar to the leftmost and rightmost digits in decimal numbers. The rightmost bit, also called least significant bit, determines if the number is even or odd. For example, 01010101 represents an odd number whereas 10101010 an even number. The leftmost bit, also called most significant bit, represents the greatest value. Thus 10101010 stands for a number larger than what 01010101 stands for.

Least Significant Bit (LSB)
The bit in a binary number that appears rightmost and determines if the number is even (0) or odd (1).
Most Significant Bit (MSB)
The bit in a binary number that appears leftmost and has the greatest value.

Now, what happens if we try to add two numbers where the result exceeds the number of digits necessary to represent them individually? For example, what if we compute 255+1=256 in binary? In this case, that is, 11111111+00000001, the result is 100000000, a binary number with 9 bits rather than the 8 bits representing 255. This is not a problem if we have more than 8 bits. However, with computers everything is finite, in particular memory. Moreover, arithmetic operations are on most machines implemented for bit strings with a fixed size such as 8 bits. On such machines adding 11111111 and 00000001 results in what is called arithmetic overflow.

Arithmetic Overflow
Occurs when an arithmetic operation attempts to create a numeric value that is too large to be represented within the available storage space.

How can we deal with arithmetic overflow? There are two approaches that can be combined: detection and semantics. If the occurrence of an arithmetic overflow can be detected one can discard the computation and do something else. For this purpose most processors feature a so-called carry bit or carry flag which is set if an arithmetic operation causes an overflow indicated by a carry out of the most significant bit. In our example, the 9-th bit in 100000000 is that carry bit.

In terms of semantics, if the result of an arithmetic overflow has a defined value, one may be able to use that value in a meaningful way. For example, a common semantics for n-bit arithmetics is to compute everything modulo 2n, also referred to as wrap-around semantics or just wrap around. For example, 255+1=256 modulo 28=256 modulo 256=0, which is exactly what 100000000 in an 8-bit system stands for. There are applications that are correct even when such wrap-arounds occur.

Modulo
The remainder after division of one number by another, also called modulus.

Arithmetic overflow nevertheless is the cause of numerous software bugs and even costly accidents. Restricting the space available for representing something that can be arbitrarily large such as numbers has serious consequences. Integer arithmetics are always an approximation of real arithmetics. For correctness, computer applications need to be properly adapted to work for integer arithmetics, not real arithmetics.

Signed Integers

Now, what about subtraction and negative numbers? Ideally, we would like to represent not just positive but also negative numbers just using bits with integer arithmetics on them still intact. Obviously, one bit, or more generally one digit, is enough to encode the sign of a number, that is, distinguish positive from negative numbers. Fortunately, however, there is an overall encoding scheme that works without changing integer arithmetics such as addition. In fact, subtraction will work by using addition, as previously discussed, with negative numbers.

So, how can we represent -85 and compute something like, say, 127-85? Again, fortunately, there is a representation of negative numbers that works for any representation with base greater than one and makes addition work like subtraction. We simply take the so-called radix complement to encode a negative number which only requires us to fix the number of digits that we intend to support. Subtracting a number then works by adding its complement and ignoring the overflow beyond the supported number of digits.

You may still ask how computing 1000-85 is any easier than computing 127-85 directly. Well, it is not but the radix complement of a number happens to be equal to the so-called diminished radix complement of that number plus one, which is in fact easier to compute than the radix complement.

Back to our example. What happens if we actually try to compute 85-127 instead? Let us go through that computation as well.

Using the ten’s complement for representing negative numbers really means that the leftmost digit represents the sign. In the decimal system, an even leftmost digit indicates that we are dealing with a positive number. Conversely, an odd leftmost digit such as the 9 in 958 means that the number is actually meant to be negative. This implies that with n digits and the ten’s complement for negative numbers we can represent signed integers i with -10n/2-1 < i < 10n/2. For example, in a 3-digit system we can represent signed integers i with -501 < i < 500. In other words, we can still represent 1000 numbers with 3 digits but offset by 500 into the negative numbers.

Now, how does a computer represent negative numbers? The short answer: the exact same way as in the decimal system, just in binary. In other words, a negative number may be represented in binary by its two’s complement (remember binary has base 2). Just like before, the two’s complement of a number is computed by taking the ones’ complement of that number and adding one to it. This time, however, computing the diminished radix complement, that is, the ones’ complement is particularly easy. It is done by simply inverting the bits.

Using two’s complement for representing negative numbers a byte can in total represent signed integer values i from -128 to 127, that is, with -28/2-1 = -27-1 = -129 < i < 128 = 27 = 28/2. Our running example fits these constraints. Moreover, the most significant bit has taken over a different role than before. It indicates if a number is positive or negative. For example, 01111111 represents a positive number whereas 10000000 stands for a negative number.

Ones’ Complement
All bits of the number inverted, that is, 0s swapped for 1s and vice versa.
Two’s Complement
Given an n-bit number i with -2n/2-1 = -2n-1-1 < i < 2n-1 = 2n/2, the complement of i with respect to 2n, that is, 2n-i which is equal to the ones’ complement of i plus one.
Most Significant Bit (MSB)
The bit in a binary number that appears leftmost and has the greatest value or, if the number represents a signed integer in two’s complement, determines if the integer is positive (0) or negative (1).

Finally, what happened to the problem of arithmetic overflow? It is still there. Only the carry bit is now insufficient to detect it. When can an overflow actually occur? Simple. Only when adding two numbers with the same sign! The result of an overflow is a number with the opposite sign. Thus we need to distinguish two cases: adding two positive numbers resulting in a negative number and adding two negative numbers resulting in a positive number. Both cases follow wrap-around semantics similar to unsigned integers.

In the first case, the MSBs of the two numbers are zero, since both represent positive numbers, and the MSB of the result is one. The second case is the exact opposite. How can we detect this? Easy. There is an overflow either if there is a carry into the MSB of the result but not out into the carry bit (wrapping two positive numbers into a negative number), or else if there is no carry into the MSB but out into the carry bit (wrapping two negative numbers into a positive number). Most processors feature an overflow bit or overflow flag that is set accordingly, that is, by a so-called exclusive or between the carry into and out of the MSB. An exclusive or of two bits is true if and only if the bits are different. Note, however, that just like the carry bit has no meaning when adding signed integers, the overflow bit has no meaning when adding unsigned integers.

Integer Literals

Similar to character and string literals, source code written in C* may contain integer values, also called integer literals, written in decimal notation.

Integer Literal
An integer whose value is directly represented in source code.

The code in selfie that reads integer literals is next to the code that reads identifiers. Similar to the characters of an identifier, the digits of an integer literal are first read and stored in a string. However, that string is then converted to an integer value using the atoi procedure.

What about negative numbers then? Can we write integer literals in C* that represent negative values? The answer is yes, very conveniently in fact, for example, by writing -85 which obviously represents the value -85. However, this notation is only an abbreviation for 0 - 85 which obviously represents the same value. When reading integer literals such as -85 the starc compiler does in fact generate code that subtracts the positive value 85 from 0 to obtain the negative value -85.

Symbols

Among all language elements of C* we have seen identifiers and keywords as well as character, string, and integer literals. The only missing elements are operator symbols and symbols for structuring source code. The complete list of C* symbols, also called tokens, is surprisingly small. Our “Hello World!” Program does not contain all possible symbols but at least one from each category, that is, keywords, identifiers, literals, operator symbols, and symbols for structuring source code.

Machine Words

So, before continuing let us point out that character, string, and integer literals are the only way to describe data in C* programs. All other language elements of C* including keywords and identifiers are there to describe code and manage memory! We already know that characters are encoded in bytes and strings are stored contiguously in byte-addressed memory. What about integers then? To answer that question we need to take a closer look at how a computer handles data.

First of all, in addition to memory, most computers contain at least one processor or central processing unit (CPU) connected to memory and other hardware. While memory stores information the CPU performs actual computation with that information.

Central Processing Unit (CPU)
The electronic circuitry within a computer that carries out the instructions of a computer program by performing the basic arithmetic, logical, control and input/output (I/O) operations specified by the instructions.

Instead of operating directly on memory, most CPUs load data from memory into registers, then perform some computation with that data, and finally store the result back in memory. The reason is performance. The registers of a CPU are by far the fastest storage available in a computer.

Register
A quickly accessible location available to a digital processor’s central processing unit (CPU). Registers usually consist of a small amount of fast storage, although some registers have specific hardware functions. Registers are typically addressed by mechanisms other than main memory.

Most registers of a CPU have the same size, that is, accommodate the same number of bits. Usually, data goes back and forth between memory and registers in chunks of that size called machine words or just words.

Word
The natural unit of data used by a particular processor design. A word is a fixed-sized piece of data handled as a unit by the instruction set or the hardware of the processor. The number of bits in a word (the word size, word width, or word length) is an important characteristic of any specific processor design or computer architecture.

The processor that the mipster emulator implements has a word size of 32 bits. In fact, virtually everything on that machine happens at the level of words. Loading data from memory and storing it back, arithmetic and logical operations among registers, and even fetching code from memory for execution, as we see below. The reason is again performance. Involving 32 bits in parallel in all operations is obviously faster than working with bits individually. You probably know that there are machines with even larger word sizes for even greater speed such as 64-bit machines, for example. We stick to 32-bit words though since it makes things easier for two reasons.

The first reason is that the size of an integer in C* is also 32 bits encoding the sign in the MSB and the value in the remaining 31 LSBs in two’s complement. This means that a single mipster register can hold exactly one C* integer value. Beautiful!

We prepared another simple program called negative.c that prints the numerical value represented by -85, and of INT_MAX and INT_MIN for reference, in all possible ways supported by selfie as follows:

> ./selfie -c manuscript/code/negative.c selfie.c -m 1
./selfie: this is selfie's starc compiling manuscript/code/negative.c
...
./selfie: this is selfie's mipster executing manuscript/code/negative.c with 1MB\
 of physical memory
    -85 in decimal:     -85
    -85 in hexadecimal: 0xFFFFFFAB
    -85 in octal:       0037777777653
    -85 in binary:      11111111111111111111111110101011
INT_MAX in decimal:     2147483647
INT_MAX in hexadecimal: 0x7FFFFFFF
INT_MAX in octal:       0017777777777
INT_MAX in binary:      01111111111111111111111111111111
INT_MIN in decimal:     -2147483648
INT_MIN in hexadecimal: 0x80000000
INT_MIN in octal:       0020000000000
INT_MIN in binary:      10000000000000000000000000000000
manuscript/code/negative.c: exiting with exit code 0 and 0.00MB of mallocated me\
mory
./selfie: this is selfie's mipster terminating manuscript/code/negative.c with e\
xit code 0 and 0.12MB of mapped memory
...

The second reason for using 32-bit words is that memory addresses in mipster and ultimately in C* are then 32 bits as well. This means in particular that the content of a register can also be seen as a memory address and not just an integer value. However, there is one important detail. On a mipster machine, memory is not only byte-addressed but also word-aligned for access. This means that words can only be accessed in memory at addresses that are multiples of four, the word size in bytes.

An interesting question is which bits in a word are used to represent which bytes in memory. The two standard choices are determined by the endianness of a processor.

Endianness
The order of the bytes that compose a digital word in computer memory. Words may be represented in big-endian or little-endian format. When storing a word in big-endian format the most significant byte, which is the byte containing the most significant bit, is stored first and the following bytes are stored in decreasing significance order with the least significant byte, which is the byte containing the least significant bit, thus being stored at last place. Little-endian format reverses the order of the sequence and stores the least significant byte at the first location with the most significant byte being stored last.

Interestingly, the endianness of a mipster machine is irrelevant since there is no way to access data in sizes other than word size directly in memory at addresses that are not multiples of the word size. In general, however, this is not true. Most processors allow accessing memory such that endianness does make a difference. We have deliberately ruled that out in selfie to make things simpler. Another simplification in selfie is the absence of any native support of bitwise operations.

Bitwise Operation
Operates on one or more bit patterns or binary numerals at the level of their individual bits. It is a fast, simple action directly supported by the processor, and is used to manipulate values for comparisons and calculations.

Why would we want support of that anyway? Well, how do we read and modify individual bits in registers and eventually memory if everything happens at word granularity? This is where bitwise operations come in. There are typically bitwise logical operations such as bitwise NOT, AND, and OR operations as well as bitwise shift operations such as left shift and logical right shift operations.

Bitwise operations have in common that they treat integers and words purely as sequences of bits. For example, a left shift operation shifts the bits in an integer or word to the left by a given amount of bits while shifting in zeros into the LSB. The logical right shift operation does the exact opposite.

We prepared another simple program called bitwise.c that prints the numerical value 3 in binary and decimal notation and then shifts it repeatedly by six bits to the left until it reaches 0. The program also performs a bitwise OR operation of all intermediate values and prints the result. The program then reverses direction and shifts the most recent value before reaching 0 repeatedly by six bits to the right until it reaches 0 again:

> ./selfie -c manuscript/code/bitwise.c selfie.c -m 1
./selfie: this is selfie's starc compiling manuscript/code/bitwise.c
...
./selfie: this is selfie's mipster executing manuscript/code/bitwise.c with 1MB \
of physical memory
00000000000000000000000000000011 in binary = 3 in decimal
00000000000000000000000011000000 in binary = 192 in decimal
00000000000000000011000000000000 in binary = 12288 in decimal
00000000000011000000000000000000 in binary = 786432 in decimal
00000011000000000000000000000000 in binary = 50331648 in decimal
11000000000000000000000000000000 in binary = -1073741824 in decimal
11000011000011000011000011000011 in binary = -1022611261 in decimal
11000000000000000000000000000000 in binary = -1073741824 in decimal
00000011000000000000000000000000 in binary = 50331648 in decimal
00000000000011000000000000000000 in binary = 786432 in decimal
00000000000000000011000000000000 in binary = 12288 in decimal
00000000000000000000000011000000 in binary = 192 in decimal
00000000000000000000000000000011 in binary = 3 in decimal
manuscript/code/bitwise.c: exiting with exit code 0 and 0.00MB of mallocated mem\
ory
...

But how did we do this if there is no native support of bitwise operations? Well, we use integer arithmetics and wrap-around semantics to provide bitwise OR, left shift, and logical right shift operations in selfie.

In the bitwise.c program we shift the bits representing the integer value 3 to the left by six bits by multiplying the value repeatedly with 64=26 until the value wraps around and becomes negative.

The bitwise.c program demonstrates that by performing a bitwise OR operation on all intermediate values through simple integer addition.

The bitwise.c program applies that method to shift the bits to the right repeatedly by six bits back to their original position.

Interestingly, the bitwise OR, left shift, and logical right shift operations presented here are sufficient to implement all of selfie!

Before moving on let us quickly revisit how characters and strings are stored in memory. In selfie characters are represented by 8 bits. A 32-bit word may therefore hold up to four characters. This is in fact done to store the characters of strings in memory. A character literal, however, is stored in the eight LSBs of a word leaving the remaining bits blank.

Try the following command to see that our “Hello World!” program does actually print the characters in chunks of four by printing the three words containing the characters directly on the console. The command creates three mipsters on top of each other slowing down the execution of the program to the extent that the behavior is really visible:

> ./selfie -c manuscript/code/hello-world.c -o hello-world.m -c selfie.c -o self\
ie.m -m 1 -l selfie.m -m 1 -l hello-world.m -m 1
...

This is nice but how do we then access individual characters of a string? Simple, by using our bitwise operations, of course! Selfie implements loading and storing characters of strings in memory accordingly.

So, on the machine everything related to data happens at the granularity of words. Interesting. What is even more fascinating is that even memory addresses and all machine code is handled at that granularity as well.

Memory Addresses

The address of a byte or in fact a word in memory is obviously a positive number. On a mipster machine and many others as well, memory addresses are also represented by words. Thus the word size of such machines determines how many memory addresses can be distinguished and, as a consequence, how much memory can be accessed. A mipster machine, for example, supports up to 64MB of byte-addressed and word-aligned memory. Addressing that memory thus requires the 26 LSBs out of the 32 bits of a word. The remaining 6 MSBs are unused.

Let us reflect on that for a moment. So, on a mipster machine, the 32 bits of a word can be used to encode characters, signed integers, and even memory addresses! That’s right and this is not all. Even machine instructions are represented by words which is our next topic.

Instructions

The code of a mipster machine is represented by a sequence of machine words where each word encodes a machine instruction. It is that easy and actually true for many other processors as well. The exact specification of the encoding as well as the meaning of each instruction is provided by the instruction set architecture or ISA for short.

Instruction Set Architecture (ISA)
The part of the computer architecture related to programming, including the native data types, instructions, registers, addressing modes, memory architecture, interrupt and exception handling, and external I/O. An ISA includes a specification of the set of opcodes (machine language), and the native commands implemented by a particular processor. An instruction set is the interface between a computer’s software and its hardware, and thereby enables the independent development of these two computing realms.

Let us have another look at the first few lines of the assembly code in selfie.s presented in the previous chapter:

0x0(~1): 0x24080007: addiu $t0,$zero,7
0x4(~1): 0x24094000: addiu $t1,$zero,16384
0x8(~1): 0x01090019: multu $t0,$t1
0xC(~1): 0x00004012: mflo $t0
0x10(~1): 0x00000000: nop
0x14(~1): 0x00000000: nop
0x18(~1): 0x25081B38: addiu $t0,$t0,6968
0x1C(~1): 0x251C0000: addiu $gp,$t0,0
0x20(~1): 0x24080FFF: addiu $t0,$zero,4095
0x24(~1): 0x24094000: addiu $t1,$zero,16384
0x28(~1): 0x01090019: multu $t0,$t1
0x2C(~1): 0x00004012: mflo $t0
0x30(~1): 0x00000000: nop
0x34(~1): 0x00000000: nop
0x38(~1): 0x25083FFC: addiu $t0,$t0,16380
0x3C(~1): 0x8D1D0000: lw $sp,0($t0)
0x40(~1): 0x0C007029: jal 0x7029[0x1C0A4]
0x44(~1): 0x00000000: nop

Each line represents one machine instruction. The first line, for example, reads like this. The hexadecimal number 0x0 is the word-aligned memory address of the instruction in memory. The expression (~1) is the approximate line number of the source code, in this case selfie.c, that was compiled to this instruction. The 32-bit word 0x24080007 is in fact the in binary encoded version of the instruction itself. Finally, addiu $t0,$zero,7 is the human-readable assembly version of the instruction. This means in particular that 0x24080007 and addiu $t0,$zero,7 are semantically equivalent. The 32-bit word 0x24080007 in binary stored at address 0x0 in memory is thus the only thing that the machine needs, the rest is for us to make it readable.

The machine code in selfie.m presented in the previous chapter contains just that binary code, that is, 0x24080007 followed by 0x24094000 from the second line in selfie.s and so on. To prepare the machine for executing that code we only need to load selfie.m into memory starting at address 0x0 and then tell the machine to execute the code. How this is done is part of the next chapter.

So, how do we know that 0x24080007 represents addiu $t0,$zero,7? This is specified precisely by the ISA of the widely used MIPS processor family.

MIPS
A reduced instruction set computer (RISC) instruction set architecture (ISA) developed by MIPS Technologies (formerly MIPS Computer Systems, Inc.). The early MIPS architectures were 32-bit, with 64-bit versions added later. Multiple revisions of the MIPS instruction set exist, including MIPS I, MIPS II, MIPS III, MIPS IV, MIPS V, MIPS32, and MIPS64. The current revisions are MIPS32 (for 32-bit implementations) and MIPS64 (for 64-bit implementations).

The mipster emulator implements a strict subset of the instructions of the MIPS32 processor which we call MIPSter. In fact, mipster only implements 17 out of the more than 40 instructions of MIPS32. The starc compiler generates MIPSter code that runs on mipster and thus, at least in principle, also on a real MIPS32 processor. The converse is not true, of course. MIPS32 code does in general not run on mipster but that is not a problem here.

Let us go back to the above example. The addiu string in addiu $t0,$zero,7 is an assembly mnemonic of the operation code or opcode, for short, that identifies the operation to be performed while the remaining $t0,$zero,7 are three operands of that operation.

Opcode
The portion of a machine language instruction that specifies the operation to be performed. Beside the opcode itself, most instructions also specify the data they will process, in the form of operands.

In our example, addiu instructs the processor to add the value of the integer 7 to the value stored in register $zero (which is always 0) and store the result in register $t0 (which will thus contain the value 7 after executing the instruction).

A MIPS processor has 32 32-bit registers that can be used for this purpose. They are numbered from 0 to 31. The register $zero is obviously register 0 while the register $t0, less obviously, happens to be register 8. The only missing piece of information is that addiu is represented by the opcode 9. Now, how do we get from there to 0x24080007?

We take the opcode 9 (001001), register 0 (00000), register 8 (01000), and value 7 (0000000000000111) and put them together in binary as follows:

001001 00000 01000 0000000000000111

If you merge that into a 32-bit word you get 0x24080007. The MIPS32 ISA specifies that the six MSBs encode the opcode 9, the next five bits encode the second (!) operand register 0, the following five bits encode the first (!) operand register 8, and lastly the remaining sixteen LSBs encode the third operand value 7, in 16-bit two’s complement in fact so it could even be a negative value. The fact that the third operand is treated by addiu as an integer value rather than, say, a number identifying a register is called immediate addressing hence the i in addiu. Immediate addressing is one of various so-called addressing modes. The u in addiu stands for unsigned which is misleading. Its actual meaning is that arithmetic overflows with addiu are ignored while wrap-around semantics apply.

Addressing Mode
An aspect of the instruction set architecture in most central processing unit (CPU) designs. The various addressing modes that are defined in a given instruction set architecture define how machine language instructions in that architecture identify the operand(s) of each instruction.

But why does the ISA provision five bits for the first and second operand? Because five bits allow us to address exactly the 25=32 different registers of the machine. The six bits for the opcode obviously allow us to distinguish up to 26=64 different opcodes but we actually do not need that many. Now, think about the range of values that can be encoded in two’s complement in the sixteen bits for the third operand! This is the range of values that we can get into a register such as $t0$ with a single addiu instruction. In other words, we can use that instruction to initialize registers. Cool!

Note that the value in register $zero is assumed to be 0 at all times. This is in fact needed for initializing registers using the addiu instruction. There exists MIPS assembly in which such intention is made explicit by using pseudo instructions. Here, the pseudo instruction movi $t0,7, for example, could be used instead but would anyhow just be a short version of addiu $t0,$zero,7. The remaining sixteen instructions of MIPSter are just as simple, we introduce them on the fly as needed.

So, how does the starc compiler generate such code? It uses bitwise operations, of course, that is, bitwise OR and left shifting in particular. There are in total three different formats in MIPS32 depending on the opcode. The actual source code in selfie.c for encoding machine instructions is nevertheless pretty straightforward.

An interesting feature of the implementation of selfie in a single file is that the source code for decoding machine instructions, which is used by the mipster emulator and selfie’s disassembler, is right after the code for encoding instructions. Decoding machine code performs the exact inverse to encoding machine code extracting the opcode and operands that were originally encoded. It is done by a combination of left and logical right shifting. See for yourself how this works in the code! It may look technical but is actually very simple.

Summary

In this chapter we have seen how characters, strings, identifiers, integers, and even machine instructions are encoded and decoded, and how all that allows us to represent source and machine code using just bits. We even understand now some of the output of the starc compiler when compiling our “Hello World!” program, for example:

> ./selfie -c manuscript/code/hello-world.c -m 1
./selfie: this is selfie's starc compiling manuscript/code/hello-world.c
...
./selfie: 729 characters read in 22 lines and 11 comments
./selfie: with 80(10.97%) characters in 39 actual symbols
./selfie: 1 global variables, 1 procedures, 1 string literals
./selfie: 1 calls, 2 assignments, 1 while, 0 if, 0 return
./selfie: 600 bytes generated with 144 instructions and 24 bytes of data
...

The compiler counts characters, lines, comments, whitespace, and symbols in source code, and instructions and bytes of data in the generated machine code. The information about global variables, procedures, calls, assignments, while, if, and return will become clear in the next chapter.

But there is still something missing here. Why is all of this encoded the way it is and not some other way? There are two important reasons:

  1. Time: Depending on the encoding, data can be processed faster or slower, even fundamentally faster or slower in the sense that some encoding may not allow reasonably fast processing at all.

    But why do we not always pick the encoding that supports the fastest possible processing? We could do that but it could make the actual encoding process slow and in some cases even fundamentally slow, and because:

  2. Space: The amount of bits needed to encode data can vary significantly and even fundamentally depending on the encoding scheme.

Machine instructions are also encoded such that code size is compact while decoding them, which is ultimately done in hardware by the machine, is still fast. This is, of course, extremely important since every single instruction a processor executes needs to be decoded into opcode and operands before execution.

In computer science the trade off between time and space shows up in all kinds of situations. Encoding information is just one example. The important lesson here is to be aware of the trade off and understand that there is no such thing as a free lunch!

4. State

Computation is the evolution of state. At any given time, a computer stores a very large but still finite amount of bits in memory and registers. The values of all these bits together is what we call the state of the machine. Then the processor executes one instruction which directs it to change the values of a very small number of bits creating a new state. That process of change from one state to the next continues until the machine is turned off.

State
The state of a digital logic circuit or computer program is a technical term for all the stored information, at a given instant in time, to which the circuit or program has access. The output of a digital circuit or computer program at any time is completely determined by its current inputs and its state.

Software on source and machine code level specifies for each state what the next state is. There are the data bits that are being changed and the code bits that determine that change. Input is typically modeled by data bits that are changed by something other than the processor such as a keyboard, for example.

In this chapter, we explain how on machine level memory and registers represent state and how machine code describes change of that state. We also show how source code provides a high-level view of program state and how that translates down to machine code level. Seeing both levels and how they connect is an important step towards learning how to code.

Von Neumann Machine

Most general-purpose computers today are based on what is known as the von Neumann model or von Neumann architecture.

Von Neumann Architecture
A computer architecture, described in 1945 by the mathematician and physicist John von Neumann and others, for an electronic digital computer with parts consisting of a processing unit containing an arithmetic logic unit and processor registers; a control unit containing an instruction register and program counter; a memory to store both data and instructions; external mass storage; and input and output mechanisms.
Von Neumann Architecture
Von Neumann Architecture

The CPU and main memory in a typical computer correspond to the processing unit and memory of a von Neumann architecture, respectively. The mipster emulator is no exception. It emulates a von Neumann machine.

The key idea is very simple. A von Neumann machine is a stored-program computer which stores both code and data in the same memory. In fact, in memory there is really no distinction between code and data. A von Neumann machine fetches code from memory and executes it. The code will in turn instruct the machine to load data from memory into registers, perform some computation on registers, and finally store the results back in memory. We explain the details of how this works below. For now it is only important to understand that bits fetched from memory and executed happen to represent code in that moment while bits loaded from memory into registers, then modified, and finally stored back in memory represent data in that moment. At all other times, they are just bits.

State Space

A von Neumann machine that can store n bits in memory and registers can distinguish 2n states.

Interestingly, we can always, at least in principle, partition that enormously large state space into a set of good states and a set of bad states. Software without bugs would always keep the machine in good states, or conversely, prevent the machine from ever going to a bad state. However, what is a good state?

The answer to that question depends on what we would like the machine to do, it depends on the application. But most applications have nothing to do with individual bits. We therefore need formalisms that allow us to formalize what we want the machine to do on the level of the applications we are interested in. This is the reason why high-level programming languages were invented.

Since there are new applications or at least application characteristics appearing all the time new programming languages are also invented all the time. The key to being able to follow that trend is to understand the principles of programming languages and how they translate to machine level.

The programming language C of which we use a tiny subset here was originally designed for programming systems software such as operating systems and compilers. However, C has become very popular in many other application domains as well. Programming in C is called imperative programming which closely matches the imperative nature of most computers. It is therefore relatively straightforward to compile C code to machine code and manipulate machine states in C code even at the level of individual bits.

Imperative Programming
A programming paradigm that uses statements that change program’s state. In much the same way that the imperative mood in natural languages expresses commands, an imperative program consists of commands for the computer to perform. Imperative programming focuses on describing how a program operates.

Imperative programming is supported by many programming languages but it is not the only programming paradigm. Declarative programming is an important alternative that is also supported by many programming languages but handles program state differently.

Declarative Programming
A programming paradigm — a style of building the structure and elements of computer programs — that expresses the logic of a computation without describing its control flow.

Intuitively, rather than saying imperatively how to change state, declarative programming focuses on declaring what needs to change. While spelling out how to change state can become tedious with imperative programming spelling out what to change can become burdensome with declarative programming. Yet both paradigms have their important use cases and a port of selfie to a declarative programming language would be very nice to have but remains future work for now.

Before explaining how C* code works, we introduce C* language elements that allow us to describe program state as a high-level abstraction of machine state. Code written in C* then operates on that program state. Let us have a look at the following C* program which we call the countdown program or just countdown:

The Countdown Program
 1 // declaring bar as a global variable of type int and
 2 // defining the decimal value 10 as initial value of bar
 3 int bar = 10;
 4 
 5 // declaring main as a procedure with no arguments and return type int
 6 int main() {
 7   // defining main to be the following code
 8 
 9   // while bar is greater than the decimal value 0 ...
10   // otherwise continue to the return statement below
11   while (bar > 0) {
12     // ... subtract the decimal value 1 from bar and assign the result to bar
13     bar = bar - 1;
14 
15     // go back to the while statement and check again
16   }
17 
18   // return the value of bar as the result of the invocation of main
19   return bar;
20 }

The program takes the decimal value 10 (Line 3) and decrements it (Line 13) until it reaches the decimal value 0 (Line 11) which is then returned (Line 19) as so-called exit code. To see for yourself run the code as follows:

 1 > ./selfie -c manuscript/code/countdown.c -o countdown.m -s countdown.s -m 1
 2 ./selfie: this is selfie's starc compiling manuscript/code/countdown.c
 3 ./selfie: 625 characters read in 19 lines and 9 comments
 4 ./selfie: with 55(8.80%) characters in 28 actual symbols
 5 ./selfie: 1 global variables, 1 procedures, 0 string literals
 6 ./selfie: 0 calls, 1 assignments, 1 while, 0 if, 1 return
 7 ./selfie: 496 bytes generated with 122 instructions and 8 bytes of data
 8 ./selfie: 496 bytes with 122 instructions and 8 bytes of data written into count\
 9 down.m
10 ./selfie: 4301 characters of assembly with 122 instructions written into countdo\
11 wn.s
12 ./selfie: this is selfie's mipster executing countdown.m with 1MB of physical me\
13 mory
14 countdown.m: exiting with exit code 0 and 0.00MB of mallocated memory
15 ./selfie: this is selfie's mipster terminating countdown.m with exit code 0 and \
16 0.00MB of mapped memory
17 ./selfie: profile: total,max(ratio%)@addr(line#),2max(ratio%)@addr(line#),3max(r\
18 atio%)@addr(line#)
19 ./selfie: calls: 1,1(100.00%)@0x17C(~11),0(0.00%),0(0.00%)
20 ./selfie: loops: 10,10(100.00%)@0x190(~11),0(0.00%),0(0.00%)
21 ./selfie: loads: 26,11(42.37%)@0x190(~11),10(38.46%)@0x1A4(~13),1(3.84%)@0x24(~1)
22 ./selfie: stores: 13,10(76.92%)@0x1B0(~13),1(7.69%)@0x4C(~1),0(0.00%)

Global Variable

For the countdown program to be able to operate on a number there needs to be memory to store that number. For this purpose, Line 3 in the source code declares a so-called global variable called bar. The starc compiler even reports that it found exactly that one global variable, see Line 5 in the above output.

Global Variable
A variable with global scope, meaning that it is visible (hence accessible) throughout the program, unless shadowed. The set of all global variables is known as the global environment or global state.

So global really just means here that bar can be used throughout the program. Line 3 is thus a declaration that specifies that the identifier bar refers to the same variable in the whole program.

Declaration
Specifies properties of an identifier: it declares what an identifier means. Declarations are most commonly used for functions, variables, constants, and classes. Beyond the name (the identifier itself) and the kind of entity (function, variable, etc.), declarations typically specify the data type (for variables and constants), or the type signature (for functions). The term “declaration” is frequently contrasted with the term “definition”, but meaning and usage varies significantly between languages.

Line 3 not only declares bar but also defines the initial value of bar as the decimal value 10 represented by the integer literal 10. The initial value of a global variable is nevertheless optional. Line 3 could be rewritten to int bar; in which case the value of bar would be initially undefined meaning it could initially be any value. Undefined values are a common source of errors, if programs depend on them. Modern compilers usually warn programmers about that (not starc though since we need to keep things simple). A simple way to avoid depending on undefined values is to either provide an initial value for a variable or to assign a value to a variable before using the variable in any computation, see below for more about how to do that. A program that does not depend on undefined values has a single initial state from which it begins all computations. This is what we want!

Note that the equality sign = in Line 3 is merely syntactic sugar making the code more readable while the semicolon ; is a so-called terminator which indicates the end of a statement. After the semicolon we could insert more global variable declarations as long as they all were to introduce unique identifiers and were properly terminated with semicolons. Programming languages newer than C often make such terminators optional or omit them altogether since they are, similar to syntactic sugar, not necessary for the compiler to work and, unlike syntactic sugar, sometimes considered a burden.

Data Type

Line 3 also specifies that the data type of bar is int which, according to the C standard, means that bar represents a signed 32-bit integer, that is, 32 bits encoding a positive or negative number in two’s complement. It also means that arithmetic operations involving bar will be done with 32-bit wrap-around semantics.

Data Type
A classification of data which tells the compiler or interpreter how the programmer intends to use the data. Most programming languages support various types of data, for example: real, integer, or Boolean. A data type provides a set of values from which an expression (i.e. variable, function …) may take its values. The type defines the operations that can be done on the data, the meaning of the data, and the way values of that type can be stored.

So, this is important! A data type tells us and the compiler what the intended meaning of the bits are that encode the data. Remember, bits can encode anything and have no meaning unless they are involved in an operation. Data types therefore help with identifying meaning even without performing any operations.

A global variable of type int such as bar provides storage for 32 bits which happens to match the size of a word on a mipster machine. In fact, the value of bar will be stored in exactly one word somewhere in memory. First of all, this means that bar provides storage that is identified by the identifier bar and not by some memory address. But it also means that the program as is cannot access any other bits in memory than the 32 bits identified by bar which obviously reduces the size of the state space dramatically! So the program state space is much smaller than the machine state space and therefore much easier to reason about. However, there is also code in countdown that operates on bar. Let us have a closer look at how that is introduced in C*.

Procedure

The source code of the countdown program declares a so-called procedure called main in Line 6. The broader term for procedures is subroutines defined as follows.

Subroutine
A sequence of program instructions that perform a specific task, packaged as a unit. This unit can then be used in programs wherever that particular task should be performed. Subprograms may be defined within programs, or separately in libraries that can be used by multiple programs. In different programming languages, a subroutine may be called a procedure, a function, a routine, a method, or a subprogram.

In C subroutines are called procedures. Line 6 specifies that main refers to a procedure rather than a global variable simply by using () after the identifier. In fact, it would be perfectly fine to just say int main();. However, the code enclosed in {}, called the body of the procedure, also defines the implementation of the procedure. It describes what the procedure does. We get to that further below. The int keyword before main specifies that the so-called return type of the procedure is a signed 32-bit integer. This means that the procedure returns a signed 32-bit integer value when done.

Global variable and procedure declarations, as in Lines 3 and 6, may use any identifier not used anywhere else in the program. In other words, identifiers used in declarations must be unique. The main procedure name, however, is even more special because the main procedure is the one that is invoked when a C program starts executing. Thus a valid C program needs to contain exactly one declaration and definition of a procedure called main. Otherwise, the system would not “know” what to execute. See for yourself by renaming main in countdown.c to something else. When main returns the program stops executing and the system outputs the value returned by main, which is 0, as the previously mentioned exit code, see Line 10 in the above output.

There can be any number of global variable and procedure declarations in a C program. The starc compiler reports in Line 5 in the above output that there is exactly one procedure declared in countdown.c which is in fact the main procedure.

Line 7 in the above output mentions that starc generated exactly 496 bytes for countdown.c representing 122 instructions (remember each instruction takes 4 bytes) and 8 bytes of data (496=122*4+8). The 122 instructions represent the machine code generated by starc for initializing and controlling the machine and in particular for implementing the main procedure, that is, the C* code in Lines 11-19 in countdown.c. Out of the 8 bytes of data, 4 bytes represent the initial value of bar which is 10. The other 4 bytes encode the amount of bytes needed to represent the instructions, that is, the value 488 which is equal to 122*4. This information is necessary to determine which bytes are code and which are data.

Selfie can write the generated bytes into a binary file or just binary (using the -o option) that can later be loaded again (using the -l option) and executed (using the -m option, for example).

Binary File
A computer file that is not a text file. The term “binary file” is often used as a term meaning “non-text file”. Binary files are usually thought of as being a sequence of bytes, which means the binary digits (bits) are grouped in eights. Binary files typically contain bytes that are intended to be interpreted as something other than text characters. Compiled computer programs are typical examples; indeed, compiled applications are sometimes referred to, particularly by programmers, as binaries.

In our example, selfie is instructed to write the generated bytes into a binary file called countdown.m, as reported in Line 8 in the above output. The countdown.m binary is what is known as an executable.

Executable
Causes a computer “to perform indicated tasks according to encoded instructions,” as opposed to a data file that must be parsed by a program to be meaningful. These instructions are traditionally machine code instructions for a physical CPU.

We may use the terms binary and executable interchangeably even though, strictly speaking, there are binary files such as image files, for example, that are obviously not executable. However, executables are usually binaries.

The format of the countdown.m executable is very simple. It begins with the 4 bytes encoding the number of bytes of code (representing 488 here) followed by the 488 bytes of code followed by the 4 bytes of data (representing 10 here). That’s it.

When invoking selfie on the countdown program, we also instructed selfie to produce an assembly file countdown.s for the countdown program (using the -s option) that represents a human-readable version of the binary file countdown.m. Selfie reports on that in Line 9 in the above output. Note that countdown.s only contains code but not any data such as the initial value of bar.

Here is countdown.s but only showing the instructions that will actually be executed when running the program. The instructions that will not be executed, shown as ..., are generated by starc because they may be needed by some programs, just not the countdown program. The comments on the right are inserted manually and explained below.

0x0(~1): 0x240801EC: addiu $t0,$zero,492           // initialize global pointer
0x4(~1): 0x251C0000: addiu $gp,$t0,0               // via $t0 register with 492
0x8(~1): 0x24080FFF: addiu $t0,$zero,4095          // initialize stack pointer
0xC(~1): 0x24094000: addiu $t1,$zero,16384         // with the machine word at
0x10(~1): 0x01090019: multu $t0,$t1                // address 4095*16384+16380
0x14(~1): 0x00004012: mflo $t0                     // by loading 4095 into $t0
0x18(~1): 0x00000000: nop                          // and 16384 into $t1, then
0x1C(~1): 0x00000000: nop                          // multiplying $t0 with $t1,
0x20(~1): 0x25083FFC: addiu $t0,$t0,16380          // and adding 16380 to the
0x24(~1): 0x8D1D0000: lw $sp,0($t0)                // result in $t0, and finally
0x28(~1): 0x00000000: nop                          // loading the word at the
0x2C(~1): 0x00000000: nop                          // address in $t0.
0x30(~1): 0x00000000: nop
0x34(~1): 0x00000000: nop
0x38(~1): 0x00000000: nop
0x3C(~1): 0x00000000: nop
0x40(~1): 0x0C00005F: jal 0x5F[0x17C]              // jump to main procedure
0x44(~1): 0x00000000: nop                          // at address 0x17C and
0x48(~1): 0x27BDFFFC: addiu $sp,$sp,-4             // return here when done.
0x4C(~1): 0xAFA20000: sw $v0,0($sp)                // copy exit code in $v0
0x50(~1): 0x8FA40000: lw $a0,0($sp)                // register to $a0 register,
0x54(~1): 0x27BD0004: addiu $sp,$sp,4              // load exit syscall ID
0x58(~1): 0x24020FA1: addiu $v0,$zero,4001         // 4001 into $v0, and
0x5C(~1): 0x0000000C: syscall                      // finally exit here.
...
0x17C(~11): 0x27BDFFFC: addiu $sp,$sp,-4           // procedure prologue
0x180(~11): 0xAFBF0000: sw $ra,0($sp)              // explained later in
0x184(~11): 0x27BDFFFC: addiu $sp,$sp,-4           // stack chapter.
0x188(~11): 0xAFBE0000: sw $fp,0($sp)
0x18C(~11): 0x27BE0000: addiu $fp,$sp,0
0x190(~11): 0x8F88FFFC: lw $t0,-4($gp)             // load bar into $t0,
0x194(~11): 0x24090000: addiu $t1,$zero,0          // load 0 into $t1, and
0x198(~11): 0x0128402A: slt $t0,$t1,$t0            // if $t0 <= $t1 then
0x19C(~11): 0x10080007: beq $zero,$t0,7[0x1BC]     // branch forward to return
0x1A0(~11): 0x00000000: nop                        // code at address 0x1BC.
0x1A4(~13): 0x8F88FFFC: lw $t0,-4($gp)             // load bar into $t0,
0x1A8(~13): 0x24090001: addiu $t1,$zero,1          // load 1 into $t1,
0x1AC(~13): 0x01094023: subu $t0,$t0,$t1           // subtract $t1 from $t0,
0x1B0(~13): 0xAF88FFFC: sw $t0,-4($gp)             // and store $t0 in bar.
0x1B4(~19): 0x1000FFF6: beq $zero,$zero,-10[0x190] // branch back to while
0x1B8(~19): 0x00000000: nop                        // code at address 0x190.
0x1BC(~19): 0x8F88FFFC: lw $t0,-4($gp)             // to return bar, load bar
0x1C0(~19): 0x00081021: addu $v0,$zero,$t0         // into $v0 via $t0 and
0x1C4(~19): 0x08000073: j 0x73[0x1CC]              // then jump to epilogue
0x1C8(~19): 0x00000000: nop                        // at address 0x1CC.
0x1CC(~20): 0x27DD0000: addiu $sp,$fp,0            // procedure epilogue
0x1D0(~20): 0x8FBE0000: lw $fp,0($sp)              // explained later in
0x1D4(~20): 0x27BD0004: addiu $sp,$sp,4            // stack chapter.
0x1D8(~20): 0x8FBF0000: lw $ra,0($sp)
0x1DC(~20): 0x27BD0004: addiu $sp,$sp,4
0x1E0(~20): 0x03E00008: jr $ra                     // return to the code
0x1E4(~20): 0x00000000: nop                        // at address 0x48.

Each line in countdown.s represents one instruction. The first line, for example, is the instruction addiu $t0,$zero,492, which is encoded in the 32-bit word 0x240801EC. The instruction will be loaded into memory at address 0x0, as indicated by 0x0(~1). The number ~1 in parentheses is the approximate line number in countdown.c for which starc generated the instruction. Even though there is a comment at Line 1 in countdown.c this still makes sense because starc always generates some instructions before compiling any actual source code. Try to find the four instructions that starc actually generated for bar = bar - 1 in Line 13 of countdown.c! They, along with the others, are explained below.

Ok, but what happens now when selfie is instructed by the final -m 1 option to execute the generated code? Doing that involves solving a problem that appears to have no solution. How does a computer load an executable into memory without an executable in memory that instructs the processor how to do this? The process that solves that problem is called bootstrapping.

Bootstrapping
A self-starting process that is supposed to proceed without external input. In computer technology the term (usually shortened to booting) usually refers to the process of loading the basic software into the memory of a computer after power-on or general reset, especially the operating system which will then take care of loading other software as needed.

A computer typically bootstraps itself by having the processor initially fetch, decode, and execute machine code from some non-volatile memory rather than volatile main memory. That machine code implements a so-called boot loader which instructs the processor to load the code that the processor is actually supposed to execute from some external source and store it in main memory. When done, the boot loader instructs the processor to start fetching, decoding, and executing the code stored in main memory.

Program Break

Before launching the mipster emulator, selfie bootstraps mipster exactly like a computer bootstraps itself. First of all, the emulated memory and registers are all zeroed, that is, set to 0x0. The machine code and data generated by starc (or loaded from a binary file) is then copied into the emulated memory starting at the lowest address 0x0. The portions of memory where code and data are located are also called the code segment and the data segment, respectively. The result is the following memory layout.

Memory Layout
Memory Layout

With code and data copied to memory the machine is essentially ready to go. The rest of the memory layout will be explained in later chapters. For now we only need to know that the border between code and data and the rest of memory is the program break which divides memory into statically and dynamically allocated storage. The addresses of the code and data stored in memory below the program break will not change while the storage above the program break may be used for different purposes during execution. Keep in mind though that the memory layout we describe here is only one choice out of many possible choices. However, that layout is probably the most widely adopted choice today.

Going back to C* in general and the countdown program in particular, global variable and procedure declarations specify exactly what is below the program break, what is code and data, and what the code does with the data, as we see next. Most important here is to understand that the state of memory is fully determined after copying the code for procedures and the data for global variables into memory. While countdown is a simple program think of the code and data for selfie. There are hundreds of global variable and procedure declarations in selfie.c but it is still the same thing. The fact that C* allows us to talk about variables and procedures without worrying about memory layout is a key ingredient for managing the enormously large state space. The only missing piece now for a complete picture is the state of the registers. Let us take a look at that next!

Program Counter

How does a computer “know” what to execute? After all the bits in memory could mean anything. They could be code, they could be data, anything. But the answer to that question can anyway not be any simpler.

Processors based on the von Neumann model feature a special-purpose register as part of their control unit called the program counter (PC). The PC of the machine emulated by mipster is, unsurprisingly, yet another 32-bit register.

Program Counter (PC)
A processor register that indicates where a computer is in its program sequence. In most processors, the PC is incremented after fetching an instruction, and holds the memory address of (“points to”) the next instruction that would be executed. Instructions are usually fetched sequentially from memory, but control transfer instructions change the sequence by placing a new value in the PC. These include branches (sometimes called jumps), subroutine calls, and returns. A transfer that is conditional on the truth of some assertion lets the computer follow a different sequence under different conditions. A branch provides that the next instruction is fetched from somewhere else in memory. A subroutine call not only branches but saves the preceding contents of the PC somewhere. A return retrieves the saved contents of the PC and places it back in the PC, resuming sequential execution with the instruction following the subroutine call.

At boot time, when selfie is done zeroing all emulated memory and registers, in particular the PC, and copying code and data into memory, mipster is ready to start code execution, well, at the lowest address 0x0 in memory because that is where the PC points to. From then on mipster fetches the word in memory where the PC points to, decodes that word, and executes the resulting instruction. Each instruction not only instructs the processor to perform some computation but also determines the next value of the PC so that the processor “knows” where in memory the next instruction is stored. That sequence of PC values is called control flow.

Control Flow
The order in which individual statements, instructions or function calls of an imperative program are executed or evaluated. The emphasis on explicit control flow distinguishes an imperative programming language from a declarative programming language.

Let us take a look at how the first few and last few instructions for the countdown program execute. You will need to scroll back to the beginning after the following command though since selfie will output the whole sequence of executed instructions. Most of the output is skipped and just shown as ... here.

> ./selfie -c manuscript/code/countdown.c -d 1
./selfie: this is selfie's starc compiling manuscript/code/countdown.c
./selfie: 625 characters read in 19 lines and 9 comments
./selfie: with 55(8.80%) characters in 28 actual symbols
./selfie: 1 global variables, 1 procedures, 0 string literals
./selfie: 0 calls, 1 assignments, 1 while, 0 if, 1 return
./selfie: 496 bytes generated with 122 instructions and 8 bytes of data
./selfie: this is selfie's mipster executing manuscript/code/countdown.c with 1M\
B of physical memory
$pc=0x0(~1): 0x240801EC: addiu $t0,$zero,492: $t0=0,$zero=0 -> $t0=492
$pc=0x4(~1): 0x251C0000: addiu $gp,$t0,0: $gp=0,$t0=492 -> $gp=492
$pc=0x8(~1): 0x24080FFF: addiu $t0,$zero,4095: $t0=492,$zero=0 -> $t0=4095
$pc=0xC(~1): 0x24094000: addiu $t1,$zero,16384: $t1=0,$zero=0 -> $t1=16384
$pc=0x10(~1): 0x01090019: multu $t0,$t1: $t0=4095,$t1=16384,$lo=0 -> $lo=67092480
$pc=0x14(~1): 0x00004012: mflo $t0: $t0=4095,$lo=67092480 -> $t0=67092480
$pc=0x18(~1): 0x00000000: nop
$pc=0x1C(~1): 0x00000000: nop
$pc=0x20(~1): 0x25083FFC: addiu $t0,$t0,16380: $t0=67092480,$t0=67092480 -> $t0=\
67108860
$pc=0x24(~1): 0x8D1D0000: lw $sp,0($t0): $sp=0,$t0=0x3FFFFFC -> $sp=67108816=mem\
ory[0x3FFFFFC]
$pc=0x28(~1): 0x00000000: nop
$pc=0x2C(~1): 0x00000000: nop
$pc=0x30(~1): 0x00000000: nop
$pc=0x34(~1): 0x00000000: nop
$pc=0x38(~1): 0x00000000: nop
$pc=0x3C(~1): 0x00000000: nop
$pc=0x40(~1): 0x0C00005F: jal 0x5F[0x17C]: $ra=0x0 -> $ra=0x48,$pc=0x17C
...
$pc=0x1E0(~20): 0x03E00008: jr $ra: $ra=0x48 -> $pc=0x48
$pc=0x48(~1): 0x27BDFFFC: addiu $sp,$sp,-4: $sp=67108816,$sp=67108816 -> $sp=671\
08812
$pc=0x4C(~1): 0xAFA20000: sw $v0,0($sp): $v0=0,$sp=0x3FFFFCC -> memory[0x3FFFFCC\
]=0=$v0
$pc=0x50(~1): 0x8FA40000: lw $a0,0($sp): $a0=0,$sp=0x3FFFFCC -> $a0=0=memory[0x3\
FFFFCC]
$pc=0x54(~1): 0x27BD0004: addiu $sp,$sp,4: $sp=67108812,$sp=67108812 -> $sp=6710\
8816
$pc=0x58(~1): 0x24020FA1: addiu $v0,$zero,4001: $v0=0,$zero=0 -> $v0=4001
$pc=0x5C(~1): 0x0000000C: syscall
exiting with exit code 0 and 0.00MB of mallocated memory
./selfie: this is selfie's mipster terminating manuscript/code/countdown.c with \
exit code 0 and 0.00MB of mapped memory
./selfie: profile: total,max(ratio%)@addr(line#),2max(ratio%)@addr(line#),3max(r\
atio%)@addr(line#)
./selfie: calls: 1,1(100.00%)@0x17C(~11),0(0.00%),0(0.00%)
./selfie: loops: 10,10(100.00%)@0x190(~11),0(0.00%),0(0.00%)
./selfie: loads: 26,11(42.37%)@0x190(~11),10(38.46%)@0x1A4(~13),1(3.84%)@0x24(~1)
./selfie: stores: 13,10(76.92%)@0x1B0(~13),1(7.69%)@0x4C(~1),0(0.00%)

Initialization

The purpose of the first sixteen instructions executed by mipster is to initialize the machine and get it ready for executing the code that implements the main procedure.

addiu

Initially, the PC denoted by $pc points to address 0x0 in memory. The instruction at this address is thus the first instruction that will be executed by the machine. The instruction is encoded in the word 0x240801EC which stands for addiu $t0,$zero,492, as mentioned before.

In our discussion we provide for each new instruction a link to the source code of mipster that implements the instruction (see, for example, the above addiu) and for each concrete instruction in the example a link to the source code of starc that generated the instruction (see, for example, the above addiu $t0,$zero,492).

Now, here is the interesting part. The output $t0=0,$zero=0 -> $t0=492 next to the instruction tells us which part of the state space (memory, registers) the instruction depends on and what the affected state actually is right before executing the instruction ($t0=0,$zero=0 to the left of the arrow ->) and which part of the state space changes and how after executing the instruction ($t0=492 to the right of ->). In other words, the instruction depends on the values in the two registers $t0 and $zero that both contain 0 and the instruction changes the value in register $t0 to 492. This is because addiu $t0,$zero,492 instructs the processor to add the value 492 to the value in register $zero which is always 0 and store the result in register $t0. In other words, addiu $t0,$zero,492 does effectively load the value 492 into $t0, enabled by the fact that $zero always contains 0. Thus there is no need for another instruction to initialize registers.

The addiu instruction can nevertheless be used for other purposes and involve any of the 32 general-purpose registers, not just $t0 and $zero. However, there is a convention of using registers for a certain purpose as reflected in the names of registers. For example, the t in $t0 stands for temporary. Registers with t are meant to store temporary results during computation.

Also, very important and not to forget, addiu $t0,$zero,492 makes the processor increment the $pc register from 0x0 to 0x4 so that the next instruction executed by the processor is the instruction at address 0x4 that immediately follows the current instruction in memory. Incrementing the PC like that creates so-called sequential control flow. Most instructions actually do that, not just addiu. There are, however, also instructions that can alter the control flow by setting the $pc register depending on the values in registers other than $pc. We explain that below.

This takes us to the next instruction addiu $gp,$t0,0 at address 0x4. Its effect on the machine state is that the value in the $gp register is set to 492 because it instructs the processor to add 0 to the value in register $t0 which is currently 492 and store the result in $gp. Also, the value of the $pc register is incremented to 0x8. The $gp register is register 28 among the 32 general-purpose registers.

What is the purpose of the first two instructions? Simple. They are meant to initialize the $gp register which stands for global pointer. Why do we use two instructions instead of one? Good question. Just using addiu $gp,$zero,492 would do the trick as well. The reason why we are not doing this is because it makes the compiler simpler, and better performance through using fewer instructions and fewer registers is not our focus here. It is, however, of major concern in state-of-the-art compilers.

More important here is the question as to why $gp is initialized to 492? Because 492, 0x1EC in hexadecimal, is the address of the first word after the code and data segments in memory which occupy exactly, well, 492 bytes but starting from address 0x0. But wait, 492 is in this case the program break, right? Yes and no. It is but we anyway use as program break in selfie a higher, more conservative address independent of code and data size because it is simpler to implement.

The purpose of the global pointer is to provide a fixed point of reference for referring to memory words that store the values of global variables. To find out where the word storing the value of a global variable is in memory, we only need to know the offset in bytes of how far below the address to which $gp points to the word is located. So, relative to the value of $gp, what is the offset of the word in memory that stores the value of the global variable bar? Think about it! This is clarified and explained in more detail below.

Let us take a look at the next eight instructions in the above output. Their purpose is to initialize the $sp register by loading the word stored at the highest address in memory into $sp. The $sp register stands for stack pointer and is register 29 among the 32 general-purpose registers. The purpose of the stack pointer is to facilitate fast dynamic allocation of memory for procedures. At all times during code execution, the value of the stack pointer is an address that points to memory below which data can be stored temporarily. The details are explained in the stack chapter.

Interesting here is to see how the highest address which is 226-4=67108860=0x3FFFFFC is loaded into register $t0. Remember that mipster emulates 64MB of memory which is 64*220=26*220=226 bytes. The highest address is thus 226 bytes subtracted by 4 bytes because memory addresses start at 0x0 and need to be word-aligned.

Now, the issue is that representing 0x3FFFFFC in binary requires 26 bits but the signed integer argument of addiu is only 16 bits. The solution is to break that large number into smaller numbers using left and logical right shift and then arithmetics to recompute the original number. Here, we exploit that with signed 32-bit integer arithmetics 226-4=((226-4)/214)*214+((226-4)-((226-4)/214)*214)=4095*16384+16380 where all three numbers fit into 16-bit signed integers and can therefore be loaded into registers through immediate addressing with addiu.

multu and the $hi and $lo registers

The multiplication 4095*16384 is performed by first loading 4095 and 16384 into registers $t0 and $t1, respectively. Similar to $t0, $t1 is also a register for temporary results, it is register 9 among the 32 general-purpose registers. The actual multiplication is performed by the instruction multu $t0,$t1 where multu stands for multiply unsigned. Similar to addiu, the term unsigned is misleading. Its actual meaning is that arithmetic overflows with multu are ignored while wrap-around semantics apply.

The instruction multiplies the value in $t0 with the value in $t1 and stores the 32 LSBs of the 64-bit result in a special-purpose 32-bit register called $lo. An actual MIPS processor would also store the 32 MSBs of the result in another special-purpose 32-bit register called $hi which we nevertheless ignore here. The $lo register contains the result with 32-bit wrap-around semantics which is exactly what we need. However, $hi is used for integer division in MIPSter just like on a MIPS32 processor. We get to that later.

The $lo and $hi registers are together with the $pc register the 3 special-purpose registers of a mipster machine that we mentioned above. At boot time, $lo and $hi are also zeroed just like all other registers. This is in fact now the complete machine state, no more surprises.

mflo and nop

In order to access the result in $lo we use the instruction mflo $t0 where mflo stands for move from low. It copies the value in $lo to $t0. The following two nop instructions do no operation, that is, do not do anything other than advancing the PC to the next instruction. They are there because the MIPS ISA restricts what the processor can do in the next two instructions after an mflo instruction. We can nevertheless safely ignore the technical reasons behind that. After adding 16380 to the value in $t0 we have the desired value 0x3FFFFFC in $t0.

lw

The next instruction is the lw $sp,0($t0) instruction at address 0x24 where lw stands for load word. This instruction loads, into the $sp register, the word from memory stored at the address derived from the value of $t0 plus offset 0, as indicated by the notation 0($t0). This means in particular that the value in $t0 is interpreted as memory address plus some constant offset that does not have to be zero and can even be negative. This behavior is another addressing mode which is called register-relative addressing. We hear more about that below. Why we are loading that word is explained in another chapter.

Interestingly, this load operation is actually mentioned by the profiler in the above output, that is, in loads: 26,...,...,1(3.84%)@0x24(~1) as one of the third most executed operations among a total of 26 load operations even though it is only executed once which corresponds to 3.84% of all load operations.

The next six instructions in the above output are all nop instructions. So, imagine, it took us sixteen instructions to get the integer value 492 into $gp and the value at address 0x3FFFFFC into $sp. We definitely need a higher-level programming language to raise the level of abstraction. However, as tedious as the machine level might be, it is completely deterministic and rather easy to understand.

For now, the important take-away message here is that we can reconstruct the full state of the machine at any instruction in the above output just by following the arrows -> line by line until that instruction. If you still cannot believe that a computer really is so simple and does work in these tiny steps and does so completely deterministically it is time to reflect about that here again. The machine starts in some given state and proceeds from there by changing very few bits in each step instructed by other bits that are identified by the program counter. That’s it. The only reason why computers appear to be so powerful is because they are so fast and can store enormous amounts of bits. This even applies to computers appearing to do many things at the same time. They don’t. When looking close enough things happen step by step.

Execution

Let us now look at the next instruction which makes the machine execute the code that actually implements the main procedure.

jal

That instruction is the jal 0x5F[0x17C] instruction where jal stands for jump and link. Its purpose is to call the code that implements the main procedure by setting the PC to the absolute address 0x17C which is where the first instruction of that code is located in memory. The profiler even mentions that first instruction the above output, that is, in calls: 1,1(100.00%)@0x17C(~11),...,... as the most called procedure code even though it is only called once (since there is no other). The jal instruction instructs the processor to jump to an absolute address in memory and continue code execution there. The addressing mode is called absolute addressing. The actual binary encoding of the address in the instruction contains the address in words, here 0x5F, rather than bytes, here 0x17C. The address in bytes is only provided as [0x17C] for better readability.

However, most importantly, before setting the PC and performing the actual jump, the jal instruction sets the $ra register which stands for return address to the value of the $pc register plus 8 bytes, which is 0x48 in the example here. The $ra register is the register with the highest index 31 among the 32 general-purpose registers. Its purpose is to remember or link to where code execution should resume when done with the code to which the machine is going to jump. The reason why it is 8 bytes rather than 4 bytes is because the instruction immediately following the jal instruction should be skipped when resuming code execution here. That instruction is in a so-called delay slot which is nevertheless an artifact that we can safely ignore. For simplicity, starc ensures that there is always a nop instruction in any delay slot.

jr

In the above output we are not showing the instructions implementing the main procedure except for the very last one which is the natural counterpart to the jal instruction. Let us first focus on that instruction and then see how the machine shuts down before explaining how the main procedure is implemented.

The jr $ra instruction sets the PC to the value of the $ra register where jr stands for jump register. The addressing mode is called register addressing. Here, the value of $ra is 0x48 which is the address of the instruction that follows the delay slot of the jal instruction that took us here. In other words, jal and jr work together by instructing the machine to jump to some code, execute it, and eventually come back to resume code execution right after the jump. While jal always links to the $ra register (hardware convention) jr could in principle be used with any general-purpose register but is usually not (software convention).

Termination

So, with the PC now pointing to the address 0x48 in memory, the next four instructions to be executed are addiu $sp,$sp,-4, followed by sw $v0,0($sp), lw $a0,0($sp), and addiu $sp,$sp,4. Their purpose is to copy the value in the $v0 register, which is 0, to the $a0 register. This is something we could have done with a single instruction but never mind. The $v0 and $a0 registers are registers 2 and 4, respectively, among the 32 general-purpose registers. The v in $v0 stands for value while the a in $a0 stands for argument. The value in $v0 is in fact the value returned by the main procedure which now becomes the argument of code for exiting the program by shutting down the machine.

sw

Among the four instructions the instruction we have not seen yet is the sw $v0,0($sp) instruction at address 0x4C where sw stands for store word. This instruction stores the word in $v0 in the memory word at the address derived from the value of $sp plus offset 0, again as indicated by the notation 0($sp). Similar to the lw instruction, the sw instruction uses register-relative addressing and is thus the natural counterpart to the lw instruction. The effect of the four instructions is that the value in $v0 is copied to $a0 via the memory word at address $sp (after decrementing $sp by 4 bytes and before incrementing $sp, again by 4 bytes, back to its original value). The reasoning behind that behavior is explained in the stack chapter.

Interestingly again, this store instruction is also mentioned by the profiler in the above output, that is, in stores: 13,...,1(7.69%)@0x4C(~1),... as one of the second most executed operations among a total of 13 store operations even though it is only executed once which corresponds to 7.69% of all store operations.

syscall

The next instruction addiu $v0,$zero,4001 loads the value 4001 into the $v0 register. Upon executing the following syscall instruction, that value in $v0 instructs the machine to output the value in $a0 as exit code, which is 0, and then to shut down. That’s it.

Again, the exact reasoning why things are done this way and what other behavior is supported by mipster is explained in later chapters. Here, we only point out that the syscall instruction, which stands for system call, does not have any explicit arguments. However, it does expect implicit arguments provided in at least the $v0 register which identifies among a finite set of choices the functionality that the machine is supposed to perform. The $a0 register can then be used to pass additional information such as an exit code.

Statement

So, how does the main procedure of countdown actually work? A procedure in C* consists of a sequence of statements which are the C* counterpart to MIPSter machine instructions. In fact, each statement translates to a sequence of machine instructions generated automatically by starc during compilation.

Statement
The smallest standalone element of an imperative programming language that expresses some action to be carried out. It is an instruction written in a high-level language that commands the computer to perform a specified action. A program written in such a language is formed by a sequence of one or more statements. A statement may have internal components (e.g., expressions).

C* features only five different kinds of statements: assignment, while, if, procedure call, and return. We discuss assignment, while, and return here and explain if and procedure call in subsequent chapters. The default control flow in C*, just like in MIPSter, is sequential from one statement to the next. However, while only 5 out of the 17 MIPSter machine instructions are control flow instructions, all statements but assignment are control flow statements in C*.

Here is the output of mipster when executing the instructions that implement the main procedure. The ... part in the middle of the output is repetitive and skipped for brevity but explained below. The first five and the last six instructions, including the jr instruction, correspond to the so-called prologue and epilogue of the procedure and can safely be ignored for now. They are generated by starc for all procedures. An important property is that the last five instructions before the jr instruction undo exactly what the first five instructions did to the involved registers. The $fp register is explained in the next two chapters. Look at the register values before and after executing these instructions to see for yourself!

> ./selfie -c manuscript/code/countdown.c -d 1
...
$pc=0x17C(~11): 0x27BDFFFC: addiu $sp,$sp,-4: $sp=67108816,$sp=67108816 -> $sp=6\
7108812
$pc=0x180(~11): 0xAFBF0000: sw $ra,0($sp): $ra=72,$sp=0x3FFFFCC -> memory[0x3FFF\
FCC]=72=$ra
$pc=0x184(~11): 0x27BDFFFC: addiu $sp,$sp,-4: $sp=67108812,$sp=67108812 -> $sp=6\
7108808
$pc=0x188(~11): 0xAFBE0000: sw $fp,0($sp): $fp=0,$sp=0x3FFFFC8 -> memory[0x3FFFF\
C8]=0=$fp
$pc=0x18C(~11): 0x27BE0000: addiu $fp,$sp,0: $fp=0,$sp=67108808 -> $fp=67108808
$pc=0x190(~11): 0x8F88FFFC: lw $t0,-4($gp): $t0=67108860,$gp=0x1EC -> $t0=10=mem\
ory[0x1E8]
$pc=0x194(~11): 0x24090000: addiu $t1,$zero,0: $t1=16384,$zero=0 -> $t1=0
$pc=0x198(~11): 0x0128402A: slt $t0,$t1,$t0: $t1=0,$t0=10 -> $t0=1
$pc=0x19C(~11): 0x10080007: beq $zero,$t0,7[0x1BC]: $zero=0,$t0=1 -> $pc=0x1A0
$pc=0x1A0(~11): 0x00000000: nop
$pc=0x1A4(~13): 0x8F88FFFC: lw $t0,-4($gp): $t0=1,$gp=0x1EC -> $t0=10=memory[0x1\
E8]
$pc=0x1A8(~13): 0x24090001: addiu $t1,$zero,1: $t1=0,$zero=0 -> $t1=1
$pc=0x1AC(~13): 0x01094023: subu $t0,$t0,$t1: $t0=10,$t0=10,$t1=1 -> $t0=9
$pc=0x1B0(~13): 0xAF88FFFC: sw $t0,-4($gp): $t0=9,$gp=0x1EC -> memory[0x1E8]=9=$\
t0
$pc=0x1B4(~19): 0x1000FFF6: beq $zero,$zero,-10[0x190]: $zero=0,$zero=0 -> $pc=0\
x190
$pc=0x190(~11): 0x8F88FFFC: lw $t0,-4($gp): $t0=9,$gp=0x1EC -> $t0=9=memory[0x1E\
8]
$pc=0x194(~11): 0x24090000: addiu $t1,$zero,0: $t1=1,$zero=0 -> $t1=0
$pc=0x198(~11): 0x0128402A: slt $t0,$t1,$t0: $t1=0,$t0=9 -> $t0=1
$pc=0x19C(~11): 0x10080007: beq $zero,$t0,7[0x1BC]: $zero=0,$t0=1 -> $pc=0x1A0
$pc=0x1A0(~11): 0x00000000: nop
$pc=0x1A4(~13): 0x8F88FFFC: lw $t0,-4($gp): $t0=1,$gp=0x1EC -> $t0=9=memory[0x1E\
8]
$pc=0x1A8(~13): 0x24090001: addiu $t1,$zero,1: $t1=0,$zero=0 -> $t1=1
$pc=0x1AC(~13): 0x01094023: subu $t0,$t0,$t1: $t0=9,$t0=9,$t1=1 -> $t0=8
$pc=0x1B0(~13): 0xAF88FFFC: sw $t0,-4($gp): $t0=8,$gp=0x1EC -> memory[0x1E8]=8=$\
t0
$pc=0x1B4(~19): 0x1000FFF6: beq $zero,$zero,-10[0x190]: $zero=0,$zero=0 -> $pc=0\
x190
...
$pc=0x190(~11): 0x8F88FFFC: lw $t0,-4($gp): $t0=1,$gp=0x1EC -> $t0=1=memory[0x1E\
8]
$pc=0x194(~11): 0x24090000: addiu $t1,$zero,0: $t1=1,$zero=0 -> $t1=0
$pc=0x198(~11): 0x0128402A: slt $t0,$t1,$t0: $t1=0,$t0=1 -> $t0=1
$pc=0x19C(~11): 0x10080007: beq $zero,$t0,7[0x1BC]: $zero=0,$t0=1 -> $pc=0x1A0
$pc=0x1A0(~11): 0x00000000: nop
$pc=0x1A4(~13): 0x8F88FFFC: lw $t0,-4($gp): $t0=1,$gp=0x1EC -> $t0=1=memory[0x1E\
8]
$pc=0x1A8(~13): 0x24090001: addiu $t1,$zero,1: $t1=0,$zero=0 -> $t1=1
$pc=0x1AC(~13): 0x01094023: subu $t0,$t0,$t1: $t0=1,$t0=1,$t1=1 -> $t0=0
$pc=0x1B0(~13): 0xAF88FFFC: sw $t0,-4($gp): $t0=0,$gp=0x1EC -> memory[0x1E8]=0=$\
t0
$pc=0x1B4(~19): 0x1000FFF6: beq $zero,$zero,-10[0x190]: $zero=0,$zero=0 -> $pc=0\
x190
$pc=0x190(~11): 0x8F88FFFC: lw $t0,-4($gp): $t0=0,$gp=0x1EC -> $t0=0=memory[0x1E\
8]
$pc=0x194(~11): 0x24090000: addiu $t1,$zero,0: $t1=1,$zero=0 -> $t1=0
$pc=0x198(~11): 0x0128402A: slt $t0,$t1,$t0: $t1=0,$t0=0 -> $t0=0
$pc=0x19C(~11): 0x10080007: beq $zero,$t0,7[0x1BC]: $zero=0,$t0=0 -> $pc=0x1BC
$pc=0x1BC(~19): 0x8F88FFFC: lw $t0,-4($gp): $t0=0,$gp=0x1EC -> $t0=0=memory[0x1E\
8]
$pc=0x1C0(~19): 0x00081021: addu $v0,$zero,$t0: $v0=0,$zero=0,$t0=0 -> $v0=0
$pc=0x1C4(~19): 0x08000073: j 0x73[0x1CC]: -> $pc=0x1CC
$pc=0x1CC(~20): 0x27DD0000: addiu $sp,$fp,0: $sp=67108808,$fp=67108808 -> $sp=67\
108808
$pc=0x1D0(~20): 0x8FBE0000: lw $fp,0($sp): $fp=67108808,$sp=0x3FFFFC8 -> $fp=0=m\
emory[0x3FFFFC8]
$pc=0x1D4(~20): 0x27BD0004: addiu $sp,$sp,4: $sp=67108808,$sp=67108808 -> $sp=67\
108812
$pc=0x1D8(~20): 0x8FBF0000: lw $ra,0($sp): $ra=72,$sp=0x3FFFFCC -> $ra=72=memory\
[0x3FFFFCC]
$pc=0x1DC(~20): 0x27BD0004: addiu $sp,$sp,4: $sp=67108812,$sp=67108812 -> $sp=67\
108816
$pc=0x1E0(~20): 0x03E00008: jr $ra: $ra=0x48 -> $pc=0x48
...

While Statement

The execution of a procedure like main always begins with the first statement of the procedure which is here the while statement, also called while loop, in Line 11 in countdown.c. The meaning of that statement is to check if the value of the global variable bar is greater than 0 and, if yes, to execute the sequence of statements, called the body of the while loop, enclosed in the curly braces {} right after while (bar > 0). Here, there is only one statement which is the assignment bar = bar - 1. When the body of the while loop is finished executing, control flows back to while (bar > 0) for checking the value of bar again. If the value is still greater than 0 the body of the while loop is executed again in another iteration of the loop and so on. Only if the value of bar is not greater than 0 the body is not executed. In this case, the while loop is terminated and control flows to the statement that follows the while statement, which is here the return bar statement.

While Loop
A control flow statement that allows code to be executed repeatedly based on a given Boolean condition.

Here, bar > 0 is that Boolean condition, also called loop condition, which evaluates to the Boolean values true or false. A possibly confusing convention in C is that the Boolean value false is represented by the integer value 0 while true is represented by the integer value 1 or in fact any integer value other than 0. Many other programming languages avoid that ambiguity by providing Boolean literals such as true and false.

The execution of a while loop always results in checking the loop condition at least once. However, the body of the while loop may be executed any number of times, including zero times, depending on the Boolean condition and the statements in the body. For example, while (0) { ... } would find its loop condition to evaluate to false and therefore proceed to the next statement, skipping any statements in its body, whereas while (1) { ... } would always find its loop condition to evaluate to true and therefore never proceed to the next statement. Try that yourself by modifying the countdown program accordingly!

You may also ask yourself if there can be nested while loops, that is, while statements in the body of a while loop. The answer is yes, of course, any finite number of times in fact. Imagine what this can do, a loop inside of a loop inside of a loop. This is very powerful and can get quite tricky so we stay away from that for now.

Let us now take a look at the machine code generated by starc for the while loop and see how it executes. The instructions implement exactly what we just described informally.

The first instruction of the while loop is lw $t0,-4($gp) at address 0x190. Its purpose is to load from memory the value of the global variable bar occurring in while (bar > 0) into register $t0 for comparison with 0. The value of bar is stored in memory in the word at the address derived from $gp plus the offset -4 (bytes). Remember, $gp points to the first word in memory above the data segment. Since the value of $gp is 492, or 0x1EC in hexadecimal, the actual address of the value of bar in memory is 488, or 0x1E8 in hexadecimal. If there was a second global variable its value would be stored at the address derived from $gp plus the offset -8 (bytes) and so on. The output $t0=67108860,$gp=0x1EC -> $t0=10=memory[0x1E8] next to the instruction shows that the value stored at 0x1E8 is in fact 10 which is the initial value of bar. How did that value get there? The boot loader put it there! So 10 was there after loading the code and data generated by starc even before mipster started executing any code.

The next instruction of the while loop is addiu $t1,$zero,0 which loads the value of 0 occurring in while (bar > 0) into register $t1 for comparison with $t0.

slt

The actual comparison is performed by the slt $t0,$t1,$t0 instruction where slt stands for set on less than. It sets the value of $t0 to 1 if the value of $t1, which is here 0, is less than the value of $t0, which is here 10. The output $t1=0,$t0=10 -> $t0=1 confirms that. If the value of $t1 was not less than the value of $t0 it would set $t0 to 0, which is exactly what happens when the value of bar stored at address 0x1E8 eventually becomes 0, see the last occurrence of slt $t0,$t1,$t0 in the above output.

beq

The next instruction is the beq $zero,$t0,7[0x1BC] instruction where beq stands for branch on equal. Its purpose is to check the result of the previous slt instruction stored in $t0 and see if it is 0 or 1 by comparing it with the value of $zero which is still just 0. If the result is 1, which effectively means that the value of bar is greater than 0, nothing happens other than an increment of the PC by 4 bytes from 0x19C to 0x1A0 which is where the next instruction is located. Since beq instructions also have a delay slot, starc generated a nop instruction for the next instruction. The lw instruction after that is the first instruction implementing the body of the while loop.

Now, what would happen if the result of the previous slt instruction was 0, meaning that the value of bar was not greater than 0? This does in fact happen exactly once in our example, see the last occurrence of beq $zero,$t0,7[0x1BC] in the above output. In that case, the instruction branches forward to the first instruction after the (5 plus 2 nop) instructions that implement the body of the while loop, effectively terminating the while loop. In particular, beq $zero,$t0,7[0x1BC] increments in that case the PC by 4 bytes plus the offset 7 times 4 bytes from 0x19C to 0x1BC. The addressing mode is called PC-relative addressing. The address in absolute terms is only provided as 0x1BC for better readability. Branching, as opposed to jumping, typically uses PC-relative addressing, rather than absolute addressing. The way we use the beq instruction here is called conditional branching.

Branch
An instruction in a computer program that can cause a computer to begin executing a different instruction sequence and thus deviate from its default behavior of executing instructions in order. A branch instruction can be either an unconditional branch, which always results in branching, or a conditional branch, which may or may not cause branching, depending on some condition. Branch instructions are used to implement control flow in program loops and conditionals (i.e., executing a particular sequence of instructions only if certain conditions are satisfied).

Before explaining the instructions that implement the body of the while loop we focus on the last instruction that implements the while statement (disregarding the nop instruction in its delay slot). It is the beq $zero,$zero,-10[0x190] instruction at address 0x1B4 which is in fact used here for unconditional branching. Since the value of $zero is always equal to itself, the instruction unconditionally branches backwards by 10 instructions (due to the negative offset -10) to the first instruction that implements the while statement at address 0x190. In other words, the unconditional branch completes the loop.

For brevity we are only showing in the above output the first two iterations and the last, in fact, tenth iteration of the loop. The profiler even mentions in loops: 10,10(100.00%)@0x190(~11),...,... the total number of loop iterations taken during program execution. It also reports that they were all done by the same while loop with its first instruction at address 0x190.

Also, note that the only relevant thing that changes in the machine’s state from one loop iteration to the next is the value of bar in memory. The values of $t0 and $t1 also change but are always overwritten in the next iteration and therefore never used beyond one iteration.

Moreover, the control structure of a while statement is only implemented by the two beq instructions we just explained, that is, a conditional forward branch and an unconditional backward branch. The instructions before the forward branch that belong to the while statement implement the loop condition and the instructions between the forward and backward branch implement the body of the while loop. Let us now have a look at the body of the while loop in our example.

Assignment

The assignment bar = bar - 1 in Line 13 in countdown.c constitutes the body of the previously discussed while loop. The assignment decrements the value of bar by 1, that is, it loads the value of bar (right hand side of =) from memory, subtracts 1 from that value, and stores the resulting value in the memory for bar (left hand side of =), overwriting the previous value of bar in memory.

Assignment
Sets and/or re-sets the value stored in the storage location(s) denoted by a variable name; in other words, it copies a value into the variable. In most imperative programming languages, the assignment statement (or expression) is a fundamental construct.

What does fundamental construct mean? In imperative programming languages assignments are the only way to change state other than control state which is the portion of state that represents the current state of control flow, that is, the currently executed statement. In the countdown program the only non-control-state information is thus the value of bar in memory. That’s it! On source code level, the values of all other memory and all registers is therefore not relevant for the correctness of countdown.

Expression

The right hand side of an assignment provides a so-called expression for computing non-control-state information, that is, the next value of the variable on the left hand side of the assignment. In bar = bar - 1 that expression is obviously bar - 1.

Expression
A combination of one or more explicit values, constants, variables, operators, and functions that the programming language interprets (according to its particular rules of precedence and of association) and computes to produce (“to return”, in a stateful environment) another value. This process, as for mathematical expressions, is called evaluation.

In other words, the expression bar - 1 could be more complex and involve other variables, constants, operators, and even functions. C*, for example, supports integer, character, and string literals as constants as well as operators for addition, subtraction, multiplication, division, and even remainder. It also supports procedure calls in expressions as functions.

A Boolean condition such as bar > 0 is actually also an example of an expression involving the comparison operator >. This means that a loop condition could be more complex and involve other comparison operators supported by C* as well.

Let us now again take a look at the machine code generated by starc for the assignment and see how it executes. Just like before, the instructions implement exactly what we described above informally.

The first two instructions of the assignment lw $t0,-4($gp) at address 0x1A4 and addiu $t1,$zero,1 at address 0x1A8 load the values of bar and 1 into $t0 and $t1, respectively, to prepare for the evaluation of the expression bar - 1.

subu

The next instruction is subu $t0,$t0,$t1 which actually implements the subtraction in bar - 1, that is, the - operator, by subtracting the value of $t1 from the value of $t0 and store the result in $t0. Unsurprisingly, subu stands for subtract unsigned, analogous to addiu. However, the term unsigned is again misleading. Its meaning is as usual that arithmetic overflows with subu are ignored while wrap-around semantics apply. Unlike addiu, however, subu uses register addressing, as opposed to immediate addressing. In other words, its third argument is a register, not an integer value.

At this point the evaluation of the expression and thus the right hand side of the = operator is complete. The only remaining thing to do is to perform the actual assignment. This is done by sw $t0,-4($gp) which is the last instruction generated by starc for the assignment, that is, for the left hand side of the = operator. Analogous to lw $t0,-4($gp), which loads the memory word that represents the value of bar into $t0, sw $t0,-4($gp) stores the result of the evaluation in $t0 in that memory word. That’s it! Interestingly, an assignment is actually implemented by only one instruction, namely that sw instruction. The other instructions are there for evaluating the expression in the right hand side of the assignment.

Return Statement

Once the while statement and the assignment in countdown.c has been executed ten times, control reaches the return statement return bar in Line 19. That statement does two things. It provides the value of bar as return value of the main procedure, and it terminates the execution of the body of main, even if there were more statements after return bar, and returns to the code that invoked main.

Return Statement
Causes execution to leave the current subroutine and resume at the point in the code immediately after where the subroutine was called, known as its return address. The return address is saved, usually on the process’s call stack, as part of the operation of making the subroutine call. Return statements in many languages allow a function to specify a return value to be passed back to the code that called the function.

We mentioned the return address for the invocation of the main procedure before, which is 0x48. However, we did not mention that it is actually saved in memory where the stack pointer refers to by the sw $ra,0($sp) instruction at 0x180 in the prologue of the procedure. There is also the lw $ra,0($sp) instruction at 0x1D8 in the epilogue of the procedure that matches the sw instruction. The details of why this is done are explained in the stack chapter. Also, the terms procedure and function are used synonymously here. We prefer to just use procedure to avoid confusion with functions in a mathematical sense.

Recall that the main procedure in our example has a return type int declared in Line 6 in countdown.c. The data type of any value returned by a return statement in the body of main needs to match that return type. In our example, return bar is fine because the data type of bar is also int. Also, any expressions more complex than just bar are possible as long as they evaluate to a value of the return type. Try return bar + 7, for example.

Again, let us now take a look at the machine code generated by starc for the return statement and see how it executes. Just like before, the instructions implement exactly what we described above informally.

Upon terminating the execution of the instructions that implement the while loop in countdown.c, the machine takes the forward branch of the beq $zero,$t0,7[0x1BC] instruction at address 0x19C to proceed to the lw $t0,-4($gp) instruction at address 0x1BC. This instruction is generated by starc for loading the value of bar occurring in return bar from memory into $t0.

addu

The next instruction is addu $v0,$zero,$t0. Its purpose is to copy the value of $t0 to $v0 by adding the value of $t0 to the value of $zero, which is still 0, and storing the result in $v0. Again, sounds awkward but does the job and also, unsurprisingly, addu stands for add unsigned and features wrap-around semantics, analogous to addiu and subu. Just like subu, it uses register addressing.

So, now the value of bar is in register $v0. Why? Because starc generates code such that return values of procedures are always stored in $v0. It is another software convention. Recall that the four instructions starting at address 0x48 to which we eventually return do something with $v0. In fact, they copy the value of $v0 to $a0. The following syscall instruction takes the value of $a0 and outputs it as exit code. Voila.

j

The last instruction implementing the return statement is j 0x73[0x1CC] where j stands for jump. Its purpose is to jump to the epilogue of the procedure using absolute addressing similar to the jal instruction. Here, it does so by merely jumping over the nop instruction in its delay slot since the epilogue appears right after that instruction. In other words, the jump is actually not necessary here. However, this is not true in general, namely whenever the return statement is not the last statement in the body of a procedure. To keep things simple we nevertheless have starc generate the j instruction anyway.

Hard to believe but this concludes the discussion of the countdown program. We have seen all there is to say about it. However, there are three more MIPSter instructions that we have not seen yet since they are not needed to implement countdown.c.

divu

The divu instruction divides the value of one general-purpose register by the value of another general-purpose register and stores the 32-bit integer result in the $lo register and the remainder in the $hi register. Obviously, divu stands for divide unsigned with the usual meaning.

mfhi

The mfhi instruction complements the mflo instruction where mfhi obviously stands for move from high. Unsurprisingly, it copies the value of $hi to a general-purpose register and thus provides access to the remainder of an integer division by the divu instruction.

bne

The bne instruction branches exactly when the beq instruction does not, that is, bne stands for branch on not equal. The starc compiler generates bne instructions for some comparison operators.

Pointer

Before concluding this chapter we go back to the “Hello World!” program presented in the previous chapter:

A “Hello World!” Program
 1 // global variable for pointing to the "Hello World!" string
 2 int* foo;
 3 
 4 // main procedure for printing "Hello World!" on the console
 5 int* main() {
 6   // point to the "Hello World!" string
 7   foo = "Hello World!";
 8 
 9   // strings are actually stored in chunks of 4 characters in memory,
10   // that is, here as "Hell", "o Wo", and "rld!" which allows us to
11   // print them conveniently in chunks of 4 characters at a time
12 
13   // as long as there are characters print them
14   while (*foo != 0) {
15     // 1 means that we print to the console
16     // foo points to a chunk of 4 characters
17     // 4 means that we print 4 characters
18     write(1, foo, 4);
19 
20     // go to the next chunk of 4 characters
21     foo = foo + 1;
22   }
23 }

Here, we are interested in how the "Hello World!" string is actually handled in machine code. Consider the global variable declaration in Line 2. It declares a global variable foo. However, its data type is not int, it is int*. So, what is the difference? Values of both types are represented by 32 bits. So, that is not the difference. The only difference is the intention of what to do with them. Values of type int are obviously meant to be signed 32-bit integers which is reflected in the semantics of the operations on them which is of course integer arithmetics. Values of type int*, however, are meant to be pointers to values of type int stored in memory and identified by their memory addresses.

Pointer
A programming language object, whose value refers to (or “points to”) another value stored elsewhere in the computer memory using its memory address. A pointer references a location in memory, and obtaining the value stored at that location is known as dereferencing the pointer.

Line 7 in the "Hello World!" program provides an example. In the previous chapter, we already explained that the string "Hello World!" is stored contiguously in memory in four words encoding "Hell", "o Wo", "rld!", and 0 to terminate the string. The assignment in Line 7 takes the address of the first word in memory encoding "Hell" and assigns that address as value to foo. In other words, after the assignment foo points to the beginning of the "Hello World!" string in memory.

So, how do we now get access to the string, or in fact any data in memory to which a pointer points to? The programming language C and C* as well provide a unary operator for this purpose called the dereference operator denoted by *. Even though this operator is denoted by the same symbol it has nothing to do with the binary operator for multiplication. The dereference operator takes as argument a pointer for which it returns the value stored at the address in memory where the pointer points to. It is in fact the only way in C* to access data anywhere in memory hence the name C*.

Line 14 is an example of how to use the dereference operator. The expression *foo in while (*foo != 0) returns as integer value the word that encodes "Hell" stored at the address in memory to which foo points to. Since that value is not equal to 0 (the != comparison operator checks for inequality) the body of the while loop is executed. The statement in Line 18 calls the write procedure to output the 4 characters encoded by that value to the console. The details of how this is done are explained in a later chapter.

The following assignment in Line 21 is interesting. It takes the value of foo and increments it by 4 (!), not 1, as the expression foo + 1 may suggest. The reason is that the + operator implements addition according to pointer arithmetics rather than integer arithmetics if the first operand is a pointer to an integer value (foo) and the second operand is just an integer value (1). In this case, + works by first multiplying the second operand 1 with the size of the data type in bytes to which the first operand foo points to. Only then the result of that multiplication, which is 4 because the size of an integer value is 4 bytes, is added to the value of foo. After the assignment, foo thus points to the word that encodes "o Wo".

After two more iterations of the while loop, the "Hello World!" string is output to the console and foo points to the word that terminates the string by encoding 0 effectively terminating the loop as well. The purpose of pointer arithmetics is to provide a convenient way to move a pointer across memory at the granularity of the size of the data type to which the pointer points to.

Let us now again take a look at the relevant machine code generated by starc for Lines 7, 14, and 21 in the "Hello World!" program and see how it executes. Just like before, the instructions implement exactly what we described above informally.

> ./selfie -c manuscript/code/hello-world.c -d 1
./selfie: this is selfie's starc compiling manuscript/code/hello-world.c
...
./selfie: this is selfie's mipster executing manuscript/code/hello-world.c with \
1MB of physical memory
...
$pc=0x190(~7): 0x2788FFEC: addiu $t0,$gp,-20: $t0=67108860,$gp=596 -> $t0=576
$pc=0x194(~7): 0xAF88FFFC: sw $t0,-4($gp): $t0=576,$gp=0x254 -> memory[0x250]=57\
6=$t0
$pc=0x198(~14): 0x8F88FFFC: lw $t0,-4($gp): $t0=576,$gp=0x254 -> $t0=576=memory[\
0x250]
$pc=0x19C(~14): 0x8D080000: lw $t0,0($t0): $t0=576,$t0=0x240 -> $t0=1819043144=m\
emory[0x240]
...
$pc=0x1F8(~21): 0x8F88FFFC: lw $t0,-4($gp): $t0=4,$gp=0x254 -> $t0=576=memory[0x\
250]
$pc=0x1FC(~21): 0x24090001: addiu $t1,$zero,1: $t1=0,$zero=0 -> $t1=1
$pc=0x200(~21): 0x240A0004: addiu $t2,$zero,4: $t2=0,$zero=0 -> $t2=4
$pc=0x204(~21): 0x012A0019: multu $t1,$t2: $t1=1,$t2=4,$lo=67092480 -> $lo=4
$pc=0x208(~21): 0x00004812: mflo $t1: $t1=1,$lo=4 -> $t1=4
$pc=0x20C(~21): 0x00000000: nop
$pc=0x210(~21): 0x00000000: nop
$pc=0x214(~21): 0x01094021: addu $t0,$t0,$t1: $t0=576,$t0=576,$t1=4 -> $t0=580
$pc=0x218(~21): 0xAF88FFFC: sw $t0,-4($gp): $t0=580,$gp=0x254 -> memory[0x250]=5\
80=$t0
...
exiting with exit code 0 and 0.00MB of mallocated memory
./selfie: this is selfie's mipster terminating manuscript/code/hello-world.c wit\
h exit code 0 and 0.00MB of mapped memory
...

The first relevant instruction is the addiu $t0,$gp,-20 instruction at memory address 0x190 which starc generated for Line 7 to compute the address of the first word of the "Hello World!" string in memory. Why is it 20 bytes below the word in memory to which $gp points to? Simple. String literals, just like the values of global variables, are stored in the data segment in memory. Whenever starc encounters a string literal such as "Hello World!" it counts the number of words necessary to encode and store the string. The compiler then allocates the required space in the data segment and computes the address of that space relative to the global pointer.

Similar to the value of bar in the countdown program, the value of foo is stored 4 bytes below the word to which $gp points to. Since we need 4 words or 16 bytes to store the "Hello World!" string the closest address that still provides enough space is 20 bytes below the word to which $gp points to. With that address in $t0, the following sw $t0,-4($gp) instruction performs the actual assignment into foo by storing the value of $t0 in memory where the value of foo is located.

So, "Hell", "o Wo", "rld!", and 0 are stored contiguously 20, 16, 12, and 8 bytes, respectively, below the word to which $gp points to. How did they get there? Similar to the initial values of global variables, the boot loader copied them from the binary to memory, even before code execution started. In other words, string literals are stored in binaries as data, just like initial values of global variables.

The next interesting instruction is the lw $t0,0($t0) instruction at address 0x19C which starc generated for the occurrence of the dereference operator in while (*foo != 0). Before that instruction, the lw $t0,-4($gp) instruction at address 0x198 loads the value of foo into $t0. The lw $t0,0($t0) instruction dereferences the pointer foo by loading the word into $t0 from memory where $t0 points to. After executing the instruction, $t0 thus contains the value that encodes "Hell". Really? Check the mipster output $t0=576,$t0=0x240 -> $t0=1819043144=memory[0x240] for that instruction. The decimal number 1819043144 loaded into $t0 is equal to 0x6C6C6548 in hexadecimal notation. The hexadecimal number 0x48 is the ASCII code for the uppercase letter H. Moreover, the hexadecimal numbers 0x65 and 0x6C are the ASCII codes for the lowercase letters e and l, respectively. Very nice!

Before concluding this chapter, let us have a quick look at the code generated for the assignment foo = foo + 1 in Line 21. With the values of foo and 1 loaded into $t0 and $t1, respectively, the addiu $t2,$zero,4 instruction at address 0x200 loads 4, the size of an integer value in bytes, into register $t2 which is another register for temporary results. It is register 10 among the 32 general-purpose registers.

The following two instructions compute the increment 4 by multiplying the values of $t1 and $t2 and store the result in $t1. The addu $t0,$t0,$t1 instruction increments the value of $t0 by 4. The final sw $t0,-4($gp) instruction performs the actual assignment of the value of foo + 1 according to pointer arithmetics stored in $t0 into the word that stores the value of foo. That’s all.

Summary

Computation is a sequence of states. The execution of a program written in a high-level programming language like C* as well as the execution of machine code implementing that program evolves from one program or machine state to another. A key challenge is to make sure that bad states are never reached during any execution. This is difficult because there are usually very large amounts of possible states, good and bad. Programming correct code in a high-level language rather than a machine language feels easier because there is generally way fewer high-level program states than low-level machine states.

The "Hello World!" program, for example, only has one global variable foo. Its program state consists of the value of foo and the currently executed statement in the body of the main procedure. The same is true for the countdown program with its global variable bar. The machine code for both programs, however, is a lot more complicated because of its size and its state which involves a number of registers and memory locations.

The following table summarizes in which way the high-level programming artifacts we have seen so far relate to the low-level machine artifacts that implement them.

High-Level Programming Artifact Low-Level Machine Artifact
Global Variable Declaration Word in Memory
Global Variable Definition Initial Value in Memory Word
Integer Literal in Definition Value in Memory Word
Character Literal in Definition Character in Memory Word
Data Type Intended Meaning of Bits
Procedure Declaration Address of Code in Memory
Procedure Definition Actual Code in Memory
Procedure Call Jumping and Linking to Prologue
Statement Machine Instructions
Current Statement Program Counter
While Statement Forward and Backward Branching
Assignment Storing Register into Memory Word
Return Statement Forward Jumping to Epilogue
Variable in Expression Loading Memory Word into Register
Integer Literal in Expression Value in Instruction Argument
Character Literal in Expression Character in Instruction Argument
String Literal in Expression Characters in Memory Words
  Offset in Instruction Argument
Pointer Address in Memory
Arithmetic/Comparison Operator Computing with Registers

The next chapter introduces the arguably simplest way to control the state space of programs that still allows to implement a large set of interesting applications. We show which parts of selfie are using that technique and how.