Selfie: Computer Science for Everyone
Selfie: Computer Science for Everyone
Buy on Leanpub
Selfie: Computer Science for Everyone

Acknowledgements

The ideas presented here are the result of many years of teaching and working with students and colleagues around the world. I am grateful to my group of students and faculty in Salzburg who helped me over the years with refining and deepening my understanding of computer science. I am also particularly grateful to my colleague Professor Raja Sengupta at UC Berkeley who challenged me to the point that made me discover what has now become the Selfie Project. The design of the compiler is inspired by the Oberon compiler of Professor Niklaus Wirth from ETH Zurich. The design of the selfie microkernel is inspired by microkernels of Professor Jochen Liedtke from University of Karlsruhe.


Disclaimer:

This is work in progress. Only three chapters are done. Others exist in draft form but will take more time to be done. Also, even existing chapters may change, see the git history for details.

Code License

The code provided in this book is covered by the Simplified BSD License. Please make sure to include it in any code derived from here.

Copyright (c) 2015-2018, the Selfie Project authors. All rights reserved.

Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are met:

1. Redistributions of source code must retain the above copyright notice, this
   list of conditions and the following disclaimer.
2. Redistributions in binary form must reproduce the above copyright notice,
   this list of conditions and the following disclaimer in the documentation
   and/or other materials provided with the distribution.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND
ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR
ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
(INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

1. Introduction

Computer science is a mystery to so many and yet more and more people use computers every day in one form or another. There are increasingly many people with all kinds of backgrounds other than computer science that even code every day. At schools, colleges, universities, and companies around the world there is hardly anyone left who is not confronted with the machine and its code. But even for people just using the machine in their hands, on their desks, and in the clouds there is often that unsatisfactory experience of not understanding what is really going on.

While a book about computer science for everyone may sound appealing it actually requires serious commitment to understand the material even though we tried very hard to simplify everything as much as possible. The reason is that computers and software are so expressive that it is unlikely that any other computational machine in the future will be more expressive. Anything that can be computed can be computed now, if you have enough time, space, and energy.

This book is based on the Selfie Project, an educational software system that has grown out of many years of teaching undergraduate and graduate classes in computer science. The key observation is that understanding computer science and the meaning of software in particular can only be achieved by understanding how software and data translate all the way down to the machine. Selfie shows how to do that. This may sound difficult but can actually be done in a systematic fashion with a strong focus on basic principles.

Selfie is software that can translate itself to code that can be executed by a computer. This is called self-compilation. But selfie even includes software that can mimic a computer that can execute the code of selfie. This is called self-execution. And, selfie can even do that on behalf of other computers but also itself. This is called self-hosting. By now your mind might be spinning but you at least understand why selfie is called selfie.

Why is the self so important? Because it shows how meaning is created systematically on a machine. Selfie is software written in some programming language. However, selfie defines the meaning of that language which implies that selfie defines its own meaning. This is very similar to an English dictionary written in English. The only difference is that selfie defines meaning formally in such a way that a machine can run selfie and thus any software written in the language in which selfie is written.

The book works by explaining selfie step by step while focusing on the basic principles that are necessary to understand each step. In order to follow the book you will need to run selfie either on your computer or in the cloud. To run selfie on your computer you need a terminal and a C compiler. If you have a Mac or a Linux system, terminal and compiler are already there, and easy to get for a Windows machine. If you only have access to a web browser you can still run selfie through a cloud-based development environment, see the README for more details.

Chapter 2 introduces selfie and how to use it. The focus is on developing an idea of the problem of semantics. Computer scientists have developed all these formalisms for coding and many other things. We show how selfie can help to understand the relevant aspects of the problem.

Chapter 3 is an introduction to encoding and decoding basic information such as characters, strings, numbers, and even machine instructions in binary form. We use selfie to show examples and how their encoding and decoding is actually done.

Chapter 4 introduces computation as the evolution of state. Using a simple example of a program we explain what its source code means, how that source code translates to machine code that gives the source code its meaning, and how both source and machine code execute from one state to another.

Chapter 5 will be on regularity and finite state machines. We show how to control the potentially very large state space of code to simplify reasoning about its correctness.

2. Semantics

When it comes to computers a bit is the first thing we may want to know about and understand. The only thing that computers really do on the level of the machine is handle enormous amounts of bits, nothing else.

Bit
The basic unit of information in computing and digital communications. A bit can have only one of two values which are most commonly represented as either a 0 or 1. The term bit is a portmanteau of binary digit.

There are two fundamental reasons why computers use bits and nothing else. The first reason is that the two values of a bit can readily be distinguished by electronic circuits using different levels of voltage, say, low voltage for 0 and high voltage for 1. Distinguishing more values is possible and would be exciting to see but has largely not yet happened for technological reasons. The second reason is that whatever you can say using any number of values per digit or character greater than two you can also say using two values with only an insignificant difference in the number of digits or characters you need to say the same thing. More on that in the next chapter.

Selfie is a computer program that is fully contained in a single file called selfie.c which consists of around two hundred and eighty-eight thousand characters that are encoded in bits.

Character
A unit of information that roughly corresponds to a grapheme, grapheme-like unit, or symbol, such as in an alphabet or syllabary in the written form of a natural language. Examples of characters include letters, numerical digits, common punctuation marks (such as “.” or “-“), and whitespace. The concept also includes control characters, which do not correspond to symbols in a particular natural language, but rather to other bits of information used to process text in one or more languages. Examples of control characters include carriage return, line feed, and tab, as well as instructions to printers or other devices that display or otherwise process text.

You may also see for yourself by downloading selfie and then using the less command in your terminal (use the cd command first to go into the top folder of selfie which is where you unzipped it on your machine):

> less selfie.c

For example, the first few lines of selfie.c are:

/*
Copyright (c) 2015-2018, the Selfie Project authors. All rights reserved.
Please see the AUTHORS file for details. Use of this source code is
governed by a BSD license that can be found in the LICENSE file.

Selfie is a project of the Computational Systems Group at the
Department of Computer Sciences of the University of Salzburg
in Austria. For further information and code please refer to:

http://selfie.cs.uni-salzburg.at

What you see here is called source code where the first few lines are in fact comments that will be ignored by the machine.

Source Code
Any collection of computer instructions (possibly with comments) written using some human-readable computer language, usually as text.

The characters of source code like selfie.c are usually encoded in bits according to the ASCII standard. Remember, since computers can only handle bits everything needs to be encoded in bits.

American Standard Code for Information Interchange (ASCII)
7-bit encoding scheme for 128 characters: numbers 0 to 9, lowercase letters a to z, uppercase letters A to Z, basic punctuation symbols, control codes that originated with Teletype machines, and a space.

On machine level, each character is thus represented by seven bits. What we see when we invoke less is merely a human-readable version of these bits. To get a better feel of the size of selfie.c run the command wc -m selfie.c which counts the characters in selfie.c:

> wc -m selfie.c
288310 selfie.c

The output means that selfie.c at the time of invoking the command consisted of 288,310 characters. Your output may not be exactly the same number of characters depending on which version of selfie you have. The same is true for other statistical data shown below. However, the order of magnitude of that data is likely to be the same here and in your version.

The -m part of the command is called an option that directs, in this case, wc to output the number of characters. However, we should mention that the characters in selfie.c are actually encoded according to the newer UTF-8 standard which uses eight rather than seven bits per character.

UTF-8
(Universal Character Set Transformation Format - 8-bit) A character encoding capable of encoding all possible characters (called code points) in Unicode. The encoding is variable-length and uses 8-bit code units. It is designed for backward compatibility with ASCII.

When it comes to selfie we nevertheless speak of ASCII characters because of that backward compatibility. Here this means that the UTF-8 encoding of a given character is exactly the ASCII encoding of that character when simply ignoring the eighth bit. More on encoding characters can be found in the next chapter.

Since a unit of eight bits is very common in computer systems there is a well-known term for that unit called a byte.

Byte
A unit of digital information in computing and telecommunications that most commonly consists of eight bits.

We can easily verify that selfie.c consists of the same number of bytes than characters by using the command wc -c selfie.c with the -c option which directs wc to report the number of bytes in selfie.c:

> wc -c selfie.c
288310 selfie.c

In other words, for a computer selfie.c is in fact a sequence of eight times 288,310 bits, that is, 2,306,480 bits. The key question addressed by this book is where the meaning of these bits comes from.

The source code of selfie is ultimately a sequence of bits. How do they get their meaning? Bits, as is, have no meaning, they are just bits. Characters, by the way, are no different. Characters, as is, have no meaning either, they are just symbols, and can anyway be encoded in bits. When it comes to a machine the meaning of bits and thus characters or any kind of symbol has to be created mechanically. The key insight is that the meaning of bits is in the process of changing bits, not in the bits themselves.

The process of adding 7 to 42, according to the rules of elementary arithmetic, makes 7, 42, and 49 represent numbers rather than something else. But if we just look at 7, 42, and 49 they could mean anything. In this example, elementary arithmetic provides meaning, namely the semantics of natural numbers. This is why it is so important to learn elementary arithmetic in school. It tells us what numbers are, not just how to add, subtract, multiply, and divide them!

Elementary Arithmetic
The simplified portion of arithmetic that includes the operations of addition, subtraction, multiplication, and division.

Virtually all modern computers include circuitry for performing elementary arithmetic but they do so with binary rather than decimal numbers since computers only handle bits. Our example of 7, 42, and 49 in binary is just as simple.

Again, adding 111 to 101010 makes 111, 101010, and 110001 represent numbers while they could otherwise represent anything. More on encoding numbers in binary can be found in the next chapter.

Binary Number
A number expressed in the binary numeral system or base-2 numeral system which represents numeric values using two different symbols: typically 0 (zero) and 1 (one).

It may be hard to believe but determination and knowing elementary arithmetic is enough to understand this book. The source code of selfie, that is, the around two hundred and seventy thousand characters of selfie.c represented by around two million bits get their meaning in pretty much the same way than having bits represent numbers: by changing them. The only difference is that the process of change is a bit more involved than elementary arithmetic.

The Compiler

Let us have a closer look at how this works with selfie. Try the make command:

> make
cc -w -O3 -m64 -D'main(a,b)=main(int argc, char** argv)' -Duint64_t='unsigned long l\
ong' selfie.c -o selfie

The make command invokes the cc command which compiles the file selfie.c into a file called selfie (without the .c extension) as directed by the -o option, ignoring the other options for clarity. In other words, the sequence of bits representing selfie.c is changed into another sequence of bits representing selfie. The difference between the two sequences is that the former represents source code whereas the latter represents machine code.

Compiling Selfie
Compiling Selfie
Machine Code
A sequence of instructions executed directly by a computer’s central processing unit (CPU).

The key idea is that both sequences are supposed to have the same semantics. However, selfie is executable by a computer whereas selfie.c is not, at least not purposefully, yet selfie.c is human-readable and writable in particular. The process of changing selfie.c into selfie is called compilation which is done by a compiler such as the above cc compiler.

Compiler
A computer program that transforms source code written in a programming language (the source language) into another computer language (the target language), with the latter often having a binary form known as object or machine code. The most common reason for converting source code is to create an executable program.

This means that we now have a version of selfie that we can run on our machine! Let us try and run selfie using the command ./selfie:

> ./selfie
./selfie: usage: selfie { -c { source } | -o binary | -s assembly | -l binary | -sat\
 dimacs } [ ( -m | -d | -r | -n | -y | -min | -mob ) 0-64 ... ]

Selfie requires using at least one option to do anything useful and therefore responds with its usage pattern and then terminates without doing anything else. To do something useful, let us try the first -c option on, well, selfie.c itself:

> ./selfie -c selfie.c
./selfie: selfie compiling selfie.c with starc
./selfie: 288310 characters read in 10003 lines and 1334 comments
./selfie: with 169991(58.96%) characters in 43633 actual symbols
./selfie: 340 global variables, 437 procedures, 408 string literals
./selfie: 2509 calls, 1135 assignments, 85 while, 871 if, 391 return
./selfie: symbol table search time was 2 iterations on average and 48562 in total
./selfie: 169936 bytes generated with 39372 instructions and 12448 bytes of data
./selfie: init:    lui: 2280(5.78%), addi: 13557(34.41%)
./selfie: memory:  ld: 7090(17.99%), sd: 5865(14.88%)
./selfie: compute: add: 3405(8.64%), sub: 703(1.78%), mul: 807(2.40%), divu: 78(0.19\
%), remu: 35(0.80%)
./selfie: control: sltu: 623(1.58%), beq: 960(2.43%), jal: 3544(8.99%), jalr: 437(1.\
10%), ecall: 8(0.20%)

Now, things are taking off. Selfie includes a compiler, just like the cc compiler, that we call starc and invoke with the -c option. The statistics in the output of starc may safely be ignored here. This will become clear later. The starc compiler is capable of compiling all of selfie including itself. By now we all know why selfie is called selfie but the story continues.

After compiling selfie.c starc only stores the machine code internally but does not write it to a file. To do that we need to use the -o option:

> ./selfie -c selfie.c -o selfie.m
./selfie: selfie compiling selfie.c with starc
...
./selfie: 170064 bytes with 39372 instructions and 12448 bytes of data written into \
selfie.m

The three dots ... indicate that we omitted some output that is either identical to or insignificantly different from the previous output. The command produces a file called selfie.m that contains machine code compiled from selfie.c using the starc compiler in selfie rather than the cc compiler. The process is called self-compilation.

Self-Compiling Selfie
Self-Compiling Selfie

By now we have three different sequences of bits in selfie.c, selfie, and selfie.m that are all supposed to have the same semantics. The only difference between selfie and selfie.m is that selfie is executable on our computer whereas selfie.m is executable on a computer that we do not have. This is not the end of the story though.

The Emulator

The reason why starc targets a different machine than cc is because it makes starc a lot simpler. But how do we execute selfie.m? Well, selfie not only includes the starc compiler, it also includes mipster, which is an emulator of the computer that can execute selfie.m and any other machine code generated by starc. That computer is so simple that everyone can understand it in a reasonable amount of time even though it corresponds to parts of a real machine.

Emulator
Software that enables one computer system (called the host) to behave like another computer system (called the guest).

We execute selfie.m by first loading it using the -l option and then running it by invoking mipster using the -m option:

> ./selfie -l selfie.m -m 1
./selfie: 170064 bytes with 39372 instructions and 12448 bytes of data loaded from s\
elfie.m
./selfie: selfie executing selfie.m with 1MB physical memory on mipster
selfie.m: usage: selfie { -c { source } | -o binary | -s assembly | -l binary | -sat\
 dimacs } [ ( -m | -d | -r | -n | -y | -min | -mob ) 0-64 ... ]
./selfie: selfie.m exiting with exit code 0 and 0.00MB mallocated memory
./selfie: selfie terminating selfie.m with exit code 0
./selfie: summary: 67182 executed instructions and 0.16MB mapped memory
./selfie: init:    lui: 17(0.20%), addi: 28267(42.70%)
./selfie: memory:  ld: 14816(22.50%), sd: 9569(14.24%)
./selfie: compute: add: 1614(2.40%), sub: 1485(2.21%), mul: 1773(2.63%), divu: 623(0\
.92%), remu: 388(0.57%)
./selfie: control: sltu: 902(1.34%), beq: 912(1.35%), jal: 4466(6.64%), jalr: 2194(3\
.26%), ecall: 156(0.23%)
./selfie: profile: total,max(ratio%)@addr,2max,3max
./selfie: calls:   2194,624(28.44%)@0x282C,335(15.26%)@0x29C4,335(15.26%)@0x309C
./selfie: loops:   211,122(57.82%)@0x40A8,63(29.85%)@0x184,23(10.90%)@0x47C4
./selfie: loads:   14816,624(4.21%)@0x2840,624(4.21%)@0x2844,624(4.21%)@0x2854
./selfie: stores:  9569,624(6.52%)@0x2830,624(6.52%)@0x2838,335(3.50%)@0x29C8

After loading selfie.m the -m 1 option directs mipster to emulate a computer with 1 megabyte of memory (abbreviated 1MB, explained below) for executing selfie.m. Since selfie.m is invoked without any options, which could appear after the -m 1 option, it responds, just like selfie without options before, with its usage pattern and then terminates. After that mipster terminates and outputs a summary of its builtin performance profiler.

Profiling
A form of dynamic program analysis that measures, for example, the space (memory) or time complexity of a program, the usage of particular instructions, or the frequency and duration of function calls. Most commonly, profiling information serves to aid program optimization.

We later use profiling to explain performance-related issues of selfie. But now, let us try something cool. Since mipster is part of selfie we can even have mipster execute machine code for selfie generated by starc without writing the code into a file:

> ./selfie -c selfie.c -m 1
./selfie: selfie compiling selfie.c with starc
...
./selfie: selfie executing selfie.c with 1MB physical memory on mipster
...
./selfie: summary: 67182 executed instructions and 0.16MB mapped memory
./selfie: init:    lui: 17(0.20%), addi: 28267(42.70%)
./selfie: memory:  ld: 14816(22.50%), sd: 9569(14.24%)
./selfie: compute: add: 1614(2.40%), sub: 1485(2.21%), mul: 1773(2.63%), divu: 623(0\
.92%), remu: 388(0.57%)
./selfie: control: sltu: 902(1.34%), beq: 912(1.35%), jal: 4466(6.64%), jalr: 2194(3\
.26%), ecall: 156(0.23%)
./selfie: profile: total,max(ratio%)@addr(line#),2max,3max
./selfie: calls:   2194,624(28.44%)@0x282C(~1667),335(15.26%)@0x29C4(~1691),335(15.2\
6%)@0x309C(~1779)
./selfie: loops:   211,122(57.82%)@0x40A8(~2074),63(29.85%)@0x184(~254),23(10.90%)@0\
x47C4(~2173)
./selfie: loads:   14816,624(4.21%)@0x2840(~1667),624(4.21%)@0x2844(~1667),624(4.21%\
)@0x2854(~1667)
./selfie: stores:  9569,624(6.52%)@0x2830(~1667),624(6.52%)@0x2838(~1667),335(3.50%)\
@0x29C8(~1691)

The output is just like before except for the approximate source code line numbers in the profile. Those are only available if executing machine code generated in the same run rather than loading machine code. Never mind if you do not understand what this means. It will become clear later.

Let us maintain our momentum and do something even cooler than before. We now compile selfie.c with starc and then execute the generated machine code on mipster only to have that code compile selfie.c again, all in the same run. This requires 2 to 3MB rather than 1MB for mipster and will take starc a few minutes to complete depending on the type and speed of our computer because executing starc on mipster is slower than executing starc directly on our computer. However, it does work and this is what counts here:

> ./selfie -c selfie.c -m 3 -c selfie.c
./selfie: selfie compiling selfie.c with starc
...
./selfie: selfie executing selfie.c with 3MB physical memory on mipster
selfie.c: selfie compiling selfie.c with starc
...
./selfie: selfie.c exiting with exit code 0 and 2.11MB mallocated memory
./selfie: selfie terminating selfie.c with exit code 0
./selfie: summary: 284201855 executed instructions and 2.00MB mapped memory
...

We can even verify that starc generates the same machine code independently of whether it runs directly on our machine or on mipster. We simply have starc generate machine code into two different files called selfie1.m and selfie2.m:

> ./selfie -c selfie.c -o selfie1.m -m 3 -c selfie.c -o selfie2.m
./selfie: selfie compiling selfie.c with starc
...
./selfie: 170064 bytes with 39372 instructions and 12448 bytes of data written into \
selfie1.m
./selfie: selfie executing selfie1.m with 3MB physical memory on mipster
selfie1.m: selfie compiling selfie.c with starc
...
selfie1.m: 170064 bytes with 39372 instructions and 12448 bytes of data written into\
 selfie2.m
./selfie: selfie1.m exiting with exit code 0 and 2.11MB mallocated memory
./selfie: selfie terminating selfie1.m with exit code 0
./selfie: summary: 284278675 executed instructions and 2.00MB mapped memory
...
Mipster-Executed Self-Compilation
Mipster-Executed Self-Compilation

Both files generated by starc are indeed identical. To verify that try the diff command as follows:

> diff -s selfie1.m selfie2.m
Files selfie1.m and selfie2.m are identical

We call this phenomenon the fixed point of a self-compiling compiler. If we continue generating machine code for starc using starc the machine code will remain the same. So, we still have only three different sequences of bits in selfie.c, selfie, and selfie.m that are supposed to have the same semantics. In particular, if we run selfie.m on mipster to compile selfie.c the result will again be an exact copy of selfie.m. However, there is a fourth sequence with the same semantics that we can generate with selfie. It represents a human-readable version of the machine code in selfie.m written in what is called assembly.

Assembly
A low-level programming language for a computer, or other programmable device, in which there is a very strong (generally one-to-one) correspondence between the language and the architecture’s machine code instructions.

Try the -s option to have selfie generate assembly as follows:

> ./selfie -c selfie.c -s selfie.s
./selfie: selfie compiling selfie.c with starc
...
./selfie: 1763885 characters of assembly with 39372 instructions written into selfie\
.s

The part of selfie that generates assembly is called a disassembler.

Disassembler
A computer program that translates machine language into assembly language - the inverse operation to that of an assembler.

Selfie does not implement an assembler though. The starc compiler simply generates machine code right away without going through assembly first (which would then require an assembler to generate machine code from that assembly).

Since assembly is human-readable let us check it out:

> less selfie.s

For example, the first few lines of selfie.s are:

0x0(~1): 0x000392B7: lui $t0,0x39
0x4(~1): 0x7D028293: addi $t0,$t0,2000
0x8(~1): 0x00028193: addi $gp,$t0,0
0xC(~1): 0x00000513: addi $a0,$zero,0
0x10(~1): 0x0D600893: addi $a7,$zero,214
0x14(~1): 0x00000073: ecall
0x18(~1): 0x00750513: addi $a0,$a0,7
0x1C(~1): 0x00800293: addi $t0,$zero,8
0x20(~1): 0x025572B3: remu $t0,$a0,$t0
0x24(~1): 0x40550533: sub $a0,$a0,$t0
0x28(~1): 0x0D600893: addi $a7,$zero,214
0x2C(~1): 0x00000073: ecall
0x30(~1): 0xFEA1BC23: sd $a0,-8($gp)
0x34(~1): 0x00000513: addi $a0,$zero,0
0x38(~1): 0x00810293: addi $t0,$sp,8
0x3C(~1): 0xFF810113: addi $sp,$sp,-8
0x40(~1): 0x00513023: sd $t0,0($sp)
0x44(~1): 0x680260EF: jal $ra,39328[0x266C4]

What you see is a human-readable version of the machine code in selfie.m. The purpose of selfie.s is here to study selfie.m and eventually understand its semantics. Selfie can even show the assembly code as it is being executed by mipster which helps debugging the machine code.

Debugging
The process of finding and resolving of defects that prevent correct operation of computer software or a system.

Let us try that using the -d option which invokes mipster in debugging mode. Careful though, there will be a lot of assembly code racing down your screen. We omit most of that here:

> ./selfie -c selfie.c -d 1
./selfie: selfie compiling selfie.c with starc
...
./selfie: selfie executing selfie.c with 1MB physical memory on mipster
$pc=0x10000(~1): 0x000392B7: lui $t0,0x39: |- $t0=0x0 -> $t0=0x39000
$pc=0x10004(~1): 0x7D028293: addi $t0,$t0,2000: $t0=233472(0x39000) |- $t0=233472(0x\
39000) -> $t0=235472(0x397D0)
$pc=0x10008(~1): 0x00028193: addi $gp,$t0,0: $t0=235472(0x397D0) |- $gp=0x0 -> $gp=0\
x397D0
$pc=0x1000C(~1): 0x00000513: addi $a0,$zero,0: $zero=0(0x0) |- $a0=0(0x0) -> $a0=0(0\
x0)
$pc=0x10010(~1): 0x0D600893: addi $a7,$zero,214: $zero=0(0x0) |- $a7=0(0x0) -> $a7=2\
14(0xD6)
$pc=0x10014(~1): 0x00000073: ecall(brk): $a0=0x0 |- $a0=0x0 -> $a0=0x397D0
$pc=0x10018(~1): 0x00750513: addi $a0,$a0,7: $a0=235472(0x397D0) |- $a0=235472(0x397\
D0) -> $a0=235479(0x397D7)
$pc=0x1001C(~1): 0x00800293: addi $t0,$zero,8: $zero=0(0x0) |- $t0=235472(0x397D0) -\
> $t0=8(0x8)
$pc=0x10020(~1): 0x025572B3: remu $t0,$a0,$t0: $a0=235479(0x397D7),$t0=8(0x8) |- $t0\
=8(0x8) -> $t0=7(0x7)
$pc=0x10024(~1): 0x40550533: sub $a0,$a0,$t0: $a0=235479(0x397D7),$t0=7(0x7) |- $a0=\
235479(0x397D7) -> $a0=235472(0x397D0)
$pc=0x10028(~1): 0x0D600893: addi $a7,$zero,214: $zero=0(0x0) |- $a7=214(0xD6) -> $a\
7=214(0xD6)
$pc=0x1002C(~1): 0x00000073: ecall(brk): $a0=0x397D0 |- ->
$pc=0x10030(~1): 0xFEA1BC23: sd $a0,-8($gp): $gp=0x397D0,$a0=235472(0x397D0) |- mem[\
0x397C8]=0 -> mem[0x397C8]=$a0=235472(0x397D0)
$pc=0x10034(~1): 0x00000513: addi $a0,$zero,0: $zero=0(0x0) |- $a0=235472(0x397D0) -\
> $a0=0(0x0)
$pc=0x10038(~1): 0x00810293: addi $t0,$sp,8: $sp=0xFFFFFFD0 |- $t0=7(0x7) -> $t0=429\
4967256(0xFFFFFFD8)
$pc=0x1003C(~1): 0xFF810113: addi $sp,$sp,-8: $sp=0xFFFFFFD0 |- $sp=0xFFFFFFD0 -> $s\
p=0xFFFFFFC8
$pc=0x10040(~1): 0x00513023: sd $t0,0($sp): $sp=0xFFFFFFC8,$t0=4294967256(0xFFFFFFD8\
) |- mem[0xFFFFFFC8]=0 -> mem[0xFFFFFFC8]=$t0=4294967256(0xFFFFFFD8)
$pc=0x10044(~1): 0x680260EF: jal $ra,39328[0x366C4]: |- $ra=0x0,$pc=0x10044 -> $pc=0\
x366C4,$ra=0x10048
          |
          |
          |
$pc=0x36728(~10003): 0x00008067: jalr $zero,0($ra): $ra=0x10048 |- $pc=0x36728 -> $p\
c=0x10048
$pc=0x10048(~1): 0xFF810113: addi $sp,$sp,-8: $sp=0xFFFFFFD8 |- $sp=0xFFFFFFD8 -> $s\
p=0xFFFFFFD0
$pc=0x1004C(~1): 0x00A13023: sd $a0,0($sp): $sp=0xFFFFFFD0,$a0=0(0x0) |- mem[0xFFFFF\
FD0]=1 -> mem[0xFFFFFFD0]=$a0=0(0x0)
$pc=0x10050(~1): 0x00013503: ld $a0,0($sp): $sp=0xFFFFFFD0,mem[0xFFFFFFD0]=0 |- $a0=\
0(0x0) -> $a0=0(0x0)=mem[0xFFFFFFD0]
$pc=0x10054(~1): 0x00810113: addi $sp,$sp,8: $sp=0xFFFFFFD0 |- $sp=0xFFFFFFD0 -> $sp\
=0xFFFFFFD8
$pc=0x10058(~1): 0x05D00893: addi $a7,$zero,93: $zero=0(0x0) |- $a7=64(0x40) -> $a7=\
93(0x5D)
$pc=0x1005C(~1): 0x00000073: ecall(exit): $a0=0x0 |- ->
./selfie: selfie.c exiting with exit code 0 and 0.00MB mallocated memory
./selfie: selfie terminating selfie.c with exit code 0
./selfie: summary: 67182 executed instructions and 0.16MB mapped memory
...

What is interesting about that output is the fact that each line on the screen shows a tiny bit of progress the machine is making during the run. In fact, all lines taken together make up what the semantics of selfie.c is during that run. Even though, in this case, it is only the response with its usage pattern there is quite a bit happening as you can see.

By the way, we can also slow down the machine to see what happens in slow motion by leveraging the fact that mipster can execute any machine code generated by starc. In particular, mipster can not only run starc but all of selfie including itself. Let us try that and see what happens if we load selfie.m into mipster and then run mipster on top of mipster to run selfie without any options. We call that self-execution of mipster:

> ./selfie -c selfie.c -o selfie.m -m 2 -l selfie.m -m 1
./selfie: selfie compiling selfie.c with starc
...
./selfie: 170064 bytes with 39372 instructions and 12448 bytes of data written into \
selfie.m
./selfie: selfie executing selfie.m with 2MB physical memory on mipster
selfie.m: 170064 bytes with 39372 instructions and 12448 bytes of data loaded from s\
elfie.m
selfie.m: selfie executing selfie.m with 1MB physical memory on mipster
selfie.m: usage: selfie { -c { source } | -o binary | -s assembly | -l binary | -sat\
 dimacs } [ ( -m | -d | -r | -n | -y | -min | -mob ) 0-64 ... ]
selfie.m: selfie.m exiting with exit code 0 and 0.00MB mallocated memory
selfie.m: selfie terminating selfie.m with exit code 0
selfie.m: summary: 67182 executed instructions and 0.17MB mapped memory
...
./selfie: selfie.m exiting with exit code 0 and 11.25MB mallocated memory
./selfie: selfie terminating selfie.m with exit code 0
./selfie: summary: 114226675 executed instructions and 1.81MB mapped memory
...

Now, we could have starc compile selfie.c while running on top of two mipsters but this would take several hours depending on the speed of our computer. We could even do that on top of three or in fact any number of mipsters but the time for starc to complete would increase exponentially with the number of mipsters running on top of each other. This means three mipsters take days, four take months, and five take years!

The Hypervisor

A way to avoid the slowdown of running emulators is virtualization.

Virtualization
The act of creating a virtual (rather than actual) version of something, including virtual computer hardware platforms, operating systems, storage devices, and computer network resources.

With selfie, instead of having mipster execute mipster to execute starc, we can also have mipster execute hypster to host starc rather than to execute starc. Hypster is a hypervisor that mimics mipster without executing any machine code itself but instead asks the mipster on which it runs to execute the machine code on its behalf. For the machine code there is no difference except that it runs much faster.

Hypervisor
A piece of computer software, firmware or hardware that creates and runs virtual machines. A computer on which a hypervisor runs one or more virtual machines is called a host machine, and each virtual machine is called a guest machine.

Let us try using the same command as above for running starc on mipster but this time running starc on hypster, using the -y option, running on top of mipster:

> ./selfie -c selfie.c -o selfie3.m -m 3 -l selfie3.m -y 3 -c selfie.c -o selfie4.m
./selfie: selfie compiling selfie.c with starc
...
./selfie: 170064 bytes with 39372 instructions and 12448 bytes of data written into \
selfie3.m
./selfie: selfie executing selfie3.m with 3MB physical memory on mipster
selfie3.m: 170064 bytes with 39372 instructions and 12448 bytes of data loaded from \
selfie3.m
selfie3.m: selfie executing selfie3.m with 3MB physical memory on hypster
selfie3.m: selfie compiling selfie.c with starc
...
selfie3.m: 170064 bytes with 39372 instructions and 12448 bytes of data written into\
 selfie4.m
selfie3.m: selfie3.m exiting with exit code 0 and 2.11MB mallocated memory
selfie3.m: selfie terminating selfie3.m with exit code 0
selfie3.m: summary: 0 executed instructions and 2.10MB mapped memory
./selfie: selfie3.m exiting with exit code 0 and 13.25MB mallocated memory
./selfie: selfie terminating selfie3.m with exit code 0
./selfie: summary: 554061441 executed instructions and 2.43MB mapped memory
...
Hypster-Hosted Self-Compilation
Hypster-Hosted Self-Compilation

Again, selfie3.m and selfie4.m are identical and equal to selfie.m, selfie1.m, and selfie2.m. Since hypster can run on top of hypster, which is why we call hypster self-hosting, we could now run starc on two or in fact any number of hypsters and mipsters on at least one mipster and get the same result. Try four hypsters on one mipster and see for yourself that it will not take years but in fact just a few minutes, not much longer than using just one mipster:

> ./selfie -c selfie.c -o selfie5.m -m 4 -l selfie5.m -y 4 -l selfie5.m -y 3 -l self\
ie5.m -y 3 -l selfie5.m -y 3 -c selfie.c -o selfie6.m
./selfie: selfie compiling selfie.c with starc
...
./selfie: 170064 bytes with 39372 instructions and 12448 bytes of data written into \
selfie5.m
./selfie: selfie executing selfie5.m with 4MB physical memory on mipster
selfie5.m: 170064 bytes with 39372 instructions and 12448 bytes of data loaded from \
selfie5.m
selfie5.m: selfie executing selfie5.m with 4MB physical memory on hypster
selfie5.m: 170064 bytes with 39372 instructions and 12448 bytes of data loaded from \
selfie5.m
selfie5.m: selfie executing selfie5.m with 3MB physical memory on hypster
selfie5.m: 170064 bytes with 39372 instructions and 12448 bytes of data loaded from \
selfie5.m
selfie5.m: selfie executing selfie5.m with 3MB physical memory on hypster
selfie5.m: 170064 bytes with 39372 instructions and 12448 bytes of data loaded from \
selfie5.m
selfie5.m: selfie executing selfie5.m with 3MB physical memory on hypster
selfie5.m: selfie compiling selfie.c with starc
...
selfie5.m: 170064 bytes with 39372 instructions and 12448 bytes of data written into\
 selfie6.m
selfie5.m: selfie5.m exiting with exit code 0 and 2.11MB mallocated memory
selfie5.m: selfie terminating selfie5.m with exit code 0
selfie5.m: summary: 0 executed instructions and 2.10MB mapped memory
selfie5.m: selfie5.m exiting with exit code 0 and 13.25MB mallocated memory
selfie5.m: selfie terminating selfie5.m with exit code 0
selfie5.m: summary: 0 executed instructions and 2.44MB mapped memory
selfie5.m: selfie5.m exiting with exit code 0 and 13.25MB mallocated memory
selfie5.m: selfie terminating selfie5.m with exit code 0
selfie5.m: summary: 0 executed instructions and 2.89MB mapped memory
selfie5.m: selfie5.m exiting with exit code 0 and 13.25MB mallocated memory
selfie5.m: selfie terminating selfie5.m with exit code 0
selfie5.m: summary: 0 executed instructions and 3.34MB mapped memory
./selfie: selfie5.m exiting with exit code 0 and 14.25MB mallocated memory
./selfie: selfie terminating selfie5.m with exit code 0
./selfie: summary: 1446486337 executed instructions and 3.77MB mapped memory
...

Performance

A significant part of computer science is about computational performance. How much time does a computation take, how much memory does it need, and, increasingly important, how much energy does it consume? We introduce the standard units for quantifying computational performance in terms of time and space, and even in terms of energy.

Before doing so, we should nevertheless point out that in computer science the notion of order of magnitude may be used with its standard meaning as it is common in other disciplines or with a meaning rather unique to the field. Traditionally, something is an order of magnitude bigger or smaller if it is bigger or smaller by a factor of ten. However, in computer science, because of the widespread use of binary codes, an order of magnitude may also refer to a factor of two.

Order of Magnitude
A factor with base 10 (decimal) or with base 2 (binary).

As a consequence and even more confusing is the fact that commonly used prefixes such as kilo, mega, giga, and tera may either refer to factors of 103=1000 or 210=1024 depending on context and unit. Only recently new binary prefixes for kilo, mega, giga, tera, and so on, called kibi, mebi, gibi, tebi, respectively, have been introduced. Adoption is nevertheless slow.

Decimal Prefix Value Binary Prefix Value
nano (n) 10-9=1000-3    
micro (u) 10-6=1000-2    
milli (m) 10-3=1000-1    
kilo (k) 103=10001 kilo (K,Ki,kibi) 210 = 10241
mega (M) 106=10002 mega (M,Mi,mebi) 220 = 10242
giga (G) 109=10003 giga (G,Gi,gibi) 230 = 10243
tera (T) 1012=10004 tera (T,Ti,tebi) 240 = 10244

Memory storage is typically quantified in bytes with base 2, for example, in gigabytes or, more recently and unambiguously in gibibytes. Processor speed, however, may be represented in instructions per second with base 10, for example, in million instructions per second (MIPS). Similarly, special-purpose metrics such as FLOPS, floating-point operations per second, are prefixed with base 10. Also, data rates are often represented in bits per second with base 10, for example, in gigabits per second (gbps), that is, 109 bits per second.

Speed is generally characterized in terms of throughput, the amount of work done per unit of time, and latency, the amount of time to do some work, in particular before some other work can be done. The difference is usually explained with a simple example. Imagine a fiber optic cable connecting, say, New York City and San Francisco and a truck loaded with DVDs driving from New York City to San Francisco. Which one provides higher throughput and which one lower latency? Surprisingly, it may very well be possible that the truck provides higher throughput. However, delivering just a single bit by truck may take days. Thus the truck clearly provides terrible latency not suitable to host, say, a skype call. Got it?

Throughput
Amount of work performed per unit of time.
Latency
Amount of time (or delay) to perform work.

For completeness and because of the increasing importance of energy efficiency in computing, in particular in mobile computing and large-scale data center computing, we also list standard units for quantifying energy here even though there is no further discussion of that topic.

Energy is quantified in Joule and power consumption in Watt, that is, the amount of energy consumed per second. Interestingly, we can quantify the efficiency of computation in number of operations performed by that computation per Joule, or even in MIPS or FLOPS per Watt.

Performance Unit
memory bit (b), kilobit (kb), megabit (mb), gigabit (gb), terabit (tb) with decimal prefix
  byte (B), kilobyte (kB), megabyte (MB), gigabyte (GB), terabyte (TB) with decimal prefix
  byte (B), kibibyte (KB,KiB), mebibyte (MB,MiB), gibibyte (GB,GiB), tebibyte (TiB) with binary prefix
latency nanoseconds (ns), microseconds (us), milliseconds (ms), seconds (s), minutes (m), hours (h)
throughput million instructions per second (MIPS)
  floating point operations per second (FLOPS)
  bytes/second, kB/s, MB/s, GB/s, TB/s
  bits/second, kbps, mbps, gbps, tbps
energy joule
power joule/second (watt)
efficiency operations/joule
  MIPS/watt
  FLOPS/watt

Summary

Let us summarize what we have seen so far and prepare for the next chapter. This chapter is the first step towards understanding selfie by observing that:

In particular, the meaning of bits or characters or any symbols comes from change over time, by being changed (making them data) and by changing others (making them code). Let us have another look at our earlier example of adding 7 to 42.

Bits represent data such as an integer value at the time when they are used as operand of an integer operation. Bits represent code such as an integer operation at the time when they direct a machine to perform an integer operation. At all other times, their meaning could be anything. This also means that bits can sometimes be data and sometimes be code. Selfie is a non-trivial example of that phenomenon.

For now it is important to remember that semantics comes from change and that bits can be data, when changed, and code, when changing others. Selfie shows that through its self-referential nature. Selfie in source code can be data when being compiled and selfie in machine code can be code when being executed or hosted.

Furthermore, the source code of self-compiling compilers such as the starc compiler is written in the programming language that they compile. For example, selfie.c is written in C* which is a tiny subset of the widely used programming language C hence the extension .c in the name of the file.

C
A general-purpose, imperative computer programming language, supporting structured programming, lexical variable scope and recursion, while a static type system prevents many unintended operations.

C* is much easier to learn than all of C. We designed C* to make everything as simple as possible but still realistic. This is similar to using only a tiny but useful subset of the full vocabulary of a foreign language. It also means that all C* programs are C programs but not the other way around.

The starc compiler is part of selfie and thus written in C* but starc not only compiles selfie and in particular itself but all C* programs hence the name starc. In other words, starc is a C* compiler written in C*. We do not need to know in which programming language the cc compiler is written (likely C) but we do know that it compiles all C programs hence the name cc. Since C* is a subset of C the cc compiler also compiles selfie.

We conclude this chapter by pointing out that there was a time when there was no compiler for C and in fact no compiler for any programming language. The first compilers were therefore written in machine code or in some programming language and then compiled into machine code by hand. Thus even selfie’s origins go back to that time!

3. Encoding

Information, whatever it is, needs to be encoded in bits for a computer to handle it. Since selfie.c is a sequence of characters let us first look at how characters are encoded. After that we explain how sequences of characters form larger constructs such as strings, identifiers, and even positive and negative numbers. We also recall parts of elementary arithmetics and explain how that works on a computer.

Interestingly, the encoding of numbers can also be used to represent information on machine level. For example, the encoding of memory addresses and machine instructions can be done in the same or a similar way than the encoding of numbers. We explain that here as well.

Characters

We mention in the previous chapter that the characters in selfie.c are ASCII characters encoded in eight bits, that is, one byte according to the UTF-8 standard.

Both ASCII and UTF-8 are essentially one-to-one mappings from bits to characters written down in a table. Each 8-bit sequence is associated with exactly one character. For a computer that table is only relevant when the machine receives characters from a keyboard and sends characters to a screen. Then, the machine needs to know the ASCII code for each key on the keyboard to remember which key was pressed and it needs to know which shape of character to draw on the screen for each bit sequence. Internally, however, characters are just 8-bit sequences, with selfie and a lot of other software as well.

Let us pause for a moment and reflect about this. The above bit sequence could really still mean anything. The encoding standards for characters are just agreements on how to associate bits and characters but there is still no meaning.

So what is it now, U or 85? The answer is both, and anything else. As mentioned in the previous chapter, meaning comes from change. When the machine draws U for 01010101 on the screen then 01010101 stands for U in that moment but in the next moment the machine could increment 01010101 according to elementary arithmetic making 01010101 represent 85.

But how does selfie and in particular the starc compiler actually read characters from files such as selfie.c? It turns out that all characters are read from left to right using just a single line of source code in selfie.c. Similarly, all characters written to files and the screen are written from left to right using just one line of code in selfie.c. For further details on what the code means refer to the comments in the code.

In general, we only provide links to code with comments so that text explaining code is not duplicated here. You may want read the code in selfie.c as if it was some sort of mechanical English. There are a lot of comments whenever the code is not sufficiently self-explanatory. In other words, reading code and comments is part of the experience of reading this book!

Comments

Now, what is a comment in code anyway? A comment is text that the compiler ignores completely. It is only there for people to read and understand the code. In C*, a comment is either a single-line comment which is any text to the right of two slashes // until the end of the line, or it is a multi-line comment which is any text enclosed by /* and */. There is a lot of that in the beginning of selfie.c. It actually takes a bit of scrolling down to see the first line of code that means something to the machine and is not a comment.

If we were to remove all comments from selfie.c the result would still be semantically equivalent to selfie.c from the perspective of the machine. In fact, we can safely remove even more characters called whitespace without changing any semantics.

Whitespace

Whitespace Character
Any character or series of characters that represent horizontal or vertical space in typography. When rendered, a whitespace character does not correspond to a visible mark, but typically does occupy an area on a page.

Whitespace characters are invisible but still important for formatting source code yet they are typically irrelevant in terms of semantics. While this is true for many programming languages including C and C* it is not true in general. There are programming languages in which whitespace does make a semantical difference. This is important to check when learning new programming languages.

The starc compiler considers the space, the tabulator, the line feed, and the carriage return characters whitespace and ignores them when reading source code:

> ./selfie -c selfie.c
./selfie: selfie compiling selfie.c with starc
./selfie: 288310 characters read in 10003 lines and 1334 comments
./selfie: with 169991(58.96%) characters in 43633 actual symbols
...

Out of all the characters in selfie.c only a bit more than half of the characters are actually considered code. The rest is whitespace and characters in comments. The code in selfie.c that starc uses to ignore whitespace and comments works by reading characters from left to right, one after the other, and discarding them until a character is found that is not whitespace and not occurring in a comment. This may also continue until the end of the file is reached without finding such a character. Important for us here is to understand that the machine really only looks at one character at a time from start to end of the file.

Let us have a look at the following “Hello World!” program written in C*:

A “Hello World!” Program
 1 // global variable for pointing to the "Hello World!    " string
 2 uint64_t* foo;
 3 
 4 // main procedure for printing "Hello World!    " on the console
 5 uint64_t* main() {
 6   // point to the "Hello World!    " string
 7   foo = "Hello World!    ";
 8 
 9   // strings are actually stored in chunks of 8 characters in memory,
10   // that is, here as "Hello Wo", and "rld!    " which allows us to
11   // print them conveniently in chunks of 8 characters at a time
12 
13   // as long as there are characters print them
14   while (*foo != 0) {
15     // 1 means that we print to the console
16     // foo points to a chunk of 8 characters
17     // 8 means that we print 8 characters
18     write(1, foo, 8);
19 
20     // go to the next chunk of 8 characters
21     foo = foo + 1;
22   }
23 }

and run the code as follows (ignoring the compiler warning):

> ./selfie -c manuscript/code/hello-world.c -m 1
./selfie: selfie compiling manuscript/code/hello-world.c with starc
./selfie: warning in manuscript/code/hello-world.c in line 5: type mismatch, uint64_\
t expected but uint64_t* found
./selfie: 734 characters read in 23 lines and 11 comments
./selfie: with 94(12.80%) characters in 39 actual symbols
./selfie: 1 global variables, 1 procedures, 1 string literals
./selfie: 2 calls, 2 assignments, 1 while, 0 if, 0 return
./selfie: symbol table search time was 1 iterations on average and 28 in total
./selfie: 504 bytes generated with 116 instructions and 40 bytes of data
./selfie: init:    lui: 1(0.73%), addi: 72(52.94%)
./selfie: memory:  ld: 20(14.70%), sd: 11(8.80%)
./selfie: compute: add: 3(2.20%), sub: 3(2.20%), mul: 1(0.73%), divu: 0(0.00%), remu\
: 2(1.47%)
./selfie: control: sltu: 1(0.73%), beq: 5(3.67%), jal: 3(2.20%), jalr: 6(4.41%), eca\
ll: 8(5.88%)
./selfie: selfie executing manuscript/code/hello-world.c with 1MB physical memory on\
 mipster
Hello World!    ./selfie: manuscript/code/hello-world.c exiting with exit code 0 and\
 0.00MB mallocated memory
./selfie: selfie terminating manuscript/code/hello-world.c with exit code 0
./selfie: summary: 110 executed instructions and 0.00MB mapped memory
...

There is a lot of whitespace for increasing the readability of the code as well as comments for explaining what the code does. In fact, only a little more than ten percent of the characters are actual code. Note that "Hello World! " is printed on the console right before the program exits. Also, note that there are four spaces printed right after Hello World!. This is on purpose and will become clear below.

Removing all whitespace and comments, also called minification, results in the following version:

Minified “Hello World!” Program
1 uint64_t*foo;uint64_t*main(){foo="Hello World!    ";while(*foo!=0){write(1,foo,8);fo\
2 o=foo+1;}}

with this output when running the code:

> ./selfie -c manuscript/code/hello-world-minified.c -m 1
./selfie: selfie compiling manuscript/code/hello-world-minified.c with starc
./selfie: warning in manuscript/code/hello-world-minified.c in line 1: type mismatch\
, uint64_t expected but uint64_t* found
./selfie: 94 characters read in 1 lines and 0 comments
./selfie: with 94(100.00%) characters in 39 actual symbols
./selfie: 1 global variables, 1 procedures, 1 string literals
./selfie: 2 calls, 2 assignments, 1 while, 0 if, 0 return
./selfie: symbol table search time was 1 iterations on average and 28 in total
./selfie: 504 bytes generated with 116 instructions and 40 bytes of data
./selfie: init:    lui: 1(0.73%), addi: 72(52.94%)
./selfie: memory:  ld: 20(14.70%), sd: 11(8.80%)
./selfie: compute: add: 3(2.20%), sub: 3(2.20%), mul: 1(0.73%), divu: 0(0.00%), remu\
: 2(1.47%)
./selfie: control: sltu: 1(0.73%), beq: 5(3.67%), jal: 3(2.20%), jalr: 6(4.41%), eca\
ll: 8(5.88%)
./selfie: selfie executing manuscript/code/hello-world-minified.c with 1MB physical \
memory on mipster
Hello World!    ./selfie: manuscript/code/hello-world-minified.c exiting with exit c\
ode 0 and 0.00MB mallocated memory
./selfie: selfie terminating manuscript/code/hello-world-minified.c with exit code 0
./selfie: summary: 110 executed instructions and 0.00MB mapped memory
...

This time all characters are actual code but otherwise the behavior is the same with "Hello World! " appearing on the console just like before.

Minification
The process of removing all unnecessary characters from source code without changing its functionality. These unnecessary characters usually include white space characters, new line characters, and comments, which are used to add readability to the code but are not required for it to execute.

Minification is useful for improving performance and saving bandwidth when communicating source code among machines. We only mention it here to show that whitespace and comments have typically no meaning for machines. In fact, both versions of the code are semantically equivalent. How can we check that? We can have selfie generate machine code for both as follows:

> ./selfie -c manuscript/code/hello-world.c -o hello-world.m -c manuscript/code/hell\
o-world-minified.c -o hello-world-minified.m
./selfie: selfie compiling manuscript/code/hello-world.c with starc
...
./selfie: 632 bytes with 116 instructions and 40 bytes of data written into hello-wo\
rld.m
./selfie: selfie compiling manuscript/code/hello-world-minified.c with starc
...
./selfie: 632 bytes with 116 instructions and 40 bytes of data written into hello-wo\
rld-minified.m

and then check that both files are indeed identical:

> diff -s hello-world.m hello-world-minified.m
Files hello-world.m and hello-world-minified.m are identical

Strings

In computer science sequences of characters such as Hello World! are called strings.

String
A finite sequence of characters taken from some finite alphabet.

In selfie, for example, Hello World! is a string whose alphabet is in fact the printable ASCII characters UTF-8-encoded in eight bits, that is, one byte per character. However, the question is how such strings are handled and in particular encoded and stored in the memory of a computer.

Memory
Hardware device that stores information for immediate use in a computer; it is synonymous with the term “primary storage”.

Logically, memory is storage for bits as well as addresses for identifying those bits. Memory addresses are usually natural numbers from zero or some positive number to some larger positive number. To save addresses and increase speed of access, most memory is byte-addressed, that is, each address refers to a whole byte and not just a single bit. The size of byte-addressed memory, that is, the number of bytes that can be stored is the difference between the smallest and largest address plus one. The number of bits that can be stored is therefore eight times that value.

In selfie, strings are stored contiguously in memory and null-terminated but what are the alternatives? We could store the number of characters in a string or the address of the last character in front of the string. Some systems do that but not selfie. Also, we could store the string non-contiguously in memory but would then need to remember where the characters are. This would require more space to store that information and more time to find the characters but enable us to store strings even if sufficiently large contiguous memory was not available. These are interesting and fundamental trade-offs that will become more relevant later. Important for us here is to know that there is a choice.

String Literals

You may have noticed the double quotes enclosing the Hello World! string in the “Hello World!” program. There are other sequences of characters in the program such as foo, for example, that also look like strings but are not enclosed with double quotes. The difference is that "Hello World! " is meant to be literally Hello World! whereas foo is an identifier that provides a name for something. If we were to change foo consistently in the whole program to an unused identifier such as bar, for example, the program would be semantically equivalent to the original version with foo. Changing "Hello World! " on the other hand would really change the output of the program. Try it!

String Literal
The representation of a string value within the source code of a computer program.

String literals in C* such as "Hello World! " make it convenient to read and write source code that needs to output text, for example. We make extensive use of string literals in selfie.c with strings for reporting compiler errors as just one example.

The code in selfie.c that actually recognizes a string literal in source code, after reading a double quote outside of a comment, first allocates memory not used by anyone to store the string. Then it reads one character at a time and stores the characters contiguously in memory until it reads another double quote. It then stores a null to terminate the string. This code ultimately determines how string literals in C* are handled.

Character Literals

There is also the notion of character literals in C* which we use in selfie.c in a number of situations, for example, for identifying characters that represent letters and for identifying characters that represent digits.

Character Literal
The representation of a character value within the source code of a computer program.

A character literal in C* such as 'a', for example, is a single character a enclosed with single quotes. However, character literals are actually quite different from string literals. A character literal represents the ASCII code of the enclosed character whereas a string literal is a sequence of characters which may contain any number of characters including just one or even none denoted by "". Note that '', on the other hand, is meaningless.

The code in selfie.c that identifies characters other than letters and digits is another example which shows how character literals are used. Take '{' as an example. If we were to replace '{' with 123 the semantics of the code would not change because 123 is the ASCII code of {. In other words, '{' stands for 123, that is, '{' is really just a human-readable version of the ASCII code of {.

The code in selfie.c that recognizes a character literal in source code, after reading a single quote outside of a comment, reads the next character and then stores the ASCII code of that character. It then looks for the second single quote and, if it is there, returns the ASCII code. Again, this code ultimately determines how character literals in C* are handled.

Identifiers

Let us now go back to the notion of identifiers and our example of the identifier foo in the “Hello World!” program. An identifier like foo is just a name of something.

Identifier
A name that identifies (that is, labels the identity of) either a unique entity or a unique class of entities. Some of the kinds of entities an identifier might denote include variables, types, and subroutines.

Identifiers in C* can indeed denote different kinds of entities. But, for now, we only need to know that, unlike string literals, identifiers in C* always begin with a letter. After that there may appear letters, digits, and underscores _ in any order but no other characters. Why is that? Because this is how the machine knows when an identifier begins and ends. Remember, identifiers are not enclosed by any special characters like double quotes, for example.

The code in selfie.c that recognizes an identifier in source code, after reading a letter outside of a comment, first allocates memory not used by anyone to store the identifier, just like a string. Then it reads one character at a time and stores the characters contiguously in memory until it reads a character that is neither a letter nor a digit nor an underscore. It then stores a null to terminate the identifier. However, before deciding whether it has just recognized an identifier the code checks if it has actually recognized a reserved identifier or keyword.

Keywords

C* features a number of keywords with special meaning that you can nevertheless safely ignore for now. The “Hello World!” program, for example, uses the uint64_t keyword twice and the while keyword once.

Keyword
In a computer language, a reserved word (also known as a reserved identifier) is a word that cannot be used as an identifier, such as the name of a variable, function, or label – it is “reserved from use”. This is a syntactic definition, and a reserved word may have no meaning. A closely related and often conflated notion is a keyword which is a word with special meaning in a particular context. This is a semantic definition. The terms “reserved word” and “keyword” are often used interchangeably – one may say that a reserved word is “reserved for use as a keyword”.

Since the keywords in C* all begin with a letter they should not be mistaken for identifiers. The code in selfie.c that distinguishes keywords from identifiers compares potential identifiers with all keywords to implement that distinction.

Integers

Numbers are important and computers are incredibly good at working with them. Not surprisingly, it is very easy to talk about numbers in C* and compute with them. For now, we look at how numbers are represented in source code and how they are encoded in digital memory. Numbers in selfie are integers, that is, zero as well as positive and negative whole numbers.

Integer
A number that can be written without a fractional component (from the Latin integer meaning “whole”). For example, 21, 4, 0, and −2048 are integers, while 9.75 and 5.5 are not. The set of integers consists of zero (0), the natural numbers (1, 2, 3, …), also called whole, counting, or positive numbers, and their additive inverses (the negative numbers, that is −1, −2, −3, …).

Beyond integers there is no support of, say, fixed-point or even floating-point numbers in C*. However, it is always possible to write code in C* based on integers that would support them. For example, there is code in selfie.c for printing profiling information that computes the fixed-point ratio of two integers with a given number of fractional digits.

Numbers, positive or negative, are encoded, like everything else, in bits. Let us go back to the earlier example of the decimal number 85.

Both representations look different but mean the same thing and are even based on the same numeral system using positional notation. The only difference is their base or radix.

Radix
The number of unique digits, including zero, used to represent numbers in a positional numeral system.

Most importantly, arithmetic operations such as addition and subtraction work the same way for any base greater than one as long as we use the same base for the operands of the operation. For example, manually adding two binary numbers works just like adding two decimal numbers, more on that below.

While 1010101 is a binary number with base 2, the decimal number 85 is obviously encoded with base 10 in the hindu-arabic numeral system.

Hindu-Arabic Numeral System
A positional decimal numeral system and the most common for the symbolic representation of numbers in the world.

Selfie does in fact implement exactly the above computation of a recurrence relation for encoding numbers but only for numbers represented in decimal notation. An extension to other bases is nevertheless easy to do. Think about it and try!

The encoding is done in the procedure atoi which stands for ascii-to-integer. This is a standard procedure that converts a sequence of ASCII characters representing digits in positional notation to an integer value.

Selfie also implements the procedure symmetric to atoi called itoa which obviously stands for integer-to-ascii. It works by dividing the given integer by the desired base repeatedly until the quotient is zero and saving the remainders during the process. At the end, the sequence of remainders is reversed (hindu-arabic) and then returned.

The implementation of itoa in selfie not only supports decimal but also binary, octal (base 8), and hexadecimal (base 16) notation. We prepared a simple program called integer.c that prints the value represented by 85 in all possible ways supported by selfie as follows:

> ./selfie -c manuscript/code/integer.c selfie.c -m 1
./selfie: selfie compiling manuscript/code/integer.c with starc
...
./selfie: selfie executing manuscript/code/integer.c with 1MB physical memory on mip\
ster
85 in decimal:     85
'U' in ASCII:      85
"85" string:       85
85 in hexadecimal: 0x55
85 in octal:       00125
85 in binary:      1010101
./selfie: manuscript/code/integer.c exiting with exit code 0 and 0.00MB mallocated m\
emory
./selfie: selfie terminating manuscript/code/integer.c with exit code 0
./selfie: summary: 78610 executed instructions and 0.17MB mapped memory
...

You may check for yourself that 0x55 and 00125 do in fact represent the value 85 by computing 5*16+5 and (1*8+2)*8+5.

Hexadecimal and octal notation are popular in computer science because their digits represent blocks of four and three bits, respectively. In other words, their bases are powers of four and three, respectively, of the base 2 of binary notation. Hexadecimal and octal notation are therefore convenient and compact notation for binary numbers. The base 10 of decimal notation is not a power of the base 2. Digits in decimal notation therefore do not evenly correspond to blocks of bits.

The actual encoding of numbers in the memory of a digital computer, however, is not in hexadecimal or octal but usually in binary. Before continuing to negative numbers let us point out that there is a way to represent numbers that is even more basic than binary. It is called unary and works by simply using an alphabet with a single digit for counting the value of a number.

Unary Numeral System
The bijective base-1 numeral system. It is the simplest numeral system to represent natural numbers: in order to represent a number N, an arbitrarily chosen symbol representing the value 1 is repeated N times.

Well, have you noticed the enormous difference in length between unary and other representations with higher bases?

The fundamental reason for the difference in size between different notations is that any representation with a base greater than one is exponentially more succinct than unary. However, representations with base greater than one are only different in size by a constant factor. For example, octal and hexadecimal representations are only three and four times, respectively, more succinct than binary. A larger alphabet makes even less of a difference, that is, logarithmically less in the size of the alphabet. ASCII, for example, is only seven times more succinct than binary although there are 128 ASCII characters.

Encoding Alphabet Base (Radix, Size of Alphabet) # Digits in Values \(n>1\) # Values in Digits \(n>0\)
Unary {1} 1 \(n\) \(n\)
Binary {0,1} 2 \(\lceil\frac{log(n)}{log(2)}\rceil\) \(2^n\)
Octal {0,1,2,3,4,5,6,7} 8 \(\lceil\frac{log(n)}{log(8)}\rceil\) \(8^n\)
Decimal {0,1,2,3,4,5,6,7,8,9} 10 \(\lceil\frac{log(n)}{log(10)}\rceil\) \(10^n\)
Hexadecimal {0,1,2,3,4,5,6,7,8,9, 16 \(\lceil\frac{log(n)}{log(16)}\rceil\) \(16^n\)
  A,B,C,D,E,F}      

Integer Arithmetics

Fortunately, elementary arithmetics with binary numbers works just like it does with decimal numbers or any other representation with base greater than one. For example, adding two numbers in any such representation works as usual by adding their digits from right to left while carrying any overflow to the left. Only unary is different! Elementary arithmetics with unary numbers is obviously done by, imagine, string concatenation.

In binary numbers the leftmost and rightmost bits have a meaning similar to the leftmost and rightmost digits in decimal numbers. The rightmost bit, also called least significant bit, determines if the number is even or odd. For example, 01010101 represents an odd number whereas 10101010 an even number. The leftmost bit, also called most significant bit, represents the greatest value. Thus 10101010 stands for a number larger than what 01010101 stands for.

Least Significant Bit (LSB)
The bit in a binary number that appears rightmost and determines if the number is even (0) or odd (1).
Most Significant Bit (MSB)
The bit in a binary number that appears leftmost and has the greatest value.

Now, what happens if we try to add two numbers where the result exceeds the number of digits necessary to represent them individually? For example, what if we compute 255+1=256 in binary? In this case, that is, 11111111+00000001, the result is 100000000, a binary number with 9 bits rather than the 8 bits representing 255. This is not a problem if we have more than 8 bits. However, with computers everything is finite, in particular memory. Moreover, arithmetic operations are on most machines implemented for bit strings with a fixed size such as 8 bits. On such machines adding 11111111 and 00000001 results in what is called arithmetic overflow.

Arithmetic Overflow
Occurs when an arithmetic operation attempts to create a numeric value that is too large to be represented within the available storage space.

How can we deal with arithmetic overflow? There are two approaches that can be combined: detection and semantics. If the occurrence of an arithmetic overflow can be detected one can discard the computation and do something else. For this purpose most processors feature a so-called carry bit or carry flag which is set if an arithmetic operation causes an overflow indicated by a carry out of the most significant bit. In our example, the 9-th bit in 100000000 is that carry bit.

In terms of semantics, if the result of an arithmetic overflow has a defined value, one may be able to use that value in a meaningful way. For example, a common semantics for n-bit arithmetics is to compute everything modulo 2n, also referred to as wrap-around semantics or just wrap around. For example, 255+1=256 modulo 28=256 modulo 256=0, which is exactly what 100000000 in an 8-bit system stands for. There are applications that are correct even when such wrap-arounds occur.

Modulo
The remainder after division of one number by another, also called modulus.

Arithmetic overflow nevertheless is the cause of numerous software bugs and even costly accidents. Restricting the space available for representing something that can be arbitrarily large such as numbers has serious consequences. Integer arithmetics are always an approximation of real arithmetics. For correctness, computer applications need to be properly adapted to work for integer arithmetics, not real arithmetics.

Signed Integers

Now, what about subtraction and negative numbers? Ideally, we would like to represent not just positive but also negative numbers just using bits with integer arithmetics on them still intact. Obviously, one bit, or more generally one digit, is enough to encode the sign of a number, that is, distinguish positive from negative numbers. Fortunately, however, there is an overall encoding scheme that works without changing integer arithmetics such as addition. In fact, subtraction will work by using addition, as previously discussed, with negative numbers.

So, how can we represent -85 and compute something like, say, 127-85? Again, fortunately, there is a representation of negative numbers that works for any representation with base greater than one and makes addition work like subtraction. We simply take the so-called radix complement to encode a negative number which only requires us to fix the number of digits that we intend to support. Subtracting a number then works by adding its complement and ignoring the overflow beyond the supported number of digits.

You may still ask how computing 1000-85 is any easier than computing 127-85 directly. Well, it is not but the radix complement of a number happens to be equal to the so-called diminished radix complement of that number plus one, which is in fact easier to compute than the radix complement.

Back to our example. What happens if we actually try to compute 85-127 instead? Let us go through that computation as well.

Using the ten’s complement for representing negative numbers really means that the leftmost digit represents the sign. In the decimal system, an even leftmost digit indicates that we are dealing with a positive number. Conversely, an odd leftmost digit such as the 9 in 958 means that the number is actually meant to be negative. This implies that with n digits and the ten’s complement for negative numbers we can represent signed integers i with -10n/2-1 < i < 10n/2. For example, in a 3-digit system we can represent signed integers i with -501 < i < 500. In other words, we can still represent 1000 numbers with 3 digits but offset by 500 into the negative numbers.

Now, how does a computer represent negative numbers? The short answer: the exact same way as in the decimal system, just in binary. In other words, a negative number may be represented in binary by its two’s complement (remember binary has base 2). Just like before, the two’s complement of a number is computed by taking the ones’ complement of that number and adding one to it. This time, however, computing the diminished radix complement, that is, the ones’ complement is particularly easy. It is done by simply inverting the bits.

Using two’s complement for representing negative numbers a byte can in total represent signed integer values i from -128 to 127, that is, with -28/2-1 = -27-1 = -129 < i < 128 = 27 = 28/2. Our running example fits these constraints. Moreover, the most significant bit has taken over a different role than before. It indicates if a number is positive or negative. For example, 01111111 represents a positive number whereas 10000000 stands for a negative number.

Ones’ Complement
All bits of the number inverted, that is, 0s swapped for 1s and vice versa.
Two’s Complement
Given an n-bit number i with -2n/2-1 = -2n-1-1 < i < 2n-1 = 2n/2, the complement of i with respect to 2n, that is, 2n-i which is equal to the ones’ complement of i plus one.
Most Significant Bit (MSB)
The bit in a binary number that appears leftmost and has the greatest value or, if the number represents a signed integer in two’s complement, determines if the integer is positive (0) or negative (1).

Finally, what happened to the problem of arithmetic overflow? It is still there. Only the carry bit is now insufficient to detect it. When can an overflow actually occur? Simple. Only when adding two numbers with the same sign! The result of an overflow is a number with the opposite sign. Thus we need to distinguish two cases: adding two positive numbers resulting in a negative number and adding two negative numbers resulting in a positive number. Both cases follow wrap-around semantics similar to unsigned integers.

In the first case, the MSBs of the two numbers are zero, since both represent positive numbers, and the MSB of the result is one. The second case is the exact opposite. How can we detect this? Easy. There is an overflow either if there is a carry into the MSB of the result but not out into the carry bit (wrapping two positive numbers into a negative number), or else if there is no carry into the MSB but out into the carry bit (wrapping two negative numbers into a positive number). Most processors feature an overflow bit or overflow flag that is set accordingly, that is, by a so-called exclusive or between the carry into and out of the MSB. An exclusive or of two bits is true if and only if the bits are different. Note, however, that just like the carry bit has no meaning when adding signed integers, the overflow bit has no meaning when adding unsigned integers.

Integer Literals

Similar to character and string literals, source code written in C* may contain integer values, also called integer literals, written in decimal notation.

Integer Literal
An integer whose value is directly represented in source code.

The code in selfie that reads integer literals is next to the code that reads identifiers. Similar to the characters of an identifier, the digits of an integer literal are first read and stored in a string. However, that string is then converted to an integer value using the atoi procedure.

What about negative numbers then? Can we write integer literals in C* that represent negative values? The answer is yes, very conveniently in fact, for example, by writing -85 which obviously represents the value -85. However, this notation is only an abbreviation for 0 - 85 which obviously represents the same value. When reading integer literals such as -85 the starc compiler does in fact generate code that subtracts the positive value 85 from 0 to obtain the negative value -85.

Symbols

Among all language elements of C* we have seen identifiers and keywords as well as character, string, and integer literals. The only missing elements are operator symbols and symbols for structuring source code. The complete list of C* symbols, also called tokens, is surprisingly small. Our “Hello World!” Program does not contain all possible symbols but at least one from each category, that is, keywords, identifiers, literals, operator symbols, and symbols for structuring source code.

Machine Words

So, before continuing let us point out that character, string, and integer literals are the only way to describe data in C* programs. All other language elements of C* including keywords and identifiers are there to describe code and manage memory! We already know that characters are encoded in bytes and strings are stored contiguously in byte-addressed memory. What about integers then? To answer that question we need to take a closer look at how a computer handles data.

First of all, in addition to memory, most computers contain at least one processor or central processing unit (CPU) connected to memory and other hardware. While memory stores information the CPU performs actual computation with that information.

Central Processing Unit (CPU)
The electronic circuitry within a computer that carries out the instructions of a computer program by performing the basic arithmetic, logical, control and input/output (I/O) operations specified by the instructions.

Instead of operating directly on memory, most CPUs load data from memory into registers, then perform some computation with that data, and finally store the result back in memory. The reason is performance. The registers of a CPU are by far the fastest storage available in a computer.

Register
A quickly accessible location available to a digital processor’s central processing unit (CPU). Registers usually consist of a small amount of fast storage, although some registers have specific hardware functions. Registers are typically addressed by mechanisms other than main memory.

Most registers of a CPU have the same size, that is, accommodate the same number of bits. Usually, data goes back and forth between memory and registers in chunks of that size called machine words or just words.

Word
The natural unit of data used by a particular processor design. A word is a fixed-sized piece of data handled as a unit by the instruction set or the hardware of the processor. The number of bits in a word (the word size, word width, or word length) is an important characteristic of any specific processor design or computer architecture.

The processor that the mipster emulator implements has a word size of 64 bits. In fact, virtually everything on that machine happens at the level of words. Loading data from memory and storing it back, arithmetic and logical operations among registers, and even fetching code from memory for execution, as we see below. The reason is again performance. Involving 64 bits in parallel in all operations is obviously faster than working with bits individually. However, there are two more reasons why we use a 64-bit machine. The first reason is that the size of an integer in C* is also 64 bits. This means that a single mipster register can hold exactly one C* integer value. Beautiful!

How many different integer values can 64 bits represent? Well, 264 values but what are they? This depends on how we interpret them. In C* integers are interpreted as unsigned which means that an integer value is either zero or a positive number.

However, we may also interpret 64-bit integers as signed in two’s complement with the MSB encoding the sign and the remaining 63 LSBs encoding the value. The number of different integer values that can be represented is nevertheless the same.

We prepared another simple program called negative.c that prints the numerical value represented by -85, and of UINT64_MAX, INT64_MAX, and INT64_MIN for reference, in all possible ways supported by selfie as follows:

> ./selfie -c manuscript/code/negative.c selfie.c -m 1
./selfie: selfie compiling manuscript/code/negative.c with starc
...
./selfie: selfie executing manuscript/code/negative.c with 1MB physical memory on mi\
pster
       -85 in decimal:     -85
       -85 in hexadecimal: 0xFFFFFFFFFFFFFFAB
       -85 in octal:       001777777777777777777653
       -85 in binary:      111111111111111111111111111111111111111111111111111111111\
0101011
UINT64_MAX in decimal:     -1
UINT64_MAX in hexadecimal: 0xFFFFFFFFFFFFFFFF
UINT64_MAX in octal:       001777777777777777777777
UINT64_MAX in binary:      111111111111111111111111111111111111111111111111111111111\
1111111
 INT64_MAX in decimal:     9223372036854775807
 INT64_MAX in hexadecimal: 0x7FFFFFFFFFFFFFFF
 INT64_MAX in octal:       00777777777777777777777
 INT64_MAX in binary:      011111111111111111111111111111111111111111111111111111111\
1111111
 INT64_MIN in decimal:     -9223372036854775808
 INT64_MIN in hexadecimal: 0x8000000000000000
 INT64_MIN in octal:       001000000000000000000000
 INT64_MIN in binary:      100000000000000000000000000000000000000000000000000000000\
0000000
./selfie: manuscript/code/negative.c exiting with exit code 0 and 0.00MB mallocated \
memory
./selfie: selfie terminating manuscript/code/negative.c with exit code 0
./selfie: summary: 861900 executed instructions and 0.17MB mapped memory
...

The second reason for using 64-bit words is that memory addresses in mipster and ultimately in C* are then 64 bits as well. This means in particular that the content of a register can also be seen as a memory address and not just an integer value. However, there is one important detail. On a mipster machine, memory is not only byte-addressed but also word-aligned for access. This means that words can only be accessed in memory at addresses that are multiples of eight, the word size in bytes.

An interesting question is which bits in a word are used to represent which bytes in memory. The two standard choices are determined by the endianness of a processor.

Endianness
The order of the bytes that compose a digital word in computer memory. Words may be represented in big-endian or little-endian format. When storing a word in big-endian format the most significant byte, which is the byte containing the most significant bit, is stored first and the following bytes are stored in decreasing significance order with the least significant byte, which is the byte containing the least significant bit, thus being stored at last place. Little-endian format reverses the order of the sequence and stores the least significant byte at the first location with the most significant byte being stored last.

Interestingly, the endianness of a mipster machine is irrelevant since there is no way to access data in sizes other than word size directly in memory at addresses that are not multiples of the word size. In general, however, this is not true. Most processors allow accessing memory such that endianness does make a difference. We have deliberately ruled that out in selfie to make things simpler. Another simplification in selfie is the absence of any native support of bitwise operations.

Bitwise Operation
Operates on one or more bit patterns or binary numerals at the level of their individual bits. It is a fast, simple action directly supported by the processor, and is used to manipulate values for comparisons and calculations.

Why would we want support of that anyway? Well, how do we read and modify individual bits in registers and eventually memory if everything happens at word granularity? This is where bitwise operations come in. There are typically bitwise logical operations such as bitwise NOT, AND, and OR operations as well as bitwise shift operations such as left shift and logical right shift operations.

Bitwise operations have in common that they treat integers and words purely as sequences of bits. For example, a left shift operation shifts the bits in an integer or word to the left by a given amount of bits while shifting in zeros into the LSB. The logical right shift operation does the exact opposite.

We prepared another simple program called bitwise.c that prints the numerical value 3 in binary and decimal notation and then shifts it repeatedly by six bits to the left until it reaches 0. The program also performs a bitwise OR operation of all intermediate values and prints the result. The program then reverses direction and shifts the most recent value before reaching 0 repeatedly by six bits to the right until it reaches 0 again:

> ./selfie -c manuscript/code/bitwise.c selfie.c -m 1
./selfie: selfie compiling manuscript/code/bitwise.c with starc
...
./selfie: selfie executing manuscript/code/bitwise.c with 1MB physical memory on mip\
ster
0000000000000000000000000000000000000000000000000000000000000011 in binary = 3 in de\
cimal
0000000000000000000000000000000000000000000000000000000011000000 in binary = 192 in \
decimal
0000000000000000000000000000000000000000000000000011000000000000 in binary = 12288 i\
n decimal
0000000000000000000000000000000000000000000011000000000000000000 in binary = 786432 \
in decimal
0000000000000000000000000000000000000011000000000000000000000000 in binary = 5033164\
8 in decimal
0000000000000000000000000000000011000000000000000000000000000000 in binary = 3221225\
472 in decimal
0000000000000000000000000011000000000000000000000000000000000000 in binary = 2061584\
30208 in decimal
0000000000000000000011000000000000000000000000000000000000000000 in binary = 1319413\
9533312 in decimal
0000000000000011000000000000000000000000000000000000000000000000 in binary = 8444249\
30131968 in decimal
0000000011000000000000000000000000000000000000000000000000000000 in binary = 5404319\
5528445952 in decimal
0011000000000000000000000000000000000000000000000000000000000000 in binary = 3458764\
513820540928 in decimal
0011000011000011000011000011000011000011000011000011000011000011 in binary = 3513665\
537849438403 in decimal
0011000000000000000000000000000000000000000000000000000000000000 in binary = 3458764\
513820540928 in decimal
0000000011000000000000000000000000000000000000000000000000000000 in binary = 5404319\
5528445952 in decimal
0000000000000011000000000000000000000000000000000000000000000000 in binary = 8444249\
30131968 in decimal
0000000000000000000011000000000000000000000000000000000000000000 in binary = 1319413\
9533312 in decimal
0000000000000000000000000011000000000000000000000000000000000000 in binary = 2061584\
30208 in decimal
0000000000000000000000000000000011000000000000000000000000000000 in binary = 3221225\
472 in decimal
0000000000000000000000000000000000000011000000000000000000000000 in binary = 5033164\
8 in decimal
0000000000000000000000000000000000000000000011000000000000000000 in binary = 786432 \
in decimal
0000000000000000000000000000000000000000000000000011000000000000 in binary = 12288 i\
n decimal
0000000000000000000000000000000000000000000000000000000011000000 in binary = 192 in \
decimal
0000000000000000000000000000000000000000000000000000000000000011 in binary = 3 in de\
cimal
./selfie: manuscript/code/bitwise.c exiting with exit code 0 and 0.00MB mallocated m\
emory
./selfie: selfie terminating manuscript/code/bitwise.c with exit code 0
./selfie: summary: 2727047 executed instructions and 0.17MB mapped memory
...

But how did we do this if there is no native support of bitwise operations? Well, we use integer arithmetics and wrap-around semantics to provide bitwise OR, left shift, and logical right shift operations in selfie.

In the bitwise.c program we shift the bits representing the integer value 3 to the left by six bits by multiplying the value repeatedly with 64=26 until the value wraps around and becomes zero.

The bitwise.c program demonstrates that by performing a bitwise OR operation on all intermediate values through simple integer addition.

The bitwise.c program applies that method to shift the bits to the right repeatedly by six bits back to their original position.

Interestingly, the bitwise OR, left shift, and logical right shift operations presented here are sufficient to implement all of selfie!

Before moving on let us quickly revisit how characters and strings are stored in memory. In selfie characters are represented by 8 bits. A 64-bit word may therefore hold up to eight characters. This is in fact done to store the characters of strings in memory. A character literal, however, is stored in the eight LSBs of a word leaving the remaining bits blank.

Try the following command to see that our “Hello World!” program does actually print the characters in chunks of eight by printing the two words containing the characters directly on the console. The command creates three mipsters on top of each other slowing down the execution of the program to the extent that the behavior is really visible:

> ./selfie -c manuscript/code/hello-world.c -o hello-world.m -c selfie.c -o selfie.m\
 -m 3 -l selfie.m -m 1 -l hello-world.m -m 1
...

This is nice but how do we then access individual characters of a string? Simple, by using our bitwise operations, of course! Selfie implements loading and storing characters of strings in memory accordingly.

So, on the machine everything related to data happens at the granularity of words. Interesting. What is even more fascinating is that even memory addresses are handled at that granularity as well.

Memory Addresses

The address of a byte or in fact a word in memory is obviously a positive number. On a mipster machine and many others as well, memory addresses are also represented by words. Thus the word size of such machines determines how many memory addresses can be distinguished and, as a consequence, how much memory can be accessed. A mipster machine, for example, supports up to 4GB of byte-addressed and word-aligned memory. Addressing that memory thus requires the 32 LSBs out of the 64 bits of a word. The remaining 32 MSBs are unused.

Let us reflect on that for a moment. So, on a mipster machine, the 64 bits of a word can be used to encode characters, integers, and even memory addresses! That’s right and this is not all. Even machine instructions are represented by words which is our next topic.

Instructions

The code of a mipster machine is represented by a sequence of machine words where each 64-bit word encodes in fact two rather than one machine instruction since each instruction only requires 32 bits. It is that easy and actually true for many other processors as well. The exact specification of the encoding as well as the meaning of each instruction is provided by the instruction set architecture or ISA for short.

Instruction Set Architecture (ISA)
The part of the computer architecture related to programming, including the native data types, instructions, registers, addressing modes, memory architecture, interrupt and exception handling, and external I/O. An ISA includes a specification of the set of opcodes (machine language), and the native commands implemented by a particular processor. An instruction set is the interface between a computer’s software and its hardware, and thereby enables the independent development of these two computing realms.

Let us have another look at the first few lines of the assembly code in selfie.s presented in the previous chapter:

0x0(~1): 0x000392B7: lui $t0,0x39
0x4(~1): 0x7D028293: addi $t0,$t0,2000
0x8(~1): 0x00028193: addi $gp,$t0,0
0xC(~1): 0x00000513: addi $a0,$zero,0
0x10(~1): 0x0D600893: addi $a7,$zero,214
0x14(~1): 0x00000073: ecall
0x18(~1): 0x00750513: addi $a0,$a0,7
0x1C(~1): 0x00800293: addi $t0,$zero,8
0x20(~1): 0x025572B3: remu $t0,$a0,$t0
0x24(~1): 0x40550533: sub $a0,$a0,$t0
0x28(~1): 0x0D600893: addi $a7,$zero,214
0x2C(~1): 0x00000073: ecall
0x30(~1): 0xFEA1BC23: sd $a0,-8($gp)
0x34(~1): 0x00000513: addi $a0,$zero,0
0x38(~1): 0x00810293: addi $t0,$sp,8
0x3C(~1): 0xFF810113: addi $sp,$sp,-8
0x40(~1): 0x00513023: sd $t0,0($sp)
0x44(~1): 0x680260EF: jal $ra,39328[0x266C4]

Each line represents one machine instruction. The fifth line, for example, reads like this. The hexadecimal number 0x10 is the 32-bit-word-aligned memory address of the instruction in memory. The expression (~1) is the approximate line number of the source code, in this case selfie.c, that was compiled to this instruction. The 32-bit word 0x0D600893 is in fact the binary encoded version of the instruction itself. Finally, addi $a7,$zero,214 is the human-readable assembly version of the instruction. This means in particular that 0x0D600893 and addi $a7,$zero,214 are semantically equivalent. The 32-bit word 0x0D600893 in binary stored at address 0x10 in memory is thus the only thing that the machine needs, the rest is for us to make it readable.

The machine code in selfie.m presented in the previous chapter contains just that binary code, that is, 0x0D600893 followed by 0x00000073 from the sixth line in selfie.s and so on. To prepare the machine for executing that code we only need to load selfie.m into memory starting at address 0x0 and then tell the machine to execute the code. How this is done is part of the next chapter.

So, how do we know that 0x0D600893 represents addi $a7,$zero,214? This is specified precisely by the ISA of the open RISC-V processor family.

RISC-V
An open instruction set architecture (ISA) based on established reduced instruction set computing (RISC) principles. In contrast to most ISAs, the RISC-V ISA can be freely used for any purpose, permitting anyone to design, manufacture and sell RISC-V chips and software. While not the first open architecture ISA, it is significant because it is designed to be useful in a wide range of devices. The instruction set also has a substantial body of supporting software, which avoids a usual weakness of new instruction sets.

The mipster emulator implements a strict subset of 64-bit RISC-V instructions which we call RISC-U. In fact, mipster only implements 14 out of several dozen RISC-V instructions. The starc compiler generates RISC-U code that runs on mipster and is compatible with the official RISC-V tool chain and thus runs, at least in principle, also on real RISC-V processors.

RISC-V can be seen as a streamlined modern version of MIPS which is the ISA that mipster used to support in older versions of selfie hence the name mipster. MIPS processors are still used in a large variety of devices.

MIPS
A reduced instruction set computer (RISC) instruction set architecture (ISA) developed by MIPS Technologies (formerly MIPS Computer Systems, Inc.). The early MIPS architectures were 32-bit, with 64-bit versions added later. Multiple revisions of the MIPS instruction set exist, including MIPS I, MIPS II, MIPS III, MIPS IV, MIPS V, MIPS32, and MIPS64. The current revisions are MIPS32 (for 32-bit implementations) and MIPS64 (for 64-bit implementations).

Let us go back to the above example. The addi string in addi $a7,$zero,214 is an assembly mnemonic of the operation code or opcode, for short, that identifies the operation to be performed while the remaining $a7,$zero,214 are three operands of that operation.

Opcode
The portion of a machine language instruction that specifies the operation to be performed. Beside the opcode itself, most instructions also specify the data they will process, in the form of operands.

In our example, addi instructs the processor to add the value of the integer 214 to the value stored in register $zero (which is always 0) and store the result in register $a7 (which will thus contain the value 214 after executing the instruction).

A 64-bit RISC-V processor has 32 64-bit registers that can be used for this purpose. They are numbered from 0 to 31. The register $zero is obviously register 0 while the register $a7, less obviously, happens to be register 17. The only missing piece of information is that addi is represented by the opcode 19. Now, how do we get from there to 0x0D600893?

We take the opcode 19 (0010011), register 17 (10001), register 0 (00000), and value 214 (000011010110) and put them together in binary as follows:

000011010110 00000 000 10001 0010011

If you merge that into a 32-bit word you get 0x0D600893. The RISC-V ISA specifies that the twelve MSBs encode the third (!) operand value 214, in 12-bit two’s complement in fact, so it could even be a negative value. The next five bits encode the second (!) operand register 0 followed by three bits set to 0. The next five bits encode the first (!) operand register 17 followed by the remaining seven LSBs that encode the opcode 19. The fact that the third operand is treated by addi as an integer value rather than, say, a number identifying a register, is called immediate addressing hence the i in addi. Immediate addressing is one of various so-called addressing modes.

Addressing Mode
An aspect of the instruction set architecture in most central processing unit (CPU) designs. The various addressing modes that are defined in a given instruction set architecture define how machine language instructions in that architecture identify the operand(s) of each instruction.

But why does the ISA provision five bits for the first and second operand? Because five bits allow us to address exactly the 25=32 different registers of the machine. The seven bits for the opcode obviously allow us to distinguish up to 26=64 different opcodes but we actually do not need that many. Now, think about the range of values that can be encoded in two’s complement in the twelve bits for the third operand! This is the range of values that we can get into a register such as $a7 with a single addi instruction. In other words, we can use that instruction to initialize registers. Cool!

Note that the value in register $zero is assumed to be 0 at all times. This is in fact needed for initializing registers using the addi instruction. There exists RISC-V assembly in which such intention is made explicit by using pseudo instructions. Here, the pseudo instruction movi $a7,214, for example, could be used instead but would anyhow just be a short version of addi $a7,$zero,214. The remaining thirteen instructions of RISC-U are just as simple, we introduce them on the fly as needed.

So, how does the starc compiler generate such code? It uses bitwise operations, of course, that is, bitwise OR and left shifting in particular. There are in total six different formats in RISC-U depending on the opcode. The actual source code in selfie.c for encoding (and decoding) machine instructions is nevertheless pretty straightforward.

An interesting feature of the implementation of selfie in a single file is that the source code for decoding machine instructions, which is used by the mipster emulator and selfie’s disassembler, is right next to the code for encoding instructions. Decoding machine code performs the exact inverse to encoding machine code extracting the opcode and operands that were originally encoded. It is done by a combination of left and logical right shifting. See for yourself how this works in the code! It may look technical but is actually very simple.

Summary

In this chapter we have seen how characters, strings, identifiers, integers, and even machine instructions are encoded and decoded, and how all that allows us to represent source and machine code using just bits. We even understand now some of the output of the starc compiler when compiling our “Hello World!” program:

> ./selfie -c manuscript/code/hello-world.c -m 1
./selfie: selfie compiling manuscript/code/hello-world.c with starc
...
./selfie: 734 characters read in 23 lines and 11 comments
./selfie: with 94(12.80%) characters in 39 actual symbols
./selfie: 1 global variables, 1 procedures, 1 string literals
./selfie: 2 calls, 2 assignments, 1 while, 0 if, 0 return
./selfie: symbol table search time was 1 iterations on average and 28 in total
./selfie: 504 bytes generated with 116 instructions and 40 bytes of data
./selfie: init:    lui: 1(0.73%), addi: 72(52.94%)
./selfie: memory:  ld: 20(14.70%), sd: 11(8.80%)
./selfie: compute: add: 3(2.20%), sub: 3(2.20%), mul: 1(0.73%), divu: 0(0.00%), remu\
: 2(1.47%)
./selfie: control: sltu: 1(0.73%), beq: 5(3.67%), jal: 3(2.20%), jalr: 6(4.41%), eca\
ll: 8(5.88%)
...

The compiler counts characters, lines, comments, whitespace, and symbols in source code, and instructions and bytes of data in the generated machine code. The information about global variables, procedures, string literals, calls, assignments, while, if, and return as well as the generated instructions will become clear in the next chapter.

But there is still something missing here. Why is all of this encoded the way it is and not some other way? There are two important reasons:

  1. Time: Depending on the encoding, data can be processed faster or slower, even fundamentally faster or slower in the sense that some encoding may not allow reasonably fast processing at all.

    But why do we not always pick the encoding that supports the fastest possible processing? We could do that but it could make the actual encoding process slow and in some cases even fundamentally slow, and because:

  2. Space: The amount of bits needed to encode data can vary significantly and even fundamentally depending on the encoding scheme.

Machine instructions are also encoded such that code size is compact while decoding them, which is ultimately done in hardware by the machine, is still fast. This is, of course, extremely important since every single instruction a processor executes needs to be decoded into opcode and operands before execution.

In computer science the trade-off between time and space shows up in all kinds of situations. Encoding information is just one example. The important lesson here is to be aware of the trade-off and understand that there is no such thing as a free lunch!

4. State

Computation is the evolution of state. At any given time, a computer stores a very large but still finite amount of bits in memory and registers. The values of all these bits together is what we call the state of the machine. Then the processor executes one instruction which directs it to change the values of a very small number of bits creating a new state. That process of change from one state to the next continues until the machine is turned off.

State
The state of a digital logic circuit or computer program is a technical term for all the stored information, at a given instant in time, to which the circuit or program has access. The output of a digital circuit or computer program at any time is completely determined by its current inputs and its state.

Software on source and machine code level specifies for each state what the next state is. There are the data bits that are being changed and the code bits that determine that change. Input is typically modeled by data bits that are changed by something other than the processor such as a keyboard, for example.

In this chapter, we explain how on machine level memory and registers represent state and how machine code describes change of that state. We also show how source code provides a high-level view of program state and how that translates down to machine code level. Seeing both levels and how they connect is an important step towards learning how to code.

Von Neumann Machine

Most general-purpose computers today are based on what is known as the von Neumann model or von Neumann architecture.

Von Neumann Architecture
A computer architecture, described in 1945 by the mathematician and physicist John von Neumann and others, for an electronic digital computer with parts consisting of a processing unit containing an arithmetic logic unit and processor registers; a control unit containing an instruction register and program counter; a memory to store both data and instructions; external mass storage; and input and output mechanisms.
Von Neumann Architecture
Von Neumann Architecture

The CPU and main memory in a typical computer correspond to the processing unit and memory of a von Neumann architecture, respectively. The mipster emulator is no exception. It emulates a von Neumann machine.

The key idea is very simple. A von Neumann machine is a stored-program computer which stores both code and data in the same memory. In fact, in memory there is really no distinction between code and data. A von Neumann machine fetches code from memory and executes it. The code will in turn instruct the machine to load data from memory into registers, perform some computation on registers, and finally store the results back in memory. We explain the details of how this works below. For now it is only important to understand that bits fetched from memory and executed happen to represent code in that moment while bits loaded from memory into registers, then modified, and finally stored back in memory represent data in that moment. At all other times, they are just bits.

State Space

A von Neumann machine that can store n bits in memory and registers can distinguish 2n states.

Interestingly, we can always, at least in principle, partition that enormously large state space into a set of good states and a set of bad states. Software without bugs would always keep the machine in good states, or conversely, prevent the machine from ever going to a bad state. However, what is a good state?

The answer to that question depends on what we would like the machine to do, it depends on the application. But most applications have nothing to do with individual bits. We therefore need formalisms that allow us to formalize what we want the machine to do on the level of the applications we are interested in. This is the reason why high-level programming languages were invented.

Since there are new applications or at least application characteristics appearing all the time new programming languages are also invented all the time. The key to being able to follow that trend is to understand the principles of programming languages and how they translate to machine level.

The programming language C of which we use a tiny subset here was originally designed for programming systems software such as operating systems and compilers. However, C has become very popular in many other application domains as well. Programming in C is called imperative programming which closely matches the imperative nature of most computers. It is therefore relatively straightforward to compile C code to machine code and manipulate machine states in C code even at the level of individual bits.

Imperative Programming
A programming paradigm that uses statements that change program’s state. In much the same way that the imperative mood in natural languages expresses commands, an imperative program consists of commands for the computer to perform. Imperative programming focuses on describing how a program operates.

Imperative programming is supported by many programming languages but it is not the only programming paradigm. Declarative programming is an important alternative that is also supported by many programming languages but handles program state differently.

Declarative Programming
A programming paradigm - a style of building the structure and elements of computer programs - that expresses the logic of a computation without describing its control flow.

Intuitively, rather than saying imperatively how to change state, declarative programming focuses on declaring what needs to change. While spelling out how to change state can become tedious with imperative programming spelling out what to change can become burdensome with declarative programming. Yet both paradigms have their important use cases and a port of selfie to a declarative programming language would be very nice to have but remains future work for now.

Before explaining how C* code works, we introduce C* language elements that allow us to describe program state as a high-level abstraction of machine state. Code written in C* then operates on that program state. Let us have a look at the following C* program which we call the countdown program or just countdown:

The Countdown Program
 1 // declaring bar as a global variable of type uint64_t and
 2 // defining the decimal value 10 as initial value of bar
 3 uint64_t bar = 10;
 4 
 5 // declaring main as a procedure with no arguments and return type uint64_t
 6 uint64_t main() {
 7   // defining main to be the following code
 8 
 9   // while bar is greater than the decimal value 0 ...
10   // otherwise continue to the return statement below
11   while (bar > 0) {
12     // ... subtract the decimal value 1 from bar and assign the result to bar
13     bar = bar - 1;
14 
15     // go back to the while statement and check again
16   }
17 
18   // return the value of bar as the result of the invocation of main
19   return bar;
20 }

The program takes the decimal value 10 (Line 3) and decrements it (Line 13) until it reaches the decimal value 0 (Line 11) which is then returned (Line 19) as so-called exit code. To see for yourself run the code as follows:

 1 > ./selfie -c manuscript/code/countdown.c -o countdown.m -s countdown.s -m 1
 2 ./selfie: selfie compiling manuscript/code/countdown.c with starc
 3 ./selfie: 645 characters read in 20 lines and 9 comments
 4 ./selfie: with 65(10.70%) characters in 28 actual symbols
 5 ./selfie: 1 global variables, 1 procedures, 0 string literals
 6 ./selfie: 1 calls, 1 assignments, 1 while, 0 if, 1 return
 7 ./selfie: symbol table search time was 1 iterations on average and 22 in total
 8 ./selfie: 416 bytes generated with 100 instructions and 16 bytes of data
 9 ./selfie: init:    lui: 1(0.83%), addi: 64(53.33%)
10 ./selfie: memory:  ld: 19(15.83%), sd: 7(5.83%)
11 ./selfie: compute: add: 1(0.83%), sub: 3(2.50%), mul: 0(0.00%), divu: 0(0.00%), remu\
12 : 2(1.66%)
13 ./selfie: control: sltu: 1(0.83%), beq: 5(4.16%), jal: 3(2.50%), jalr: 6(5.00%), eca\
14 ll: 8(6.66%)
15 ./selfie: 544 bytes with 100 instructions and 16 bytes of data written into countdow\
16 n.m
17 ./selfie: 3797 characters of assembly with 100 instructions written into countdown.s
18 ./selfie: selfie executing countdown.m with 1MB physical memory on mipster
19 ./selfie: countdown.m exiting with exit code 0 and 0.00MB mallocated memory
20 ./selfie: selfie terminating countdown.m with exit code 0
21 ./selfie: summary: 132 executed instructions and 0.00MB mapped memory
22 ./selfie: init:    lui: 1(0.75%), addi: 41(31.60%)
23 ./selfie: memory:  ld: 25(18.93%), sd: 15(11.36%)
24 ./selfie: compute: add: 0(0.00%), sub: 11(8.33%), mul: 0(0.00%), divu: 0(0.00%), rem\
25 u: 1(0.75%)
26 ./selfie: control: sltu: 11(8.33%), beq: 11(8.33%), jal: 12(9.90%), jalr: 1(0.75%), \
27 ecall: 3(2.27%)
28 ./selfie: profile: total,max(ratio%)@addr(line#),2max,3max
29 ./selfie: calls:   1,1(100.00%)@0x134(~11),0(0.00%),0(0.00%)
30 ./selfie: loops:   10,10(100.00%)@0x148(~11),0(0.00%),0(0.00%)
31 ./selfie: loads:   25,11(44.00%)@0x148(~11),10(40.00%)@0x158(~13),1(4.00%)@0x50(~1)
32 ./selfie: stores:  15,10(66.66%)@0x164(~13),1(6.66%)@0x30(~1),1(6.66%)@0x40(~1)

Global Variable

For the countdown program to be able to operate on a number there needs to be memory to store that number. For this purpose, Line 3 in the source code declares a so-called global variable called bar. The starc compiler even reports that it found exactly that one global variable, see Line 5 in the above output.

Global Variable
A variable with global scope, meaning that it is visible (hence accessible) throughout the program, unless shadowed. The set of all global variables is known as the global environment or global state.

So global really just means here that bar can be used throughout the program. Line 3 is thus a declaration that specifies that the identifier bar refers to the same variable in the whole program.

Declaration
Specifies properties of an identifier: it declares what an identifier means. Declarations are most commonly used for functions, variables, constants, and classes. Beyond the name (the identifier itself) and the kind of entity (function, variable, etc.), declarations typically specify the data type (for variables and constants), or the type signature (for functions). The term “declaration” is frequently contrasted with the term “definition”, but meaning and usage varies significantly between languages.

Line 3 not only declares bar but also defines the initial value of bar as the decimal value 10 represented by the integer literal 10. The initial value of a global variable is nevertheless optional. Line 3 could be rewritten to uint64_t bar; in which case the value of bar would be initially undefined meaning it could initially be any value. Undefined values are a common source of errors, if programs depend on them. Modern compilers usually warn programmers about that (not starc though since we need to keep things simple). A simple way to avoid depending on undefined values is to either provide an initial value for a variable or to assign a value to a variable before using the variable in any computation, see below for more about how to do that. A program that does not depend on undefined values has a single initial state from which it begins all computations. This is what we want!

Note that the equality sign = in Line 3 is merely syntactic sugar making the code more readable while the semicolon ; is a so-called terminator which indicates the end of a statement. After the semicolon we could insert more global variable declarations as long as they all were to introduce unique identifiers and were properly terminated with semicolons. Programming languages newer than C often make such terminators optional or omit them altogether since they are, similar to syntactic sugar, not necessary for the compiler to work and, unlike syntactic sugar, sometimes considered a burden.

Data Type

Line 3 also specifies that the data type of bar is uint64_t which, according to the C standard, means that bar represents an unsigned 64-bit integer, that is, 64 bits encoding a whole number. It also means that arithmetic operations involving bar will be done with unsigned 64-bit wrap-around semantics.

Data Type
A classification of data which tells the compiler or interpreter how the programmer intends to use the data. Most programming languages support various types of data, for example: real, integer, or Boolean. A data type provides a set of values from which an expression (i.e. variable, function …) may take its values. The type defines the operations that can be done on the data, the meaning of the data, and the way values of that type can be stored.

So, this is important! A data type tells us and the compiler what the intended meaning of the bits are that encode the data. Remember, bits can encode anything and have no meaning unless they are involved in an operation. Data types therefore help with identifying meaning even without performing any operations.

A global variable of type uint64_t such as bar provides storage for 64 bits which happens to match the size of a word on a mipster machine. In fact, the value of bar will be stored in exactly one word somewhere in memory. First of all, this means that bar provides storage that is identified by the identifier bar and not by some memory address. But it also means that the program as is cannot access any other bits in memory than the 64 bits identified by bar which obviously reduces the size of the state space dramatically! So the program state space is much smaller than the machine state space and therefore much easier to reason about. However, there is also code in countdown that operates on bar. Let us have a closer look at how that is introduced in C*.

Procedure

The source code of the countdown program declares a so-called procedure called main in Line 6. The broader term for procedures is subroutines defined as follows.

Subroutine
A sequence of program instructions that perform a specific task, packaged as a unit. This unit can then be used in programs wherever that particular task should be performed. Subprograms may be defined within programs, or separately in libraries that can be used by multiple programs. In different programming languages, a subroutine may be called a procedure, a function, a routine, a method, or a subprogram.

In C subroutines are called procedures. Line 6 specifies that main refers to a procedure rather than a global variable simply by using () after the identifier. In fact, it would be perfectly fine to just say uint64_t main();. However, the code enclosed in {}, called the body of the procedure, also defines the implementation of the procedure. It describes what the procedure does. We get to that further below. The uint64_t keyword before main specifies that the so-called return type of the procedure is an unsigned 64-bit integer. This means that the procedure returns an unsigned 64-bit integer value when done.

Global variable and procedure declarations, as in Lines 3 and 6, may use any identifier not used anywhere else in the program. In other words, identifiers used in declarations must be unique. The main procedure name, however, is even more special because the main procedure is the one that is invoked when a C program starts executing. Thus a valid C program needs to contain exactly one declaration and definition of a procedure called main. Otherwise, the system would not “know” what to execute. See for yourself by renaming main in countdown.c to something else. When main returns the program stops executing and the system outputs the value returned by main, which is 0, as the previously mentioned exit code, see Lines 19 and 20 in the above output.

There can be any number of global variable and procedure declarations in a C program. The starc compiler reports in Line 5 in the above output that there is exactly one procedure declared in countdown.c which is in fact the main procedure.

Line 8 in the above output mentions that starc generated exactly 416 bytes for countdown.c representing 100 instructions and 16 bytes of data (416=100*4+16). The 100 instructions are encoded in 400 bytes (remember each instruction takes 4 bytes). They represent the machine code generated by starc for initializing and controlling the machine and in particular for implementing the main procedure, that is, the C* code in Lines 11-19 in countdown.c. Out of the 16 bytes of data, 8 bytes represent the initial value of bar which is 10. The other 8 bytes represent a hidden global variable called _bump used for memory management.

Selfie can write the generated bytes into a binary file or just binary (using the -o option) that can later be loaded again (using the -l option) and executed (using the -m option, for example).

Binary File
A computer file that is not a text file. The term “binary file” is often used as a term meaning “non-text file”. Binary files are usually thought of as being a sequence of bytes, which means the binary digits (bits) are grouped in eights. Binary files typically contain bytes that are intended to be interpreted as something other than text characters. Compiled computer programs are typical examples; indeed, compiled applications are sometimes referred to, particularly by programmers, as binaries.

In our example, selfie is instructed to write the generated bytes into a binary file called countdown.m that actually contains a total of 544 bytes for the 416 bytes of code and data plus 128 bytes of additional information (544=416+128), as reported in Line 8 in the above output. Out of the the 128 bytes of additional information 120 bytes encode meta data according to the ELF standard. The remaining 8 bytes encode the amount of bytes needed to represent the instructions, that is, the 400 bytes of code, as opposed to the 16 bytes of data. This information is necessary to determine which bytes are code and which are data. The countdown.m binary is what is known as an executable which selfie outputs as proper ELF file making it compatible with a large number of existing tools.

Executable
Causes a computer “to perform indicated tasks according to encoded instructions,” as opposed to a data file that must be parsed by a program to be meaningful. These instructions are traditionally machine code instructions for a physical CPU.

We may use the terms binary and executable interchangeably even though, strictly speaking, there are binary files such as image files, for example, that are obviously not executable. However, executables are usually binaries.

The format of the countdown.m executable is very simple. It begins with the 120 bytes of ELF meta data followed by the 8 bytes encoding the number of bytes of code (which is 400 here) followed by the 400 bytes of code and the 16 bytes of data. That’s it.

When invoking selfie on the countdown program, we also instructed selfie to produce an assembly file countdown.s for the countdown program (using the -s option) that represents a human-readable version of the binary file countdown.m. Selfie reports on that in Line 17 in the above output. Note that countdown.s only contains code but not any data such as the initial value of bar.

Here is countdown.s but only showing the instructions that will actually be executed when running the program. The instructions that will not be executed, shown as ..., are generated by starc because they may be needed by some programs, just not the countdown program. The comments are inserted manually and explained below.

// machine initialization code

// initialize global pointer $gp
// to program break at 0x101A0
0x0(~1): 0x000102B7: lui $t0,0x10     // load 0x10000 into $t0
0x4(~1): 0x1A028293: addi $t0,$t0,416 // add 416 (0x1A0) to $t0
0x8(~1): 0x00028193: addi $gp,$t0,0   // copy $t0 to $gp

// initialize bump pointer, explained later
0xC(~1): 0x00000513: addi $a0,$zero,0
0x10(~1): 0x0D600893: addi $a7,$zero,214
0x14(~1): 0x00000073: ecall
0x18(~1): 0x00750513: addi $a0,$a0,7
0x1C(~1): 0x00800293: addi $t0,$zero,8
0x20(~1): 0x025572B3: remu $t0,$a0,$t0
0x24(~1): 0x40550533: sub $a0,$a0,$t0
0x28(~1): 0x0D600893: addi $a7,$zero,214
0x2C(~1): 0x00000073: ecall
0x30(~1): 0xFEA1BC23: sd $a0,-8($gp)
0x34(~1): 0x00000513: addi $a0,$zero,0

// initialize argv, explained later
0x38(~1): 0x00810293: addi $t0,$sp,8
0x3C(~1): 0xFF810113: addi $sp,$sp,-8
0x40(~1): 0x00513023: sd $t0,0($sp)

// call main procedure
0x44(~1): 0x0F0000EF: jal $ra,60[0x134] // jump and link to main procedure code

// prepare return value of main procedure
// as exit code for exit environment call
0x48(~1): 0xFF810113: addi $sp,$sp,-8   // allocate space on the stack
0x4C(~1): 0x00A13023: sd $a0,0($sp)     // push return value onto stack

// wrapper code for exit environment call
0x50(~1): 0x00013503: ld $a0,0($sp)     // load exit code from stack into $a0
0x54(~1): 0x00810113: addi $sp,$sp,8    // pop exit code off the stack
0x58(~1): 0x05D00893: addi $a7,$zero,93 // load exit environment call ID 93 into $a7
0x5C(~1): 0x00000073: ecall             // exit here

// wrapper code for unused environment calls removed
...
// prologue of main procedure, explained later
0x134(~11): 0xFF810113: addi $sp,$sp,-8
0x138(~11): 0x00113023: sd $ra,0($sp)
0x13C(~11): 0xFF810113: addi $sp,$sp,-8
0x140(~11): 0x00813023: sd $fp,0($sp)
0x144(~11): 0x00010413: addi $fp,$sp,0

// body of main procedure
0x148(~11): 0xFF01B283: ld $t0,-16($gp)        // load bar into $t0
0x14C(~11): 0x00000313: addi $t1,$zero,0       // initialize $t1 with 0
0x150(~11): 0x005332B3: sltu $t0,$t1,$t0       // check 0 < bar
0x154(~11): 0x00028C63: beq $t0,$zero,6[0x16C] // if no, branch
0x158(~13): 0xFF01B283: ld $t0,-16($gp)        // load bar into $t0
0x15C(~13): 0x00100313: addi $t1,$zero,1       // initialize $t1 with 1
0x160(~13): 0x406282B3: sub $t0,$t0,$t1        // compute bar - 1
0x164(~13): 0xFE51B823: sd $t0,-16($gp)        // store result in bar
0x168(~19): 0xFE1FF06F: jal $zero,-8[0x148]    // jump backwards to 0x148
0x16C(~19): 0xFF01B283: ld $t0,-16($gp)        // load bar into $t0
0x170(~19): 0x00028513: addi $a0,$t0,0         // copy $t0 to $a0
0x174(~19): 0x0040006F: jal $zero,1[0x178]     // jump to epilogue

// epilogue of main procedure, explained later
0x178(~20): 0x00040113: addi $sp,$fp,0
0x17C(~20): 0x00013403: ld $fp,0($sp)
0x180(~20): 0x00810113: addi $sp,$sp,8
0x184(~20): 0x00013083: ld $ra,0($sp)
0x188(~20): 0x00810113: addi $sp,$sp,8
0x18C(~20): 0x00008067: jalr $zero,0($ra)

Each line in countdown.s represents one instruction. The first line, for example, is the instruction lui $gp,0x10, which is encoded in the 32-bit word 0x000101B7. The instruction will be loaded into memory at some fixed address plus the offset 0x0, as indicated by 0x0(~1). The number ~1 in parentheses is the approximate line number in countdown.c for which starc generated the instruction. Even though there is a comment at Line 1 in countdown.c this still makes sense because starc always generates some instructions before compiling any actual source code. Try to find the four instructions that starc actually generated for bar = bar - 1 in Line 13 of countdown.c! They, along with the others, are explained below.

Ok, but what happens now when selfie is instructed by the final -m 1 option to execute the generated code? Doing that involves solving a problem that appears to have no solution. How does a computer load an executable into memory without an executable in memory that instructs the processor how to do this? The process that solves that problem is called bootstrapping.

Bootstrapping
A self-starting process that is supposed to proceed without external input. In computer technology the term (usually shortened to booting) usually refers to the process of loading the basic software into the memory of a computer after power-on or general reset, especially the operating system which will then take care of loading other software as needed.

A computer typically bootstraps itself by having the processor initially fetch, decode, and execute machine code from some non-volatile memory rather than volatile main memory. That machine code implements a so-called boot loader which instructs the processor to load the code that the processor is actually supposed to execute from some external source and store it in main memory. When done, the boot loader instructs the processor to start fetching, decoding, and executing the code stored in main memory.

Program Break

Before launching the mipster emulator, selfie bootstraps mipster exactly like a computer bootstraps itself. First of all, the emulated memory and registers are zeroed, that is, set to 0x0. The machine code and data generated by starc (or loaded from a binary file) is then copied into the emulated memory starting at some low address. The portions of memory where code and data are located are also called the code segment and the data segment, respectively. Then, any optional input parameters to the program are stored at some high address in a memory region called the stack. The details of that are explained in the stack chapter. At this point, we have the following memory layout.

Memory Layout
Memory Layout

With code, data, and input copied to memory the machine is essentially ready to go. The rest of the memory layout will be explained in later chapters. For now we only need to know that the border between code and data and the rest of memory is the program break which divides memory into statically and dynamically allocated storage. The addresses of the code and data stored in memory below the program break will not change while the storage above the program break may be used for different purposes during execution. Keep in mind though that the memory layout we describe here is only one choice out of many possible choices. However, that layout is probably the most widely adopted choice today.

Going back to C* in general and the countdown program in particular, global variable and procedure declarations specify exactly what is below the program break, what is code and data, and what the code does with the data, as we see next. Most important here is to understand that the state of memory is fully determined after copying the code for procedures and the data for global variables into memory. While countdown is a simple program think of the code and data for selfie. There are hundreds of global variable and procedure declarations in selfie.c but it is still the same thing. The fact that C* allows us to talk about variables and procedures without worrying about memory layout is a key ingredient for managing the enormously large state space. The only missing piece now for a complete picture is the state of the registers. Let us take a look at that next!

Program Counter

How does a computer “know” what to execute? After all the bits in memory could mean anything. They could be code, they could be data, anything. But the answer to that question can anyway not be any simpler.

Processors based on the von Neumann model feature a special-purpose register as part of their control unit called the program counter (PC). The PC of the machine emulated by mipster is, unsurprisingly, yet another 64-bit register.

Program Counter (PC)
A processor register that indicates where a computer is in its program sequence. In most processors, the PC is incremented after fetching an instruction, and holds the memory address of (“points to”) the next instruction that would be executed. Instructions are usually fetched sequentially from memory, but control transfer instructions change the sequence by placing a new value in the PC. These include branches (sometimes called jumps), subroutine calls, and returns. A transfer that is conditional on the truth of some assertion lets the computer follow a different sequence under different conditions. A branch provides that the next instruction is fetched from somewhere else in memory. A subroutine call not only branches but saves the preceding contents of the PC somewhere. A return retrieves the saved contents of the PC and places it back in the PC, resuming sequential execution with the instruction following the subroutine call.

At boot time, selfie first makes sure that all emulated memory and registers are zeroed, just like real hardware. Then, code and data is loaded into memory starting, by convention, at address 0x10000. Finally, the program counter is set to exactly that address 0x10000. Now, mipster is ready to start code execution. From then on mipster fetches the word in memory where the PC points to, decodes that word, and executes the resulting instruction. Each instruction not only instructs the processor to perform some computation but also determines the next value of the PC so that the processor “knows” where in memory the next instruction is stored. That sequence of PC values is called control flow.

Control Flow
The order in which individual statements, instructions or function calls of an imperative program are executed or evaluated. The emphasis on explicit control flow distinguishes an imperative programming language from a declarative programming language.

Let us take a look at how the first few and last few instructions for the countdown program execute. You will need to scroll back to the beginning after the following command though since selfie will output the whole sequence of executed instructions. Most of the output is skipped and just shown as ... here.

> ./selfie -c manuscript/code/countdown.c -d 1
./selfie: selfie compiling manuscript/code/countdown.c with starc
./selfie: 645 characters read in 20 lines and 9 comments
./selfie: with 65(10.70%) characters in 28 actual symbols
./selfie: 1 global variables, 1 procedures, 0 string literals
./selfie: 1 calls, 1 assignments, 1 while, 0 if, 1 return
./selfie: symbol table search time was 1 iterations on average and 22 in total
./selfie: 416 bytes generated with 100 instructions and 16 bytes of data
./selfie: init:    lui: 1(0.83%), addi: 64(53.33%)
./selfie: memory:  ld: 19(15.83%), sd: 7(5.83%)
./selfie: compute: add: 1(0.83%), sub: 3(2.50%), mul: 0(0.00%), divu: 0(0.00%), remu\
: 2(1.66%)
./selfie: control: sltu: 1(0.83%), beq: 5(4.16%), jal: 3(2.50%), jalr: 6(5.00%), eca\
ll: 8(6.66%)
./selfie: selfie executing manuscript/code/countdown.c with 1MB physical memory on m\
ipster
$pc=0x10000(~1): lui $t0,0x10: |- $t0=0x0 -> $t0=0x10000
$pc=0x10004(~1): addi $t0,$t0,416: $t0=65536(0x10000) |- $t0=65536(0x10000) -> $t0=6\
5952(0x101A0)
$pc=0x10008(~1): addi $gp,$t0,0: $t0=65952(0x101A0) |- $gp=0x0 -> $gp=0x101A0
---
$pc=0x1000C(~1): addi $a0,$zero,0: $zero=0(0x0) |- $a0=0(0x0) -> $a0=0(0x0)
$pc=0x10010(~1): addi $a7,$zero,214: $zero=0(0x0) |- $a7=0(0x0) -> $a7=214(0xD6)
$pc=0x10014(~1): ecall(brk): $a0=0x0 |- $a0=0x0 -> $a0=0x101A0
$pc=0x10018(~1): addi $a0,$a0,7: $a0=65952(0x101A0) |- $a0=65952(0x101A0) -> $a0=659\
59(0x101A7)
$pc=0x1001C(~1): addi $t0,$zero,8: $zero=0(0x0) |- $t0=65952(0x101A0) -> $t0=8(0x8)
$pc=0x10020(~1): remu $t0,$a0,$t0: $a0=65959(0x101A7),$t0=8(0x8) |- $t0=8(0x8) -> $t\
0=7(0x7)
$pc=0x10024(~1): sub $a0,$a0,$t0: $a0=65959(0x101A7),$t0=7(0x7) |- $a0=65959(0x101A7\
) -> $a0=65952(0x101A0)
$pc=0x10028(~1): addi $a7,$zero,214: $zero=0(0x0) |- $a7=214(0xD6) -> $a7=214(0xD6)
$pc=0x1002C(~1): ecall(brk): $a0=0x101A0 |- ->
$pc=0x10030(~1): sd $a0,-8($gp): $gp=0x101A0,$a0=65952(0x101A0) |- mem[0x10198]=0 ->\
 mem[0x10198]=$a0=65952(0x101A0)
$pc=0x10034(~1): addi $a0,$zero,0: $zero=0(0x0) |- $a0=65952(0x101A0) -> $a0=0(0x0)
---
$pc=0x10038(~1): addi $t0,$sp,8: $sp=0xFFFFFFC0 |- $t0=7(0x7) -> $t0=4294967240(0xFF\
FFFFC8)
$pc=0x1003C(~1): addi $sp,$sp,-8: $sp=0xFFFFFFC0 |- $sp=0xFFFFFFC0 -> $sp=0xFFFFFFB8
$pc=0x10040(~1): sd $t0,0($sp): $sp=0xFFFFFFB8,$t0=4294967240(0xFFFFFFC8) |- mem[0xF\
FFFFFB8]=0 -> mem[0xFFFFFFB8]=$t0=4294967240(0xFFFFFFC8)
---
$pc=0x10044(~1): jal $ra,60[0x10134]: |- $ra=0x0,$pc=0x10044 -> $pc=0x10134,$ra=0x10\
048
...
$pc=0x1018C(~20): jalr $zero,0($ra): $ra=0x10048 |- $pc=0x1018C -> $pc=0x10048
---
$pc=0x10048(~1): addi $sp,$sp,-8: $sp=0xFFFFFFB8 |- $sp=0xFFFFFFB8 -> $sp=0xFFFFFFB0
$pc=0x1004C(~1): sd $a0,0($sp): $sp=0xFFFFFFB0,$a0=0(0x0) |- mem[0xFFFFFFB0]=65608 -\
> mem[0xFFFFFFB0]=$a0=0(0x0)
---
$pc=0x10050(~1): ld $a0,0($sp): $sp=0xFFFFFFB0,mem[0xFFFFFFB0]=0 |- $a0=0(0x0) -> $a\
0=0(0x0)=mem[0xFFFFFFB0]
$pc=0x10054(~1): addi $sp,$sp,8: $sp=0xFFFFFFB0 |- $sp=0xFFFFFFB0 -> $sp=0xFFFFFFB8
$pc=0x10058(~1): addi $a7,$zero,93: $zero=0(0x0) |- $a7=214(0xD6) -> $a7=93(0x5D)
$pc=0x1005C(~1): ecall(exit): $a0=0x0 |- ->
./selfie: manuscript/code/countdown.c exiting with exit code 0 and 0.00MB mallocated\
 memory
./selfie: selfie terminating manuscript/code/countdown.c with exit code 0
./selfie: summary: 132 executed instructions and 0.00MB mapped memory
./selfie: init:    lui: 1(0.75%), addi: 41(31.60%)
./selfie: memory:  ld: 25(18.93%), sd: 15(11.36%)
./selfie: compute: add: 0(0.00%), sub: 11(8.33%), mul: 0(0.00%), divu: 0(0.00%), rem\
u: 1(0.75%)
./selfie: control: sltu: 11(8.33%), beq: 11(8.33%), jal: 12(9.90%), jalr: 1(0.75%), \
ecall: 3(2.27%)
./selfie: profile: total,max(ratio%)@addr(line#),2max,3max
./selfie: calls:   1,1(100.00%)@0x134(~11),0(0.00%),0(0.00%)
./selfie: loops:   10,10(100.00%)@0x148(~11),0(0.00%),0(0.00%)
./selfie: loads:   25,11(44.00%)@0x148(~11),10(40.00%)@0x158(~13),1(4.00%)@0x50(~1)
./selfie: stores:  15,10(66.66%)@0x164(~13),1(6.66%)@0x30(~1),1(6.66%)@0x40(~1)

Initialization

The purpose of the first eighteen instructions executed by mipster is to initialize the machine and get it ready for executing the code that implements the main procedure.

lui

Initially, the PC denoted by $pc points to address 0x10000 in memory. The instruction at this address is thus the first instruction that will be executed by the machine. The instruction is encoded in the word 0x000101B7 (omitted here but mentioned before) which stands for lui $gp,0x10.

In our discussion we provide for each new instruction a link to the source code of mipster that implements the instruction (see, for example, the above liu) and for each concrete instruction in the example a link to the source code of starc that generated the instruction (see, for example, the above lui $gp,0x10).

Now, here is the interesting part. The output |- $t0=0x0 -> $t0=0x10000 next to the instruction tells us which part of the state space (memory and registers) the instruction depends on (indicated here by the empty space to the left of the implication |-), what the affected state actually is right before executing the instruction ($t0=0x0 between |- and the arrow ->), and how the state space changes after executing the instruction ($t0=0x10000 to the right of ->). In other words, the execution of lui $gp,0x10 does not depend on any values in memory or registers but it does affect the value in $t0 by setting it from 0x0 to 0x10000.

This is because lui $gp,0x10 instructs the processor to load the value 0x10, shifted by 12 bits to the left, into the register $t0. Its purpose is to initialize the upper bits of $t0 (above the 12 LSBs) with the immediate value 0x10. The mnemonic lui stands for load upper immediate. The liu instruction may involve any of the 32 general-purpose registers, not just $t0, which happens to be register 5 among the 32 general-purpose registers. However, there is a convention of using registers for a certain purpose as reflected in the names of registers. For example, the t in $t0 stands for temporary. Registers with t are meant to store temporary results during computation.

Also, very important and not to forget, lui $gp,0x10 makes the processor increment the $pc register from 0x10000 to 0x10004 so that the next instruction executed by the processor is the instruction at address 0x10004 that immediately follows the current instruction in memory. Incrementing the PC like that creates so-called sequential control flow. Most instructions actually do that, not just liu. There are, however, also instructions that can alter the control flow by setting the $pc register depending on the values in registers other than $pc. We explain that below.

addi

This takes us to the next instruction addi $t0,$t0,416 at address 0x10004. Its effect on the machine state is that the value in the $t0 register is set to 0x101A0. This is because addi $t0,$t0,416 instructs the processor to add the immediate value 416, which is 0x1A0 in hexadecimal, to the value in $t0 (second occurrence of $t0 in addi $t0,$t0,416), which is currently 0x10000, and store the result in $t0 (first occurrence of $t0). The mnemonic addi stands for add immediate. The output $t0=65536(0x10000) |- $t0=65536(0x10000) -> $t0=65952(0x101A0) next to the instruction confirms that that the execution of the instruction depends on the value currently in $t0, as indicated by $t0=65536(0x10000) before the implication |-. Also, the value of the $pc register is incremented to 0x10008.

The effect of the instruction addi $gp,$t0,0 at address 0x10008 on the machine state is that the value in the $gp register is set to 0x101A0 because it instructs the processor to add 0 to the value in register $t0 which is currently 0x101A0 and store the result in $gp. The output $t0=65952(0x101A0) |- $gp=0x0 -> $gp=0x101A0 next to the instruction confirms that. In other words, addi $gp,$t0,0 does effectively copy the value in $t0 to $gp, which happens to be register 3 among the 32 general-purpose registers.

What is the purpose of the first three instructions? Simple. They are meant to initialize the $gp register which stands for global pointer. Why do we use three instructions instead of two? Good question. Just using lui $gp,0x10 followed by addi $gp,$gp,416 without referring to $t0 at all would do the trick as well. The reason why we are not doing that is because it makes the compiler simpler, and better performance through using fewer instructions and fewer registers is not our focus here. It is, however, of major concern in state-of-the-art compilers.

More important here is the question as to why $gp is initialized to 0x101A0? This is because 0x101A0 is the address of the first word after the code and data segments in memory which together occupy exactly, well, 416 bytes starting from address 0x10000. But wait, 0x101A0 is in this case the program break, right? Yes, that is exactly right!

The purpose of the global pointer is to provide a fixed point of reference for referring to memory words that store the values of global variables. To find out where the word storing the value of a global variable is in memory, we only need to know the offset in bytes of how far below the address to which $gp points to the word is located. By the way, the value in $gp never changes during the execution of the program. So, relative to the value of $gp, what is the offset of the word in memory that stores the value of the global variable bar? This is clarified and explained in more detail below.

Let us take a quick look at the next eleven instructions in the above output. Their purpose is to initialize a hidden global variable called _bump which is used for memory management. The first instruction addi $a0,$zero,0 at address 0x1000C demonstrates another use of the addi instruction, namely, it instructs the processor to add the immediate value 0 to the value in register $zero which is always 0 and store the result in register $a0. In other words, addi $a0,$zero,0 does effectively load the immediate value 0 into $a0, enabled by the fact that $zero always contains 0. Immediate values other than 0 are of course possible. By the way, the a in $a0 stands for argument, explained further below, and $a0 happens to be register 10 among the 32 general-purpose registers.

sd

The following ten instructions perform further computations to determine the correct initial value for _bump. The last instruction at address 0x10034 is again addi $a0,$zero,0 to clean up $a0. More interesting here is the instruction sd $a0,-8($gp) right before that at address 0x10030. This instruction stores the (double) word in $a0 in the memory word at the address derived from the value in $gp plus the offset -8, as indicated by the notation -8($gp). The sd stands for store double word. The RISC-V ISA refers to a double word rather than just a word to emphasize that the involved word is 64-bit rather than 32-bit. We ignore the distinction between double word and word because everything other than machine instructions in selfie is 64-bit.

More important here is that the value in $gp is interpreted as memory address plus the constant offset -8. This behavior is another addressing mode which is called register-relative addressing. We hear more about that below. The output $gp=0x101A0,$a0=65952(0x101A0) |- mem[0x10198]=0 -> mem[0x10198]=$a0=65952(0x101A0) next to the instruction shows exactly what is happening. The value in $a0, which is 0x101A0 (the program break, remember?), is stored in the memory word at address 0x10198, which is the value in $gp, that is, 0x101A0, plus -8. In other words, the program break is stored 8 bytes below the program break in memory. This is where the hidden _bump variable that we mentioned before is stored. Its purpose is to maintain a so-called bump pointer to a memory region called the heap where unused memory in the dynamically allocated part of memory is available. The details of that are explained in the name chapter.

Interestingly, the store operation is actually mentioned by the profiler in the above output, that is, in stores: 15,...,1(6.66%)@0x30(~1),... as one of the second most executed operations among a total of 15 store operations even though it is only executed once which corresponds to 6.66% of all store operations. The address 0x30 is the address of the store operation in the binary, not in memory, which would be the beginning of the code in memory, that is, 0x10000 plus 0x30 equal 0x10030.

The next three instructions at addresses 0x10038 to 0x10040 involve the $sp register which stands for stack pointer and is register 2 among the 32 general-purpose registers. The purpose of the stack pointer is to manage the previously mentioned stack which facilitates fast dynamic allocation of memory for procedures. At all times during code execution, the value of the stack pointer is an address that points to memory below which data can be stored temporarily. The details are explained in the stack chapter.

What is interesting here is that $sp is already set to the non-zero value 0xFFFFFFC0 as reported by the output $sp=0xFFFFFFC0 |- $t0=7(0x7) -> $t0=4294967240(0xFFFFFFC8) next to the instruction at address 0x10038. This happened after loading code, data, and input. Selfie set the stack pointer such that it points to the input of the program in memory. The three instructions here involving $sp are meant to complete stack initialization. However, don’t worry about the details for now. So, imagine, it took us seventeen instructions to initialize the global, bump, and stack pointers. We definitely need a higher-level programming language to raise the level of abstraction. However, as tedious as the machine level might be, it is completely deterministic and rather easy to understand.

For now, the important take-away message here is that we can reconstruct the full state of the machine at any instruction in the above output just by following the arrows -> line by line until that instruction. If you still cannot believe that a computer really is so simple and does work in these tiny steps and does so completely deterministically it is time to reflect about that here again. The machine starts in some given state and proceeds from there by changing very few bits in each step instructed by other bits that are identified by the program counter. That’s it. The only reason why computers appear to be so powerful is because they are so fast and can store enormous amounts of bits. This even applies to computers appearing to do many things at the same time. They don’t. When looking close enough things happen step by step.

Execution

Let us now look at the next instruction which makes the machine execute the code that actually implements the main procedure.

jal

That instruction is the jal $ra,60[0x10134] instruction where jal stands for jump and link. Its purpose is to call the code that implements the main procedure by setting the PC to the address 0x10134 which is where the first instruction of that code is located in memory. The profiler even mentions that first instruction in the above output, that is, in calls: 1,1(100.00%)@0x134(~11),...,... as the most called procedure code even though it is only called once (since there is no other). The jal instruction instructs the processor to jump to an address in memory relative to where the jal instruction is stored and continue code execution there. The addressing mode is called PC-relative addressing. The actual binary encoding of the address in the instruction contains the PC-relative offset in number of 32-bit words, that is, number of instructions, here 60, meaning jump forward by 60 instructions. The target address in bytes is only provided as [0x10134] for better readability.

However, most importantly, before setting the PC and performing the actual jump, the jal instruction sets the $ra register which stands for return address to the value of the $pc register plus 4 bytes, which amounts to 0x10048 in the example here, that is, the address of the next instruction. The output |- $ra=0x0,$pc=0x10044 -> $pc=0x10134,$ra=0x10048 next to the jalinstruction confirms that. The $ra register is register 1 among the 32 general-purpose registers. Its purpose is to remember or link to where code execution should resume when done with the code to which the machine is going to jump.

jalr

In the above output we are not showing the instructions implementing the main procedure except for the very last one which is the natural counterpart to the jal instruction. Let us first focus on that instruction and then see how the machine shuts down before explaining how the main procedure is implemented.

The jalr $zero,0($ra) instruction sets the PC to the value of the $ra register where jalr stands for jump and link register. The addressing mode is called register addressing. Here, the value of $ra is 0x10048 which is the address of the instruction that follows the jal instruction that took us here. The output $ra=0x10048 |- $pc=0x1018C -> $pc=0x10048 next to the jalr instruction confirms that behavior. In other words, jal and jalr work together by instructing the machine to jump to some code, execute it, and eventually come back to resume code execution right after the jump. Note that using the $ra register for this purpose is a software convention. In principle, other unused general-purpose registers could be used as well.

Termination

So, with the PC now pointing to the address 0x10048 in memory, the next four instructions to be executed prepare program termination. They actually copy the value returned by the main procedure, which is 0, into the $a0 register. This is something we could have done with zero instructions since $a0 already contained 0 before but never mind.

ld

Among these four instructions the instruction we have not seen yet is the ld $a0,0($sp) instruction at address 0x10050 where ld stands for load double word. This instruction loads, into the $a0 register, the (double) word from memory stored at the address derived from the value of $sp plus the offset 0, again as indicated by the notation 0($sp). Similar to the sd instruction, the ld instruction uses register-relative addressing and is thus the natural counterpart to the sd instruction. The effect of the four instructions is that the value in $a0 is copied to $a0 via the memory word at address $sp (after decrementing $sp by 8 bytes and before incrementing $sp, again by 8 bytes, back to its original value). The reasoning behind that behavior is explained in the stack chapter.

Interestingly again, this load instruction is also mentioned by the profiler in the above output, that is, in loads: 25,...,...,1(4.00%)@0x50(~1) as one of the third most executed load operations among a total of 25 load operations even though it is only executed once which corresponds to 4% of all load operations.

ecall

The next instruction addi $a7,$zero,93 loads the value 93 into the $a7 register which is register 17 among the 32 general-purpose registers. Upon executing the following ecall instruction, that value in $a7 instructs the machine to output the value in $a0 as exit code, which is 0, and then to shut down. That’s it.

Again, the exact reasoning why things are done this way and what other behavior is supported by mipster is explained in later chapters. Here, we only point out that the ecall instruction, which stands for environment call, does not have any explicit arguments. However, mipster does expect implicit arguments for environment calls provided in at least the $a7 register which identifies among a finite set of choices the functionality that the emulator is supposed to perform. The $a0 register can then be used as argument of an environment call to pass additional information such as an exit code. The term environment call is a generalization of the much older term system call. Since environment calls in selfie are implemented in two ways, that is, by emulated hardware (mipster) and by system software (hypster), we generally refer to them as environment calls but also as system calls, in particular when emphasizing the latter.

Statement

So, how does the main procedure of countdown actually work? A procedure in C* consists of a sequence of statements which are the C* counterpart to RISC-U machine instructions. In fact, each statement translates to a sequence of machine instructions generated automatically by starc during compilation.

Statement
The smallest standalone element of an imperative programming language that expresses some action to be carried out. It is an instruction written in a high-level language that commands the computer to perform a specified action. A program written in such a language is formed by a sequence of one or more statements. A statement may have internal components (e.g., expressions).

C* features only five different kinds of statements: assignment, while, if, procedure call, and return. We discuss assignment, while, and return here and explain if and procedure call in subsequent chapters. The default control flow in C*, just like in RISC-U, is sequential from one statement to the next. However, while only 3 out of the 14 RISC-U machine instructions are control flow instructions, all statements but assignment are control flow statements in C*.

Here is the output of mipster when executing the instructions that implement the main procedure. The ... part in the middle of the output is repetitive and skipped for brevity but explained below. The first five and the last six instructions, including the jalr instruction, correspond to the so-called prologue and epilogue of the procedure and can safely be ignored for now. They are generated by starc for all procedures. An important property is that the last five instructions before the jalr instruction undo exactly what the first five instructions did to the involved registers. The $fp register is explained in the next two chapters. Look at the register values before and after executing these instructions to see for yourself!

> ./selfie -c manuscript/code/countdown.c -d 1
...
$pc=0x10134(~11): addi $sp,$sp,-8: $sp=0xFFFFFFB8 |- $sp=0xFFFFFFB8 -> $sp=0xFFFFFFB0
$pc=0x10138(~11): sd $ra,0($sp): $sp=0xFFFFFFB0,$ra=0x10048 |- mem[0xFFFFFFB0]=0 -> \
mem[0xFFFFFFB0]=$ra=0x10048
$pc=0x1013C(~11): addi $sp,$sp,-8: $sp=0xFFFFFFB0 |- $sp=0xFFFFFFB0 -> $sp=0xFFFFFFA8
$pc=0x10140(~11): sd $fp,0($sp): $sp=0xFFFFFFA8,$fp=0x0 |- mem[0xFFFFFFA8]=0 -> mem[\
0xFFFFFFA8]=$fp=0x0
$pc=0x10144(~11): addi $fp,$sp,0: $sp=0xFFFFFFA8 |- $fp=0x0 -> $fp=0xFFFFFFA8
---
$pc=0x10148(~11): ld $t0,-16($gp): $gp=0x101A0,mem[0x10190]=10 |- $t0=4294967240(0xF\
FFFFFC8) -> $t0=10(0xA)=mem[0x10190]
$pc=0x1014C(~11): addi $t1,$zero,0: $zero=0(0x0) |- $t1=0(0x0) -> $t1=0(0x0)
$pc=0x10150(~11): sltu $t0,$t1,$t0: $t1=0(0x0),$t0=10(0xA) |- $t0=10(0xA) -> $t0=1(0\
x1)
$pc=0x10154(~11): beq $t0,$zero,6[0x1016C]: $t0=1(0x1),$zero=0(0x0) |- $pc=0x10154 -\
> $pc=0x10158
$pc=0x10158(~13): ld $t0,-16($gp): $gp=0x101A0,mem[0x10190]=10 |- $t0=1(0x1) -> $t0=\
10(0xA)=mem[0x10190]
$pc=0x1015C(~13): addi $t1,$zero,1: $zero=0(0x0) |- $t1=0(0x0) -> $t1=1(0x1)
$pc=0x10160(~13): sub $t0,$t0,$t1: $t0=10(0xA),$t1=1(0x1) |- $t0=10(0xA) -> $t0=9(0x\
9)
$pc=0x10164(~13): sd $t0,-16($gp): $gp=0x101A0,$t0=9(0x9) |- mem[0x10190]=10 -> mem[\
0x10190]=$t0=9(0x9)
$pc=0x10168(~19): jal $zero,-8[0x10148]: |- $pc=0x10168 -> $pc=0x10148
---
$pc=0x10148(~11): ld $t0,-16($gp): $gp=0x101A0,mem[0x10190]=9 |- $t0=9(0x9) -> $t0=9\
(0x9)=mem[0x10190]
$pc=0x1014C(~11): addi $t1,$zero,0: $zero=0(0x0) |- $t1=1(0x1) -> $t1=0(0x0)
$pc=0x10150(~11): sltu $t0,$t1,$t0: $t1=0(0x0),$t0=9(0x9) |- $t0=9(0x9) -> $t0=1(0x1)
$pc=0x10154(~11): beq $t0,$zero,6[0x1016C]: $t0=1(0x1),$zero=0(0x0) |- $pc=0x10154 -\
> $pc=0x10158
$pc=0x10158(~13): ld $t0,-16($gp): $gp=0x101A0,mem[0x10190]=9 |- $t0=1(0x1) -> $t0=9\
(0x9)=mem[0x10190]
$pc=0x1015C(~13): addi $t1,$zero,1: $zero=0(0x0) |- $t1=0(0x0) -> $t1=1(0x1)
$pc=0x10160(~13): sub $t0,$t0,$t1: $t0=9(0x9),$t1=1(0x1) |- $t0=9(0x9) -> $t0=8(0x8)
$pc=0x10164(~13): sd $t0,-16($gp): $gp=0x101A0,$t0=8(0x8) |- mem[0x10190]=9 -> mem[0\
x10190]=$t0=8(0x8)
$pc=0x10168(~19): jal $zero,-8[0x10148]: |- $pc=0x10168 -> $pc=0x10148
...
$pc=0x10148(~11): ld $t0,-16($gp): $gp=0x101A0,mem[0x10190]=1 |- $t0=1(0x1) -> $t0=1\
(0x1)=mem[0x10190]
$pc=0x1014C(~11): addi $t1,$zero,0: $zero=0(0x0) |- $t1=1(0x1) -> $t1=0(0x0)
$pc=0x10150(~11): sltu $t0,$t1,$t0: $t1=0(0x0),$t0=1(0x1) |- $t0=1(0x1) -> $t0=1(0x1)
$pc=0x10154(~11): beq $t0,$zero,6[0x1016C]: $t0=1(0x1),$zero=0(0x0) |- $pc=0x10154 -\
> $pc=0x10158
$pc=0x10158(~13): ld $t0,-16($gp): $gp=0x101A0,mem[0x10190]=1 |- $t0=1(0x1) -> $t0=1\
(0x1)=mem[0x10190]
$pc=0x1015C(~13): addi $t1,$zero,1: $zero=0(0x0) |- $t1=0(0x0) -> $t1=1(0x1)
$pc=0x10160(~13): sub $t0,$t0,$t1: $t0=1(0x1),$t1=1(0x1) |- $t0=1(0x1) -> $t0=0(0x0)
$pc=0x10164(~13): sd $t0,-16($gp): $gp=0x101A0,$t0=0(0x0) |- mem[0x10190]=1 -> mem[0\
x10190]=$t0=0(0x0)
$pc=0x10168(~19): jal $zero,-8[0x10148]: |- $pc=0x10168 -> $pc=0x10148
---
$pc=0x10148(~11): ld $t0,-16($gp): $gp=0x101A0,mem[0x10190]=0 |- $t0=0(0x0) -> $t0=0\
(0x0)=mem[0x10190]
$pc=0x1014C(~11): addi $t1,$zero,0: $zero=0(0x0) |- $t1=1(0x1) -> $t1=0(0x0)
$pc=0x10150(~11): sltu $t0,$t1,$t0: $t1=0(0x0),$t0=0(0x0) |- $t0=0(0x0) -> $t0=0(0x0)
$pc=0x10154(~11): beq $t0,$zero,6[0x1016C]: $t0=0(0x0),$zero=0(0x0) |- $pc=0x10154 -\
> $pc=0x1016C
---
$pc=0x1016C(~19): ld $t0,-16($gp): $gp=0x101A0,mem[0x10190]=0 |- $t0=0(0x0) -> $t0=0\
(0x0)=mem[0x10190]
$pc=0x10170(~19): addi $a0,$t0,0: $t0=0(0x0) |- $a0=0(0x0) -> $a0=0(0x0)
$pc=0x10174(~19): jal $zero,1[0x10178]: |- $pc=0x10174 -> $pc=0x10178
---
$pc=0x10178(~20): addi $sp,$fp,0: $fp=0xFFFFFFA8 |- $sp=0xFFFFFFA8 -> $sp=0xFFFFFFA8
$pc=0x1017C(~20): ld $fp,0($sp): $sp=0xFFFFFFA8,mem[0xFFFFFFA8]=0x0 |- $fp=0xFFFFFFA\
8 -> $fp=0x0=mem[0xFFFFFFA8]
$pc=0x10180(~20): addi $sp,$sp,8: $sp=0xFFFFFFA8 |- $sp=0xFFFFFFA8 -> $sp=0xFFFFFFB0
$pc=0x10184(~20): ld $ra,0($sp): $sp=0xFFFFFFB0,mem[0xFFFFFFB0]=0x10048 |- $ra=0x100\
48 -> $ra=0x10048=mem[0xFFFFFFB0]
$pc=0x10188(~20): addi $sp,$sp,8: $sp=0xFFFFFFB0 |- $sp=0xFFFFFFB0 -> $sp=0xFFFFFFB8
$pc=0x1018C(~20): jalr $zero,0($ra): $ra=0x10048 |- $pc=0x1018C -> $pc=0x10048
...

While Statement

The execution of a procedure like main always begins with the first statement of the procedure which is here the while statement, also called while loop, in Line 11 in countdown.c. The meaning of that statement is to check if the value of the global variable bar is greater than 0 and, if yes, to execute the sequence of statements, called the body of the while loop, enclosed in the curly braces {} right after while (bar > 0). Here, there is only one statement which is the assignment bar = bar - 1. When the body of the while loop is finished executing, control flows back to while (bar > 0) for checking the value of bar again. If the value is still greater than 0 the body of the while loop is executed again in another iteration of the loop and so on. Only if the value of bar is not greater than 0 the body is not executed. In this case, the while loop is terminated and control flows to the statement that follows the while statement, which is here the return bar statement.

While Loop
A control flow statement that allows code to be executed repeatedly based on a given Boolean condition.

Here, bar > 0 is that Boolean condition, also called loop condition, which evaluates to the Boolean values true or false. A possibly confusing convention in C is that the Boolean value false is represented by the integer value 0 while true is represented by the integer value 1 or in fact any integer value other than 0. Many other programming languages avoid that ambiguity by providing Boolean literals such as true and false.

The execution of a while loop always results in checking the loop condition at least once. However, the body of the while loop may be executed any number of times, including zero times, depending on the Boolean condition and the statements in the body. For example, while (0) { ... } would find its loop condition to evaluate to false and therefore proceed to the next statement, skipping any statements in its body, whereas while (1) { ... } would always find its loop condition to evaluate to true and therefore never proceed to the next statement. Try that yourself by modifying the countdown program accordingly!

You may also ask yourself if there can be nested while loops, that is, while statements in the body of a while loop. The answer is yes, of course, any finite number of times in fact. Imagine what this can do, a loop inside of a loop inside of a loop. This is very powerful and can get quite tricky so we stay away from that for now.

Let us now take a look at the machine code generated by starc for the while loop and see how it executes. The instructions implement exactly what we just described informally.

The first instruction of the while loop is ld $t0,-16($gp) at address 0x10148. Its purpose is to load from memory the value of the global variable bar occurring in while (bar > 0) into register $t0 for comparison with 0. The value of bar is stored in memory in the word at the address derived from $gp plus the offset -16 (bytes), that is, the value 0x10 in hexadecimal. Remember, $gp points to the first word in memory above the data segment. Since the value of $gp is 0x101A0, the actual address of the value of bar in memory is 0x10190 in hexadecimal. If there was a third global variable, besides _bump and bar, its value would be stored at the address derived from $gp plus the offset -24 (bytes) and so on. The output $gp=0x101A0,mem[0x10190]=10 |- $t0=4294967240(0xFFFFFFC8) -> $t0=10(0xA)=mem[0x10190] next to the instruction shows that the value stored at 0x10190 is in fact 10 which is the initial value of bar. How did that value get there? The boot loader put it there! So 10 was there after loading the code and data generated by starc even before mipster started executing any code.

The next instruction of the while loop is addi $t1,$zero,0 which loads the value of 0 occurring in while (bar > 0) into register $t1 for comparison with $t0.

sltu

The actual comparison is performed by the sltu $t0,$t1,$t0 instruction where sltu stands for set on less than unsigned. It sets the value of $t0 to 1 (first occurrence of $t0 in the instruction) if the unsigned value of $t1, which is here 0, is less than the unsigned value of $t0 (second occurrence of $t0), which is here 10. The output $t1=0(0x0),$t0=10(0xA) |- $t0=10(0xA) -> $t0=1(0x1) confirms that. If the value of $t1 was not less than the value of $t0 it would set $t0 to 0, which is exactly what happens when the value of bar stored at address 0x10190 eventually becomes 0, see the last occurrence of sltu $t0,$t1,$t0 in the above output.

beq

The next instruction is the beq $t0,$zero,6[0x1016C] instruction where beq stands for branch on equal. Its purpose is to check the result of the previous sltu instruction stored in $t0 and see if it is 0 or 1 by comparing it with the value of $zero which is still just 0. If the result is 1, which effectively means that the value of bar is greater than 0, nothing happens other than an increment of the PC by 4 bytes from 0x10154 to 0x10158 which is where the next instruction is located. The ld instruction there is the first instruction implementing the body of the while loop.

Now, what would happen if the result of the previous sltu instruction was 0, meaning that the value of bar was not greater than 0? This does in fact happen exactly once in our example, see the last occurrence of beq $t0,$zero,6[0x1016C] in the above output. In that case, the instruction branches forward to the first instruction after the 5 instructions that implement the body of the while loop, effectively terminating the while loop. In particular, beq $t0,$zero,6[0x1016C] increments in that case the PC with the offset 6 times 4 bytes from 0x10154 to 0x1016C. The addressing mode is thus PC-relative addressing. The address in absolute terms is only provided as 0x1016C for better readability. Branching typically uses PC-relative addressing rather than, say, register-relative or absolute addressing. The way we use the beq instruction here is called conditional branching.

Branch
An instruction in a computer program that can cause a computer to begin executing a different instruction sequence and thus deviate from its default behavior of executing instructions in order. A branch instruction can be either an unconditional branch, which always results in branching, or a conditional branch, which may or may not cause branching, depending on some condition. Branch instructions are used to implement control flow in program loops and conditionals (i.e., executing a particular sequence of instructions only if certain conditions are satisfied).

Before explaining the instructions that implement the body of the while loop we focus on the last instruction that implements the while statement. It is the jal $zero,-8[0x10148] instruction at address 0x10168 which is in fact used here for backward jumping. The instruction unconditionally jumps backwards by 8 instructions (due to the negative offset -8) to the first instruction that implements the while statement at address 0x10148. In other words, the unconditional backward jump completes the loop.

For brevity we are only showing in the above output the first two iterations and the last, in fact, tenth iteration of the loop. The profiler even mentions in loops: 10,10(100.00%)@0x148(~11),...,... the total number of loop iterations taken during program execution. It also reports that they were all done by the same while loop with its first instruction at address 0x148 in the binary, that is, 0x10148 in memory.

Also, note that the only relevant thing that changes in the machine’s state from one loop iteration to the next is the value of bar in memory. The values of $t0 and $t1 also change but are always overwritten in the next iteration and therefore never used beyond one iteration.

Moreover, the control structure of a while statement is only implemented by the beq and jal instructions we just explained, that is, a conditional forward branch and an unconditional backward jump. The instructions before the forward branch that belong to the while statement implement the loop condition and the instructions between the forward branch and backward jump implement the body of the while loop. Let us now have a look at the body of the while loop in our example.

Assignment

The assignment bar = bar - 1 in Line 13 in countdown.c constitutes the body of the previously discussed while loop. The assignment decrements the value of bar by 1, that is, it loads the value of bar (right hand side of =) from memory, subtracts 1 from that value, and stores the resulting value in the memory for bar (left hand side of =), overwriting the previous value of bar in memory.

Assignment
Sets and/or re-sets the value stored in the storage location(s) denoted by a variable name; in other words, it copies a value into the variable. In most imperative programming languages, the assignment statement (or expression) is a fundamental construct.

What does fundamental construct mean? In imperative programming languages assignments are the only way to change state other than control state which is the portion of state that represents the current state of control flow, that is, the currently executed statement. In the countdown program the only non-control-state information is thus the value of bar in memory. That’s it! On source code level, the values of all other memory and all registers is therefore not relevant for the correctness of countdown.

Expression

The right hand side of an assignment provides a so-called expression for computing non-control-state information, that is, the next value of the variable on the left hand side of the assignment. In bar = bar - 1 that expression is obviously bar - 1.

Expression
A combination of one or more explicit values, constants, variables, operators, and functions that the programming language interprets (according to its particular rules of precedence and of association) and computes to produce (“to return”, in a stateful environment) another value. This process, as for mathematical expressions, is called evaluation.

In other words, the expression bar - 1 could be more complex and involve other variables, constants, operators, and even functions. C*, for example, supports integer, character, and string literals as constants as well as operators for addition, subtraction, multiplication, division, and even remainder. It also supports procedure calls in expressions as functions.

A Boolean condition such as bar > 0 is actually also an example of an expression involving the comparison operator >. This means that a loop condition could be more complex and involve other comparison operators supported by C* as well.

Let us now again take a look at the machine code generated by starc for the assignment and see how it executes. Just like before, the instructions implement exactly what we described above informally.

The first two instructions of the assignment ld $t0,-16($gp) at address 0x10158 and addi $t1,$zero,1 at address 0x1015C load the values of bar and 1 into $t0 and $t1, respectively, to prepare for the evaluation of the expression bar - 1.

sub

The next instruction is sub $t0,$t0,$t1 which actually implements the subtraction in bar - 1, that is, the - operator, by subtracting the value of $t1 from the value of $t0 (second occurrence of $t0) and store the result in $t0 (first occurrence of $t0). Unsurprisingly, sub stands for subtract, analogous to addi. Unlike addi, however, sub uses register addressing, as opposed to immediate addressing. In other words, its third argument is a register, not an integer value.

At this point the evaluation of the expression and thus the right hand side of the = operator is complete. The only remaining thing to do is to perform the actual assignment. This is done by sd $t0,-16($gp) which is the last instruction generated by starc for the assignment, that is, for the left hand side of the = operator. Analogous to ld $t0,-16($gp), which loads the memory word that represents the value of bar into $t0, sd $t0,-16($gp) stores the result of the evaluation in $t0 in that memory word. That’s it! Interestingly, an assignment is actually implemented by only one instruction, namely that sd instruction. The other instructions are there for evaluating the expression in the right hand side of the assignment.

Return Statement

Once the while statement and the assignment in countdown.c has been executed ten times, control reaches the return statement return bar in Line 19. That statement does two things. It provides the value of bar as return value of the main procedure, and it terminates the execution of the body of main, even if there were more statements after return bar, and returns to the code that invoked main.

Return Statement
Causes execution to leave the current subroutine and resume at the point in the code immediately after where the subroutine was called, known as its return address. The return address is saved, usually on the process’s call stack, as part of the operation of making the subroutine call. Return statements in many languages allow a function to specify a return value to be passed back to the code that called the function.

We mentioned the return address for the invocation of the main procedure before, which is 0x10048. However, we did not mention that it is actually saved in memory where the stack pointer refers to by the sd $ra,0($sp) instruction at 0x10138 in the prologue of the procedure. There is also the ld $ra,0($sp) instruction at 0x10184 in the epilogue of the procedure that matches the sd instruction. The details of why this is done are explained in the stack chapter. Also, the terms procedure and function are used synonymously here. We prefer to just use procedure to avoid confusion with functions in a mathematical sense.

Recall that the main procedure in our example has a return type uint64_t declared in Line 6 in countdown.c. The data type of any value returned by a return statement in the body of main needs to match that return type. In our example, return bar is fine because the data type of bar is also uint64_t. Also, any expressions more complex than just bar are possible as long as they evaluate to a value of the return type. Try return bar + 7, for example.

Again, let us now take a look at the machine code generated by starc for the return statement and see how it executes. Just like before, the instructions implement exactly what we described above informally.

Upon terminating the execution of the instructions that implement the while loop in countdown.c, the machine takes the forward branch of the beq $t0,$zero,6[0x1016C] instruction at address 0x10154 to proceed to the ld $t0,-16($gp) instruction at address 0x1016C. This instruction is generated by starc for loading the value of bar occurring in return bar from memory into $t0.

The next instruction is addi $a0,$t0,0. Its purpose is to copy the value of $t0 to $a0. So, now the value of bar is in register $a0. Why? Because starc generates code such that return values of procedures are always stored in $a0. It is another software convention. Recall that the four instructions starting at address 0x10048 to which we eventually return do something with $a0. In fact, they copy the value of $a0 to, well, $a0 via the stack. The following ecall instruction takes the value of $a0 and outputs it as exit code. Voila.

The last instruction implementing the return statement is jal $zero,1[0x10178]. Its purpose is to jump to the epilogue of the procedure. Here, it does so by merely jumping to the next instruction since the epilogue is right there. In other words, the jump is actually not necessary. However, this is not true in general, namely whenever the return statement is not the last statement in the body of a procedure. To keep things simple we nevertheless have starc generate the jal instruction anyway.

Hard to believe but this concludes the discussion of the countdown program. We have seen all there is to say about it. However, there are four more RISC-U instructions that we have not seen or discussed yet. They, together with the sub instruction, are the RISC-U instructions in selfie for 64-bit integer arithmetic.

add

Analogous to the sub instruction, the add instruction adds the value of one general-purpose register to the value of another general-purpose register and stores the result in yet another general-purpose register.

mul

Similarly, the mul instruction multiplies the value of one general-purpose register with the value of another general-purpose register and stores the result in another general-purpose register.

divu

The divu instruction divides the value of one general-purpose register by the value of another general-purpose register and stores the result in another general-purpose register. The difference to the add, sub, and mul instructions is that the divu instruction specifically performs unsigned integer division, divu stands for divide unsigned. Integer division differs for signed and unsigned integers while integer addition, subtraction, and multiplication does not.

remu

The remu instruction is just like the divu instruction except that it computes the remainder.

With the full RISC-U instruction set introduced, there is one more thing we would like to point out here. In the above output, selfie provides statistical information on how many instructions of each type are generated during compilation by starc and how many are actually executed by mipster:

 1 > ./selfie -c manuscript/code/countdown.c -m 1
 2 ./selfie: selfie compiling manuscript/code/countdown.c with starc
 3 ...
 4 ./selfie: 416 bytes generated with 100 instructions and 16 bytes of data
 5 ./selfie: init:    lui: 1(0.83%), addi: 64(53.33%)
 6 ./selfie: memory:  ld: 19(15.83%), sd: 7(5.83%)
 7 ./selfie: compute: add: 1(0.83%), sub: 3(2.50%), mul: 0(0.00%), divu: 0(0.00%), remu\
 8 : 2(1.66%)
 9 ./selfie: control: sltu: 1(0.83%), beq: 5(4.16%), jal: 3(2.50%), jalr: 6(5.00%), eca\
10 ll: 8(6.66%)
11 ./selfie: selfie executing manuscript/code/countdown.c with 1MB physical memory on m\
12 ipster
13 ./selfie: manuscript/code/countdown.c exiting with exit code 0 and 0.00MB mallocated\
14  memory
15 ./selfie: selfie terminating manuscript/code/countdown.c with exit code 0
16 ./selfie: summary: 132 executed instructions and 0.00MB mapped memory
17 ./selfie: init:    lui: 1(0.75%), addi: 41(31.60%)
18 ./selfie: memory:  ld: 25(18.93%), sd: 15(11.36%)
19 ./selfie: compute: add: 0(0.00%), sub: 11(8.33%), mul: 0(0.00%), divu: 0(0.00%), rem\
20 u: 1(0.75%)
21 ./selfie: control: sltu: 11(8.33%), beq: 11(8.33%), jal: 12(9.90%), jalr: 1(0.75%), \
22 ecall: 3(2.27%)
23 ...

For example, when compiling countdown.c starc generated 3 sub instructions which is around 2.5% of the total number of generated instructions (sub: 3(2.50%) in Line 7). When executing the generated code sub instructions are executed 11 times which is around 8.33% of the total number of executed instructions (sub: 11(8.33%) in Line 19).

Pointer

Before concluding this chapter we go back to the “Hello World!” program presented in the previous chapter:

A “Hello World!” Program
 1 // global variable for pointing to the "Hello World!    " string
 2 uint64_t* foo;
 3 
 4 // main procedure for printing "Hello World!    " on the console
 5 uint64_t* main() {
 6   // point to the "Hello World!    " string
 7   foo = "Hello World!    ";
 8 
 9   // strings are actually stored in chunks of 8 characters in memory,
10   // that is, here as "Hello Wo", and "rld!    " which allows us to
11   // print them conveniently in chunks of 8 characters at a time
12 
13   // as long as there are characters print them
14   while (*foo != 0) {
15     // 1 means that we print to the console
16     // foo points to a chunk of 8 characters
17     // 8 means that we print 8 characters
18     write(1, foo, 8);
19 
20     // go to the next chunk of 8 characters
21     foo = foo + 1;
22   }
23 }

Here, we are interested in how the "Hello World!" string is actually handled in machine code. Consider the global variable declaration in Line 2. It declares a global variable foo. However, its data type is not uint64_t, it is uint64_t*. So, what is the difference? Values of both types are represented by 64 bits. So, that is not the difference. The only difference is the intention of what to do with them. Values of type uint64_t are obviously meant to be unsigned 64-bit integers which is reflected in the semantics of the operations on them which is of course integer arithmetics. Values of type uint64_t*, however, are meant to be pointers to values of type uint64_t stored in memory and identified by their memory addresses.

Pointer
A programming language object, whose value refers to (or “points to”) another value stored elsewhere in the computer memory using its memory address. A pointer references a location in memory, and obtaining the value stored at that location is known as dereferencing the pointer.

Line 7 in the "Hello World!" program provides an example. In the previous chapter, we already explained that the string "Hello World! " is stored contiguously in memory in two words encoding "Hello Wo" and "rld! ", and 0 to terminate the string. The assignment in Line 7 takes the address of the first word in memory encoding "Hello Wo" and assigns that address as value to foo. In other words, after the assignment foo points to the beginning of the "Hello World! " string in memory.

So, how do we now get access to the string, or in fact any data in memory to which a pointer points to? The programming language C and C* as well provide a unary operator for this purpose called the dereference operator denoted by *. Even though this operator is denoted by the same symbol it has nothing to do with the binary operator for multiplication. The dereference operator takes as argument a pointer for which it returns the value stored at the address in memory where the pointer points to. It is in fact the only way in C* to access data anywhere in memory hence the name C*.

Line 14 is an example of how to use the dereference operator. The expression *foo in while (*foo != 0) returns as integer value the word that encodes "Hello Wo" stored at the address in memory to which foo points to. Since that value is not equal to 0 (the != comparison operator checks for inequality) the body of the while loop is executed. The statement in Line 18 calls the write procedure to output the 8 characters encoded by that value to the console. The details of how this is done are explained in a later chapter.

The following assignment in Line 21 is interesting. It takes the value of foo and increments it by 8 (!), not 1, as the expression foo + 1 may suggest. The reason is that the + operator implements addition according to pointer arithmetics rather than integer arithmetics if the first operand is a pointer to an integer value (foo) and the second operand is just an integer value (1). In this case, + works by first multiplying the second operand 1 with the size of the data type in bytes to which the first operand foo points to. Only then the result of that multiplication, which is 8 because the size of an integer value is 8 bytes, is added to the value of foo. After the assignment, foo thus points to the word that encodes "rld! ".

After one more iteration of the while loop, the "Hello World! " string is output to the console and foo points to the word that terminates the string by encoding 0 effectively terminating the loop as well. The purpose of pointer arithmetics is to provide a convenient way to move a pointer across memory at the granularity of the size of the data type to which the pointer points to.

Let us now again take a look at the relevant machine code generated by starc for Lines 7, 14, and 21 in the "Hello World!" program and see how it executes. Just like before, the instructions implement exactly what we described above informally.

> ./selfie -c manuscript/code/hello-world.c -d 1
./selfie: selfie compiling manuscript/code/hello-world.c with starc
...
./selfie: selfie executing manuscript/code/hello-world.c with 1MB physical memory on\
 mipster
...
$pc=0x10148(~7): addi $t0,$zero,-40: $zero=0(0x0) |- $t0=4294967240(0xFFFFFFC8) -> $\
t0=-40(0xFFFFFFFFFFFFFFD8)
$pc=0x1014C(~7): add $t0,$gp,$t0: $gp=0x101F8,$t0=-40(0xFFFFFFFFFFFFFFD8) |- $t0=-40\
(0xFFFFFFFFFFFFFFD8) -> $t0=66000(0x101D0)
$pc=0x10150(~7): sd $t0,-16($gp): $gp=0x101F8,$t0=66000(0x101D0) |- mem[0x101E8]=0 -\
> mem[0x101E8]=$t0=66000(0x101D0)
$pc=0x10154(~14): ld $t0,-16($gp): $gp=0x101F8,mem[0x101E8]=66000 |- $t0=66000(0x101\
D0) -> $t0=66000(0x101D0)=mem[0x101E8]
$pc=0x10158(~14): ld $t0,0($t0): $t0=0x101D0,mem[0x101D0]=8022916924116329800 |- $t0\
=66000(0x101D0) -> $t0=8022916924116329800(0x6F57206F6C6C6548)=mem[0x101D0]
...
$pc=0x10198(~21): ld $t0,-16($gp): $gp=0x101F8,mem[0x101E8]=66000 |- $t0=8(0x8) -> $\
t0=66000(0x101D0)=mem[0x101E8]
$pc=0x1019C(~21): addi $t1,$zero,1: $zero=0(0x0) |- $t1=0(0x0) -> $t1=1(0x1)
$pc=0x101A0(~21): addi $t2,$zero,8: $zero=0(0x0) |- $t2=0(0x0) -> $t2=8(0x8)
$pc=0x101A4(~21): mul $t1,$t1,$t2: $t1=1(0x1),$t2=8(0x8) |- $t1=1(0x1) -> $t1=8(0x8)
$pc=0x101A8(~21): add $t0,$t0,$t1: $t0=66000(0x101D0),$t1=8(0x8) |- $t0=66000(0x101D\
0) -> $t0=66008(0x101D8)
$pc=0x101AC(~21): sd $t0,-16($gp): $gp=0x101F8,$t0=66008(0x101D8) |- mem[0x101E8]=66\
000 -> mem[0x101E8]=$t0=66008(0x101D8)
...
./selfie: manuscript/code/hello-world.c exiting with exit code 0 and 0.00MB mallocat\
ed memory
./selfie: selfie terminating manuscript/code/hello-world.c with exit code 0
./selfie: summary: 110 executed instructions and 0.00MB mapped memory
...

The first relevant instruction is the addi $t0,$zero,-40 instruction at memory address 0x10148 which starc generated for Line 7 to compute the address of the first word of the "Hello World! " string in memory. Why is it 40 bytes below the word in memory to which $gp points to? Simple. String literals, just like the values of global variables, are stored in the data segment in memory. Whenever starc encounters a string literal such as "Hello World! " it counts the number of words necessary to encode and store the string. The compiler then allocates the required space in the data segment and computes the address of that space relative to the global pointer.

Similar to the value of bar in the countdown program, the value of foo is stored 16 bytes below the word to which $gp points to. Since we need 3 words or 24 bytes to store the "Hello World! " string (including its termination) the closest address that still provides enough space is 40 bytes below the word to which $gp points to. The following add $t0,$gp,$t0 instruction computes that address and puts it in $t0. The sd $t0,-16($gp) instruction after that performs the actual assignment into foo by storing the value of $t0 in memory where the value of foo is located.

So, "Hello Wo", "rld! ", and 0 are stored contiguously 40, 32, and 24 bytes, respectively, below the word to which $gp points to. How did they get there? Similar to the initial values of global variables, the boot loader copied them from the binary to memory, even before code execution started. In other words, string literals are stored in binaries as data, just like initial values of global variables.

The next interesting instruction is the ld $t0,0($t0) instruction at address 0x10158 which starc generated for the occurrence of the dereference operator in while (*foo != 0). Before that instruction, the ld $t0,-16($gp) instruction at address 0x10154 loads the value of foo into $t0. The ld $t0,0($t0) instruction dereferences the pointer foo by loading the word into $t0 from memory where $t0 points to. After executing the instruction, $t0 thus contains the value that encodes "Hello Wo". Really? Check the mipster output $t0=0x101D0,mem[0x101D0]=8022916924116329800 |- $t0=66000(0x101D0) -> $t0=8022916924116329800(0x6F57206F6C6C6548)=mem[0x101D0] for that instruction. The decimal number 8022916924116329800 loaded into $t0 is equal to 0x6F57206F6C6C6548 in hexadecimal notation. The hexadecimal number 0x48 is the ASCII code for the uppercase letter H. Moreover, the hexadecimal numbers 0x65 and 0x6C are the ASCII codes for the lowercase letters e and l, respectively. The rest is straightforward. Very nice!

Before concluding this chapter, let us have a quick look at the code generated for the assignment foo = foo + 1 in Line 21. With the values of foo and 1 loaded into $t0 and $t1, respectively, the addi $t2,$zero,8 instruction at address 0x101A0 loads 8, the size of an integer value in bytes, into register $t2 which is another register for temporary results. It is register 7 among the 32 general-purpose registers.

The following mul $t1,$t1,$t2 instructions computes the increment 8 by multiplying the values of $t1 and $t2 and store the result in $t1. The following add $t0,$t0,$t1 instruction increments the value of $t0 by 8. The final sd $t0,-16($gp) instruction performs the actual assignment of the value of foo + 1 according to pointer arithmetics stored in $t0 into the word that stores the value of foo. That’s all.

Summary

Computation is a sequence of states. The execution of a program written in a high-level programming language like C* as well as the execution of machine code implementing that program evolves from one program or machine state to another. A key challenge is to make sure that bad states are never reached during any execution. This is difficult because there are usually very large amounts of possible states, good and bad. Programming correct code in a high-level language rather than a machine language feels easier because there is generally way fewer high-level program states than low-level machine states.

The "Hello World!" program, for example, only has one global variable foo. Its program state consists of the value of foo and the currently executed statement in the body of the main procedure. The same is true for the countdown program with its global variable bar. The machine code for both programs, however, is a lot more complicated because of its size and its state which involves a number of registers and memory locations.

The following table summarizes in which way the high-level programming artifacts we have seen so far relate to the low-level machine artifacts that implement them.

High-Level Programming Artifact Low-Level Machine Artifact
Global Variable Declaration Word in Memory
Global Variable Definition Initial Value in Memory Word
Integer Literal in Definition Value in Memory Word
Character Literal in Definition Character in Memory Word
Data Type Intended Meaning of Bits
Procedure Declaration Address of Code in Memory
Procedure Definition Actual Code in Memory
Procedure Call Jumping and Linking to Prologue
Statement Machine Instructions
Current Statement Program Counter
While Statement Forward and Backward Branching
Assignment Storing Register into Memory Word
Return Statement Forward Jumping to Epilogue
Variable in Expression Loading Memory Word into Register
Integer Literal in Expression Value in Instruction Argument
Character Literal in Expression Character in Instruction Argument
String Literal in Expression Characters in Memory Words
  Offset in Instruction Argument
Pointer Address in Memory
Arithmetic/Comparison Operator Computing with Registers

The next chapter introduces the arguably simplest way to control the state space of programs that still allows to implement a large set of interesting applications. We show which parts of selfie are using that technique and how.

5. Regularity

When reading source code the starc compiler does in fact just try to recognize such symbols, also called tokens. The process is very simple and called lexical analysis.

Lexical Analysis
The process of converting a sequence of characters (such as in a computer program or web page) into a sequence of tokens (strings with an identified “meaning”). A program that performs lexical analysis may be called a lexer, tokenizer, or scanner.

The compiler reads one character after another until it recognizes a symbol, after discarding whitespace and comments, of course. Once it recognizes a symbol it stores that symbol and then continues reading more characters until it recognizes the next symbol and so on until it reaches the end of the file. If the compiler reads a sequence of characters that does not constitute a symbol it reports an error and terminates. Try it! Just take the "Hello World!" program and modify it such that it contains something that is neither whitespace nor a comment nor a C* symbol. Then run starc to see what happens.