1. Introduction
The Python programming language has been around for a long time. Guido van Rossum started development work on the first version in 1989, and it has since grown to become one of the more popular languages, used in a wide range of applications from graphical interfaces to finance and data analysis.
This write-up looks at the nuts and bolts of the Python interpreter. It targets CPython, which, at the time of writing, is both the most popular implementation of Python and its reference implementation.
I regard the execution of a Python program as split into two or three main phases, as listed below. Which phases are relevant depends on how the interpreter is invoked, and this write-up covers them to varying degrees:
- Initialization: This step covers the setup of the various data structures needed by the Python process and is only relevant when a program is executed non-interactively through the command prompt.
- Compiling: This involves activities such as building parse trees from source code, creating the abstract syntax tree, building the symbol tables and generating code objects.
- Interpreting: This involves the execution of the generated code object’s bytecode within some context.
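The compiling and interpreting phases can be observed from Python itself. The snippet below is only a minimal sketch using the standard library's `ast` and `dis` modules and the built-ins `compile` and `exec`; the source string and the `<example>` filename are made up for illustration, and the snippet does not show how the interpreter drives these phases internally.

```python
import ast
import dis

# Hypothetical one-line program used purely for illustration.
source = "x = 1 + 2"

# Compiling: source text -> abstract syntax tree -> code object.
tree = ast.parse(source, filename="<example>", mode="exec")
code = compile(tree, filename="<example>", mode="exec")

# Show the bytecode held by the code object (output varies by Python version).
dis.dis(code)

# Interpreting: execute the code object's bytecode within a namespace.
namespace = {}
exec(code, namespace)
```

After the call to `exec`, `namespace["x"]` holds the value `3`, which is the visible effect of the bytecode having been executed in that context.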
The methods used in generating parse trees and syntax trees from source code are language-agnostic, so we do not spend much time on these. Building symbol tables and code objects from the abstract syntax tree, on the other hand, is the more exciting part of the compilation phase. This step is more Python-centric, and we pay particular attention to it. Topics we will cover include generating symbol tables, Python objects, frame objects, code objects and function objects. We will also look at how code objects are interpreted and the data structures that support this process.
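To give a first feel for symbol tables and code objects, the sketch below inspects both for a small function using the standard library's `symtable` module and the built-in `compile`. The function `greet` and the `<example>` filename are hypothetical, and the `symtable` module is a Python-level view of the compiler's symbol tables rather than the C structures themselves.

```python
import symtable
import types

# Hypothetical function used purely for illustration.
source = """
def greet(name):
    message = "hello " + name
    return message
"""

# Symbol table for the module; its only child is the table for greet.
module_table = symtable.symtable(source, "<example>", "exec")
func_table = module_table.get_children()[0]
print(func_table.get_name(), func_table.get_parameters(), func_table.get_locals())

# The module's code object stores the function's code object as a constant.
module_code = compile(source, "<example>", "exec")
func_code = next(c for c in module_code.co_consts if isinstance(c, types.CodeType))
print(func_code.co_name, func_code.co_varnames)
```

The names recorded as locals in the function's symbol table (`name` and `message`) are the same names that end up in the code object's `co_varnames`, which hints at how the symbol table feeds code generation.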
This material is for anyone interested in gaining insight into how the CPython interpreter functions. The assumption is that the reader is already familiar with Python and understands the fundamentals of the language. As part of this exposition, we go through a considerable amount of C code, so a reader with a rudimentary understanding of C will find it easier to follow. Beyond that, all that is needed to get through this material is a healthy desire to learn about the CPython virtual machine.
This work is an expanded version of personal notes taken while investigating the inner workings of the Python interpreter. There is a substantial amount of wisdom available in PyCon talk videos, school lectures and blog write-ups. This work would be incomplete without acknowledging these fantastic sources.
At the end of this write-up, a reader should understand the processes and data structures that are crucial to the execution of a Python program. We start next with an overview of the execution of a script passed as a command-line argument to the interpreter. Readers can build the CPython executable from source by following the instructions in the Python Developer’s Guide.