
8. Intermezzo: The `abstract.c` Module

We have mentioned several times that the Python virtual machine treats the values it evaluates generically, as PyObjects. This raises an obvious question: how are operations carried out safely on such generic objects? For example, when evaluating the bytecode instruction BINARY_ADD, two PyObject values are taken from the evaluation stack and used as arguments to an add operation, but how does the virtual machine know whether the add operation makes sense for both values?

To understand how many of the operations on PyObjects work, we only have to look at the Objects/abstract.c module. This module defines several functions that operate on objects that implement a given object protocol. For example, when adding two objects, the add function in this module expects the type of at least one operand to fill the nb_add slot of its tp_as_number structure (the slot that a Python-level __add__ method fills). The best way to explain this is to illustrate with an example.
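To make the idea of a protocol concrete before reading the C code, here is a minimal Python-level sketch. It assumes a hypothetical Money class (not part of CPython) and uses operator.add, which in CPython calls PyNumber_Add. Defining __add__ on a class is what fills the nb_add slot for that type; a plain object() fills no such slot, so the addition is rejected.

```python
import operator


class Money:
    """Hypothetical example class; defining __add__ fills the nb_add slot."""

    def __init__(self, cents):
        self.cents = cents

    def __add__(self, other):
        # Decline politely for operands this method does not understand.
        if not isinstance(other, Money):
            return NotImplemented
        return Money(self.cents + other.cents)


# Both operands implement the number protocol's add slot.
total = operator.add(Money(150), Money(275))
print(total.cents)

# Plain objects do not implement it, so the generic add function fails.
error_message = None
try:
    operator.add(object(), object())
except TypeError as exc:
    error_message = str(exc)
print(error_message)
```

The point of the sketch is that the caller never inspects the operands itself; it relies on the type either providing the operation or not.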

Consider the BINARY_ADD opcode adding two numbers: the function that performs the addition is PyNumber_Add from the Objects/abstract.c module. Listing 8.1 shows the definition of the PyNumber_Add function.

Listing 8.1: Generic add function from abstract.c module

 1     PyObject * PyNumber_Add(PyObject *v, PyObject *w){
 2         PyObject *result = binary_op1(v, w, NB_SLOT(nb_add));
 3         if (result == Py_NotImplemented) {
 4             PySequenceMethods *m = v->ob_type->tp_as_sequence;
 5             Py_DECREF(result);
 6             if (m && m->sq_concat) {
 7                 return (*m->sq_concat)(v, w);
 8             }
 9             result = binop_type_error(v, w, "+");
10         }
11         return result;
12     }
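The fallback order in Listing 8.1 can be observed from Python. The sketch below is illustrative only and again goes through operator.add, which calls PyNumber_Add in CPython. For two integers the numeric slot answers. Built-in lists do not provide nb_add, so the call falls through to sq_concat. For an int and a str neither path applies, and the error raised corresponds to the binop_type_error call on line 9.

```python
import operator

# Numeric path: int fills nb_add, so binary_op1 returns a result directly.
numeric = operator.add(2, 3)

# Sequence path: binary_op1 yields NotImplemented, then sq_concat is used.
concatenated = operator.add([1, 2], [3])

# Neither path: int has no sq_concat, so a TypeError is raised.
failure = None
try:
    operator.add(2, "3")
except TypeError as exc:
    failure = str(exc)

print(numeric, concatenated, failure)
```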

Our interest at this point is in line 2 of the PyNumber_Add function in Listing 8.1, the call to the binary_op1 function. binary_op1 is another generic function that takes, among its parameters, two objects (typically numbers or instances of subclasses of numeric types) and applies a binary function to them. The NB_SLOT macro returns the offset of a given method within the PyNumberMethods structure; recall that this structure is a collection of methods that work on numbers. The definition of binary_op1 is in Listing 8.2, and an in-depth explanation follows immediately.

Listing 8.2: The generic binary_op1 function

 1     static PyObject * binary_op1(PyObject *v, PyObject *w, const int op_slot){
 2         PyObject *x;
 3         binaryfunc slotv = NULL;
 4         binaryfunc slotw = NULL;
 5 
 6         if (v->ob_type->tp_as_number != NULL)
 7             slotv = NB_BINOP(v->ob_type->tp_as_number, op_slot);
 8         if (w->ob_type != v->ob_type &&
 9             w->ob_type->tp_as_number != NULL) {
10             slotw = NB_BINOP(w->ob_type->tp_as_number, op_slot);
11             if (slotw == slotv)
12                 slotw = NULL;
13         }
14         if (slotv) {
15             if (slotw && PyType_IsSubtype(w->ob_type, v->ob_type)) {
16                 x = slotw(v, w);
17                 if (x != Py_NotImplemented)
18                     return x;
19                 Py_DECREF(x); /* can't do it */
20                 slotw = NULL;
21             }
22             x = slotv(v, w);
23             if (x != Py_NotImplemented)
24                 return x;
25             Py_DECREF(x); /* can't do it */
26         }
27         if (slotw) {
28             x = slotw(v, w);
29             if (x != Py_NotImplemented)
30                 return x;
31             Py_DECREF(x); /* can't do it */
32         }
33         Py_RETURN_NOTIMPLEMENTED;
34     }

The function takes three arguments: two PyObject * values, v and w, and an integer, op_slot, which is the offset of the operation within the PyNumberMethods structure.
Lines 3 and 4 declare two variables, slotv and slotw, which are pointers to binary functions, as the binaryfunc type suggests.
From line 6 to line 13, we attempt to look up the function at op_slot for both v and w. Line 8 compares the types of the two values; if they are of the same type, there is no need to look up the second value’s function. If the two values are of different types but the functions looked up for both are the same, slotw is set to NULL.
With the binary functions looked up, if slotv is not NULL then line 15 checks that slotw is not NULL and that the type of w is a subtype of the type of v. If so, slotw’s function is applied to v and w first, because the method further down the inheritance tree is the more specific one and should take priority over the one further up. If w is not an instance of a subtype, or if slotw returned Py_NotImplemented, slotv is applied to both values at line 22.
Reaching line 27 means that slotv was either NULL or returned Py_NotImplemented, so we apply whatever slotw references to v and w, provided it is not NULL.
If neither slot contains a function, or every function tried returned Py_NotImplemented, then Py_NotImplemented is returned. Py_RETURN_NOTIMPLEMENTED is just a macro that increments the reference count of the Py_NotImplemented value before returning it.
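The control flow of Listing 8.2 can be modelled in a few lines of Python. This is a sketch, not CPython code: the classes Base, Derived and Other, the functions base_add and derived_add, and the SLOTS table are all hypothetical stand-ins for types and their nb_add slots. The model takes a lookup function in place of the op_slot offset and leaves out reference counting.

```python
def binary_op1(v, w, slot_of):
    """Toy model of binary_op1; slot_of(type) returns a function or None."""
    slotv = slot_of(type(v))
    slotw = None
    if type(w) is not type(v):
        slotw = slot_of(type(w))
        if slotw is slotv:
            slotw = None  # same function, no point in trying it twice
    if slotv is not None:
        if slotw is not None and issubclass(type(w), type(v)):
            x = slotw(v, w)  # the subtype's function gets the first try
            if x is not NotImplemented:
                return x
            slotw = None
        x = slotv(v, w)
        if x is not NotImplemented:
            return x
    if slotw is not None:
        x = slotw(v, w)
        if x is not NotImplemented:
            return x
    return NotImplemented


class Base: pass
class Derived(Base): pass
class Other: pass  # deliberately has no slot


def base_add(v, w):
    # Only handles operands that are both Base (or a subclass of it).
    if isinstance(v, Base) and isinstance(w, Base):
        return "base_add"
    return NotImplemented


def derived_add(v, w):
    return "derived_add"


SLOTS = {Base: base_add, Derived: derived_add}


def slot_of(tp):
    return SLOTS.get(tp)


print(binary_op1(Base(), Base(), slot_of))     # same type: slotv only
print(binary_op1(Base(), Derived(), slot_of))  # subtype on the right goes first
print(binary_op1(Base(), Other(), slot_of))    # nobody can handle it
```

Returning NotImplemented instead of raising lets the caller, PyNumber_Add in Listing 8.1, decide whether to try another protocol or report an error.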

The explanation above is a blueprint for how the interpreter performs operations on values. We have simplified things a bit by ignoring the fact that opcodes can be overloaded. For example, the + symbol maps to the BINARY_ADD opcode and applies to strings, numbers and other sequences, but we have only looked at it in the context of numbers and subclasses of numbers. The BINARY_ADD implementation shown in Listing 8.3 handles the other cases by looking at the types of the values it is operating on and calling the corresponding functions. First, if both values are Unicode strings, the interpreter calls the function for concatenating Unicode strings. Otherwise, the PyNumber_Add function is invoked. As Listing 8.1 shows, that function tries the numeric slot first and then falls back to sequence concatenation.

Listing 8.3: ceval implementation of binary add

    PyObject *right = POP();
    PyObject *left = TOP();
    PyObject *sum;
    if (PyUnicode_CheckExact(left) &&
        PyUnicode_CheckExact(right)) {
        sum = unicode_concatenate(left, right, f, next_instr);
        /* unicode_concatenate consumed the ref to left */
    }
    else {
        sum = PyNumber_Add(left, right);
        Py_DECREF(left);
    }

Ignore the first two lines for now; we discuss them when we talk about the interpreter loop. What the rest of the snippet shows is that when the interpreter encounters BINARY_ADD, the first port of call is a check that both values are exactly strings, in which case string concatenation is applied. Otherwise, the PyNumber_Add function from Objects/abstract.c is applied to both values. Although the code seems a bit messy, with the string check done in Python/ceval.c and the number and sequence checks done in Objects/abstract.c, it is pretty clear what happens when we have an overloaded opcode.
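One consequence of the exact-type check is visible from Python. In the sketch below, Loud is a hypothetical str subclass. Because it is not exactly a str, it would fail a PyUnicode_CheckExact test, so its own __add__ is honoured through the generic path instead of being bypassed by the string shortcut. The example shows only the resulting behaviour, not which branch of the C code ran.

```python
class Loud(str):
    """Hypothetical str subclass that overrides addition."""

    def __add__(self, other):
        return "Loud.__add__ called"


left, right = "a", "b"

# Both operands are exactly str: plain concatenation.
exact = left + right

# The left operand is a str subclass: its override decides the result.
subclassed = Loud(left) + right

print(exact, "|", subclassed)
```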

This is how the interpreter handles most opcode operations: check the types of the values, look up the method as required, and apply it to the argument values.
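The same check-then-dispatch rules surface in Python code through __add__ and __radd__. The classes below are hypothetical and exist only for illustration. The sketch shows the observable behaviour and makes no claim about which C function enforces it for classes defined in Python.

```python
class Base:
    def __add__(self, other):
        return "Base.__add__"

    def __radd__(self, other):
        return "Base.__radd__"


class Derived(Base):
    # Overrides the reflected method, so it is tried before Base.__add__
    # when a Derived instance is the right-hand operand of a Base.
    def __radd__(self, other):
        return "Derived.__radd__"


class Shy:
    # Always declines, which hands the operation to the other operand.
    def __add__(self, other):
        return NotImplemented


same_type = Base() + Base()
subtype_first = Base() + Derived()
fallback = Shy() + Base()

print(same_type, subtype_first, fallback)
```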
