8. Intermezzo: The abstract.c Module
We have mentioned several times that the Python virtual machine treats the values it evaluates generically, as PyObjects. This raises an obvious question: how are operations safely carried out on such generic objects? For example, when evaluating the bytecode instruction BINARY_ADD, two PyObject values are taken from the evaluation stack and used as arguments to an add operation, but how does the virtual machine know whether the add operation makes sense for both values?
To understand how many of the operations on PyObjects work, we only have to look at the Objects/abstract.c module. This module defines several functions that operate on any object that implements a given object protocol. For example, when adding two objects, the add function in this module expects the types of the objects to supply an nb_add function in their tp_as_number slots (this is the slot behind the __add__ method). The best way to explain this is with an example.
Consider the BINARY_ADD opcode adding two numbers: the function that does the addition is the PyNumber_Add function of the Objects/abstract.c module. Listing 8.1 is the definition of the PyNumber_Add function.
 1 PyObject * PyNumber_Add(PyObject *v, PyObject *w){
 2     PyObject *result = binary_op1(v, w, NB_SLOT(nb_add));
 3     if (result == Py_NotImplemented) {
 4         PySequenceMethods *m = v->ob_type->tp_as_sequence;
 5         Py_DECREF(result);
 6         if (m && m->sq_concat) {
 7             return (*m->sq_concat)(v, w);
 8         }
 9         result = binop_type_error(v, w, "+");
10     }
11     return result;
12 }
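The three outcomes of listing 8.1 (a numeric result, a fall back to sequence concatenation, and a type error) can be observed from Python. The snippet below is a small sketch rather than part of the interpreter; it relies on the fact that, in CPython, operator.add is implemented on top of PyNumber_Add, and the variable names are chosen only for illustration.

```python
import operator

# Numeric case: the nb_add slot of int produces the result.
int_sum = operator.add(2, 3)

# Sequence case: lists are added through the sequence protocol (concatenation).
list_sum = operator.add([1], [2])

# Neither protocol applies to an int and a list, so a TypeError is raised.
try:
    operator.add(1, [2])
except TypeError as exc:
    error_message = str(exc)

print(int_sum, list_sum)
print(error_message)
```

The error message comes from the binop_type_error call on line 9 of the listing.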
Our interest at this point is in line 2 of the PyNumber_Add function defined in listing 8.1: the call to the binary_op1 function. The binary_op1 function is another generic function; among its parameters it takes two operands, typically numbers or instances of number subclasses, and applies a binary function to them. The NB_SLOT macro returns the offset of a given method within the PyNumberMethods structure; recall that this structure is a collection of methods that work on numbers. The definition of this generic binary_op1 function is in listing 8.2, and an in-depth explanation of this function immediately follows.
 1 static PyObject * binary_op1(PyObject *v, PyObject *w, const int op_slot){
 2     PyObject *x;
 3     binaryfunc slotv = NULL;
 4     binaryfunc slotw = NULL;
 5
 6     if (v->ob_type->tp_as_number != NULL)
 7         slotv = NB_BINOP(v->ob_type->tp_as_number, op_slot);
 8     if (w->ob_type != v->ob_type &&
 9         w->ob_type->tp_as_number != NULL) {
10         slotw = NB_BINOP(w->ob_type->tp_as_number, op_slot);
11         if (slotw == slotv)
12             slotw = NULL;
13     }
14     if (slotv) {
15         if (slotw && PyType_IsSubtype(w->ob_type, v->ob_type)) {
16             x = slotw(v, w);
17             if (x != Py_NotImplemented)
18                 return x;
19             Py_DECREF(x); /* can't do it */
20             slotw = NULL;
21         }
22         x = slotv(v, w);
23         if (x != Py_NotImplemented)
24             return x;
25         Py_DECREF(x); /* can't do it */
26     }
27     if (slotw) {
28         x = slotw(v, w);
29         if (x != Py_NotImplemented)
30             return x;
31         Py_DECREF(x); /* can't do it */
32     }
33     Py_RETURN_NOTIMPLEMENTED;
34 }
- The function takes three values: two PyObject * values, v and w, and an integer value, op_slot, which is the offset of the operation within the PyNumberMethods structure.
- Lines 3 and 4 define two variables, slotv and slotw, which hold pointers to binary functions, as their binaryfunc type suggests.
- From line 6 to line 13, we attempt to fetch the function given by the op_slot argument for both v and w. On line 8, the types of the two values are compared, and if they are of the same type, there is no need to fetch the second value's function for op_slot. If the values are not of the same type, but the functions fetched from both are equal, then slotw is set to NULL.
- With the binary functions fetched, if slotv is not NULL then on line 15 we check that slotw is not NULL and that the type of w is a subtype of the type of v. If that is true, slotw's function is applied to v and w first. The reason is that the method further down the inheritance tree is the more specific one, so it gets the first chance rather than the one further up. If w is not a subtype, or if slotw returns Py_NotImplemented, then slotv is applied to both values at line 22.
- Reaching line 27 means that slotv is NULL or that it returned Py_NotImplemented, so we apply whatever slotw references to v and w, as long as it is not NULL.
- If neither slotv nor slotw produces a result, Py_NotImplemented is returned. Py_RETURN_NOTIMPLEMENTED is just a macro that increments the reference count of the Py_NotImplemented value before returning it.
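The dispatch order described in the list above can be modelled in a few lines of Python. This is only a sketch of the control flow in listing 8.2, not the interpreter's code: binary_op1_sketch, slot_of, the SLOTS table and the Num, SubNum and Text classes are hypothetical names invented for the illustration, and a plain dictionary lookup stands in for the NB_BINOP macro.

```python
def binary_op1_sketch(v, w, slot_of):
    """Model of binary_op1; slot_of(type) returns a function or None."""
    slotv = slot_of(type(v))
    slotw = None
    if type(w) is not type(v):
        slotw = slot_of(type(w))
        if slotw is slotv:
            slotw = None                 # same function, no need to try it twice
    if slotv is not None:
        if slotw is not None and issubclass(type(w), type(v)):
            x = slotw(v, w)              # the subclass gets the first chance
            if x is not NotImplemented:
                return x
            slotw = None
        x = slotv(v, w)
        if x is not NotImplemented:
            return x
    if slotw is not None:
        x = slotw(v, w)
        if x is not NotImplemented:
            return x
    return NotImplemented


class Num:
    def __init__(self, value):
        self.value = value

class SubNum(Num):
    pass

class Text:
    pass

calls = []  # records which slot functions were tried, in order

def num_add(v, w):
    calls.append("num_add")
    if isinstance(v, Num) and isinstance(w, Num):
        return ("num", v.value + w.value)
    return NotImplemented

def subnum_add(v, w):
    calls.append("subnum_add")
    return ("subnum", v.value + w.value)

# Hypothetical slot table: one add function per type, none for Text.
SLOTS = {Num: num_add, SubNum: subnum_add}

subclass_result = binary_op1_sketch(Num(1), SubNum(2), SLOTS.get)
unsupported_result = binary_op1_sketch(Num(1), Text(), SLOTS.get)
```

In the first call the right operand is an instance of a subclass of the left operand's type, so only subnum_add runs. In the second call only num_add is available; it declines, and the sketch returns NotImplemented, which is the point at which PyNumber_Add would try sequence concatenation or raise a type error.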
The idea captured by the explanation above is a blueprint for how the interpreter performs operations on values. We have simplified things a bit here by ignoring the fact that opcodes can be overloaded. For example, the + symbol maps to the BINARY_ADD opcode and applies to strings, numbers and some sequences, but we have only looked at it in the context of numbers and subclasses of numbers. The BINARY_ADD implementation shown in Listing 8.3 handles the other cases by looking at the types of the values it is operating on and calling the corresponding functions. First, if both values are exactly Unicode strings, the interpreter calls the function for concatenating Unicode strings. Otherwise, the PyNumber_Add function is invoked. That function's implementation, shown in listing 8.1, checks for numeric and then sequence types, applying the corresponding addition function to each.
PyObject *right = POP();
PyObject *left = TOP();
PyObject *sum;
if (PyUnicode_CheckExact(left) &&
        PyUnicode_CheckExact(right)) {
    sum = unicode_concatenate(left, right, f, next_instr);
    /* unicode_concatenate consumed the ref to left */
}
else {
    sum = PyNumber_Add(left, right);
    Py_DECREF(left);
}
Ignore lines 1 and 2 of the snippet for now; we discuss them when we talk about the interpreter loop.
What we see from the rest of the snippet is that when the interpreter encounters BINARY_ADD, the first port of call is a check that both values are strings, in which case string concatenation is applied to them. If they are not strings, the PyNumber_Add function from Objects/abstract.c is applied to both values. Although the code seems a bit messy, with the string check done in Python/ceval.c and the number and sequence checks done in Objects/abstract.c, it is pretty clear what happens when we have an overloaded opcode.
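The string fast path followed by the generic fallback can be sketched in Python as well. The function binary_add_sketch and the Shout class are hypothetical; the exact-type test `type(x) is str` stands in for PyUnicode_CheckExact, and operator.add stands in for the call to PyNumber_Add.

```python
import operator

def binary_add_sketch(left, right):
    # Fast path: both operands are exactly str, not a subclass of str.
    if type(left) is str and type(right) is str:
        return str.__add__(left, right)   # stands in for unicode_concatenate
    # Everything else goes through the generic number/sequence dispatch.
    return operator.add(left, right)


class Shout(str):
    """A str subclass that overrides + to upper-case both operands."""
    def __add__(self, other):
        return Shout(str(self).upper() + str(other).upper())


plain = binary_add_sketch("ab", "cd")           # takes the fast path
custom = binary_add_sketch(Shout("ab"), "cd")   # skips it, uses Shout.__add__
number = binary_add_sketch(1, 2)                # generic numeric addition
```

The sketch suggests one reason for an exact check: a subclass of str may override addition, so only operands whose type is exactly str can safely take the shortcut.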
The process described above is how the interpreter handles most opcode operations: check the types of the values, then dereference the method as required and apply it to the argument values.
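One visible consequence of going through the type of a value (the v->ob_type accesses in listing 8.2) is that the + operator ignores an __add__ attribute stored on an individual instance. The class and variable names below are made up for the illustration.

```python
class Plain:
    def __add__(self, other):
        return "from type"

obj = Plain()
# An attribute on the instance does not change the slot of the type.
obj.__add__ = lambda other: "from instance"

operator_result = obj + 1          # dispatched through type(obj)
explicit_result = obj.__add__(1)   # ordinary attribute lookup finds the lambda
```

Here operator_result is "from type" while explicit_result is "from instance".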