Table of Contents
- 1. Acknowledgements
- 2. An Introduction
- 3. A Very Short Tutorial
- 4. Intermezzo: Glossary
- 5. Objects 201
- 6. Object Oriented Programming
- 7. The Function
- 8. Iterators and Generators
- 9. MetaProgramming and Co.
- 10. Modules And Packages
- 11. Inspecting Objects
- 12. The Zen of Python …
1. Acknowledgements
I would love to take the opportunity to thank all who have reviewed and spotted issues in the manuscript. This includes but is not limited to Ngozi Nwosu for taking the time to review the whole manuscript and point out a whole load of grammatical errors, and Olivia Enewally, Roman Turna and Abhen Ng for pointing out some factual and grammatical errors. A whole lot of other people on Reddit have also pointed out errors, and to those people I am really grateful.
Without the input of all these people, this manuscript would be worth less than it currently is. Thank you all!
2. An Introduction
The Python Programming language has been around for quite a while. Development work was started on the first version of Python by Guido Van Rossum in 1989. Since then, it has grown to become a highly loved and revered language that has been used and continues to be used in a host of different application types.
The Python interpreter and the extensive standard library that come with the interpreter are available for free in source or binary form for all major platforms from the Python Web site. This site also contains distributions of and pointers to many free third party Python modules, programs and tools, and additional documentation.
The Python interpreter can easily be extended with new functions and data types implemented in C, C++ or any other language that is callable from C. Python is also suitable as an extension language for customisable applications. One of the most notable features of Python is its easy, white-space aware syntax.
This book is intended as a concise intermediate level treatise on the Python programming language. There is a need for this due to the lack of availability of materials for python programmers at this level. The material contained in this book is targeted at the programmer that has been through a beginner level introduction to the Python programming language or that has some experience in a different object oriented programming language such as Java and wants to gain a more in-depth understanding of the Python programming language in a holistic manner. It is not intended as an introductory tutorial for beginners although programmers with some experience in other languages may find the very short tutorial included instructive.
The book covers only a handful of topics but tries to provide a holistic and in-depth coverage of these topics. It starts with a short tutorial introduction to get the reader up to speed with the basics of Python; experienced programmers from other object oriented languages such as Java may find that this is all the introduction to Python that they need. This is followed by a discussion of the Python object model, then it moves on to discussing object oriented programming in Python. With a firm understanding of the Python object model, it goes ahead to discuss functions and functional programming. This is followed by a discussion of meta-programming techniques and their applications. The remaining chapters cover generators, a complex but very interesting topic in Python, modules and packaging, and Python runtime services. In between, intermezzos are used to discuss topics that are worth knowing because of the added understanding they provide.
I hope the content of this book achieves the purpose for which it was written. I welcome all feedback readers may have and actively encourage readers to provide such feedback.
2.1 The Evolution of Python
In December 1989, Guido Van Rossum started work on a language that he christened Python. Guido Van Rossum had been part of the team that worked on the ABC programming language as part of the Amoeba operating system in the 1980s at CWI (Centrum Wiskunde & Informatica) in Amsterdam, and although he liked the ABC programming language, he was frustrated by a number of its features, or lack thereof. Guido wanted a high level programming language that would speed up the development of utilities for the Amoeba project, and ABC was not the answer. The ABC programming language would however play a very influential role in the development of Python, as Guido took the parts he liked from the language and provided solutions for the aspects of the ABC programming language that he found frustrating.
Guido released the first version of the Python programming language in February 1991. This release was object oriented, had a module system, included exception handling and functions, and supported the core data types of `list`, `dict`, `str` and others. Python version 1.0 was released in January 1994 and this release included functional programming constructs such as `lambda`, `map`, `filter` and `reduce`.
Python 1.2 was the last version released while Guido was still at CWI. In 1995, Van Rossum continued his work on Python at the Corporation for National Research Initiatives (CNRI) in Reston, Virginia, where he released several versions of the language with indirect funding support from DARPA.
By version 1.4, Python had acquired several new features including the Modula-3 inspired keyword arguments and built-in support for complex numbers. It also included a basic form of data hiding by name mangling. Python 1.5 was released on December 31, 1997 while Python 1.6 followed on September 5, 2000.
Python 2.0 was released on October 16, 2000 and it introduced list comprehensions, a feature borrowed from the functional programming languages SETL and Haskell as well as a garbage collection system capable of collecting reference cycles.
Python 2.2 brought the first major update to the Python type system. This update saw the unification of Python’s built-in types and user defined classes written in Python into one hierarchy. This unification made Python’s object model purely and consistently object oriented. This update to the class system of Python added a number of features that improved the programming experience. These included:
- The ability to subclass built-in types such as `dict` and `list`.
- The addition of static and class methods.
- The addition of properties defined by `get` and `set` methods.
- An update to meta-classes, the `__new__()` and `super()` methods, and the MRO algorithms.
The next major milestone was Python 3, released on December 3, 2008. This version was designed to rectify certain fundamental design flaws in the language, fixes for which could not be implemented while maintaining full backwards compatibility with the 2.x series.
2.2 Python 2 vs Python 3
Perhaps the most visible and disruptive change to the Python ecosystem has been the introduction of Python 3. The major changes introduced into Python 3 include the following:
- `print()` is now a function.
- Some well known Python APIs such as `range()`, `dict.keys()` and `dict.values()` return views and iterators rather than lists, improving efficiency when they are used.
- The rules for ordering comparisons have been simplified. For example, a heterogeneous list cannot be sorted, because all the elements of a list must be comparable to each other.
- The integer types have been whittled down to only one, i.e. `int`; a `long` is now just an `int`.
- The division of two integers returns a float instead of an integer; `//` can be used to carry out integer division.
- All text is now Unicode, but encoded Unicode text is represented as binary data, and any attempt to mix text and data will result in an exception. This breaks backwards compatibility with Python 2.x versions.
- Python 3 also saw the introduction of some new syntax such as function annotations, the `nonlocal` statement, extended iterable unpacking, set literals and dictionary comprehensions.
- Python 3 also saw updates to some existing syntax such as that of exception handling, meta-class specification and list comprehensions.
The full details on the changes from Python 2 to Python 3 can be viewed on the Python website. The rest of the book assumes the use of Python 3.4.
2.3 The Python Programming Language
The Python programming language refers to the language as documented in the language reference.
There is no official language specification but the language reference provides enough details to guide anyone implementing the Python programming language.
The implementation of the Python programming language available on the Python website is written in C and commonly referred to as CPython. This is normally used as the reference implementation. However, there are other Python implementations in different languages; popular among these are PyPy, Python implemented in Python, and Jython, Python implemented in Java. For the rest of this book, the reference CPython version that is freely distributed through the Python website is used.
3. A Very Short Tutorial
This short tutorial introduces the reader informally to the basic concepts and features of the Python programming language. It is not intended to be a comprehensive introductory tutorial to programming for a complete novice but rather assumes the reader has had previous experience programming with an object oriented programming language.
3.1 Using Python
Python is installed by default on most Unix-based systems, including Mac OS and various Linux-based distributions. To check if Python is installed, open the command-line and type `python`. If it is not installed, Python can be installed by visiting the Python language website and following the instructions on how to install Python for the given platform.
The Python Interpreter
The Python interpreter can be invoked by opening the command-line and typing in `python`. In the case that it has been installed but the command cannot be found, the full path to the installation should be used or added to the path. Invoking the interpreter using `python` brings up the Python interactive session with a REPL prompt. The primary prompt, `>>>`, signals a user to enter statements while the secondary prompt, `...`, signals a continuation line, as shown in the following example.
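```python
>>> x = 5
>>> if x > 3:
...     print(x)
...
5
```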
A user can type in Python statements at the interpreter prompt and get instant feedback. For example, we can evaluate expressions at the REPL and get values for such expressions as in the following example.
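```python
>>> 2 + 3 * 4
14
>>> "hello" + " " + "world"
'hello world'
```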
Typing `Ctrl-D` at the primary prompt causes the interpreter to exit the session.
3.2 Python Statements, Line Structure and Indentation
A program in Python is composed of a number of logical lines and each of these logical lines is delimited by the `NEWLINE` token. Each logical line is equivalent to a valid statement. Compound statements however can be made up of multiple logical lines. A logical line is made from one or more physical lines using the explicit or implicit line joining rules. A physical line is a sequence of characters terminated by an end-of-line sequence. Python implicitly treats physical lines as logical lines, eliminating the explicit need for semi-colons in terminating statements as in Java. Semi-colons however do play a role in Python; it is possible to have multiple logical lines on the same physical line by separating the logical lines with semi-colons, as shown below:
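```python
>>> x = 1; y = 2; z = x + y
>>> z
3
```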
Multiple physical lines can be explicitly joined into a single logical line by use of the line continuation character, `\`, as shown below:
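```python
>>> total = 1 + \
...         2 + \
...         3
>>> total
6
```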
Lines are joined implicitly, thus eliminating the need for line continuation characters, when an expression in triple quoted strings, parentheses (...), brackets [...] or braces {...} spans multiple lines.
From the discussion above, it can be inferred that there are two types of statements in Python:
- Simple statements that span a single logical line. These include statements such as assignment statements, yield statements etc.
- Compound statements that span multiple logical lines. These include statements such as the `while` and `for` statements.
Both kinds are summarized in the grammar sketch that follows this list.
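The following is an abridged sketch of the relevant productions, adapted from the Python language reference (several alternatives are elided here):

```
simple_stmt   ::= expression_stmt | assignment_stmt | return_stmt | yield_stmt
                  | pass_stmt | del_stmt | import_stmt | global_stmt | ...
compound_stmt ::= if_stmt | while_stmt | for_stmt | try_stmt | with_stmt
                  | funcdef | classdef
```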
Compound statements are made up of one or more clauses. A clause is made up of a header and a suite. The clause headers of a given compound statement are all at the same indentation level; they begin with a unique identifier (`while`, `if` etc.) and end with a colon. The suite, whose execution is controlled by the header, may be a set of one or more statements on the same line as the header's colon, with each statement separated from the previous one by a semi-colon; conventionally though, the suite is written as one or more indented statements on subsequent lines that follow the header. Both forms are illustrated in the example below:
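```python
>>> x = 0
>>> while x < 3: print(x); x += 1    # suite on the header line
...
0
1
2
>>> while x < 6:                     # suite indented on subsequent lines
...     print(x)
...     x += 1
...
3
4
5
```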
Indentations are used to denote code blocks such as function bodies, conditionals, loops and classes. Leading white-space at the beginning of a logical line is used to compute the indentation level for that line, which in turn is used to determine the grouping of statements. Indentation used within a code body must always match the indentation of the first statement of the block of code.
3.3 Strings
Strings are represented in Python using double (`"..."`) or single (`'...'`) quotes. Special characters can be used within a string by escaping them with `\` as shown in the following example:
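```python
>>> print('It\'s python')
It's python
>>> print("first line\nsecond line")
first line
second line
```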
To avoid the interpretation of characters as special characters, the character `r` is added before the opening quote of the string, creating a raw string, as shown in the following example.
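```python
>>> print(r"first line\nsecond line")
first line\nsecond line
```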
String literals that span multiple lines can be created with triple quotes; newlines are automatically added at the end of each line. To avoid the inclusion of a newline, the continuation character `\` should be used at the end of a line. Both cases are shown in the following example.
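```python
>>> print("""first line
... second line""")
first line
second line
>>> print("""first line \
... second line""")
first line second line
```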
Strings are immutable, so once created they cannot be modified. There is no character type, so characters are assumed to be strings of length `1`. Strings are sequence types and support sequence type operations, with the exception of item assignment due to their immutability. Strings can be indexed with integers as shown in the following snippet:
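```python
>>> s = "python"
>>> s[0]
'p'
>>> s[5]
'n'
>>> s[-1]
'n'
```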
Strings can be concatenated with the `+` operator to create new strings. One or more string literals can also be concatenated by simply writing them next to each other, as shown in the following snippet:
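```python
>>> "Monty" + " " + "Python"
'Monty Python'
>>> "Monty" " " "Python"
'Monty Python'
```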
The built-in function `len` can be used to get the length of a string as shown in the following snippet.
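```python
>>> len("python")
6
```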
3.4 Flow Control
if-else and if-elif-else statements
Python supports the `if` statement for conditional execution of a code block.
The `if` statement can be followed by zero or more `elif` statements and an optional `else` statement that is executed when none of the conditions in the `if` or `elif` statements have been met.
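```python
x = 10
if x < 0:
    print("negative")
elif x == 0:
    print("zero")
else:
    print("positive")
```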
for and range statements
The `while` and `for` statements constitute the main looping constructs provided by Python.
The `for` statement in Python is used to iterate over sequence types (lists, sets, tuples etc.). More generally, the `for` loop is used to iterate over any object that implements the Python iterator protocol; this will be discussed further in the chapters that follow. Example usage of the `for` loop is shown in the following snippet:
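```python
>>> names = ["John", "Jane", "Nkem"]
>>> for name in names:
...     print(name)
...
John
Jane
Nkem
```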
Most programming languages use a C-style counting loop for iterating over a progression of numbers. Python replaces this with the simpler `range()` function, which is used to generate an arithmetic progression of integers. For example:
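```python
>>> for i in range(4):
...     print(i)
...
0
1
2
3
>>> list(range(2, 10, 2))
[2, 4, 6, 8]
```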
The `range` function has the signature `range(start, stop, step)`. The stop value is never part of the progression that is returned.
while statement
The `while` statement executes the statements in its suite as long as the condition expression in the `while` statement evaluates to true.
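```python
>>> count = 0
>>> while count < 3:
...     print(count)
...     count += 1
...
0
1
2
```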
break and continue statements
The `break` keyword is used to escape from an enclosing loop. Whenever the `break` keyword is encountered during the execution of a loop, the loop is exited immediately and no other statement within the loop is executed.
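```python
>>> for i in range(10):
...     if i == 5:
...         break
...     print(i)
...
0
1
2
3
4
```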
The `continue` keyword is used to force the start of the next iteration of a loop. When it is used, the interpreter ignores all statements that come after the `continue` statement and continues with the next iteration of the loop.
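```python
>>> for i in range(10):
...     if i == 5:
...         continue
...     print(i)
...
0
1
2
3
4
6
7
8
9
```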
In the example above, it can be observed that the number 5 is not printed due to the use of `continue` when the value is 5; however, all subsequent values are printed.
else clause with looping constructs
Python has a quirky feature in which the `else` keyword can be used with looping constructs. When an `else` keyword is used with a looping construct such as `while` or `for`, the statements within the suite of the `else` statement are executed as long as the looping construct was not ended by a `break` statement.
If the loop was exited by a `break` statement, the execution of the suite of the `else` statement is skipped, as shown in the following example:
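```python
>>> for i in range(3):
...     print(i)
... else:
...     print("loop ran to completion")
...
0
1
2
loop ran to completion
>>> for i in range(3):
...     if i == 1:
...         break
...     print(i)
... else:
...     print("loop ran to completion")
...
0
```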
Enumerate
Sometimes, when iterating over a list, a tuple or a sequence in general, having access to the index of the item, as well as the item being iterated over, may be necessary. This could be achieved using a `while` loop as shown in the following snippet:
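```python
>>> names = ["John", "Jane", "Nkem"]
>>> i = 0
>>> while i < len(names):
...     print(i, names[i])
...     i += 1
...
0 John
1 Jane
2 Nkem
```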
The above solution is how one would go about it in most languages, but Python has a better alternative in the form of the `enumerate` built-in function. The above solution can be reworked beautifully in Python as shown in the following snippet:
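```python
>>> for i, name in enumerate(names):
...     print(i, name)
...
0 John
1 Jane
2 Nkem
```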
3.5 Functions
Named functions are defined with the `def` keyword, which must be followed by the function name and the parenthesized list of formal parameters. The `return` keyword is used to return a value from a function definition. A Python function definition is shown in the example below:
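A minimal sketch of such a definition (the body shown here is illustrative; the `full_name` function is the one invoked below):

```python
def full_name(first_name, last_name):
    return first_name + " " + last_name
```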
Functions are invoked by calling the function name with the required arguments in parentheses, for example `full_name("Obi", "Ike-Nwosu")`. Python functions can return multiple values by returning a tuple of the required values, as shown in the example below in which we return the quotient and remainder from a division operation:
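A sketch of such a function (the name `divide` is illustrative):

```python
def divide(a, b):
    return a // b, a % b        # returns the tuple (quotient, remainder)

quotient, remainder = divide(7, 2)
```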
Python functions can be defined without the `return` keyword; in that case the default returned value is `None`. The `return` keyword also does not even have to return a value in Python. Both cases are shown in the following example.
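An illustrative snippet (the names `greet` and `guard` are arbitrary):

```python
>>> def greet(name):
...     print("Hello,", name)
...
>>> result = greet("Obi")
Hello, Obi
>>> print(result)          # no return statement, so None is returned
None
>>> def guard(value):
...     if value < 0:
...         return         # a bare return also yields None
...     print(value)
...
>>> print(guard(-1))
None
```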
Python also supports anonymous functions defined with the `lambda` keyword. Python's lambda support is rather limited (crippled, a few people may say) because it supports only a single expression in the body of the lambda. Lambda expressions are a form of syntactic sugar and are equivalent to conventional named function definitions. An example of a lambda expression is the following:
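```python
>>> square = lambda x: x * x
>>> square(5)
25
```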
3.6 Data Structures
Python has a number of built-in data structures that make programming easy. The built-in data structures include lists, tuples and dictionaries.
- Lists: Lists are created using square brackets, `[]`, or the `list()` function. The empty list is denoted by `[]`. Lists preserve the order of items as they are created or inserted into the list. Lists are sequence types, so they support integer indexing and all other sequence type subscripting that will be discussed in the chapters that follow. Lists are indexed by integers starting with zero and going up to the length of the list minus one.
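```python
>>> fruits = ["apple", "banana", "cherry"]
>>> fruits[0]
'apple'
>>> fruits[2]
'cherry'
>>> fruits[-1]
'cherry'
```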
Items can be added to a list by appending to the list.
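```python
>>> fruits.append("orange")
>>> fruits
['apple', 'banana', 'cherry', 'orange']
```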
To get a full listing of all the methods of the list, run the `help` command with `list` as argument.
- Tuples: These are another type of sequence structure. A tuple consists of a number of comma-separated objects, for example:
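```python
>>> t = 1, "two", 3.0
>>> t
(1, 'two', 3.0)
```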
When defining a non-empty tuple the parentheses are optional, but when the tuple is part of a larger expression, the parentheses are required. The parentheses also come in handy when defining an empty tuple, for instance:
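```python
>>> empty = ()
>>> len(empty)
0
```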
Tuples have a quirky syntax that some people may find surprising. When defining a single element tuple, the comma must be included after the single element regardless of whether or not parentheses are included. If the comma is left out then the result of the expression is not a tuple. For instance:
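```python
>>> single = 1,
>>> type(single)
<class 'tuple'>
>>> not_a_tuple = (1)
>>> type(not_a_tuple)
<class 'int'>
```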
Tuples are integer indexed just like lists but are immutable; once created, the contents cannot be changed by any means such as assignment. However, if an object in a tuple is a mutable object such as a list, that object's value can be changed even though the tuple itself cannot. Both points are shown in the following example:
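```python
>>> t = (1, 2, [3, 4])
>>> t[0] = 5
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'tuple' object does not support item assignment
>>> t[2].append(5)        # the contained list is still mutable
>>> t
(1, 2, [3, 4, 5])
```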
- Sets: A set is an unordered collection of objects that does not contain any duplicates. An empty set is created using `set()`; note that a pair of empty braces, `{}`, creates an empty dictionary rather than an empty set. Sets are unordered so, unlike tuples or lists, they cannot be indexed by integers. However sets, with the exception of frozen sets, are mutable, so one can `add`, `update` or `remove` elements of a set, as shown in the snippet that follows this list.
- Dictionary: This is a mapping data structure that is commonly referred to as an associative array or a hash table in other languages. Dictionaries, or `dicts` as they are commonly called, are indexed by keys that must be immutable. A pair of braces, `{...}`, or the `dict()` constructor is used to create a `dict`. Dictionaries are unordered sets of `key:value` pairs in which the keys are unique. A dictionary can be initialized by placing a set of `key:value` pairs within the braces. The primary operations of interest offered by dictionaries are the storage of a value by key and the retrieval of stored values, also by key; values are retrieved by indexing the dictionary with the key using square brackets. Dictionaries are mutable, so the values indexed by a key can be changed, and keys can be deleted from and added to the `dict`, as the snippet following this list also shows.
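The following illustrative session demonstrates both types (the displayed ordering of sets and dictionaries is arbitrary):

```python
>>> s = {1, 2, 3}
>>> s.add(4)
>>> s.update([5, 6])
>>> s.remove(1)
>>> s
{2, 3, 4, 5, 6}
>>> ages = {'obi': 18, 'ada': 21}
>>> ages['obi']              # retrieval by key
18
>>> ages['obi'] = 19         # values can be changed
>>> del ages['ada']          # keys can be deleted
>>> ages['chu'] = 25         # and new keys added
```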
Python’s data structures are not limited to just those listed in this section. For example, the `collections` module provides additional data structures such as deques and ordered dictionaries; however, the data structures listed in this section form the workhorse for most Python applications. To get better insight into the capabilities of a data structure, the `help()` function is used with the name of the data structure as argument, for example, `help(list)`.
3.7 Classes
The `class` statement is used to define new types in Python, as shown in the following example:
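A sketch of such a class (the attributes and method bodies here are illustrative; this `Account` class is reused in the sections that follow):

```python
class Account:

    num_accounts = 0                    # class variable

    def __init__(self, name, balance):
        self.name = name                # instance variables
        self.balance = balance
        Account.num_accounts += 1

    def deposit(self, amount):
        self.balance += amount

    def withdraw(self, amount):
        self.balance -= amount
```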
Classes in python just like classes in other languages have class variables, instance variables, class methods, static methods and instance methods.
When defining classes, the base classes are included in the parentheses that follow the class name. For those that are familiar with Java, the `__init__` method is similar to a constructor; it is in this method that instance variables are initialized. The class defined above can be instantiated by calling the class with the required arguments to `__init__` in parentheses, ignoring the `self` argument, as shown in the following example.
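```python
>>> account = Account("obi", 0)
```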
Methods in a class that are defined with `self` as first argument are instance methods. The `self` argument is similar to `this` in Java and refers to the object instance. Methods are called in Python using the dot notation syntax as shown below:
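```python
>>> account.deposit(100)
>>> account.balance
100
```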
Python comes with a built-in function, `dir`, for the introspection of objects. The `dir` function can be called with an object as argument and it returns a list of all the attributes, methods and variables of a class.
3.8 Modules
Functions and classes provide a means for structuring your Python code, but as the code grows in size and complexity, there is a need for such code to be split into multiple files with each source file containing related definitions. The source files can then be imported as needed in order to access definitions in any such source file. In Python, we refer to source files as modules, and modules have the `.py` extension.
For example, the `Account` class definition from the previous section can be saved to a module called `Account.py`. To use this module elsewhere, the `import` statement is used to import the module, as shown in the following example:
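```python
import Account

account = Account.Account("obi", 0)
```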
Note that the import statement takes the name of the module without the `.py` extension.
Using the import statement creates a name-space, in this case the `Account` name-space, and all definitions in the module are available in such name-space. The dot notation (`.`) is used to access the definitions as required.
An alias for an imported module can also be created using the `as` keyword, so the example from above can be reformulated as shown in the following snippet:
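```python
import Account as acct

account = acct.Account("obi", 0)
```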
It is also possible to import only the definitions that are needed from the module resulting in the following:
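```python
from Account import Account

account = Account("obi", 0)
```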
All the definitions in a module can also be imported by using the wild card symbol as shown below:
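```python
from Account import *
```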
This method of importing is not always advised, as it can result in name clashes when one of the names being imported is already used in the current name-space; importing the module as a whole avoids this. Modules are also objects in Python, so we can introspect on them using the `dir` introspection function. Python modules can be further grouped together into packages. Modules and packages are discussed in depth in a subsequent chapter.
3.9 Exceptions
Python has support for exceptions and exception handling. For example, when an attempt is made to divide by zero, a `ZeroDivisionError` is thrown by the Python interpreter, as shown in the following example.
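```python
>>> 1 / 0
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ZeroDivisionError: division by zero
```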
During the execution of a program, an exception is raised when an error occurs; if the exception is not handled, a trace-back is dumped to the screen. Errors that are not handled will normally cause an executing program to terminate.
Exceptions can be handled in Python by using the `try...except` statement. For example, the divide by zero exception from above could be handled as shown in the following snippet.
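```python
try:
    result = 1 / 0
except ZeroDivisionError:
    print("cannot divide by zero")
```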
Exceptions in Python can be of different types. For example, if an attempt was made to catch an `IOError` in the previous snippet, the program would still terminate because the exception raised is a `ZeroDivisionError`. To catch all types of exceptions with a single handler, `except Exception` is used, but this is advised against as it becomes impossible to tell what kind of exception has occurred, thus masking the exception.
Custom exceptions can be defined to handle exceptional situations in our own code. To do this, define a custom exception class that inherits from the `Exception` base class.
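A minimal sketch (the exception name is illustrative):

```python
class InsufficientFundsError(Exception):
    """Raised when a withdrawal exceeds the available balance."""
    pass
```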
3.10 Input and Output
Python as expected has support for reading and writing to and from input and output sources. The file on the hard drive is the most popular IO device. The content of a file can be opened and read from using the snippet below:
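A minimal sketch (the file name data.txt is illustrative):

```python
f = open("data.txt")
content = f.read()
f.close()
```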
The open method returns a file object or throws an exception if the file does not exist. The file object supports a number of methods such as `read`, which reads the whole content of the file into a string, or `readline`, which reads the contents of the file one line at a time. Python supports the following syntactic sugar for iterating through the lines of a file.
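```python
for line in open("data.txt"):
    print(line, end="")
```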
Python supports writing to a file as shown below:
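```python
f = open("data.txt", "w")
f.write("some text\n")
f.close()
```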
Python also has support for reading from standard input and writing to standard output. This can be done using `sys.stdin.readline()` or `sys.stdout.write()` from the `sys` module.
3.11 Getting Help
The Python programming language has a very detailed set of documentation that can be obtained at the interpreter prompt by using the `help` function. To get more information about a syntactic construct or data structure, pass it as an argument to the help function, for example `help(list)`.
4. Intermezzo: Glossary
A number of terms and esoteric Python functions are used throughout this book, and a good understanding of these terms is integral to gaining a better and deeper understanding of Python. A description of these terms and functions is provided in the sections that follow.
4.1 Names and Binding
In Python, objects are referenced by names; names are analogous to variables in C++ and Java.
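```python
>>> x = 5
```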
In the above example, `x` is a name that references the object, `5`. The process of assigning a reference to `5` to `x` is called binding. A binding causes a name to be associated with an object in the innermost scope of the currently executing program. Bindings may occur during a number of instances, such as during variable assignment, or during a function or method call when a supplied parameter is bound to an argument. It is important to note that names are just symbols and they have no type associated with them; names are just references to objects that actually have types.
4.2 Code Blocks
A code block is a piece of program code that is executed as a single unit in Python. Modules, functions and classes are all examples of code blocks. Commands typed in interactively at the REPL and script commands run with the `-c` option are also code blocks. A code block has a number of name-spaces associated with it. For example, a module code block has access to the global name-space while a function code block has access to the local as well as the global name-spaces.
4.3 Name-spaces
A name-space, as the name implies, is a context in which a given set of names is bound to objects. Name-spaces in Python are currently implemented as dictionary mappings. The built-in name-space is an example of a name-space; it contains all the built-in functions and can be accessed by entering `__builtins__.__dict__` at the terminal (the result is of a considerable amount). The interpreter has access to multiple name-spaces, including the global name-space, the built-in name-space and the local name-space.
These name-spaces are created at different times and have different lifetimes. For example, a new local name-space is created at the invocation of a function and forgotten when the function exits or returns. The global name-space is created at the start of the execution of a module and all names defined in this name-space are available module-wide, while the built-in name-space comes into existence when the interpreter is invoked and contains all the built-in names. These three name-spaces are the main name-spaces available to the interpreter.
4.4 Scopes
A scope is an area of a program in which a set of name bindings (name-spaces) is visible and directly accessible. Direct access is an important characteristic of a scope, as will be explained when classes are discussed. It simply means that a name, `name`, can be used as is, without the need for dot notation such as `SomeClassOrModule.name`, to access it. At runtime, the following scopes may be available:
- Inner most scope with local names
- The scope of enclosing functions if any (this is applicable for nested functions)
- The current module’s globals scope
- The scope containing the builtin name-space.
When a name is used in Python, the interpreter searches the name-spaces of the scopes in ascending order as listed above and, if the name is not found in any of the name-spaces, an exception is raised. Python supports static scoping, also known as lexical scoping; this means that the visibility of a set of name bindings can be inferred by only inspecting the program text.
Note
Python has a quirky scoping rule that prevents a reference to an object in the global scope from being modified in a local scope; such an attempt will throw an `UnboundLocalError` exception. In order to modify an object from the global scope within a local scope, the `global` keyword has to be used with the object name before modification is attempted. The following example illustrates this.
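```python
>>> x = 5
>>> def update_x():        # the name update_x is illustrative
...     x += 1
...
>>> update_x()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 2, in update_x
UnboundLocalError: local variable 'x' referenced before assignment
```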
In order to modify the object from the global scope, the `global` statement is used as shown in the following snippet.
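```python
>>> x = 5
>>> def update_x():
...     global x
...     x += 1
...
>>> update_x()
>>> x
6
```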
Python also has the `nonlocal` keyword that is used when there is a need to modify a variable bound in an outer non-global scope from an inner scope. This proves very handy when working with nested functions (also referred to as closures). A very trivial illustration of the `nonlocal` keyword in action is shown in the following snippet that defines a simple counter object that counts in ascending order.
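A sketch of such a counter:

```python
def make_counter():
    count = 0

    def counter():
        nonlocal count      # rebind count in the enclosing scope
        count += 1
        return count

    return counter

counter = make_counter()
print(counter())   # 1
print(counter())   # 2
```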
4.5 eval()
`eval` is a Python built-in function for dynamically executing Python expressions in a string (the content of the string must be a valid Python expression) or code objects. The function has the following signature: `eval(expression, globals=None, locals=None)`. If supplied, the `globals` argument to the `eval` function must be a dictionary while the `locals` argument can be any mapping. The evaluation of the supplied expression is done using the globals and locals dictionaries as the global and local name-spaces. If `__builtins__` is absent from the globals dictionary, the current globals are copied into globals before the expression is parsed. This means that the expression will have either full or restricted access to the standard built-ins depending on the execution environment; this way the execution environment of `eval` can be restricted or sandboxed. `eval` when called returns the result of executing the expression or code object, for example:
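```python
>>> eval("2 + 3 * 4")
14
>>> x = 10
>>> eval("x * 2")
20
```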
Since `eval` can take arbitrary code objects as argument and return the value of executing such expressions, it, along with `exec`, is used in executing arbitrary Python code that has been compiled into code objects using the `compile` function. Online Python interpreters are able to execute Python code supplied by their users using both `eval` and `exec` among other methods.
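```python
>>> code = compile("5 + 5", "<string>", "eval")
>>> eval(code)
10
```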
4.6 exec()
`exec` is the counterpart to `eval`. It executes a string interpreted as a suite of Python statements, or a code object. In both cases, the code supplied must be valid as file input. `exec` has the following signature: `exec(object[, globals[, locals]])`. The following is an example of `exec` using a string and the current name-spaces.
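```python
>>> exec("for i in range(3): print(i)")
0
1
2
```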
In all instances, if optional arguments are omitted, the code is executed in the current scope. If only the `globals` argument is provided, it has to be a dictionary, which is then used for both the global and the local variables. If `globals` and `locals` are given, they are used for the global and local variables, respectively. If provided, the `locals` argument can be any mapping object. If the `globals` dictionary does not contain a value for the key `__builtins__`, a reference to the dictionary of the built-in module `builtins` is inserted under that key. One can control the builtins that are available to the executed code by inserting a custom `__builtins__` dictionary into `globals` before passing it to `exec()`, thus creating a sandbox.
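A minimal sketch of such a sandbox (the set of allowed builtins here is illustrative):

```python
>>> allowed = {"print": print, "min": min}
>>> exec("print(min(1, 2))", {"__builtins__": allowed})
1
```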
5. Objects 201
Python objects are the basic abstraction over data in Python; every value is an object in Python. Every object has an identity, a type and a value. An object’s identity never changes once it has been created. The `id(obj)` function returns an integer representing the object's identity. The `is` operator compares the identity of two objects, returning a boolean. In CPython, the `id()` function returns an integer that is the memory location for the object, thus uniquely identifying such object. This is an implementation detail, and implementations of Python are free to return whatever value uniquely identifies objects within the interpreter.
The `type()` function returns an object’s type; the type of an object is itself an object. An object’s type is also normally unchangeable. An object’s type determines the operations that the object supports and also defines the possible values for objects of that type. Python is a dynamic language because types are not associated with variables, so a variable, `x`, may refer to a string and later refer to an integer, as shown in the following example.
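```python
>>> x = "hello"
>>> type(x)
<class 'str'>
>>> x = 42
>>> type(x)
<class 'int'>
```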
However, Python unlike dynamic languages such as Javascript is strongly typed because the interpreter will never change the type of an object. This means that actions such as adding a string to a number will cause an exception in Python as shown in the following snippet:
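```python
>>> "hello" + 1
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: Can't convert 'int' object to str implicitly
```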
This is unlike Javascript where the above succeeds because the interpreter implicitly converts the integer to a string then adds it to the supplied string.
Python objects are either one of the following:
- Mutable objects: These refer to objects whose value can change. For example a list is a mutable data structure as we can grow or shrink the list at will.
Programmers new to Python from other languages may find some behaviors of mutable objects puzzling. Python is a pass-by-object-reference language, which means that the values of object references are the values passed to function or method calls, and names bound to variables refer to these reference values. For example, consider the snippet shown in the following example.
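```python
>>> x = [1, 2, 3]
>>> y = x
>>> x.append(4)
>>> y
[1, 2, 3, 4]
```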
`y` and `x` refer to the same object, so a change to `x` is reflected in `y`. To fully understand why this is so, it must be noted that the variable `x` does not actually hold the list `[1, 2, 3]`; rather it holds a reference that points to the location of that object, so when the variable `y` is bound to the value contained in `x`, it also contains the reference to the original list, `[1, 2, 3]`. Any operation on `x` finds the list that `x` refers to and carries out the operation on that list; `y` refers to the same list, thus the change is also reflected in the variable `y`.
- Immutable objects: These objects have values that cannot be changed. A `tuple` is an example of an immutable data structure because once created we cannot change the constituent objects, as shown below:
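```python
>>> t = (1, 2, 3)
>>> t[0] = 10
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'tuple' object does not support item assignment
```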
However if an immutable object contains a mutable object the mutable object can have its value changed even if it is part of an immutable object. For example, a tuple is an immutable data structure however if a tuple contains a list object, a mutable object, then we can change the value of the list object as shown in the following snippet.
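```python
>>> t = (1, [2, 3])
>>> t[1].append(4)
>>> t
(1, [2, 3, 4])
```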
5.1 Strong and Weak Object References
Python objects get references when they are bound to names. This binding can be in the form of an assignment, a function or method call that binds objects to argument names, etc. Every time an object gets a reference, its reference count is increased. In fact, the reference count for an object can be found using the `sys.getrefcount` function, as shown in the following example.
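```python
>>> import sys
>>> x = [1, 2, 3]
>>> y = x
>>> sys.getrefcount(x)     # counts x, y and the argument to getrefcount
3
```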
Two kinds of references, strong and weak references, exist in Python, but when discussing references it is almost certainly the strong reference that is being referred to. The previous example, for instance, has three references and these are all strong references. The defining characteristic of a strong reference in Python is that whenever a new strong reference is created, the reference count for the referenced object is incremented by `1`. This means that the garbage collector will never collect an object that is strongly referenced, because the garbage collector collects only objects that have a reference count of `0`. Weak references, on the other hand, do not increase the reference count of the referenced object. Weak referencing is provided by the `weakref` module. The following snippet shows weak referencing in action.
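An illustrative session (the object addresses will differ; a class instance is used because basic containers such as lists do not directly support weak referencing):

```python
>>> import weakref
>>> class Foo:
...     pass
...
>>> f = Foo()
>>> weak_f = weakref.ref(f)
>>> weak_f
<weakref at 0x10138c728; to 'Foo' at 0x10138d0f0>
>>> weak_f()        # calling the weak reference returns the object
<__main__.Foo object at 0x10138d0f0>
```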
The `weakref.ref` function returns an object that, when called, returns the weakly referenced object. The `weakref` module also provides the `weakref.proxy` alternative to the `weakref.ref` function for creating weak references. This method creates a proxy object that can be used just like the original object without the need for a call, as shown in the following snippet.
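```python
>>> proxy_f = weakref.proxy(f)
>>> proxy_f.value = 42      # used just like the original object
>>> f.value
42
```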
When all the strong references to an object have been deleted, the weak reference loses its reference to the original object and the object is ready for garbage collection. This is shown in the following example.
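```python
>>> del f                   # delete the only strong reference
>>> weak_f() is None
True
```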
5.2 The Type Hierarchy
Python comes with its own set of built-in types and these built-in types broadly fall into one of the following categories:
None Type
The `None` type is a singleton object that has a single value, and this value is accessed through the built-in name `None`. It is used to signify the absence of a value in many situations, e.g., it is returned by functions that don't explicitly return a value, as illustrated below:
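```python
>>> def no_return():
...     pass
...
>>> print(no_return())
None
```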
The `None` type has a truth value of false.
NotImplemented Type
The `NotImplemented` type is another singleton object that has a single value. The value of this object is accessed through the built-in name `NotImplemented`. This object should be returned when we want to delegate the search for the implementation of a method to the interpreter rather than throwing a runtime `NotImplementedError` exception. For example, consider the two types, `Foo` and `Bar`, below:
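A sketch of the two types, consistent with the discussion that follows (the implementations are illustrative):

```python
class Foo:
    def __eq__(self, other):
        if isinstance(other, (Foo, Bar)):
            return True                 # Foo knows how to compare with Bar
        return NotImplemented

class Bar:
    def __eq__(self, other):
        if isinstance(other, Bar):
            return True                 # Bar only compares with Bar
        return NotImplemented
```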
When an attempt is made at comparisons, the effect of returning `NotImplemented` can be clearly observed. In Python, `a == b` results in a call to `a.__eq__(b)`. In this example, instances of `Foo` and `Bar` have implementations for comparing themselves to other instances of the same class, for example:
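```python
>>> f = Foo()
>>> b = Bar()
>>> f == Foo()
True
>>> b == Bar()
True
```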
What actually happens when we compare `f` with `b`? The implementation of `__eq__()` in `Foo` checks that the `other` argument is an instance of `Bar` and handles it accordingly, returning a value of `True`:
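```python
>>> f == b
True
```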
If `b` is compared with `f` then `b.__eq__(f)` is invoked and the `NotImplemented` object is returned, because the implementation of `__eq__()` in `Bar` only supports comparison with `Bar` instances. However, it can be seen in the following snippet that the comparison operation actually succeeds; what has happened?
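```python
>>> b == f
True
```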
The call to the `b.__eq__(f)` method returned `NotImplemented`, causing the Python interpreter to invoke the `__eq__()` method in `Foo`, and since a comparison between `Foo` and `Bar` is defined in the implementation of the `__eq__()` method in `Foo`, the correct result, `True`, is returned.
The `NotImplemented` object has a truth value of true.
Ellipsis Type
This is another singleton object type that has a single value. The value of this object is accessed through the literal `...` or the built-in name `Ellipsis`. The truth value for the `Ellipsis` object is true. The `Ellipsis` object is mainly used in numeric Python for indexing and slicing matrices. The numpy documentation provides more insight into how the `Ellipsis` object is used.
Numeric Type
Numeric types are otherwise referred to as numbers. Numeric objects are immutable thus once created their value cannot be changed. Python numbers fall into one of the following categories:
- Integers: These represent elements from the set of positive and negative integers. These fall into one of the following types:
  - Plain integers: These are numbers in the range of `-2147483648` through `2147483647` on a 32-bit machine; the range value is dependent on the machine word size. Long integers are returned when results of operations fall outside the range of plain integers, and in some cases the exception `OverflowError` is raised. For the purpose of shift and mask operations, integers are assumed to have a binary, 2’s complement notation using 32 or more bits, and hiding no bits from the user.
  - Long integers: Long integers are used to hold integer values that are as large as the virtual memory on a system can handle. This is illustrated in the following example.
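```python
>>> 2 ** 100
1267650600228229401496703205376
```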
It is important to note that, from the perspective of a user, there is no difference between plain and long integers, as all conversions, if any, are done under the covers by the interpreter.
- Booleans: These represent the truth values `False` and `True`. The `Boolean` type is a subtype of plain integers. The `False` and `True` Boolean values behave like the values `0` and `1` respectively, except when converted to a string, in which case the strings `"False"` or `"True"` are returned respectively. For example:
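```python
>>> True + 1
2
>>> False * 10
0
>>> str(True)
'True'
```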
- Float: These represent machine-level double precision floating point numbers. The underlying machine architecture and specific Python implementation determine the accepted range and the handling of overflow; so CPython will be limited by the underlying C language while Jython will be limited by the underlying Java language.
- Complex Numbers: These represent complex numbers as a pair of machine-level double precision floating point numbers. The same caveats apply as for floating point numbers. Complex numbers can be created using the `complex` keyword as shown in the following example.
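```python
>>> z = complex(1, 2)
>>> z
(1+2j)
```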
Complex numbers can also be created by using a number literal suffixed with `j`. For instance, the previous complex number example can be created by the expression `1+2j`. The real and imaginary parts of a complex number `z` can be retrieved through the read-only attributes `z.real` and `z.imag`.
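```python
>>> z = 1 + 2j
>>> z.real
1.0
>>> z.imag
2.0
```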
Sequence Type
Sequence types are finite ordered collections of objects that can be indexed by integers; using negative indices in python is legal. Sequences fall into two categories - mutable and immutable sequences.
- Immutable sequences: An immutable sequence type object is one whose value cannot change once it is created. This means that the collection of objects that are directly referenced by an immutable sequence is fixed. The collection referenced by an immutable sequence may be composed of mutable objects whose values may change at runtime, but the mutable objects themselves that are directly referenced by the immutable sequence cannot be swapped out. For example, a tuple is an immutable sequence, but if one of the elements in the tuple is a list, a mutable sequence, then the list can change; the reference to the list object that the tuple holds, however, cannot be changed, as shown below:
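```python
>>> t = ("a", "b", [1, 2])
>>> t[2].append(3)      # the referenced list may change
>>> t
('a', 'b', [1, 2, 3])
>>> t[2] = [4, 5]       # but the tuple's reference to it cannot
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'tuple' object does not support item assignment
```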
The following are built-in immutable sequence types:
- Strings: A string is an immutable sequence of Unicode code points or, more informally, an immutable sequence of characters. There is no `char` type in Python, so a character is just a string of length 1. Strings in Python can represent all Unicode code points in the range `U+0000 - U+10FFFF`. All text in Python is Unicode and the type of the objects used to hold such text is `str`.
- Bytes: A `bytes` object is an immutable sequence of 8-bit bytes. Each byte is represented by an integer in the range `0 <= x < 256`. Bytes literals such as `b'abc'` and the built-in function `bytes()` are used to create `bytes` objects. `Bytes` objects have an intimate relationship with strings. Strings are abstractions over the text representation used in the computer; text is represented internally using binary, that is, bytes. Strings are just sequences of bytes that have been decoded using an encoding such as `UTF-8`. The abstract characters of a string can likewise be encoded using an available encoding such as `UTF-8` to get the binary representation of the string as a bytes object. The relationship between `bytes` and strings is illustrated in the snippet that follows this list.
- Tuple: A tuple is a sequence of arbitrary Python objects. Tuples of two or more items are formed by comma-separated lists of expressions. A tuple of one item is formed by affixing a comma to an expression, while an empty tuple is formed by an empty pair of parentheses. This is also illustrated in the snippet that follows this list.
- Mutable sequences: A mutable sequence type is one whose value can change after it has been created. There are currently two built-in mutable sequence types: byte arrays and lists.
  - Byte Arrays: Bytearray objects are mutable arrays of bytes. Byte arrays are created using the built-in `bytearray()` constructor. Apart from being mutable, and thus unhashable, byte arrays provide the same interface and functionality as immutable `bytes` objects. Byte arrays are very useful when the efficiency offered by their mutability is required. For example, when receiving an unknown amount of data over a network, byte arrays are more efficient because the array can be extended as more data is received without having to allocate new objects, as would be the case if the immutable `bytes` type were used.
  - Lists: Lists are sequences of arbitrary Python objects. Lists are formed by placing a comma-separated list of expressions in square brackets. The empty list is formed with the empty square brackets, `[]`. A list can be created from any iterable by passing such iterable to the `list` constructor. The list data structure is one of the most widely used data types in Python.
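A short illustrative session:

```python
>>> ba = bytearray(b"abc")
>>> ba[0] = 100                 # byte arrays are mutable
>>> ba
bytearray(b'dbc')
>>> list("abc")                 # a list built from an iterable
['a', 'b', 'c']
```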
Sequence types have some operations that are common to all sequence types. These are described in the following table; x is an object, s and t are sequences and n, i, j, k are integers.
Operation | Result |
---|---|
x in s | True if an item of s is equal to x, else False |
x not in s | False if an item of s is equal to x, else True |
s + t | the concatenation of s and t |
s * n or n * s | n shallow copies of s concatenated |
s[i] | ith item of s, origin 0 |
s[i:j] | slice of s from i to j |
s[i:j:k] | slice of s from i to j with step k |
len(s) | length of s |
min(s) | smallest item of s |
max(s) | largest item of s |
s.index(x[, i[, j]]) | index of the first occurrence of x in s (at or after index i and before index j) |
s.count(x) | total number of occurrences of x in s |
Note
- Values of n that are less than 0 are treated as 0, and this yields an empty sequence of the same type as s.
- Copies made using the `*` operation are shallow copies; any nested structures are not copied. This can result in some confusion when trying to create copies of a structure such as a nested list. To avoid shallow copies when dealing with nested lists, a list comprehension can be used instead.
- When i or j is negative, the index is relative to the end of the sequence, thus `len(s) + i` or `len(s) + j` is substituted for the negative value of i or j.
- Concatenating immutable sequences such as strings always results in a new object.
These points are illustrated in the following snippet:
Python defines the interfaces (that is the closest word that can be used) `Sequence` and `MutableSequence` in the `collections` library, and these define all the methods a type must implement to be considered a mutable or immutable sequence; when abstract base classes are discussed, this concept will become much clearer.
Set
These are unordered, finite collections of unique Python objects. Sets are unordered, so they cannot be indexed by integers. The members of a set must be hashable, so only immutable objects can be members of a set. This is so because sets in Python are implemented using a hash table; a hash table uses a hash function to compute an index into a slot. If a mutable value were used, the computed index would change when the object changes; thus mutable values are not allowed in sets.
Sets provide efficient solutions for membership testing, de-duplication, and the computation of intersections, unions and differences. Sets can be iterated over, and the built-in function `len()` returns the number of items in a set. There are currently two intrinsic set types: the mutable set type and the immutable frozenset type. Both have a number of common methods that are shown in the following table.
Method | Description |
---|---|
len(s) | return the cardinality of the set, s. |
x in s | Test x for membership in s. |
x not in s | Test x for non-membership in s. |
isdisjoint(other) | Return True if the set has no elements in common with other. Sets are disjoint if and only if their intersection is the empty set. |
issubset(other), set <= other | Test whether every element in the set is in other. |
set < other | Test whether the set is a proper subset of other, that is, set <= other and set != other. |
issuperset(other), set >= other | Test whether every element in other is in the set. |
set > other | Test whether the set is a proper superset of other, that is, set >= other and set != other. |
union(other, …), set | other | … | Return a new set with elements from the set and all others. |
intersection(other, …), set & other & … | Return a new set with elements common to the set and all others. |
difference(other, …), set - other - … | Return a new set with elements in the set that are not in the others. |
symmetric_difference(other), set ^ other | Return a new set with elements in either the set or other but not both. |
copy() | Return a new set with a shallow copy of s. |
- Frozen set: This represents an immutable set. A frozen set is created by the built-in `frozenset()` constructor. A frozenset is immutable and thus hashable, so it can be used as an element of another set or as a dictionary key.
- Set: This represents a mutable set and it is created using the built-in `set()` constructor. The mutable set is not hashable and cannot be part of another set. A non-empty set can also be created using a set literal such as `{1, 2}`; note, however, that `{}` creates an empty dictionary, not an empty set. Methods unique to the mutable set include:
Method | Description |
---|---|
update(other, …), set |= other | … | Update the set, adding elements from all others. |
intersection_update(other, …), set &= other & … | Update the set, keeping only elements found in it and all others. |
difference_update(other, …), set -= other | … | Update the set, removing elements found in others. |
symmetric_difference_update(other), set ^= other | Update the set, keeping only elements found in either set, but not in both. |
add(elem) | Add element elem to the set. |
remove(elem) | Remove element elem from the set. Raises KeyError if elem is not contained in the set. |
discard(elem) | Remove element elem from the set if it is present. |
pop() | Remove and return an arbitrary element from the set. Raises KeyError if the set is empty. |
clear() | Remove all elements from the set. |
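An illustrative session with some of these operations:

```python
>>> a = {1, 2, 3}
>>> b = {3, 4}
>>> a | b                   # union
{1, 2, 3, 4}
>>> a & b                   # intersection
{3}
>>> a - b                   # difference
{1, 2}
>>> frozenset([1, 2]) <= a  # a frozenset is hashable and comparable
True
```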
Mapping
A Python mapping is a finite set of objects (values) indexed by a set of immutable Python objects (keys). The keys in a mapping must be hashable, for the same reason given previously when describing set members, thus eliminating mutable types such as lists, sets and other mappings.
The expression `a[k]` selects the item indexed by the key `k` from the mapping `a` and can be used in assignments or `del` statements. The dictionary, mostly called `dict` for convenience, is the only intrinsic mapping type built into Python:
- Dictionary: Dictionaries can be created by placing a comma-separated sequence of `key: value` pairs within braces, for example `{'name': "obi", 'age': 18}`, or by the `dict()` constructor. The main operations supported by the `dictionary` type are the addition, deletion and selection of values using a given key. When adding a key that is already in use within a dict, the old value associated with that key is forgotten. Attempting to access a value with a non-existent key will result in a `KeyError` exception. Dictionaries are perhaps one of the most important types within the interpreter. Without a program explicitly making use of a dictionary, the interpreter is already using them in a number of different places. For example, name-spaces (discussed further in a subsequent chapter) are implemented in Python using dictionaries; this means that every time a symbol is referenced within a program, a dictionary access occurs. Objects are also layered on dictionaries in Python; all attributes of Python objects are stored in a dictionary attribute, `__dict__`. These are but a few applications of this type within the Python interpreter.
Python supplies more advanced forms of the dictionary type in its `collections` library. These are the `OrderedDict`, which introduces order into a dictionary, remembering the order in which items were inserted, and the `defaultdict`, which takes a factory function that is called to produce a value when a key is missing. If a key is missing from a `defaultdict` instance, the factory function is called to produce a value for the key, the dictionary is updated with this key-value pair, and the created value is returned. For example:
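```python
>>> from collections import defaultdict
>>> counts = defaultdict(int)     # int() supplies 0 for missing keys
>>> counts["python"] += 1
>>> counts["missing"]
0
>>> counts["python"]
1
```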
Callable Types
These are types that support the function call operation. The function call operation is the use of `()` after the type name. In the example below, the function is `print_name` and the function call happens when `()` is appended to the function name, as in `print_name()`.
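```python
>>> def print_name(name):
...     print(name)
...
>>> print_name("obi")     # the trailing () is the call operation
obi
```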
Functions are not the only callable types in Python; any object type that implements the `__call__` special method is a callable type. The built-in function `callable(object)` can be used to check whether a given object is callable. The following are built-in callable Python types:
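A sketch of a user defined callable type (the `Greeter` class is illustrative):

```python
>>> class Greeter:
...     def __call__(self, name):
...         return "Hello, " + name
...
>>> greet = Greeter()
>>> greet("obi")          # the instance itself is callable
'Hello, obi'
>>> callable(greet)
True
```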
- User-defined functions: these are functions that a user defines with the `def` statement, such as the `print_name` function from the previous section.
- Methods: these are functions defined within a class and accessible within the scope of the class or a class instance. These methods can be instance methods, static methods or class methods.
- Built-in functions: these are functions available within the interpreter core, such as the `len` function.
- Classes: classes are also callable types. The process of creating a class instance involves calling the class, as in `Foo()`.
Each of the above types is covered in detail in subsequent chapters.
Custom Type
Custom types are created using the `class` statement. Custom class objects have a type of `type`. These are types created by user defined programs and they are discussed in the chapter on object oriented programming.
Module Type
A module is one of the organizational units of Python code, just like functions or classes. A module is also an object, just like every other value in Python. The module type is created by the import system, as invoked either by the `import` statement or by calling functions such as `importlib.import_module()` and the built-in `__import__()`.
File/IO Types
A file object represents an open file. Files are created using the `open` built-in function, which opens and returns a file object for a file on the local file system; the file object can be opened in either binary or text mode. Other methods for creating file objects include:
- os.fdopen: takes a file descriptor and creates a file object from it. The os.open function, not to be confused with the open built-in, is used to create a file descriptor that can then be passed to os.fdopen to create a file object, as shown in the example after this list.
- os.popen(): this is marked for deprecation.
- The makefile() method of a socket object: this opens and returns a file object that is associated with the socket on which it was called.
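A sketch of the os.fdopen route; the file name is illustrative.

```python
import os

# os.open returns a low-level file descriptor (an integer).
fd = os.open("example.txt", os.O_WRONLY | os.O_CREAT)

# os.fdopen wraps the descriptor in a regular file object.
f = os.fdopen(fd, "w")
f.write("Hello World")
f.close()
```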
The built-in objects, sys.stdin
, sys.stdout
and sys.stderr
, are also file objects corresponding to the python interpreter’s standard input, output and error streams.
Built-in Types
These are objects used internally by the Python interpreter but accessible by a user program. They include traceback objects, code objects, frame objects and slice objects.
Code Objects
Code objects represent compiled executable Python code, or bytecode. A code object holds the bytecode for the Python virtual machine along with everything necessary for the execution of that bytecode. Code objects are normally created when a block of code is compiled. A standalone code object can be executed using the exec or eval built-in functions.
To give a concrete understanding of code objects, we define a very simple function below and dissect its code object.
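A sketch of such a function; the exact body is an assumption, but the name matches the one used later in this section.

```python
def return_author_name():
    return "obi"
```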
The code object for the above function can be obtained from the function object by accessing its __code__ attribute as shown below:
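A reconstructed snippet; the memory address is elided.

```python
>>> return_author_name.__code__
<code object return_author_name at 0x..., file "<stdin>", line 1>
```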
We can go further and inspect the code object using the dir
function to see the attributes of the code object.
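On CPython 3.6 the non-special attributes are roughly the following; later interpreter versions add a few more.

```python
>>> [attr for attr in dir(return_author_name.__code__)
...  if not attr.startswith('__')]
['co_argcount', 'co_cellvars', 'co_code', 'co_consts', 'co_filename',
 'co_firstlineno', 'co_flags', 'co_freevars', 'co_kwonlyargcount',
 'co_lnotab', 'co_name', 'co_names', 'co_nlocals', 'co_stacksize',
 'co_varnames']
```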
Of particular interest to us at this point are the non-special attributes, that is, the attributes that do not start with underscores. We give a brief description of each of these attributes in the following table.

Attribute | Description |
---|---|
co_argcount | number of arguments (not including * or ** args) |
co_code | string of raw compiled bytecode |
co_consts | tuple of constants used in the bytecode |
co_filename | name of file in which this code object was created |
co_firstlineno | number of the first line in the Python source code |
co_flags | bitmap of flags: 1=optimized, 2=newlocals, 4=*arg, 8=**arg |
co_lnotab | encoded mapping of line numbers to bytecode indices |
co_name | name with which this code object was defined |
co_names | tuple of names used by the bytecode other than arguments and local variables, for example globals |
co_nlocals | number of local variables |
co_stacksize | virtual machine stack space required |
co_varnames | tuple of names of arguments and local variables |
We can view the bytecode string for the function using the co_code attribute of the code object as shown below.
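The exact bytes vary across interpreter versions; on CPython 3.6 the output looks like this.

```python
>>> return_author_name.__code__.co_code
b'd\x01S\x00'
```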
The bytecode string returned is, however, of little use to a human investigating code objects. This is where the Python dis module comes into play. The dis module can be used to generate a human readable version of a code object. We use the dis function from the dis module to disassemble the code object of the return_author_name function.
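A sketch of the disassembly on CPython 3.6; offsets differ slightly on other versions.

```python
>>> import dis
>>> dis.dis(return_author_name)
  2           0 LOAD_CONST               1 ('obi')
              2 RETURN_VALUE
```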
The above shows the human readable version of the Python code object. The LOAD_CONST instruction reads a value from the co_consts tuple and pushes it onto the top of the stack (the CPython interpreter is a stack-based virtual machine). The RETURN_VALUE instruction pops the top of the stack and returns it to the calling scope, signalling the end of the execution of that code block.
Code objects serve a number of purposes while programming. They contain information that can aid in interactive debugging while programming and can provide us with readable tracebacks during an exception.
Frame Objects
Frame objects represent execution frames. Python code blocks are executed in execution frames. The call stack of the interpreter stores information about currently executing subroutines, and this call stack is made up of frame objects; frame objects on the stack have a one-to-one mapping with the subroutine calls made by the executing program or the interpreter. A frame object contains a code object and all the information, including references to the local and global name-spaces, that is necessary for the runtime execution environment. The frame objects are linked together to form the call stack. To simplify how this all fits together: the call stack can be thought of as a stack data structure (it actually is one); every time a subroutine is called, a frame object is created and pushed onto the stack, and then the code object contained within the frame is executed. Some special read-only attributes of frame objects include:
- f_back: the previous stack frame, towards the caller, or None if this is the bottom stack frame.
- f_code: the code object being executed in this frame.
- f_locals: the dictionary used to look up local variables.
- f_globals: the dictionary used for global variables.
- f_builtins: the dictionary used for built-in names.
- f_lasti: gives the precise instruction; it is an index into the bytecode string of the code object.
Some special writable attributes include:

- f_trace: if this is not None, this is a function called at the start of each source code line.
- f_lineno: the current line number of the frame. Writing to this from within a trace function jumps to the given line, only for the bottom-most frame; a debugger can implement a Jump command by writing to f_lineno.
Frame objects support one method:

- frame.clear(): this method clears all references to local variables held by the frame. If the frame belonged to a generator, the generator is finalized. This helps break reference cycles involving frame objects. A RuntimeError is raised if the frame is currently executing.
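A small sketch of poking at a live frame with the inspect module; the function name is illustrative.

```python
import inspect

def whereami():
    frame = inspect.currentframe()
    print(frame.f_code.co_name)          # whereami
    print(frame.f_lineno)                # current line number in this file
    print(frame.f_back.f_code.co_name)   # <module> - the caller's code object

whereami()
```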
Traceback Objects
Traceback objects represent the stack trace of an exception. A traceback object is created when an exception occurs. The interpreter searches for an exception handler by continuously popping the execution stack and inserting a traceback object in front of the current traceback for each frame popped.
When an exception handler is encountered, the stack trace is made available to the program. The stack trace object is accessible as the third item of the tuple returned by sys.exc_info()
. When the program contains no suitable handler, the stack trace is written to the standard error stream; if the interpreter is interactive, it is also made available to the user as sys.last_traceback
. A few important attributes of a traceback object are shown in the following table.

Attribute | Description |
---|---|
tb_next | the next level in the stack trace (towards the frame where the exception occurred), or None if there is no next level |
tb_frame | points to the execution frame of the current level; tb_lineno gives the line number where the exception occurred |
tb_lasti | indicates the precise instruction. The line number and last instruction in the traceback may differ from the line number of its frame object if the exception occurred in a try statement with no matching except clause or with a finally clause. |
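A sketch of grabbing the traceback via sys.exc_info():

```python
import sys

try:
    1 / 0
except ZeroDivisionError:
    exc_type, exc_value, tb = sys.exc_info()
    print(tb.tb_lineno)                # line number of the failing statement
    print(tb.tb_frame.f_code.co_name)  # <module>
    print(tb.tb_next)                  # None - no deeper frame in this trace
```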
Slice Objects
Slice objects represent slices for the __getitem__() methods of sequence-like objects (more on special methods such as __getitem__() in the chapter on object oriented programming). Slice objects return a subset of the sequence they are applied to, as shown below.
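A reconstructed snippet; the values are illustrative.

```python
>>> a = [1, 2, 3, 4, 5, 6]
>>> a[1:4]
[2, 3, 4]
```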
They are also created by the built-in slice([start], stop[, step]) function. The returned object can be used in between the square brackets just like a regular slice.
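For instance:

```python
>>> s = slice(1, 4)
>>> a[s]          # equivalent to a[1:4]
[2, 3, 4]
```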
Slice object read-only attributes include:
Attribute | Description |
---|---|
start | the lower bound |
stop | the upper bound |
step | the optional step value |
Each of the optional attributes is None if omitted. Slices can take a number of forms in addition to the standard slice(start, stop[, step]). Other forms include:
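A sketch of the common forms:

```python
a[start:stop]   # items from start through stop - 1
a[start:]       # items from start to the end of the sequence
a[:stop]        # items from the beginning through stop - 1
a[:]            # a shallow copy of the whole sequence
a[::step]       # every step-th item of the whole sequence
```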
The start or stop values may also be negative, in which case counting is done from the end of the sequence, as shown below:
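```python
>>> a = [1, 2, 3, 4, 5, 6]
>>> a[-2:]    # the last two items
[5, 6]
>>> a[:-2]    # everything except the last two items
[1, 2, 3, 4]
```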
Slice objects support one method:
- slice.indices(self, length): this method takes a single integer argument, length, and returns a tuple of three integers, (start, stop, stride), that indicates how the slice would apply to a sequence of the given length. The start and stop values returned are the actual indices that would be used in a sequence of that length. An example is shown below:
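```python
>>> s = slice(0, 10, 2)
>>> s.indices(5)     # how the slice applies to a sequence of length 5
(0, 5, 2)
>>> 'Hello'[0:5:2]
'Hlo'
```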
Generator Objects
Generator objects are created by the invocation of generator functions: functions that use the yield keyword. This type is discussed in detail in the chapter on Iterators and Generators.
With a strong understanding of the built-in type hierarchy, the stage is now set for examining object oriented programming and how users can create their own type hierarchy and even make such types behave like built-in types.
6. Object Oriented Programming
Classes are the basis of object oriented programming in python and are one of the basic organizational units in a python program.
6.1 The Mechanics of Class Definitions
The class statement is used to define a new type. The class statement defines a set of attributes (variables and methods) that are associated with and shared by a collection of instances of the class. A simple class definition is given below:
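A reconstruction of the Account class used throughout this chapter; the exact method bodies are assumptions based on the surrounding text.

```python
class Account:
    """A simple bank account."""
    num_accounts = 0   # class variable shared by all instances

    def __init__(self, name, balance):
        self.name = name
        self.balance = balance
        Account.num_accounts += 1   # note the fully qualified name

    def deposit(self, amt):
        self.balance = self.balance + amt

    def withdraw(self, amt):
        self.balance = self.balance - amt

    def inquiry(self):
        return self.balance
```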
Class definitions introduce class objects, instance objects and method objects.
Class Objects
The execution of a class statement creates a class object.
At the start of the execution of a class statement, a new name-space is created, and this serves as the name-space into which all class attributes go; unlike languages like Java, this name-space does not create a new local scope that can be used by class methods, hence the need for fully qualified names when accessing attributes. The Account class from the previous section illustrates this: a method trying to access the num_accounts variable must use the fully qualified name, Account.num_accounts, else an error results, as shown below where the fully qualified name is not used in the __init__ method:
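A sketch of the failure:

```python
class Account:
    num_accounts = 0

    def __init__(self, name, balance):
        self.name = name
        self.balance = balance
        # num_accounts is not qualified with the class name, so the
        # interpreter treats it as a local variable of __init__.
        num_accounts += 1

Account("obi", 10)
# UnboundLocalError: local variable 'num_accounts' referenced before assignment
```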
At the end of the execution of a class statement, a class object is created; the scope preceding the class definition is reinstated, and the class object is bound in this scope to the class name given in the class definition header.
A little diversion here. One may ask: if the class created is an object, then what is the class of the class object? In accordance with the Python philosophy that every value is an object, the class object does indeed have a class from which it is created; this is the type class. So, just to confuse you a bit, the type of a type, the Account type, is type.
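This is easy to confirm:

```python
>>> type(Account)
<class 'type'>
>>> type(type)    # even type is an instance of itself
<class 'type'>
```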
To get a better understanding of the fact that a class is indeed an object with its own class we go behind the scenes to explain what really goes on during the execution of a class statement using the Account
example from above.
During the execution of a class statement, the interpreter carries out roughly the following steps behind the scenes (greatly simplified here):
- The body of the class statement is isolated into a code object.
- A class dictionary representing the name-space for the class is created.
- The code object representing the body of the class is executed within this name-space.
- During the final step, the class object is created by instantiating the type class, passing in the class name, base classes and class dictionary as arguments. The type class used here in creating the Account class object is a meta-class: the class of a class. The meta-class used in the class object creation can be explicitly specified by supplying the metaclass keyword argument in the class definition. If it is not supplied, the base classes, if any, are checked for a meta-class; if no base classes are supplied, the default type metaclass is used. More about meta-classes is discussed in subsequent chapters.
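The same steps can be performed by hand with the three-argument form of type; the class body here is a minimal assumption.

```python
# Steps 1-3: execute the class body inside a fresh namespace dict.
namespace = {}
body = compile("num_accounts = 0", "<class body>", "exec")
exec(body, globals(), namespace)

# Step 4: instantiate `type` with the name, bases and namespace.
Account = type("Account", (object,), namespace)
print(Account.num_accounts)   # 0
```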
Class objects support attribute reference and object instantiation. Attributes are referenced using the standard dot syntax: an object followed by a dot and then the attribute name, as in obj.name. Valid attribute names are all the variable and method names present in the class' name-space when the class object was created. For example:
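A reconstructed snippet; the address is elided.

```python
>>> Account.num_accounts
0
>>> Account.deposit
<function Account.deposit at 0x...>
```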
Object instantiation is carried out by calling the class object like a normal function with required parameters for the __init__
method of the class as shown in the following example:
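A sketch:

```python
>>> x = Account("obi", 10)
>>> x.name, x.balance
('obi', 10)
>>> Account.num_accounts
1
```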
An instance object that has been initialized with the supplied arguments is returned from the instantiation of a class object. In the case of the Account class, the account name and account balance are set, and the number of instances is incremented by 1, in the __init__ method.
Instance Objects
If class objects are the cookie cutters, then instance objects are the cookies that result from instantiating class objects. Instance objects are returned after the correct initialization of a class, just as shown in the previous section. Attribute references are the only operations that are valid on instance objects. Instance attributes are either data attributes (better known as instance variables in languages like Java) or method attributes.
Method Objects
If x is an instance of the Account class, x.deposit is an example of a method object. Method objects are similar to functions; however, a method definition includes an extra argument in the argument list, the self argument. This self argument refers to an instance of the class. But why do we have to pass an instance as an argument to a method? This is best illustrated by a method call such as the following.
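A sketch:

```python
>>> x = Account("obi", 10)
>>> x.deposit(10)
>>> x.inquiry()
20
```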
But what exactly happens when an instance method is called? It can be observed that x.inquiry()
is called without an argument above, even though the method definition for inquiry()
requires the self
argument. What happened to this argument?
In the example above, the call to x.inquiry() is exactly equivalent to Account.inquiry(x); notice that the object instance, x, is passed as the argument to the method: this is the self argument. Invoking a method of an instance with an argument list is thus equivalent to invoking the corresponding function from the object's class with an argument list created by inserting the method's instance at the start of the original argument list. To understand this, note that methods are stored as plain functions in the class __dict__.
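A sketch:

```python
>>> type(Account.__dict__['inquiry'])
<class 'function'>
>>> Account.inquiry(x)    # equivalent to x.inquiry()
20
```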
To fully understand how this transformation takes place, one has to understand descriptors and Python's attribute reference algorithm; these are discussed in subsequent sections of this chapter. In summary, a method object is a wrapper around a function object; when the method object is called with an argument list, a new argument list is constructed from the instance object and the supplied arguments, and the underlying function object is called with this new argument list. This applies to all instance method objects, including the __init__ method. Note that self is actually not a keyword; the name self is just a convention, and any valid argument name can be used, as shown in the Account class definition below.
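A sketch with obj in place of the conventional self:

```python
class Account:
    num_accounts = 0

    def __init__(obj, name, balance):
        obj.name = name
        obj.balance = balance
        Account.num_accounts += 1

    def inquiry(obj):
        return obj.balance
```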
6.2 Customizing User-defined Types
Python is a very flexible language, providing users with the ability to customize classes in ways that are unimaginable in other languages. Attribute access, class creation and object initialization are a few examples of the ways in which classes can be customized. User-defined types can also be customized to behave like built-in types and support special operators and syntax such as *, +, -, [] and so on. All this customization is possible because of methods that are called special or magic methods. Python special methods are just ordinary Python methods with double underscores as prefix and suffix to the method names.
Special methods have already been encountered in this book. An example is the __init__ method that is called to initialize class instances; another is the __getitem__ method invoked by the index operator, []; an index expression such as a[i] is translated by the interpreter to a call to type(a).__getitem__(a, i). Methods with the double underscore as prefix and suffix are just ordinary Python methods; users can define their own methods with names prefixed and suffixed with double underscores and use them just like normal Python methods, although this is not the conventional approach to defining ordinary user methods.
User-defined classes can also implement these special methods; a corollary of this is that built-in operators such as + or [] can be adapted for use by user-defined classes. This is part of the essence of polymorphism in Python. In this book, special methods are grouped according to the functions they serve. These groups include:
Special methods for instance creation
The __new__ and __init__ special methods are the two methods that are integral to instance creation. New class instances are created in a two-step process: first the static method __new__ is called to create and return a new class instance, then the __init__ method is called to initialize the newly created object with the supplied arguments. A very important case in which there is a need to override the __new__ method is when sub-classing built-in immutable types. Any initialization done in the sub-class must happen before object creation; this is because once an immutable object is created, its value cannot be changed, so it makes no sense to try to modify the created object in an __init__ method. An example of such sub-classing is shown in the following snippet, in which whatever value is supplied is rounded up to the next integer.
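A sketch; the class name is an assumption.

```python
import math

class NextInteger(float):
    def __new__(cls, value):
        # Rounding must happen *before* the immutable value is created.
        return super().__new__(cls, math.ceil(value))

print(NextInteger(2.1))   # 3.0
```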
Attempting to do the math.ceil operation in an __init__ method would cause the object initialization to fail. The __new__ method can also be overridden to create a Singleton super class; subclasses of such a class can only ever have a single instance throughout the execution of a program, as the following example illustrates.
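A sketch of such a Singleton base class:

```python
class Singleton:
    _instance = None

    def __new__(cls, *args, **kwargs):
        if cls._instance is None:
            # Create the one and only instance via the base class' __new__.
            cls._instance = super().__new__(cls)
        return cls._instance

class Config(Singleton):
    pass

a = Config()
b = Config()
print(a is b)   # True - only one instance ever exists
```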
It is worth noting that when implementing the __new__ method, the implementation must call its base class' __new__, and it must return an object.
Users are already familiar with defining the __init__ method; the __init__ method is overridden to perform attribute initialization for instances of mutable types.
Special methods for attribute access
The special methods in this category provide means for customizing attribute references, whether to access or to set such attributes. The special methods available for this include:

- __getattr__: this method can be implemented to handle situations in which a referenced attribute cannot be found. It is only called when the referenced attribute is neither an instance attribute nor found in the class tree of the object. This method should return some value for the attribute or raise an AttributeError exception. For example, if x is an instance of the Account class defined above, trying to access an attribute that does not exist will result in a call to this method, as shown in the following snippet.
Care should be taken with the implementation of __getattr__: if the implementation references an instance attribute that does not exist, an infinite recursion occurs because the __getattr__ method is called repeatedly without end.
- __getattribute__: this method is implemented to customize attribute access for a class. It is always called, unconditionally, during attribute access for instances of the class.
- __setattr__: this method is implemented to unconditionally handle all attribute assignment. An implementation of __setattr__ should insert the value being assigned into the dictionary of instance attributes rather than using self.name = value, which would result in an infinite recursive call; alternatively, it should delegate the assignment to the base class method of the same name, as in super().__setattr__(name, value). A sketch follows after this list.
- __delattr__: this is implemented to customize the deletion of an instance attribute. It is invoked whenever del obj.name is called.
- __dir__: this is implemented to customize the list of object attributes returned by a call to dir(obj).
Special methods for Type Emulation
Built-in types in Python have special operators that work with them. For example, numeric types support the + operator for adding two numbers and the - operator for subtracting two numbers, while sequence and mapping types support the [] operator for indexing the values they hold. Sequence types also support the + operator for concatenating sequences.
User defined classes can be customized to behave like these built-in types where it makes sense. This can be done by implementing the special methods that are invoked by the interpreter when these special operators are encountered. The special methods that provide these functionalities for emulating built-in types can be broadly grouped into one of the following:
Numeric Type Special Methods
The following table shows some of the basic operators and the special methods invoked when these operators are encountered.
Special Method | Operator Description |
---|---|
a.__add__(self, b) | binary addition, a + b |
a.__sub__(self, b) | binary subtraction, a - b |
a.__mul__(self, b) | binary multiplication, a * b |
a.__truediv__(self, b) | division of a by b |
a.__floordiv__(self, b) | truncating division of a by b |
a.__mod__(self, b) | a modulo b |
a.__divmod__(self, b) | returns a divided by b, a modulo b |
a.__pow__(self, b[, modulo]) | a raised to the bth power |
Python has the concept of reflected operations; this was covered in the section on NotImplemented in the previous chapter. The idea behind this concept is that if the left operand of a binary arithmetic operation does not support the required operation and returns NotImplemented, then an attempt is made to call the corresponding reflected operation on the right operand, provided the types of the two operands differ. An example of this rarely used functionality is shown in the following trivial example for emphasis.
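A sketch; the Currency class is an assumption. Without a reflected method, the second addition fails.

```python
class Currency:
    def __init__(self, amount):
        self.amount = amount

    def __add__(self, other):
        return Currency(self.amount + other)

money = Currency(10)
money + 5    # fine: Currency.__add__ handles it
5 + money    # TypeError: int.__add__ returns NotImplemented and
             # Currency defines no __radd__
```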
In the next snippet the class implements the reflected special method and this reflected method is called by the interpreter.
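A sketch of the fixed class:

```python
class Currency:
    def __init__(self, amount):
        self.amount = amount

    def __add__(self, other):
        return Currency(self.amount + other)

    def __radd__(self, other):
        # Called for `other + self` when type(other) cannot handle it.
        return Currency(self.amount + other)

money = Currency(10)
total = 5 + money    # int.__add__ returns NotImplemented, so __radd__ runs
print(total.amount)  # 15
```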
The following special methods implement reflected binary arithmetic operations.
Special Method | Operator Description |
---|---|
a.__radd__(self, b) | reflected binary addition, b + a |
a.__rsub__(self, b) | reflected binary subtraction, b - a |
a.__rmul__(self, b) | reflected binary multiplication, b * a |
a.__rtruediv__(self, b) | reflected division, b / a |
a.__rfloordiv__(self, b) | reflected truncating division, b // a |
a.__rmod__(self, b) | reflected modulo, b % a |
a.__rdivmod__(self, b) | reflected divmod(b, a) |
a.__rpow__(self, b[, modulo]) | reflected exponentiation, b ** a |
Another set of operators that work with numeric types are the augmented assignment operators. An example of an augmented operation is shown in the following code snippet.
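A sketch:

```python
>>> total = 10
>>> total += 5    # augmented assignment: equivalent to total = total + 5
>>> total
15
```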
A few of the special methods for implementing augmented arithmetic operations are listed in the following table.
Special Method | Description |
---|---|
a.__iadd__(self, b) | a += b |
a.__isub__(self, b) | a -= b |
a.__imul__(self, b) | a *= b |
a.__itruediv__(self, b) | a /= b |
a.__ifloordiv__(self, b) | a //= b |
a.__imod__(self, b) | a %= b |
a.__ipow__(self, b[, modulo]) | a **= b |
Sequence and Mapping Types Special Methods
Sequence and mapping types are often referred to as container types because they can hold references to other objects. User-defined classes can emulate container types, to the extent that this makes sense, if such classes implement the special methods listed in the following table.
Special Method | Description |
---|---|
__len__(obj) | returns the length of obj. This is invoked to implement the built-in function len(). An object that doesn't define a __bool__() method and whose __len__() method returns zero is considered to be false in a Boolean context. |
__getitem__(obj, key) | fetches an item, obj[key]. For sequence types, the keys should be integers or slice objects. If key is of an inappropriate type, TypeError may be raised; if the key has a value outside the set of indices for the sequence, IndexError should be raised. For mapping types, if key is absent from the container, KeyError should be raised. |
__setitem__(obj, key, value) | sets obj[key] = value |
__delitem__(obj, key) | deletes obj[key]. Invoked by del obj[key] |
__contains__(obj, key) | returns true if key is contained in obj and false otherwise. Invoked by a call to key in obj |
__iter__(self) | called when an iterator is required for a container. This method should return a new iterator object that can iterate over all the objects in the container. For mappings, it should iterate over the keys of the container. Iterator objects also need to implement this method; they are required to return themselves. This is also used by the for..in construct. |
Sequence types such as lists support the addition operator (for concatenating sequences) and the multiplication operator (for creating copies), + and * respectively, by defining the methods __add__(), __radd__(), __iadd__(), __mul__(), __rmul__() and __imul__(). Sequence types also implement the __reversed__() method, which supports the reversed() built-in used for reverse iteration over a sequence. User-defined classes can implement these special methods to get the corresponding functionality.
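A small sketch of a sequence-like container; the Deck class is an assumption.

```python
class Deck:
    def __init__(self, cards):
        self._cards = list(cards)

    def __len__(self):
        return len(self._cards)

    def __getitem__(self, index):
        # Delegating to the list gives indexing *and* slicing support.
        return self._cards[index]

deck = Deck(["A", "K", "Q", "J"])
print(len(deck))    # 4
print(deck[0])      # A
print(deck[1:3])    # ['K', 'Q']
for card in deck:   # __getitem__ alone is enough for iteration
    print(card)
```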
Emulating Callable Types
Callable types support the function call syntax, (args)
. Classes that implement the __call__(self[, args...])
method are callable. User defined classes for which this functionality makes sense can implement this method to make class instances callable. The following example shows a class implementing the __call__(self[, args...])
method and how instances of this class can be called using the function call syntax.
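A sketch; the Adder class is an assumption.

```python
class Adder:
    def __init__(self, n):
        self.n = n

    def __call__(self, x):
        return self.n + x

add_five = Adder(5)
print(add_five(10))        # 15 - the instance is used with call syntax
print(callable(add_five))  # True
```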
Special Methods for comparing objects
User-defined classes can provide custom implementations for the special methods invoked by the six object comparison operators in Python, <, <=, ==, !=, > and >=, in order to control how these operators work. These special methods are given in the following table.
Special Method | Description |
---|---|
a.__lt__(self, b) | a < b |
a.__le__(self, b) | a <= b |
a.__eq__(self, b) | a == b |
a.__ne__(self, b) | a != b |
a.__gt__(self, b) | a > b |
a.__ge__(self, b) | a >= b |
In Python, the truth of x == y does not imply that x != y is false, so __eq__() should be defined along with __ne__() so that the operators are well behaved. __lt__() and __gt__(), and __le__() and __ge__(), are each other's reflection, while __eq__() and __ne__() are their own reflection; this means that if a call to the implementation of any of these methods on the left argument returns NotImplemented, the reflected operator on the right argument is used.
Special Methods and Attributes for Miscellaneous Customizations
- __slots__: this is a special attribute rather than a method. It is an optimization trick used by the interpreter to store object attributes efficiently. Objects by default store all attributes in a dictionary (the __dict__ attribute), and this is very inefficient when objects with few attributes are created in large numbers. __slots__ makes use of a static iterable that reserves just enough space for each attribute, rather than the dynamic __dict__ attribute. The iterable representing the __slots__ variable can also be a string made up of the attribute names. The following example shows how __slots__ works.
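A sketch:

```python
class Account:
    __slots__ = ('name', 'balance')

    def __init__(self, name, balance):
        self.name = name
        self.balance = balance

x = Account("obi", 10)
x.balance = 20    # fine: 'balance' is listed in __slots__
x.number = 1234   # AttributeError: 'Account' object has no attribute 'number'
```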
A few things that are worth noting about __slots__ include the following:

- If a superclass has the __dict__ attribute, then using __slots__ in sub-classes is of no use, as the dictionary is still available.
- If __slots__ is used, then attempting to assign to a variable not listed in the __slots__ variable results in an AttributeError, as shown in the previous example.
- Sub-classes will have a __dict__ even if they inherit from a base class with a __slots__ declaration; subclasses have to define their own __slots__ attribute, which must contain only the additional names, in order to avoid having a __dict__ for storing names.
- Subclasses with "variable-length" built-in types as a base class cannot have a non-empty __slots__ variable.
- __bool__: this method implements truth value testing for a given class; it is invoked by the built-in bool() operation and should return True or False. In the absence of an implementation, __len__() is called if it is implemented, and the object's truth value is considered True if the result of the call to __len__ is non-zero. If a class defines neither __len__() nor __bool__(), all its instances are considered True.
- __repr__ and __str__: these are two closely related methods; both return string representations of a given object and differ only subtly in the intent behind them. They are invoked by calls to the repr and str built-ins respectively. The __repr__ implementation should return an unambiguous string representation of the object it is called on; ideally, the representation returned should be an expression that, when evaluated by the eval function, returns the given object, and when this is not possible the representation should be as unambiguous as possible. __str__, on the other hand, exists to provide a human readable version of an object: a version that makes sense to someone reading the output without necessarily understanding the semantics of the language. A very good illustration of how the two methods differ is shown below by calling both methods on a date object. When using string interpolation, %r makes a call to repr while %s makes a call to str.
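A sketch using datetime; the timestamp values are illustrative.

```python
>>> import datetime
>>> today = datetime.datetime.now()
>>> str(today)    # readable by humans
'2015-06-14 16:23:36.659650'
>>> repr(today)   # unambiguous; can be fed back to eval()
'datetime.datetime(2015, 6, 14, 16, 23, 36, 659650)'
```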
- __bytes__: this is invoked by a call to the bytes() built-in and should return a byte string representation of the object. The byte string returned should be a bytes object.
- __hash__: this is invoked by the hash() built-in. It is also used by operations on types such as set, frozenset and dict that make use of object hash values. Providing a __hash__ implementation for user-defined classes is an involved and delicate act that should be carried out with care, as will be seen. Immutable built-in types are hashable while mutable types such as lists are not. For example, the hash of a number is the value of the number itself, as shown in the following snippet.
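A sketch:

```python
>>> hash(42)       # the hash of a small integer is the integer itself
42
>>> hash([1, 2])   # mutable types are unhashable
TypeError: unhashable type: 'list'
```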
User-defined classes have a default hash value that is derived from their id() value. Any __hash__() implementation must return an integer, and objects that compare equal must have the same hash value; that is, for two objects a and b, a == b must imply hash(a) == hash(b).
A few rules for implementing a __hash__() method include the following:

- A class should only define the __hash__() method if it also defines the __eq__() method.
- A class whose __hash__() method is None has instances that are unhashable.
- The interpreter provides user-defined classes with default implementations for __eq__() and __hash__(). By default, all objects compare unequal except with themselves, and x.__hash__() returns a value such that x == y and x is y and hash(x) == hash(y) are consistent with one another. In CPython, the default __hash__() implementation returns a value derived from the id() of the object.
- Overriding the __eq__() method without defining the __hash__() method sets the __hash__() method to None in the class. When the __hash__() method of a class is None, an instance of the class raises an appropriate TypeError when an attempt is made to retrieve its hash value, and the object is correctly identified as unhashable when checking isinstance(obj, collections.Hashable).
- If a class that overrides __eq__() needs to keep the implementation of __hash__() from a base class, this must be done explicitly by setting __hash__ = BaseClass.__hash__.
- A class that does not override __eq__() can suppress hash support by setting __hash__ to None. If a class instead defines its own __hash__() method that explicitly raises a TypeError, instances of such a class will be incorrectly identified as hashable by an isinstance(obj, collections.Hashable) test.
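A sketch tying __eq__ and __hash__ together:

```python
class Account:
    def __init__(self, acct_num):
        self.acct_num = acct_num

    def __eq__(self, other):
        return isinstance(other, Account) and self.acct_num == other.acct_num

    def __hash__(self):
        # Hash the same attribute that __eq__ compares so that equal
        # objects always have equal hashes.
        return hash(self.acct_num)

a, b = Account(1234), Account(1234)
print(a == b)               # True
print(hash(a) == hash(b))   # True
print(len({a, b}))          # 1 - treated as the same set member
```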
6.3 A Vector class
In this section, a complete example of the use of special methods to emulate built-in types is provided by a Vector
class. The Vector
class provides support for performing vector arithmetic operations.
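A sketch of such a class; the original implementation is not preserved here, so the operations and dimensionality are assumptions.

```python
import math

class Vector:
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __repr__(self):
        return 'Vector({!r}, {!r})'.format(self.x, self.y)

    def __eq__(self, other):
        return self.x == other.x and self.y == other.y

    def __add__(self, other):
        return Vector(self.x + other.x, self.y + other.y)

    def __sub__(self, other):
        return Vector(self.x - other.x, self.y - other.y)

    def __mul__(self, scalar):
        return Vector(self.x * scalar, self.y * scalar)

    def __abs__(self):
        return math.hypot(self.x, self.y)

v, w = Vector(3, 4), Vector(1, 2)
print(v + w)    # Vector(4, 6)
print(v * 2)    # Vector(6, 8)
print(abs(v))   # 5.0
```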
6.4 Inheritance
Inheritance is one of the basic tenets of object oriented programming, and Python supports multiple inheritance just like C++. Inheritance provides a mechanism for creating new classes that specialise or modify a base class, thereby introducing new functionality. We call the base class the parent class or the super class. An example of a class inheriting from a base class in Python is given in the following example.
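A sketch building on the Account class; the rate argument and add_interest method are assumptions.

```python
class SavingsAccount(Account):
    def __init__(self, name, balance, rate):
        # Initialize the parent class without naming it explicitly.
        super().__init__(name, balance)
        self.rate = rate

    def add_interest(self):
        self.balance += self.balance * self.rate
```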
The super keyword

The super keyword (strictly speaking, super is a built-in function rather than a keyword) plays an integral part in Python inheritance. In a single inheritance hierarchy, super is used to refer to the parent/super class without explicitly naming it; this is similar to the super keyword in Java. It comes into play when overriding a method and there is a need to also call the parent's version of that method, as in the above example, in which the __init__ method in the SavingsAccount class is overridden but the __init__ method of the parent class is also called using super. The super function plays an even more integral role in Python inheritance when a multiple inheritance hierarchy exists.
Multiple Inheritance
In multiple inheritance, a class can have multiple parent classes. This type of hierarchy is strongly discouraged. One of the issues with this kind of inheritance is the complexity involved in properly resolving methods when called. Imagine a class, D, that inherits from two classes, B and C, and that there is a need to call a method from the parent classes, but both parent classes implement the same method. How is the order in which the classes are searched for the method determined?
A Method Resolution Order (MRO) algorithm determines how a method is found in a class or in any of the class' base classes. In Python, the resolution order is calculated at class definition time and stored in the class __dict__ as the __mro__ attribute. To illustrate this, imagine a class hierarchy with multiple inheritance such as the one shown in the following example.
To obtain an mro, the interpreter's method resolution algorithm carries out a left-to-right, depth-first listing of all the classes in the hierarchy. In the trivial example above, this results in the following class list: [D, B, A, C, A, object]. Note that all objects inherit from the root object class if no parent class is supplied during class definition. Finally, for each class that occurs multiple times, all occurrences are removed except the last, resulting in an mro of [D, B, C, A, object] for the previous class hierarchy. This is the order in which classes would be searched for attributes of a given instance of D.
Cooperative method calls with super

This section shows the power of super in a multiple inheritance hierarchy; the class hierarchy from the previous section is used. This example is from the excellent write-up by Guido van Rossum on Type Unification. Imagine that A defines a method, meth, that is overridden by B, C and D. Suppose there is a requirement that all the implementations are called; the method may be a save method that saves an attribute for each type it is defined for, so missing any call would result in some unsaved data somewhere in the hierarchy. A combination of super and __mro__ provides the ammunition for solving this problem. This solution is referred to as the call-next method by Guido van Rossum and is shown in the following snippet:
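A reconstruction of the call-next-method snippet:

```python
class A:
    def meth(self):
        print("A.meth")   # end of the chain; no further super() call

class B(A):
    def meth(self):
        print("B.meth")
        super(B, self).meth()

class C(A):
    def meth(self):
        print("C.meth")
        super(C, self).meth()

class D(B, C):
    def meth(self):
        print("D.meth")
        super(D, self).meth()

D().meth()   # prints D.meth, B.meth, C.meth, A.meth in that order
```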
When self.meth() is called by an instance of D, for example, super(D, self).meth() will find and call B.meth(self), since B is the first base class following D that defines meth in D.__mro__. Now in B.meth, super(B, self).meth() is called, and since self is an instance of D, the next class after B in the MRO is C (__mro__ is [D, B, C, A]), so the search for a definition of meth continues there. This finds C.meth, which is called and which in turn calls super(C, self).meth(). Using the same MRO, the next class after C is A, and thus A.meth is called. This is the original definition of meth, so no further super call is made at this point. Using super and the method resolution order, the interpreter is able to find and call every version of the meth method implemented by each of the classes in the hierarchy. However, multiple inheritance is best avoided, because for more complex class hierarchies the calls may be considerably more complicated than this.
6.5 Static and Class Methods
All methods defined in a class by default operate on instances. However, one can define static or class methods by decorating such methods with the corresponding @staticmethod
or @classmethod
decorators.
Static Methods
Static methods are normal functions that exist in the name-space of a class. Referencing a static method from a class shows that rather than an unbound method type, a function type is returned as shown below:
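A sketch; the method name is an assumption.

```python
class Account:
    @staticmethod
    def validate(acct_num):
        return isinstance(acct_num, int)

print(Account.validate)        # <function Account.validate at 0x...>
print(Account.validate(1234))  # True - no instance is required
```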
To define a static method, the @staticmethod decorator is used, and such methods do not require the self argument. Static methods provide a mechanism for better organization, as code related to a class is placed in that class and can be overridden in a sub-class as needed. Unlike ordinary class methods, which are wrappers around the actual underlying functions, static methods return the underlying functions without any modification when referenced.
Class Methods
Class methods, as the name implies, operate on classes themselves rather than on instances. Class methods are created using the @classmethod decorator, with the class rather than an instance passed as the first argument to the method.
A motivating example of the use of class methods is as a factory for object creation. Imagine that data for the Account class comes in different formats, such as tuples, JSON strings and so on. It is not possible to define multiple __init__ methods in a class, so class methods come in handy for such situations. Suppose that, in the Account class defined above, there is a requirement to initialize an account from a JSON string object; we can define a class factory method, from_json, that takes in a JSON string, handles the extraction of the parameters and creates the account object using the extracted parameters. Another example of a class method acting as a factory method is dict.fromkeys, which is used for creating dict objects from a sequence of supplied keys and a value.
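A sketch of such a factory; the JSON field names are assumptions.

```python
import json

class Account:
    def __init__(self, name, balance):
        self.name = name
        self.balance = balance

    @classmethod
    def from_json(cls, json_string):
        # cls is the Account class itself; the parsed fields are
        # forwarded to the normal initializer.
        data = json.loads(json_string)
        return cls(data['name'], data['balance'])

acct = Account.from_json('{"name": "obi", "balance": 10}')
print(acct.name, acct.balance)   # obi 10
```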
6.6 Descriptors and Properties
Descriptors are an esoteric but integral part of the python programming language. They are used widely in the core of the python language and a good grasp of descriptors provides a python programmer with a deeper understanding of the language. To set the stage for the discussion of descriptors, some scenarios that a programmer may encounter are described; this is followed by an explanation of descriptors and how they provide elegant solutions to these scenarios.
- Consider a program in which some rudimentary type checking of object data attributes needs to be enforced. Python is a dynamically typed language and does not enforce type checking, but this does not prevent anyone from implementing a version of type checking, regardless of how rudimentary it may be. The conventional way to go about type checking object attributes may take the following form.
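A sketch of the conventional approach:

```python
class Account:
    def __init__(self, name, balance):
        if not isinstance(name, str):
            raise TypeError("name must be a str")
        if not isinstance(balance, (int, float)):
            raise TypeError("balance must be a number")
        self.name = name
        self.balance = balance
```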
The above method may be feasible for enforcing type checks on one or two data attributes, but as the attributes increase in number it gets cumbersome. Alternatively, a type_check(type, val) function could be defined and called in the __init__ method before assignment, but this cannot be elegantly applied when an attribute value is set after initialization. A quick solution that comes to mind is the getters and setters present in Java, but that is un-pythonic and cumbersome.
- Consider a program that needs object attributes to be read-only once initialized. One could think of ways of implementing this using Python special methods, but once again such an implementation could be unwieldy and cumbersome.
- Finally, consider a program in which attribute access needs to be customized, perhaps to log such access or even to transform the attribute value in some way. Once again, it is not too difficult to come up with a solution, although the solution may be unwieldy and not reusable.
All the issues mentioned above are linked together by the fact that they are all related to attribute references: in each case, attribute access needs to be customized.
Enter Python Descriptors
Descriptors provide elegant, simple, robust and re-usable solutions to the above listed issues. Simply put, a descriptor is an object that represents the value of an attribute. This means that if an account object has an attribute name
, a descriptor is another object that can be used to represent the value held by that attribute, name
. Such an object implements the __get__
, __set__
or __delete__
special methods of the descriptor protocol. The signature for each of these methods is shown below:
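The signatures, as given in the language documentation:

```python
descr.__get__(self, obj, type=None)  # --> value
descr.__set__(self, obj, value)      # --> None
descr.__delete__(self, obj)          # --> None
```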
Objects implementing only the __get__ method are non-data descriptors, so they can only be read from after initialization, while objects implementing both __get__ and __set__ are data descriptors, meaning that such descriptor objects are also writable.
To get a better understanding of descriptors, descriptor-based solutions are provided for the issues mentioned in the previous section. Implementing type checking on an object attribute using descriptors is a simple and straightforward task. A descriptor implementing this type checking is shown in the following snippet.
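A reconstruction of such a descriptor; the attribute-name mangling scheme is an assumption.

```python
class TypedAttribute:
    def __init__(self, name, type_, default=None):
        self.name = "_" + name
        self.type = type_
        self.default = default if default else type_()

    def __get__(self, instance, cls):
        return getattr(instance, self.name, self.default)

    def __set__(self, instance, value):
        if not isinstance(value, self.type):
            raise TypeError("Must be a %s" % self.type)
        setattr(instance, self.name, value)

class Account:
    # Descriptors take effect only when defined at class level.
    name = TypedAttribute("name", str)
    balance = TypedAttribute("balance", int, 42)

acct = Account()
acct.name = "obi"
acct.balance = 1234
acct.name = 5678   # TypeError: Must be a <class 'str'>
```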
In the example, a descriptor, TypedAttribute, is implemented, and this descriptor class enforces rudimentary type checking for any attribute of the class it is used on. It is important to note that descriptors are effective in this kind of case only when defined at the class level rather than at the instance level, that is, not in the __init__ method, as the example above shows.
Descriptors are integral to the Python language. They provide the mechanism behind properties, static methods, class methods, super and a host of other functionality in Python classes. In fact, descriptors are the first kind of object searched for during an attribute reference: when an attribute is referenced on an object, a reference b.x is transformed into type(b).__dict__['x'].__get__(b, type(b)). The algorithm then searches for the attribute in the following order.
- type(b).__dict__ is searched for the attribute name, and if a data descriptor is found, the result of calling the descriptor's __get__ method is returned. If it is not found, all base classes of type(b) are searched in the same way.
- b.__dict__ is searched, and if the attribute name is found here, its value is returned.
- type(b).__dict__ is searched for a non-data descriptor with the given attribute name, and if one is found, the result of calling it is returned.
- If the name is not found, an AttributeError is raised, or __getattr__() is called if provided.
This precedence chain can be overridden by defining custom __getattribute__
methods for a given object class (the precedence defined above is contained in the default __getattribute__
provided by the interpreter).
With a firm understanding of the mechanics of descriptors, it is easy to implement elegant solutions to the second and third issues raised in the previous section. Implementing a read-only attribute with descriptors becomes a simple case of implementing a non-data descriptor, that is, a descriptor with no __set__ method. To solve the custom-access issue, whatever functionality is required is added to the __get__ and __set__ methods respectively.
Class Properties
Defining descriptor classes each time a descriptor is required is cumbersome. Python properties provide a concise way of adding data descriptors to attributes. A property signature is given below:
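The signature:

```python
property(fget=None, fset=None, fdel=None, doc=None)
```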
fget
, fset
and fdel
are the getter, setter and deleter methods for such class attributes. The process of creating properties is illustrated with the following example.
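A reconstruction; the attribute name follows the surrounding text.

```python
class Account:
    def __init__(self):
        self._acct_num = None

    def get_acct_num(self):
        return self._acct_num

    def set_acct_num(self, value):
        self._acct_num = value

    def del_acct_num(self):
        del self._acct_num

    acct_num = property(get_acct_num, set_acct_num, del_acct_num,
                        "Account number property.")
```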
If acct is an instance of Account, then acct.acct_num will invoke the getter, acct.acct_num = value will invoke the setter and del acct.acct_num will invoke the deleter.
The property object and its functionality can be implemented in Python, as illustrated in the Descriptor How-To Guide, using the descriptor protocol as shown below:
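The equivalent given in the Descriptor How-To Guide:

```python
class Property:
    "Emulate PyProperty_Type() in Objects/descrobject.c"

    def __init__(self, fget=None, fset=None, fdel=None, doc=None):
        self.fget = fget
        self.fset = fset
        self.fdel = fdel
        if doc is None and fget is not None:
            doc = fget.__doc__
        self.__doc__ = doc

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self
        if self.fget is None:
            raise AttributeError("unreadable attribute")
        return self.fget(obj)

    def __set__(self, obj, value):
        if self.fset is None:
            raise AttributeError("can't set attribute")
        self.fset(obj, value)

    def __delete__(self, obj):
        if self.fdel is None:
            raise AttributeError("can't delete attribute")
        self.fdel(obj)

    def getter(self, fget):
        return type(self)(fget, self.fset, self.fdel, self.__doc__)

    def setter(self, fset):
        return type(self)(self.fget, fset, self.fdel, self.__doc__)

    def deleter(self, fdel):
        return type(self)(self.fget, self.fset, fdel, self.__doc__)
```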
Python also provides the @property
decorator that can be used to create read only attributes. A property object has getter, setter, and deleter decorator methods that can be used to create a copy of the property with the corresponding accessor function set to the decorated function. This is best explained with an example:
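A sketch:

```python
class Account:
    def __init__(self, acct_num):
        self._acct_num = acct_num

    @property
    def acct_num(self):
        "The account number."
        return self._acct_num

    @acct_num.setter
    def acct_num(self, value):
        self._acct_num = value

    @acct_num.deleter
    def acct_num(self):
        del self._acct_num
```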
If a property is read-only then the setter
method is left out.
An understanding of descriptors puts us in a better position to understand what actually goes on during a method call. Note that methods are stored as ordinary functions in a class dictionary, as shown in the following snippet.
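Using the Account class from earlier in the chapter:

```python
>>> Account.__dict__['inquiry']
<function Account.inquiry at 0x...>
```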
However, object methods are of bound method type as shown in the following snippet.
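```python
>>> x = Account("obi", 10)
>>> x.inquiry
<bound method Account.inquiry of <__main__.Account object at 0x...>>
```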
To understand how this transformation takes place, note that a bound method is just a thin wrapper around the class function. Functions are descriptors because they have the __get__ method attribute, so a reference to a function results in a call to the function's __get__ method, and this returns the desired type, either the function itself or a bound method, depending on whether the reference is from a class or from an instance of the class. It is not difficult to imagine how static and class methods may be implemented using the function descriptor, and this is left to the reader to work out.
6.7 Abstract Base Classes
Sometimes, it is necessary to enforce a contract between the classes in a program; for example, it may be necessary for all the classes in a set to implement a given group of methods. This is accomplished using interfaces and abstract classes in statically typed languages like Java. In Python, a base class with default methods may be implemented, with all other classes in the set inheriting from this base class. However, there may be a requirement for each subclass to have its own implementation, and this rule needs to be enforced. All the needed methods can be defined in a base class, with each of them raising the NotImplementedError exception; all subclasses then have to override these methods in order to use them. This still does not solve the problem fully, though: some subclasses may fail to implement some of these methods, and that would not be known until a method call is attempted at runtime.
Consider another situation of a proxy object that passes method calls to another object. Such a proxy object may implement all required methods of a type via its proxied object, but an isinstance
test on such a proxy object for the proxied object will fail to produce the correct result.
Python’s Abstract base classes provide a simple and elegant solution to these issues mentioned above. The abstract base class functionality is provided by the abc module. This module defines a meta-class (we discuss meta-classes in the chapter on meta-programming) and a set of decorators that are used in the creation of abstract base classes.
When defining an abstract base class, we use the ABCMeta meta-class from the abc module as the meta-class for the abstract base class, and then make use of the @abstractmethod and @abstractproperty decorators to create methods and properties that must be implemented by non-abstract subclasses. If a subclass fails to implement any one of the abstract methods or properties, then it is also an abstract class and cannot be instantiated, as illustrated below:
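A sketch; the class names are assumptions.

```python
from abc import ABCMeta, abstractmethod

class Vehicle(metaclass=ABCMeta):
    @abstractmethod
    def drive(self):
        pass

class Car(Vehicle):
    pass   # drive() is not implemented, so Car remains abstract

Car()
# TypeError: Can't instantiate abstract class Car with abstract methods drive
```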
Once a class implements all the abstract methods, it becomes a concrete class and can be instantiated by a user.
Abstract base classes also allow existing classes to register as part of their hierarchy, but an abstract base class performs no check on whether a registered class implements all of the methods and properties that have been marked abstract. This provides a simple solution to the second issue raised in the opening paragraphs: a proxy class can be registered with an abstract base class, and an isinstance check will then return the correct answer when used.
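A sketch of registering a proxy class:

```python
class VehicleProxy:
    def __init__(self, proxied):
        self._proxied = proxied

    def drive(self):
        return self._proxied.drive()

# Register without inheriting; no check of the methods is performed.
Vehicle.register(VehicleProxy)

print(issubclass(VehicleProxy, Vehicle))   # True
```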
Abstract base classes are used extensively in the Python standard library. They provide a means of grouping Python objects, such as the number types, that have a relatively flat hierarchy. The collections module also contains abstract base classes for the various kinds of operations involving sets, sequences and dictionaries. Whenever we want to enforce contracts between classes in Python, just as interfaces do in Java, abstract base classes are the way to go.
7. The Function
The function is another organizational unit of code in Python. A Python function is either a named or an anonymous set of statements or expressions. In Python, functions are first-class objects. This means that there is no restriction on the use of functions as values: introspection can be carried out on functions, functions can be assigned to variables, functions can be used as arguments to other functions and functions can be returned from method or function calls, just like any other Python value such as a string or a number.
7.1 Function Definitions
The def
keyword is the usual way of creating user-defined functions. Function definitions are executable statements.
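A sketch of the square function referred to below:

```python
def square(x):
    return x * x
```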
When a function definition such as the square function defined above is encountered, only the function definition statement, that is def square(x), is executed; this implies that the default argument values, if any, are evaluated at this point. The evaluation of default arguments at definition time has some implications for default arguments that have mutable data structures as values; this is covered later on in this chapter. The execution of a function definition binds the function name in the current name-space to a function object, which is a wrapper around the executable code for the function. This function object contains a reference to the current global name-space, which is the global name-space used when the function is called. The function definition does not execute the function body; the body gets executed only when the function is called.
Python also has support for anonymous functions. These functions are created using the lambda
keyword. Lambda expressions in python are of the form:
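Schematically:

```python
lambda arguments: expression
```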
Lambda expressions return function objects after evaluation and have the same attributes as named functions. Lambda expressions are normally only used for very simple functions in Python, because a lambda definition can contain only one expression. A lambda definition for the square function defined above is given in the following snippet.
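```python
square = lambda x: x * x
print(square(5))   # 25
```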
7.2 Functions are Objects
Functions, just like other values, are objects. Functions have the type function.
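```python
>>> def square(x):
...     return x * x
...
>>> type(square)
<class 'function'>
```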
Like every other object, introspection on functions using the dir()
function provides a list of function attributes.
Some important function attributes include:
- __annotations__: this attribute contains optional metadata about the arguments and return type of a function definition. Python 3 introduced the optional annotation functionality primarily to help the tools used in developing Python software. Parameters are annotated by a colon after the parameter name, followed by an expression evaluating to the value of the annotation; return values are annotated by a literal ->, followed by an expression, between the parameter list and the colon denoting the end of the def statement. In the case of parameters with default values, the annotation goes between the parameter name and the default value. An example of function annotations is given in the sketch after this list.
- __doc__: returns the documentation string for the given function.
- __name__: returns the function name.
- __module__: returns the name of the module the function is defined in.
- __defaults__: returns a tuple of the default argument values. Default arguments are discussed later on.
- __kwdefaults__: returns a dict containing default values for keyword-only arguments.
- __globals__: returns a reference to the dictionary that holds the function's global variables (see chapter 5 for a word on global variables).
- __dict__: returns the name-space supporting arbitrary function attributes.
- __closure__: returns a tuple of cells that contain bindings for the function's free variables. Closures are discussed later on in this chapter.
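A sketch of annotations; the function is an assumption.

```python
def volume(length: float, width: float = 1.0) -> float:
    # Annotations are stored but not enforced by the interpreter.
    return length * width

print(volume.__annotations__)
# {'length': <class 'float'>, 'width': <class 'float'>, 'return': <class 'float'>}
```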
Functions can be passed as arguments to other functions. Functions that take other functions as arguments are commonly referred to as higher-order functions, and these form a very important part of functional programming. A very good example of a higher-order function is the map function, which takes a function and an iterable and applies the function to each item in the iterable, returning an iterator over the results. In the following example, we illustrate the use of the map() higher-order function by passing the previously defined square function and an iterable of numbers to the map function.
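```python
>>> list(map(square, [1, 2, 3, 4]))
[1, 4, 9, 16]
```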
A function can be defined inside another function as well as returned from a function call.
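A sketch of the make_counter example discussed below; nonlocal is explained in the section on closures.

```python
def make_counter():
    count = 0

    def counter():
        nonlocal count
        count += 1
        return count

    return counter
```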
In the previous example, a function, counter, is defined within another function, make_counter, and the counter function is returned whenever the make_counter function is executed. Functions can also be assigned to variables just like any other Python object, as shown below:
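```python
func = make_counter()
```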
In the above example, the make_counter function returns a function when called, and this is assigned to the variable func. This variable refers to a function object and can be called just like any other function, as shown in the following example:
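```python
>>> func()
1
>>> func()
2
```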
7.3 Functions are descriptors
As mentioned in the previous chapter, functions are also descriptors. An inspection of the attributes of a function, as shown in the following example, reveals that a function has the __get__ method attribute, making it a non-data descriptor.
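```python
>>> hasattr(square, '__get__')
True
>>> hasattr(square, '__set__')   # no __set__: a non-data descriptor
False
```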
This __get__ method is called whenever a function is referenced, and it provides the mechanism for handling both method calls on objects and ordinary function calls. This descriptor characteristic enables a function to return either itself or a bound method when referenced, depending on where and how it is referenced.
7.4 Calling Functions
In addition to calling functions in the conventional way with normal arguments, Python also supports functions with a variable number of arguments. Such arguments come in three flavours, described below:
- Default Argument Values: this allows a user to define default values for some or all function arguments. In this case, such a function can be called with fewer arguments, and the interpreter uses the default values for the arguments that are not supplied during the function call. The following example is illustrative.
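A sketch; the argument names follow the surrounding text, and the body is an assumption.

```python
def show_args(arg, def_arg=1, def_arg2=2):
    return "arg={}, def_arg={}, def_arg2={}".format(arg, def_arg, def_arg2)
```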
The above function has been defined with a single normal positional argument, arg, and two default arguments, def_arg and def_arg2. The function can be called in any of the following ways, all illustrated in the sketch after this list:
- supplying values for the non-default positional arguments only; the remaining arguments take on the default values;
- supplying values to override some default arguments, in addition to the non-default positional arguments;
- supplying values for all arguments, overriding even the arguments with default values.
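```python
show_args("test")         # 'arg=test, def_arg=1, def_arg2=2'
show_args("test", 3)      # 'arg=test, def_arg=3, def_arg2=2'
show_args("test", 3, 4)   # 'arg=test, def_arg=3, def_arg2=4'
```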
It is also very important to be careful when using mutable data structures as default arguments. Function definitions get executed only once so these mutable data structures are created once at definition time. This means that the same mutable data structure is used for all function calls as shown in the following example:
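A sketch of the pitfall:

```python
def show_args_using_mutable_defaults(arg, def_arg=[]):
    def_arg.append("Hello World")
    return "arg={}, def_arg={}".format(arg, def_arg)

print(show_args_using_mutable_defaults("test"))
# arg=test, def_arg=['Hello World']
print(show_args_using_mutable_defaults("test"))
# arg=test, def_arg=['Hello World', 'Hello World']
```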
On every function call, Hello World is added to the def_arg list, so after two function calls the default argument holds two hello world strings. It is important to take note of this when using mutable data structures as default values.
Keyword Argument: Functions can be called using keyword arguments of the form
kwarg=value
. Akwarg
refers to the name of arguments used in a function definition. Take the function defined below with positional non-default and default arguments.To illustrate function calls with key word arguments, the following function can be called in any of the following ways:
In a function call, keyword arguments must not come before non-keyword arguments thus the following will fail:
A function cannot supply duplicate values for an argument so the following is illegal:
In the above the argument
arg
is a positional argument so the value
test
is assigned to it. Trying to assign to the keywordarg
again is an attempt at multiple assignment and this is illegal.All the keyword arguments passed must match one of the arguments accepted by the function and the order of keywords including non-optional arguments is not important so the following in which the order of argument is switched is legal:
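A consolidated sketch of the legal and illegal calls:

```python
def show_args(arg, def_arg=1):
    return "arg={}, def_arg={}".format(arg, def_arg)

show_args(arg="test", def_arg=3)   # 'arg=test, def_arg=3'
show_args(def_arg=3, arg="test")   # keyword order does not matter
show_args("test", def_arg=3)       # positional first, then keyword

# show_args(arg="test", 3)         # SyntaxError: positional after keyword
# show_args("test", arg="again")   # TypeError: multiple values for 'arg'
```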
-
Arbitrary Argument List: Python also supports defining functions that
take arbitrary number of arguments that are passed to the function in a
tuple. An example of this from the python tutorial is given below:The arbitrary number of arguments must come after normal arguments; in this case, after the
file and separator arguments. In the call shown, the arguments one, two, three, four and five are all bunched together into a tuple that can be accessed via the args argument.
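A sketch following the Python tutorial's example:

```python
import sys

def write_multiple_items(file, separator, *args):
    file.write(separator.join(args))

write_multiple_items(sys.stdout, " ", "one", "two", "three", "four", "five")
# prints: one two three four five
```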
Unpacking Function Arguments
Sometimes, the arguments for a function call are already in a tuple, a list or a dict. These arguments can be unpacked into a function call using the * or ** operators. Consider a function that takes two positional arguments and prints out the values: if the values for the call are in a list, they can be unpacked directly into the function as shown below.
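A minimal sketch (print_args and the parameter names are illustrative):

```python
def print_args(a, b):
    print(a, b)

args = [1, 2]
print_args(*args)   # equivalent to print_args(1, 2)
```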
Similarly, dictionaries can be used to store keyword-to-value mappings, and the ** operator is used to unpack the keyword arguments into the function as shown below:
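```python
kwargs = {"a": 1, "b": 2}
print_args(**kwargs)   # equivalent to print_args(a=1, b=2)
```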
* and ** Function Parameters
Sometimes, when defining a function, it is not known beforehand how many arguments to expect. This leads to function definitions with a signature of the form def func(arg, *args, **kwargs). The *args parameter captures a sequence of positional arguments of unknown length, while **kwargs captures a dict of keyword name-value mappings which may contain any number of entries; *args must come before **kwargs in the function definition. The normal argument must be supplied to the function, but the *args and **kwargs arguments are optional, as the following illustrates:
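A sketch (the name show_all is illustrative):

```python
def show_all(arg, *args, **kwargs):
    print(arg, args, kwargs)

show_all("mandatory")                      # mandatory () {}
show_all("mandatory", 1, 2, key="value")   # mandatory (1, 2) {'key': 'value'}
```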
At call time, the normal arguments are supplied as usual, while any extra positional and keyword arguments are packed into args and kwargs respectively. This kind of function definition comes in handy when dealing with function decorators, as will be seen in the chapter on meta-programming.
7.5 Nested functions and Closures
A function definition within another function creates a nested function, as shown in the following snippet.
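A minimal sketch; the names make_counter and counter match those discussed below (the nonlocal keyword used here is explained later in this section):

```python
def make_counter():
    count = 0
    def counter():
        nonlocal count      # discussed later in this section
        count += 1
        return count
    return counter

counter = make_counter()
print(counter(), counter())   # 1 2
```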
In the nested function definition, the function counter is in scope only inside the function make_counter, so it is often useful to return the counter function from the make_counter function.
In nested functions such as the one in the above example, a new instance of the nested function is created on each call to the outer function. This is because during each execution of the make_counter function, the definition of the counter function is executed, but its body is not.
A nested function has access to the environment in which it was created. As a result, a variable defined in the outer function can be referenced in the inner function even after the outer function has finished execution.
When nested functions reference variables from the outer function in which they are defined, the nested function is said to be closed over the referenced variable.
The __closure__ special attribute of a function object is used to access the closed-over variables, as shown in the next example.
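Using the make_counter sketch from above:

```python
counter = make_counter()
print(counter.__closure__)                    # (<cell at 0x...: int object at 0x...>,)
print(counter.__closure__[0].cell_contents)   # 0 - the captured count variable
```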
Closures in previous versions of Python had a quirky behaviour: in Python 2.x and below, variables that reference immutable types such as strings and numbers cannot be rebound within a closure. The following example illustrates this.
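A sketch of the failing version:

```python
def make_counter():
    count = 0
    def counter():
        count += 1      # UnboundLocalError: count cannot be rebound in the closure
        return count
    return counter
```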
A rather wonky solution to this is to make use of a mutable type to capture the closure, as shown below:
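```python
def make_counter():
    count = [0]         # a mutable container stands in for the immutable int
    def counter():
        count[0] += 1
        return count[0]
    return counter
```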
Python 3 introduced the nonlocal keyword, which fixes this closure scoping issue, as shown in the following snippet.
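```python
def make_counter():
    count = 0
    def counter():
        nonlocal count  # explicitly rebinds count in the enclosing scope
        count += 1
        return count
    return counter
```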
Closures can be used for maintaining state (isn't that what classes are for?) and, for some simple cases, provide a more succinct and readable solution than classes. A class version of a simple logging API, tech_pro, is shown in the following example.
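A minimal sketch of such a class-based logger (the class and attribute names are illustrative):

```python
class Logger:
    def __init__(self, prefix):
        self.prefix = prefix

    def log(self, message):
        print(self.prefix + message)

info = Logger("INFO: ")
info.log("class-based logging")   # INFO: class-based logging
```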
The same functionality that the class-based version possesses can be implemented with function closures, as shown in the following snippet:
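```python
def logger(prefix):
    def log(message):
        print(prefix + message)   # prefix is captured by the closure
    return log

info = logger("INFO: ")
info("closure-based logging")     # INFO: closure-based logging
```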
As can be seen, the closure-based version is more succinct and readable, even though both versions implement exactly the same functionality. Closures also play a major role in function decorators, a widely used functionality that is explained in the chapter on meta-programming.
Closures also form the basis for the partial function, which is described in detail in the next section.
With a firm understanding of functions, a tour of some techniques and modules for functional programming in Python is given in the following section.
7.6 A Byte of Functional Programming
The Basics
The hallmark of functional programming is the absence of side effects in written code. This essentially means that in the functional style of programming, object values do not change once they are created; to reflect a change in an object's value, a new object with the changed value is created. An example of a function with side effects is the following snippet, in which the original argument is modified and then returned.
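A sketch (the function name square_list is illustrative):

```python
def square_list(numbers):
    for i, n in enumerate(numbers):
        numbers[i] = n * n      # the caller's list is modified in place: a side effect
    return numbers
```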
A functional version of the above avoids any modification to its argument and creates new values that are then returned, as shown in the following example.
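A minimal sketch, using the map call discussed next:

```python
def squares(numbers):
    return list(map(lambda x: x * x, numbers))   # a new list; the argument is untouched
```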
Language features such as first-class functions make functional programming possible, while programming techniques such as mapping, reducing, filtering, currying and recursion are examples of techniques for implementing a functional style of programming. In the above example, the map function applies the function lambda x: x*x to each element in the supplied sequence of numbers.
Python provides built-in functions such as map, filter and reduce that aid functional programming. A description of these functions follows.
- map(func, iterable): This is a classic functional programming construct that takes a function and an iterable as arguments and returns an iterator that applies the function to each item in the iterable. The squares function from above is an illustration of map in use. The ideas behind the map and reduce constructs have seen application in large-scale data processing with the popular MapReduce programming model, which fans out (map) operations on large data streams to a cluster of distributed machines for computation and then gathers the results of these computations together (reduce).
- filter(func, iterable): This also takes a function and an iterable as arguments. It returns an iterator that applies func to each element of the iterable and yields those elements for which the result of the application is True. The following trivial example selects all even numbers from a list.
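```python
print(list(filter(lambda x: x % 2 == 0, [1, 2, 3, 4, 5, 6])))   # [2, 4, 6]
```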
- reduce(func, iterable[, initializer]): This is no longer a built-in; it was moved into the functools module in Python 3, but it is discussed here for completeness. The reduce function applies func cumulatively to the items in iterable in order to arrive at a single value that is returned. func is a function that takes two positional arguments; for example, reduce(lambda x, y: x+y, [1, 2, 3, 4, 5]) calculates ((((1+2)+3)+4)+5). It starts out reducing the first two arguments, then reduces the third with the result of the first two, and so on. If the optional initializer is provided, it serves as the base case. An illustration of this is flattening a nested list, as shown below.
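A sketch of the flattening example, with an empty list as the initializer:

```python
from functools import reduce

nested = [[1, 2], [3, 4], [5]]
print(reduce(lambda acc, item: acc + item, nested, []))   # [1, 2, 3, 4, 5]
```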
The above listed functions are examples of built-in higher order functions in Python. Some of the functionality they provide can be replicated using more common constructs. Comprehensions are one of the most popular alternatives to these higher order functions.
Comprehensions
Python comprehensions are syntactic constructs that enable sequences to be built from other sequences in a clear and concise manner. Python comprehensions are of three types namely:
- List Comprehensions.
- Set Comprehensions.
- Dictionary Comprehensions.
List Comprehensions
List comprehensions are by far the most popular Python comprehension construct. They provide a concise way to create, from an iterable, a new list of elements that satisfy a given condition. A list of squares for a sequence of numbers can be computed using the squares function that makes use of the map function; the same list can be created more concisely with a list comprehension, as the following example shows.
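```python
def squares(numbers):
    return list(map(lambda x: x * x, numbers))

print(squares(range(10)))          # [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
print([x * x for x in range(10)])  # the same list via a list comprehension
```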
The comprehension version is clearer and more concise than the conventional map method for anyone without experience of higher-order functions.
According to the Python documentation, a list comprehension consists of square brackets containing an expression followed by a for clause and zero or more for or if clauses.
The result of a list comprehension expression is a new list that results from evaluating the expression in the context of the for and if clauses that follow it. For example, to create a list of the squares of the even numbers between 0 and 10, the following comprehension is used.
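```python
print([i**2 for i in range(10) if i % 2 == 0])   # [0, 4, 16, 36, 64]
```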
The expression i**2 is computed in the context of the for clause that iterates over the numbers from 0 to 10 and the if clause that filters out non-even numbers.
Nested for loops and List Comprehensions
List comprehensions can also be used with multiple or nested for loops. Consider, for example, the simple code fragment below that creates tuples from pairs of numbers drawn from two sequences, together with its more concise rewrite as a list comprehension.
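A sketch in the spirit of the Python tutorial's pairing example:

```python
pairs = []
for x in [1, 2, 3]:
    for y in [3, 1, 4]:
        if x != y:
            pairs.append((x, y))

# the same result; the for clauses keep the same order as the loops above
pairs = [(x, y) for x in [1, 2, 3] for y in [3, 1, 4] if x != y]
```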
It is important to take into consideration the order of the for loops as used in the list comprehension. Careful observation of the snippets with and without comprehensions shows that the for clauses in the comprehension keep the same order the loops would have if written without a comprehension. The same applies to nested for loops with a nesting depth greater than two.
Nested List Comprehensions
List comprehensions can also be nested. Consider the following example, drawn from the Python documentation, of a 3x4 matrix implemented as a list of 3 lists, each of length 4. Transposition is a matrix operation that creates a new matrix whose rows are the columns of the old matrix and whose columns are the rows of the old matrix. The rows and columns of the matrix can be transposed using a nested list comprehension, shown below together with its equivalent loop-based snippet.
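Following the Python tutorial's matrix example:

```python
matrix = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
]

transposed = [[row[i] for row in matrix] for i in range(4)]

# the equivalent loop-based version
transposed = []
for i in range(4):
    transposed.append([row[i] for row in matrix])
```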
Set Comprehensions
In set comprehensions, braces rather than square brackets are used to create new sets. For example, to create the set of the squares of all numbers between 0 and 10, the following set comprehension is used.
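```python
print({x * x for x in range(10)})   # element order in a set is not guaranteed
```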
Dict Comprehensions
Braces are also used to create new dictionaries in dict comprehensions. In the following example, a mapping from a number to its square is created using a dict comprehension.
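```python
print({x: x * x for x in range(10)})
```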
Functools
The functools module in Python contains a few higher-order functions that act on and return other functions. A few of the interesting higher-order functions included in this module are described below.
- partial(func, *args, **keywords): This function returns an object that can be called like the original func argument, with *args and **keywords as arguments. If the returned object is called with additional *args or **keywords arguments, these are added to the original *args and **keywords, and the updated set of arguments is used in the function call. This is illustrated with the following trivial example.
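Following the example in the Python documentation:

```python
from functools import partial

basetwo = partial(int, base=2)
print(basetwo("10010"))   # 18
```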
In the above example, a new callable, basetwo, that takes a number in binary and converts it to a number in decimal, is created. What has happened is that the int() function, which takes two arguments, has been wrapped by a callable, basetwo, that takes only one argument. To understand how this may work, think back to the discussion about closures and how variable capture works; once that is understood, it is easy to imagine how to implement the partial function. The partial function has functionality that is equivalent to the following closure, as defined in the Python documentation.
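A sketch in the spirit of the pure-Python equivalent given in the functools documentation:

```python
def partial(func, *args, **keywords):
    def newfunc(*fargs, **fkeywords):
        newkeywords = keywords.copy()
        newkeywords.update(fkeywords)          # later keywords win
        return func(*(args + fargs), **newkeywords)
    newfunc.func = func
    newfunc.args = args
    newfunc.keywords = keywords
    return newfunc
```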
Partial objects provide elegant solutions to some practical problems encountered during development. For example, suppose one has a list of points represented as tuples of (x, y) coordinates, and there is a requirement to sort all the points according to their distance from some other central point. The built-in sort() method of lists is handy here: it accepts a key argument that can be used to customize sorting, but it only works with functions that take a single argument, so a two-argument distance() function is unsuitable on its own. partial provides an elegant way of dealing with this, as shown in the following snippet.
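A sketch (the point values are illustrative; the reference point name pt matches the text below):

```python
import math
from functools import partial

def distance(p1, p2):
    x1, y1 = p1
    x2, y2 = p2
    return math.hypot(x2 - x1, y2 - y1)

points = [(1, 2), (3, 4), (5, 6), (7, 8)]
pt = (4, 3)                               # the reference point
points.sort(key=partial(distance, pt))    # key(point) == distance(pt, point)
print(points)                             # [(3, 4), (1, 2), (5, 6), (7, 8)]
```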
The partial function creates and returns a callable that takes a single argument, a point. Note that the partial object has already captured the reference point, pt, so when the key is called with a point argument, the distance function passed to partial is used to compute the distance between the supplied point and the reference point.
- @functools.lru_cache(maxsize=128, typed=False): This decorator wraps a function with a memoizing callable that saves up to the maxsize most recent calls. When maxsize is reached, the oldest cached values are ejected. Caching can save time when an expensive or I/O-bound function is periodically called with the same arguments. The decorator makes use of a dictionary for storing results, so it is limited to caching calls whose arguments are hashable. The lru_cache decorator also provides a function, cache_info, for statistics on cache usage, as the example below shows.
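A minimal sketch (the Fibonacci function is illustrative):

```python
from functools import lru_cache

@lru_cache(maxsize=128)
def fib(n):
    return n if n < 2 else fib(n - 1) + fib(n - 2)

fib(30)
print(fib.cache_info())   # CacheInfo(hits=28, misses=31, maxsize=128, currsize=31)
```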
- @functools.singledispatch: This decorator changes a function into a single-dispatch generic function. The functionality aims to handle dynamic overloading, in which a single function can handle multiple types. The mechanics of this are illustrated with the following code snippet.
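A sketch (the base function name fun matches the text below; the handled types are illustrative):

```python
from functools import singledispatch

@singledispatch
def fun(arg):
    print("base implementation:", arg)

@fun.register(int)
def _(arg):
    print("handling an int:", arg)

@fun.register(list)
def _(arg):
    print("handling a list:", arg)

fun(10)        # handling an int: 10
fun([1, 2])    # handling a list: [1, 2]
fun("text")    # base implementation: text
```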
A generic function is defined with the @singledispatch decorator; the register decorator of the resulting generic function is then used to define a function for each type that is handled. Dispatch to the correct function is carried out based on the type of the first argument to the function call, hence the name single dispatch. In the event that no function is defined for the type of the first argument, the base generic function, fun in this case, is called.
Sequences and Functional Programming
Sequences such as lists and tuples play a central role in functional programming. The Structure and Interpretation of Computer Programs, one of the greatest computer science books ever written, devotes almost a whole chapter to sequences and their processing. The importance of sequences can be seen from their pervasiveness in the language: built-ins such as map and filter consume and produce sequences; other built-ins such as min, max and reduce consume sequences and return values; and functions such as range and dict.items() produce sequences.
The ubiquity of sequences requires that they be represented efficiently. One could come up with multiple ways of representing sequences; a naive way would be to store all the members of a sequence in memory, but this has the significant drawback that sequences are limited in size to the RAM available on the machine. A more clever solution is to use a single object that knows how to compute the next required element of the sequence on the fly, just as it is needed. Python has a built-in protocol for doing exactly this: the __iter__ protocol. It is strongly related to generators, a brilliant feature of the language, and both are dived into in the next chapter.
8. Iterators and Generators
In the last section of the previous chapter, the central part sequences play in functional programming and the need for their efficient representation were mentioned. The idea of representing a sequence as an object that computes and returns the next value of the sequence just at the time the value is needed was also introduced. This may seem hard to grasp at first, but this chapter is dedicated to explaining this wonderful idea. It begins, however, with a description of a profound construct that has been left out of the discussion till now: iterators.
8.1 Iterators
An iterable in Python is any object that implements the __iter__ special method which, when called, returns an iterator (the __iter__ special method is invoked by a call to iter(obj)). Simply put, a Python iterable is any type that can be used with a for...in loop. Python lists, tuples, dicts and sets are all examples of built-in iterables. Iterators are objects that implement the iterator protocol, which defines the following set of methods that need to be implemented by any object that wants to be used as an iterator:
- An __iter__ method that is called on initialization of the iterator. It should return an object that has a __next__ method.
- A __next__ method that is called whenever the next() global function is invoked with the iterator as argument. The iterator's __next__ method should return the next value of the iterable; when an iterator is used with a for...in loop, the for loop implicitly calls next() on the iterator object. This method should raise a StopIteration exception when there is no longer any new value to return, to signal the end of the iteration.
Care should be taken to distinguish between an iterable and an iterator, because an iterable is not necessarily an iterator. The following snippet shows how this is possible.
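A minimal sketch:

```python
numbers = [1, 2, 3]        # a list is an iterable ...
iterator = iter(numbers)   # ... whose __iter__ returns a separate iterator object
print(next(iterator))      # 1
next(numbers)              # TypeError: 'list' object is not an iterator
```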
Something worth noting is that most of the time an iterable object is also an iterator, in which case a call to the object's __iter__ special method returns the object itself. This will be seen later in this section.
Any class that fully implements the iterator protocol can be used as an iterator. This is illustrated in the following by implementing a simple iterator that returns Fibonacci numbers up to a given maximum value.
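A sketch of such an iterator; the class name Fib matches the text below:

```python
class Fib:
    def __init__(self, maximum):
        self.maximum = maximum
        self.a, self.b = 0, 1

    def __iter__(self):
        return self          # this iterable is its own iterator

    def __next__(self):
        if self.a > self.maximum:
            raise StopIteration
        value = self.a
        self.a, self.b = self.b, self.a + self.b
        return value

print(list(Fib(20)))         # [0, 1, 1, 2, 3, 5, 8, 13]
```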
A custom range function for looping through numbers can also be modelled as an iterator. The following is a simple implementation of a range function that loops from 0 upwards.
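A sketch; the class name CustomRange matches the text below:

```python
class CustomRange:
    def __init__(self, maximum):
        self.maximum = maximum

    def __iter__(self):
        self.current = 0
        return self

    def __next__(self):
        if self.current >= self.maximum:
            raise StopIteration
        value = self.current
        self.current += 1
        return value

for i in CustomRange(3):
    print(i)                 # 0, 1, 2
```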
Before attempting to move on, stop for a second and study both examples carefully. The essence of an iterator is that the iterator object knows how to calculate and return the elements of the sequence as needed, not all at once. CustomRange does not return all the elements in the range when it is initialized; rather, it returns an object whose __iter__ method returns an iterator that can calculate the next element of the range using the steps defined in the __next__ method. It is possible to define a range function that returns all positive whole numbers (an infinite sequence) simply by removing the upper bound in the __next__ method. The same idea applies to the Fib iterator. This basic idea can be seen in built-in functions that return sequences; for example, the built-in range function does not return a list, as one would intuitively expect, but returns an object that returns a range iterator object when its __iter__ method is called. To get the sequence as expected, the range object is passed to the list constructor, as shown in the following example.
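```python
r = range(5)
print(r)          # range(0, 5) - not a list
print(iter(r))    # <range_iterator object at 0x...>
print(list(r))    # [0, 1, 2, 3, 4]
```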
The iterator protocol implements a form of computing referred to as lazy computation: it does no more work than it has to at any given time.
The itertools Module
The concept of iterators is so important that Python comes with a module, the itertools module, that provides some useful general-purpose functions which return iterators. The results of these functions can be obtained eagerly by passing the returned iterator to the list() constructor. A few of these functions are described below.
- accumulate(iterable[, func]): This takes an iterable and an optional func argument that defaults to the operator.add function. The supplied function should take two arguments and return a single value, and the elements of the iterable must be of a type acceptable to the supplied function. A call to accumulate returns an iterator representing the results of applying the supplied function to the elements of the iterable: the accumulated result for the first element of an iterable is the element itself, while the accumulated result for the nth element is func(nth element, accumulated result of (n-1)th element). Examples of the usage of this function are shown in the following snippet.
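```python
import itertools
import operator

print(list(itertools.accumulate([1, 2, 3, 4])))                 # [1, 3, 6, 10]
print(list(itertools.accumulate([1, 2, 3, 4], operator.mul)))   # [1, 2, 6, 24]
```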
- chain(*iterables): This takes a variable number of iterables as arguments and returns an iterator that yields the elements of the first iterable, then those of the second, and so on, as though the supplied iterables were one continuous sequence.
- combinations(iterable, r): This returns an iterator over the r-length sub-sequences of elements from the input iterable. Elements are treated as unique based on their position, not on their value, so if the input elements are unique there will be no repeated values within each combination.
- filterfalse(predicate, iterable): This returns an iterator that filters elements from the iterable argument, returning only those for which the value of applying the predicate to the element is False. If predicate is None, the function returns the items that are false.
- groupby(iterable, key=None): This returns an iterator that yields consecutive keys and the corresponding groups for those keys from the iterable argument. The key argument is a function computing a key value for each element; if a key function is not specified or is None, the key defaults to an identity function that returns the element unchanged. Generally, the iterable needs to already be sorted on the same key function. The returned group is itself an iterator that shares the underlying iterable with groupby(). An example usage is shown in the following snippet.
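A minimal sketch, grouping already-sorted words by their first letter:

```python
from itertools import groupby

words = ["apple", "avocado", "banana", "blueberry", "cherry"]
for key, group in groupby(words, key=lambda w: w[0]):
    print(key, list(group))
# a ['apple', 'avocado']
# b ['banana', 'blueberry']
# c ['cherry']
```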
- islice(iterable, start, stop[, step]): This returns an iterator that yields elements from the iterable that fall within the specified range. If start is non-zero, elements from the iterable are skipped until start is reached; afterwards, elements are returned consecutively, with step elements skipped if step is greater than one, just as in conventional slicing, until the iterable argument is exhausted. Unlike conventional slicing, islice() does not support negative values for start, stop or step.
- permutations(iterable, r=None): This returns a succession of r-length permutations of elements in the iterable. If r is not specified or is None, it defaults to the length of the iterable. As with combinations, elements are treated as unique based on their position, not on their value; permutations differs from combinations in that the ordering of elements matters, so every distinct arrangement is produced. If the input elements are unique, there will be no repeated values within each permutation.
- product(*iterables, repeat=1): This returns an iterator over the successive Cartesian products of the input iterables. It is equivalent to nested for-loops in a list comprehension; for example, product(A, B) returns an iterator that yields the same values as [(x, y) for x in A for y in B]. The function can compute the product of an iterable with itself by specifying the number of repetitions with the optional repeat keyword argument; for example, product(A, repeat=4) means the same as product(A, A, A, A).
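A short sketch of several of the functions above (expected outputs in the comments):

```python
from itertools import chain, combinations, filterfalse, islice, permutations, product

print(list(chain([1, 2], [3, 4])))                   # [1, 2, 3, 4]
print(list(combinations("ABC", 2)))                  # [('A','B'), ('A','C'), ('B','C')]
print(list(filterfalse(lambda x: x % 2, range(6))))  # [0, 2, 4]
print(list(islice(range(10), 2, 8, 2)))              # [2, 4, 6]
print(list(permutations("AB")))                      # [('A','B'), ('B','A')]
print(list(product("AB", "xy")))                     # [('A','x'), ('A','y'), ('B','x'), ('B','y')]
```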
8.2 Generators
Generators and iterators have a very intimate relationship. In short, Python generators are iterators, and understanding generators gives one an idea of how iterators can be implemented. This may sound circular, but after going through an explanation of generators it will become clearer. PEP 255, which describes simple generators, refers to generators by their full name: generator-iterators. Generators, just like the name suggests, generate (or consume) values when their __next__ method is called.
Generators are used either by explicitly calling the __next__ method on the generator object or by using the generator object in a for...in loop. Generators come in two forms:
- Generator Functions
- Generator Expressions
Generator Functions
Generator functions are functions that contain the yield expression. Calling a function that contains a yield expression returns a generator object. For example, the Fibonacci iterator can be recast as a generator using the yield keyword, as shown in the following example.
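A sketch; the name fib matches the function called later in this section:

```python
def fib(maximum):
    a, b = 0, 1
    while a <= maximum:
        yield a
        a, b = b, a + b
```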
The yield keyword
The yield keyword has the following syntax.
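```python
yield expression_list
```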
The yield expression is central to generator functions, but what does this expression really do? To understand it, contrast it with the return keyword: the return keyword, when encountered, returns control to the caller of a function, effectively ending the function's execution. This is shown in the following example by calling a normal Fibonacci function that returns all Fibonacci numbers less than 10.
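A sketch of the ordinary, non-generator version (the name fib_list is illustrative, to distinguish it from the generator fib above):

```python
def fib_list():
    numbers, a, b = [], 0, 1
    while a < 10:
        numbers.append(a)
        a, b = b, a + b
    return numbers       # execution ends here; control returns to the caller once

print(fib_list())        # [0, 1, 1, 2, 3, 5, 8]
```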
On the other hand, the presence of a yield expression in a function complicates things a bit. When a function with a yield expression is called, the function body does not run as a normal function's would; instead, the call returns a generator object. This is illustrated by a call to the fib function in the following snippet.
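```python
gen = fib(10)
print(gen)              # <generator object fib at 0x...>
print(gen.__next__())   # 0
print(gen.__next__())   # 1
```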
The generator object executes when its __next__ method is invoked: the generator runs all statements in the function definition until the yield keyword is encountered.
The object suspends execution at that point, saves its context and returns any
value in the expression_list to the caller. When the caller invokes __next__()
method of the generator object, execution of the function continues till
another yield
or return
expression is encountered or end of function is
reached. This continues till the loop condition is false and a StopIteration
exception is raised to signal that there is no more data to generate. To quote
PEP 255,
If a yield statement is encountered, the state of the function is frozen, and the value of expression_list is returned to __next__()'s caller. By "frozen" we mean that all local state is retained, including the current bindings of local variables, the instruction pointer, and the internal evaluation stack: enough information is saved so that the next time __next__() is invoked, the function can proceed exactly as if the yield statement were just another external call.
On the other hand, when a function encounters a return statement, it returns to the caller along with any value following the return keyword, and the execution of the function is complete for all intents and purposes. One can think of yield as causing only a temporary interruption in the execution of a function.
With a better understanding of generators, it is not difficult to see how generators can be used to implement iterators. Generators know how to calculate the next value in a sequence, so functions that return iterators can be rewritten using the yield statement. To illustrate this, the accumulate function from the itertools module can be rewritten using generators, as in the following snippet.
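A sketch in the spirit of the pure-Python equivalent given in the itertools documentation:

```python
import operator

def accumulate(iterable, func=operator.add):
    it = iter(iterable)
    try:
        total = next(it)
    except StopIteration:
        return                     # an empty iterable yields nothing
    yield total
    for element in it:
        total = func(total, element)
        yield total

print(list(accumulate([1, 2, 3, 4])))   # [1, 3, 6, 10]
```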
Similarly, one can emulate a generator object by implementing the iterator protocol discussed at the start of this chapter; however, the yield keyword provides a more succinct and elegant method for creating generators.
Generator Expressions
In the previous chapter, list comprehensions were discussed. One drawback of list comprehensions is that all values are calculated at once, regardless of whether they are needed at that time (eager calculation); this may sometimes consume an inordinate amount of memory. PEP 289 proposed the generator expression to resolve this, and the proposal was accepted and added to the language. Generator expressions are just like list comprehensions, except that the square brackets are replaced by round brackets and the expression returns a generator object.
To generate a list of the square of number from 0 to 10 using list comprehensions, the following is done.
A generator expression can be used in place of the list comprehension, as shown in the following snippet; the values of the generator can then be accessed using for...in loops or via calls to the __next__() method of the generator object.
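```python
squares_gen = (x * x for x in range(10))   # nothing is computed yet
print(squares_gen)                         # <generator object <genexpr> at 0x...>
print(squares_gen.__next__())              # 0
for square in squares_gen:
    print(square)                          # 1, 4, 9, ... computed on demand
```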
Generator expressions thus create generator objects without using the yield expression.
The Beauty of Generators and Iterators
Generators really shine when working with massive data streams. Consider the representative but rather trivial example of generating a stream of prime numbers. A method for calculating this set is the Sieve of Eratosthenes. The following algorithm finds all the prime numbers less than or equal to a given integer, n, by Eratosthenes' method:
- List the consecutive integers from 2 through n.
- Let p initially equal 2, the smallest prime number.
- Mark all the multiples of p from 2p up to n; p itself is left unmarked.
- Find the smallest unmarked number greater than p. If there is no such number, stop; otherwise, let p now equal this number (which is the next prime) and repeat from the previous step.
When the algorithm terminates, the numbers remaining unmarked in the list are all the primes below n. This rather trivial algorithm is implemented below using generators.
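A sketch of the generator-based sieve; the names counts, filter_multiples_of_n, sieve, ints and prime match those discussed below:

```python
def counts(start):
    n = start
    while True:
        yield n
        n += 1

def filter_multiples_of_n(n, ints):
    for i in ints:
        if i % n != 0:
            yield i

def sieve():
    ints = counts(2)                  # the infinite stream 2, 3, 4, ...
    while True:
        prime = ints.__next__()
        yield prime
        ints = filter_multiples_of_n(prime, ints)   # stack another filter

primes = sieve()
for _ in range(5):
    print(next(primes))               # 2, 3, 5, 7, 11
```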
The above example, though very simple, shows the beauty of how generators can be chained together, with the output of one acting as the input of another; think of this stacking of generators as a kind of processing pipeline.
The filter_multiples_of_n function is worth discussing a bit here because it may be confusing at first. counts(2), when initialized, returns a generator that yields the sequence of consecutive numbers starting from 2, so the line prime = ints.__next__() returns 2 on the first iteration. After the yield expression, ints = filter_multiples_of_n(2, ints) is invoked, creating a generator that returns a stream of numbers that are not multiples of 2; note that the original sequence generator is captured within this new generator (this is very important). On the next iteration of the loop within the sieve function, the ints generator is invoked. This generator loops through the original sequence, now [3, 4, 5, 6, 7, ...], yielding the first number that is not divisible by 2, which is 3 in this case. This part of the pipeline is easy to understand. The prime, 3, is yielded from the sieve function, then another generator that returns non-multiples of the prime 3 is created and assigned to ints. This generator captures the previous generator that produces non-multiples of 2, and that generator captured the original generator that produces the infinite sequence of consecutive numbers. A call to the __next__() method of this generator loops through the previous generator that returns non-multiples of 2, and every non-multiple of 2 it returns is checked for divisibility by 3; if the number is not divisible by 3, it is yielded. This chaining of generators goes on and on: the next prime is 5, so the generator excluding multiples of the primes loops through the generator that returns non-multiples of 3, which in turn loops through the generator that produces non-multiples of 2.
This streaming of data through multiple generators can be applied to the space-efficient (and sometimes time-efficient) processing of any other massive data stream, such as log files and databases. Generators, however, have other nifty and mind-blowing use cases, as will be seen in the following sections.
8.3 From Generators To Coroutines
“Subroutines are special cases of … coroutines.”
– Donald Knuth
A subroutine is a set of program instructions bundled together to perform a specific task. Functions and methods are examples of subroutines in Python. Subroutines have a single point of entry: once called, they execute until they exit and cannot be suspended. Coroutines, however, are a more general program construct that allows multiple entry points for suspending and resuming execution. Multiple entry points for suspending and resuming sounds exactly like what the yield expression provides to generator functions, and in fact one could argue that Python generators are coroutines, because they allow the production and consumption of values at their suspension and resumption points. The send() method of generators, added in Python 2.5, provides generators with the ability to consume data when they resume execution. The Python documentation describes the send() method as follows.
generator.send(value): Resumes the execution and “sends” a value into the generator function. The value argument becomes the result of the current yield expression. The send() method returns the next value yielded by the generator, or raises StopIteration if the generator exits without yielding another value. When send() is called to start the generator, it must be called with None as the argument, because there is no yield expression that could receive the value.
The above explanation may be a little ambiguous, so an illustration of the use of the send() method is provided in the following example.
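A sketch of such a coroutine (the name grep and the pattern argument are illustrative; the variable line matches the text below):

```python
def grep(pattern):
    print("Looking for", pattern)
    while True:
        line = yield            # the value passed to send() is assigned to line
        if pattern in line:
            print(line)
```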
The generator is initialized and run as shown in the following snippet.
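```python
g = grep("python")
g.send(None)               # prime the coroutine; same as g.__next__()
g.send("generators rock")  # no match, nothing printed
g.send("python rocks")     # prints: python rocks
```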
To fully grasp the send() method, observe that the argument passed to the send() method of the generator becomes the result of the yield expression; so in the above example, the value that send() is called with is assigned to the variable line. The rest of the function is straightforward to understand. Note that calling send(None) is equivalent to calling the generator's __next__() method.
Simulating Multitasking with Coroutines
The ability to send data into generators using the send() method truly expands the frontier for generators. Now think carefully about the tools in the arsenal at this point: functions that can suspend their execution at a yield expression and be resumed later, and a means of passing values into them at their resumption points. A little thinking shows that multiple generators or coroutines can be scheduled to run in an interleaved manner, so that it appears as if they are executing simultaneously. With that, we have multitasking, or something like it. In this section, rudimentary multitasking is simulated to illustrate the versatility of generators and coroutines.
In reality, even a full-blown multitasking operating system is only ever executing a single task at any one time. A task is any set of instructions with its own internal state; for example, a simple task may take a stream of text and print a running count of the occurrences of some words, or may just print every word it receives. What is important here is that tasks are totally independent of each other. The illusion of multitasking is achieved by giving each task a slice of time to run until it encounters a trap that forces it to stop so that other tasks can run; this happens so fast that human users cannot sense it. This can easily be simulated with a collection of coroutines that run independently of each other, as shown in this section. A very simple example of such multitasking is shown in the following snippet, in which text is read from a source and then sent for processing to multiple coroutines. In the snippet, a task is modelled as a thin wrapper around a coroutine.
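A condensed sketch of the idea (the printer coroutine, the Task wrapper and the input lines are all illustrative):

```python
def printer(name):
    while True:
        text = yield
        print(name, "received:", text)

class Task:
    """A thin wrapper around a coroutine."""
    def __init__(self, coroutine):
        self.coroutine = coroutine
        self.coroutine.send(None)        # prime the coroutine

    def run(self, data):
        self.coroutine.send(data)

tasks = [Task(printer("task-1")), Task(printer("task-2"))]
for line in ["first line", "second line"]:
    for task in tasks:                   # each task gets a slice of execution
        task.run(line)
```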
Observe how the outputs are interleaved, because each coroutine executes for a limited time before another coroutine executes.
The above example is very instructive in showing the power of generators and coroutines, though it has been provided for illustration purposes only. The asyncio module, added to the standard library in Python 3.4, supports concurrent programming using coroutines.
8.4 The yield from keyword
Sometimes refactoring a code block may involve moving some functionality into another generator. Using just the yield keyword may prove cumbersome in some of these cases, and impossible in others, such as when a value sent to the delegating generator needs to be passed on to the delegated generator. These kinds of scenarios led to the introduction of the yield from keyword.
This keyword allows a section of code containing yield to be moved into another generator. Furthermore, the delegated generator can return a value that is made available to the delegating generator. An instructive example of how the yield from keyword works is given below (note that yield from works with any iterable, including the object returned by a call to the range function).
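A minimal sketch (the function names are illustrative):

```python
def delegated():
    yield from range(5)
    return "done"            # made available to the delegating generator

def delegating():
    result = yield from delegated()
    print("delegated generator returned:", result)

for value in delegating():
    print(value)             # 0, 1, 2, 3, 4, then the return value is printed
```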
As previously mentioned, yielding data from a delegated generator was not the only reason for the introduction of the yield from keyword; the yielding part of the previous snippet can be replicated without yield from, as shown in the following example.
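```python
def delegating():
    # the yielding part of yield from, replicated with an explicit loop
    for value in range(5):
        yield value
```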
The real benefit of the yield from keyword comes from the ability of a calling generator to send values into the delegated generator, as shown in the following example: if a value is sent into a generator, yield from enables that generator to implicitly send the same value on into the delegated generator.
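```python
def inner():
    while True:
        value = yield
        print("inner received:", value)

def outer():
    yield from inner()     # values sent to outer are forwarded to inner

g = outer()
g.send(None)               # prime the pair of generators
g.send("hello")            # prints: inner received: hello
```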
The complete semantics of yield from are explained in PEP 380 and summarised below.
- Any values that the iterator yields are passed directly to the caller.
- Any values sent to the delegating generator using send() are passed directly to the iterator. If the sent value is None, the iterator's __next__() method is called. If the sent value is not None, the iterator's send() method is called. If the call raises StopIteration, the delegating generator is resumed. Any other exception is propagated to the delegating generator.
- Exceptions other than GeneratorExit thrown into the delegating generator are passed to the throw() method of the iterator. If the call raises StopIteration, the delegating generator is resumed. Any other exception is propagated to the delegating generator.
- If a GeneratorExit exception is thrown into the delegating generator, or the close() method of the delegating generator is called, then the close() method of the iterator is called if it has one. If this call results in an exception, it is propagated to the delegating generator. Otherwise, GeneratorExit is raised in the delegating generator.
- The value of the yield from expression is the first argument to the StopIteration exception raised by the iterator when it terminates.
- return expr in a generator causes StopIteration(expr) to be raised upon exit from the generator.
8.5 A Game of Life
To end the chapter, a very simple version of Conway's Game of Life is implemented using generators and coroutines to simulate the basics of the game; a thorough understanding of this example will prove further enlightening. The example is inspired by Brett Slatkin's Effective Python chapter on using coroutines to run thousands of functions, and all credit is due to him.
An explanation of the Game of Life, as given by Wikipedia, follows. The Game of Life is a simulation that takes place on a two-dimensional orthogonal grid of cells, each of which is either in an alive or a dead state. Each cell transitions to a new state in a step of time through its interaction with its neighbours, which are the cells that are horizontally, vertically or diagonally adjacent. At each step of time, the following transitions occur:
- Any live cell with fewer than two live neighbours dies, as if by underpopulation.
- Any live cell with two or three live neighbours lives on to the next generation.
- Any live cell with more than three live neighbours dies, as if by overpopulation.
- Any dead cell with exactly three live neighbours becomes a live cell, as if by reproduction.
The initial pattern of cells on the grid constitutes the seed of the system. The first generation is created by applying the above rules simultaneously to every cell and the discrete moment at which this happens is sometimes called a tick. The rules continue to be applied repeatedly to create further generations.
In the following implementation, each cell's simulation is carried out using a coroutine, with the state of the cells stored in a Grid object from generation to generation.
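A condensed sketch loosely following Slatkin's design; the names (other than Grid) and the blinker seed are illustrative:

```python
ALIVE, EMPTY = "*", "-"

def count_neighbors(y, x):
    # Ask the driver for the states of the eight neighbouring cells.
    count = 0
    for dy, dx in [(-1, -1), (-1, 0), (-1, 1), (0, -1),
                   (0, 1), (1, -1), (1, 0), (1, 1)]:
        state = yield (y + dy, x + dx)
        if state == ALIVE:
            count += 1
    return count

def step_cell(y, x):
    state = yield (y, x)
    neighbors = yield from count_neighbors(y, x)
    # Apply the transition rules listed above.
    if state == ALIVE:
        new_state = ALIVE if neighbors in (2, 3) else EMPTY
    else:
        new_state = ALIVE if neighbors == 3 else EMPTY
    return (y, x, new_state)

class Grid:
    def __init__(self, height, width):
        self.height, self.width = height, width
        self.rows = [[EMPTY] * width for _ in range(height)]
    def get(self, y, x):
        return self.rows[y % self.height][x % self.width]
    def set(self, y, x, state):
        self.rows[y % self.height][x % self.width] = state

def simulate_generation(grid):
    new_grid = Grid(grid.height, grid.width)
    for y in range(grid.height):
        for x in range(grid.width):
            cell = step_cell(y, x)
            query = cell.send(None)                 # run until the first query
            try:
                while True:
                    query = cell.send(grid.get(*query))  # answer each query
            except StopIteration as e:
                cy, cx, state = e.value
                new_grid.set(cy, cx, state)
    return new_grid

grid = Grid(5, 5)
for y, x in [(1, 1), (1, 2), (1, 3)]:   # a blinker seed
    grid.set(y, x, ALIVE)
for _ in range(2):
    grid = simulate_generation(grid)
    print("\n".join("".join(row) for row in grid.rows), "\n")
```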
Generators are a fascinating topic and this chapter has barely scratched the surface of what is possible. David Beazley gave a series of excellent talks, 1,2 and 3, that go into great detail about very advanced usage of generators.
9. MetaProgramming and Co.
Metaprogramming is quite an interesting area of programming. Metaprogramming deals with code that manipulates other code; it is a broad category that covers areas such as function decorators, class decorators, metaclasses, context managers and the use of built-ins like exec and eval. These constructs sometimes help to prevent repetitive code, and most times add new functionality to a piece of code in elegant ways. In this chapter, decorators, metaclasses and context managers are discussed.
9.1 Decorators
A decorator is a function that wraps another function or class. It introduces new functionality to the wrapped class or function without altering its original functionality, so the interface of the class or function remains the same.
Function Decorators
A good understanding of functions as first-class objects is important in order to understand function decorators; a reader will be well served by reviewing the material on functions. Because functions are first-class objects, the following apply:
- Functions can be passed as arguments to other functions.
- Functions can be returned from other function calls.
- Functions can be defined within other functions resulting in closures.
The above listed properties of first class functions provide the foundation needed to explain function decorators. Put simply, function decorators are “wrappers” that enable the execution of code before and after the function they decorate without modifying the function itself.
Function decorators are not unique to Python, so to explain them, the Python decorator syntax is ignored for the moment and the essence of function decorators is focused on instead. To understand what decorators do, a very trivial function is decorated below with another trivial function that logs calls to the decorated function. This function decoration is achieved using function composition, as shown below (follow the comments):
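A sketch; the names logger, print_full_name, func_wrapper and decorated_func match those discussed below (the printed name is illustrative):

```python
def logger(func):
    # the decorator: takes a function and returns a new function that wraps it
    def func_wrapper():
        print("Before function call")   # new behaviour before the call
        func()                          # the decorated function runs here
        print("After function call")    # new behaviour after the call
    return func_wrapper

def print_full_name():
    print("John Doe")

# decoration: call the decorator with the function to be decorated as argument
decorated_func = logger(print_full_name)
decorated_func()
```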
In the trivial example defined above, the decorator adds a new feature, printing some information before and after the original function call, to the original function without altering it. The decorator, logger, takes a function to be decorated, print_full_name, and returns a function, func_wrapper, that calls the decorated function when it is executed. The decoration process here is calling the decorator with the function to be decorated as argument. The returned function, func_wrapper, is closed over the reference to the decorated function, print_full_name, and thus can invoke it while executing. In the above, calling decorated_func results in print_full_name being executed in addition to the code snippets that implement the new functionality.
This ability to add new functionality to a function without modifying the original function is the essence of function decorators. Once
this concept is understood, the concept of decorators is understood.
Decorators in Python
Now that the essence of function decorators has been discussed, an attempt is made to deconstruct the Python constructs that make defining decorators easier. The previous section describes the essence of decorators, but using decorators via function composition as described is cumbersome, so Python provides the @ symbol for decorating functions. Decorating a function using the Python decorator syntax is achieved as shown in the following example.
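Using the logger decorator defined earlier:

```python
@logger
def stand_alone_function():
    print("stand alone")

stand_alone_function()   # the wrapper code now runs around the original body
```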
Calling stand_alone_function now is equivalent to calling the decorated_func function from the previous section, but there is no longer a need to define the intermediate decorated_func.
It is important to understand what the @ symbol does with respect to decorators in Python. The @decorator line does not define a Python decorator; rather, one can think of it as syntactic sugar for decorating a function. Decorating a function can be defined as the process of applying an existing decorator to a function; the decorator is the actual function, decorator, that adds the new functionality to the original function. According to PEP 318, the following decorator snippet:
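Following PEP 318 (dec1 and dec2 stand for any decorator functions):

```python
@dec2
@dec1
def func(arg1, arg2):
    pass
```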
is equivalent to:
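```python
def func(arg1, arg2):
    pass
func = dec2(dec1(func))
```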
without the intermediate assignment to the variable func. In the above, @dec1 and @dec2 are the decorator invocations. Stop, think carefully and ensure you understand this: dec1 and dec2 are function object references, and these are the actual decorators. These values can even be replaced by any function call or value that, when evaluated, returns a function that takes another function. What is of paramount importance is that the name following the @ symbol is a reference to a function object (for this tutorial we assume it should be a function object, but in reality it should be a callable object) that takes a function as argument. Understanding this profound fact will help in understanding Python decorators and more involved decorator topics such as decorators that take arguments.
Passing Arguments To Decorated Functions
Arguments are supplied to functions that are being decorated by simply passing the arguments into the inner function returned when the decorator is invoked, i.e. the function that wraps the decorated function. This is illustrated with the following example.
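A sketch, extending the earlier logger decorator (the names passed in the call are illustrative):

```python
def logger(func):
    def func_wrapper(*args, **kwargs):
        print("Before function call")
        result = func(*args, **kwargs)   # arguments are passed straight through
        print("After function call")
        return result
    return func_wrapper

@logger
def print_full_name(first_name, last_name):
    print(first_name, last_name)

print_full_name("Jane", "Doe")
```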
Note how the *args and **kwargs parameters are used in defining the inner wrapper function; this is for the simple reason that it cannot be known beforehand which functions are going to be decorated, and hence what their signatures will be.
Decorator Functions with Arguments
Decorator functions can also be defined to take arguments, but this is more involved than passing arguments to decorated functions. The following example illustrates this.
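A sketch; the names decorator_maker_with_arguments, decorator and the message match those discussed below (print_name is illustrative):

```python
def decorator_maker_with_arguments(message):
    def decorator(func):
        def func_wrapper(*args, **kwargs):
            print("Logging:", message)   # the decorator is closed over message
            return func(*args, **kwargs)
        return func_wrapper
    return decorator

@decorator_maker_with_arguments("Apollo 11 Landing")
def print_name(name):
    print(name)

print_name("Neil Armstrong")
```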
As mentioned previously, the key to understanding what is going on here is to note that the reference value following the @ in a function decoration can be replaced by any value that evaluates to a function object which takes another function as argument. In the above snippet, the value returned by the function call decorator_maker_with_arguments("Apollo 11 Landing") is the decorator: the call evaluates to a function, decorator, that accepts a function as argument. Thus the decoration @decorator_maker_with_arguments("Apollo 11 Landing") is equivalent to @decorator, but with the decorator, decorator, closed over the argument Apollo 11 Landing by the decorator_maker_with_arguments function call. Note that the arguments supplied to a decorator cannot be changed dynamically at run time, as they are evaluated when the enclosing module is imported.
Functools.wraps
Using decorators involves swapping out one function for another. A result of this is that meta information, such as the docstring of the swapped-out function, is lost when a function is decorated. This is illustrated below:
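Using the logger decorator from above:

```python
@logger
def print_full_name(first_name, last_name):
    """Print a full name."""
    print(first_name, last_name)

print(print_full_name.__doc__)    # None
print(print_full_name.__name__)   # func_wrapper
```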
In the above example, an attempt to print the documentation string returns None, because the decorator has swapped out the print_full_name function for the func_wrapper function, which has no documentation string. Even the function name now references the name of the wrapper function rather than the actual function. Most times, this is not what is wanted when using decorators. To work around this, the Python functools module provides the wraps function, which also happens to be a decorator. This decorator is applied to the wrapper function and takes the function being decorated as argument. The usage is illustrated in the following example.
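```python
from functools import wraps

def logger(func):
    @wraps(func)   # copies the name, docstring and other metadata onto the wrapper
    def func_wrapper(*args, **kwargs):
        print("Before function call")
        result = func(*args, **kwargs)
        print("After function call")
        return result
    return func_wrapper

@logger
def print_full_name(first_name, last_name):
    """Print a full name."""
    print(first_name, last_name)

print(print_full_name.__doc__)    # Print a full name.
print(print_full_name.__name__)   # print_full_name
```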
Class Decorators
Like functions, classes can also be decorated. Class decorators serve the same purpose as function decorators: introducing new functionality without modifying the actual classes. An example is the following singleton decorator, which ensures that only one instance of a decorated class is ever initialised throughout the execution of the program. Putting the decorator to use, as below, shows how this works: the Foo class is "initialized" twice, yet comparing the ids of both resulting objects shows that they refer to the same object.
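A sketch of such a decorator and its use (the class name Foo matches the text):

```python
def singleton(cls):
    instances = {}
    def get_instance(*args, **kwargs):
        if cls not in instances:
            instances[cls] = cls(*args, **kwargs)   # created only on the first call
        return instances[cls]
    return get_instance

@singleton
class Foo:
    pass

x = Foo()
y = Foo()
print(id(x) == id(y))   # True - both names refer to the same object
```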
The same singleton functionality can be achieved using a metaclass by overriding the __call__ method of the metaclass, as shown below:
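A sketch (the metaclass name Singleton is illustrative):

```python
class Singleton(type):
    _instances = {}
    def __call__(cls, *args, **kwargs):
        if cls not in cls._instances:
            cls._instances[cls] = super().__call__(*args, **kwargs)
        return cls._instances[cls]

class Foo(metaclass=Singleton):
    pass

print(Foo() is Foo())   # True
```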
Applying Decorators to instance and static methods
Instance, static and class methods can also be decorated. The important thing is to take note of the order in which the decorators are placed for static and class methods: the decorator must be applied before the staticmethod and classmethod decorators (that is, placed below them in the source), because these method decorators do not return callable objects. A valid example of method decorators is shown in the following example.
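A sketch using the logger decorator from earlier (the class and method names are illustrative):

```python
class Spam:
    @logger
    def instance_method(self):
        print("instance method")

    @staticmethod
    @logger          # applied first: it must sit below @staticmethod
    def static_method():
        print("static method")

    @classmethod
    @logger          # likewise below @classmethod
    def class_method(cls):
        print("class method")
```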
9.2 Decorator Recipes
Decorators have a wide range of applications in Python; this section discusses some interesting uses of decorators and provides implementations for them. The following are just samples of the possible applications of decorators. A major benefit of many decorators is that cross-cutting concerns, such as logging, can be handled in a single place, the decorator, rather than across multiple functions; the benefit of having such functionality in one place is glaringly obvious, as changes are localised and far easier to maintain. The following recipes illustrate this.
- Decorators provide a means to log information about other functions, such as timing or argument information. An example of such a decorator is shown in the following example.
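A sketch (the decorator name log_timing is illustrative):

```python
import functools
import time

def log_timing(func):
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.time()
        result = func(*args, **kwargs)
        print("%s%r took %.6f seconds" % (func.__name__, args, time.time() - start))
        return result
    return wrapper
```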
- A memoization decorator can be used to decorate a function that performs a calculation so that, for a given argument, if the result has been previously computed the stored value is returned, and if it has not, it is computed and stored before being returned to the caller. This kind of decorator is available in the functools module, as discussed in the chapter on functions. An implementation of such a decorator is shown in the following example.
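A sketch (the names memoize and fib are illustrative):

```python
import functools

def memoize(func):
    cache = {}
    @functools.wraps(func)
    def wrapper(*args):
        if args not in cache:
            cache[args] = func(*args)   # computed and stored on first use
        return cache[args]
    return wrapper

@memoize
def fib(n):
    return n if n < 2 else fib(n - 1) + fib(n - 2)
```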
- Decorators can also easily be used to implement functionality that retries a callable up to a maximum number of times, as in the following example.
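A sketch (the parameter names are illustrative):

```python
import functools
import time

def retry(max_attempts=3, delay=1):
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(1, max_attempts + 1):
                try:
                    return func(*args, **kwargs)
                except Exception:
                    if attempt == max_attempts:
                        raise           # give up after the final attempt
                    time.sleep(delay)
        return wrapper
    return decorator
```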
- Another very interesting decorator recipe is the use of decorators to enforce argument types for function calls, as shown in the following example.
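A sketch (the decorator name accepts is illustrative):

```python
import functools

def accepts(*types):
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args):
            for value, expected in zip(args, types):
                if not isinstance(value, expected):
                    raise TypeError("expected %s, got %r" % (expected.__name__, value))
            return func(*args)
        return wrapper
    return decorator

@accepts(int, int)
def add(x, y):
    return x + y
```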
- A common use of class decorators is registering classes as their class statements are executed, as shown in the following example.
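A sketch (the registry and class names are illustrative):

```python
registry = {}

def register(cls):
    registry[cls.__name__] = cls   # the class is recorded as it is defined
    return cls

@register
class Plugin:
    pass

print(registry)   # {'Plugin': <class '__main__.Plugin'>}
```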
A more comprehensive listing of recipes, including the examples discussed in this section, can be found at the Python Decorator Library website.
9.3 Metaclasses
“Metaclasses are deeper magic than 99% of users should ever worry about. If you wonder whether you need them, you don’t”
– Tim Peters
All values in Python are objects, including classes, so a given class object must have another class from which it is created. Consider an instance, f, of a user-defined class Foo. The type/class of the instance f can be found using the built-in function type, and in the case of the object f, its type is Foo.
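```python
class Foo:
    pass

f = Foo()
print(type(f))   # <class '__main__.Foo'>
```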
This introspection can also be extended to a class object to find out the type/class of the class itself. The following example shows the result of applying the type() function to the Foo class.
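```python
print(type(Foo))   # <class 'type'>
```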
In Python, the class of all other class objects is the type class. This applies to user-defined classes, as shown above, as well as to built-in classes, as shown in the following code example.
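```python
print(type(int))    # <class 'type'>
print(type(list))   # <class 'type'>
```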
A class such as the type class that is used to create other classes is called a metaclass. That is all there is to a metaclass: a metaclass is a class that is used to create other classes. Custom metaclasses are not used often in Python, but sometimes it is necessary to control the way classes are created, most especially when working on big projects with big teams.
Before explaining how metaclasses are used to customize class creation, a recap of how class objects are created when a class statement is encountered during the execution of a program is provided.
The following snippet is the definition of a simple class that every Python user is familiar with, but this is not the only way a class can be defined.
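A sketch; the class name Account matches the text below (its attributes are illustrative):

```python
class Account:
    num_accounts = 0

    def __init__(self, name, balance):
        self.name = name
        self.balance = balance
        Account.num_accounts += 1
```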
The following snippet shows a more involved method for defining the same class, with all the syntactic sugar provided by the class keyword stripped away. It provides a better understanding of what actually goes on under the covers during the execution of a class statement.
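A sketch mirroring the steps listed below:

```python
class_name = "Account"
class_bases = (object,)
class_body = """
num_accounts = 0

def __init__(self, name, balance):
    self.name = name
    self.balance = balance
    Account.num_accounts += 1
"""

class_dict = {}
exec(class_body, globals(), class_dict)               # execute the body in a namespace
Account = type(class_name, class_bases, class_dict)   # instantiate the metaclass
```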
During the execution of a class statement, the interpreter carries out the following procedures behind the scenes:
- The body of the class statement is isolated in a string.
- A class dictionary representing the namespace for the class is created.
- The body of the class is executed as a set of statements within this namespace.
- As a final step in the process, the class object is created by instantiating the type class, passing in the class name, the base classes and the class dictionary as arguments. The type class used here in creating the Account class object is the metaclass. The metaclass value used in creating the class object can be explicitly specified by supplying the metaclass keyword argument in the class definition. If it is not supplied, the class statement examines the first entry in the tuple of base classes, if any. If no base classes are used, the global variable __metaclass__ is searched for, and if no value is found for this, the default metaclass value is used.
Armed with a basic understanding of metaclasses, a discussion of their applications follows.
Metaclasses in Action
It is possible to define custom metaclasses that can be used when creating classes. These custom metaclasses will normally inherit from type and re-implement certain methods such as __init__ or __new__.
Imagine that you are the chief architect of a shiny new project, and that having diligently read dozens of software engineering books and style guides that hammer on the importance of docstrings, you want to enforce the requirement that all non-private methods in the project have docstrings. How would you enforce this requirement?
A simple and straightforward solution is to create a custom metaclass, for use across the project, that enforces this requirement. The snippet that follows, though not of production quality, is an example of such a metaclass.
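A sketch; the metaclass name DocMeta matches the text below:

```python
class DocMeta(type):
    def __init__(cls, name, bases, attrs):
        for key, value in attrs.items():
            # ignore private attributes and anything that is not a function
            if key.startswith("_") or not callable(value):
                continue
            if not getattr(value, "__doc__", None):
                raise TypeError("%s.%s is missing a docstring" % (name, key))
        super().__init__(name, bases, attrs)
```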
DocMeta is a type subclass that overrides the type class's __init__ method. The implemented __init__ method iterates through all the class attributes searching for non-private methods missing a docstring; if one is encountered, an exception is thrown, as shown below.
Another trivial example of a metaclass in action is the creation of a final class, that is, a class that cannot be subclassed. Some may argue that final classes are unpythonic, but for illustration purposes such functionality is implemented using a metaclass in the following snippet.
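A sketch; the metaclass name Final matches the text below (Settings is illustrative):

```python
class Final(type):
    def __init__(cls, name, bases, attrs):
        for base in bases:
            if isinstance(base, Final):
                raise TypeError("%s is final; it cannot be subclassed" % base.__name__)
        super().__init__(name, bases, attrs)

class Settings(metaclass=Final):
    pass

# class MySettings(Settings): pass   # raises TypeError
```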
In the example, the metaclass simply performs a check ensuring that the final class is never part of the base classes for any class being created.
Another very good example of a metaclass in action is in the Abstract Base Classes discussed previously. When defining an abstract base class, the ABCMeta metaclass from the abc module is used as the metaclass for the abstract base class being defined, and the @abstractmethod and @abstractproperty decorators are used to create methods and properties that must be implemented by non-abstract subclasses.
Once a class implements all abstract methods then such a class becomes a concrete class and can be instantiated by a user.
Overriding __new__ vs __init__ in Custom Metaclasses
Sometimes there is confusion over whether to override the __init__ or the __new__ method when defining custom metaclasses. The decision depends on the objective of the custom metaclass: if the intent is to modify the class by changing some class attribute, such as the list of base classes or the attribute names, then the __new__ method should be overridden. The following example is a metaclass that checks for camel-case attribute names and converts them to names with underscores between the words.
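A sketch (all names here are illustrative):

```python
import re

def camel_to_underscore(name):
    return re.sub(r"(?<!^)(?=[A-Z])", "_", name).lower()

class SnakeCaseMeta(type):
    def __new__(mcs, name, bases, attrs):
        renamed = {camel_to_underscore(key): value for key, value in attrs.items()}
        return super().__new__(mcs, name, bases, renamed)

class Vehicle(metaclass=SnakeCaseMeta):
    def wheelCount(self):
        return 4

print(Vehicle().wheel_count())   # 4 - wheelCount was renamed at class-creation time
```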
It would not be possible to modify class attributes such as the list of base classes or the attribute names in the __init__ method because, as has been said previously, that method is called after the class object has already been created.
On the other hand, when the intent is just to carry out initialization or validation checks, as was done with the DocMeta and Final metaclasses, then the __init__ method of the metaclass should be overridden.
9.4 Context Managers
Sometimes there is a need to execute some operations between another pair of operations; for example, open a file, read from the file and close the file, or acquire a lock on a data structure, work with the data structure and then release it. These kinds of requirements come up most especially when dealing with system resources, where a resource is acquired, worked with and then released. It is important that the acquisition and release of such resources are handled carefully, so that any errors that occur are correctly dealt with; writing code to handle this every time leads to a lot of repetitive and cumbersome code. Context managers provide a solution: they provide a means of abstracting away a pair of operations that are executed before and after another group of operations, using the with statement. The with statement enables a set of operations to run within a context, and the context is controlled by a context manager object.
An example of the use of the with statement is in opening files, which involves a pair of operations: opening and closing the file.
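A minimal sketch (the file name is illustrative):

```python
with open("data.txt") as f:
    contents = f.read()
# the file is closed automatically, even if an exception was raised in the block
```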
The with statement can be used with any object that implements the context management protocol. This protocol defines two operations, __enter__ and __exit__, that are executed just before the start and just after the end of execution of the code in the context respectively. Generally, the definition and use of a context manager takes the shape of the following snippet.
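A sketch of the general shape, with illustrative names:

```python
class ManagedResource:
    def __enter__(self):
        # acquire the resource and return it for use in the block
        return self

    def __exit__(self, exc_type, exc_value, exc_tb):
        # release the resource; returning False propagates any exception
        return False

with ManagedResource() as resource:
    ...   # operations executed within the context
```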
If the initialised resource is used within the context then the __enter__ method must return the resource object so that it can be bound to a name in the with statement using the as mechanism. A resource object need not be returned if the code being executed in the context doesn't require a reference to the object that is set up. The following is a very trivial example of a class that implements the context management protocol in a very simple fashion.
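A minimal sketch of such a class; nothing is initialized here, so __enter__ returns no resource object:

```python
class Context:
    def __enter__(self):
        print("entering the context")

    def __exit__(self, exc_type, exc_value, exc_tb):
        print("exiting the context")
        return False    # any exception raised in the block is propagated

with Context():
    print("inside the context")
```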
When the with statement executes, the __enter__() method is called to create a new context; if a resource is initialized for use here then it is returned, but this is not the case in this example. After the operations within the context are executed, the __exit__() method is called with the exception type, value and traceback as arguments. If no exception is raised during the execution of the operations within the context then all three arguments are set to None. The __exit__ method returns True or False depending on whether any raised exception has been handled; when False is returned, exceptions raised are propagated outside of the context for other code blocks to handle. Any resource clean-up is also carried out within the __exit__() method. This is all there is to context management. Now, rather than write try...finally code to ensure that a file is closed or a lock is released every time such a resource is used, such chores can be handled in the __exit__ method of a context manager class, thus eliminating code duplication and making the code more intelligible.
The contextlib Module
For very simple use cases, there is no need to go through the hassle of implementing our own classes with __enter__ and __exit__ methods. The Python contextlib module provides a higher-level way of implementing context managers. To define a context manager, the @contextmanager decorator from the contextlib module is used to decorate a function that handles the resource in question or carries out any initialization and clean-up; this function must however be a generator function. The following example illustrates this.
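The time_func name below comes from the text; its body is a sketch that times the execution of the nested block.

```python
import time
from contextlib import contextmanager

@contextmanager
def time_func(label):
    start = time.time()
    try:
        yield              # the block nested in the with statement runs here
    finally:
        print("{}: {} seconds".format(label, time.time() - start))

with time_func("summing"):
    total = sum(range(1000000))
```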
This context generator function, time_func in this case, must yield exactly one value if it is required that a value be bound to a name in the with statement's as clause. When the generator yields, the code block nested in the with statement is executed. The generator is then resumed after the code block finishes execution. If an exception occurs during the execution of the block and is not handled in the block, the exception is re-raised inside the generator at the point where the yield occurred. If an exception is caught for purposes other than adequately handling it, the generator must re-raise that exception; otherwise the generator context manager will indicate to the with statement that the exception has been handled, and execution will resume normally after the context block.
Context managers, just like decorators and metaclasses, provide a clean method for abstracting away this kind of repetitive code that can clutter a program and make its logic difficult to follow.
10. Modules And Packages
Modules and packages are the last organizational units of code that are discussed. They provide the means by which large programs can be developed and shared.
10.1 Modules
Modules enable the reuse of programs. A module is a file that contains a collection of definitions and statements and has a .py extension. The contents of a module can be used by importing the module either into another module or into the interpreter. To illustrate this, our favourite Account class shown in the following snippet is saved in a module called account.py.
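A sketch of the Account class, assuming the definition used elsewhere in the book:

```python
# account.py
class Account:
    """A simple bank account."""

    num_accounts = 0

    def __init__(self, name, balance):
        self.name = name
        self.balance = balance
        Account.num_accounts += 1

    def deposit(self, amount):
        """Add amount to the account balance."""
        self.balance += amount

    def withdraw(self, amount):
        """Deduct amount from the account balance."""
        self.balance -= amount

    def inquiry(self):
        """Return the current balance."""
        return self.balance
```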
To re-use the module definitions, the import statement is used to import the module, as shown in the following snippet.
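For example, assuming the Account sketch above:

```python
>>> import account
>>> acct = account.Account("obi", 10)
>>> acct.deposit(5)
>>> acct.inquiry()
15
```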
All executable statements contained within a module are executed when the module is imported. A module is also an object with type module, so all generic operations that apply to objects can be applied to modules. The following snippet shows some perhaps unintuitive ways of manipulating module objects.
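For instance (a sketch; the attribute and dictionary names are illustrative):

```python
>>> import account
>>> type(account)
<class 'module'>
>>> account.motto = "save more"       # attributes can be set on a module
>>> registry = {"bank": account}      # modules can be stored in containers
>>> registry["bank"].Account
<class 'account.Account'>
```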
Each module possesses its own unique global namespace that is used by all functions and classes defined within the module; when this feature is properly used, it eliminates worries about name clashes with third party modules. The dir() function, called without any argument within a module, can be used to find out what names are available in the module's namespace.
As mentioned, a module can import another module; when this happens, and depending on the form of the import, the imported module's name, some of the names defined within the imported module, or all names defined within the imported module could be placed in the namespace of the module doing the importing. For example, from account import Account imports and places the Account name from the account module into the namespace; import account imports and adds the account name, referencing the whole module, to the namespace; while from account import * will import and add all names in the account module, except those that start with an underscore, to the current namespace. Using from module import * as a form of import is strongly advised against as it may import names that the developer is not aware of and that conflict with names used in the importing module. Python has the __all__ special variable that can be used within modules. The value of the __all__ variable should be a list containing the names within a module that are imported from the module when the from module import * syntax is used. Defining this variable is totally optional on the part of the developer. We illustrate the use of the __all__ special variable with the following example.
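A sketch (the validate helper is illustrative):

```python
# account.py
__all__ = ["Account"]        # only Account is exported by import *

class Account:
    ...

def validate():              # not in __all__, so not exported by import *
    ...
```

```python
>>> from account import *
>>> Account
<class 'account.Account'>
>>> validate
NameError: name 'validate' is not defined
```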
The name of an imported module can be obtained by referencing the __name__ attribute of the imported module. In the case of a module that is currently executing, the __name__ value is set to __main__. Python modules can be executed with python module.py <arguments>. A corollary of the fact that the __name__ of the currently executing module is set to __main__ is that we can have a recipe such as the following.
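A sketch of the recipe, assuming a main() entry point:

```python
def main():
    print("running as a standalone script")

if __name__ == "__main__":
    main()        # runs only when the module is executed directly
```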
This makes the module usable as a standalone script as well as an importable module. A popular use of the above recipe is in running unit tests; the module can be run standalone to test it, but imported into another module without running the test cases.
Reloading Modules
Once a module has been imported into the interpreter, any change to the module is not reflected within the interpreter. However, Python provides the importlib.reload function that can be used to re-import a module into the current namespace.
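For example (the file path in the output is illustrative):

```python
>>> import importlib
>>> import account
>>> importlib.reload(account)     # re-executes account.py
<module 'account' from 'account.py'>
```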
10.2 How are Modules found?
Import statements are able to import modules that are in any of the paths given by the sys.path variable. The import system uses a greedy strategy in which the first module found is imported. The content of the sys.path variable is unique to each Python installation. An example of the value of the sys.path variable on a Mac operating system is shown in the following snippet.
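(The exact paths shown are illustrative.)

```python
>>> import sys
>>> sys.path
['', '/usr/local/lib/python3.5', '/usr/local/lib/python3.5/plat-darwin',
 '/usr/local/lib/python3.5/lib-dynload',
 '/usr/local/lib/python3.5/site-packages']
```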
The sys.path list can be modified at runtime by adding or removing elements. When the interpreter is started conventionally, the sys.path list contains paths that come from three sources: sys.prefix, the PYTHONPATH environment variable and initialization by the site.py module.
- sys.prefix: This variable specifies the base location for a given Python installation. From this base location, the Python interpreter can work out the location of the Python standard library modules (typically sys.prefix/lib/pythonX.Y on Unix systems). The paths of the standard library can be found by running the Python interpreter with the -S option; this prevents the site.py initialization that adds the third party package paths to the sys.path list. The location of the standard library can also be overridden by defining the PYTHONHOME environment variable, which replaces sys.prefix and sys.exec_prefix.
- PYTHONPATH: Users can define the PYTHONPATH environment variable, and the directories it lists are added to the front of the sys.path list. This variable can be set to the directory where a user keeps user defined modules.
- site.py: This is a path configuration module that is loaded during the initialization of the interpreter. This module adds site-specific paths to the module search path. site.py starts by constructing up to four directories from a prefix and a suffix. For the prefix, it uses sys.prefix and sys.exec_prefix. For the suffix, it uses the empty string and then lib/site-packages on Windows or lib/pythonX.Y/site-packages on Unix and Macintosh. For each distinct prefix-suffix combination that refers to an existing directory, the directory is added to sys.path and further inspected for configuration files. The configuration files are files with a .pth extension whose contents are additional items, one per line, to be added to sys.path. Non-existing items are never added to sys.path, and no check is made that an item refers to a directory rather than a file. Each item is added to sys.path only once. Blank lines and lines beginning with # are skipped, while lines starting with import followed by a space or tab are executed. After these path manipulations, an attempt is made to import a module named sitecustomize, which can perform arbitrary site-specific customizations; it is typically created by a system administrator in the site-packages directory. If this import fails with an ImportError exception, it is silently ignored. After this, if ENABLE_USER_SITE is true, an attempt is made to import a module named usercustomize, which can perform arbitrary user-specific customizations. This file is intended to be created in the user site-packages directory, which is part of sys.path unless disabled by the -s option. Any ImportError raised here is also silently ignored.
10.3 Packages
Just as modules provide a means for organizing statements and definitions, packages provide a means for organizing modules. A close but imperfect analogy for the relationship of packages to modules is that of folders to files on computer file systems. A package, just like a folder, can be composed of a number of module files.
In Python however, packages are just like modules; in fact, all packages are modules but not all modules are packages. The difference between a module and a package is the presence of a __path__ special variable, with a value other than None, in a package object. Packages can have sub-packages and so on; when referencing a package and its corresponding sub-packages, the dot notation is used, so a complex number sub-package within a mathematics package would be referenced as math.complex.
There are currently two types of packages: regular packages and namespace packages.
Regular Packages
A regular package is one that consists of a group of modules in a folder together with an __init__.py module. The presence of this __init__.py file within the folder causes the interpreter to treat the folder as a package. An example of a package structure is the following.
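A sketch of such a structure; the a.py, b.py and c.py module names are illustrative:

```
parent/
    __init__.py
    one/
        __init__.py
        a.py
        b.py
    two/
        __init__.py
        c.py
```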
The parent, one and two folders are all packages because they all contain an __init__.py module within their respective folders. one and two are sub-packages of the parent package. Whenever a package is imported, the __init__.py module of that package is executed. One can think of the __init__.py as the store of attributes for the package - only symbols defined in this module are attributes of the imported package. Assuming the __init__.py module from the above parent package is empty and the package, parent, is imported using import parent, the parent package will have no module or sub-package as an attribute. The following code listing shows this.
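A sketch of an interpreter session illustrating this:

```python
>>> import parent
>>> parent.one
Traceback (most recent call last):
  ...
AttributeError: module 'parent' has no attribute 'one'
```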
As the example shows, none of the modules or sub-packages is listed as an attribute of the imported package object. On the other hand, if a symbol, package="testing packages", is defined in the __init__.py module of the parent package and the parent package is imported, the package object has this symbol as an attribute, as shown in the following code listing.
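```python
# parent/__init__.py
package = "testing packages"
```

```python
>>> import parent
>>> parent.package
'testing packages'
```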
When a sub-package is imported, the __init__.py modules of all parent packages are imported in addition to the __init__.py module of the sub-package itself.
Sub-packages are referenced during import using the dot notation, just like modules in packages are. In the previous package structure, the notation would be parent.one to reference the one sub-package. Packages support the same kind of import semantics as modules; individual modules or packages can be imported as in the following example.
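```python
>>> import parent.one.a
>>> parent.one.a             # the fully qualified name must be used
<module 'parent.one.a' ...>
```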
When the above method is used, the fully qualified name for the module, parent.one.a, must be used to access any symbol in the module. Note that with this method of import, the last symbol can only be a module or sub-package; classes, functions or variables defined within modules are not allowed. It is also possible to import just the module or sub-package that is needed, as the following example shows.
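```python
>>> from parent.one import a     # import just the a module
>>> from parent import one       # or just the one sub-package
```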
Symbols defined in the a module, or modules in the one package, can then be referenced using dot notation with just a or one as the prefix. The import forms, from package import * or from package.subpackage import *, can be used to import all the modules in a package or sub-package. This form of import should however be used carefully, if ever, as it may import names into the namespace that cause naming conflicts. Packages support the __all__ variable (the value of which should by convention be a list) for listing the modules or names that are visible when the package is imported using the from package import * syntax. If __all__ is not defined, the statement from package import * does not import all sub-modules from the package into the current namespace; rather, it only ensures that the package has been imported, possibly running any initialization code in __init__.py, and then imports whatever symbols are defined in the __init__.py module, including any names defined and sub-modules imported there.
Namespace Packages
A namespace package is a package in which the component modules and sub-packages may reside in multiple different locations. The various components may reside on different parts of the file system, in zip files, on the network or in any other location searched by the interpreter during the import process; however, when the package is imported, all components exist in a common namespace. To illustrate a namespace package, observe the following directory structures containing modules; both directories, apollo and gemini, could be located on any part of the file system and not necessarily next to each other.
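```
apollo/
    space/
        test.py
gemini/
    space/
        test1.py
```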
In these directories, the name, space, is used as a common namespace and will serve as the package name. Observe the absence of __init__.py modules in either directory. The absence of this module within these directories is a signal to the interpreter that it should create a namespace package when it encounters them. To be able to import this space package, the paths of its components must first of all be added to the interpreter's module search path, sys.path.
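A sketch, assuming apollo and gemini are reachable as relative paths:

```python
>>> import sys
>>> sys.path.extend(['apollo', 'gemini'])   # paths to both directories
>>> import space.test
>>> import space.test1
```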
Observe that the two different package directories are now logically regarded as a single namespace, and either space.test or space.test1 can be imported as if they existed in the same package. The key to a namespace package is the absence of an __init__.py module in the top-level directory that serves as the common namespace. The absence of the __init__.py module causes the interpreter to create a list of all directories in its sys.path that contain a matching directory name, rather than throw an exception. A special namespace package module is then created and a read-only copy of the list of directories is stored in its __path__ variable. The following code listing gives an example of this.
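```python
>>> import space
>>> space.__path__
_NamespacePath(['apollo/space', 'gemini/space'])
```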
Namespace packages bring added flexibility to package manipulation because a namespace can be extended by anyone with their own code, thus eliminating the need to modify the package structures of third party packages. For example, suppose a user had his or her own directory of code like this:
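(The mercury directory and test3.py module are hypothetical.)

```
mercury/
    space/
        test3.py
```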
Once this directory is added to sys.path along with the other packages, it will seamlessly merge together with the other space package directories, and its contents can be imported along with any existing artefacts.
10.4 The Import System
The import statement and the importlib.import_module() function provide the required import functionality in Python. A call to the import statement combines two actions:
- a search operation to find the requested module, through a call to the __import__ function, and
- a binding operation that adds the module returned by the search to the current namespace.
If the __import__ call does not find the requested module then an ImportError is raised.
The Import Search Process
The import mechanism uses the fully qualified name of the module for the search. When the fully qualified name is a sequence of names separated by dots, e.g. foo.bar.baz, the interpreter will attempt to import foo, then foo.bar and finally foo.bar.baz. If any of these modules cannot be found then an ImportError is raised.
The sys.modules variable is a cache of all previously imported modules and is the first port of call in the module search process. If the module name is found in sys.modules, the associated module is returned; if the name is present but its value is None, an ImportError is raised; and if the name is absent, the search continues. The sys.modules cache is writeable, so user code can manipulate the content of the cache. An example of the content of the cache is shown in the following snippet.
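(This assumes the account module from earlier has already been imported.)

```python
>>> import sys
>>> 'account' in sys.modules
True
>>> sys.modules['account']
<module 'account' from 'account.py'>
```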
Finders and Loaders
When a module is not found in the cache, the interpreter makes use of its import protocol to try to find the module. The Python import protocol defines finder and loader objects; objects that implement both interfaces are called importers.

Finders define strategies for locating modules. Modules may be available locally on the file system in regular or zipped files, or in other locations such as a database or even a remote machine; finders have to be able to deal with such locations if modules are going to be imported from them. By default, Python has support for finders that handle the following scenarios:
- built-in modules,
- frozen modules, and
- path based modules - this finder handles imports that have to interact with the import path given by the sys.path variable.
These finders are located in the sys.meta_path variable, as shown in the following snippet.
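```python
>>> import sys
>>> sys.meta_path
[<class '_frozen_importlib.BuiltinImporter'>,
 <class '_frozen_importlib.FrozenImporter'>,
 <class '_frozen_importlib_external.PathFinder'>]
```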
The interpreter continues the search for the module by querying each finder on the meta path to find out which can handle the module. The finder objects must implement a find_spec method that takes three arguments: the first is the fully qualified name of the module; the second is an import path that is used for the module search - this is None for top level modules, but for sub-modules or sub-packages it is the value of the parent package's __path__; and the third is an existing module object that is passed in by the system only when a module is being reloaded.
If one of the finders locates the module, it returns a module spec that is used by the interpreter's import machinery to create and load the module (loading is tantamount to executing the module). The loaders carry out the module execution in the module's global namespace; this is done by a call to the importlib.abc.Loader.exec_module() method with the already created module object as argument.
Customizing the import process
The import process can be customized via import hooks. There are two types of hook: meta hooks and import path hooks.
Meta hooks
These are called at the start of the import process, immediately after the sys.modules cache lookup and before any other processing. These hooks can override any of the default finders' search processes. Meta hooks are registered by adding new finder objects to the sys.meta_path variable.
To understand how a custom meta path hook can be implemented, a very simple case is illustrated. In online Python interpreters, some built-in modules such as os are disabled or restricted to prevent malicious use. A very simple way to achieve this is to implement a meta import hook that raises an exception any time a restricted import is attempted; the following snippet shows such an example.
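A sketch; the finder name and the set of restricted modules are assumptions.

```python
import sys

class RestrictedImportFinder:
    """Meta path finder that blocks imports of restricted modules."""

    restricted = {'os', 'subprocess'}

    @classmethod
    def find_spec(cls, name, path, target=None):
        if name in cls.restricted:
            raise ImportError("the {} module is restricted".format(name))
        return None              # defer to the remaining finders

sys.meta_path.insert(0, RestrictedImportFinder)   # runs before default finders

sys.modules.pop('os', None)   # ensure a cached copy is not returned instead
import os                     # now raises ImportError: the os module is restricted
```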
Import Path hooks
These hooks are called as part of sys.path or package.__path__ processing. Recall from our previous discussion that the path based finder is one of the default meta path finders, and that it works with entries on the sys.path variable. The meta path based finder delegates the job of finding modules on the sys.path entries to other finders - the import path hooks. The sys.path_hooks variable is a collection of built-in path entry finders. By default, the Python interpreter has support for processing files in zip archives and normal files in directories, as shown in the following snippet.
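```python
>>> import sys
>>> sys.path_hooks
[<class 'zipimport.zipimporter'>,
 <function FileFinder.path_hook.<locals>.path_hook_for_FileFinder at 0x...>]
```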
Each hook knows how to handle a particular kind of path entry. For example, an attempt to get the finder for one of the entries in sys.path is shown in the following snippet.
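(The path and output shown are illustrative.)

```python
>>> import sys
>>> finder = sys.path_hooks[1]('/usr/local/lib/python3.5')
>>> finder
FileFinder('/usr/local/lib/python3.5')
>>> finder.find_spec('os')
ModuleSpec(name='os', loader=<_frozen_importlib_external.SourceFileLoader ...>,
           origin='/usr/local/lib/python3.5/os.py')
```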
New import path hooks can be added by inserting new callables into sys.path_hooks.
Why You Probably Should Not Reload Modules…
Now that we understand that the last step of a module import is the execution of the module code within the module's global namespace, it is clearer why it may be a bad idea to use importlib.reload to reload modules that have changed.
A module reload does not purge the global namespace of objects from the module being reloaded. Imagine a module, Foo, with a function, print_name, that is imported into another module, Bar; the function, Foo.print_name, is referenced by a variable, x, in the module Bar. Now if the implementation of print_name is changed for some reason and the module is then reloaded in Bar, something interesting happens. Since the reload of the module Foo will cause an exec of the module contents without any prior clean-up, the reference that x holds to the previous implementation of Foo.print_name will persist; we thus end up with two implementations in play, and this is most probably not the behaviour expected. The following sketch illustrates the scenario.
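```python
# Foo.py
def print_name():
    print("old implementation")
```

```python
# Bar.py
import importlib
import Foo

x = Foo.print_name        # x references the current implementation

# ... print_name is edited on disk to print "new implementation" ...

importlib.reload(Foo)
Foo.print_name()          # the new implementation
x()                       # still the old implementation
```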
For this reason, reloading a module is something that may be worth avoiding in any sufficiently complex Python program.
10.5 Distributing Python Programs
Python provides the distutils module for packaging up Python code for distribution. Assuming the program has been properly written, documented and structured, distributing it is relatively straightforward using distutils. One just has to:
- write a set-up script (setup.py by convention);
- optionally, write a set-up configuration file;
- create a source distribution; and
- optionally, create one or more built distributions.
A set-up script using distutils is a setup.py file. For a program with the following package structure,
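(a sketch that reuses the parent package from earlier; the module names are illustrative)

```
parent/
    setup.py
    __init__.py
    one/
        __init__.py
        a.py
    two/
        __init__.py
        c.py
```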
an example of a simple setup.py file is given in the following snippet.
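A sketch; the metadata values are illustrative.

```python
# parent/setup.py
from distutils.core import setup

setup(
    name='parent',
    version='1.0',
    description='An example package',
    author='A. N. Author',
    author_email='author@example.com',
    url='http://example.com/parent',
    packages=['parent', 'parent.one', 'parent.two'],
    py_modules=[],       # names of any single-file modules
    scripts=[],          # paths of any scripts
)
```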
The setup.py file must exist at the top level directory of the distribution; in this case, it should exist at parent/setup.py. The values used in the set-up script are mostly self explanatory: py_modules contains the names of all single-file Python modules, packages contains a list of all packages, and scripts contains a list of all scripts within the program. The rest of the arguments, though not exhaustive of the possible parameters, are self explanatory.
Once the setup.py file is ready, the following command is run at the command line to create an archive file for distribution.
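```
python setup.py sdist
```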
sdist will create an archive file (e.g., a tarball on Unix, a ZIP file on Windows) containing the setup script setup.py and the program's modules and packages. The archive file will be named parent-1.0.tar.gz (or .zip) and will unpack into a directory parent-1.0. To install the created distribution, the archive is unpacked and python setup.py install is run inside the unpacked directory. This will install the package in the site-packages directory of the installation.
One can also create one or more built distributions for a program. For instance, if running a Windows machine, one can make the use of the program easy for end users by creating an executable installer with the bdist_wininst command. For example, python setup.py bdist_wininst will create an executable installer, parent-1.0.win32.exe, in the current directory.
Other useful built distribution formats are RPM, implemented by the bdist_rpm command, bdist_pkgtool for Solaris, and bdist_sdux for HP-UX.
It is important to note that the use of distutils assumes that the end user of a distributed package will already have the Python interpreter installed.
11. Inspecting Objects
The Python inspect module provides powerful methods for interacting with live Python objects. The methods provided by this module help with type checking, source code retrieval, class and function inspection, and interpreter stack inspection. The documentation is the golden source of information for this module, but a few of its classes and methods are discussed here to show its power.
11.1 Handling source code
The inspect module provides functions for accessing the source code of functions, classes and modules. All examples are carried out using our Account class as defined in the following snippet.
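A sketch of the class, shown here with docstrings since several of the functions below operate on them:

```python
# account.py
class Account:
    """A simple bank account."""

    def __init__(self, name, balance):
        self.name = name
        self.balance = balance

    def deposit(self, amount):
        """Add amount to the balance."""
        self.balance += amount

    def withdraw(self, amount):
        """Deduct amount from the balance."""
        self.balance -= amount
```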
Some of the methods from the inspect module for handling source code include:
- inspect.getdoc(object): This returns the documentation string for the argument object, cleaned up with inspect.cleandoc().
- inspect.getcomments(object): This returns a single string containing any lines of comments. For a class, function or method, these are the comments immediately preceding the argument object's source code, while for a module they are the comments at the top of the Python source file.
- inspect.getfile(object): This returns the name of the file in which an object was defined. The argument should be a module, class, method, function, traceback, frame or code object. This will fail with a TypeError if the object is a built-in module, class or function.
- inspect.getmodule(object): This function attempts to guess the module that the argument object was defined in.
- inspect.getsourcefile(object): This returns the name of the Python source file in which the argument object was defined. This will fail with a TypeError if the object is a built-in module, class or function.
- inspect.getsourcelines(object): This returns a tuple of the list of source code lines and the line number on which the source code for the argument object begins. The argument may be a module, class, method, function, traceback, frame or code object. An OSError is raised if the source code cannot be retrieved.
- inspect.getsource(object): This returns the human-readable text of the source code for the argument object as a single string. The argument may be a module, class, method, function, traceback, frame or code object. An OSError is raised if the source code cannot be retrieved. Note that the difference between this and inspect.getsourcelines is that this method returns the source code as a single string while inspect.getsourcelines returns a list of source code lines.
- inspect.cleandoc(doc): This cleans up indentation from documentation strings that have been indented to line up with blocks of code. Any white-space that can be uniformly removed from the second line onwards is removed, and all tabs are expanded to spaces.
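A short demonstration of a few of these, with output abridged:

```python
>>> import inspect
>>> import account
>>> inspect.getdoc(account.Account)
'A simple bank account.'
>>> inspect.getmodule(account.Account)
<module 'account' from 'account.py'>
>>> inspect.getsource(account.Account.deposit)
'    def deposit(self, amount):\n        """Add amount to the balance."""\n...'
```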
11.2 Inspecting Classes and Functions
The inspect module provides some classes and functions for interacting with classes and functions. Signature, Parameter and BoundArguments are important classes in the inspect module. A short demonstration of all three follows this list.
- Signature: This represents the call signature and return annotation of a function or method. A Signature object is obtained by calling the inspect.signature function with a function or method as argument. Each parameter accepted by the function or method is represented as a Parameter object in the parameter collection of the Signature object. Signature objects support the bind method for mapping positional and keyword arguments to parameters; the bind(*args, **kwargs) method returns a BoundArguments object if *args and **kwargs match the signature, else it raises a TypeError. The Signature class also has a bind_partial(*args, **kwargs) method that works in the same way as Signature.bind but allows the omission of some arguments.
- Parameter: Parameter objects represent function or method arguments within a Signature; the parameters of a signature can be accessed as shown in the demonstration after this list. Important attributes of a Parameter object are the name and kind attributes. The kind attribute could be either POSITIONAL_ONLY, POSITIONAL_OR_KEYWORD, VAR_POSITIONAL, KEYWORD_ONLY or VAR_KEYWORD.
- BoundArguments: This is the return value of a Signature.bind or Signature.bind_partial method call. A BoundArguments object contains the mapping of arguments to function or method parameters and has the following attributes:
  - args: a tuple of positional argument values;
  - arguments: an ordered mapping of parameter names to argument values;
  - kwargs: a dict of keyword argument values;
  - signature: a reference to the parent Signature object.
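A sketch demonstrating Signature, Parameter and BoundArguments together; the add function is illustrative:

```python
>>> from inspect import signature
>>> def add(x, y=1, *args, **kwargs):
...     return x + y
...
>>> sig = signature(add)
>>> list(sig.parameters)
['x', 'y', 'args', 'kwargs']
>>> str(sig.parameters['y'].kind)
'POSITIONAL_OR_KEYWORD'
>>> bound = sig.bind(2, 3)
>>> bound.args, bound.kwargs
((2, 3), {})
```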
Interesting functionality can be implemented by making use of the classes mentioned above. For example, we can implement a rudimentary type checking decorator for a function as shown in the following snippet (thanks to the Python Cookbook for this).
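A sketch of the recipe, adapted from the Python Cookbook:

```python
from functools import wraps
from inspect import signature

def type_assert(*ty_args, **ty_kwargs):
    def decorate(func):
        sig = signature(func)
        # map parameter names to their declared types
        bound_types = sig.bind_partial(*ty_args, **ty_kwargs).arguments

        @wraps(func)
        def wrapper(*args, **kwargs):
            bound_values = sig.bind(*args, **kwargs)
            # check each supplied argument against its declared type
            for name, value in bound_values.arguments.items():
                if name in bound_types and not isinstance(value, bound_types[name]):
                    raise TypeError('Argument {} must be {}'
                                    .format(name, bound_types[name]))
            return func(*args, **kwargs)
        return wrapper
    return decorate
```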
The defined decorator, type_assert, can then be used to enforce type assertions as shown in the following example.
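```python
>>> @type_assert(int, int)
... def add(x, y):
...     return x + y
...
>>> add(1, 2)
3
>>> add(1, 'two')
Traceback (most recent call last):
  ...
TypeError: Argument y must be <class 'int'>
```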
bind_partial is used rather than bind so that we do not have to specify the type for every argument; the idea behind the partial function from the functools module is also at work here.
The inspect module further defines some functions for interacting with classes and functions. A cross-section of these functions includes:
- inspect.getclasstree(classes, unique=False): This arranges the given list of classes into a hierarchy of nested lists. Where a nested list appears, it contains classes derived from the class whose entry immediately precedes the list. Each entry is a tuple containing a class and a tuple of its base classes.
- inspect.getfullargspec(func): This returns the names and default values of a function's arguments (see the demonstration after this list). The return value is a named tuple of the form FullArgSpec(args, varargs, varkw, defaults, kwonlyargs, kwonlydefaults, annotations), where:
  - args is a list of the argument names;
  - varargs and varkw are the names of the * and ** arguments or None;
  - defaults is an n-tuple of the default values of the last n arguments, or None if there are no default arguments;
  - kwonlyargs is a list of keyword-only argument names;
  - kwonlydefaults is a dictionary mapping names from kwonlyargs to defaults;
  - annotations is a dictionary mapping argument names to annotations.
- inspect.getargvalues(frame): This returns information about the function arguments that have been passed into a particular frame. The return value is a named tuple ArgInfo(args, varargs, keywords, locals), where:
  - args is a list of the argument names;
  - varargs and keywords are the names of the * and ** arguments or None;
  - locals is the locals dictionary of the given frame.
- inspect.getcallargs(func, *args, **kwds): This binds args and kwds to the argument names of the function or method, func, as if it was called with them. For bound methods, the first argument (typically named self) is also bound to the associated instance. A dict is returned, mapping the argument names, including the names of the * and ** arguments if any, to their values from args and kwds. Whenever func(*args, **kwds) would raise an exception because of an incompatible signature, an exception of the same type and with the same or a similar message is raised.
- inspect.getclosurevars(func): This returns the mapping of external name references in the function or method, func, to their current values. A named tuple ClosureVars(nonlocals, globals, builtins, unbound) is returned, where:
  - nonlocals maps referenced names to lexical closure variables;
  - globals maps referenced names to the function's module globals;
  - builtins maps referenced names to the builtins visible from the function body;
  - unbound is the set of names referenced in the function that could not be resolved at all given the current module globals and builtins.
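A short demonstration of inspect.getfullargspec; the scale function is illustrative:

```python
>>> import inspect
>>> def scale(values, factor=2, *rest, **options):
...     return [v * factor for v in values]
...
>>> spec = inspect.getfullargspec(scale)
>>> spec.args, spec.varargs, spec.varkw, spec.defaults
(['values', 'factor'], 'rest', 'options', (2,))
```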
The inspect module also supplies functions for accessing the members of objects. An example is inspect.getmembers(object[, predicate]), which returns all the attribute members of the object argument; the optional predicate serves as a filter on the values returned. For example, for a given class instance, i, we can get a list of the attribute members of i that are methods by making the call inspect.getmembers(i, inspect.ismethod); this returns a list of tuples of attribute name and attribute object. The following example illustrates this.
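(Output abridged; this assumes the Account class sketched earlier.)

```python
>>> import inspect
>>> from account import Account
>>> i = Account("obi", 10)
>>> inspect.getmembers(i, inspect.ismethod)
[('__init__', <bound method Account.__init__ of <account.Account object ...>>),
 ('deposit', <bound method Account.deposit of <account.Account object ...>>),
 ('withdraw', <bound method Account.withdraw of <account.Account object ...>>)]
```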
The inspect module has predicates for use with this function that include isclass, ismethod, isfunction, isgeneratorfunction, isgenerator, istraceback, isframe, iscode, isbuiltin, isroutine, isabstract and ismethoddescriptor.
11.3 Interacting with Interpreter Stacks
The inspect module also provides functions for dealing with interpreter stacks. The interpreter stack is composed of frames. The functions below each return a tuple of the frame object, the filename, the line number of the current line, the function name, a list of lines of context from the source code, and the index of the current line within that list. These functions enable a user to inspect and manipulate frame records.
- inspect.currentframe(): This returns the frame object for the caller's stack frame. This function relies on stack frame support in the interpreter, which is not guaranteed to exist in all implementations of Python, for example Stackless Python. If running in an implementation without stack frame support, this function returns None.
- inspect.getframeinfo(frame, context=1): This returns information about the given frame or traceback object. A named tuple Traceback(filename, lineno, function, code_context, index) is returned.
- inspect.getouterframes(frame, context=1): This returns a list of frame records for the given frame and all outer frames. These frames represent the calls that led to the creation of the argument frame. The first entry in the returned list represents the argument frame; the last entry represents the outermost call on the argument frame's stack.
- inspect.getinnerframes(traceback, context=1): This returns a list of frame records for a traceback's frame and all inner frames. These frames represent calls made as a consequence of the traceback's frame. The first entry in the list represents the traceback; the last entry represents where the exception was raised.
- inspect.stack(context=1): This returns a list of frame records for the caller's stack. The first entry in the returned list represents the caller; the last entry represents the outermost call on the stack.
- inspect.trace(context=1): This returns a list of frame records for the stack between the current frame and the frame in which the exception currently being handled was raised. The first entry in the list represents the caller; the last entry represents where the exception was raised.
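A small sketch using inspect.stack to find the name of a caller; the function names are illustrative:

```python
>>> import inspect
>>> def inner():
...     return inspect.stack()[1].function    # the caller's frame record
...
>>> def outer():
...     return inner()
...
>>> outer()
'outer'
```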