Table of Contents
- The Shared History of PERL and Python
- Learning through Examples
- A Quick View of Pythonland
- Numbers and Variables
- From Array to List
- From Associative Array to Dictionary
- Loops and Conditions
- From Subroutines to Functions
- Strings
- Regular Expressions
- Reading and Writing Files
- Modules and Packages
- From CPAN to PyPI
- Useful Python Packages and Tools
- Class, Iterator, etc.
- From BioPERL to Biopython
- Closing Comments
The Shared History of PERL and Python
The year 1969 was pivotal in the history of computing. That year, Ken Thompson and Dennis Ritchie began developing the UNIX operating system at Bell Labs. The eventual births of scripting languages like PERL and Python two decades later were related to that event.
UNIX was among the first operating systems written almost entirely in a high-level language (C). Previously, operating systems were developed in machine-dependent assembly languages and therefore could not be ported to different types of computers. In contrast, the C version of UNIX, released in 1973, could be ported easily to all kinds of computing hardware, and that made UNIX widely popular. As a side effect, C also became a primary language for computing.
The concept of scripting originated in the UNIX world. Scripts were short informal programs which, unlike C programs, did not need to be compiled. Often they were used to search rapidly for patterns in text files. UNIX developers not only included software tools like AWK to facilitate scripting, but also built another immensely helpful feature for flexible pattern search - regular expressions (regex). Ken Thompson initially added this capability to the UNIX editor ‘ed’. However, regexes turned out to be so useful that they were eventually adopted into many popular UNIX tools.
The PERL language was created by Larry Wall in 1987 to bring the capabilities of various UNIX-related scripting tools into one place and also to give scripts a C-like syntax. Right when the internet boom needed a language for efficient parsing of web documents (1994), PERL was ready with a mature release that included regular expressions. For almost identical reasons, PERL became popular among bioinformaticians in the 1990s and early 2000s.
While PERL was created to give a C-like structure to UNIX scripts, Python was developed by Guido van Rossum (late 1980s) to simplify C and make it ready for scripting. Therefore, the two languages attempted to accomplish almost the same objectives coming from two ends. Python took longer than PERL to mature, but its simpler syntax has made it a favorite among bioinformaticians since the mid-2000s. This created inconveniences for some of the earlier researchers trained in PERL, because they increasingly need to use Python to access shared public repositories or share code with others.
Here is the good news. Although, from a cursory look, code written in those two languages appears as different as English and Chinese, the languages are actually much closer than English and French. In fact, they are as comparable as British and American English. By learning how to convert a few major differences, a PERL programmer can easily translate code to Python.
This book will make your transition easy by showing what those major differences are in the next chapter.
Learning through Examples
In the previous chapter, you learned about the shared history of PERL and Python. Here you will see, based on three examples, how similar the code written in the two languages is. You will learn how to move easily from PERL to Python by making a few simple changes in the code structure.
Example 1 - Hello World
The most basic “hello world” program in PERL looks like -
# This is a simple program
print "Hello from Homolog.us\n";
You can run it by typing -
> perl hello.perl
A similar code in Python looks like -
# This is a simple program
print "Hello from Homolog.us"
You run it with -
> python hello.py
Example 2 - Multiplication Table
In this example, we print the multiplication table of 10.
PERL
for($i=0; $i<10; $i++)
{
    print "10 times ", $i, " is ", 10*$i, "\n";
}
Python
for i in range(10):
    print "10 times", i, "is", 10*i
Example 3 - Prime Number
In this example, we check whether an integer is prime.
PERL
$num=59;
$true=1;
for($i=2; $i <= $num/2; $i++)
{
    if($num % $i == 0)
    {
        $true=0;
    }
}
if($true==1)
{
    print $num, " is prime\n";
}
else
{
    print $num, " is not prime\n";
}
Python
num=59
true=1
for i in range(2, num/2+1):
    if num % i == 0:
        true=0
if true==1:
    print num, "is prime"
else:
    print num, "is not prime"
Let us go over the differences between the PERL and Python scripts one by one.
- You can see that the Python programs do not include semicolons at the end of the lines. By removing all semicolons from the ends of PERL statements, you can make them look almost like Python.
This change has an important ramification however. PERL programmers often pack multiple statements in the same line. Python does allow several statements on one line when they are separated by semicolons, but the practice is strongly discouraged, and Python programmers see that as a big plus. Keeping one statement per line keeps Python programs clean and readable.
- Unlike PERL, Python code does not use curly brackets. This change also has an important ramification. Python uses indentation to identify blocks inside ‘for’ or ‘while’ loops. Therefore, Python is picky about the size of the indentation, which needs to be identical for all statements within the block.
- Unlike PERL, Python variables do not have the characters ‘$’, ‘%’ or ‘@’ before their names.
- The Python ‘print’ statement automatically adds a newline, whereas PERL statements need “\n” to be added explicitly.
A Quick View of Pythonland
Now that you know the main steps to convert PERL programs into Python, let us look at Python from a new programmer’s angle.
The Python 2 core contains 31 keywords. Those words are special and cannot be used as the names of variables. Also, Python comes with a set of commonly used built-in functions. The keywords are shown first, followed by the functions.
Python Keywords are Special Words
Keyword | Action |
---|---|
print | prints text on screen |
and | logical ‘and’ |
or | logical ‘or’ |
not | logical ‘not’ |
True | logical True |
False | logical False |
is | tests whether two names refer to the same object |
in | checks if an element is in a list or dictionary |
del | deletes an element from a list or dictionary |
if, else, elif | conditional statement |
while | conditional loop |
for | loop |
continue | skips current execution of loop |
break | quits the loop prematurely |
def | defines new function |
return | returns value at the end of function |
from, import | imports functions from file |
Special words cannot be used as the names of variables.
Wrong code -
for = 1
in = 2
print for + in
Here is the Full List of Keywords
and, or, not
if, else, elif
while, for, continue, break, in
def, return
import, from, as
with, as (file)
in (list)
del (delete dictionary item, list item)
exec (runs Python code supplied as a string)
global, with, assert, pass, yield
except, class, raise, finally
is, lambda, try
The Same Keywords Listed Alphabetically
and del from not while
as elif global or with
assert else if pass yield
break except import print
class exec in raise
continue finally is return
def for lambda try
Built-in Functions in Python
Function - range()
The function range creates a list of integers.
print range(3)
print range(1,5)
print range(1,5,2)
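To make the three calls above concrete, here is a small sketch of what they return. It wraps each call in list() and writes print with parentheses; both are assumptions made here so that the same lines also run under Python 3, where range() no longer returns a list directly.

```python
# range(stop), range(start, stop) and range(start, stop, step);
# the stop value itself is never included.
first = list(range(3))
second = list(range(1, 5))
third = list(range(1, 5, 2))

print(first)    # [0, 1, 2]
print(second)   # [1, 2, 3, 4]
print(third)    # [1, 3]
```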
Function - len()
This function gives the length of a list.
x=range(9,2,-2)
print len(x)
In the following code, the variable i goes from 0 to 3, because len(a)=4.
a=[0,1,2,3]
print "loop using list indices"
for i in range(len(a)):
    print i, "a[i]+8=", a[i]+8
Function - float()
The function float converts an integer to a floating point number.
x=1
y=float(x)
print x,y
Function - int()
The function int gives the integer part of a floating point number.
x=1.7
y=int(x)
print x,y
Function - str()
The function str converts a number into a string.
x=723
y=str(x)
print y[0]
All Built-in Functions
Python language includes 68 built-in functions.
Name | Action |
---|---|
help() | Invoke the built-in help system. |
Number-related | |
abs() | Return the absolute value of a number. |
pow() | Return a number raised to a power. |
round() | Return the rounded floating point value. |
divmod() | Return a pair of numbers consisting of quotient and remainder when using integer division. |
Creates Objects | |
ascii() | Return a string containing a printable representation of an object, but escape the non-ASCII characters. |
bytearray() | Return a new array of bytes. |
bytes() | Return a new “bytes” object. |
chr() | Return the string representing a character. |
complex() | Create a complex number or convert a string or number to a complex number. |
dict() | Create a new dictionary. |
enumerate() | Return an enumerate object. |
frozenset() | Return a new frozenset object. |
hash() | Return the hash value of the object. |
id() | Return the “identity” of an object. |
iter() | Return an iterator object. |
list() | Return a list. |
memoryview() | Return a “memory view” object created from the given argument. |
object() | Return a new featureless object. |
repr() | Return a string containing a printable representation of an object. |
str() | Return a str version of object. |
set() | Return a new set object. |
slice() | Return a slice object. |
tuple() | Return a tuple. |
type() | Return the type of an object. |
Converts | |
bin() | Convert an integer number to a binary string. |
bool() | Convert a value to a Boolean. |
float() | Convert a string or a number to floating point. |
format() | Convert a value to a “formatted” representation. |
hex() | Convert an integer number to a hexadecimal string. |
int() | Convert a number or string to an integer. |
oct() | Convert an integer number to an octal string. |
ord() | Return an integer representing the Unicode code point of a character. |
List operations | |
len() | Return the length (the number of items) of an object. |
min() | Return the smallest item in an iterable. |
max() | Return the largest item in an iterable. |
sorted() | Return a new sorted list. |
sum() | Sums the items of an iterable from left to right and returns the total. |
Iterables | |
all() | Return True if all elements of the iterable are true (or if the iterable is empty). |
any() | Return True if any element of the iterable is true. If the iterable is empty, return False. |
callable() | Return True if the object argument appears callable, False if not. |
map() | Return an iterator that applies function to every item of iterable, yielding the results. |
filter() | Construct an iterator from elements of iterable for which function returns true. |
zip() | Make an iterator that aggregates elements from each of the iterables. |
range() | Return an iterable sequence. |
next() | Retrieve the next item from the iterator. |
reversed() | Return a reverse iterator. |
I/O-related | |
dir() | Return the list of names in the current local scope. |
open() | Open file and return a corresponding file object. |
print() | Print objects to the stream. |
input() | Reads a line from input, converts it to a string (stripping a trailing newline), and returns that. |
Runs Code | |
compile() | Compile the source into a code or AST object. |
eval() | The argument is parsed and evaluated as a Python expression. |
exec() | Dynamic execution of Python code. |
Other functions | |
classmethod() | Return a class method for the function. |
getattr() | Return the value of the named attribute of an object. |
setattr() | Assigns the value to the attribute. |
delattr() | Deletes the named attribute of an object. |
hasattr() | Return True if the name is one of the object’s attributes. |
globals() | Return a dictionary representing the current global symbol table. |
locals() | Update and return a dictionary representing the current local symbol table. |
isinstance() | Return True if the object argument is an instance. |
issubclass() | Return True if class is a subclass. |
property() | Return a property attribute. |
staticmethod() | Return a static method for function. |
super() | Return a proxy object that delegates method calls to a parent or sibling class. |
vars() | Return the \_\_dict\_\_ attribute for a module, class, instance, or any other object. |
\_\_import\_\_() | This function is invoked by the import statement. |
References
- https://gist.github.com/mindful108/6412490
- http://www.programiz.com/python-programming/keyword-list
- https://learnpythonthehardway.org/book/ex37.html
Numbers and Variables
Apart from ‘$’, ‘%’ and ‘@’ symbols in front of the names, PERL and Python variables are named in the same way. Here are two equivalent PERL and Python programs -
PERL
$x=10;
$y=2*$x;
print "$y\n";
$y=$y+1;
print "$y\n";
Python
x=10
y=2*x
print y
y=y+1
print y
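Apart from the changes listed in chapter 2, there are no differences between the two programs.
We also note that the mathematical operators are written identically in PERL and Python. One caveat is that, in Python 2, ‘/’ applied to two integers discards the fractional part (7/2 gives 3), whereas PERL gives 3.5. The operators are shown in the following table.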
Operator | Action |
---|---|
+ | Addition |
- | Subtraction |
* | Multiplication |
/ | Division |
% | Remainder |
** | Power |
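The operators in the table can be tried out with a few lines. The sketch below also looks at division in more detail, because Python 2 drops the fraction when both operands are integers. The print calls are written with parentheses, an assumption made here so that the lines behave the same under Python 2 and Python 3.

```python
# Arithmetic operators shared by PERL and Python.
print(7 + 2)     # 9
print(7 - 2)     # 5
print(7 * 2)     # 14
print(7 % 2)     # 1
print(2 ** 10)   # 1024

# Division: converting one operand to float keeps the fraction,
# and '//' always gives the whole-number part.
exact = 7 / float(2)
whole = 7 // 2
print(exact)     # 3.5
print(whole)     # 3
```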
From Array to List
PERL arrays are called lists in Python.
Here is a PERL program demonstrating various aspects of arrays.
@A=(10, 20, 30, 40, 3);
print $A[2],"\n";
$N=@A;
print "$N\n";
@A=();
$N=@A;
print "$N\n";
print $A[2],"\n";
The equivalent code in Python looks like -
A=[10,20,30,40,3]
print A[2]
print len(A)
A=[]
print len(A)
print A[2]
Apart from the differences mentioned in chapter 2, here are the additional changes.
- Length of list in Python is obtained by using the ‘len’ function.
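- PERL is more forgiving than Python if the command seeks out-of-range elements of arrays/lists. The last line of the PERL program prints an empty line, whereas the last line of the Python program stops with an ‘IndexError’.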
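The out-of-range behaviour can be handled in two ways, sketched below. The variable name ‘value’ is chosen only for this illustration, and print is written with parentheses so that the lines also run under Python 3.

```python
A = []

# Reading a missing element raises IndexError instead of returning
# an empty value as PERL does.
try:
    value = A[2]
except IndexError:
    value = None

print(value)   # None

# Checking the length first avoids the error altogether.
if len(A) > 2:
    print(A[2])
else:
    print("A has fewer than 3 elements")
```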
Python Shortcuts on Lists
Here we discuss a number of useful shortcuts related to lists in Python.
- The ‘+’ symbol concatenates two lists.
a=[1,3,2,0]
b=[2,3,1,7]
print a+b
- You can use ‘:’ to get a sublist.
a=[1,3,4,9,6,2,0]
b=a[3:7]
print b
- The following command gives a sublist from index 3 up to (but not including) index 7, taking every second element.
a=[1,3,4,9,6,2,0]
b=a[3:7:2]
print b
- The following command reverses the list.
a=[1,3,4,9,6,2,0]
b=a[::-1]
print b
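The shortcuts above all follow the pattern list[start:stop:step]. Here is a short sketch that puts them side by side with the values they produce. The variable names are chosen for this illustration, and print is written with parentheses so that the lines also run under Python 3.

```python
a = [1, 3, 4, 9, 6, 2, 0]

tail = a[3:7]            # elements at indices 3, 4, 5 and 6
every_other = a[3:7:2]   # indices 3 and 5
backwards = a[::-1]      # the whole list in reverse order
joined = a[:2] + a[-2:]  # first two and last two elements

print(tail)         # [9, 6, 2, 0]
print(every_other)  # [9, 2]
print(backwards)    # [0, 2, 6, 9, 4, 3, 1]
print(joined)       # [1, 3, 2, 0]
```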
Keywords ‘in’ and ‘del’
Keyword | Action |
---|---|
in | checks if an element is in a list or dictionary |
del | deletes an element from a list or dictionary |
Keyword ‘in’
The keyword ‘in’ checks whether an element is in the list or not.
a=[3,4,9,1]
print 3 in a
print 100 in a
Keyword - del
The keyword del is used to remove a list element at a known index.
x=['a','b','c','d']
del x[2]
print x
Try -
a=[1,3,4,9,6,2,0]
print a
del a[2]
print a
From Associative Array to Dictionary
Associative arrays in PERL are called dictionaries in Python.
Here is a PERL program using an associative array.
%A = ("john", 39, "mark", 170);
print $A{"john"},"\n";
print %A,"\n";
Its equivalent Python code is shown below.
A = {'john': 39, 'mark': 170}
print A['john']
print A
Iterating over keys and values
Keyword | Action |
---|---|
in | checks if an element is in a list or dictionary |
del | deletes an element from a list or dictionary |
Keyword ‘in’
When applied to a dictionary, the keyword ‘in’ checks whether a key is present. It does not search the values.
age={}
age['john']=12
age['paul']=77
print 12 in age
print 'john' in age
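Because ‘in’ looks only at the keys of a dictionary, the two tests above give different answers. The sketch below spells that out and shows one way to search the values instead. The variable names are chosen for this illustration, and print is written with parentheses so that the lines also run under Python 3.

```python
age = {}
age['john'] = 12
age['paul'] = 77

key_found = 'john' in age          # looks at the keys
value_as_key = 12 in age           # 12 is a value, not a key
value_found = 12 in age.values()   # search the values explicitly

print(key_found)      # True
print(value_as_key)   # False
print(value_found)    # True
```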
Keyword - del
The keyword del can be used to delete a member of a dictionary.
age={}
age['john']=12
age['paul']=77
print age
del age['john']
print age
Loops and Conditions
‘while’ loop
Let us demonstrate the ‘while’ loop by writing the multiplication table for 9 in both PERL and Python.
The PERL code -
$i=1;
while($i<=10)
{
    print 9*$i, "\n";
    $i++;
}
Python code -
i=1
while(i<=10):
    print 9*i
    i=i+1
Apart from the differences mentioned in chapter 2, the two programs are identical.
‘if-else’
PERL code -
$i=15;
if($i>10) {
    print "$i greater than 10\n";
}
else {
    print "$i not greater than 10\n";
}
Python code -
i=15
if(i>10):
    print i, "greater than 10"
else:
    print i, "not greater than 10"
‘for’ Loops
Keyword | Action |
---|---|
for | loop |
continue | skips over the remaining lines and repeats |
break | quits the loop |
PERL code -
for($i=1; $i<11; $i++)
{
    print "5 times ", $i, " is ", 5*$i, "\n";
}
print "completed for loop\n";
Python code -
for i in range(1,11):
    print "5 times", i, "is", 5*i
print "completed for loop"
Using ‘for’ over a Dictionary
age={}
age['john']=12
age['paul']=77
for key in age:
    print key
    print age[key]+7
When ‘for’ is written on a dictionary, the loop variable takes the values of
the keys of the dictionary.
Keywords ‘break’ and ‘continue’
‘While’ loops become even more powerful when they are customized using an
internal condition (‘if’). The keywords ‘break’ and ‘continue’ come in handy in that situation.
i=0
while True:
    i=i+1
    if i==4:
        break
    print "5 times", i, "is", 5*i
In the above code, the condition for ‘while’ is always True. Therefore, the loop is expected to run forever. That does not happen, because the loop is terminated using ‘break’ when i reaches 4.
i=0
while i<10:
    i=i+1
    if i==4:
        continue
    print "5 times", i, "is", 5*i
The keyword ‘continue’ skips over the remaining lines of the ‘while’ block and starts the following run of the ‘while’ loop.
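Both keywords can appear in the same loop. The following sketch, which collects the results in a list named ‘printed’ (a name chosen only for this illustration), skips one value with ‘continue’ and stops with ‘break’. The print call is written with parentheses so that the lines also run under Python 3.

```python
# 'continue' skips one value, 'break' stops the loop completely.
printed = []
i = 0
while True:
    i = i + 1
    if i == 4:
        continue      # skip 4 and go to the next round
    if i > 6:
        break         # leave the loop once i passes 6
    printed.append(5 * i)

print(printed)   # [5, 10, 15, 25, 30]
```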
From Subroutines to Functions
PERL subroutines are called functions in Python.
Here is a PERL code to show a simple subroutine -
sub name {
    my($name)=@_;
    print "My name is $name\n";
}
name("Alice");
name('John');
The equivalent Python code is shown below -
def name(who):
    print "My name is", who
    return
name("Alice")
name("John")
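Differences -
- Parameter passing: a PERL subroutine receives its arguments through the array ‘@_’, whereas a Python function names its parameters inside the parentheses after ‘def’.
- Return at the end: the Python function ends with ‘return’, which hands control (and optionally a value) back to the caller.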
Keyword | Action |
---|---|
def | Defines new function |
return | Returns value at the end of function |
from | Gives name of an external file |
import | Brings in functions from an external file |
You have been using many Python functions, such as range(), len(), etc., to improve your
code. Internally, a function is a block of code with a given name. When you use
a function (e.g. range(4)) within your code, Python executes the corresponding
block of code and returns the result. That way your code stays small and readable.
Apart from the built-in functions Python provides you with, you can also create your
own functions. Here is an example.
def square(x):
    return x*x
print square(2)
print square(3)
The keyword ‘def’ gives a name to a function, and the variables within the
parentheses are its parameters. The block of indented code following def represents the
code of the function, and the ‘return’ statement gives its return value to be used
by the main program.
Here, you created a function named ‘square’ that takes only one parameter x. Internally, this
function computes x*x and returns the result. Whenever you use square()
in your main code, Python runs the block of code from its definition to get a result.
Code Flow with Functions
We also need to make clear how Python treats the def blocks. When Python reaches a def block, it only
stores the function. The indented code inside the block runs later, each time the function is called.
Let us illustrate the point with two programs.
First program -
i=2
j=3
def square(x):
    print "inside",i,j
    return x*x
print square(i)
print square(j)
Second program -
def square(x):
    print "inside",i,j
    return x*x
i=2
j=3
print square(i)
print square(j)
You will see that both produce the same output. You may find that odd, because i and j are not defined
before the function in the second case. How does the function know their values?
They work identically in both cases, because Python does not run the body of the def block when it first
reads it. It looks up i and j only when the function square is actually called. Hence, i and j
are already defined by the time the function square is called.
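The point can be checked by calling a function before and after the variable it needs has been defined. In the sketch below, the global name ‘offset’ is hypothetical and exists only for this illustration. The print calls are written with parentheses so that the lines also run under Python 3.

```python
def square(x):
    # 'offset' is a hypothetical global used only for this sketch
    return x * x + offset

# The def block above ran without complaint, although 'offset'
# does not exist yet. Calling the function now fails.
try:
    square(2)
    outcome = "worked"
except NameError:
    outcome = "NameError"
print(outcome)       # NameError

offset = 1           # define the variable afterwards
result = square(2)   # the same function now finds it
print(result)        # 5
```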
Default Parameter
def square(x=1):
    return x*x
print square()
print square(2)
print square(3)
The above code gives default value of 1 to the parameter x. When the function square() is
called without any number, Python uses the default value to print 1*1.
Importing Functions from a File
You learned in the previous section that Python stores the function definitions separately
while executing the code. To keep the code readable, programmers often prefer to write the function
definitions in a file separate from the main program. How does one run such multi-file code?
We will learn that here by creating two files - ‘names.py’ for functions and ‘code.py’ for the
main code. You cannot do this in the sandbox.
In file names.py, type -
def square(x=1):
    return x*x
def cube(x=1):
    return x*x*x
In code.py -
#from names import square
from names import *
print square(2)
print cube(2)
print cube(10)/square(10)
Both files need to be in the same directory. When you run code.py, it will automatically
incorporate the functions ‘square’ and ‘cube’ from names.py.
Strings
In the following PERL code, ‘$x’ is a string -
$x="My name is Alice";
print $x,"\n";
The equivalent Python code is shown below -
x='My name is Alice'
print x
Substring
The following program prints ‘name’.
$x="My name is Alice";
$y=substr($x,3,4);
print $y,"\n";
The equivalent Python code is shown below -
x="My name is Alice"
y=x[3:7]
print y
String-related Function
https://docs.python.org/2/library/string.html
Python String is a List
Python treats each string as an immutable sequence of characters, much like a list that cannot be changed. Therefore, many list-related commands and functions can be used for strings. Here is an example.
line="Welcome to the class"
print line[10]
print line[1:9]
print line[::-1]
The first print command prints a single character from the string ‘line’, the second
command prints a substring, and the third one reverses the string.
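The word ‘immutable’ deserves a short illustration. The sketch below repeats the slicing commands and then tries to change one character in place, which Python refuses. The names ‘changed’ and ‘lower_first’ are chosen only for this example, and print is written with parentheses so that the lines also run under Python 3.

```python
line = "Welcome to the class"

print(line[0])       # W
print(line[1:9])     # elcome t
print(line[::-1])    # ssalc eht ot emocleW
print(len(line))     # 20

# Unlike a list, a string cannot be changed in place.
try:
    line[0] = "w"
    changed = True
except TypeError:
    changed = False
print(changed)       # False

# Build a new string instead.
lower_first = "w" + line[1:]
print(lower_first)   # welcome to the class
```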
String-related functions
Functions - upper(), lower()
line="A to Z"
print line.upper()
print line.lower()
Function - strip()
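f = open('filename', 'r')
line=f.readline()
l=line.strip()
print line
print l
The function strip() removes the whitespace, including the trailing newline, from both ends of a string.
Function - find()
mystring="ATGCAAATGCAT"
print mystring.find("AAA")
Function - replace()
mystring="ATGCAAATGCAT"
new=mystring.replace("A","T")
print new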
The function replace() replaces a substring with a different string.
You can use it to replace or remove letters. For example, the following
code removes all commas from a line.
mystring="John, Jane, Jill, Juan, Jedi"
new = mystring.replace(",","")
print new
Function - split()
line="A big fat hen"
x=line.split()
for w in x:
    print w
Function - join()
x=["ATGC", "TGGG", "TAAA"]
y="ATGCTGGGTAAA"
z="".join(x)
if(z==y):
    print "YES"
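The string functions of this section are collected below in one sketch, together with the values they return. The input string ‘raw’ and the variable names are made up for this illustration, and print is written with parentheses so that the lines also run under Python 3.

```python
raw = "  ATGCAAATGCAT \n"

clean = raw.strip()                 # drop surrounding spaces and the newline
position = clean.find("AAA")        # index of the first match, or -1
swapped = clean.replace("A", "T")   # returns a new string
words = "A big fat hen".split()     # split on whitespace
glued = "".join(["ATGC", "TGGG", "TAAA"])

print(clean)      # ATGCAAATGCAT
print(position)   # 4
print(swapped)    # TTGCTTTTGCTT
print(words)      # ['A', 'big', 'fat', 'hen']
print(glued)      # ATGCTGGGTAAA
```

None of these functions changes the original string. Each one returns a new value, which is why the results are stored in new variables.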
Regular Expressions
Regular expressions in PERL -
$line = "ATGAAATGTGGTGGG";
if($line =~ /^ATG(\S\S\S).*(\S)G$/)
{
    print "match.group(1): $1\n";
    print "match.group(2): $2\n";
}
Regular expressions in Python -
import re
line = "ATGAAATGTGGTGGG"
match = re.match(r'^ATG(\S\S\S).*(\S)G$', line)
if match:
    print "match.group(1) : ", match.group(1)
    print "match.group(2) : ", match.group(2)
A regular expression is written in a special sublanguage that makes searches through strings easy. Python has a special library (‘re’) to facilitate the use of regular expressions.
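As a sketch of how the pattern in the example is read, the code below annotates each piece and stores the two captured groups. The names ‘codon’ and ‘before_last’ are chosen only for this illustration, and print is written with parentheses so that the lines also run under Python 3.

```python
import re

line = "ATGAAATGTGGTGGG"

# ^ATG        the sequence must start with ATG
# (\S\S\S)    group 1: the next three non-space characters
# .*          anything in between (as much as possible)
# (\S)G$      group 2: the character just before the final G
match = re.match(r'^ATG(\S\S\S).*(\S)G$', line)

if match:
    codon = match.group(1)
    before_last = match.group(2)
    print(codon)         # AAA
    print(before_last)   # G
else:
    print("no match")
```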
Search
import re
S=re.search('[a-z]a[a-z]', 'a fat cat sat')
if S:
    print "yes"
else:
    print "no"
The above code searches for three-letter patterns within the sentence, where
the first and third letters can be ‘a-z’, but the middle letter is ‘a’.
import re
text = 'I am flying from Seattle to San Francisco'
match = re.search(r'[SF]', text)
if match:
    print 'found S/F', match.group()
else:
    print 'did not find S/F'
Search and Replace
import re
seq="ATTCGATCT"
s= re.sub('A', '', seq)
diff=len(seq)-len(s)
print "count of A =", diff
The sub command replaces “Seattle” with “London” in the following example.
import re
text = 'I am flying from Seattle to San Francisco'
x = re.sub(r'Seattle', 'London', text)
print x
References
For description of regular expression sublanguage, check here -
https://regexr.com/
https://regex101.com/
https://developers.google.com/edu/python/regular-expressions?hl=en
Reading and Writing Files
Reading from a file
In PERL, files are read as -
open(IN,"filename");
$_=<IN>;
while(<IN>)
{
    print $_,"\n";
}
close(IN);
The above code first reads one line in the statement ‘$_=<IN>’. Then it repeatedly reads the remaining lines and prints them
on the screen.
The equivalent code in Python is -
f = open('filename', 'r')
line = f.readline()
for line in f:
    print line
f.close()
Reading the entire file in an array
PERL
open(IN,"filename");
@array=<IN>;
Python
f = open('filename', 'r')
lines = f.readlines()
Writing a string into a file
PERL
open(OUT,">myfile");
print OUT "My name is john\n";
close(OUT);
Python
f = open('myfile', 'w')
f.write("My name is john\n")
f.close()
File read/write symbols in PERL and Python
file request | PERL | Python |
---|---|---|
open for reading | open(F,"myfile"); | f = open('myfile', 'r') |
open for writing | open(F,">myfile"); | f = open('myfile', 'w') |
open for appending | open(F,">>myfile"); | f = open('myfile', 'a') |
open for read/write | open(F,"+<myfile"); | f = open('myfile', 'r+') |
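The modes in the table can be combined in a small round trip. The sketch below writes, appends and then reads a scratch file. It uses the ‘with’ keyword, which closes the file automatically at the end of the indented block, and it places the file in a temporary directory, an assumption made here so that no existing file is touched. The print call is written with parentheses so that the lines also run under Python 3.

```python
import os
import tempfile

# A scratch file in a temporary directory (hypothetical name 'myfile').
path = os.path.join(tempfile.mkdtemp(), "myfile")

with open(path, "w") as f:        # 'w' creates or overwrites the file
    f.write("My name is john\n")

with open(path, "a") as f:        # 'a' adds to the end
    f.write("My name is paul\n")

with open(path, "r") as f:        # 'r' reads; the file closes automatically
    lines = [line.strip() for line in f]

print(lines)   # ['My name is john', 'My name is paul']
```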
Modules and Packages
A module in PERL is a collection of functions. A package is much larger.
In Python, a module is a single file of code, and a package is a directory that groups several modules.
http://stackoverflow.com/questions/3733969/old-pl-modules-versus-new-pm-modules
PERL modules look like -
use Useful;
open(IN,"gene");
$_=<IN>;
$x=Useful::translate($_);
print $x,"\n";
Python modules look like -
import Useful
Splitting Code into Multiple Files
Keyword | Action |
---|---|
import | Brings in code from an external file |
We will separate our code into two files and see how they run.
You cannot do this in the sandbox.
In file other.py, type -
print "code in other file"
In main.py -
import other
print "code in main file"
Both files need to be in the same directory. When you run main.py, it will automatically
include the code from ‘other.py’ and run it.
The order matters. With the following main.py, the message from the main file is printed first -
print "code in main file"
import other
An import from an external file happens only once, even if the ‘import’ statement is repeated.
The main purpose of import is to keep function definitions in a separate file.
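The ‘only once’ behaviour can be observed with a short experiment. The sketch below creates a small module in a temporary directory and imports it twice. The module name ‘other_demo’ and the temporary directory are assumptions made for this illustration, and print is written with parentheses so that the lines also run under Python 3.

```python
import os
import sys
import tempfile

# Write a tiny module into a temporary directory (hypothetical name 'other_demo').
folder = tempfile.mkdtemp()
with open(os.path.join(folder, "other_demo.py"), "w") as handle:
    handle.write('print("code in other file")\n')

sys.path.insert(0, folder)   # let Python find the new module

import other_demo            # runs the module and prints its message
first = sys.modules["other_demo"]
import other_demo            # already loaded, so nothing is printed
second = sys.modules["other_demo"]

print(first is second)
```

The message from the module appears a single time, because the second ‘import’ reuses the module that is already stored in sys.modules.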
From CPAN to PyPI
Installing PERL modules from CPAN (http://www.cpan.org/) -
> cpan App::cpanminus
> cpanm Module::Name
Python Package Index
https://pypi.python.org/pypi
Installation process -
> python -m ensurepip
> pip install package
Where do they go in the unix directory structure?
Useful Python Packages and Tools
Python command line
Numpy
Jupyter Notebook
Inline -
http://search.cpan.org/dist/Inline-Python/Python.pod
Class, Iterator, etc.
Object oriented programming.
We do not want you to create classes. Just understand them so that you can
use them from the available libraries.
Code -
class complex_number:
    def __init__(self, re, im):
        self.re = re
        self.im = im
z=complex_number(2,5)
print z.re, z.im
Code with a method -
class complex_number:
    def __init__(self, re, im):
        self.re = re
        self.im = im
    def absquare(self):
        return self.re*self.re + self.im*self.im
z=complex_number(2,5)
print z.re, z.im, z.absquare()
Example - integer_list, dna_seq
Purpose of class is to make sure data conforms to standard.
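To illustrate how a class can make sure that data conforms to a standard, here is a minimal sketch of a DNA sequence class. The class name ‘DnaSeq’ and its method ‘length’ are hypothetical and are not taken from any library. The print calls are written with parentheses so that the lines also run under Python 3.

```python
class DnaSeq:
    """Hypothetical class that only accepts the letters A, C, G and T."""

    def __init__(self, letters):
        letters = letters.upper()
        for base in letters:
            if base not in "ACGT":
                raise ValueError("not a DNA letter: " + base)
        self.letters = letters

    def length(self):
        return len(self.letters)

good = DnaSeq("atgc")
print(good.letters)    # ATGC
print(good.length())   # 4

try:
    DnaSeq("ATXG")
    accepted = True
except ValueError:
    accepted = False
print(accepted)        # False
```

Because the check lives inside the class, every object of this type is guaranteed to hold a valid sequence, and the rest of the program does not need to test it again.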
Iterables, Iterators, Generators
Very powerful concepts.
Collection.
Map, lambda.
http://nvie.com/posts/iterators-vs-generators/
a=iter(range(5))
print a.next()
print a.next()
print a.next()
print a.next()
print a.next()
http://nvie.com/posts/iterators-vs-generators/
>>> from itertools import cycle
>>> colors = cycle(['red', 'white', 'blue'])
>>> next(colors)
'red'
>>> next(colors)
'white'
>>> next(colors)
'blue'
>>> next(colors)
'red'
Protocols -
http://anandology.com/python-practice-book/iterators.html
http://www.dabeaz.com/generators-uk/
https://stackoverflow.com/questions/9884132/what-exactly-are-pythons-iterator-iterable-and-iteration-protocols
https://stackoverflow.com/questions/32799980/what-exactly-does-iterable-mean-in-python
https://docs.python.org/2/tutorial/classes.html
http://www.diveintopython3.net/iterators.html
The following code is from online.
class Fib:
    '''iterator that yields numbers in the Fibonacci sequence'''
    def __init__(self, max):
        self.max = max
    def __iter__(self):
        self.a = 0
        self.b = 1
        return self
    def next(self):
        fib = self.a
        if fib > self.max:
            raise StopIteration
        self.a, self.b = self.b, self.a + self.b
        return fib
x=Fib(10)
for n in x:
    print n
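The same sequence can also be produced with a generator, which is a function that uses the keyword ‘yield’ instead of ‘return’. The sketch below is one possible version. The function name ‘fib_up_to’ is made up for this illustration, and print is written with parentheses so that the lines also run under Python 3.

```python
def fib_up_to(limit):
    """Yield Fibonacci numbers that do not exceed limit."""
    a, b = 0, 1
    while a <= limit:
        yield a
        a, b = b, a + b

numbers = list(fib_up_to(10))
print(numbers)        # [0, 1, 1, 2, 3, 5, 8]

# next() pulls one value at a time, as with any iterator.
gen = fib_up_to(10)
first = next(gen)
second = next(gen)
print(first)          # 0
print(second)         # 1
```

A generator keeps track of its own position between calls, so it needs no class and no explicit StopIteration.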
From BioPERL to Biopython
Biopython is good for -
- Quick analysis of nucleotide and protein sequences. You can easily extract
a segment from a longer sequence, get its reverse complement, or translate
nucleotides to protein.
- Parsing of all kinds of files, including simple FASTA files,
BLAST output, MUSCLE output, PDB files, and so on.
- Submitting requests to online databases and fetching data from them. For example,
you can programmatically run BLAST at NCBI, instead of manually filling in the
form.
- Statistical and bioinformatics analysis - clustering, motifs, phylogeny, etc.
Analyzing Nucleotide and Protein Sequences
Biopython has many functions to perform routine analysis
of nucleotide and protein sequences. The sequences themselves
are stored in objects of the ‘Seq’ class, which lives in the Bio.Seq module.
Extracting Subsequences
from Bio.Seq import Seq
read = Seq("GATCGTAGATAGTGCGCGCGTAGAGGAGAGATAGAGAGAGGAGATAGAGATA")
print read[10:20]
You will see “AGTGCGCGCG” being printed.
Here ‘read’ is a Seq object that can be used to store nucleotide
and protein sequences. A subsequence of a Seq object can be obtained in the same
way we get substrings. Its coordinate system starts from 0.
Reverse Complement
from Bio.Seq import Seq
read = Seq("GATCGTAGATAGTGCGCGCGTAGAGGAGAGATAGAGAGAGGAGATAGAGATA")
print read.reverse_complement()
The function ‘reverse_complement’ is included in Bio.Seq class. You
will see the output “TATCTCTATCTCCTCTCTCTATCTCTCCTCTACGCGCGCACTATCTACGATC”.
Translate into Proteins
from Bio.Seq import Seq
read = Seq("GATCGTAGATAGTGCGCGCGTAGAGGAGAGATAGAGAGAGGAGATAGAGATA")
print read.translate()
/usr/local/lib/python2.7/dist-packages/Bio/Seq.py:2095: BiopythonWarning: Partial codon, len(sequence) not a multiple of three. Explicitly trim the sequence or add trailing N before translation. This may become an error in future.
BiopythonWarning)
DRR*CARRGEIERGDRD
If you trim the last nucleotide, the warning will go away.
Compute GC Content
from Bio.SeqUtils import GC
from Bio.Seq import Seq
read = Seq("GATCGTAGATAGTGCGCGCGTAGAGGAGAGATAGAGAGAGGAGATAGAGATA")
print GC(read)
Output - 48.0769230769
You can experiment by changing the sequence to
all As or all Gs to see whether the GC function works correctly.
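To see what the reverse complement and GC computations do internally, it helps to write them once in plain Python. The sketch below is only an illustration and is not how Biopython implements them. The function names ‘reverse_complement’ and ‘gc_percent’ and the dictionary ‘pairs’ are made up for this example, and print is written with parentheses so that the lines also run under Python 3.

```python
# Plain-Python illustration; Biopython's own functions should be
# preferred in real work.
read = "GATCGTAGATAGTGCGCGCGTAGAGGAGAGATAGAGAGAGGAGATAGAGATA"

pairs = {"A": "T", "T": "A", "G": "C", "C": "G"}

def reverse_complement(seq):
    # walk the sequence backwards and swap each base for its partner
    return "".join(pairs[base] for base in seq[::-1])

def gc_percent(seq):
    gc = seq.count("G") + seq.count("C")
    return 100.0 * gc / len(seq)

rc = reverse_complement(read)
print(rc)
print(read[10:20])                 # AGTGCGCGCG
print(round(gc_percent(read), 4))  # 48.0769
```

Applying reverse_complement twice gives back the original sequence, which is a quick way to check such a function.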
Parsing Biological Records
Parsing text files of different formats is a tedious task in
bioinformatics. Biopython makes this process
very easy. It can load data stored in many different file formats.
Functions - ‘read’ and ‘parse’
Biopython maintains uniform syntax for loading data from a file. Each
of its classes for parsing text files has two functions - ‘read’ and ‘parse’.
Typically, biological data files have multiple records of the
same type. For example, a FASTA file contains several different
sequences. A BLAST output file contains search results
for several different input sequences. The function ‘read’ is meant for a file holding a single
record, whereas the function ‘parse’ creates an iterator
to go over all records. Examples for the functions are shown below.
Reading FASTA File
The function ‘read’ loads the single record of a FASTA file. SeqIO.read stops with an error if the file contains no record or more than one.
from Bio import SeqIO
record=SeqIO.read("seq.fasta", "fasta")
print record.id, record.seq
The function ‘parse’ creates an iterator to go over all records in
a FASTA file. You can either loop over the records, or use the next()
function to fetch one record at a time.
from Bio import SeqIO
records=SeqIO.parse("seq.fasta", "fasta")
for record in records:
    print record.id, record.seq
or
from Bio import SeqIO
records=SeqIO.parse("seq.fasta", "fasta")
record = next(records)
print record.id
print record.seq
print len(record)
record = next(records)
print record.id
print record.seq
print len(record)
KEGG Example
http://www.genome.jp/dbget-bin/www_bget?ec:5.1.1.1
The function ‘read’ fetches only one record.
from Bio.KEGG import Enzyme
record = Enzyme.read(open("ec_5.1.1.1.txt"))
The function ‘parse’ creates an iterator to go over all records.
from Bio.KEGG import Enzyme
records = Enzyme.parse(open("ec_5.1.1.1.txt"))
record=next(records)
…
…
record=next(records)
Commands for Different Types of Data
Similar ‘parse’ and ‘read’ functions can be used to process many different
types of data files. The following table lists only one of the
two functions for each data type; the other can generally be used in the same way.
Data Type | Biopython Code |
---|---|
FASTA | from Bio import SeqIO<br>records = SeqIO.parse("seq.fasta", "fasta") |
GenBank | from Bio import SeqIO<br>records = SeqIO.parse("dat.gbk", "genbank") |
BLAST | from Bio.Blast import NCBIXML<br>records = NCBIXML.parse(open("blast_out.xml")) |
CLUSTAL | from Bio import AlignIO<br>align = AlignIO.read("alignment.aln", "clustal") |
MUSCLE | from Bio import AlignIO<br>align = AlignIO.read("alignment.faa", "fasta") |
Phylogeny | from Bio import Phylo<br>tree = Phylo.read("tree.dnd", "newick") |
Entrez | from Bio import Entrez<br>records = Entrez.parse(open("Homo_sapiens.xml")) |
UniGene | from Bio import UniGene<br>record = UniGene.read(open("gene.data")) |
GEO | from Bio import Geo<br>records = Geo.parse(open("GSE273.txt")) |
Medline | from Bio import Medline<br>records = Medline.parse(open("pubmed_file.txt")) |
Pubmed | |
SwissProt Keywords | from Bio.SwissProt import KeyWList<br>records = KeyWList.parse(open("keywlist.txt")) |
Prosite | from Bio.ExPASy import Prosite<br>records = Prosite.parse(open("prosite.dat")) |
Prosite Doc | from Bio.ExPASy import Prodoc<br>records = Prodoc.parse(open("prosite.doc")) |
ExPASy | from Bio.ExPASy import Enzyme<br>records = Enzyme.parse(open("enzyme.dat")) |
PDB PDBParser | from Bio.PDB.PDBParser import PDBParser |
PDB MMCIF2Dict | from Bio.PDB.MMCIF2Dict import MMCIF2Dict |
PDB MMCIFParser | from Bio.PDB.MMCIFParser import MMCIFParser |
PDB MMTFParser | from Bio.PDB.mmtf import MMTFParser |
KEGG | from Bio.KEGG import Enzyme<br>records = Enzyme.parse(open("ec_5.1.1.1.txt")) |
Although ‘parse’ and ‘read’ functions are available for many different types
of data files, the records they create are not identical. ‘SeqIO.parse’ produces
SeqRecord objects, whereas ‘Medline.parse’ produces dictionary-like Medline records. We
will see more details about those records in the following section.
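The practical consequence of differing record types is that fields are reached in different ways: sequence records expose attributes such as `record.id`, while Medline records behave like dictionaries keyed by field tags. The sketch below uses stand-in objects rather than Biopython itself, and the helper `summarize` and all values are made up for illustration.

```python
from types import SimpleNamespace

# Stand-ins only: real records carry many more fields than shown here.
sequence_record = SimpleNamespace(id="seq1", seq="ACGTAC")
medline_record = {"PMID": "12345", "TI": "An example title"}


def summarize(record):
    """Return a one-line summary for either style of record."""
    if isinstance(record, dict):
        # Dictionary-style record: fields are looked up by key.
        return "{}: {}".format(record.get("PMID", "?"), record.get("TI", "?"))
    # Attribute-style record: fields are plain attributes.
    return "{}: {} residues".format(record.id, len(record.seq))


summaries = [summarize(sequence_record), summarize(medline_record)]
print(summaries)
```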
Objects to Store Different Types of Data
In the last section, we learned about the ‘read’ and ‘parse’
functions for reading files in different formats. Those calls
return different kinds of records, as
appropriate for each format. Each record exposes its data as named fields, and some
of those fields are themselves collections that you loop over. Let us see a few examples.
FASTA Record
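Returned by Bio.SeqIO with the "fasta" format (a SeqRecord object)
id –> identifier of the sequence (the first word of the FASTA header line)
seq –> the sequence itself (a Seq object)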
BLAST Record
Bio.Blast.NCBIXML object
from Bio.Blast import NCBIXML
records = NCBIXML.parse(open("blast_output.xml"))
E_VALUE_THRESH = 0.00001
# Each record holds alignments; each alignment holds HSPs.
for record in records:
    for alignment in record.alignments:
        for hsp in alignment.hsps:
            if hsp.expect < E_VALUE_THRESH:
                print('Next Alignment')
                print('seq:', alignment.title)
                print('L:', alignment.length)
                print('e value:', hsp.expect)
                print(hsp.query[0:75] + '...')
                print(hsp.match[0:75] + '...')
                print(hsp.sbjct[0:75] + '...')
More examples are shown in the Example section.
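The nested structure of a BLAST record (record, then alignments, then HSPs) can be wrapped in a small generator so that filtering by e-value is reusable. The sketch below runs without Biopython: the namedtuples `Record`, `Alignment` and `Hsp` are stand-ins that mimic only the attributes used in the BLAST example, and the function `significant_hits` and the sample values are hypothetical.

```python
from collections import namedtuple

# Stand-ins that mimic only the attributes used in the BLAST example.
Hsp = namedtuple("Hsp", ["expect"])
Alignment = namedtuple("Alignment", ["title", "hsps"])
Record = namedtuple("Record", ["alignments"])


def significant_hits(records, threshold):
    """Yield (title, e-value) for every HSP whose e-value is below threshold."""
    for record in records:
        for alignment in record.alignments:
            for hsp in alignment.hsps:
                if hsp.expect < threshold:
                    yield alignment.title, hsp.expect


records = [
    Record(alignments=[
        Alignment(title="hit A", hsps=[Hsp(expect=1e-30), Hsp(expect=0.5)]),
        Alignment(title="hit B", hsps=[Hsp(expect=2.0)]),
    ]),
    Record(alignments=[Alignment(title="hit C", hsps=[Hsp(expect=1e-8)])]),
]

hits = list(significant_hits(records, threshold=0.00001))
print(hits)
```

Since the function relies only on attribute names, the same loop should also accept real parsed BLAST records, provided they expose `alignments`, `hsps`, `title` and `expect` as in the example above.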
Accessing Data from the Internet
One attractive feature of Biopython is that it can fetch different
kinds of data from the internet. For example, it can submit a BLAST
request to NCBI and then retrieve the output for you.
BLAST
from Bio.Blast import NCBIWWW
from Bio import SeqIO
record = SeqIO.read("m_cold.fasta", format="fasta")
result_handle = NCBIWWW.qblast("blastn", "nt", record.seq)
save_file = open("my_blast.xml", "w")
save_file.write(result_handle.read())
save_file.close()
result_handle.close()
KEGG
http://www.genome.jp/dbget-bin/www_bget?ec:5.1.1.1
from Bio.KEGG import REST
from Bio.KEGG import Enzyme
req = REST.kegg_get("ec:5.1.1.1")
# Save the response so that it can be parsed later
with open("ec_5.1.1.1.txt", "w") as out:
    out.write(req.read())
Entrez
from Bio import Entrez
Entrez.email = "A.N.Other@example.com"  # Always tell NCBI who you are
handle = Entrez.efetch(db="nucleotide", id="186972394", rettype="gb", retmode="text")
print(handle.read())
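All three examples follow the same pattern: a call returns a file-like handle, and the handle's contents are read and then saved or printed. A small helper can capture that pattern and make sure both the handle and the output file are closed. The sketch below is an assumption-laden illustration: `save_handle` is a hypothetical name, and an `io.StringIO` object with placeholder text stands in for the network handle so that the code runs offline.

```python
import io
import os
import tempfile


def save_handle(handle, path):
    """Write the full contents of a text handle to path, then close the handle."""
    try:
        with open(path, "w") as out:
            out.write(handle.read())
    finally:
        handle.close()
    return path


# io.StringIO stands in for a handle returned by a network call.
fake_handle = io.StringIO("example response text\n")
target = os.path.join(tempfile.mkdtemp(), "response.txt")
save_handle(fake_handle, target)

with open(target) as saved:
    contents = saved.read()
print(contents)
```

Saving the response to disk first means the file can be re-parsed later with ‘read’ or ‘parse’ without repeating the network request.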
References
- https://coding4medicine.com/Materials/biopython/index.html
- http://biopython.org/DIST/docs/tutorial/Tutorial.html
- http://people.duke.edu/~ccc14/pcfb/biopython/BiopythonBasics.html
- https://www.coursera.org/learn/python-genomics/lecture/ahlsr/lecture-8-biopython-13-32
- https://www.gitbook.com/book/krother/biopython-tutorial/details
Closing Comments
PERL was once known as the “duct tape that holds the internet together”.
Who killed PERL? We note that scripting languages go out of fashion not because they are bad, but because they succeed in solving the problem at hand. That success allows new technologies to rise that tackle higher-level problems, and the people working on those technologies like to create their own tools and start out fresh.
https://www.linuxjournal.com/article/3394
https://www.fastcompany.com/3026446/the-fall-of-perl-the-webs-most-promising-language
PERL6 that never came and Python3 that never got adopted
PERL5 came out on October 17, 1994. That was the same year the Netscape web browser was released and
the web began reaching the general public.
https://thenewstack.io/larry-walls-quest-100-year-programming-language/