Move Seamlessly between PERL and Python
Homolog_us

The Shared History of PERL and Python

The year 1969 was pivotal in the history of computing. In that year, Ken Thompson and Dennis Ritchie began developing the first version of the UNIX operating system. The eventual birth of scripting languages such as PERL and Python, roughly two decades later, can be traced back to that event.

UNIX was one of the first operating systems written almost entirely in a high-level language (C). Previously, most operating systems were developed in machine-dependent assembly languages and therefore could not be ported to different types of computers. In contrast, the C version of UNIX, released in 1973, could be ported easily to many kinds of computing hardware, and that helped make UNIX one of the most widely used operating systems. As a side effect, C also became a primary language for computing.

The concept of scripting originated in the UNIX world. Scripts were short, informal programs that, unlike C programs, did not need to be compiled. Often they were used to search rapidly for patterns in text files. UNIX developers not only included software tools like AWK to facilitate scripting, but also built another immensely helpful feature for flexible pattern search - regular expressions (regex). Ken Thompson initially added this capability to the UNIX editor ‘ed’. However, regexes turned out to be so useful that they were eventually adopted by many popular UNIX tools.

The PERL language was created by Larry Wall in 1987 to bring the capabilities of various UNIX scripting tools together in one place and to give scripts a C-like syntax. Right when the internet boom needed a language for efficient parsing of web documents (1994), PERL was ready with a mature release that included regular expressions. For almost identical reasons, PERL became popular among bioinformaticians in the 1990s and early 2000s.

While PERL was created to give a C-like structure to UNIX scripts, Python was developed by Guido van Rossum (late 1980s) to simplify C and make it ready for scripting. The two languages therefore attempted to accomplish almost the same objective from opposite ends. Python took longer than PERL to mature, but its simpler syntax has made it a favorite among bioinformaticians since the mid-2000s. This created inconveniences for some of the earlier researchers trained in PERL, because they increasingly need to use Python to access shared public repositories or to share code with others.

Here is the good news. Although, at a cursory look, code written in the two languages appears as different as English and Chinese, the languages are actually much closer than English and French. In fact, they are as comparable as British and American English. By learning how to convert a few major differences, a PERL programmer can easily translate code to Python.

This book will make your transition easy by showing what those major differences are in the next chapter.

Learning through Examples

In the previous chapter, you learned about the shared history of PERL and Python. Here you will see, through three examples, how similar the code written in the two languages is. You will learn how to move easily from PERL to Python by making a few simple changes in the code structure.

Example 1 - Hello World

The most basic “hello world” program in PERL looks like -


# This is a simple program

print "Hello from Homolog.us\n";

You can run it by typing -


> perl hello.perl

A similar code in Python looks like -


# This is a simple program

print "Hello from Homolog.us"

You run it with -


> python hello.py

Example 2 - Multiplication Table

In this example, we print the multiplication table of 10.

PERL


for($i=0; $i<10; $i++)
{
       print "10 times ", $i, " is ", 10*$i, "\n";
}

Python


for i in range(10):
       print "10 times", i, "is", 10*i

Example 3 - Prime Number

In this example, we check whether an integer is prime.

PERL


$num=59;

$true=1;

# test every divisor up to and including $num/2
for($i=2; $i <= $num/2; $i++)
{
        if($num % $i ==0)
        {
                $true=0;
        }
}

if($true==1)
{
        print $num, " is prime\n";
}
else
{
        print $num, " is not prime\n";
}

Python


num=59

true=1

# test every divisor up to and including num/2
for i in range(2,num/2+1):
        if num % i ==0:
                true=0

if true==1:
        print num, "is prime"
else:
        print num, "is not prime"

Let us go over the differences between the PERL and Python scripts one by one.

  1. You can see that the Python programs do not include semicolons at the end of the lines. By removing all semicolons from the ends of PERL statements, you can make them look almost like Python.

    This change has an important ramification, however. PERL programmers often pack multiple statements into the same line, whereas the Python convention is to write one statement per line. (Python does accept semicolon-separated statements on a single line, but the style is discouraged.) Rather than a drawback, this is seen by Python programmers as a big plus, because it keeps Python programs clean and readable.

  2. Unlike PERL, Python code does not use curly brackets. This change also has an important ramification. Python uses indentation to identify blocks inside ‘for’ or ‘while’ loops. Therefore, Python is picky about the size of the indentation, which needs to be identical for all statements within the block.
  3. Unlike PERL, Python variables do not have characters ‘$’, ‘%’ or ‘@’ before their names.
  4. Python ‘print’ statement automatically adds a newline, whereas PERL statements need “\n” to be added explicitly.
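The four differences above can be seen together in one short sketch. It wraps Example 3 in a function; the name `is_prime` is our own and does not appear in the original example. The code is written so that it runs under both Python 2 and Python 3, which is why it uses a parenthesized single-argument print and the `//` operator.

```python
def is_prime(num):
    # No '$' before variable names, no semicolons, no curly brackets.
    if num < 2:
        return False
    # The indented lines below form the body of the loop.
    for i in range(2, num // 2 + 1):
        if num % i == 0:
            return False
    return True

# print adds the newline by itself; no "\n" is needed.
print("59 is prime: %s" % is_prime(59))
print("57 is prime: %s" % is_prime(57))
```

Returning from inside the loop also stops the search at the first divisor found, which the flag-based version in Example 3 does not do.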

A Quick View of Pythonland

Now that you know the main steps to convert PERL programs into Python, let us look at Python from a new programmer’s angle.

The Python 2 core contains 31 keywords. Those words are special and cannot be used as the names of variables. Also, Python comes with a set of commonly used functions. Those functions are shown below.

Python Keywords are Special Words

Keyword Action
print prints text on screen
and logical ‘and’
or logical ‘or’
not logical ‘not’
True logical True
False logical False
is tests identity (whether two names refer to the same object)
in checks if an element is in a list or dictionary
del deletes an element from a list or dictionary
if, else, elif conditional statement
while conditional loop
for loop
continue skips current execution of loop
break quits the loop prematurely
def defines new function
return returns value at the end of function
from, import imports functions from file

Special words cannot be used as the names of variables.

Wrong code -


for = 1
in = 2
print for + in

Here is the Full List of Keywords

print

and, or, not

if, else, elif

while, for, continue, break, in

def, return
import, from, as

with, as (file)

in (list)

del (delete dictionary item, list item)

exec (runs a string as Python code)

global, with, assert, pass, yield
except, class, raise, finally
is, lambda, try

The Same Keywords Listed Alphabetically

and del from not while
as elif global or with
assert else if pass yield
break except import print
class exec in raise
continue finally is return
def for lambda try

Built-in Functions in Python

Function - range()

The function range creates a list of integers.


print range(3)


print range(1,5)


print range(1,5,2)
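As a sketch of what the three calls above produce, the following lines store the results in lists. The `list()` wrapper is our addition: it changes nothing in Python 2 and makes the values visible in Python 3, where range produces its numbers lazily.

```python
first = list(range(3))        # start defaults to 0; the stop value is excluded
second = list(range(1, 5))    # start=1, stop=5
third = list(range(1, 5, 2))  # start=1, stop=5, step=2
down = list(range(9, 2, -2))  # a negative step counts downward

print(first)   # [0, 1, 2]
print(second)  # [1, 2, 3, 4]
print(third)   # [1, 3]
print(down)    # [9, 7, 5, 3]
```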

Function - len()

This function gives the length of a list.


x=range(9,2,-2)
print len(x)

In the following code, the variable i goes from 0 to 3, because len(a)=4.


a=[0,1,2,3]
print "loop using list indices"
for i in range(len(a)):
        print i,"a[i]+8=",a[i]+8

Function - float()

The function float converts an integer to a floating point number.


x=1
y=float(x)
print x,y

Function - int()

The function int gives the integer part of a floating point number.


x=1.7
y=int(x)
print x,y

Function - str()

The function str converts a number into a string.


x=723
y=str(x)
print y[0]
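The three conversion functions are easy to confuse at the edges, so here is a small sketch of how they behave. The variable names are ours, and the code runs under both Python 2 and Python 3.

```python
x = 1.7
whole = int(x)            # drops the fractional part; it does not round
nearest = int(round(x))   # round first if the nearest integer is wanted
negative = int(-1.7)      # truncation moves toward zero

digits = str(723)         # the number becomes the text "723"
first_digit = digits[0]   # a string can be indexed, a number cannot
back = int(digits) + 1    # a numeric string converts back to a number

print("%d %d %d %s %d" % (whole, nearest, negative, first_digit, back))
```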

All Built-in Functions

Python language includes 68 built-in functions.

Name Action
help() Invoke the built-in help system.
Number-related  
abs() Return the absolute value of a number.
pow() Return a number raised to a power.
round() Return the rounded floating point value.
divmod() Return a pair of numbers consisting of quotient and remainder when using integer division.
Creates Objects  
ascii() Return a string containing a printable representation of an object, but escape the non-ASCII characters.
bytearray() Return a new array of bytes.
bytes() Return a new “bytes” object.
chr() Return the string representing a character.
complex() Create a complex number or convert a string or number to a complex number.
dict() Create a new dictionary.
enumerate() Return an enumerate object.
frozenset() Return a new frozenset object.
hash() Return the hash value of the object.
id() Return the “identity” of an object.
iter() Return an iterator object.
list() Return a list.
memoryview() Return a “memory view” object created from the given argument.
object() Return a new featureless object.
repr() Return a string containing a printable representation of an object.
str() Return a str version of object.
set() Return a new set object.
slice() Return a slice object.
tuple() Return a tuple
type() Return the type of an object.
Converts  
bin() Convert an integer number to a binary string.
bool() Convert a value to a Boolean.
float() Convert a string or a number to floating point.
format() Convert a value to a “formatted” representation.
hex() Convert an integer number to a hexadecimal string.
int() Convert a number or string to an integer.
oct() Convert an integer number to an octal string.
ord() Return the integer Unicode code point of a character.
List operations  
len() Return the length (the number of items) of an object.
min() Return the smallest item in an iterable.
max() Return the largest item in an iterable.
sorted() Return a new sorted list.
sum() Sums the items of an iterable from left to right and returns the total.
Iterables  
all() Return True if all elements of the iterable are true (or if the iterable is empty).
any() Return True if any element of the iterable is true. If the iterable is empty, return False.
callable() Return True if the object argument appears callable, False if not.
map() Return an iterator that applies function to every item of iterable, yielding the results.
filter() Construct an iterator from elements of iterable for which function returns true.
zip() Make an iterator that aggregates elements from each of the iterables.
range() Return an iterable sequence.
next() Retrieve the next item from the iterator.
reversed() Return a reverse iterator.
I/O-related  
dir() Return the list of names in the current local scope.
open() Open file and return a corresponding file object.
print() Print objects to the stream.
input() Reads a line from input, converts it to a string (stripping a trailing newline), and returns that.
Runs Code  
compile() Compile the source into a code or AST object.
eval() The argument is parsed and evaluated as a Python expression.
exec() Dynamic execution of Python code.
Other functions  
classmethod() Return a class method for the function.
getattr() Return the value of the named attribute of an object.
setattr() Assigns the value to the attribute.
delattr() Deletes the named attribute of an object.
hasattr() Return True if the name is one of the object’s attributes.
globals() Return a dictionary representing the current global symbol table.
locals() Update and return a dictionary representing the current local symbol table.
isinstance() Return True if the object argument is an instance.
issubclass() Return True if class is a subclass.
property() Return a property attribute.
staticmethod() Return a static method for function.
super() Return a proxy object that delegates method calls to a parent or sibling class.
vars() Return the __dict__ attribute for a module, class, instance, or any other object.
__import__() This function is invoked by the import statement.

References

  1. https://gist.github.com/mindful108/6412490
  2. http://www.programiz.com/python-programming/keyword-list
  3. https://learnpythonthehardway.org/book/ex37.html

Numbers and Variables

Apart from ‘$’, ‘%’ and ‘@’ symbols in front of the names, PERL and Python variables are named in the same way. Here are two equivalent PERL and Python programs -


$x=10;
$y=2*$x;
print "$y\n";
$y=$y+1;
print "$y\n";


x=10
y=2*x
print y
y=y+1
print y

Apart from the steps listed in the previous chapter, there are no differences between the two programs.

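We also note that the mathematical operators are almost identical in PERL and Python. They are shown in the following table. One caveat: in Python 2, ‘/’ between two integers discards the fractional part (7/2 gives 3), whereas PERL returns 3.5.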

Operator Action
+ Addition
- Subtraction
* Multiplication
/ Division
% Remainder
** Power
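The operators in the table can be tried out with a short sketch. The one place where care is needed is division: the `//` operator and the `float()` conversion used below are our additions, chosen so that the lines give the same result under Python 2 and Python 3.

```python
a = 7
b = 2

total = a + b            # 9
difference = a - b       # 5
product = a * b          # 14
quotient = a // b        # 3, integer division in both Python versions
exact = a / float(b)     # 3.5, what PERL's 7/2 would give
remainder = a % b        # 1
power = a ** b           # 49

print("%d %d %d %d %.1f %d %d" % (total, difference, product,
                                   quotient, exact, remainder, power))
```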

From Array to List

PERL arrays are called lists in Python.

Here is a PERL program demonstrating various aspects of arrays.


@A=(10, 20, 30, 40, 3);
print $A[2],"\n";
$N=@A;
print "$N\n";

@A=();
$N=@A;
print "$N\n";
print $A[2],"\n";

The equivalent code in Python looks like -


A=[10,20,30,40,3]
print A[2]
print len(A)

A=[]
print len(A)
print A[2]

Apart from the differences mentioned in chapter 2, here are the additional changes.

  1. Length of list in Python is obtained by using the ‘len’ function.
  2. PERL is more forgiving than Python when the code asks for out-of-range elements of arrays/lists. PERL quietly returns an undefined value, whereas Python stops with an IndexError.
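A PERL programmer who relies on the forgiving behavior has two common options in Python, sketched below. Using `None` as the placeholder value is our own choice for this illustration.

```python
A = []

# Option 1: attempt the access and catch the error.
try:
    value = A[2]          # there is no element at index 2
except IndexError:
    value = None          # fall back to a placeholder, much like PERL's undef

# Option 2: check the length first, so that no error is raised at all.
safe = A[2] if len(A) > 2 else None

print("value=%s safe=%s" % (value, safe))
```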

Python Shortcuts on Lists

Here we discuss a number of useful shortcuts related to lists in Python.

  1. The ‘+’ symbol concatenates two lists.


a=[1,3,2,0]
b=[2,3,1,7]
print a+b

  2. You can use ‘:’ to get a sublist.


a=[1,3,4,9,6,2,0]
b=a[3:7]
print b

  3. The following command gives a sublist from index 3 up to (but not including) index 7, taking every second element.


a=[1,3,4,9,6,2,0]
b=a[3:7:2]
print b

  4. The following command reverses the list.


a=[1,3,4,9,6,2,0]
b=a[::-1]
print b
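For reference, here is a sketch that collects the four shortcuts in one place, together with the values they produce. The last two slices (an omitted start and a negative start) are our additions; they follow the same ‘start:stop:step’ rule.

```python
a = [1, 3, 4, 9, 6, 2, 0]

joined = [1, 3, 2, 0] + [2, 3, 1, 7]   # concatenation
middle = a[3:7]        # indices 3, 4, 5, 6
every_other = a[3:7:2] # indices 3 and 5
backward = a[::-1]     # whole list, step -1
head = a[:3]           # an omitted start means "from the beginning"
tail = a[-2:]          # a negative index counts from the end

print(joined)       # [1, 3, 2, 0, 2, 3, 1, 7]
print(middle)       # [9, 6, 2, 0]
print(every_other)  # [9, 2]
print(backward)     # [0, 2, 6, 9, 4, 3, 1]
print(head)         # [1, 3, 4]
print(tail)         # [2, 0]
```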

Keywords ‘in’ and ‘del’

Keyword Action
in checks if an element is in a list or dictionary
del deletes an element from a list or dictionary

Keyword ‘in’

The keyword ‘in’ checks whether a value is in the list or not.


a=[3,4,9,1]

print 3 in a
print 100 in a

Keyword - del

The keyword del is used to remove a list element at a known index.


x=['a','b','c','d']
del x[2]
print x

Try -


a=[1,3,4,9,6,2,0]
print a
del a[2]
print a

From Associative Array to Dictionary

Associative arrays in PERL are called dictionaries in Python.

Here is a PERL program using an associative array.


%A = ("john", 39, "mark", 170);

print $A{"john"},"\n";
print %A,"\n";

Its equivalent Python code is shown below.


A = {'john': 39, 'mark': 170}

print A['john']
print A

Iterating over keys and values

Keyword Action
in checks if an element is in a list or dictionary
del deletes an element from a list or dictionary

Keyword ‘in’

When used on a dictionary, the keyword ‘in’ checks whether a key (not a value) is present.


age={}
age['john']=12
age['paul']=77

print 12 in age
print 'john' in age
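The two print lines above give different answers because ‘in’ looks only at the keys of a dictionary. The sketch below makes that explicit and shows one way to search the values instead, using the dictionary method `values()`.

```python
age = {}
age['john'] = 12
age['paul'] = 77

key_found = 'john' in age              # True: 'john' is a key
number_as_key = 12 in age              # False: 12 is a value, not a key
number_as_value = 12 in age.values()   # True: search the values explicitly

print("%s %s %s" % (key_found, number_as_key, number_as_value))
```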

Keyword - del

The keyword del can be used to delete a member of a dictionary.


age={}
age['john']=12
age['paul']=77

print age
del age['john']
print age

Loops and Conditions

‘while’ loop

Let us demonstrate the ‘while’ loop by writing the multiplication table for 9 in both PERL and Python.

The PERL code -


$i=1;
while($i<=10)
{
       print 9*$i, "\n";
       $i++;
}
 

Python code -


i=1
while(i<=10):
       print 9*i
       i=i+1

Apart from the differences mentioned in chapter 2, the two programs are identical.

‘if-else’

PERL code -


$i=15;
if($i>10) {
       print "$i greater than 10\n";
}
else {
       print "$i less than 10\n";
}

Python code -


i=15
if(i>10):
       print i, "greater than 10"
else:
       print i, "less than 10"

‘for’ Loops

Keyword Action
for loop
continue skips over the remaining lines and repeats
break quits the loop

PERL code -


for($i=1; $i<11; $i++)
{
    print "5 times ", $i, " is ", 5*$i, "\n";

}
print "completed for loop\n";

Python code -


for i in range(1,11):
      print "5 times", i, "is", 5*i

print "completed for loop"

Using ‘for’ over a Dictionary


age={}
age['john']=12
age['paul']=77

for key in age:
        print key
        print age[key]+7

When ‘for’ is used on a dictionary, the loop variable takes the values of
the keys of the dictionary.
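When both the key and the value are needed, the dictionary method `items()` hands them over as pairs. The sketch below is one way to write such a loop; wrapping the pairs in `sorted()` is our addition, made because a dictionary does not promise any particular order in Python 2.

```python
age = {}
age['john'] = 12
age['paul'] = 77

lines = []
# Each pass through the loop receives one (key, value) pair.
for name, years in sorted(age.items()):
    lines.append("%s will be %d in seven years" % (name, years + 7))

for line in lines:
    print(line)
```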

Keywords ‘break’ and ‘continue’

‘While’ loops become even more powerful when they are customized using an
internal condition (‘if’). The keywords ‘break’ and ‘continue’ come in handy in that situation.


i=0
while True:
    i=i+1
    if i==4:
        break
    print "5 times", i, "is", 5*i

In the above code, the condition for ‘while’ is always True. Therefore, the loop would be expected to run forever. That does not happen, because the loop is terminated using ‘break’ when i reaches 4.


i=0
while i<10:
    i=i+1
    if i==4:
        continue
    print "5 times", i, "is", 5*i

The keyword ‘continue’ skips over the remaining lines of the ‘while’ block and starts the following run of the ‘while’ loop.
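To compare the two keywords side by side, the sketch below repeats both loops but stores the products in lists instead of printing them. The list names `stopped` and `skipped` are ours.

```python
stopped = []
i = 0
while True:
    i = i + 1
    if i == 4:
        break              # leave the loop entirely
    stopped.append(5 * i)

skipped = []
i = 0
while i < 10:
    i = i + 1
    if i == 4:
        continue           # skip only this round, then carry on
    skipped.append(5 * i)

print(stopped)   # [5, 10, 15]
print(skipped)   # [5, 10, 15, 25, 30, 35, 40, 45, 50]
```

With ‘break’ nothing after 5 times 3 is produced; with ‘continue’ only 5 times 4 is missing.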

From Subroutines to Functions

PERL subroutines are called functions in Python.

Here is a PERL code to show a simple subroutine -


sub name {
        my($name)=@_;
        print "My name is $name\n";
}

name("Alice");
name('John');

The equivalent Python code is shown below -


def name(str):
     print "My name is", str
     return

name("Alice")
name("John")

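Differences -

  1. Parameter passing: the PERL subroutine unpacks its arguments from ‘@_’, whereas the Python function names its parameters in the ‘def’ line.
  2. ‘return’ at the end: the Python function ends with an explicit ‘return’, which may be omitted when no value is returned.
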
Keyword Action
def Defines new function
return Returns value at the end of function
from Gives name of an external file
import Brings in functions from an external file

You have been using many Python functions, such as range(), len(), etc., to improve your
code. Internally, a function is a block of code with a given name. When you use
a function (e.g. range(4)) within your code, Python executes the corresponding
block of code and returns the result. That way your code stays small and readable.

Apart from the built-in functions Python provides you with, you can also create your
own functions. Here is an example.


def square(x):
 return x*x

print square(2)
print square(3)

The keyword ‘def’ gives a name to a function, and the variables within the
parentheses are its parameters. The block of indented code following def represents the
code of the function, and the ‘return’ statement gives its return value to be used
by the main program.

Here, you created a function named ‘square’ that takes only one parameter x. Internally, this
function computes x*x and returns the result. Whenever you use square()
in your main code, Python runs the block of code from its definition to get a result.

Code Flow with Functions

We also need to make clear that your main code consists of all the lines outside the def
blocks. The standard linear flow of execution from top to bottom does not hold for the
functions. Let us illustrate the point with two programs.


i=2
j=3

def square(x):
 print "inside",i,j
 return x*x

print square(i)
print square(j)


def square(x):
 print "inside",i,j
 return x*x

i=2
j=3
print square(i)
print square(j)

You will see that both produce the same output. You may find that odd, because i and j are not defined
before the function in the second case. How does the function know their values?

They work identically in both cases, because Python stores the def block without running it.
Then it executes the remaining lines from top to bottom, and looks up i and j only when the
function is called. Hence, i and j are already defined by the time the function square is called.
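The same rule also explains what goes wrong when a function is called too early. In the sketch below, the names `shifted_square` and `offset` are hypothetical and serve only this illustration: the first call fails because ‘offset’ does not exist yet, and the second call succeeds because it does.

```python
def shifted_square(x):
    # 'offset' does not exist yet when Python reads this line;
    # it is only looked up when the function is actually called.
    return x * x + offset

try:
    shifted_square(2)
    early_call = "worked"
except NameError:
    early_call = "failed"

offset = 10
late_call = shifted_square(2)

print("%s %d" % (early_call, late_call))
```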

Default Parameter


def square(x=1):
 return x*x

print square()
print square(2)
print square(3)

The above code gives default value of 1 to the parameter x. When the function square() is
called without any number, Python uses the default value to print 1*1.

Importing Functions from a File

You learned in the previous section that Python separates out the function definitions
while executing the code. To keep the code readable, programmers often prefer to write the function
definitions in a file separate from the main program. How does one run such multi-file code?
We will learn that here by creating two files - ‘names.py’ for functions and ‘code.py’ for the
main code. You cannot do this in the sandbox.

In file names.py, type -


def square(x=1):
        return x*x

def cube(x=1):
        return x*x*x

In code.py -


#from names import square
from names import *

print square(2)
print cube(2)
print cube(10)/square(10)

Both files need to be in the same directory. When you run code.py, it will automatically
incorporate the functions ‘square’ and ‘cube’ from names.py.

Strings

In the following PERL code, ‘$x’ is a string -


$x="My name is Alice";
print $x,"\n";

The equivalent Python code is shown below -


x='My name is Alice'
print x

Substring

The following program prints ‘name’.


$x="My name is Alice";
$y=substr($x,3,4);

print $y,"\n";

The equivalent Python code is shown below -


x="My name is Alice"
y=x[3:7]
print y

https://docs.python.org/2/library/string.html

Python String is a List

Python treats each string as an immutable sequence of characters, much like a list that cannot be modified. Therefore, many list-related commands and functions can be used for strings. Here is an example.


line="Welcome to the class"

print line[10]
print line[1:9]
print line[::-1]

The first print command prints a single character from the list ‘line’, the second
command prints a substring, and the third one reverses the string.
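The word ‘immutable’ matters in practice. As a sketch, the code below tries to change one character in place, which Python refuses with a TypeError, and then builds a new string instead. The variable names are ours.

```python
line = "Welcome to the class"

try:
    line[0] = "w"                 # lists allow this, strings do not
    outcome = "changed in place"
except TypeError:
    outcome = "strings cannot be changed in place"

# The usual workaround is to build a new string from slices.
lowered = line[0].lower() + line[1:]

print(outcome)
print(lowered)
```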

Functions - upper(), lower()


line="A to Z"
print line.upper()
print line.lower()

Function - strip()


f = open('filename', 'r')
line=f.readline()
l=line.strip()    # removes whitespace, including the newline, from both ends

print line
print l

Function - find()


mystring="ATGCAAATGCAT"

print mystring.find("AAA")

Function - replace()


mystring="ATGCAAATGCAT"

new=mystring.replace("A","T")

print new

The function replace() replaces a substring with a different string.

You can use it to replace or remove letters. For example, the following
code removes all commas from a line.


mystring="John, Jane, Jill, Juan, Jedi"

new = mystring.replace(",","")

print new

Function - split()


line="A big fat hen"

x=line.split()
for w in x:
   print w

Function - join()


x=["ATGC", "TGGG", "TAAA"]

y="ATGCTGGGTAAA"

z= "".join(x)

if(z==y):
        print "YES"
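The functions split() and join() undo each other, which PERL programmers will recognize from PERL's own split and join. The sketch below shows the round trip; the separator choices are ours.

```python
line = "A big fat hen"

words = line.split()          # no argument: split on any whitespace
rejoined = " ".join(words)    # the separator string goes in front of .join
dashed = "-".join(words)

# An explicit separator can be given to split() as well.
fields = "John, Jane, Jill".split(", ")

print(words)      # ['A', 'big', 'fat', 'hen']
print(rejoined)   # A big fat hen
print(dashed)     # A-big-fat-hen
print(fields)     # ['John', 'Jane', 'Jill']
```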

Regular Expressions

Regular expressions in PERL -


$line = "ATGAAATGTGGTGGG";

if($line =~ /^ATG(\S\S\S).*(\S)G$/)
{
        print "match.group(1): $1\n";
        print "match.group(2): $2\n";
}

Regular expressions in Python -


import re

line = "ATGAAATGTGGTGGG"

match = re.match( r'^ATG(\S\S\S).*(\S)G$', line)

if match:
   print "match.group(1) : ", match.group(1)
   print "match.group(2) : ", match.group(2)

A regular expression is written in a special sublanguage that makes searches through strings easy. Python has a special library (‘re’) to facilitate the use of regular expressions.


import re

S=re.search('[a-z]a[a-z]', 'a fat cat sat')

if S:
  print "yes"
else:
  print "no"

The above code searches for three letter patterns within the sentence, where
the first and third letter can be ‘a-z’, but the middle letter is ‘a’.
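As a sketch of what these searches actually return, the code below keeps the captured groups of the DNA pattern and then lists every three-letter match in the sentence. The function `re.findall` is part of the same ‘re’ library; the variable names are ours.

```python
import re

line = "ATGAAATGTGGTGGG"
match = re.match(r'^ATG(\S\S\S).*(\S)G$', line)

first_codon = match.group(1)   # the three characters right after ATG
before_last = match.group(2)   # the character just before the final G

# search() stops at the first hit; findall() collects all of them.
words = re.findall(r'[a-z]a[a-z]', 'a fat cat sat')

print(first_codon)   # AAA
print(before_last)   # G
print(words)         # ['fat', 'cat', 'sat']
```

Note that the lone ‘a’ at the start of the sentence is not matched, because the pattern needs a letter on both sides of the ‘a’.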


import re
text = 'I am flying from Seattle to San Francisco'
match = re.search(r'[SF]', text)
if match:
    print 'found S/F', match.group()
else:
    print 'did not find S/F'

Search and Replace


import re

seq="ATTCGATCT"

s= re.sub('A', '', seq)
diff=len(seq)-len(s)
print "count of A =", diff

The sub command replaces “Seattle” with “London” in the following example.


import re
text = 'I am flying from Seattle to San Francisco'
x = re.sub(r'Seattle', 'London', text)
print x
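The sub command corresponds to PERL's s/// operator. As a sketch, the lines below apply it to a short DNA sequence in a few different ways; the optional `count` argument limits how many replacements are made. The variable names are ours.

```python
import re

seq = "ATTCGATCT"

without_a = re.sub('A', '', seq)             # delete every A
count_a = len(seq) - len(without_a)          # so this counts the As
rna = re.sub('T', 'U', seq)                  # replace every T
masked = re.sub(r'[CG]', 'N', seq)           # a pattern, not just a letter
first_only = re.sub('T', 'U', seq, count=1)  # replace the first T only

print(without_a)   # TTCGTCT
print(count_a)     # 2
print(rna)         # AUUCGAUCU
print(masked)      # ATTNNATNT
print(first_only)  # AUTCGATCT
```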

References

For description of regular expression sublanguage, check here -

https://regexr.com/

https://regex101.com/

https://developers.google.com/edu/python/regular-expressions?hl=en

Reading and Writing Files

Reading from a file

In PERL, files are read as -


open(IN,"filename");
$_=<IN>;
while(<IN>)
{
 print $_,"\n";
}
close(IN);

The above code first reads one line in the statement ‘$_=<IN>’. Then it repeatedly reads the remaining lines and prints
them on the screen.

The equivalent command in Python is -


f = open('filename', 'r')
line = f.readline()

for line in f:
    print line
f.close()

print line

Reading the entire file in an array


open(IN,"filename");
@array=<IN>;


f = open('filename', 'r')
lines = f.readlines()

Writing a string into a file

PERL


open(OUT,">myfile");
print OUT "My name is john\n";
close(OUT);

Python


f = open('myfile', 'w')

f.write("My name is john\n")
f.close()

File read/write symbols in PERL and Python

file request PERL Python
open for reading open(F,"myfile"); f = open('myfile', 'r')
open for writing open(F,">myfile"); f = open('myfile', 'w')
open for appending open(F,">>myfile"); f = open('myfile', 'a')
open for read/write open(F,"+<myfile"); f = open('myfile', 'r+')
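The modes in the table can be tried out with the following sketch, which writes a file, appends to it, and reads it back. It uses the ‘with’ keyword, which closes the file automatically at the end of the indented block. The scratch directory from `tempfile` and the text of the second line are our own choices for the illustration.

```python
import os
import tempfile

# A scratch file in a temporary directory, so that nothing is overwritten.
path = os.path.join(tempfile.mkdtemp(), "myfile")

with open(path, "w") as f:      # 'w' creates the file or overwrites it
    f.write("My name is john\n")

with open(path, "a") as f:      # 'a' adds to the end of the file
    f.write("I live in Seattle\n")

with open(path, "r") as f:      # 'r' reads; strip() removes the newline
    lines = [line.strip() for line in f]

print(lines)
```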

Modules and Packages

A module in PERL is a collection of functions. A package is much larger.

In Python, a package is a directory that groups several modules (‘.py’ files).

http://stackoverflow.com/questions/3733969/old-pl-modules-versus-new-pm-modules

PERL modules look like -


use Useful;
open(IN,"gene");
$_=<IN>;

$x=Useful::translate($_);

print $x,"\n";

Python modules look like -


import Useful

Splitting Code into Multiple Files

Keyword Action
import Brings in code from an external file

We will separate our code into two files and see how they run.
You cannot do this in the sandbox.

In file other.py, type -


print "code in other file"

In main.py -


import other

print "code in main file"

Both files need to be in the same directory. When you run main.py, it will automatically
include the code from ‘other.py’ and run it.


print "code in main file"

import other

Python imports a file only once; a repeated ‘import other’ in the same run does not execute the file again.

The main purpose of import is to keep function definitions in a separate file.

From CPAN to PyPI

Installing perl modules from CPAN (http://www.cpan.org/) -


> cpan App::cpanminus

cpanm Module::Name

Python Package Index

https://pypi.python.org/pypi

One step installation process -


python -m ensurepip

pip install package

Where do they go in the unix directory structure?

Useful Python Packages and Tools

Python command line

Numpy

Jupyter Notebook

Inline -

http://search.cpan.org/dist/Inline-Python/Python.pod

Class, Iterator, etc.

Object oriented programming.

We do not want you to create classes. Just understand them so that you can
use them from the available libraries.

Code -


class complex_number:

        def __init__(self, re, im):
                self.re = re
                self.im = im

z=complex_number(2,5)

print z.re, z.im


class complex_number:

        def __init__(self, re, im):
                self.re = re
                self.im = im
        def absquare(self):
                return self.re*self.re + self.im*self.im

z=complex_number(2,5)

print z.re, z.im, z.absquare()

Example - integer_list, dna_seq

Purpose of class is to make sure data conforms to standard.
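As a sketch of that purpose, the hypothetical class below accepts only the four DNA letters and refuses anything else. The class name `DnaSeq`, its attribute and its method are invented for this illustration and are not taken from any library.

```python
class DnaSeq(object):
    """Hypothetical example class: holds a DNA sequence and rejects other text."""

    def __init__(self, letters):
        letters = letters.upper()
        # Check every letter before the data is stored in the object.
        for base in letters:
            if base not in "ACGT":
                raise ValueError("not a DNA letter: %s" % base)
        self.letters = letters

    def length(self):
        return len(self.letters)

good = DnaSeq("atgc")

try:
    DnaSeq("ATGX")
    rejected = False
except ValueError:
    rejected = True

print("%s %d %s" % (good.letters, good.length(), rejected))
```

Any code that later receives a DnaSeq object can rely on its letters being valid, because an invalid object can never be created.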

Iterables, Iterators, Generators

Very powerful concepts.

Collection.

Map, lambda.

http://nvie.com/posts/iterators-vs-generators/

All codes here.


a=iter(range(5))
print a.next()
print a.next()
print a.next()
print a.next()
print a.next()



>>> from itertools import cycle
>>> colors = cycle(['red', 'white', 'blue'])
>>> next(colors)
'red'
>>> next(colors)
'white'
>>> next(colors)
'blue'
>>> next(colors)
'red'

Protocols -

http://anandology.com/python-practice-book/iterators.html

http://www.dabeaz.com/generators-uk/

https://stackoverflow.com/questions/9884132/what-exactly-are-pythons-iterator-iterable-and-iteration-protocols

https://stackoverflow.com/questions/32799980/what-exactly-does-iterable-mean-in-python


https://docs.python.org/2/tutorial/classes.html

http://www.diveintopython3.net/iterators.html


The following code is from online.


class Fib:
    '''iterator that yields numbers in the Fibonacci sequence'''

    def __init__(self, max):
        self.max = max

    def __iter__(self):
        self.a = 0
        self.b = 1
        return self

    def next(self):    # Python 2 spelling; Python 3 calls this method __next__
        fib = self.a
        if fib > self.max:
            raise StopIteration
        self.a, self.b = self.b, self.a + self.b
        return fib

x=Fib(10)
for n in x:
    print n
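A generator is usually the shorter way to obtain the same behavior. The sketch below is our own rewrite of the Fibonacci idea as a generator function, not code taken from the sources linked above: the ‘yield’ keyword hands back one value at a time and remembers where it stopped.

```python
def fib(limit):
    """Generate Fibonacci numbers that do not exceed limit."""
    a, b = 0, 1
    while a <= limit:
        yield a                # pause here and hand back one value
        a, b = b, a + b

numbers = list(fib(10))        # run the generator to the end

gen = fib(10)
first = next(gen)              # or fetch the values one at a time
second = next(gen)

print(numbers)   # [0, 1, 1, 2, 3, 5, 8]
```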

From BioPERL to Biopython

Biopython is good for -

  1. Quick analysis of nucleotide and protein sequence. You can easily extract
    a segment from a longer sequence, get reverse complement, do nucleotide
    to protein translation.
  2. Parsing of all kinds of files, including simple FASTA files,
    BLAST output, MUSCLE output, PDB files, and so on.
  3. Submitting requests to online databases and fetching data from them. For example,
    you can programmatically run BLAST at NCBI, instead of manually filling out the
    form.
  4. Statistical and bioinformatics analysis - clustering, motifs, phylogeny, etc.

Analyzing Nucleotide and Protein Sequences

Biopython has many functions to perform routine analysis
of nucleotide and protein sequences. The sequences themselves
are stored in objects of the Seq class, defined in the Bio.Seq module.

Extracting Subsequences


from Bio.Seq import Seq
read = Seq("GATCGTAGATAGTGCGCGCGTAGAGGAGAGATAGAGAGAGGAGATAGAGATA")

print read[10:20]

You will see “AGTGCGCGCG” being printed.

Here read is a Seq object that can be used to store nucleotide
and protein sequences. A subsequence of a Seq object can be obtained in the same
way we get substrings. Its coordinate system starts from 0.

Reverse Complement


from Bio.Seq import Seq
read = Seq("GATCGTAGATAGTGCGCGCGTAGAGGAGAGATAGAGAGAGGAGATAGAGATA")

print read.reverse_complement()

The function ‘reverse_complement’ is included in Bio.Seq class. You
will see the output “TATCTCTATCTCCTCTCTCTATCTCTCCTCTACGCGCGCACTATCTACGATC”.
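To see what the function computes, here is a plain-Python sketch of the same operation. It is our own illustration and not Biopython's implementation; it handles only the four upper-case letters A, C, G and T.

```python
def reverse_complement(dna):
    """Reverse a DNA string and swap each base for its partner."""
    pairs = {'A': 'T', 'T': 'A', 'G': 'C', 'C': 'G'}
    # Walk through the sequence backward and look up each partner base.
    return "".join(pairs[base] for base in reversed(dna))

read = "GATCGTAGATAGTGCGCGCGTAGAGGAGAGATAGAGAGAGGAGATAGAGATA"
rc = reverse_complement(read)

print(rc)
```

Applying the function twice gives back the original sequence, which is a quick way to check any reverse-complement routine.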

Translate into Proteins


from Bio.Seq import Seq
read = Seq("GATCGTAGATAGTGCGCGCGTAGAGGAGAGATAGAGAGAGGAGATAGAGATA")

print read.translate()


/usr/local/lib/python2.7/dist-packages/Bio/Seq.py:2095: BiopythonWarning: Partial codon, len(sequence) not a multiple of three. Explicitly trim the sequence or add trailing N before translation. This may become an error in future.
  BiopythonWarning)
DRR*CARRGEIERGDRD

If you trim the last nucleotide, the warning will go away.

Compute GC Content


from Bio.SeqUtils import GC
from Bio.Seq import Seq
read = Seq("GATCGTAGATAGTGCGCGCGTAGAGGAGAGATAGAGAGAGGAGATAGAGATA")

print GC(read)


Output - 48.0769230769

You can experiment by changing the sequence to
all As or all Gs to see whether the GC function works correctly.
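The same experiment can be run without Biopython. The function below is a plain-Python sketch of a GC percentage calculation; the name `gc_percent` is ours, and the code is not Biopython's implementation.

```python
def gc_percent(dna):
    """Return the percentage of G and C letters in a DNA string."""
    dna = dna.upper()
    if not dna:
        return 0.0             # avoid dividing by zero for an empty string
    gc = dna.count('G') + dna.count('C')
    return 100.0 * gc / len(dna)

read = "GATCGTAGATAGTGCGCGCGTAGAGGAGAGATAGAGAGAGGAGATAGAGATA"

print(gc_percent(read))
print(gc_percent("AAAA"))
print(gc_percent("GGGG"))
```

The read contains 25 G or C letters out of 52, which agrees with the value shown above.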

Parsing Biological Records

Parsing text files of different formats is a tedious task in
bioinformatics. Biopython makes this process
very easy. It can load data stored in many different file formats.

Functions - ‘read’ and ‘parse’

Biopython maintains uniform syntax for loading data from a file. Each
of its modules that parse text files has two functions - ‘read’ and ‘parse’.

Typically, biological data files have multiple records of the
same type. For example, a FASTA file contains several different
sequences. A BLAST output file contains search results
for several different input sequences. The function ‘read’ is meant for a file that holds a single
record (SeqIO.read raises an error when the file holds more than one), whereas the function ‘parse’ creates an iterator
to go over all records. Examples for the functions are shown below.

Reading FASTA File

The function ‘read’ fetches the single record stored in a FASTA file.


from Bio import SeqIO
record=SeqIO.read("seq.fasta", "fasta")
print record.id, record.seq

The function ‘parse’ creates an iterator to go over all records in
a FASTA file. You can either loop over the records, or use next()
function to fetch one record at a time.


from Bio import SeqIO
records=SeqIO.parse("seq.fasta", "fasta")

for record in records:
        print record.id, record.seq

or


from Bio import SeqIO
records=SeqIO.parse("seq.fasta", "fasta")

record = next(records)
print record.id
print record.seq
print len(record)

record = next(records)
print record.id
print record.seq
print len(record)
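To make the idea of ‘an iterator over records’ concrete, here is a plain-Python sketch of a FASTA reader written as a generator. It is our own simplified illustration and not how Biopython is implemented: the name `parse_fasta` is hypothetical, and each record is returned as a simple (identifier, sequence) pair rather than a Biopython record object.

```python
def parse_fasta(lines):
    """Yield (identifier, sequence) pairs from an iterable of FASTA lines."""
    name = None
    chunks = []
    for line in lines:
        line = line.strip()
        if not line:
            continue                       # skip blank lines
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)    # finish the previous record
            header = line[1:].split()
            name = header[0] if header else ""
            chunks = []
        else:
            chunks.append(line)            # sequence may span several lines
    if name is not None:
        yield name, "".join(chunks)        # do not forget the last record

example = [">seq1 first record", "ATGC", "GGTA", ">seq2", "TTTT"]

records = parse_fasta(example)
first = next(records)          # fetch one record at a time ...
rest = list(records)           # ... or loop over whatever remains

print(first)
print(rest)
```

An open file can be passed in place of the list, because a file also delivers its lines one at a time.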

KEGG Example

http://www.genome.jp/dbget-bin/www_bget?ec:5.1.1.1

The function ‘read’ fetches only one record.


from Bio.KEGG import Enzyme
record = Enzyme.read(open("ec_5.1.1.1.txt"))

The function ‘parse’ creates an iterator to go over all records.


from Bio.KEGG import Enzyme
records = Enzyme.parse(open(“ec_5.1.1.1.txt”))

record=next(records)

record=next(records)

Commands for Different Types of Data

Similar ‘parse’ and ‘read’ functions can be used to process many different
types of data files. In the following table, we list only one of the
two functions. The other one is also valid.

| Data Type | Biopython Commands |
|---|---|
| FASTA | from Bio import SeqIO<br>records = SeqIO.parse("seq.fasta", "fasta") |
| Genbank | from Bio import SeqIO<br>records = SeqIO.parse("dat.gbk", "genbank") |
| BLAST | from Bio.Blast import NCBIXML<br>records = NCBIXML.parse(open("blast_out.xml")) |
| CLUSTAL | from Bio import AlignIO<br>align = AlignIO.read("alignment.aln", "clustal") |
| MUSCLE | from Bio import AlignIO<br>align = AlignIO.read("alignment.faa", "fasta") |
| Phylogeny | from Bio import Phylo<br>tree = Phylo.read("tree.dnd", "newick") |
| Entrez | from Bio import Entrez<br>records = Entrez.parse(open("Homo_sapiens.xml")) |
| UniGene | from Bio import UniGene<br>record = UniGene.read(open("gene.data")) |
| GEO | from Bio import Geo<br>records = Geo.parse(open("GSE273.txt")) |
| Medline | from Bio import Medline<br>records = Medline.parse(open("pubmed_file.txt")) |
| Pubmed | |
| SwissProt Keywords | from Bio.SwissProt import KeyWList<br>records = KeyWList.parse(open("keywlist.txt")) |
| Prosite | from Bio.ExPASy import Prosite<br>records = Prosite.parse(open("prosite.dat")) |
| Prosite Doc | from Bio.ExPASy import Prodoc<br>records = Prodoc.parse(open("prosite.doc")) |
| ExPASy | from Bio.ExPASy import Enzyme<br>records = Enzyme.parse(open("enzyme.dat")) |
| PDB PDBParser | from Bio.PDB.PDBParser import PDBParser |
| PDB MMCIF2Dict | from Bio.PDB.MMCIF2Dict import MMCIF2Dict |
| PDB MMCIFParser | from Bio.PDB.MMCIFParser import MMCIFParser |
| PDB MMTFParser | from Bio.PDB.mmtf import MMTFParser |
| KEGG | from Bio.KEGG import Enzyme<br>records = Enzyme.parse(open("ec_5.1.1.1.txt")) |

Although the ‘parse’ and ‘read’ functions carry the same names across modules,
the records they create differ by data type. ‘SeqIO.parse’ yields SeqRecord objects, whereas ‘Medline.parse’ yields Medline records, which behave like dictionaries. We
will see more details about those records in the following section.

Objects to Store Different Types of Data

In the last section, we learned about the ‘read’ and ‘parse’
functions to read files in different formats. Those calls
create different kinds of records, as
appropriate for the data. Let us see a few examples. Some
fields of a record are themselves collections that you can loop over.

FASTA Record


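Bio.SeqIO with fasta (each record is a SeqRecord object)

id --> identifier of the sequence (the first word of the header line)
seq --> the sequence itself, stored as a Seq object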

BLAST Record

Bio.Blast.NCBIXML object


from Bio.Blast import NCBIXML
records = NCBIXML.parse(open("blast_output.xml"))

E_VALUE_THRESH = 0.00001
for record in records:
    for alignment in record.alignments:
        for hsp in alignment.hsps:
            if hsp.expect < E_VALUE_THRESH:
                print('Next Alignment')
                print('seq:', alignment.title)
                print('L:', alignment.length)
                print('e value:', hsp.expect)
                print(hsp.query[0:75] + '...')
                print(hsp.match[0:75] + '...')
                print(hsp.sbjct[0:75] + '...')

More examples are shown in the Example section.

Accessing Data from the Internet

One attractive feature of Biopython is that it can fetch different
kinds of data from the internet. For example, in the case of BLAST, it
can submit a BLAST request to the NCBI and then retrieve the output for you.

BLAST


from Bio.Blast import NCBIWWW
from Bio import SeqIO

# Load the query sequence and submit it to the NCBI BLAST server
record = SeqIO.read("m_cold.fasta", format="fasta")
result_handle = NCBIWWW.qblast("blastn", "nt", record.seq)

# Save the XML output to a local file
save_file = open("my_blast.xml", "w")
save_file.write(result_handle.read())
save_file.close()
result_handle.close()

KEGG

http://www.genome.jp/dbget-bin/www_bget?ec:5.1.1.1


from Bio.KEGG import REST

# Download the enzyme record and save it to a local file
req = REST.kegg_get("ec:5.1.1.1")
with open("ec_5.1.1.1.txt", "w") as out_file:
    out_file.write(req.read())

Entrez


from Bio import Entrez
Entrez.email = "A.N.Other@example.com"  # Always tell NCBI who you are
handle = Entrez.efetch(db="nucleotide", id="186972394", rettype="gb", retmode="text")
print(handle.read())

References

  1. https://coding4medicine.com/Materials/biopython/index.html
  2. http://biopython.org/DIST/docs/tutorial/Tutorial.html
  3. http://people.duke.edu/~ccc14/pcfb/biopython/BiopythonBasics.html
  4. https://www.coursera.org/learn/python-genomics/lecture/ahlsr/lecture-8-biopython-13-32
  5. https://www.gitbook.com/book/krother/biopython-tutorial/details

Closing Comments

PERL was known as the “duct tape that held the internet together”.

Who killed PERL? We note that scripting languages go out of fashion not because they are bad, but because they succeed in solving the problem at hand. That success allows the rise of technologies that solve higher-level problems, and people working on a new technology like to create their own tools and start out fresh.

https://www.linuxjournal.com/article/3394
https://www.fastcompany.com/3026446/the-fall-of-perl-the-webs-most-promising-language

PERL6 that never came and Python3 that never got adopted

PERL5 came out on October 17, 1994. That was the same year the Netscape web browser was released and
the internet boom began.

https://thenewstack.io/larry-walls-quest-100-year-programming-language/