Finding Hidden Messages in DNA

This book is 100% complete

Completed on 2015-12-06

About the Book

Finding Hidden Messages in DNA represents the first two chapters of Bioinformatics Algorithms: An Active Learning Approach, which is one of the first textbooks to emerge from the recent Massive Open Online Course (MOOC) revolution. A light-hearted and analogy-filled companion to the authors’ acclaimed MOOC on Coursera, this book presents students with a dynamic approach to learning bioinformatics. It strikes a unique balance between practical challenges in modern biology and fundamental algorithmic ideas, thus capturing the interest of students of both biology and computer science.

The two chapters cover two central biological questions: “Where in the Genome Does Replication Begin?” and “Which DNA Patterns Play the Role of Molecular Clocks?” The textbook then steadily develops the algorithmic sophistication required to answer each question. Dozens of exercises are incorporated directly into the text as soon as they are needed; readers can test their knowledge through automated coding challenges on the Rosalind Bioinformatics Textbook Track.

The textbook website supplements the book with additional educational materials, including video lectures and PowerPoint slides.

About the Authors

Phillip Compeau

Phillip Compeau is an Assistant Teaching Professor in the Carnegie Mellon University Department of Computational Biology, where he serves as Assistant Director of the Master's in Computational Biology program. He holds a Ph.D. from UC San Diego, a Master's degree from Cambridge University, and a Bachelor's degree with High Honors from Davidson College. He is a co-author of Bioinformatics Algorithms: An Active Learning Approach.

Phillip co-created the first massive open online course (MOOC) in bioinformatics, which has grown into the six-course Bioinformatics Specialization on Coursera. He also co-founded Rosalind, an online platform for learning bioinformatics. A retired tennis player, he dreams of one day going pro in golf.

Pavel Pevzner

Pavel Pevzner is Professor of Computer Science and Engineering at the University of California San Diego (UCSD), where he holds the Ronald R. Taylor Chair and has taught a Bioinformatics Algorithms course for the last 12 years. In 2006, he was named a Howard Hughes Medical Institute Professor. In 2011, he founded the Algorithmic Biology Laboratory in St. Petersburg, Russia, which develops the online bioinformatics platform Rosalind. His research concerns the creation of bioinformatics algorithms for analyzing genome rearrangements, DNA sequencing, and computational proteomics. He authored Computational Molecular Biology (The MIT Press, 2000) and co-authored An Introduction to Bioinformatics Algorithms (The MIT Press, 2004, with Neil Jones) and Bioinformatics Algorithms: An Active Learning Approach (Active Learning Publishers, 2014). For his research, he has been named a Fellow of both the Association for Computing Machinery (ACM) and the International Society for Computational Biology (ISCB).

Table of Contents

Chapter 1: Where in the Genome Does DNA Replication Begin?

A Journey of a Thousand Miles

Hidden Messages in the Replication Origin

          DnaA boxes

          Hidden messages in "The Gold-Bug"

          Counting words

          The Frequent Words Problem

          Frequent words in Vibrio cholerae

Some Hidden Messages are More Surprising than Others

An Explosion of Hidden Messages

          Looking for hidden messages in multiple genomes

          The Clump Finding Problem

The Simplest Way to Replicate DNA

Asymmetry of Replication

Peculiar Statistics of the Forward and Reverse Half-Strands


          The skew diagram

Some Hidden Messages are More Elusive than Others

A Final Attempt at Finding DnaA Boxes in E. coli

Epilogue: Complications in oriC Predictions

Open Problems

          Multiple replication origins in a bacterial genome

          Finding replication origins in archaea

          Finding replication origins in yeast

          Computing probabilities of patterns in a string

Charging Stations

          The frequency array

          Converting patterns to numbers and vice-versa

          Finding frequent words by sorting

          Solving the Clump Finding Problem

          Solving the Frequent Words with Mismatches Problem

          Generating the neighborhood of a string

          Finding frequent words with mismatches by sorting


          Big-O notation

          Probabilities of patterns in a string

          The most beautiful experiment in biology

          Directionality of DNA strands

          The Towers of Hanoi

          The overlapping words paradox

Bibliography Notes

Chapter 2: Which DNA Patterns Play the Role of Molecular Clocks?

Do We Have a "Clock" Gene?

Motif Finding is More Difficult than You Think

          Identifying the evening element

          Hide and seek with motifs

          A brute force algorithm for motif finding

Scoring Motifs

          From motifs to profile matrices and consensus strings

          Towards a more adequate motif scoring function

          Entropy and the motif logo

From Motif Finding to Finding a Median String

          The Motif Finding Problem

          Reformulating the Motif Finding Problem

          The Median String Problem

          Why have we reformulated the Motif Finding Problem?

Greedy Motif Search

          Using the profile matrix to roll dice

          Analyzing greedy motif finding

Motif Finding Meets Oliver Cromwell

          What is the probability that the sun will not rise tomorrow?

          Laplace's Rule of Succession

          An improved greedy motif search

Randomized Motif Search

          Rolling dice to find motifs

          Why randomized motif search works

How Can a Randomized Algorithm Perform So Well?

Gibbs Sampling

Gibbs Sampling in Action

Epilogue: How Does Tuberculosis Hibernate to Hide from Antibiotics?

Charging Stations

          Solving the Median String Problem


          Gene expression

          DNA arrays

          Buffon's needle

          Complications in motif finding

          Relative entropy

Bibliography Notes
