Finding Hidden Messages in DNA

About the Book

Finding Hidden Messages in DNA comprises the first two chapters of Bioinformatics Algorithms: An Active Learning Approach, one of the first textbooks to emerge from the recent Massive Open Online Course (MOOC) revolution. A light-hearted and analogy-filled companion to the authors’ acclaimed MOOC on Coursera, this book presents students with a dynamic approach to learning bioinformatics. It strikes a unique balance between practical challenges in modern biology and fundamental algorithmic ideas, thus capturing the interest of students of both biology and computer science.

The two chapters cover two central biological questions: “Where in the Genome Does Replication Begin?” and “Which DNA Patterns Play the Role of Molecular Clocks?” The textbook then steadily develops the algorithmic sophistication required to answer each question. Dozens of exercises are incorporated directly into the text as soon as they are needed; readers can test their knowledge through automated coding challenges on the Rosalind Bioinformatics Textbook Track.

The textbook's website supplements the text with additional educational materials, including video lectures and PowerPoint slides.

  • Categories

    • Computers and Programming
    • Data Science
    • Sciences
    • Textbooks
    • Software Engineering

About the Authors

Phillip Compeau

Dr. Compeau is the Assistant Department Head and an Associate Teaching Professor in the Computational Biology Department at Carnegie Mellon University. He directs the undergraduate program and serves as an assistant director for the MS in Computational Biology program, and co-founded (with Josh Kangas) the PreCollege Program in Computational Biology, the first educational program in computational biology for high school students in the United States. 

Dr. Compeau also teaches a variety of courses, and he is passionate about how online and offline educational materials can inform and enrich each other as we build an effective 21st-century classroom. He completed a Ph.D. in mathematics at UC San Diego, where he co-founded Rosalind, a platform for learning computational biology and algorithms through problem solving that has reached over 300,000 users. Phillip also helped lead the development of the first massive open online course (MOOC) in computational biology in 2013, which has since grown into the Bioinformatics Specialization on Coursera. He is the co-author (with Pavel Pevzner) of Bioinformatics Algorithms: An Active Learning Approach, a bestselling textbook in computational biology that has been adopted by 200 instructors in over 40 countries. He is also the founder of the Biological Modeling open online course project.

Pavel Pevzner

Pavel Pevzner is Professor of Computer Science and Engineering at the University of California San Diego (UCSD), where he holds the Ronald R. Taylor Chair and has taught a Bioinformatics Algorithms course for the last 12 years. In 2006, he was named a Howard Hughes Medical Institute Professor. In 2011, he founded the Algorithmic Biology Laboratory in St. Petersburg, Russia, which develops the online bioinformatics platform Rosalind. His research concerns the creation of bioinformatics algorithms for analyzing genome rearrangements, DNA sequencing, and computational proteomics. He authored Computational Molecular Biology (The MIT Press, 2000) and co-authored An Introduction to Bioinformatics Algorithms (with Neil Jones; The MIT Press, 2004) and Bioinformatics Algorithms: An Active Learning Approach (Active Learning Publishers, 2014). For his research, he has been named a Fellow of both the Association for Computing Machinery (ACM) and the International Society for Computational Biology (ISCB).

Table of Contents

Chapter 1: Where in the Genome Does DNA Replication Begin?

A Journey of a Thousand Miles

Hidden Messages in the Replication Origin

          DnaA boxes

          Hidden messages in "The Gold-Bug"

          Counting words

          The Frequent Words Problem

          Frequent words in Vibrio cholerae

Some Hidden Messages are More Surprising than Others

An Explosion of Hidden Messages

          Looking for hidden messages in multiple genomes

          The Clump Finding Problem

The Simplest Way to Replicate DNA

Asymmetry of Replication

Peculiar Statistics of the Forward and Reverse Half-Strands


          The skew diagram

Some Hidden Messages are More Elusive than Others

A Final Attempt at Finding DnaA Boxes in E. coli

Epilogue: Complications in oriC Predictions

Open Problems

          Multiple replication origins in a bacterial genome

          Finding replication origins in archaea

          Finding replication origins in yeast

          Computing probabilities of patterns in a string

Charging Stations

          The frequency array

          Converting patterns to numbers and vice-versa

          Finding frequent words by sorting

          Solving the Clump Finding Problem

          Solving the Frequent Words with Mismatches Problem

          Generating the neighborhood of a string

          Finding frequent words with mismatches by sorting


          Big-O notation

          Probabilities of patterns in a string

          The most beautiful experiment in biology

          Directionality of DNA strands

          The Towers of Hanoi

          The overlapping words paradox

Bibliography Notes

Chapter 2: Which DNA Patterns Play the Role of Molecular Clocks?

Do We Have a "Clock" Gene?

Motif Finding is More Difficult than You Think

          Identifying the evening element

          Hide and seek with motifs

          A brute force algorithm for motif finding

Scoring Motifs

          From motifs to profile matrices and consensus strings

          Towards a more adequate motif scoring function

          Entropy and the motif logo

From Motif Finding to Finding a Median String

          The Motif Finding Problem

          Reformulating the Motif Finding Problem

          The Median String Problem

          Why have we reformulated the Motif Finding Problem?

Greedy Motif Search

          Using the profile matrix to roll dice

          Analyzing greedy motif finding

Motif Finding Meets Oliver Cromwell

          What is the probability that the sun will not rise tomorrow?

          Laplace's Rule of Succession

          An improved greedy motif search

Randomized Motif Search

          Rolling dice to find motifs

          Why randomized motif search works

How Can a Randomized Algorithm Perform So Well?

Gibbs Sampling

Gibbs Sampling in Action

Epilogue: How Does Tuberculosis Hibernate to Hide from Antibiotics?

Charging Stations

          Solving the Median String Problem


          Gene expression

          DNA arrays

          Buffon's needle

          Complications in motif finding

          Relative entropy

Bibliography Notes
