Using de Bruijn Graphs for Short-read Assembly

About the Book

About the Author

The Homolog_us blog covers current developments in bioinformatics, transcriptomics and computational biology. Our tutorials give biologists a simple introduction to the algorithms used in the assembly and analysis of next-generation sequencing data. Here we publish those tutorials in book format for easy access.

Table of Contents

  • 1. Introduction
    • Assembly procedure requires an understanding of sequencing technologies, algorithms and statistics
      • Sequencing technologies
      • Algorithms
      • Statistics
    • Why should biologists learn about assembly algorithms?
      • Software tools change, but the algorithms remain more stable
      • Knowledge of algorithms helps in experimental design
      • Understanding algorithms helps in purchasing adequate computing hardware
      • Improving assembly quality and interpreting results better
      • RNA-seq in non-model organisms
      • Using NGS technologies to solve novel problems
    • Description of this book
      • Chapters
      • Algebraic notations are avoided as much as possible
      • Living electronic book
      • Core concepts are reinforced through repetition
    • Regarding the exercises
      • Code - Pandora’s Toolbox for Bioinformatics
      • Data - E. coli and Electrophorus
    • Other online resources for learning
      • Introductory resources
      • Intermediate Resources
      • Advanced resources
    • Acknowledgements for this book
    • Further readings
  • 2. The Genome Assembly Problem
    • Shotgun sequencing and assembly of genomes
    • Is the problem solvable?
      • Lander-Waterman statistics for read length and coverage
      • Log(N) estimate for random sequences
      • Ukkonen’s condition
      • Optimum read length and coverage in presence of repeats
      • Reconstruction with paired-end reads
      • Summary
    • Greedy, overlap-layout-consensus and de Bruijn graph-based algorithms
      • Definitions
      • Greedy algorithm
      • Overlap-layout-consensus algorithm
      • de Bruijn graph-based algorithm
    • A short historical overview of using de Bruijn graphs in genome assembly
    • Advantages and disadvantages of using de Bruijn graphs for assembly
    • Further readings
  • 3. De Bruijn Graph of the Genome and a Simple Assembler
    • De Bruijn Graph of a known genome
      • De Bruijn graph of a small sequence
      • Double-stranded nature of the genome
      • De Bruijn graph in repetitive regions
    • Properties of de Bruijn Graph
      • The graph structure is unique for a given set of kmers
      • Irreversible
      • Impact of changing kmer size
      • Generalized definition of de Bruijn graph
    • Viewing de Bruijn graphs
      • Graphview
      • Online method - Alex Hadik
      • Ray cloud browser
      • FASTG Viewer BANDAGE
    • De Bruijn Graph of the E. coli genome
    • Genome assembly from de Bruijn graphs in the absence of noise
      • Relationship between de Bruijn graph of short reads and the underlying genome
      • Loss of read coherence during de Bruijn graph construction
      • Varying k-mer sizes to compensate for the loss of read coherence
      • Memory requirement and k-mer distribution of perfect library
    • Summary
    • Further readings
  • 4. Experimental Considerations
    • Evolution of sequencing technologies
      • Sanger sequencing
      • 454 pyrosequencing
      • Illumina dye sequencing
      • ABI SOLiD sequencing
      • Ion semiconductor sequencing
      • PacBio single molecule real time sequencing
    • Sources of artifacts
      • Random errors - substitutions and insertion-deletions
      • Homopolymer error
      • Quality drop near 3’ ends of reads
      • Coverage bias in AT-rich or GC-rich regions
      • Distance between the read pairs is variable in mate pairs
      • Mate pair inversion
      • Duplication due to PCR amplification
      • Inherent noise of PacBio reads
      • Diploid genome
    • Paired-end and mate-pair reads
      • Paired end
      • Mate pairs
      • Insert size
    • Data formats
      • FASTA and FASTQ
      • BAM and SAM Alignments
      • CIGAR
    • Detailed description of ABI SOLiD color space data
      • How to convert sequences to color space?
      • How do we compute reverse complement in color space?
      • Simple sequences
      • SNP
      • Advantage of color space
      • Disadvantage
      • Is conversion to nucleotide space prudent?
      • Pseudo-basespace
      • Error correction
      • de Bruijn Graph of SOLiD Reads
    • Summary of what we learned so far
    • Further readings
  • 5. Genome Assembly from Noisy Reads with Uneven Coverage
    • Errors lead to high RAM usage for de Bruijn graph-based assemblers
    • Impact of sequencing errors on the structure of de Bruijn graphs
      • Tips
      • Bubbles
      • Crosslinks
      • Tips and bubbles can appear due to read errors or polymorphism
    • Impact of uneven coverage
    • Full conceptual picture of short read assembly using de Bruijn graph
      • Effect of changing the coverage cutoff parameter
      • Effect of changing the k-mer size
    • Further readings
  • 6. Assembling Transcriptomes, Metagenomes and Heterozygous Genomes
    • General approach for solving complex assembly problems related to short reads
    • Using de Bruijn graphs to assemble heterozygous genomes
      • Impact of haplotype differences on de Bruijn Graph structure
      • Coverage
      • Assembly method
      • Separating phases after assembly
    • Using de Bruijn graphs for transcriptome (RNA-seq) assembly
      • De Bruijn graph structure of transcriptomic libraries
      • K-mer coverage
      • Assembly method
    • Using de Bruijn graphs for metagenome assembly
      • De Bruijn graph structure for metagenomic libraries
      • Coverage
      • Assembly method
    • Further readings
  • 7. Faster, Better and Cheaper
    • Computer science concepts for advanced work
      • Architecture of modern computer
      • Conference analogy
      • Processing Elements - CPU, GPU and FPGA
      • Algorithm and data structure
      • Disk-based algorithms and Hadoop
      • Parallel code, compare-and-swap, shared memory and queues
      • Hashing-related concepts
      • Alignment-related concepts
      • Other algorithmic concepts related to bioinformatics
    • Data structures for efficient storage and processing of de Bruijn graphs
      • An elaborate data structure for de Bruijn graph
      • Using simplified k-mers - ABySS
      • Using edge-based data structure - Conway and Bromage
      • Using Sparse de Bruijn Graph - Sparse-assembler
      • Using Bloom filter - Minia
      • Using Perfect hash - Meraculous
      • Using Minimizer - BCALM
      • Using Succinct Data Structure
    • Efficient algorithms for counting k-mers
      • Meryl
      • Suffix array - Tallymer
      • Bloom filter - BFcounter and scTurtle
      • Lock-free hash table - Jellyfish
      • Disk-based - DSK, KMC and KAnalyze
      • Minimizer-based - MSPKmerCounter, KMC2 and DSK2
      • Approximate counting - khmer
    • Efficient algorithms for read error correction
    • Improving assembly by changing k-mer size for de Bruijn graph
      • Optimal k-mer-size for dBG construction
      • Merging read pairs to increase effective read length
      • Combining assemblies from multiple k-mers - IDBA and SPAdes
      • Variable order k-mer
      • String Graph Assembler for Short Reads
    • Scaffolding
      • Hierarchical scaffolding - SOAPdenovo
      • Rectangular Graph - SPAdes
    • Repeat resolution
      • SOAPdenovo
    • SPAdes
      • Hyperconnected k-mers
    • Further readings
  • 8. In Depth Discussion of Three de Bruijn Graph-based Assemblers
    • Where to get them
    • Genome assembler - SOAPdenovo
      • History
      • How to run
      • Features
      • Details of algorithm
      • Details of code
      • SOAPdenovo-trans Transcriptome Assembler
    • Genome Assembler - SPAdes
      • History
      • How to run
      • Features of SPAdes
      • Details of algorithm
      • Details of code
    • Transcriptome assembler - Trinity
      • History
      • How to run
      • Details of algorithm
      • Details of code
    • Further readings
  • 9. References
    • 1. Pre-NGS genome assemblers
      • Base-calling and error detection
      • Assemblers
    • 2. NGS genome assemblers
      • non de Bruijn, k-mer based
      • de Bruijn graph-based assemblers
      • Applications
      • Comparison
    • 3. Exomes, transcriptomes, metagenomes and highly polymorphic genomes
      • Transcriptome assemblers
      • Metagenomes
      • Polymorphic genomes
      • Targeted assembly
    • 4. Faster, better, cheaper
      • k-mer counting
      • Storage
      • Error correction
      • Hadoop
      • Hardware accelerators
      • String graph assembler
      • Scaffolding
      • Repeats
    • 6. Reviews and forecasts
