Genovo: De Novo Assembly for Metagenomes

  • Jonathan Laserson
  • Vladimir Jojic
  • Daphne Koller
Conference paper

DOI: 10.1007/978-3-642-12683-3_22

Part of the Lecture Notes in Computer Science book series (LNCS, volume 6044)
Cite this paper as:
Laserson J., Jojic V., Koller D. (2010) Genovo: De Novo Assembly for Metagenomes. In: Berger B. (ed.) Research in Computational Molecular Biology. RECOMB 2010. Lecture Notes in Computer Science, vol 6044. Springer, Berlin, Heidelberg

Abstract

Next-generation sequencing technologies produce a large number of noisy reads from the DNA in a sample. Metagenomics and population sequencing aim to recover the genomic sequences of the species in the sample, which could be of high diversity. Methods geared towards single sequence reconstruction are not sensitive enough when applied in this setting. We introduce a generative probabilistic model of read generation from environmental samples and present Genovo, a novel de novo sequence assembler that discovers likely sequence reconstructions under the model. A Chinese restaurant process prior accounts for the unknown number of genomes in the sample. Inference is made by applying a series of hill-climbing steps iteratively until convergence. We compare the performance of Genovo to three other short read assembly programs across one synthetic dataset and eight metagenomic datasets created using the 454 platform, the largest of which has 311k reads. Genovo’s reconstructions cover more bases and recover more genes than the other methods, and yield a higher assembly score.
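
The Chinese restaurant process prior mentioned in the abstract can be illustrated with a minimal sketch. This is not Genovo's implementation: the function name `crp_assign`, the concentration parameter `alpha`, and the reading of each cluster as one candidate genome are assumptions made for illustration only. The sketch shows just the prior, in which each new read joins an existing cluster with probability proportional to that cluster's size, or opens a new cluster with probability proportional to `alpha`, so the number of clusters is not fixed in advance.

```python
import random


def crp_assign(num_reads, alpha, seed=0):
    """Sequentially assign reads to clusters under a Chinese restaurant process.

    Hypothetical helper for illustration; not part of Genovo.
    """
    rng = random.Random(seed)
    counts = []      # counts[k] = number of reads already in cluster k
    assignment = []  # assignment[i] = cluster index chosen for read i
    for _ in range(num_reads):
        # An existing cluster k has weight counts[k]; a new cluster has weight alpha.
        weights = counts + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(counts):
            counts.append(1)  # open a new cluster
        else:
            counts[k] += 1    # join an existing cluster
        assignment.append(k)
    return assignment, counts


if __name__ == "__main__":
    assignment, counts = crp_assign(num_reads=1000, alpha=1.0)
    print("clusters used:", len(counts))
    print("largest cluster sizes:", sorted(counts, reverse=True)[:5])
```

Larger values of `alpha` tend to produce more clusters. A full assembler would combine such a prior with a likelihood over read content and alignment, which this sketch omits.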

Copyright information

© Springer-Verlag Berlin Heidelberg 2010

Authors and Affiliations

  • Jonathan Laserson (1)
  • Vladimir Jojic (1)
  • Daphne Koller (1)
  1. Department of Computer Science, Stanford University, Stanford, USA
