IDBA – A Practical Iterative de Bruijn Graph De Novo Assembler

  • Yu Peng
  • Henry C. M. Leung
  • S. M. Yiu
  • Francis Y. L. Chin
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6044)

Abstract

The de Bruijn graph assembly approach breaks reads into k-mers before assembling them into contigs. The string graph approach forms contigs by connecting two reads that overlap by k or more nucleotides. Both approaches must deal with the following problems: false-positive vertices, due to erroneous reads; the gap problem, due to non-uniform coverage; and the branching problem, due to erroneous reads and repeat regions. A proper choice of k is crucial, but for any single k there is always a trade-off: a small k copes better with erroneous reads and non-uniform coverage, whereas a large k copes better with short repeat regions.
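
The trade-off in k can be seen on a toy input. The following is a minimal Python sketch, not the IDBA implementation: it assumes error-free reads, takes k-mers as vertices, and joins two k-mers by an edge when they are adjacent in some read. The genome string, the read positions, and all function names are hypothetical and chosen only for illustration.

```python
def kmers(seq, k):
    """Return the k-mers of seq in left-to-right order."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def de_bruijn(reads, k):
    """Map each k-mer to the set of k-mers that directly follow it in some read."""
    succ = {}
    for read in reads:
        ks = kmers(read, k)
        for km in ks:
            succ.setdefault(km, set())  # register vertices that have no successor
        for u, v in zip(ks, ks[1:]):
            succ[u].add(v)
    return succ


def branching_vertices(graph):
    """Vertices with more than one successor or more than one predecessor."""
    pred = {u: set() for u in graph}
    for u, vs in graph.items():
        for v in vs:
            pred[v].add(u)
    return {u for u in graph if len(graph[u]) > 1 or len(pred[u]) > 1}


def missing_kmers(genome, reads, k):
    """Genome k-mers that no read contains (a stand-in for the gap problem)."""
    seen = {km for read in reads for km in kmers(read, k)}
    return set(kmers(genome, k)) - seen


# Hypothetical toy genome containing the short repeat "ACGT" twice.
GENOME = "TTACGTCCGGACGTAA"
# Hypothetical error-free reads of length 6 with uneven overlap.
READS = [GENOME[i:i + 6] for i in (0, 3, 6, 9, 10)]

for k in (3, 4, 5):
    graph = de_bruijn(READS, k)
    print(k, sorted(branching_vertices(graph)), sorted(missing_kmers(GENOME, READS, k)))
```

With these toy inputs, k = 3 covers every genome k-mer but the repeat produces branching vertices, while k = 5 has no branching vertices but leaves some genome k-mers unobserved.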

We propose an iterative de Bruijn graph approach that iterates from small to large k, exploring the advantages of the in-between values. IDBA outperforms existing algorithms by constructing longer contigs with similar accuracy while using less memory, on both real and simulated data. Its running time is comparable to that of existing algorithms.

Availability: IDBA is available at http://www.cs.hku.hk/~alse/idba/

Keywords

De novo assembly, de Bruijn graph, string graph, mate-pair, high throughput short reads

Copyright information

© Springer-Verlag Berlin Heidelberg 2010

Authors and Affiliations

  • Yu Peng (1)
  • Henry C. M. Leung (1)
  • S. M. Yiu (1)
  • Francis Y. L. Chin (1)

  1. Department of Computer Science, The University of Hong Kong, Hong Kong
