FinIS: Improved in silico Finishing Using an Exact Quadratic Programming Formulation

  • Song Gao
  • Denis Bertrand
  • Niranjan Nagarajan
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7534)

Abstract

With the increased democratization of sequencing, the reliance of sequence assembly programs on heuristics is at odds with the need for black-box assembly solutions that can be used reliably by non-specialists. In this work, we present a formal definition for in silico assembly validation and finishing and explore the feasibility of an exact solution for this problem using quadratic programming (FinIS). Based on results for several real and simulated datasets, we demonstrate that FinIS validates the correctness of a larger fraction of the assembly than existing ad hoc tools. Using a test for unique optimal solutions, we show that FinIS can improve on both precision and recall values for the correctness of assembled sequences when compared to competing programs. Source code and executables for FinIS are freely available at http://sourceforge.net/projects/finis/.

Keywords

Genome Assembly, Finishing, Quadratic Programming, Graph Algorithms

Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Song Gao (1)
  • Denis Bertrand (2)
  • Niranjan Nagarajan (2)
  1. NUS Graduate School for Integrative Sciences and Engineering, National University of Singapore, Singapore
  2. Computational and Systems Biology, Genome Institute of Singapore, Singapore
