FinIS: Improved in silico Finishing Using an Exact Quadratic Programming Formulation
With the increased democratization of sequencing, the reliance of sequence assembly programs on heuristics is at odds with the need for black-box assembly solutions that non-specialists can use reliably. In this work, we present a formal definition of in silico assembly validation and finishing, and explore the feasibility of an exact solution to this problem using quadratic programming (FinIS). Based on results for several real and simulated datasets, we demonstrate that FinIS validates the correctness of a larger fraction of the assembly than existing ad hoc tools. Using a test for unique optimal solutions, we show that FinIS can improve on both the precision and the recall of competing programs for the correctness of assembled sequences. Source code and executables for FinIS are freely available at http://sourceforge.net/projects/finis/.
Keywords: Genome Assembly, Finishing, Quadratic Programming, Graph Algorithms