Abstract
Accurate de novo assembly from short reads generated by next-generation sequencing technologies remains an open problem. Several assembly algorithms have been developed for data from different sequencing technologies, and some can make use of hybrid data, yet the resulting assemblies are still far from perfect, so computational approaches that improve draft assemblies are still needed. Here we propose a new method to correct assembly mistakes when multiple types of data are available, generated using different sequencing technologies that have different strengths and biases. We exploit the assembly of highly accurate short reads to correct the contigs obtained from less accurate long reads. We apply our method to Illumina, 454, and Ion Torrent data, and compare our results with the existing hybrid assemblers Celera and MaSuRCA.
Copyright information
© 2016 Springer International Publishing Switzerland
About this paper
Cite this paper
Kavak, P. et al. (2016). Improving Genome Assemblies Using Multi-platform Sequence Data. In: Angelini, C., Rancoita, P., Rovetta, S. (eds) Computational Intelligence Methods for Bioinformatics and Biostatistics. CIBB 2015. Lecture Notes in Computer Science(), vol 9874. Springer, Cham. https://doi.org/10.1007/978-3-319-44332-4_17
DOI: https://doi.org/10.1007/978-3-319-44332-4_17
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-44331-7
Online ISBN: 978-3-319-44332-4
eBook Packages: Computer Science, Computer Science (R0)