Improving Genome Assemblies Using Multi-platform Sequence Data

  • Pınar Kavak
  • Bekir Ergüner
  • Duran Üstek
  • Bayram Yüksel
  • Mahmut Şamil Sağıroğlu
  • Tunga Güngör
  • Can Alkan
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9874)


Accurate de novo assembly of short reads generated by next generation sequencing technologies remains an open problem. Although several assembly algorithms have been developed for data from different sequencing technologies, and some can make use of hybrid data, the resulting assemblies are still far from perfect, so computational approaches to improve draft assemblies are still needed. Here we propose a new method to correct assembly mistakes when multiple types of data are available, generated with different sequencing technologies that have different strengths and biases. We exploit the assembly of highly accurate short reads to correct the contigs obtained from less accurate long reads. We apply our method to Illumina, 454, and Ion Torrent data, and compare our results with those of the existing hybrid assemblers Celera and MaSuRCA.


de novo assembly · Assembly improvement · Next generation multi-platform sequencing



Copyright information

© Springer International Publishing Switzerland 2016

Authors and Affiliations

  • Pınar Kavak (1, 2), corresponding author
  • Bekir Ergüner (1)
  • Duran Üstek (3)
  • Bayram Yüksel (4)
  • Mahmut Şamil Sağıroğlu (1)
  • Tunga Güngör (2)
  • Can Alkan (5), corresponding author
  1. Advanced Genomics and Bioinformatics Research Group (İGBAM), BİLGEM, The Scientific and Technological Research Council of Turkey (TÜBİTAK), Kocaeli, Turkey
  2. Department of Computer Engineering, Boğaziçi University, İstanbul, Turkey
  3. Department of Medical Genetics, İstanbul Medipol University, İstanbul, Turkey
  4. TÜBİTAK - MAM - GMBE (The Scientific and Technological Research Council of Turkey, Genetic Engineering and Biotechnology Institute), Kocaeli, Turkey
  5. Department of Computer Engineering, Bilkent University, Ankara, Turkey
