Skip to main content

GAVGA: A Genetic Algorithm for Viral Genome Assembly

  • Conference paper
  • First Online:
Progress in Artificial Intelligence (EPIA 2017)

Abstract

Bioinformatics has grown considerably since the development of the first sequencing machine, being today intensively used with the next generation DNA sequencers. Viral genomes represent a great challenge to bioinformatics due to its high mutation rate, forming quasispecies in the same infected host. In this paper, we implement and evaluate the performance of a genetic algorithm, named GAVGA, through the quality of a viral genome assembly. The assembly process works by first clustering the reads that share a common substring called seed and for each cluster, checks if there are overlapping reads with a given similarity percentage using a genetic algorithm. The assembled data are then compared to Newbler, SPAdes and ABySS assemblers, and also to a viral assembler such as VICUNA, which confirms the feasibility of our approach. GAVGA was implemented in python 2.7+ and can be downloaded at https://sourceforge.net/projects/gavga-assembler/.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Institutional subscriptions

References

  1. Nagarajan, N., Pop, M.: Sequence assembly demystified. Nat. Rev. Genet. 14(3), 157–167 (2013)

    Article  Google Scholar 

  2. Palmer, L.E., Dejori, M., Bolanos, R., Fasulo, D.: Improving de novo sequence assembly using machine learning and comparative genomics for overlap correction. BMC Bioinform. 11(1), 33 (2010)

    Article  Google Scholar 

  3. Goés, F., Alves, R., Corrêa, L., Chaparro, C., Thom, L.: Towards an ensemble learning strategy for metagenomic gene prediction. In: Campos, S. (ed.) BSB 2014. LNCS, vol. 8826, pp. 17–24. Springer, Cham (2014). doi:10.1007/978-3-319-12418-6_3

    Chapter  Google Scholar 

  4. Peltola, H., Söderlund, H., Tarhio, J., Ukkonen, E.: Algorithms for some string matching problems arising in molecular genetics. In: IFIP Congress, pp. 59–64 (1983)

    Google Scholar 

  5. Henn, M.R., Boutwell, C.L., Charlebois, P., Lennon, N.J., Power, K.A., Macalalad, A.R., Zody, M.C.: Whole genome deep sequencing of HIV-1 reveals the impact of early minor variants upon immune recognition during acute infection. PLoS Pathog. 8(3), e1002529 (2012)

    Article  Google Scholar 

  6. Yang, X., Charlebois, P., Gnerre, S., Coole, M.G., Lennon, N.J., Levin, J.Z., Henn, M.R.: De novo assembly of highly diverse viral populations. BMC Genom. 13(1), 475 (2012)

    Article  Google Scholar 

  7. Hunt, M., Gall, A., Ong, S.H., Brener, J., Ferns, B., Goulder, P., Otto, T.D.: IVA: accurate de novo assembly of RNA virus genomes. Bioinformatics 31(14), 2374–2376 (2015)

    Article  Google Scholar 

  8. Ruby, J.G., Bellare, P., DeRisi, J.L.: PRICE: software for the targeted assembly of components of (Meta) genomic sequence data. G3: Genes Genomes Genet. 3(5), 865–880 (2013)

    Article  Google Scholar 

  9. Yamashita, A., Sekizuka, T., Kuroda, M.: VirusTAP: Viral genome-targeted assembly pipeline. Frontiers in microbiology 7, 32 (2016)

    Google Scholar 

  10. Gusfield, D. Algorithms on strings, trees and sequences: computer science and computational biology. Cambridge university press (1997)

    Google Scholar 

  11. Szwarcfiter, J.L.: Grafos e algoritimos computacionais, 2nd edn. Rio de Janeiro (1998)

    Google Scholar 

  12. Hoffman, A.J., Wolfe, J., Garfinkel, R.S., Johnson, D.S., Papadimitriou, C.H., Gilmore, P.C., Golden, B.L.: The Traveling Salesman Problem: A Guided Tour of Combinatorial Optimization. Wiley, Hoboken (1986)

    Google Scholar 

  13. Bertossi, A.A.: The edge Hamiltonian path problem is NP-complete. Inf. Process. Lett. 13(4–5), 157–159 (1981)

    Article  MathSciNet  Google Scholar 

  14. Héam, P.C., Hugot, V., Kouchnarenko, O.: The emptiness problem for tree automata with at least one global disequality constraint is NP-hard. Inf. Process. Lett. 118, 6–9 (2017)

    Article  MathSciNet  Google Scholar 

  15. Davis, L.: Handbook of Genetic Algorithms. Van Nostrand Reinhold, New York (1991)

    Google Scholar 

  16. NCBI Illumina MiSeq paired end sequencing of Human Immunodeficiency Virus 1. http://www.ncbi.nlm.nih.gov/sra/?term=ERR846528. Accessed 15 Apr 2017

  17. Sherry, S.: NCBI SRA Toolkit technology for next generation sequence data. In: Plant and Animal Genome XX Conference (2012)

    Google Scholar 

  18. Zhang, J., Kobert, K., Flouri, T., Stamatakis, A.: PEAR: a fast and accurate Illumina Paired-End reAd mergeR. Bioinformatics 30(5), 614–620 (2014)

    Article  Google Scholar 

  19. Chu, J., Sadeghi, S., Raymond, A., Jackman, S.D., Nip, K.M., Mar, R., Birol, I.: BioBloom tools: fast, accurate and memory-efficient host species sequence screening using bloom filters. Bioinformatics 30(23), 3402–3404 (2014)

    Article  Google Scholar 

  20. Schmieder, R., Edwards, R.: Quality control and preprocessing of metagenomic datasets. Bioinformatics 27(6), 863–864 (2011)

    Article  Google Scholar 

  21. Vavak, F., Fogarty, T.C.: Comparison of steady state and generational genetic algorithms for use in nonstationary environments. In: Proceedings of IEEE International Conference on Evolutionary Computation, pp. 192–195. IEEE (1996)

    Google Scholar 

  22. Rechenberg, I.: Evolution Strategy: Optimization of Technical systems by means of biological evolution. Fromman-Holzboog, Stuttgart, 104 (1973)

    Google Scholar 

  23. Altschul, S.F., Gish, W., Miller, W., Myers, E.W., Lipman, D.J.: Basic local alignment search tool. J. Mol. Biol. 215(3), 403–410 (1990)

    Article  Google Scholar 

  24. Parsons, R.J., Forrest, S., Burks, C.: Genetic algorithms, operators, and DNA fragment assembly. Mach. Learn. 21(1), 11–33 (1995)

    Google Scholar 

  25. Fraga, J.S. Algoritmos genéticos e o problema de montagem de reads (Master’s thesis) (2014)

    Google Scholar 

  26. Kikuchi, S., Chakraborty, G.: Heuristically tuned GA to solve genome fragment assembly problem. In: IEEE Congress on Evolutionary Computation, CEC 2006. pp. 1491–1498. IEEE (2006)

    Google Scholar 

  27. Indumathy, R., Maheswari, S.U.: Nature inspired algorithms to solve DNA fragment assembly problem: a survey. Int. J. Bioinform. Res. Appl. 2(2), 45–50 (2012)

    Google Scholar 

  28. Hughes, J., Houghten, S., Mallen-Fullerton, G. M., Ashlock, D.: Recentering and restarting genetic algorithm variations for dna fragment assembly. In: 2014 IEEE Conference on Computational Intelligence in Bioinformatics and Computational Biology, pp. 1–8. IEEE (2014)

    Google Scholar 

  29. Huang, W., Li, L., Myers, J.R., Marth, G.T.: ART: a next-generation sequencing read simulator. Bioinformatics 28(4), 593–594 (2012)

    Article  Google Scholar 

  30. Bankevich, A., Nurk, S., Antipov, D., Gurevich, A.A., Dvorkin, M., Kulikov, A.S., Pyshkin, A.V.: SPAdes: a new genome assembly algorithm and its applications to single-cell sequencing. J. Comput. Biol. 19(5), 455–477 (2012)

    Article  MathSciNet  Google Scholar 

  31. Simpson, J.T., Wong, K., Jackman, S.D., Schein, J.E., Jones, S.J., Birol, I.: ABySS: a parallel assembler for short read sequence data. Genome Res. 19(6), 1117–1123 (2009)

    Article  Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Renato R. M. Oliveira .

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Oliveira, R.R.M. et al. (2017). GAVGA: A Genetic Algorithm for Viral Genome Assembly. In: Oliveira, E., Gama, J., Vale, Z., Lopes Cardoso, H. (eds) Progress in Artificial Intelligence. EPIA 2017. Lecture Notes in Computer Science(), vol 10423. Springer, Cham. https://doi.org/10.1007/978-3-319-65340-2_33

Download citation

  • DOI: https://doi.org/10.1007/978-3-319-65340-2_33

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-65339-6

  • Online ISBN: 978-3-319-65340-2

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics