High Performance Computing for Haplotyping: Models and Platforms

  • Andrea Tangherloni
  • Leonardo Rundo
  • Simone Spolaor
  • Marco S. Nobile
  • Ivan Merelli
  • Daniela Besozzi
  • Giancarlo Mauri
  • Paolo Cazzaniga
  • Pietro Liò
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11339)

Abstract

Reconstructing the haplotype pair of each chromosome is an active research topic in Bioinformatics and Genome Analysis. In Haplotype Assembly (HA), every heterozygous Single Nucleotide Polymorphism (SNP) must be assigned to exactly one of the two chromosomes. In this work, we outline the state of the art in HA approaches and present an in-depth analysis of the computational performance of GenHap, a recent method based on Genetic Algorithms. GenHap was designed to tackle the computational complexity of the HA problem by means of a divide-et-impera strategy that effectively leverages multi-core architectures. To evaluate GenHap’s performance, we generated different instances of synthetic (yet realistic) data by exploiting empirical error models of four sequencing platforms (namely, Illumina NovaSeq, Roche/454, PacBio RS II and Oxford Nanopore Technologies MinION). Our results show that the processing time generally decreases as the read length increases, because longer reads yield fewer sub-problems to be distributed across multiple cores.
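
As a rough illustration of the divide-et-impera idea described in the abstract, the following Python sketch splits a small fragment matrix into column blocks and solves each block independently on a pool of workers. It is not GenHap's implementation: the function names (`mec_cost`, `solve_block`, `split_columns`, `assemble`), the fixed-width column partitioning, the use of a Minimum Error Correction cost as the objective, and the brute-force search standing in for the Genetic Algorithm are all assumptions made for this example.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product


def mec_cost(reads, hap):
    """Count the corrections needed to make every read agree with `hap`
    or with its complement. Reads are strings over '0', '1' and '-' (gap)."""
    comp = tuple(1 - a for a in hap)
    total = 0
    for read in reads:
        d_hap = sum(1 for x, a in zip(read, hap) if x != '-' and int(x) != a)
        d_comp = sum(1 for x, a in zip(read, comp) if x != '-' and int(x) != a)
        total += min(d_hap, d_comp)
    return total


def solve_block(reads):
    """Exhaustive search over one sub-problem (hypothetical stand-in for a
    Genetic Algorithm). The first allele is fixed to 0 because a haplotype
    and its complement have the same cost."""
    n = len(reads[0])
    candidates = ((0,) + rest for rest in product((0, 1), repeat=n - 1))
    best = min(candidates, key=lambda h: mec_cost(reads, h))
    return best, mec_cost(reads, best)


def split_columns(reads, width):
    """Cut the fragment matrix into consecutive blocks of `width` SNP columns."""
    n = len(reads[0])
    return [[r[start:start + width] for r in reads]
            for start in range(0, n, width)]


def assemble(reads, width, workers=2):
    """Master step: create the sub-problems, hand them to the workers,
    and collect one (haplotype, cost) pair per block, in block order."""
    blocks = split_columns(reads, width)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(solve_block, blocks))
```

Two caveats apply to this sketch. Threads are used only to keep the example easy to run; for CPU-bound work in Python, `ProcessPoolExecutor` offers the same `map` interface and is the closer analogue of distributing sub-problems on multiple cores. In addition, the blocks are solved in isolation, so the relative phase between adjacent blocks is left undetermined; a complete method needs a merging step that this example omits.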

Keywords

  • Future-generation sequencing
  • Genome Analysis
  • Haplotype Assembly
  • High Performance Computing
  • Master-Slave paradigm

Notes

Acknowledgment

We acknowledge CINECA for providing High Performance Computing resources and support.

Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  • Andrea Tangherloni (1)
  • Leonardo Rundo (1, 5)
  • Simone Spolaor (1)
  • Marco S. Nobile (1, 6)
  • Ivan Merelli (2)
  • Daniela Besozzi (1)
  • Giancarlo Mauri (1, 6)
  • Paolo Cazzaniga (3, 6)
  • Pietro Liò (4)

  1. Department of Informatics, Systems and Communication, University of Milano-Bicocca, Milan, Italy
  2. Institute of Biomedical Technologies, Italian National Research Council, Segrate, Italy
  3. Department of Human and Social Sciences, University of Bergamo, Bergamo, Italy
  4. Computer Laboratory, University of Cambridge, Cambridge, UK
  5. Institute of Molecular Bioimaging and Physiology, Italian National Research Council, Cefalù, Italy
  6. SYSBIO.IT Centre of Systems Biology, Milano, Italy
