High-Performance Haplotype Assembly

  • Marco Aldinucci
  • Andrea Bracciali
  • Tobias Marschall
  • Murray Patterson
  • Nadia Pisanti
  • Massimo Torquati
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8623)


Haplotype assembly is an essential step in human genome analysis. It is typically formalised as the Minimum Error Correction (MEC) problem, which is NP-hard. MEC has been approached using heuristics, integer linear programming, and fixed-parameter tractable (FPT) algorithms, including approaches whose runtime is exponential in the length of the DNA fragments obtained by the sequencing process. Technological improvements are currently increasing fragment length, which drastically raises the computational cost of such methods. We present pWhatsHap, a multi-core parallelisation of WhatsHap, a recent FPT approach that solves MEC optimally. WhatsHap moves the complexity from fragment length to fragment overlap and is hence of particular interest given current trends in sequencing technology. pWhatsHap further improves the efficiency of solving the MEC problem, as shown by experiments performed on datasets with high coverage.
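To make the MEC objective concrete, the following is a minimal illustrative sketch, not the WhatsHap or pWhatsHap algorithm. It assumes fragments are encoded as equal-length strings over `0`, `1` and `-` (no call at that position); this encoding and all function names are hypothetical choices made for the example. It enumerates every bipartition of the fragments and counts, per column and per side, the minority alleles that would have to be corrected, so its runtime is exponential in the number of fragments and it is usable only on toy inputs.

```python
from itertools import product


def column_cost(fragments, col):
    """Corrections needed to make one column of a group conflict-free.

    Gaps ('-') are ignored; the cheaper choice is to flip the minority allele.
    """
    zeros = sum(1 for f in fragments if f[col] == "0")
    ones = sum(1 for f in fragments if f[col] == "1")
    return min(zeros, ones)


def partition_cost(fragments, assignment):
    """Total corrections for a given bipartition (assignment[i] is 0 or 1)."""
    n_cols = len(fragments[0])
    total = 0
    for side in (0, 1):
        group = [f for f, a in zip(fragments, assignment) if a == side]
        total += sum(column_cost(group, c) for c in range(n_cols))
    return total


def brute_force_mec(fragments):
    """Return (minimum cost, assignment) by trying all 2^n bipartitions."""
    best = None
    for assignment in product((0, 1), repeat=len(fragments)):
        cost = partition_cost(fragments, assignment)
        if best is None or cost < best[0]:
            best = (cost, assignment)
    return best


if __name__ == "__main__":
    # Hypothetical toy input: four fragments covering three variant positions.
    reads = ["000", "001", "111", "11-"]
    print(brute_force_mec(reads))
```

The exhaustive search is chosen only because it states the objective directly; the exponential blow-up it exhibits is the reason exact methods such as the FPT approaches mentioned above restrict the exponential part to a bounded parameter.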


Gray Code, Multicore Architecture, Vertical Decomposition, Haplotype Assembly, Pipeline Parallelism
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  • Marco Aldinucci (3)
  • Andrea Bracciali (1)
  • Tobias Marschall (4, 5)
  • Murray Patterson (6)
  • Nadia Pisanti (2)
  • Massimo Torquati (2)

  1. Computer Science and Mathematics, Stirling University, Stirling, UK
  2. ERABLE team, INRIA, Computer Science Department, University of Pisa, Pisa, Italy
  3. Computer Science Department, University of Torino, Torino, Italy
  4. Center for Bioinformatics, Saarland University, Saarbrücken, Germany
  5. Computational Biology and Applied Algorithmics, Max Planck Inst. for Informatics, Saarbrücken, Germany
  6. Lab. Biométrie et Biologie Evolutive, University Lyon, Lyon, France
