High-Performance Haplotype Assembly
Haplotype assembly is an essential step in human genome analysis. It is typically formalised as the Minimum Error Correction (MEC) problem, which is NP-hard. MEC has been approached using heuristics, integer linear programming, and fixed-parameter tractable (FPT) algorithms, including approaches whose runtime is exponential in the length of the DNA fragments produced by the sequencing process. Technological improvements are currently increasing fragment length, which drastically raises the computational cost of such methods. We present pWhatsHap, a multi-core parallelisation of WhatsHap, a recent optimal FPT approach to MEC. WhatsHap moves the complexity from fragment length to fragment overlap and is therefore of particular interest given current trends in sequencing technology. Experiments on high-coverage datasets show that pWhatsHap further improves the efficiency of solving the MEC problem.
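To make the complexity claim concrete, here is a minimal Python sketch of the unweighted MEC problem on a toy fragment matrix, solved exactly in two ways. The first is brute force over all bipartitions of the fragments, which is exponential in the number of fragments. The second is a left-to-right dynamic programme over columns in the spirit of WhatsHap, whose state at each column is a bipartition of only the fragments spanning that column, so its cost is exponential in coverage (fragment overlap) rather than in fragment length. This is an illustration under stated assumptions, not the implementation of WhatsHap or pWhatsHap. The dict-based fragment representation and the function names (`column_cost`, `mec_brute_force`, `active`, `mec_column_dp`) are hypothetical. The sketch is unweighted, does not exploit the symmetry between the two haplotypes, and omits the Gray-code enumeration and parallel decomposition named in the keywords.

```python
from itertools import product

# Hypothetical toy representation assumed for this sketch: each fragment is a
# non-empty dict {column: allele} with alleles 0/1; absent columns are gaps.


def column_cost(fragments, column, assignment):
    """Flips needed at `column` so that all fragments on each side agree.

    `assignment` maps fragment index -> side (0 or 1).
    """
    counts = [[0, 0], [0, 0]]  # counts[side][allele]
    for idx, side in assignment.items():
        allele = fragments[idx].get(column)
        if allele is not None:
            counts[side][allele] += 1
    # Each side keeps its majority allele; minority entries are flipped.
    return sum(min(c) for c in counts)


def mec_brute_force(fragments, n_cols):
    """Exact MEC by trying every bipartition: 2**len(fragments) candidates."""
    best = float("inf")
    for sides in product((0, 1), repeat=len(fragments)):
        assignment = dict(enumerate(sides))
        cost = sum(column_cost(fragments, j, assignment) for j in range(n_cols))
        best = min(best, cost)
    return best


def active(fragments, column):
    """Indices of fragments spanning `column` (first <= column <= last)."""
    return [i for i, f in enumerate(fragments) if min(f) <= column <= max(f)]


def mec_column_dp(fragments, n_cols):
    """Exact MEC by a left-to-right DP over columns.

    The state at column j is a bipartition of the fragments active there, so
    the number of states per column is 2**len(active): exponential in
    coverage only, independent of how long each fragment is.
    """
    prev, prev_active = {(): 0}, []
    for j in range(n_cols):
        act = active(fragments, j)
        shared = set(prev_active) & set(act)
        # Cheapest previous state for each assignment of the shared fragments.
        best_by_shared = {}
        for bip, cost in prev.items():
            key = tuple(p for p in bip if p[0] in shared)
            best_by_shared[key] = min(cost, best_by_shared.get(key, cost))
        cur = {}
        for sides in product((0, 1), repeat=len(act)):
            bip = tuple(zip(act, sides))  # ((fragment index, side), ...)
            key = tuple(p for p in bip if p[0] in shared)
            cur[bip] = best_by_shared[key] + column_cost(fragments, j, dict(bip))
        prev, prev_active = cur, act
    return min(prev.values())


example = [
    {0: 0, 1: 0, 2: 1},
    {1: 0, 2: 1, 3: 1},
    {0: 1, 1: 1, 2: 0},
    {2: 0, 3: 0},
    {1: 1, 3: 1},  # gap at column 2, but the fragment is still active there
]
print(mec_brute_force(example, 4), mec_column_dp(example, 4))  # prints: 1 1
```

Both functions return 1 for the example: column 0 forces fragments 0 and 2 onto different sides, column 2 then groups {0, 1} against {2, 3}, and fragment 4 conflicts with whichever side it joins (at column 1 or at column 3). Keeping gap-containing fragments in the active set is what lets the DP carry their side assignment across the gap.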
Keywords: Gray Code, Multicore Architecture, Vertical Decomposition, Haplotype Assembly, Pipeline Parallelism