HapMonster: A Statistically Unified Approach for Variant Calling and Haplotyping Based on Phase-Informative Reads

  • Kaname Kojima
  • Naoki Nariai
  • Takahiro Mimori
  • Yumi Yamaguchi-Kabata
  • Yukuto Sato
  • Yosuke Kawai
  • Masao Nagasaki
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8542)

Abstract

Haplotype phasing is essential for identifying disease-causing variants with phase-dependent interactions as well as for the coalescent-based inference of demographic history. One approach to estimating haplotypes is to use phase-informative reads, which span multiple heterozygous variant positions. Although the quality of the called variants is crucial to haplotype phasing, accurate variant calling remains challenging because of errors in sequencing and read mapping. Since some of these errors can be corrected by considering haplotype phasing, simultaneous estimation of variants and haplotypes is important. We therefore propose HapMonster, a statistically unified approach to variant calling and haplotype phasing in which haplotype phasing information is used to improve the accuracy of variant calling, and the improved variant calls are in turn used for more accurate haplotype phasing. Comparison with existing methods on simulated and real sequencing data confirms the effectiveness of HapMonster in both variant calling and haplotype phasing.
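
To make the notion of a phase-informative read concrete, the following minimal Python sketch flags reads whose mapped interval covers at least two heterozygous variant positions. It only illustrates the definition given in the abstract and is not HapMonster's algorithm; the function name `is_phase_informative`, the inclusive coordinate convention, and the example positions are all assumptions made for illustration.

```python
from typing import List, Tuple


def is_phase_informative(read_start: int, read_end: int,
                         het_positions: List[int]) -> bool:
    """Return True if the inclusive interval [read_start, read_end]
    covers at least two heterozygous variant positions."""
    covered = [p for p in het_positions if read_start <= p <= read_end]
    return len(covered) >= 2


# Hypothetical heterozygous sites and mapped read intervals (start, end).
het_sites: List[int] = [100, 180, 400]
reads: List[Tuple[int, int]] = [(90, 190), (150, 250), (380, 480)]

# Keep only the reads that link two or more heterozygous sites.
informative = [r for r in reads if is_phase_informative(r[0], r[1], het_sites)]
print(informative)  # [(90, 190)]
```

Only the first read links two heterozygous sites (100 and 180), so it is the only one that carries direct evidence about which alleles lie on the same haplotype.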

Keywords

Next generation sequencing, variant call, haplotype phasing

Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  • Kaname Kojima (1)
  • Naoki Nariai (1)
  • Takahiro Mimori (1)
  • Yumi Yamaguchi-Kabata (1)
  • Yukuto Sato (1)
  • Yosuke Kawai (1)
  • Masao Nagasaki (1)

  1. Department of Integrative Genomics, Tohoku Medical Megabank Organization, Tohoku University, Miyagi, Japan
