Multi-SNP Haplotype Analysis Methods for Association Analysis

  • Daniel O. Stram
Part of the Methods in Molecular Biology book series (MIMB, volume 1666)


Haplotype analysis forms the basis of much of genetic association analysis using both related and unrelated individuals (we concentrate on unrelated individuals). For example, haplotype analysis indirectly underlies the SNP imputation methods that are used for testing trait associations with known but unmeasured variants and for performing collaborative post-GWAS meta-analysis. This chapter focuses on the direct use of haplotypes in association testing. It reviews the rationale for haplotype-based association testing, discusses statistical issues related to haplotype uncertainty that affect the analysis, and then gives practical guidance for testing haplotype-based associations with a phenotype or outcome trait, first for candidate gene regions and then for the genome as a whole. Haplotypes are interesting for two reasons. First, they may be in closer linkage disequilibrium (LD) with a causal variant than any single measured SNP, and may therefore improve the coverage provided by the genotypes relative to single-SNP analysis. Second, haplotypes may themselves be the causal variants of interest, and some solid examples of this have appeared in the literature.

This chapter discusses three possible approaches to incorporating SNP haplotype analysis into generalized linear regression models: (1) a simple substitution method involving imputed haplotypes, (2) simultaneous maximum likelihood (ML) estimation of all parameters, including haplotype frequencies and regression parameters, and (3) a simplified approximation to full ML for case–control data.
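As an illustration of approach (1), the following is a minimal sketch of an expectation-substitution analysis, not the chapter's own implementation. It assumes only two biallelic SNPs coded as 0/1/2 allele counts, Hardy-Weinberg equilibrium when weighting haplotype pairs, and a quantitative trait fitted by ordinary least squares in place of a full generalized linear model. All function names and the toy data are hypothetical.

```python
from itertools import product

# The four possible haplotypes over two biallelic SNPs (allele at SNP 1, SNP 2).
HAPLOTYPES = [(0, 0), (0, 1), (1, 0), (1, 1)]


def compatible_pairs(genotype):
    """Ordered haplotype index pairs whose allele counts add up to the genotype."""
    return [
        (i, j)
        for i, j in product(range(len(HAPLOTYPES)), repeat=2)
        if all(a + b == g for a, b, g in zip(HAPLOTYPES[i], HAPLOTYPES[j], genotype))
    ]


def expected_dosages(genotypes, freqs):
    """E-step: expected number of copies of each haplotype for each subject."""
    dosages = []
    for genotype in genotypes:
        pairs = compatible_pairs(genotype)
        # Under Hardy-Weinberg equilibrium a pair's weight is the product of frequencies.
        weights = [freqs[i] * freqs[j] for i, j in pairs]
        total = sum(weights)
        row = [0.0] * len(HAPLOTYPES)
        for (i, j), w in zip(pairs, weights):
            row[i] += w / total
            row[j] += w / total
        dosages.append(row)
    return dosages


def em_haplotype_frequencies(genotypes, n_iter=200, tol=1e-10):
    """EM estimate of haplotype frequencies from unphased genotypes."""
    k = len(HAPLOTYPES)
    freqs = [1.0 / k] * k  # start from equal frequencies
    for _ in range(n_iter):
        dosages = expected_dosages(genotypes, freqs)
        # M-step: average expected haplotype count over the 2n chromosomes.
        new_freqs = [sum(row[h] for row in dosages) / (2 * len(genotypes)) for h in range(k)]
        converged = max(abs(a - b) for a, b in zip(new_freqs, freqs)) < tol
        freqs = new_freqs
        if converged:
            break
    return freqs


def substitution_slope(trait, dosage):
    """Least-squares slope of the trait on the expected dosage of one haplotype."""
    n = len(trait)
    mean_x = sum(dosage) / n
    mean_y = sum(trait) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dosage, trait))
    sxx = sum((x - mean_x) ** 2 for x in dosage)
    return sxy / sxx


# Toy data (made up for illustration): unphased genotypes and a quantitative trait.
genotypes = [(0, 0), (0, 0), (1, 1), (2, 2), (1, 0), (0, 1), (1, 1), (2, 1)]
trait = [1.0, 1.2, 2.1, 3.0, 1.1, 0.9, 1.9, 2.4]

freqs = em_haplotype_frequencies(genotypes)
dosages = expected_dosages(genotypes, freqs)
# Substitute the expected dosage of haplotype (1, 1) for the unknown true count.
slope = substitution_slope(trait, [row[3] for row in dosages])
```

Only double heterozygotes are phase-ambiguous in the two-SNP case, so the expected dosages differ from integer counts only for those subjects. The sketch ignores the extra uncertainty that comes from estimating the haplotype frequencies, which is the kind of issue the ML-based approaches (2) and (3) address.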

Examples of the various approaches for a haplotype analysis of a candidate gene are provided. We compare the behavior of the approximation-based methods and argue that in most instances the simpler methods hold up well in practice. We also describe the practical implementation of haplotype risk estimation genome-wide and discuss several shortcuts that can reduce what may otherwise be a very heavy computational burden.

Key words

Haplotype-specific risk estimation; Phase estimation; Genetic association testing; Expectation-substitution methods; Maximum likelihood; Uncertainty analysis



Copyright information

© Springer Science+Business Media LLC 2017

Authors and Affiliations

  1. Department of Preventive Medicine, Keck School of Medicine, University of Southern California, Los Angeles, USA
