
Multi-SNP Haplotype Analysis Methods for Association Analysis

Statistical Human Genetics

Part of the book series: Methods in Molecular Biology (MIMB, volume 850)

Abstract

This chapter reviews the rationale for using haplotypes in association-based testing, discusses the statistical issues raised by haplotype uncertainty that complicate the analysis, and then gives practical guidance for testing haplotype-based associations with a phenotype or outcome trait, first for candidate gene regions and then for the genome as a whole. Haplotypes are of interest for two reasons. First, they may be in closer linkage disequilibrium (LD) with a causal variant than any single measured SNP, and may therefore improve the coverage provided by the genotypes relative to single-SNP analysis. Second, haplotypes may themselves be the causal variants of interest, and some solid examples of this have appeared in the literature. The chapter discusses three possible approaches to incorporating SNP haplotype analysis into generalized linear regression models: (1) a simple substitution method involving imputed haplotypes; (2) simultaneous maximum likelihood (ML) estimation of all parameters, including haplotype frequencies and regression parameters; and (3) a simplified approximation to full ML for case–control data. Examples of the various approaches for a haplotype analysis of a candidate gene are provided. We compare the behavior of the approximation-based methods and show that in most instances the simpler methods hold up well in practice. We also describe the practical implementation of genome-wide haplotype risk estimation and discuss several shortcuts that can speed up computations that would otherwise be very intensive.
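
As a rough illustration of approach (1), the sketch below computes the expected number of copies of each haplotype a subject carries, given an unphased multi-SNP genotype. It is one common reading of a substitution method and is not taken from the chapter itself. The function names, the 0/1 allele coding, and the haplotype frequencies are all hypothetical; in practice the frequencies would be estimated from the data (for example by an EM algorithm), and Hardy-Weinberg equilibrium is assumed when weighting haplotype pairs.

```python
from itertools import product


def compatible_pairs(genotype):
    """Yield ordered haplotype pairs (h1, h2) consistent with an unphased genotype.

    genotype: tuple of minor-allele counts (0, 1 or 2), one per SNP.
    Haplotypes are tuples of 0/1 alleles (hypothetical coding).
    """
    per_snp = []
    for g in genotype:
        if g == 0:
            per_snp.append([(0, 0)])
        elif g == 2:
            per_snp.append([(1, 1)])
        else:  # heterozygous site: phase is unknown, so keep both orientations
            per_snp.append([(0, 1), (1, 0)])
    for combo in product(*per_snp):
        h1 = tuple(a for a, _ in combo)
        h2 = tuple(b for _, b in combo)
        yield h1, h2


def expected_dosage(genotype, freqs):
    """Expected copies of each haplotype given the genotype, assuming HWE.

    freqs: dict mapping haplotype tuple -> population frequency (assumed known).
    """
    weights = {}
    for h1, h2 in compatible_pairs(genotype):
        w = freqs.get(h1, 0.0) * freqs.get(h2, 0.0)  # HWE pair probability
        if w > 0:
            weights[(h1, h2)] = w
    total = sum(weights.values())
    if total == 0:
        raise ValueError("genotype is incompatible with the supplied haplotypes")
    dosage = {h: 0.0 for h in freqs}
    for (h1, h2), w in weights.items():
        dosage[h1] += w / total
        dosage[h2] += w / total
    return dosage


# Hypothetical two-SNP haplotype frequencies, for illustration only.
freqs = {(0, 0): 0.5, (0, 1): 0.2, (1, 0): 0.2, (1, 1): 0.1}

# A double heterozygote is phase-ambiguous: 00/11 or 01/10.
ambiguous = expected_dosage((1, 1), freqs)

# A single heterozygote has only one possible haplotype pair: 00/01.
unambiguous = expected_dosage((0, 1), freqs)
```

The resulting dosages sum to 2 for every subject and could then replace the unobserved haplotype counts as covariates in an ordinary regression model. This ignores the uncertainty in the imputation, which is the issue that approaches (2) and (3) are meant to address.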

Author information

Corresponding author

Correspondence to Daniel O. Stram.

Copyright information

© 2012 Springer Science+Business Media, LLC

About this protocol

Cite this protocol

Stram, D.O., Seshan, V.E. (2012). Multi-SNP Haplotype Analysis Methods for Association Analysis. In: Elston, R., Satagopan, J., Sun, S. (eds) Statistical Human Genetics. Methods in Molecular Biology, vol 850. Humana Press. https://doi.org/10.1007/978-1-61779-555-8_23

  • DOI: https://doi.org/10.1007/978-1-61779-555-8_23

  • Publisher Name: Humana Press

  • Print ISBN: 978-1-61779-554-1

  • Online ISBN: 978-1-61779-555-8

  • eBook Packages: Springer Protocols