Multi-SNP Haplotype Analysis Methods for Association Analysis

Stram, Daniel O.; Seshan, Venkatraman E.

doi:10.1007/978-1-61779-555-8_23

Part of the book series: Methods in Molecular Biology (MIMB, volume 850)

Abstract

This chapter reviews the rationale for using haplotypes in association-based testing, discusses the statistical issues related to haplotype uncertainty that complicate the analysis, and then gives practical guidance for testing haplotype-based associations with a phenotype or outcome trait, first for candidate gene regions and then for the genome as a whole. Haplotypes are of interest for two reasons. First, they may be in closer linkage disequilibrium (LD) with a causal variant than any single measured SNP, and may therefore enhance the coverage value of the genotypes over single-SNP analysis. Second, haplotypes may themselves be the causal variants of interest, and some solid examples of this have appeared in the literature. This chapter discusses three possible approaches to incorporating SNP haplotype analysis into generalized linear regression models: (1) a simple substitution method involving imputed haplotypes; (2) simultaneous maximum likelihood (ML) estimation of all parameters, including haplotype frequencies and regression parameters; and (3) a simplified approximation to full ML for case–control data. Examples of the various approaches for a haplotype analysis of a candidate gene are provided. We compare the behavior of the approximation-based methods and show that in most instances the simpler methods hold up well. We also describe the practical implementation of genome-wide haplotype risk estimation and discuss several shortcuts that can speed up what would otherwise be potentially very intensive computations.
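
As an illustration of approach (1), the following is a minimal sketch and not the chapter's own implementation. It assumes biallelic SNPs coded as minor-allele counts (0, 1, 2), Hardy-Weinberg equilibrium, and no missing genotypes. Haplotype frequencies are estimated by a basic EM algorithm, and each subject's expected haplotype dosages are then computed for substitution into a regression model in place of the unobserved phased haplotypes. All function names are hypothetical.

```python
from itertools import product


def compatible_pairs(genotype, haplotypes):
    """Ordered haplotype pairs whose allele sums equal the unphased genotype."""
    return [
        (h1, h2)
        for h1, h2 in product(haplotypes, repeat=2)
        if all(a + b == g for a, b, g in zip(h1, h2, genotype))
    ]


def expected_dosage(genotype, freq):
    """Expected copies (0 to 2) of each haplotype given genotype and frequencies."""
    pairs = compatible_pairs(genotype, list(freq))
    # Under Hardy-Weinberg equilibrium a pair's prior weight is the product
    # of its two haplotype frequencies.
    weights = [freq[h1] * freq[h2] for h1, h2 in pairs]
    total = sum(weights)
    dosage = {h: 0.0 for h in freq}
    for (h1, h2), w in zip(pairs, weights):
        dosage[h1] += w / total
        dosage[h2] += w / total
    return dosage


def em_haplotype_frequencies(genotypes, n_snps, n_iter=100):
    """Estimate haplotype frequencies from unphased genotypes by EM."""
    haplotypes = list(product((0, 1), repeat=n_snps))
    freq = {h: 1.0 / len(haplotypes) for h in haplotypes}  # uniform start
    for _ in range(n_iter):
        counts = {h: 0.0 for h in haplotypes}
        for g in genotypes:  # E-step: accumulate expected haplotype counts
            for h, d in expected_dosage(g, freq).items():
                counts[h] += d
        # M-step: each subject carries two haplotypes
        freq = {h: c / (2 * len(genotypes)) for h, c in counts.items()}
    return freq


# Toy data: two SNPs; the double heterozygote (1, 1) is phase-ambiguous.
genotypes = [(0, 0)] * 5 + [(2, 2)] * 5 + [(1, 1)]
freq = em_haplotype_frequencies(genotypes, n_snps=2)
dosage_het = expected_dosage((1, 1), freq)
```

In the substitution method, values such as `dosage_het` would be used as covariates in a standard generalized linear model. This ignores the uncertainty in the imputed haplotypes, which is the issue that the full ML approach (2) and the approximation (3) are intended to address.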

Author information

Authors and Affiliations

Department of Preventive Medicine, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
Daniel O. Stram
Department of Epidemiology and Biostatistics, Memorial Sloan-Kettering Cancer Center, New York, NY, USA
Venkatraman E. Seshan

Corresponding author

Correspondence to Daniel O. Stram.

Editor information

Editors and Affiliations

School of Medicine, Dept. Epidemiology & Biostatistics, Case Western Reserve University, Cornell Road 2103, Cleveland, 44106, Ohio, USA
Robert C. Elston
Dept. Epidemiology & Biostatistics, Memorial Sloan-Kettering Cancer Center, East 63rd Street 307, New York, 10021, New York, USA
Jaya M. Satagopan
School of Medicine, Dept. Epidemiology & Biostatistics, Case Western Reserve University, Cornell Road 2103, Cleveland, 44106, Ohio, USA
Shuying Sun

About this protocol

Cite this protocol

Stram, D.O., Seshan, V.E. (2012). Multi-SNP Haplotype Analysis Methods for Association Analysis. In: Elston, R., Satagopan, J., Sun, S. (eds) Statistical Human Genetics. Methods in Molecular Biology, vol 850. Humana Press. https://doi.org/10.1007/978-1-61779-555-8_23

DOI: https://doi.org/10.1007/978-1-61779-555-8_23
Published: 20 December 2011
Publisher Name: Humana Press
Print ISBN: 978-1-61779-554-1
Online ISBN: 978-1-61779-555-8
eBook Packages: Springer Protocols
