Abstract
This chapter reviews the rationale for using haplotypes in association-based testing, discusses the statistical issues raised by haplotype uncertainty that complicate the analysis, and then gives practical guidance for testing haplotype-based associations with a phenotype or outcome trait, first for candidate gene regions and then for the genome as a whole. Haplotypes are of interest for two reasons. First, a haplotype may be in closer linkage disequilibrium (LD) with a causal variant than any single measured SNP, and so haplotype analysis may extend the coverage provided by the genotyped markers beyond what single-SNP analysis achieves. Second, haplotypes may themselves be the causal variants of interest, and some solid examples of this have appeared in the literature. This chapter discusses three possible approaches to incorporating SNP haplotype analysis into generalized linear regression models: (1) a simple substitution method involving imputed haplotypes; (2) simultaneous maximum likelihood (ML) estimation of all parameters, including haplotype frequencies and regression parameters; and (3) a simplified approximation to full ML for case–control data. Examples of each approach are provided for a haplotype analysis of a candidate gene. We compare the behavior of the approximation-based methods and show that in most instances the simpler methods hold up well in practice. We also describe the practical implementation of genome-wide haplotype risk estimation and discuss several shortcuts that can reduce what would otherwise be a very heavy computational burden.
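As a rough illustration of the first approach (substitution of imputed haplotypes), the sketch below estimates haplotype frequencies from unphased genotypes with an EM algorithm and then computes each subject's expected haplotype dosages, which could replace the unknown true haplotype counts as covariates in a regression model. It is not the chapter's implementation. It assumes two biallelic SNPs coded as minor-allele counts (0, 1, 2), unrelated subjects, and Hardy-Weinberg equilibrium, and all function names are hypothetical.

```python
from itertools import combinations_with_replacement

# The four possible haplotypes for two biallelic SNPs (0 = major, 1 = minor allele).
HAPS = [(0, 0), (0, 1), (1, 0), (1, 1)]


def compatible_pairs(genotype):
    """Unordered haplotype index pairs consistent with an unphased genotype."""
    return [
        (i, j)
        for i, j in combinations_with_replacement(range(len(HAPS)), 2)
        if tuple(a + b for a, b in zip(HAPS[i], HAPS[j])) == tuple(genotype)
    ]


def pair_posteriors(genotype, freqs):
    """Posterior probability of each compatible pair (assumes Hardy-Weinberg)."""
    pairs = compatible_pairs(genotype)
    # Under HWE a homozygous pair has probability f_i^2, a heterozygous pair 2*f_i*f_j.
    weights = [(1 if i == j else 2) * freqs[i] * freqs[j] for i, j in pairs]
    total = sum(weights)
    return [(pair, w / total) for pair, w in zip(pairs, weights)]


def expected_dosages(genotype, freqs):
    """Expected number of copies (0 to 2) of each haplotype given the genotype."""
    dosages = [0.0] * len(HAPS)
    for (i, j), prob in pair_posteriors(genotype, freqs):
        dosages[i] += prob
        dosages[j] += prob
    return dosages


def em_haplotype_freqs(genotypes, n_iter=200, tol=1e-10):
    """EM estimate of haplotype frequencies from unphased two-SNP genotypes."""
    freqs = [1.0 / len(HAPS)] * len(HAPS)  # uniform starting values
    for _ in range(n_iter):
        # E-step: accumulate expected haplotype counts over subjects.
        counts = [0.0] * len(HAPS)
        for genotype in genotypes:
            for h, dose in enumerate(expected_dosages(genotype, freqs)):
                counts[h] += dose
        # M-step: each subject carries two haplotypes.
        new_freqs = [c / (2 * len(genotypes)) for c in counts]
        if max(abs(a - b) for a, b in zip(new_freqs, freqs)) < tol:
            return new_freqs
        freqs = new_freqs
    return freqs


# Toy data: only the double heterozygote (1, 1) has ambiguous phase.
genotypes = [(0, 0)] * 4 + [(2, 2)] * 3 + [(1, 1)] * 3
freqs = em_haplotype_freqs(genotypes)
dosages = [expected_dosages(g, freqs) for g in genotypes]
```

Each row of `dosages` sums to 2 and would serve as the per-subject haplotype covariates in the substitution method. The full ML approach differs in that it estimates the haplotype frequencies and the regression parameters jointly rather than in two stages.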
Copyright information
© 2012 Springer Science+Business Media, LLC
About this protocol
Cite this protocol
Stram, D.O., Seshan, V.E. (2012). Multi-SNP Haplotype Analysis Methods for Association Analysis. In: Elston, R., Satagopan, J., Sun, S. (eds) Statistical Human Genetics. Methods in Molecular Biology, vol 850. Humana Press. https://doi.org/10.1007/978-1-61779-555-8_23
DOI: https://doi.org/10.1007/978-1-61779-555-8_23
Publisher Name: Humana Press
Print ISBN: 978-1-61779-554-1
Online ISBN: 978-1-61779-555-8
eBook Packages: Springer Protocols