An Introduction to Association Analysis

Stram, Daniel O.

doi:10.1007/978-1-4614-9443-0_3

Chapter
First Online: 11 November 2013

Part of the book series: Statistics for Biology and Health (SBH)

Abstract

This chapter focuses on techniques commonly used in GWAS to estimate single-SNP marker associations in samples of unrelated individuals. When the phenotype is discrete (disease/no disease), case–control methods, namely conditional and unconditional logistic regression, are typically used. Maximum likelihood estimation for generalized linear models is reviewed, and the score, Wald, and likelihood ratio tests are defined and discussed. The analysis of data from nuclear family-based designs is also briefly introduced. Issues regarding confounding, measurement error, effect mediation, and interactions are described. Control for multiple comparisons is reviewed, with emphasis on the behavior of the Bonferroni criterion for multiple correlated tests. The effects of loss of independence between outcomes on statistical estimation and inference are characterized for a specific model of that loss, one relevant to the presence of hidden population structure or relatedness. These last results build on a basic theme described in Chap. 2 and are carried forward in Chap. 4.


Notes

1. Obvious candidates are hidden stratification or admixture combined with cultural practices that affect disease risk and vary by ethnicity, with the presence of an unmeasured polygene, or with both.

Author information

Authors and Affiliations

Department of Preventive Medicine, University of Southern California Keck School of Medicine, Los Angeles, CA, USA
Daniel O. Stram


3.1 Electronic Supplementary Material

Below is the link to the electronic supplementary material.

chapter3 (ZIP 24.3 KB)

Appendix

Proof of Equation (3.26)

We have the OLS estimate of σ² equal to $ \frac{1}{N-r}Y'\left(\mathbf{I}-\mathbf{X}{\left(\mathbf{X}'\mathbf{X}\right)}^{-1}\mathbf{X}'\right)Y $. To simplify the notation, rewrite this as

$$ \frac{1}{N-r}Y'\left(\mathbf{I}-\mathbf{P}\right)Y, $$

where $ \mathbf{P}=\mathbf{X}{\left(\mathbf{X}'\mathbf{X}\right)}^{-1}\mathbf{X}' $. Note that PP = P and (I − P)(I − P) = (I − P), i.e., both P and I − P are idempotent, with trace equal to r and N − r, respectively. Now we take the expected value of the estimate. We have

$$ \begin{array}{l}E\left({\widehat{\sigma}}^2\right)=E\left\{\frac{1}{N-r}Y'\left(\mathbf{I}-\mathbf{P}\right)Y\right\}=\frac{1}{N-r}\mathrm{tr}\left\{\left(\mathbf{I}-\mathbf{P}\right)E\left(YY'\right)\right\}\\ {}=\frac{1}{N-r}\mathrm{tr}\left\{\left(\mathbf{I}-\mathbf{P}\right)\left[\mathrm{Var}(Y)+E(Y)E\left(Y'\right)\right]\right\}.\end{array} $$

Note that, because E(Y) = Xβ,

$$ \left(\mathbf{I}-\mathbf{P}\right)E(Y)E\left(Y'\right)=\mathbf{X}\beta \beta '\mathbf{X}'-\mathbf{X}{\left(\mathbf{X}'\mathbf{X}\right)}^{-1}\mathbf{X}'\mathbf{X}\beta \beta '\mathbf{X}'=0. $$

Thus the above expression simplifies to

$$ \frac{1}{N-r}\mathrm{tr}\left\{\left(\mathbf{I}-\mathbf{P}\right)\left[\mathrm{Var}(Y)\right]\right\}. $$

Since it is assumed that Var(Y) = σ²I + γ²K, the above expression is equal to

$$ \begin{array}{l}\frac{1}{N-r}\mathrm{tr}\left\{\left(\mathbf{I}-\mathbf{P}\right)\left[{\sigma}^2\mathbf{I}+{\gamma}^2\mathbf{K}\right]\right\}=\frac{\sigma^2}{N-r}\mathrm{tr}\left\{\mathbf{I}-\mathbf{P}\right\}+\frac{\gamma^2}{N-r}\mathrm{tr}\left\{\left(\mathbf{I}-\mathbf{P}\right)\mathbf{K}\right\}\\ {}={\sigma}^2+\frac{\gamma^2}{N-r}\mathrm{tr}\left\{\mathbf{K}-\mathbf{P}\mathbf{K}\right\}={\sigma}^2+\frac{\gamma^2}{N-r}\mathrm{tr}\left\{\mathbf{K}-\mathbf{X}{\left(\mathbf{X}'\mathbf{X}\right)}^{-1}\mathbf{X}'\mathbf{K}\right\}\\ {}={\sigma}^2+\frac{\gamma^2}{N-r}\left[\mathrm{tr}\left\{\mathbf{K}\right\}-\mathrm{tr}\left\{{\left(\mathbf{X}'\mathbf{X}\right)}^{-1}\mathbf{X}'\mathbf{KX}\right\}\right],\end{array} $$

where the last step uses the cyclic property of the trace, tr{AB} = tr{BA}.
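The steps of this proof can be checked exactly on a small example. The following is a minimal sketch in Python using exact rational arithmetic. The design matrix (an intercept plus one SNP genotype coded 0/1/2), the block-structured K (two hidden groups of three subjects), and the values chosen for σ² and γ² are all hypothetical and are not taken from the chapter. The helper functions are illustrative only.

```python
from fractions import Fraction


def matmul(A, B):
    """Matrix product of two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def trace(A):
    return sum(A[i][i] for i in range(len(A)))


def inverse_2x2(A):
    """Closed-form inverse; sufficient here because the toy design has r = 2."""
    (a, b), (c, d) = A
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


# Hypothetical toy data (assumption): intercept + one SNP, two hidden groups.
genotypes = [0, 1, 2, 1, 0, 2]
group = [0, 0, 0, 1, 1, 1]
X = [[Fraction(1), Fraction(g)] for g in genotypes]
N, r = len(X), len(X[0])

# K[i][j] = 1 when subjects i and j share a group (one simple choice of K).
K = [[Fraction(int(group[i] == group[j])) for j in range(N)] for i in range(N)]
sigma2, gamma2 = Fraction(3), Fraction(1, 2)  # arbitrary variance components

Xt = transpose(X)
XtX_inv = inverse_2x2(matmul(Xt, X))
P = matmul(matmul(X, XtX_inv), Xt)
I_minus_P = [[int(i == j) - P[i][j] for j in range(N)] for i in range(N)]

# P and I - P are idempotent, with traces r and N - r.
assert matmul(P, P) == P
assert matmul(I_minus_P, I_minus_P) == I_minus_P
assert trace(P) == r and trace(I_minus_P) == N - r

# (I - P)X = 0, which is why the E(Y)E(Y') term drops out.
assert all(v == 0 for row in matmul(I_minus_P, X) for v in row)

# Direct route: tr{(I - P) Var(Y)} / (N - r), with Var(Y) = sigma2*I + gamma2*K.
var_y = [[sigma2 * int(i == j) + gamma2 * K[i][j] for j in range(N)]
         for i in range(N)]
direct = trace(matmul(I_minus_P, var_y)) / (N - r)

# Closed form from the last line of the derivation.
correction = trace(K) - trace(matmul(XtX_inv, matmul(matmul(Xt, K), X)))
closed_form = sigma2 + gamma2 * correction / (N - r)

assert direct == closed_form
print("E[sigma_hat^2] =", closed_form, " bias =", closed_form - sigma2)
```

In this toy design tr{K} = 6 and tr{(X′X)^−1 X′KX} = 3, so the expected value of the OLS estimate is σ² + (3/4)γ². Exact fractions are used so that the identities hold exactly, not merely up to floating-point rounding.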

About this chapter

Cite this chapter

Stram, D.O. (2014). An Introduction to Association Analysis. In: Design, Analysis, and Interpretation of Genome-Wide Association Scans. Statistics for Biology and Health. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-9443-0_3

DOI: https://doi.org/10.1007/978-1-4614-9443-0_3
Published: 11 November 2013
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4614-9442-3
Online ISBN: 978-1-4614-9443-0
eBook Packages: Mathematics and Statistics (R0)
