An Introduction to Association Analysis

Part of the book series: Statistics for Biology and Health ((SBH))

Abstract

This chapter focuses on techniques commonly used in GWAS to estimate single-SNP marker associations in samples of unrelated individuals. When the phenotype is discrete (disease/no disease), case–control methods based on conditional or unconditional logistic regression are typically used. Maximum likelihood estimation for generalized linear models is reviewed, and the score, Wald, and likelihood ratio tests are defined and discussed. The analysis of data from nuclear family-based designs is also briefly introduced. Issues regarding confounding, measurement error, effect mediation, and interactions are described. Control for multiple comparisons is reviewed, with emphasis on the behavior of the Bonferroni criterion when the multiple tests are correlated. The effects of a loss of independence between outcomes on statistical estimation and inference are characterized for a specific model of dependence, one that is relevant to the presence of hidden population structure or relatedness. These last results build on a basic theme described in Chap. 2 and are carried forward in Chap. 4.

Notes

  1. Hidden stratification or admixture, combined with cultural practices affecting disease risk that vary by ethnicity and/or the presence of an unmeasured polygene, is an obvious candidate.

3.1 Electronic Supplementary Material

Below is the link to the electronic supplementary material.

chapter3 (ZIP 24.3 KB)

Appendix

Proof of Equation (3.26)

We have the OLS estimate of \( {\sigma}^2 \) equal to \( \frac{1}{N-r}Y\prime \left(\mathbf{I}-\mathbf{X}{\left(\mathbf{X}\prime \mathbf{X}\right)}^{-1}\mathbf{X}\prime \right)Y \). In order to simplify the notation, rewrite this as

$$ \frac{1}{N-r}Y\prime \left(\mathbf{I}-\mathbf{P}\right)Y, $$

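where \( \mathbf{P}=\mathbf{X}{\left(\mathbf{X}\prime \mathbf{X}\right)}^{-1}\mathbf{X}\prime \). Note that \( \mathbf{P}\mathbf{P}=\mathbf{P} \) and \( \left(\mathbf{I}-\mathbf{P}\right)\left(\mathbf{I}-\mathbf{P}\right)=\left(\mathbf{I}-\mathbf{P}\right) \), i.e., both \( \mathbf{P} \) and \( \mathbf{I}-\mathbf{P} \) are idempotent, with trace equal to \( r \) and \( N-r \), respectively. Now we take the expected value of the estimate. We have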

$$ \begin{array}{l}E\left({\widehat{\sigma}}^2\right)=E\left\{\frac{1}{N-r}Y\prime \left(\mathbf{I}-\mathbf{P}\right)Y\right\}=\frac{1}{N-r}\mathrm{tr}\left\{\left(\mathbf{I}-\mathbf{P}\right)E\left(YY\prime \right)\right\}\\ {}=\frac{1}{N-r}\mathrm{tr}\left\{\left(\mathbf{I}-\mathbf{P}\right)\left[\mathrm{Var}(Y)+E(Y)E\left(Y\prime \right)\right]\right\}.\end{array} $$

Note that, since \( E(Y)=\mathbf{X}\beta \),

$$ \left(\mathbf{I}-\mathbf{P}\right)E(Y)E\left(Y\prime \right)=\mathbf{X}\beta \beta \prime \mathbf{X}\prime -\mathbf{X}{\left(\mathbf{X}\prime \mathbf{X}\right)}^{-1}\mathbf{X}\prime \mathbf{X}\beta \beta \prime \mathbf{X}\prime =0. $$

Thus the above expression simplifies to

$$ \frac{1}{N-r}\mathrm{tr}\left\{\left(\mathbf{I}-\mathbf{P}\right)\left[\mathrm{Var}(Y)\right]\right\}. $$

Since it is assumed that \( \mathrm{Var}(Y)={\sigma}^2\mathbf{I}+{\gamma}^2\mathbf{K} \), the above expression is equal to

$$ \begin{array}{l}\frac{1}{N-r}\mathrm{tr}\left\{\left(\mathbf{I}-\mathbf{P}\right)\left[{\sigma}^2\mathbf{I}+{\gamma}^2\mathbf{K}\right]\right\}=\frac{\sigma^2}{N-r}\mathrm{tr}\left\{\mathbf{I}-\mathbf{P}\right\}+\frac{\gamma^2}{N-r}\mathrm{tr}\left\{\left(\mathbf{I}-\mathbf{P}\right)\mathbf{K}\right\}\\ {}={\sigma}^2+\frac{\gamma^2}{N-r}\mathrm{tr}\left\{\mathbf{K}-\mathbf{P}\mathbf{K}\right\}={\sigma}^2+\frac{\gamma^2}{N-r}\mathrm{tr}\left\{\mathbf{K}-\mathbf{X}{\left(\mathbf{X}\prime \mathbf{X}\right)}^{-1}\mathbf{X}\prime \mathbf{K}\right\}\\ {}={\sigma}^2+\frac{\gamma^2}{N-r}\left[\mathrm{tr}\left\{\mathbf{K}\right\}-\mathrm{tr}\left\{{\left(\mathbf{X}\prime \mathbf{X}\right)}^{-1}\mathbf{X}\prime \mathbf{KX}\right\}\right],\end{array} $$

where the first simplification uses \( \mathrm{tr}\left\{\mathbf{I}-\mathbf{P}\right\}=N-r \) and the last uses the cyclic property of the trace, \( \mathrm{tr}\left\{\mathbf{A}\mathbf{B}\right\}=\mathrm{tr}\left\{\mathbf{B}\mathbf{A}\right\} \).
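The result of this derivation can be checked numerically. The sketch below is one possible sanity check rather than anything taken from the chapter: it builds an arbitrary design matrix, confirms that P and I − P are idempotent with traces r and N − r, and compares the closed-form expectation with both a direct trace calculation and a Monte Carlo average of the OLS variance estimate. All specific choices (N, r, β, σ², γ², the block-structured K standing in for hidden group membership, and the random seed) are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, chosen arbitrarily

# Illustrative (assumed) sizes and variance components.
N, r = 60, 3
sigma2, gamma2 = 1.0, 0.5

# Design matrix: intercept plus two arbitrary covariates, so rank(X) = r.
X = np.column_stack([np.ones(N), rng.normal(size=(N, r - 1))])
beta = np.array([1.0, -0.5, 2.0])

# Hypothetical K: 1 for pairs in the same hidden group, 0 otherwise.
groups = np.repeat(np.arange(3), N // 3)
K = (groups[:, None] == groups[None, :]).astype(float)

I = np.eye(N)
XtX_inv = np.linalg.inv(X.T @ X)
P = X @ XtX_inv @ X.T  # projection onto the column space of X

# Closed form from the last line of the derivation.
expected = sigma2 + gamma2 / (N - r) * (
    np.trace(K) - np.trace(XtX_inv @ X.T @ K @ X)
)

# Direct evaluation of tr{(I - P) Var(Y)} / (N - r).
V = sigma2 * I + gamma2 * K
direct = np.trace((I - P) @ V) / (N - r)

# Monte Carlo: draw Y ~ N(X beta, V) and average Y'(I - P)Y / (N - r).
n_sims = 20000
chol = np.linalg.cholesky(V)
Y = (X @ beta)[:, None] + chol @ rng.normal(size=(N, n_sims))
sigma2_hat = np.sum(Y * ((I - P) @ Y), axis=0) / (N - r)
mc_mean = sigma2_hat.mean()

print("closed form :", expected)
print("direct trace:", direct)
print("Monte Carlo :", mc_mean)
```

In this setup the closed form and the direct trace should agree to floating-point precision, while the Monte Carlo mean should agree only up to simulation error. Because tr{(I − P)K} is non-negative for a positive semi-definite K, the expectation is never below σ², which is the sense in which unmodeled dependence inflates the OLS variance estimate.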

Copyright information

© 2014 Springer Science+Business Media New York

About this chapter

Cite this chapter

Stram, D.O. (2014). An Introduction to Association Analysis. In: Design, Analysis, and Interpretation of Genome-Wide Association Scans. Statistics for Biology and Health. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-9443-0_3
