An Introduction to Association Analysis

  • Daniel O. Stram
Part of the Statistics for Biology and Health book series (SBH)


This chapter focuses on techniques commonly used in genome-wide association studies (GWAS) to estimate single-SNP associations in samples of unrelated individuals. When the phenotype is discrete (disease/no disease), case–control methods, namely conditional and unconditional logistic regression, are typically used. Maximum likelihood estimation for generalized linear models is reviewed, and the score, Wald, and likelihood ratio tests are defined and discussed. The analysis of data from nuclear family-based designs is also briefly introduced. Issues of confounding, measurement error, effect mediation, and interaction are described. Control for multiple comparisons is reviewed, with emphasis on the behavior of the Bonferroni criterion for multiple correlated tests. The effects of a loss of independence between outcomes on statistical estimation and inference are characterized for a specific model of that loss, one that is relevant when hidden population structure or relatedness is present. These last results build on a basic theme described in Chap. 2 and are carried forward in Chap. 4.
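As an illustration of the three tests named above, here is a minimal sketch, not the chapter's own code, assuming an additive coding of each SNP (allele count 0/1/2) and an unconditional logistic model logit P(case) = alpha + beta * g with no covariates. It fits the model by Newton-Raphson (the usual maximum likelihood algorithm for this generalized linear model) and computes the score, Wald, and likelihood ratio statistics for H0: beta = 0, each referred to a 1-df chi-square. The function names and the simulated data (allele frequency 0.3, per-allele log odds ratio 0.3) are hypothetical choices for illustration only.

```python
import math
import numpy as np


def chi2_1df_pvalue(stat):
    """Upper-tail p-value P(chi2_1 > stat) for a 1-df chi-square statistic."""
    return math.erfc(math.sqrt(max(stat, 0.0) / 2.0))


def fit_logistic(X, y, n_iter=50, tol=1e-10):
    """Logistic regression by Newton-Raphson (IRLS).

    Returns the MLE, its estimated covariance (inverse Fisher information
    at the MLE) and the maximized log-likelihood.
    """
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        mu = 1.0 / (1.0 + np.exp(-X @ beta))
        w = mu * (1.0 - mu)
        info = X.T @ (X * w[:, None])   # Fisher information X'WX
        score = X.T @ (y - mu)          # score vector X'(y - mu)
        step = np.linalg.solve(info, score)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    mu = 1.0 / (1.0 + np.exp(-X @ beta))
    w = mu * (1.0 - mu)
    cov = np.linalg.inv(X.T @ (X * w[:, None]))
    loglik = np.sum(y * np.log(mu) + (1 - y) * np.log(1 - mu))
    return beta, cov, loglik


def single_snp_tests(g, y):
    """Score, Wald and likelihood ratio tests of H0: beta = 0."""
    g = np.asarray(g, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(y)
    X = np.column_stack([np.ones(n), g])

    # Wald: squared MLE divided by its estimated variance under the full model.
    beta, cov, ll_full = fit_logistic(X, y)
    wald = beta[1] ** 2 / cov[1, 1]

    # LRT: the intercept-only MLE of P(case) is the sample mean ybar.
    ybar = y.mean()
    ll_null = n * (ybar * math.log(ybar) + (1 - ybar) * math.log(1 - ybar))
    lrt = 2.0 * (ll_full - ll_null)

    # Score: score for beta at the null MLE, over the null information for
    # beta after adjusting for the intercept. This equals n * corr(g, y)^2.
    u = np.sum(g * (y - ybar))
    info_beta = ybar * (1 - ybar) * np.sum((g - g.mean()) ** 2)
    score = u ** 2 / info_beta

    return {name: (stat, chi2_1df_pvalue(stat))
            for name, stat in [("score", score), ("wald", wald), ("lrt", lrt)]}


# Usage on simulated (hypothetical) case-control data.
rng = np.random.default_rng(1)
n = 2000
g = rng.binomial(2, 0.3, size=n)              # allele counts, frequency 0.3
p = 1.0 / (1.0 + np.exp(-(-1.0 + 0.3 * g)))   # true per-allele log OR 0.3
y = rng.binomial(1, p)                        # 1 = case, 0 = control
for name, (stat, pval) in single_snp_tests(g, y).items():
    print(f"{name}: chi2 = {stat:.2f}, p = {pval:.2e}")
```

In large samples with modest effects the three statistics are usually close. The score version needs only the null fit, which is why it is cheap to run at every SNP; with additive coding it equals N times the squared genotype/phenotype correlation, one common form of the Armitage trend test. Under a Bonferroni correction for m tests, each p-value would be compared with alpha/m.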


Logistic Regression, Likelihood Ratio Test, Wald Test, Conditional Logistic Regression, Ordinary Least Squares Regression
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Supplementary material: chapter3 (ZIP, 24.3 KB)



Copyright information

© Springer Science+Business Media New York 2014

Authors and Affiliations

  • Daniel O. Stram, Department of Preventive Medicine, University of Southern California Keck School of Medicine, Los Angeles, USA
