Abstract
This chapter focuses on techniques commonly used in GWAS to estimate single-SNP marker associations in samples of unrelated individuals. When the phenotype is discrete (disease/no disease), case–control methods, namely conditional and unconditional logistic regression, are typically used. Maximum likelihood estimation for generalized linear models is reviewed, and the score, Wald, and likelihood ratio tests are defined and discussed. The analysis of data from nuclear family-based designs is also briefly introduced. Issues regarding confounding, measurement error, effect mediation, and interactions are described. Control for multiple comparisons is reviewed, with emphasis on the behavior of the Bonferroni criterion for multiple correlated tests. The effects of the loss of independence between outcomes on statistical estimation and inference are characterized for a specific model of that loss, one relevant to the presence of hidden population structure or relatedness. These last results build on a basic theme described in Chap. 2 and are carried forward in Chap. 4.
Notes
1. Hidden stratification or admixture, combined with cultural practices affecting disease risk that vary by ethnicity and/or the presence of an unmeasured polygene, is an obvious candidate.
Appendix
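Proof of Equation (3.26)
We have the OLS estimate of \( \sigma^2 \) equal to \( \frac{1}{N-r}Y^{\prime}\left(I-\mathbf{X}{\left(\mathbf{X}^{\prime}\mathbf{X}\right)}^{-1}\mathbf{X}^{\prime}\right)Y \). In order to simplify the notation, rewrite this as
\[ {\widehat{\sigma}}^2=\frac{1}{N-r}Y^{\prime}\left(I-P\right)Y, \]
where \( P=\mathbf{X}{\left(\mathbf{X}^{\prime}\mathbf{X}\right)}^{-1}\mathbf{X}^{\prime} \). Note that \( PP=P \) and \( \left(I-P\right)\left(I-P\right)=\left(I-P\right) \), i.e., both \( P \) and \( I-P \) are idempotent, with trace equal to \( r \) and \( N-r \), respectively. Now we take the expected value of the estimate. By the standard result for the expectation of a quadratic form, we have
\[ E\left({\widehat{\sigma}}^2\right)=\frac{1}{N-r}\left\{\mathrm{tr}\left[\left(I-P\right)\mathrm{Var}(Y)\right]+E{(Y)}^{\prime}\left(I-P\right)E(Y)\right\}. \]
Note that
\[ \left(I-P\right)\mathbf{X}=\mathbf{X}-\mathbf{X}{\left(\mathbf{X}^{\prime}\mathbf{X}\right)}^{-1}\mathbf{X}^{\prime}\mathbf{X}=0, \]
so the second term vanishes when the mean of \( Y \) lies in the column space of \( \mathbf{X} \), i.e., \( E(Y)=\mathbf{X}\beta \). Thus the above expression simplifies to
\[ E\left({\widehat{\sigma}}^2\right)=\frac{1}{N-r}\mathrm{tr}\left[\left(I-P\right)\mathrm{Var}(Y)\right]. \]
Since it is assumed that \( \mathrm{Var}(Y)={\sigma}^2I+{\gamma}^2\mathbf{K} \), the above expression is equal to
\[ \frac{1}{N-r}\left\{{\sigma}^2\,\mathrm{tr}\left(I-P\right)+{\gamma}^2\,\mathrm{tr}\left[\left(I-P\right)\mathbf{K}\right]\right\}={\sigma}^2+\frac{{\gamma}^2}{N-r}\mathrm{tr}\left[\left(I-P\right)\mathbf{K}\right]. \]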
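The result can be checked numerically. The sketch below is illustrative only: it takes the final expression to be \( {\sigma}^2+{\gamma}^2\,\mathrm{tr}\left[\left(I-P\right)\mathbf{K}\right]/\left(N-r\right) \), which is the form that follows from the stated assumptions \( E(Y)=\mathbf{X}\beta \) and \( \mathrm{Var}(Y)={\sigma}^2I+{\gamma}^2\mathbf{K} \). The design matrix, the matrix K, the coefficients, and the values of \( {\sigma}^2 \) and \( {\gamma}^2 \) are all made-up inputs, not taken from the chapter.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical inputs (assumptions for illustration only).
N = 40
X = np.column_stack([np.ones(N), rng.integers(0, 3, size=N)])  # intercept + 0/1/2 genotype
A = rng.normal(size=(N, N))
K = A @ A.T / N            # an arbitrary symmetric positive semi-definite matrix
sigma2, gamma2 = 1.0, 0.5
beta = np.array([0.3, 0.2])

r = np.linalg.matrix_rank(X)
P = X @ np.linalg.pinv(X)  # orthogonal projection onto the column space of X
M = np.eye(N) - P          # I - P
V = sigma2 * np.eye(N) + gamma2 * K

# Expectation of the OLS variance estimate, computed two ways.
direct = np.trace(M @ V) / (N - r)                    # tr[(I-P) Var(Y)] / (N-r)
closed = sigma2 + gamma2 * np.trace(M @ K) / (N - r)  # closed form from the proof

# Monte Carlo check: draw Y ~ N(X beta, V) and average Y'(I-P)Y / (N-r).
L = np.linalg.cholesky(V)
Y = X @ beta + rng.normal(size=(20000, N)) @ L.T
s2 = np.einsum("ij,jk,ik->i", Y, M, Y) / (N - r)
mc_mean = s2.mean()

print(f"trace form: {direct:.4f}  closed form: {closed:.4f}  simulated mean: {mc_mean:.4f}")
```

The two analytic values agree exactly up to floating-point error, and the simulated mean should approach them as the number of draws grows. When \( {\gamma}^2>0 \) the expectation generally differs from \( {\sigma}^2 \), so the OLS variance estimate no longer targets \( {\sigma}^2 \) alone once outcomes are correlated through K.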
© 2014 Springer Science+Business Media New York
About this chapter
Cite this chapter
Stram, D.O. (2014). An Introduction to Association Analysis. In: Design, Analysis, and Interpretation of Genome-Wide Association Scans. Statistics for Biology and Health. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-9443-0_3
DOI: https://doi.org/10.1007/978-1-4614-9443-0_3
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4614-9442-3
Online ISBN: 978-1-4614-9443-0
eBook Packages: Mathematics and Statistics (R0)