Correcting for Hidden Population Structure in Single Marker Association Testing and Estimation

Part of the book series: Statistics for Biology and Health (SBH)

Abstract

Chapter 2 discussed both relatedness of study participants and hidden population structure in terms of the correlations induced between the numbers of copies, n_iA and n_jA, of a diallelic genetic variant carried by two individuals i and j. Chapter 3 discussed the requirement, in association studies of unrelated subjects, that the outcomes of interest, Y_i, be independent between study subjects. This chapter expands on that initial discussion to examine (1) the impact of non-independence on the distribution of statistical tests for the influence of alleles (here a and A) on phenotype or disease risk, and (2) how non-independence between individuals' outcomes can arise as a direct result of correlation among the genotypes of study subjects due to hidden strata or relatedness, or due to other factors (e.g., cultural or behavioral) that act as confounders of genetic associations. The chapter introduces several basic approaches for dealing with population structure in single marker association analyses and shows how all of these methods address, at least in part, the fundamental problem of analyzing correlated phenotypes. At the heart of these methods is the empirical estimation of a relationship matrix (more precisely, a covariance structure matrix) that describes the relative relatedness of individuals. The statistical methods for dealing with covariances in the estimation of single marker effects fall into three categories: fixed effects models that adjust for eigenvectors ("principal components") of this matrix; random effects methods that treat the relationship matrix explicitly as the covariance matrix of random effects in extended generalized linear modeling; and retrospective methods, which invert the usual generalized linear modeling procedures so that the conditional distribution of the genetic markers given the phenotypes (rather than the reverse) is used for inference in genetic association studies. The discussion of all these approaches is unified around the theme of dealing with false-positive associations that arise when correlated data are analyzed, because the variance of the estimators relied upon in traditional regression methods is inflated and this inflation goes unrecognized. Finally, the relative performance of the methods is described in a range of settings.
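The first step shared by these approaches, empirical estimation of a relationship matrix and extraction of its leading eigenvector, can be illustrated with a minimal sketch. This is not the chapter's own code: the function names, the two-stratum simulation, and all parameter values (20 subjects per stratum, 300 markers, a frequency shift of 0.15) are assumptions chosen only for illustration. The matrix is built from allele counts standardized by 2p and sqrt(2p(1-p)), a commonly used form, and the eigenvector is found by simple power iteration using only the Python standard library.

```python
import random


def simulate_genotypes(n_per_stratum, n_markers, shift, rng):
    """Simulate allele-A counts (0, 1, 2) for two hidden strata (illustrative)."""
    base = [rng.uniform(0.3, 0.7) for _ in range(n_markers)]
    direction = [rng.choice((-1.0, 1.0)) for _ in range(n_markers)]
    genotypes, strata = [], []
    for stratum, sign in ((0, 1.0), (1, -1.0)):
        for _ in range(n_per_stratum):
            row = []
            for p0, d in zip(base, direction):
                p = p0 + sign * d * shift  # stratum-specific frequency of A
                row.append(int(rng.random() < p) + int(rng.random() < p))
            genotypes.append(row)
            strata.append(stratum)
    return genotypes, strata


def relationship_matrix(genotypes):
    """Empirical relationship matrix from standardized allele counts."""
    n, m = len(genotypes), len(genotypes[0])
    z = [[0.0] * m for _ in range(n)]
    used = 0
    for k in range(m):
        p = sum(row[k] for row in genotypes) / (2.0 * n)  # sample frequency of A
        if p <= 0.0 or p >= 1.0:
            continue  # monomorphic marker carries no information
        sd = (2.0 * p * (1.0 - p)) ** 0.5
        for i in range(n):
            z[i][k] = (genotypes[i][k] - 2.0 * p) / sd
        used += 1
    return [[sum(a * b for a, b in zip(z[i], z[j])) / used for j in range(n)]
            for i in range(n)]


def leading_eigenvector(matrix, n_iter=200):
    """Power iteration for the eigenvector with the largest eigenvalue."""
    n = len(matrix)
    v = [float(i + 1) for i in range(n)]  # arbitrary non-constant start vector
    for _ in range(n_iter):
        w = [sum(matrix[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    return v


rng = random.Random(2014)  # fixed seed so the sketch is reproducible
genotypes, strata = simulate_genotypes(n_per_stratum=20, n_markers=300,
                                       shift=0.15, rng=rng)
K = relationship_matrix(genotypes)
pc1 = leading_eigenvector(K)
for stratum in (0, 1):
    scores = [x for x, s in zip(pc1, strata) if s == stratum]
    print(f"stratum {stratum}: mean PC1 score = {sum(scores) / len(scores):+.3f}")
```

In a fixed effects analysis of the kind described above, a vector such as `pc1` would be entered as an adjustment covariate in the single marker regression. In a random effects analysis, a matrix such as `K` would instead serve as the covariance structure of the random effects.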

4.1 Electronic Supplementary Material

Below is the link to the electronic supplementary material.

chapter4 (ZIP 49.9 KB)

Copyright information

© 2014 Springer Science+Business Media New York

About this chapter

Cite this chapter

Stram, D.O. (2014). Correcting for Hidden Population Structure in Single Marker Association Testing and Estimation. In: Design, Analysis, and Interpretation of Genome-Wide Association Scans. Statistics for Biology and Health. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-9443-0_4
