Genetic Association Studies

Abstract

In genetic association studies, one analyzes associations between a (potentially very large) set of genetic markers and a phenotype of interest. This is a particular multiple test problem with several challenging aspects, for instance the high dimensionality of the statistical parameter and the discreteness of the statistical model. In this chapter, we discuss how to fine-tune the multiple tests described theoretically in Part I in order to address these challenges. In particular, we propose the use of realized randomized \(p\)-values in data-adaptive multiple tests and show how linkage disequilibrium among genetic markers can be exploited to construct simultaneous test procedures and to establish probability bounds which lead to effective numbers of tests. Finally, we analyze (positive) dependency properties among test statistics and the applicability of standard margin-based multiple tests. The methods are applied to two real-life datasets.

Acknowledgments

This chapter makes use of data generated by the Wellcome Trust Case Control Consortium. A full list of the investigators who contributed to the generation of the data is available from http://www.wtccc.org.uk. Funding for the Wellcome Trust Case Control Consortium project was provided by the Wellcome Trust under award 076113. Parts of this chapter originated from joint work with Klaus Straßburger, Daniel Schunk, Carlos Morcillo-Suarez, Thomas Illig, Arcadi Navarro and Jens Stange. I am grateful to Mette Langaas and Øyvind Bakke for inviting me to the Norwegian University of Science and Technology (NTNU), for their hospitality during my visit, for many fruitful discussions, and for some valuable references.

Copyright information

© Springer-Verlag Berlin Heidelberg 2014

Authors and Affiliations

  1. Research Group “Stochastic Algorithms and Nonparametric Statistics”, Weierstrass Institute for Applied Analysis and Stochastics, Berlin, Germany
