Introduction to Statistical Methods for Integrative Data Analysis in Genome-Wide Association Studies

Big Data Analytics in Genomics

Abstract

Life scientists have long sought genetic variants associated with complex phenotypes in order to advance our understanding of complex genetic disorders. In the past decade, genome-wide association studies (GWASs) have identified many thousands of genetic variants, each associated with at least one complex phenotype. Despite these successes, one major challenge remains in fully characterizing the biological mechanisms of complex diseases. It has long been hypothesized that many complex diseases are driven by the combined effect of many genetic variants, each of which may have only a small effect, a phenomenon formally known as “polygenicity.” Identifying these variants requires large sample sizes, which are usually beyond the capacity of a single GWAS. With the arrival of the big data era, many genomic consortia are generating enormous amounts of data to characterize the functional roles of genetic variants, and these data are widely available to the public. Integrating rich genomic data to deepen our understanding of genetic architecture calls for statistically rigorous methods for big genomic data analysis. In this book chapter, we present a brief introduction to recent progress in the development of statistical methodology for integrating genomic data. Our introduction begins with the discovery of polygenic genetic architecture and aims to provide a unified statistical framework for integrative analysis. In particular, we highlight the importance of integrative analysis of multiple GWASs and functional information. We believe that statistically rigorous integrative analysis can offer more biologically interpretable inference and drive new scientific insights.

Acknowledgements

This work was supported in part by grant No. 61501389 from the National Natural Science Foundation of China (NSFC), grants HKBU_22302815 and HKBU_12202114 from the Hong Kong Research Grant Council, grants FRG2/14-15/069, FRG2/15-16/011, and FRG2/14-15/077 from Hong Kong Baptist University, and Duke-NUS Medical School WBS: R-913-200-098-263.

Author information

Corresponding author

Correspondence to Can Yang.

Copyright information

© 2016 Springer International Publishing Switzerland

About this chapter

Cite this chapter

Yang, C., Wan, X., Liu, J., Ng, M. (2016). Introduction to Statistical Methods for Integrative Data Analysis in Genome-Wide Association Studies. In: Wong, KC. (eds) Big Data Analytics in Genomics. Springer, Cham. https://doi.org/10.1007/978-3-319-41279-5_1

  • DOI: https://doi.org/10.1007/978-3-319-41279-5_1

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-41278-8

  • Online ISBN: 978-3-319-41279-5

  • eBook Packages: Computer Science, Computer Science (R0)
