Abstract
Scientists in the life sciences have long sought genetic variants associated with complex phenotypes in order to advance our understanding of complex genetic disorders. In the past decade, genome-wide association studies (GWASs) have identified many thousands of genetic variants, each associated with at least one complex phenotype. Despite these successes, one major challenge stands in the way of fully characterizing the biological mechanisms of complex diseases. It has long been hypothesized that many complex diseases are driven by the combined effect of many genetic variants, each of which may have only a small effect, a property formally known as “polygenicity.” Identifying such variants requires large sample sizes, which are usually beyond the capacity of a single GWAS. With the arrival of the big-data era, many genomic consortia are generating enormous amounts of data to characterize the functional roles of genetic variants, and these data are widely available to the public. Integrating these rich genomic data to deepen our understanding of genetic architecture calls for statistically rigorous methods for big genomic data analysis. In this book chapter, we present a brief introduction to recent progress in the development of statistical methodology for integrating genomic data. Our introduction begins with the discovery of polygenic genetic architecture and aims to provide a unified statistical framework for integrative analysis. In particular, we highlight the importance of integrative analysis of multiple GWASs and functional information. We believe that statistically rigorous integrative analysis can offer more biologically interpretable inference and drive new scientific insights.
Acknowledgements
This work was supported in part by grant No. 61501389 from the National Natural Science Foundation of China (NSFC), grants HKBU_22302815 and HKBU_12202114 from the Hong Kong Research Grant Council, grants FRG2/14-15/069, FRG2/15-16/011, and FRG2/14-15/077 from Hong Kong Baptist University, and Duke-NUS Medical School WBS: R-913-200-098-263.
Copyright information
© 2016 Springer International Publishing Switzerland
About this chapter
Cite this chapter
Yang, C., Wan, X., Liu, J., Ng, M. (2016). Introduction to Statistical Methods for Integrative Data Analysis in Genome-Wide Association Studies. In: Wong, KC. (eds) Big Data Analytics in Genomics. Springer, Cham. https://doi.org/10.1007/978-3-319-41279-5_1
DOI: https://doi.org/10.1007/978-3-319-41279-5_1
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-41278-8
Online ISBN: 978-3-319-41279-5
eBook Packages: Computer Science, Computer Science (R0)