Improving genetic risk prediction by leveraging pleiotropy

Original Investigation
Human Genetics

Abstract

An important task of human genetics studies is to accurately predict disease risk in individuals based on genetic markers. Accurate prediction makes it possible to identify individuals at high disease risk and to facilitate their disease treatment and prevention. Although hundreds of genome-wide association studies (GWAS) have been conducted on many complex human traits in recent years, there has been only limited success in translating these GWAS data into clinically useful risk prediction models. The predictive capability of GWAS data is largely bottlenecked by the available training sample size, because numerous variants carry only small to modest effects. Recent studies have shown that different human traits may share common genetic bases. Therefore, an attractive strategy to increase the training sample size, and hence improve prediction accuracy, is to integrate data from genetically correlated phenotypes. Yet, the utility of genetic correlation in risk prediction has not been explored in the literature. In this paper, we analyzed GWAS data for bipolar and related disorders and for schizophrenia with a bivariate ridge regression method, and found that jointly predicting the two phenotypes could substantially increase prediction accuracy, as measured by the area under the receiver operating characteristic curve (AUC). We also found similar improvements in prediction accuracy when we jointly analyzed GWAS data for Crohn’s disease and ulcerative colitis. These empirical observations were substantiated by our comprehensive simulation studies, which suggest that prediction accuracy can be gained by combining phenotypes with relatively high genetic correlations. Through both real data and simulation studies, we demonstrated that pleiotropy can be leveraged as a valuable asset that opens up a new opportunity to improve genetic risk prediction in the future.
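The abstract does not spell out the form of the bivariate ridge regression model. One common way to let two traits share information in a ridge framework, in the spirit of multivariate ridge regression, is to give each SNP's pair of effects a joint Gaussian prior (β1j, β2j) ~ N(0, Σ_B), whose off-diagonal term encodes the genetic covariance. With disjoint cohorts, the joint estimate is β̂ = (XᵀR⁻¹X + Σ_B⁻¹ ⊗ I_p)⁻¹ XᵀR⁻¹y, where X is block-diagonal in the two cohorts' genotype matrices X1 and X2, and R holds the residual variances. The Python sketch below implements that assumed model and scores predictions by AUC. It is an illustration under stated assumptions, not the authors' implementation: the function names (`bivariate_ridge`, `auc`, `simulate`), the known-variance assumption, and all simulation settings are hypothetical.

```python
import numpy as np


def bivariate_ridge(X1, y1, X2, y2, sigma_b, noise_var=(1.0, 1.0)):
    """Joint ridge estimate of per-SNP effects for two traits (hypothetical helper).

    Assumes disjoint cohorts (X1, y1) and (X2, y2) genotyped on the same p SNPs,
    and a Gaussian prior (beta1_j, beta2_j) ~ N(0, sigma_b), independent across
    SNPs j. The off-diagonal of the 2x2 matrix sigma_b is the assumed genetic
    covariance that lets each trait borrow strength from the other.
    """
    p = X1.shape[1]
    s1, s2 = noise_var
    # Data term X^T R^{-1} X is block diagonal because the cohorts are disjoint.
    A = np.zeros((2 * p, 2 * p))
    A[:p, :p] = X1.T @ X1 / s1
    A[p:, p:] = X2.T @ X2 / s2
    # Penalty Sigma_B^{-1} kron I_p couples beta1_j with beta2_j for each SNP j.
    A += np.kron(np.linalg.inv(sigma_b), np.eye(p))
    b = np.concatenate([X1.T @ y1 / s1, X2.T @ y2 / s2])
    beta = np.linalg.solve(A, b)
    return beta[:p], beta[p:]


def auc(scores, labels):
    """ROC AUC via the Mann-Whitney statistic; tied scores count as 1/2."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    pos, neg = scores[labels], scores[~labels]
    wins = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (wins + 0.5 * ties) / (len(pos) * len(neg))


# Toy simulation; every setting here is an illustrative assumption.
rng = np.random.default_rng(0)
p, n_train, n_test, rho = 200, 300, 1000, 0.8
sigma_b = 0.01 * np.array([[1.0, rho], [rho, 1.0]])
effects = rng.multivariate_normal([0.0, 0.0], sigma_b, size=p)  # shape (p, 2)


def simulate(n, k, prevalence=0.2):
    """Centered genotypes and 0/1 disease status for trait k via a liability threshold."""
    X = rng.binomial(2, 0.3, size=(n, p)).astype(float)
    X -= X.mean(axis=0)
    liability = X @ effects[:, k] + rng.normal(size=n)
    cases = liability > np.quantile(liability, 1 - prevalence)
    return X, cases.astype(float)


X1, d1 = simulate(n_train, 0)
X2, d2 = simulate(n_train, 1)
X_test, d_test = simulate(n_test, 0)

# Disease status is fit with a linear model on centered 0/1 labels (a simplification;
# the shrinkage settings are not tuned for this scale).
y1, y2 = d1 - d1.mean(), d2 - d2.mean()
joint1, _ = bivariate_ridge(X1, y1, X2, y2, sigma_b)
single1, _ = bivariate_ridge(X1, y1, X2, y2, np.diag(np.diag(sigma_b)))
print("AUC, trait 1 alone:", auc(X_test @ single1, d_test))
print("AUC, joint with trait 2:", auc(X_test @ joint1, d_test))
```

Setting the off-diagonal of Σ_B to zero reduces the joint fit to two independent ridge regressions with penalties σ²_k/τ_k (τ_k being the diagonal entries of Σ_B), which is a convenient sanity check. In practice Σ_B and the residual variances would have to be estimated, for example by REML, rather than assumed known.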

Figs. 1–9 (images not available)

Acknowledgments

This study was supported by NIH grants R01 AA11330, AA017535, DA030976, GM59507, VA Cooperative Studies Program 572, and a fellowship from the China Scholarship Council. We also thank the Yale University High Performance Computing Center (funded by NIH RR19895) for computational resources and data storage.

Funding support for the Whole Genome Association Study of Bipolar Disorder was provided by the National Institute of Mental Health (NIMH) and the genotyping of samples was provided through the Genetic Association Information Network (GAIN). The datasets used for the analyses described in this manuscript were obtained from the database of Genotypes and Phenotypes (dbGaP) found at http://www.ncbi.nlm.nih.gov/gap through dbGaP accession number phs000017.v3.p1. Samples and associated phenotype data for the Collaborative Genomic Study of Bipolar Disorder were provided by the NIMH Genetics Initiative for Bipolar Disorder. Data and biomaterials were collected in four projects that participated in the NIMH Bipolar Disorder Genetics Initiative. From 1991–98, the Principal Investigators and Co-Investigators were: Indiana University, Indianapolis, IN, U01 MH46282, John Nurnberger, M.D., Ph.D., Marvin Miller, M.D., and Elizabeth Bowman, M.D.; Washington University, St. Louis, MO, U01 MH46280, Theodore Reich, M.D., Allison Goate, Ph.D., and John Rice, Ph.D.; Johns Hopkins University, Baltimore, MD, U01 MH46274, J. Raymond DePaulo, Jr., M.D., Sylvia Simpson, M.D., MPH, and Colin Stine, Ph.D.; NIMH Intramural Research Program, Clinical Neurogenetics Branch, Bethesda, MD, Elliot Gershon, M.D., Diane Kazuba, B.A., and Elizabeth Maxwell, M.S.W. Data and biomaterials were collected as part of ten projects that participated in the NIMH Bipolar Disorder Genetics Initiative. From 1999–03, the Principal Investigators and Co-Investigators were: Indiana University, Indianapolis, IN, R01 MH59545, John Nurnberger, M.D., Ph.D., Marvin J. Miller, M.D., Elizabeth S. Bowman, M.D., N. Leela Rau, M.D., P. Ryan Moe, M.D., Nalini Samavedy, M.D., Rif El-Mallakh, M.D. (at University of Louisville), Husseini Manji, M.D. (at Wayne State University), Debra A. Glitz, M.D. (at Wayne State University), Eric T. Meyer, M.S., Carrie Smiley, R.N., Tatiana Foroud, Ph.D., Leah Flury, M.S., Danielle M. Dick, Ph.D., Howard Edenberg, Ph.D.; Washington University, St. Louis, MO, R01 MH059534, John Rice, Ph.D., Theodore Reich, M.D., Allison Goate, Ph.D., Laura Bierut, M.D.; Johns Hopkins University, Baltimore, MD, R01 MH59533, Melvin McInnis M.D., J. Raymond DePaulo, Jr., M.D., Dean F. MacKinnon, M.D., Francis M. Mondimore, M.D., James B. Potash, M.D., Peter P. Zandi, Ph.D., Dimitrios Avramopoulos, and Jennifer Payne; University of Pennsylvania, PA, R01 MH59553, Wade Berrettini M.D., Ph.D.; University of California at Irvine, CA, R01 MH60068, William Byerley M.D., and Mark Vawter M.D.; University of Iowa, IA, R01 MH059548, William Coryell M.D., and Raymond Crowe M.D.; University of Chicago, IL, R01 MH59535, Elliot Gershon, M.D., Judith Badner Ph.D., Francis McMahon M.D., Chunyu Liu Ph.D., Alan Sanders M.D., Maria Caserta, Steven Dinwiddie M.D., Tu Nguyen, Donna Harakal; University of California at San Diego, CA, R01 MH59567, John Kelsoe, M.D., Rebecca McKinney, B.A.; Rush University, IL, R01 MH059556, William Scheftner M.D., Howard M. Kravitz, D.O., M.P.H., Diana Marta, B.S., Annette Vaughn-Brown, MSN, RN, and Laurie Bederow, MA; NIMH Intramural Research Program, Bethesda, MD, 1Z01MH002810-01, Francis J. McMahon, M.D., Layla Kassem, PsyD, Sevilla Detera-Wadleigh, Ph.D., Lisa Austin, Ph.D., Dennis L. Murphy, M.D.
Funding support for the Genome-Wide Association of Schizophrenia Study was provided by the National Institute of Mental Health (R01 MH67257, R01 MH59588, R01 MH59571, R01 MH59565, R01 MH59587, R01 MH60870, R01 MH59566, R01 MH59586, R01 MH61675, R01 MH60879, R01 MH81800, U01 MH46276, U01 MH46289, U01 MH46318, U01 MH79469, and U01 MH79470) and the genotyping of samples was provided through the Genetic Association Information Network (GAIN). The datasets used for the analyses described in this manuscript were obtained from the database of Genotypes and Phenotypes (dbGaP) found at http://www.ncbi.nlm.nih.gov/gap through dbGaP accession number phs000021.v3.p2. Samples and associated phenotype data for the Genome-Wide Association of Schizophrenia Study were provided by the Molecular Genetics of Schizophrenia Collaboration (PI: Pablo V. Gejman, Evanston Northwestern Healthcare (ENH) and Northwestern University, Evanston, IL, USA). The NIDDK IBD Genetics Consortium Crohn’s Disease Genome-Wide Association Study was conducted by the NIDDK IBD Genetics Consortium Crohn’s Disease Genome-Wide Association Study Investigators and supported by the National Institute of Diabetes and Digestive and Kidney Diseases (NIDDK). The data and samples from the NIDDK IBD Genetics Consortium Crohn’s Disease Genome-Wide Association Study reported here were supplied by the NIDDK Central Repositories. The datasets used for the analyses described in this manuscript were obtained from the database of Genotypes and Phenotypes (dbGaP) found at http://www.ncbi.nlm.nih.gov/gap through dbGaP accession number phs000130.v1.p1. This manuscript was not prepared in collaboration with Investigators of the NIDDK IBD Genetics Consortium Crohn’s Disease Genome-Wide Association Study and does not necessarily reflect the opinions or views of the NIDDK IBD Genetics Consortium Crohn’s Disease Genome-Wide Association Study, the NIDDK Central Repositories, or the NIDDK.
The NIDDK IBD Genetics Consortium Ulcerative Colitis Genome-Wide Association Study was conducted by the NIDDK IBD Genetics Consortium Ulcerative Colitis Genome-Wide Association Study Investigators and supported by the National Institute of Diabetes and Digestive and Kidney Diseases (NIDDK). The data and samples from the NIDDK IBD Genetics Consortium Ulcerative Colitis Genome-Wide Association Study reported here were supplied by the NIDDK Central Repositories. The datasets used for the analyses described in this manuscript were obtained from the database of Genotypes and Phenotypes (dbGaP) found at http://www.ncbi.nlm.nih.gov/gap through dbGaP accession number phs000345.v1.p1. This manuscript was not prepared in collaboration with Investigators of the NIDDK IBD Genetics Consortium Ulcerative Colitis Genome-Wide Association Study and does not necessarily reflect the opinions or views of the NIDDK IBD Genetics Consortium Ulcerative Colitis Genome-Wide Association Study, the NIDDK Central Repositories, or the NIDDK.

Conflict of interest

The authors declare that they have no conflict of interest.

Author information

Corresponding author

Correspondence to Hongyu Zhao.

Electronic supplementary material

Supplementary file: DOCX (387 KB).

About this article

Cite this article

Li, C., Yang, C., Gelernter, J. et al. Improving genetic risk prediction by leveraging pleiotropy. Hum Genet 133, 639–650 (2014). https://doi.org/10.1007/s00439-013-1401-5

  • DOI: https://doi.org/10.1007/s00439-013-1401-5
