Abstract
Genome-wide association studies (GWASs) have successfully identified a large number of single-nucleotide polymorphisms (SNPs) associated with many complex phenotypes in diverse populations. However, a comprehensive understanding of the genetic correlation of phenotype-associated loci across populations is still lacking, and it remains unclear to what extent associations discovered in one population can be generalized to other populations or used for trans-ethnic genetic prediction. By leveraging summary statistics, we proposed MAGIC to evaluate the trans-ethnic marginal genetic correlation (rm) of per-allele effect sizes for associated SNPs (P < 5E-8) under the framework of measurement error models. We confirmed the methodological advantage of MAGIC over general approaches through simulations and demonstrated its utility by analyzing GWAS summary statistics of 34 phenotypes from the East Asian (Nmax = 254,373) and European (Nmax = 1,220,901) populations. Among these phenotypes, rm was estimated to range from 0.584 (se = 0.140) for breast cancer to 0.949 (se = 0.035) for age at menarche, with an average of 0.835 (se = 0.045). We also found that trans-ethnic genetic prediction accuracy in the target population decreased substantially when using associated SNPs identified in non-target populations, indicating that associations discovered in one population cannot simply be generalized to another population and that the accuracy of trans-ethnic phenotype prediction is generally unsatisfactory. Overall, our study provides in-depth insight into trans-ethnic genetic correlation and prediction for complex phenotypes across diverse populations.
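To illustrate why a measurement error framework matters when correlating effect sizes across populations, the following minimal sketch applies the classical correction for attenuation to simulated per-allele effect estimates. It is not the MAGIC estimator, whose details are not given here. The function names, the simulated effect-size scale, and the standard errors are hypothetical, and the sketch assumes independent sampling errors in the two populations and ignores winner's curse from selecting SNPs at P < 5E-8.

```python
import numpy as np


def naive_and_corrected_correlation(beta1, se1, beta2, se2):
    """Return (naive, attenuation-corrected) correlation of effect estimates.

    Illustrative only: a method-of-moments correction for attenuation,
    not the estimator used in the paper.
    """
    beta1, se1, beta2, se2 = (
        np.asarray(a, dtype=float) for a in (beta1, se1, beta2, se2)
    )
    # Pearson correlation of the noisy estimates (biased toward zero).
    r_naive = float(np.corrcoef(beta1, beta2)[0, 1])

    # Observed variance = true-effect variance + mean sampling variance,
    # so the reliability is the share of observed variance that is signal.
    var1 = beta1.var(ddof=1)
    var2 = beta2.var(ddof=1)
    rel1 = (var1 - np.mean(se1 ** 2)) / var1
    rel2 = (var2 - np.mean(se2 ** 2)) / var2
    if rel1 <= 0 or rel2 <= 0:
        raise ValueError("sampling noise exceeds the observed variance")

    r_corrected = r_naive / np.sqrt(rel1 * rel2)
    return r_naive, float(np.clip(r_corrected, -1.0, 1.0))


def simulate(n_snps=5000, r_true=0.8, seed=0):
    """Simulate noisy effect estimates in two populations (hypothetical values)."""
    rng = np.random.default_rng(seed)
    # True effects with standard deviation 0.05 and correlation r_true.
    cov = 0.05 ** 2 * np.array([[1.0, r_true], [r_true, 1.0]])
    true = rng.multivariate_normal([0.0, 0.0], cov, size=n_snps)
    # The smaller study gets the larger standard error.
    se1 = np.full(n_snps, 0.03)
    se2 = np.full(n_snps, 0.02)
    b1 = true[:, 0] + rng.normal(0.0, se1)
    b2 = true[:, 1] + rng.normal(0.0, se2)
    return b1, se1, b2, se2
```

Calling `naive_and_corrected_correlation(*simulate())` shows the naive correlation falling below the simulated true value of 0.8, while the corrected value lies closer to it. In this toy setting, estimation noise alone makes effects look less shared across populations than they are.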
Data availability
All data generated or analyzed during this study are included in this published article and its supplementary information file.
Acknowledgements
We thank all the GWAS consortia for making their summary statistics publicly available and are grateful to all the investigators and participants who contributed to those studies. The data analyses in the present study were carried out on the high-performance computing cluster supported by the special central finance project of local universities for Xuzhou Medical University. We are especially grateful to two anonymous referees for their many constructive comments, which led to substantial improvements of our manuscript.
Funding
The research of Ping Zeng was supported in part by the Youth Foundation of Humanity and Social Science funded by the Ministry of Education of China (18YJC910002), the Natural Science Foundation of Jiangsu Province of China (BK20181472), the China Postdoctoral Science Foundation (2018M630607 and 2019T120465), the QingLan Research Project of Jiangsu Province for Outstanding Young Teachers, the Six-Talent Peaks Project in Jiangsu Province of China (WSN-087), the Training Project for Youth Teams of Science and Technology Innovation at Xuzhou Medical University (TD202008), the Postdoctoral Science Foundation of Xuzhou Medical University, the National Natural Science Foundation of China (81402765), and the Statistical Science Research Project from the National Bureau of Statistics of China (2014LY112). The research of Shuiping Huang was supported in part by the Social Development Project of Xuzhou City (KC19017). The research of Ting Wang was supported in part by the Social Development Project of Xuzhou City (KC20062).
Author information
Contributions
PZ conceived the idea for the study. PZ, TW, and SH obtained and cleaned the datasets. PZ, HL, TW, JZ, and SZ performed the data analyses. PZ, HL, and TW interpreted the results of the data analyses. PZ and HL wrote the manuscript with help from the other authors.
Ethics declarations
Conflict of interest
The authors declare no conflicts of interest.
Consent for publication
All the authors agreed that this manuscript be submitted to the journal Human Genetics for publication.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Cite this article
Lu, H., Wang, T., Zhang, J. et al. Evaluating marginal genetic correlation of associated loci for complex diseases and traits between European and East Asian populations. Hum Genet 140, 1285–1297 (2021). https://doi.org/10.1007/s00439-021-02299-8