Evaluating marginal genetic correlation of associated loci for complex diseases and traits between European and East Asian populations

Abstract

Genome-wide association studies (GWASs) have successfully identified a large number of single-nucleotide polymorphisms (SNPs) associated with many complex phenotypes in diverse populations. However, a comprehensive understanding of the genetic correlation of phenotype-associated loci across populations is still lacking, and it remains unclear to what extent associations discovered in one population can be generalized to other populations or used for trans-ethnic genetic prediction. By leveraging summary statistics, we proposed MAGIC to evaluate the trans-ethnic marginal genetic correlation (rm) of per-allele effect sizes for associated SNPs (P < 5E-8) under the framework of measurement error models. We confirmed the methodological advantage of MAGIC over general approaches through simulations and demonstrated its utility by analyzing 34 GWAS summary statistics of phenotypes from the East Asian (Nmax = 254,373) and European (Nmax = 1,220,901) populations. Among these phenotypes, rm was estimated to range from 0.584 (se = 0.140) for breast cancer to 0.949 (se = 0.035) for age at menarche, with an average of 0.835 (se = 0.045). We also found that the accuracy of trans-ethnic genetic prediction in the target population dropped substantially when using associated SNPs identified in non-target populations, indicating that associations discovered in one population cannot simply be generalized to another population and that the accuracy of trans-ethnic phenotype prediction is generally unsatisfactory. Overall, our study provides in-depth insight into trans-ethnic genetic correlation and prediction for complex phenotypes across diverse populations.
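To illustrate why a measurement error framework matters here, the sketch below applies the classical correction for attenuation to two sets of estimated per-allele effect sizes. It is not the MAGIC estimator itself, whose details are not given in this abstract. It only assumes that each estimated effect equals the true effect plus independent sampling noise whose variance is the squared standard error, and that the two GWASs are independent. The function name, argument names and toy numbers are hypothetical.

```python
import math


def disattenuated_correlation(b1, se1, b2, se2):
    """Return (naive, corrected) correlation of two effect-size vectors.

    Assumption (not taken from the paper): estimate = true effect + noise,
    with noise variance se**2 and noise independent across the two studies.
    """
    n = len(b1)
    if not (n == len(se1) == len(b2) == len(se2)) or n < 3:
        raise ValueError("need at least 3 SNPs and equal-length inputs")

    m1 = sum(b1) / n
    m2 = sum(b2) / n
    cov = sum((x - m1) * (y - m2) for x, y in zip(b1, b2)) / (n - 1)
    var1 = sum((x - m1) ** 2 for x in b1) / (n - 1)
    var2 = sum((y - m2) ** 2 for y in b2) / (n - 1)

    # Naive Pearson correlation of the noisy estimates (biased toward zero).
    naive = cov / math.sqrt(var1 * var2)

    # Subtract the average sampling variance to approximate the variance
    # of the underlying true effects.
    true_var1 = var1 - sum(s * s for s in se1) / n
    true_var2 = var2 - sum(s * s for s in se2) / n
    if true_var1 <= 0 or true_var2 <= 0:
        raise ValueError("sampling noise exceeds observed variance")

    corrected = cov / math.sqrt(true_var1 * true_var2)
    # The moment estimator can leave [-1, 1] in small samples, so clip it.
    corrected = max(-1.0, min(1.0, corrected))
    return naive, corrected


# Toy effect sizes for five SNPs in two populations (made-up numbers).
beta_eas = [0.10, -0.05, 0.08, -0.12, 0.03]
beta_eur = [0.08, -0.02, 0.05, -0.10, 0.06]
se = [0.01] * 5

naive_r, corrected_r = disattenuated_correlation(beta_eas, se, beta_eur, se)
print(f"naive r = {naive_r:.3f}, corrected r = {corrected_r:.3f}")
```

Because sampling noise inflates the observed variances but not the covariance, the naive correlation is pulled toward zero, and the corrected value is larger in magnitude. With zero standard errors the two values coincide.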

Data availability

All data generated or analyzed during this study are included in this published article and its supplementary information file.


Acknowledgements

We thank all the GWAS consortia for making their summary statistics publicly available and are grateful to all the investigators and participants who contributed to those studies. The data analyses in the present study were carried out on the high-performance computing cluster supported by the special central finance project of local universities for Xuzhou Medical University. We are especially grateful to two anonymous referees for their many constructive comments, which led to substantial improvements of our manuscript.

Funding

The research of Ping Zeng was supported in part by the Youth Foundation of Humanity and Social Science funded by the Ministry of Education of China (18YJC910002), the Natural Science Foundation of Jiangsu Province of China (BK20181472), the China Postdoctoral Science Foundation (2018M630607 and 2019T120465), the QingLan Research Project of Jiangsu Province for Outstanding Young Teachers, the Six-Talent Peaks Project in Jiangsu Province of China (WSN-087), the Training Project for Youth Teams of Science and Technology Innovation at Xuzhou Medical University (TD202008), the Postdoctoral Science Foundation of Xuzhou Medical University, the National Natural Science Foundation of China (81402765), and the Statistical Science Research Project from the National Bureau of Statistics of China (2014LY112). The research of Shuiping Huang was supported in part by the Social Development Project of Xuzhou City (KC19017). The research of Ting Wang was supported in part by the Social Development Project of Xuzhou City (KC20062).

Author information

Contributions

PZ conceived the idea for the study. PZ, TW, and SH obtained and cleaned the datasets. PZ, HL, TW, JZ, and SZ performed the data analyses. PZ, HL, and TW interpreted the results of the data analyses. PZ and HL wrote the manuscript with help from the other authors.

Corresponding author

Correspondence to Ping Zeng.

Ethics declarations

Conflict of interest

The authors declare no conflicts of interest.

Consent for publication

All the authors agreed that this manuscript be submitted to the journal Human Genetics for publication.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Below is the link to the electronic supplementary material.

Supplementary file1 (DOCX 5147 KB)

About this article

Cite this article

Lu, H., Wang, T., Zhang, J. et al. Evaluating marginal genetic correlation of associated loci for complex diseases and traits between European and East Asian populations. Hum Genet (2021). https://doi.org/10.1007/s00439-021-02299-8
