Principal Component Analysis Reduces Collider Bias in Polygenic Score Effect Size Estimation

Abstract

In this study, we test principal component analysis (PCA) of measured confounders as a method to reduce collider bias in polygenic association models. We present results from simulations and application of the method in the Collaborative Study on the Genetics of Alcoholism (COGA) sample with a polygenic score for alcohol problems, DSM-5 alcohol use disorder as the target phenotype, and two collider variables: tobacco use and educational attainment. Simulation results suggest that assumptions regarding the correlation structure and availability of measured confounders are complementary, such that meeting one assumption relaxes the other. Application of the method in COGA shows that PC covariates reduce collider bias when tobacco use is used as the collider variable. Application of this method may improve polygenic score effect size estimation in some cases by reducing the effect of collider bias, making efficient use of data resources that are available in many studies.
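
The idea summarized in the abstract can be illustrated with a small simulation. The sketch below is not the authors' implementation; it is a minimal Python example under assumed conditions: a single unmeasured confounder `u`, a collider influenced by both the polygenic score and `u`, and six noisy measured indicators of `u`. All effect sizes, the sample size, the number of indicators, and the function names (`ols_coefs`, `first_pcs`, `simulate`) are hypothetical choices made for illustration only.

```python
import numpy as np


def ols_coefs(y, *covariates):
    """Ordinary least squares with an intercept; returns [intercept, b1, b2, ...]."""
    X = np.column_stack([np.ones(len(y)), *covariates])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta


def first_pcs(X, k=1):
    """Scores on the first k principal components of the standardized columns of X."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    U, S, _ = np.linalg.svd(Z, full_matrices=False)
    return U[:, :k] * S[:k]


def simulate(n=20000, seed=1):
    rng = np.random.default_rng(seed)
    true_beta = 0.3                      # assumed true polygenic score effect

    u = rng.normal(size=n)               # unmeasured confounder
    pgs = rng.normal(size=n)             # polygenic score, independent of u
    # The collider is caused by both the polygenic score and the confounder.
    collider = 0.5 * pgs + 0.5 * u + rng.normal(size=n)
    # The outcome depends on the polygenic score and the confounder only.
    y = true_beta * pgs + 0.5 * u + rng.normal(size=n)

    # Six measured variables that each partly reflect the confounder.
    measured = 0.8 * u[:, None] + 0.6 * rng.normal(size=(n, 6))
    pcs = first_pcs(measured, k=1)

    return {
        "true": true_beta,
        # No collider in the model: no collider bias in this setup.
        "unadjusted": ols_coefs(y, pgs)[1],
        # Conditioning on the collider links pgs and u, biasing the estimate.
        "collider": ols_coefs(y, pgs, collider)[1],
        # Adding the PC of measured confounders absorbs much of that link.
        "collider_plus_pc": ols_coefs(y, pgs, collider, pcs)[1],
    }


if __name__ == "__main__":
    for name, value in simulate().items():
        print(f"{name:>18}: {value:.3f}")
```

In this setup, the polygenic score coefficient from the model that includes the collider should fall noticeably below the assumed true value, while the model that also includes the principal component should land much closer to it. How much bias is removed depends on how well the measured variables capture the confounder, which is consistent with the abstract's point about the availability and correlation structure of measured confounders.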

Availability of Data and Material

Data from the Collaborative Study on the Genetics of Alcoholism (COGA) are available via dbGaP (phs000763.v1.p1, phs000125.v1.p1) or through the National Institute on Alcohol Abuse and Alcoholism.

Code Availability

The R scripts used in this work are available on GitHub at https://github.com/thomasns0/PCA_Collider.git.

Acknowledgements

The Collaborative Study on the Genetics of Alcoholism (COGA), Principal Investigators B. Porjesz, V. Hesselbrock, T. Foroud; Scientific Director, A. Agrawal; Translational Director, D. Dick, includes eleven different centers: University of Connecticut (V. Hesselbrock); Indiana University (H.J. Edenberg, T. Foroud, Y. Liu, M. Plawecki); University of Iowa Carver College of Medicine (S. Kuperman, J. Kramer); SUNY Downstate Health Sciences University (B. Porjesz, J. Meyers, C. Kamarajan, A. Pandey); Washington University in St. Louis (L. Bierut, J. Rice, K. Bucholz, A. Agrawal); University of California at San Diego (M. Schuckit); Rutgers University (J. Tischfield, R. Hart, J. Salvatore); The Children’s Hospital of Philadelphia, University of Pennsylvania (L. Almasy); Virginia Commonwealth University (D. Dick); Icahn School of Medicine at Mount Sinai (A. Goate, P. Slesinger); and Howard University (D. Scott). Other COGA collaborators include: L. Bauer (University of Connecticut); J. Nurnberger Jr., L. Wetherill, X. Xuei, D. Lai, S. O’Connor (Indiana University); G. Chan (University of Iowa; University of Connecticut); D.B. Chorlian, J. Zhang, P. Barr, S. Kinreich, G. Pandey (SUNY Downstate); N. Mullins (Icahn School of Medicine at Mount Sinai); A. Anokhin, S. Hartz, E. Johnson, V. McCutcheon, S. Saccone (Washington University); J. Moore, Z. Pang, S. Kuo (Rutgers University); A. Merikangas (The Children’s Hospital of Philadelphia and University of Pennsylvania); F. Aliev (Virginia Commonwealth University); H. Chin and A. Parsian are the NIAAA Staff Collaborators. We continue to be inspired by our memories of Henri Begleiter and Theodore Reich, founding PI and Co-PI of COGA, and also owe a debt of gratitude to other past organizers of COGA, including Ting-Kai Li, P. Michael Conneally, Raymond Crowe, and Wendy Reich, for their critical contributions.
This national collaborative study is supported by NIH Grant U10AA008401 from the National Institute on Alcohol Abuse and Alcoholism (NIAAA) and the National Institute on Drug Abuse (NIDA). This work was also supported by the National Institutes of Health (NIH) Grants R01AA028064 (PI: Salvatore) and K01AA024152 (PI: Salvatore) from the National Institute on Alcohol Abuse and Alcoholism (NIAAA).

Funding

This work was supported by the National Institutes of Health (NIH) Grants R01AA028064 (PI: Salvatore) and K01AA024152 (PI: Salvatore) from the National Institute on Alcohol Abuse and Alcoholism (NIAAA). The Collaborative Study on the Genetics of Alcoholism (COGA) is supported by NIH Grant U10AA008401 (PI: Porjesz).

Author information

Contributions

Nathaniel S. Thomas: conceived of the study, conducted statistical analyses, and wrote the manuscript. Peter Barr, Fazil Aliev, Mallory Stephenson, and Sally I-Chun Kuo assisted with the design and implementation of the study and provided editorial feedback on the whole manuscript. Grace Chan, Danielle M. Dick, Howard J. Edenberg, Victor Hesselbrock, and Chella Kamarajan provided editorial feedback on the whole manuscript. Jessica E. Salvatore supervised the design and implementation of the study and provided editorial feedback on the whole manuscript. All authors contributed to and have approved the final manuscript.

Corresponding author

Correspondence to Nathaniel S. Thomas.

Ethics declarations

Conflicts of Interest/Competing Interests

Nathaniel S. Thomas, Peter Barr, Fazil Aliev, Mallory Stephenson, Sally I-Chun Kuo, Grace Chan, Danielle M. Dick, Howard J. Edenberg, Victor Hesselbrock, Chella Kamarajan, and Jessica E. Salvatore declare that they have no conflicts of interest.

Ethics Approval

The Institutional Review Boards at all data collection sites approved the study.

Consent to Participate

Written consent was obtained from all participants.

Consent for publication

Not applicable.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Edited by Valerie Knopik.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary Material 1

Supplementary Material 2

About this article

Cite this article

Thomas, N.S., Barr, P., Aliev, F. et al. Principal Component Analysis Reduces Collider Bias in Polygenic Score Effect Size Estimation. Behav Genet 52, 268–280 (2022). https://doi.org/10.1007/s10519-022-10104-z

  • DOI: https://doi.org/10.1007/s10519-022-10104-z

Keywords