Abstract
In this study, we test principal component analysis (PCA) of measured confounders as a method to reduce collider bias in polygenic association models. We present results from simulations and from an application of the method in the Collaborative Study on the Genetics of Alcoholism (COGA) sample, using a polygenic score (PRS) for alcohol problems, DSM-5 alcohol use disorder as the target phenotype, and two collider variables: tobacco use and educational attainment. Simulation results suggest that the assumptions regarding the correlation structure and the availability of measured confounders are complementary, such that meeting one assumption relaxes the other. In COGA, principal component (PC) covariates reduce collider bias when tobacco use is the collider variable. Applying this method may improve PRS effect size estimation in some cases by reducing collider bias, while making efficient use of data resources that are available in many studies.
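The idea summarized in the abstract can be illustrated with a toy simulation. The sketch below is not the authors' simulation design or their R code. It assumes a single latent confounder `u`, a continuous outcome, six noisy measured indicators of `u`, and a fixed choice of one retained component. All names (`simulate`, `first_pcs`, `ols_coefs`) and parameter values are hypothetical. It compares the PRS coefficient from three linear models: without the collider, with the collider, and with the collider plus the first PC of the measured confounders.

```python
import numpy as np


def ols_coefs(y, *columns):
    """Return OLS coefficients (intercept first) of y on the given columns."""
    X = np.column_stack([np.ones(len(y)), *columns])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta


def first_pcs(measured, k=1):
    """Scores on the first k principal components of the standardized columns."""
    z = (measured - measured.mean(axis=0)) / measured.std(axis=0)
    _, _, vt = np.linalg.svd(z, full_matrices=False)
    return z @ vt[:k].T


def simulate(n=20_000, n_measured=6, true_beta=0.3, seed=0):
    """Toy data-generating model (an assumption, not the paper's design)."""
    rng = np.random.default_rng(seed)
    prs = rng.standard_normal(n)
    u = rng.standard_normal(n)  # latent confounder, never used as a covariate
    # Outcome depends on the PRS and on the latent confounder.
    y = true_beta * prs + u + rng.standard_normal(n)
    # Collider: a common effect of the PRS and the latent confounder.
    collider = prs + u + rng.standard_normal(n)
    # Measured confounders: noisy indicators of the latent confounder.
    measured = u[:, None] + rng.standard_normal((n, n_measured))
    pcs = first_pcs(measured, k=1)
    return {
        "no_collider": ols_coefs(y, prs)[1],
        "collider": ols_coefs(y, prs, collider)[1],
        "collider_pc": ols_coefs(y, prs, collider, pcs)[1],
    }


if __name__ == "__main__":
    for model, estimate in simulate().items():
        print(f"{model}: PRS coefficient = {estimate:.3f}")
```

In this setup, adjusting for the collider pulls the PRS coefficient away from the true value of 0.3, and adding the PC covariate moves it back toward that value without removing the bias entirely. How much bias remains depends on how well the measured indicators capture the latent confounder, which is consistent with the abstract's point about the availability and correlation structure of measured confounders.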
Availability of Data and Material
Data from the Collaborative Study on the Genetics of Alcoholism (COGA) are available via dbGaP (phs000763.v1.p1, phs000125.v1.p1) or through the National Institute on Alcohol Abuse and Alcoholism.
Code Availability
The R scripts used in this work are available on GitHub at https://github.com/thomasns0/PCA_Collider.git.
Acknowledgements
The Collaborative Study on the Genetics of Alcoholism (COGA), Principal Investigators B. Porjesz, V. Hesselbrock, T. Foroud; Scientific Director, A. Agrawal; Translational Director, D. Dick, includes eleven different centers: University of Connecticut (V. Hesselbrock); Indiana University (H.J. Edenberg, T. Foroud, Y. Liu, M. Plawecki); University of Iowa Carver College of Medicine (S. Kuperman, J. Kramer); SUNY Downstate Health Sciences University (B. Porjesz, J. Meyers, C. Kamarajan, A. Pandey); Washington University in St. Louis (L. Bierut, J. Rice, K. Bucholz, A. Agrawal); University of California at San Diego (M. Schuckit); Rutgers University (J. Tischfield, R. Hart, J. Salvatore); The Children’s Hospital of Philadelphia, University of Pennsylvania (L. Almasy); Virginia Commonwealth University (D. Dick); Icahn School of Medicine at Mount Sinai (A. Goate, P. Slesinger); and Howard University (D. Scott). Other COGA collaborators include: L. Bauer (University of Connecticut); J. Nurnberger Jr., L. Wetherill, X. Xuei, D. Lai, S. O’Connor (Indiana University); G. Chan (University of Iowa; University of Connecticut); D.B. Chorlian, J. Zhang, P. Barr, S. Kinreich, G. Pandey (SUNY Downstate); N. Mullins (Icahn School of Medicine at Mount Sinai); A. Anokhin, S. Hartz, E. Johnson, V. McCutcheon, S. Saccone (Washington University); J. Moore, Z. Pang, S. Kuo (Rutgers University); A. Merikangas (The Children’s Hospital of Philadelphia and University of Pennsylvania); F. Aliev (Virginia Commonwealth University); H. Chin and A. Parsian are the NIAAA Staff Collaborators. We continue to be inspired by our memories of Henri Begleiter and Theodore Reich, founding PI and Co-PI of COGA, and also owe a debt of gratitude to other past organizers of COGA, including Ting-Kai Li, P. Michael Conneally, Raymond Crowe, and Wendy Reich, for their critical contributions.
This national collaborative study is supported by NIH Grant U10AA008401 from the National Institute on Alcohol Abuse and Alcoholism (NIAAA) and the National Institute on Drug Abuse (NIDA). This work was also supported by the National Institutes of Health (NIH) Grants R01AA028064 (PI: Salvatore) and K01AA024152 (PI: Salvatore) from the National Institute on Alcohol Abuse and Alcoholism (NIAAA).
Funding
This work was supported by the National Institutes of Health (NIH) Grants R01AA028064 (PI: Salvatore) and K01AA024152 (PI: Salvatore) from the National Institute on Alcohol Abuse and Alcoholism (NIAAA). The Collaborative Study on the Genetics of Alcoholism (COGA) is supported by NIH Grant U10AA008401 (PI: Porjesz).
Author information
Contributions
Nathaniel S. Thomas: conceived of the study, conducted statistical analyses, and wrote the manuscript. Peter Barr, Fazil Aliev, Mallory Stephenson, and Sally I-Chun Kuo assisted with the design and implementation of the study and provided editorial feedback on the whole manuscript. Grace Chan, Danielle M. Dick, Howard J. Edenberg, Victor Hesselbrock, and Chella Kamarajan provided editorial feedback on the whole manuscript. Jessica E. Salvatore supervised the design and implementation of the study and provided editorial feedback on the whole manuscript. All authors contributed to and have approved the final manuscript.
Ethics declarations
Conflicts of Interest/Competing Interests
Nathaniel S. Thomas, Peter Barr, Fazil Aliev, Mallory Stephenson, Sally I-Chun Kuo, Grace Chan, Danielle M. Dick, Howard J. Edenberg, Victor Hesselbrock, Chella Kamarajan, and Jessica E. Salvatore declare that they have no conflicts of interest.
Ethics Approval
The Institutional Review Boards at all data collection sites approved the study.
Consent to Participate
Written consent was obtained from all participants.
Consent for Publication
Not applicable.
Additional information
Edited by Valerie Knopik.
Cite this article
Thomas, N.S., Barr, P., Aliev, F. et al. Principal Component Analysis Reduces Collider Bias in Polygenic Score Effect Size Estimation. Behav Genet 52, 268–280 (2022). https://doi.org/10.1007/s10519-022-10104-z
DOI: https://doi.org/10.1007/s10519-022-10104-z