GW-SEM 2.0: Efficient, Flexible, and Accessible Multivariate GWAS

Pritikin, Joshua N.; Neale, Michael C.; Prom-Wormley, Elizabeth C.; Clark, Shaunna L.; Verhulst, Brad

doi:10.1007/s10519-021-10043-1

Original Research
Published: 19 February 2021

Behavior Genetics, Volume 51, pages 343–357 (2021)

Joshua N. Pritikin¹,²
Michael C. Neale¹,²,³
Elizabeth C. Prom-Wormley⁴
Shaunna L. Clark⁵
Brad Verhulst⁵ (ORCID: orcid.org/0000-0001-5369-9757)

Abstract

Most genome-wide association study (GWAS) analyses test the association between single-nucleotide polymorphisms (SNPs) and a single trait or outcome. While second-step analyses of these associations (e.g., calculating genetic correlations between traits) are common and valuable, single-step multivariate analyses of GWAS data are rarely performed. This is unfortunate because multivariate analyses can reveal information that is irrevocably obscured in multi-step analyses. One simple example is the distinction between variance common to a set of measures and variance specific to each measure. Neither a GWAS of sum scores or factor scores nor a GWAS of the individual measures will deliver a clean picture of the loci associated with each measure’s specific variance. Although multivariate GWAS opens up a broad new landscape of feasible and informative analyses, its adoption has been slow, likely because of its heavy computational demands and the difficulty of specifying the models it requires. Here we describe GW-SEM 2.0, which is designed to simplify model specification and overcome the computational challenges inherent in multivariate GWAS. In addition, GW-SEM 2.0 allows users to accurately model ordinal items, which are common in behavioral and psychological research, within a GWAS context. This new release enhances computational efficiency, lets users select the fit function appropriate for their analyses, expands compatibility with standard genomic data formats, and writes results in a form that other standard post-GWAS processing software can read directly. To demonstrate GW-SEM’s utility, we conducted (1) a series of GWAS using three substance use frequency items from the UK Biobank, (2) a timing study of several predefined GWAS functions, and (3) a Type I Error rate study.
Our multivariate GWAS analyses emphasize the utility of GW-SEM for identifying novel patterns of association that vary considerably across genomic loci for specific substances, highlighting the importance of differentiating between substance-specific use behaviors and polysubstance use. The timing studies demonstrate that the analyses take a reasonable amount of time and show the computational cost of including additional items. The Type I Error rate study demonstrates that, under the null hypothesis, p-values from tests of genetic associations with latent variable models follow the expected uniform distribution. Taken together, we suggest that GW-SEM may provide substantially deeper insights into the genomic architecture underlying multivariate behavioral and psychological systems than is currently possible with standard GWAS methods. GW-SEM 2.0 is available on CRAN (stable release) and GitHub (beta release), and tutorials are available on our GitHub site (https://jpritikin.github.io/gwsem/).
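The logic of a Type I Error rate study can be illustrated with a toy simulation. When genotypes and phenotype are generated independently, association p-values should be roughly uniform on (0, 1), so about 5% fall below 0.05. The sketch below is a minimal Python illustration under simplifying assumptions: a single continuous phenotype, ordinary least-squares regression on allele dosage, and a normal approximation for the Wald test. It is not GW-SEM's implementation (GW-SEM is an R package that fits structural equation models), and all function names and parameter values are hypothetical.

```python
import math
import random


def two_sided_normal_p(z):
    """Two-sided p-value for a statistic that is standard normal under the null."""
    return math.erfc(abs(z) / math.sqrt(2.0))


def snp_wald_p(dosage, phenotype):
    """Wald test of the OLS slope from regressing phenotype on allele dosage (0/1/2)."""
    n = len(dosage)
    mean_x = sum(dosage) / n
    mean_y = sum(phenotype) / n
    sxx = sum((x - mean_x) ** 2 for x in dosage)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dosage, phenotype))
    beta = sxy / sxx
    # Residual variance with n - 2 degrees of freedom (intercept + slope).
    rss = sum((y - mean_y - beta * (x - mean_x)) ** 2 for x, y in zip(dosage, phenotype))
    se = math.sqrt(rss / (n - 2) / sxx)
    return two_sided_normal_p(beta / se)


def simulate_null_pvalues(n_snps=1000, n_people=400, maf=0.3, seed=2021):
    """Hypothetical helper: p-values for SNPs simulated with no true effect."""
    rng = random.Random(seed)
    pvals = []
    while len(pvals) < n_snps:
        # Dosage = count of minor alleles, two independent draws (Hardy-Weinberg).
        dosage = [(rng.random() < maf) + (rng.random() < maf) for _ in range(n_people)]
        if len(set(dosage)) < 2:
            continue  # skip monomorphic SNPs; the slope is undefined
        # Phenotype is drawn independently of genotype, so the null is true.
        phenotype = [rng.gauss(0.0, 1.0) for _ in range(n_people)]
        pvals.append(snp_wald_p(dosage, phenotype))
    return pvals


if __name__ == "__main__":
    pvals = simulate_null_pvalues()
    alpha = 0.05
    type1 = sum(p < alpha for p in pvals) / len(pvals)
    print(f"empirical Type I error rate at alpha={alpha}: {type1:.3f}")
    # Under a uniform null, each decile should hold roughly 10% of the p-values.
    deciles = [0] * 10
    for p in pvals:
        deciles[min(int(p * 10), 9)] += 1
    print("decile counts:", deciles)
```

With a well-calibrated test, the printed rate should sit near 0.05 and each decile count near 100, up to sampling noise. A check of GW-SEM itself would instead fit the latent variable model to each null SNP and compare the resulting p-values with the uniform distribution, for example with a Q-Q plot.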

Acknowledgements

The authors express their deepest gratitude to the anonymous reviewers, whose invaluable comments undoubtedly improved the overall quality of the manuscript.

Funding

MCN was supported by NIDA Grant R01-DA018673. JNP was supported by NIDA Grant R25-DA-26119 (PI: Neale).

Author information

Authors and Affiliations

The Department of Psychiatry, Virginia Commonwealth University, Richmond, USA
Joshua N. Pritikin & Michael C. Neale
The Virginia Institute for Psychiatric and Behavioral Genetics, Virginia Commonwealth University, Richmond, USA
Joshua N. Pritikin & Michael C. Neale
The Department of Human and Molecular Genetics, Virginia Commonwealth University, Richmond, USA
Michael C. Neale
The Division of Epidemiology, Department of Family Medicine and Population Health, Virginia Commonwealth University, Richmond, USA
Elizabeth C. Prom-Wormley
The Department of Psychiatry and Behavioral Sciences, Texas A&M University, College Station, USA
Shaunna L. Clark & Brad Verhulst

Corresponding author

Correspondence to Brad Verhulst.

Ethics declarations

Conflict of interest

Joshua N. Pritikin, Michael C. Neale, Elizabeth C. Prom-Wormley, Shaunna L. Clark, and Brad Verhulst declare that they have no conflicts of interest related to the publication of this article.

Ethical approval

The data used for the demonstration section of this study were obtained from the UK Biobank (Application Number 40967) and involved secondary data analysis. As no identifying information was transferred, the data were not deemed “Human Subjects Data”, and appropriate human subjects waivers were obtained by the authors.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Edited by Sarah Medland.

Supplementary Information

Electronic supplementary material 1 (RNW 60 kb)

About this article

Cite this article

Pritikin, J.N., Neale, M.C., Prom-Wormley, E.C. et al. GW-SEM 2.0: Efficient, Flexible, and Accessible Multivariate GWAS. Behav Genet 51, 343–357 (2021). https://doi.org/10.1007/s10519-021-10043-1

Received: 17 June 2020
Accepted: 18 January 2021
Published: 19 February 2021
Issue Date: May 2021
DOI: https://doi.org/10.1007/s10519-021-10043-1