Combining Structural-Equation Modeling with Genomic-Relatedness-Matrix Restricted Maximum Likelihood in OpenMx

Kirkpatrick, Robert M.; Pritikin, Joshua N.; Hunter, Michael D.; Neale, Michael C.

doi:10.1007/s10519-020-10037-5

Original Research
Published: 13 January 2021

Behavior Genetics, Volume 51, pages 331–342 (2021)

Robert M. Kirkpatrick¹, Joshua N. Pritikin¹, Michael D. Hunter² & Michael C. Neale¹

Abstract

There is a long history of fitting biometrical structural-equation models (SEMs) in the pregenomic behavioral-genetics literature of twin, family, and adoption studies. Recently, a method has emerged for estimating biometrical variance–covariance components based not upon the expected degree of genetic resemblance among relatives, but upon the observed degree of genetic resemblance among unrelated individuals for whom genome-wide genotypes are available—genomic-relatedness-matrix restricted maximum-likelihood (GREML). However, most existing GREML software is concerned with quickly and efficiently estimating heritability coefficients, genetic correlations, and so on, rather than with allowing the user to fit SEMs to multitrait samples of genotyped participants. We therefore introduce a feature in the OpenMx package, “mxGREML”, designed to fit the biometrical SEMs from the pregenomic era in present-day genomic study designs. We explain the additional functionality this new feature has brought to OpenMx, and how the new functionality works. We provide an illustrative example of its use. We discuss the feature’s current limitations, and our plans for its further development.

Notes

Here, “raw data” is meant in the OpenMx sense, i.e. “not covariance-matrix input” (which is commonly used in SEM). Although less than ideal, it is possible to run an mxGREML analysis without raw genotypic or phenotypic data. The data’s owner would need to provide the user with one or more GRMs calculated from raw genotypes, and residuals for one or more phenotypes corrected for covariates. In such a case, the residuals would be what populates y, and X would consist only of constants.
mxGREML analysis of ordinal-threshold traits is the topic of a forthcoming manuscript.
As pointed out to us by an anonymous referee, one consequence of this design assumption is that it is not straightforward to incorporate regressions among endogenous variables in an mxGREML model, since doing so would require the corresponding regression coefficients to appear in both the model-expected mean vector and covariance matrix. That is a limitation inherent to REML, and is not specific to OpenMx. The referee suggested that there might be ways to circumvent this limitation, such as mean-centering manifest endogenous variables prior to mxGREML analysis; another possibility might be to conduct the desired regressions outside of OpenMx, and analyze the resulting residuals in the mxGREML model. To date, we have not explored such workarounds. One approach to endogenous-variable regression that will certainly work is to analyze y as a dataset with 1 row and np columns, using the pre-existing mxExpectationNormal() and mxFitFunctionML(), as they allow the user to freely and explicitly specify the model-expected mean vector (e.g., Eaves et al. 2014).
See below, under “Customization: Data-handling”.
As of this writing, the GREML fitfunction requires a partial derivative for all (or none) of the model’s explicit free parameters, though that requirement will be relaxed in the future. It is true that providing a derivative of V for every free parameter can require a fair amount of input from the user—see, for example, script #13 in Table 1, which has 16 free parameters.
To give the reader a sense of scale: on a computing cluster (Intel Xeon E5-2680 v4 CPU at 2.4 GHz), we recently ran script #11 (a five-timepoint latent-growth model) from Table 1, except edited to have a sample size of 4000 and to use 8 processing threads. The job used about 55 GB of memory, and OpenMx’s running time was slightly under 20 h.
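To put the memory figure in the last note in context, a rough back-of-envelope calculation may help. The sketch below assumes (this is not stated in the notes) that a five-timepoint model fitted to 4000 participants implies a stacked phenotype vector of length 4000 × 5 = 20,000, so that one dense covariance matrix has 20,000 rows and columns stored as 8-byte doubles. The function name `dense_matrix_gb` is hypothetical, and the calculation says nothing about how OpenMx actually allocates memory.

```python
def dense_matrix_gb(n_people, n_traits, bytes_per_element=8):
    """Size in GB (10^9 bytes) of one dense square matrix whose dimension
    is n_people * n_traits. Assumption: double-precision storage, no sparsity."""
    dim = n_people * n_traits
    return dim * dim * bytes_per_element / 1e9


# Assumed scenario: 4000 participants, 5 timepoints (hypothetical reading of the note).
size_gb = dense_matrix_gb(4000, 5)
matrices_in_55_gb = 55 / size_gb

print(f"one dense matrix: {size_gb:.1f} GB")
print(f"matrices of that size in 55 GB: {matrices_in_55_gb:.1f}")
```

Under these assumptions a single matrix of that size occupies about 3.2 GB, so the reported 55 GB corresponds to roughly 17 such matrices.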

References

Abadi M, Agarwal A, Barham P, Brevdo E, Chen Z et al (2015) TensorFlow: large-scale machine learning on heterogeneous systems. White paper and software available at tensorflow.org
Benjamin DJ, Cesarini D, van der Loos MJHM, Dawes CT, Koellinger PD, Magnusson PKE et al (2012) The genetic architecture of economic and political preferences. PNAS 109(21):8026–8031. https://doi.org/10.1073/pnas.1120666109
Benyamin B, St Pourcain B, Davis OS, Davies G, Hansell NK et al (2014) Childhood intelligence is heritable, highly polygenic and associated with FNBP1L. Mol Psychiatry 19:253–254. https://doi.org/10.1038/mp.2012.184
Boker S, Neale M, Maes H, Wilde M, Spiegel M et al (2011) OpenMx: an open source extended structural equation modeling framework. Psychometrika 76(2):306–317
Carpenter B, Gelman A, Hoffman MD, Lee D, Goodrich B et al (2017) Stan: a probabilistic programming language. J Stat Softw. https://doi.org/10.18637/jss.v076.i01
Davies G, Tenesa A, Payton A, Yang J, Harris SE et al (2011) Genome-wide association studies establish that human intelligence is highly heritable and polygenic. Mol Psychiatry 16:996–1005
DeFries JC, Fulker DW (1985) Multiple regression analysis of twin data. Behav Genet 15(5):467–473
Eaves LJ, St Pourcain B, Davey Smith G, York TP, Evans DM (2014) Resolving the effects of maternal and offspring genotype on dyadic outcomes in genome wide complex trait analysis (“M-GCTA”). Behav Genet 44:445–455
Gaugler T, Klei L, Sanders SJ, Bodea CA, Goldberg AP et al (2014) Most genetic risk for autism resides with common variation. Nat Genet 46(8):881–885
Gill PE, Murray W, Saunders MA, Wright MH (2001) User’s Guide for NPSOL 5.0: a Fortran Package for Nonlinear Programming. Adapted from Stanford University Department of Operations Research Technical Report SOL 86-1, 1986. http://www.ccom.ucsd.edu/~peg/papers/npdoc.pdf
Gillespie NA, Eaves LJ, Maes H, Silberg JL (2015) Testing models for the contributions of genes and environment to developmental change in adolescent depression. Behav Genet 45:382–393
Gilmour AR, Thompson R, Cullis BR (1995) Average information REML: an efficient algorithm for variance parameter estimation in linear mixed models. Biometrics 51(4):1440–1450
Haworth S, Shapland CY, Hayward C, Prins BP, Felix JF et al (2019) Low-frequency variation in TP53 has large effects on head circumference and intracranial volume. Nat Commun 10:357. https://doi.org/10.1038/s41467-018-07863-x
Jacob B, Guennebaud G et al (2010) Eigen v3. http://eigen.tuxfamily.org/
Johnson SG (2020) The NLopt nonlinear-optimization package, http://github.com/stevengj/nlopt
Johnson DL, Thompson R (1995) Restricted maximum likelihood estimation of variance components for univariate animal models using sparse techniques and average information. J Dairy Sci 78:449–456
Keller MC, Medland SE, Duncan LE, Hatemi PK, Neale MC, Maes HHM, Eaves LJ (2009) Modeling extended twin family data I: description of the cascade model. Twin Res Hum Genet 12(1):8–18
Kendler KS, Neale MC, Sullivan P, Corey LA, Gardner CO, Prescott CA (1999) A population-based twin study in women of smoking initiation and nicotine dependence. Psychol Med 29:299–308
Kirkpatrick RM, McGue M, Iacono WG, Miller MB, Basu S (2014) Results of a “GWAS Plus:” general cognitive ability is substantially heritable and massively polygenic. PLoS ONE. https://doi.org/10.1371/journal.pone.0112390
Kraft D (1994) Algorithm 733: TOMP—Fortran modules for optimal control calculations. ACM Trans Math Softw 20(3):262–281
Lee SA, Cross-Disorder Group of the Psychiatric Genomics Consortium et al (2013) Genetic relationship between five psychiatric disorders estimated from genome-wide SNPs. Nat Genet 45(9):984–994
Lee S, DeCandia TR, Ripke S, Yang J, The Schizophrenia Psychiatric Genome-Wide Association Study Consortium, The International Schizophrenia Consortium et al (2012) Estimating the proportion of variation in susceptibility to schizophrenia captured by common SNPs. Nat Genet 44(3):247–250. https://doi.org/10.1038/ng.1108
Meyer K, Smith SP (1996) Restricted maximum likelihood estimation for animal models using derivatives of the likelihood. Genet Sel Evol 28:23–49
Morandat F, Hill B, Osvald L, Vitek J (2012) Evaluating the design of the R language: objects and functions for data analysis. In: Noble J (ed) ECOOP 2012—Object-Oriented Programming. Springer Science+Business Media, New York
Morris AP, DIAGRAM Consortium et al (2012) Large-scale association analysis provides insights into the genetic architecture and pathophysiology of type 2 diabetes. Nat Genet 44:981–990. https://doi.org/10.1038/ng.2383
Mulaik SA (2010) Foundations of factor analysis, 2nd edn. CRC Press, New York
Neale MC, Cardon L (1992) Methodology for genetic studies of twins and families. Springer Science+Business Media, New York
Neale MC, Hunter MD, Pritikin JN, Zahery M, Brick TR et al (2016) OpenMx 2.0: extended structural equation and statistical modeling. Psychometrika 81(2):535–549
Pawitan Y (2013) In all likelihood: statistical modelling and inference using likelihood. Oxford University Press, Oxford
Posthuma D (2009) Multivariate genetic analysis. In: Kim Y-K (ed) Handbook of behavior genetics. Springer Science+Business Media, New York, pp 47–59. https://doi.org/10.1007/978-0-387-76727-7_4
Purcell S (2002) Variance components models for gene-environment interaction in twin analysis. Twin Res 5(6):554–571
Purcell S, Neale B, Todd-Brown K, Thomas L, Ferreira MAR, Bender D et al (2007) PLINK: a tool set for whole-genome association and population-based linkage analyses. Am J Hum Genet 81:559–575. https://doi.org/10.1086/519795
R Core Team (2018) R: a language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. https://www.R-project.org/
Ripke S, O’Dushlaine C, Chambert K, Moran JL, Kähler AK, Akterin S et al (2013) Genome-wide association analysis identifies 13 new risk loci for schizophrenia. Nat Genet 45(10):1150–1159
Shapland CY, Verhoef E, Davey Smith G, Fisher SE, Verhulst B, Dale PS, St Pourcain B (2020 preprint) The multivariate genome-wide architecture of interrelated literacy, language, and working memory skills reveals distinct etiologies. bioRxiv https://doi.org/10.1101/2020.08.14.251199
Sharma G, Agarwala A, Bhattacharya B (2013) A fast parallel Gauss Jordan algorithm for matrix inversion using CUDA. Comput Struct 128:31–37
Speed D, Hemani G, Johnson MR, Balding DJ (2013) Improved heritability estimation from genome-wide SNPs. Am J Hum Genet 91:1011–1021. https://doi.org/10.1016/j.ajhg.2012.10.010
St Pourcain B, Eaves LJ, Ring SM, Fisher SE, Medland S, Evans DM, Davey Smith G (2018) Developmental changes within the genetic architecture of social communication behavior: a multivariate study of genetic variance in unrelated individuals. Biol Psychiat 83(7):598–606. https://doi.org/10.1016/j.biopsych.2017.09.020
van Dongen J, Slagboom PE, Draisma HHM, Martin NG, Boomsma DI (2012) The continuing value of twin studies in the omics era. Nat Rev Genet 13:640–653
Verhoef E, Shapland CY, Fisher SE, Dale PS, St Pourcain B (2020) The amplification of genetic factors for early vocabulary during children’s language and literacy development. J Child Psychol Psychiatry. https://doi.org/10.1111/jcpp.13327
Wainschtein P, Jain DP, Yengo L, Zheng Z, TOPMed Anthropometry Working Group, Trans-Omics for Precision Medicine Consortium et al (2019 preprint) Recovery of trait heritability from whole genome sequence data. https://doi.org/10.1101/588020
Whaley RC, Petitet A, Dongarra JJ (2001) Automated empirical optimization of software and the ATLAS project. Parallel Comput 27:3–35
Yang J, Benyamin B, McEvoy BP, Gordon S, Henders AK, Nyholt DR et al (2010) Common SNPs explain a large proportion of the heritability for human height. Nat Genet 42(7):565–569
Yang J, Lee SH, Goddard ME, Visscher PM (2011) GCTA: a tool for genome-wide complex trait analysis. Am J Hum Genet 88:76–82. https://doi.org/10.1016/j.ajhg.2010.11.011
Yang J, Lee SH, Goddard ME, Visscher PM (2013) Genome-wide complex trait analysis (GCTA): methods, data analyses, and interpretations. In: Gondro C et al (eds) Genome-wide association studies and genomic prediction, methods in molecular biology, vol 1019. Springer Science+Business Media, New York

Funding

The work reported in this paper was funded by the National Institute on Drug Abuse R25DA026119.

Author information

Robert M. Kirkpatrick
Present address: Virginia Institute for Psychiatric & Behavioral Genetics, Virginia Commonwealth University, Richmond, VA, 23298-0126, USA

Authors and Affiliations

Virginia Commonwealth University, Richmond, USA
Robert M. Kirkpatrick, Joshua N. Pritikin & Michael C. Neale
Georgia Institute of Technology, Atlanta, USA
Michael D. Hunter

Corresponding author

Correspondence to Robert M. Kirkpatrick.

Ethics declarations

Conflict of interest

Robert M. Kirkpatrick, Joshua N. Pritikin, Michael D. Hunter and Michael C. Neale declare they have no conflict of interest.

Human and animal rights and informed consent

This article does not contain any studies with human participants or animals performed by any of the authors.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Edited by David Evans.

Supplementary information

Below is the link to the electronic supplementary material.

Supplementary material 1 (PDF 694 kb)

About this article

Cite this article

Kirkpatrick, R.M., Pritikin, J.N., Hunter, M.D. et al. Combining Structural-Equation Modeling with Genomic-Relatedness-Matrix Restricted Maximum Likelihood in OpenMx. Behav Genet 51, 331–342 (2021). https://doi.org/10.1007/s10519-020-10037-5

Received: 17 July 2020
Accepted: 07 December 2020
Published: 13 January 2021
Issue Date: May 2021
DOI: https://doi.org/10.1007/s10519-020-10037-5
