Abstract
There is a long history of fitting biometrical structural-equation models (SEMs) in the pregenomic behavioral-genetics literature of twin, family, and adoption studies. Recently, a method has emerged for estimating biometrical variance–covariance components based not upon the expected degree of genetic resemblance among relatives, but upon the observed degree of genetic resemblance among unrelated individuals for whom genome-wide genotypes are available—genomic-relatedness-matrix restricted maximum-likelihood (GREML). However, most existing GREML software is concerned with quickly and efficiently estimating heritability coefficients, genetic correlations, and so on, rather than with allowing the user to fit SEMs to multitrait samples of genotyped participants. We therefore introduce a feature in the OpenMx package, “mxGREML”, designed to fit the biometrical SEMs from the pregenomic era in present-day genomic study designs. We explain the additional functionality this new feature has brought to OpenMx, and how the new functionality works. We provide an illustrative example of its use. We discuss the feature’s current limitations, and our plans for its further development.
Similar content being viewed by others
Notes
Here, “raw data” is meant in the OpenMx sense, i.e. “not covariance-matrix input” (which is commonly used in SEM). Although less than ideal, it is possible to run an mxGREML analysis without raw genotypic or phenotypic data. The data’s owner would need to provide the user with one or more GRMs calculated from raw genotypes, and residuals for one or more phenotypes corrected for covariates. In such a case, the residuals would be what populates y, and X would consist only of constants.
mxGREML analyses of ordinal-threshold traits is the topic of a forthcoming manuscript.
As pointed out to us by an anonymous referee, one consequence of this design assumption is that it is not straightforward to incorporate regressions among endogenous variables in an mxGREML model, since doing so would require the corresponding regression coefficients to appear in both the model-expected mean vector and covariance matrix. That is a limitation inherent to REML, and is not specific to OpenMx. The referee suggested that there might be ways to circumvent this limitation, such as mean-centering manifest endogenous variables prior to mxGREML analysis; another possibility might be to conduct the desired regressions outside of OpenMx, and analyze the resulting residuals in the mxGREML model. To date, we have not explored such workarounds. One approach to endogenous-variable regression that will certainly work is to analyze y as a dataset with 1 row and np columns, using the pre-existing mxExpectationNormal() and mxFitFunctionML(), as they allow the user to freely and explicitly specify the model-expected mean vector (e.g., Eaves et al. 2014).
See below, under “Customization: Data-handling”.
As of this writing, the GREML fitfunction requires a partial derivative for all (or none) of the model’s explicit free parameters, though that requirement will be relaxed in the future. It is true that providing a derivative of V for every free parameter can require a fair amount of input from the user—see, for example, script #13 in Table 1, which has 16 free parameters.
To give the reader a sense of scale: on a computing cluster (Intel Xeon E5-2680 v4 CPU at 2.4 GHz), we recently ran script #11 (a five-timepoint latent-growth model) from Table 1, except edited to have a sample size of 4000 and to use 8 processing threads. The job used about 55 GB of memory, and OpenMx’s running time was slightly under 20 h.
References
Abadi M, Agarwal A, Barham P, Brevdo E, Chen Z et al (2015). TensorFlow: large-scale machine learning on heterogeneous systems. White paper and software available at tensorflow.org
Benjamin DJ, Cesarini D, van der Loos MJHM, Dawes CT, Koellinger PD, Magnusson PKE et al (2012) The genetic architecture of economic and political preferences. PNAS 109(21):8026–8031. https://doi.org/10.1073/pnas.1120666109
Benyamin B, St Pourcaine B, Davis OS, Davies G, Hansell NK et al (2014) Childhood intelligence is heritable, highly polygenic and associated with FNBP1L. Mol Psychiatry 19:253–254. https://doi.org/10.1038/mp.2012.184
Boker S, Neale M, Maes H, Wilde M, Spiegel M et al (2011) OpenMx: an open source extended structural equation modeling framework. Psychometrika 76(2):306–317
Carpenter B, Gelman A, Hoffman MD, Lee D, Goodrich B et al (2017) Stan: a probabilistic programming language. J Stat Softw. https://doi.org/10.18637/jss.v076.i01
Davies G, Tenesa A, Payton A, Yang J, Harris SE et al (2011) Genome-wide association studies establish that human intelligence is highly heritable and polygenic. Mol Psychiatry 16:996–1005
DeFries JC, Fulker DW (1985) Multiple regression analysis of twin data. Behav Genet 15(5):467–473
Eaves LJ, St Pourcain B, Davey Smith G, York TP, Evans DE (2014) Resolving the effects of maternal and offspring genotype on dyadic outcomes in genome wide complex trait analysis (M-GCTA”). Behav Genet 44:445–455
Gaugler T, Klei L, Sanders SJ, Bodea CA, Goldberg AP et al (2014) Most genetic risk for autism resides with common variation. Nat Genet 46(8):881–885
Gill PE, Murray W, Saunders MA, Wright MH (2001) User’s Guide for NPSOL 5.0: a Fortran Package for Nonlinear Programming. Adapted from Stanford University Department of Operations Research Technical Report SOL 86-1, 1986. http://www.ccom.ucsd.edu/~peg/papers/npdoc.pdf
Gillespie NA, Eaves LJ, Maes H, Silberg JL (2015) Testing models for the contributions of genes and environment to developmental change in adolescent depression. Behav Genet 45:382–393
Gilmour AR, Thompson R, Cullis BR (1995) Average information REML: an efficient algorithm for variance parameter estimation in linear mixed models. Biometrics 51(4):1440–1450
Haworth S, Shapland CY, Hayward C, Prins BP, Felix JF et al (2019) Low-frequency variation in TP53 has large effects on head circumference and intracranial volume. Nat Commun 10:357. https://doi.org/10.1038/s41467-018-07863-x
Jacob B, Guennebaud G et al (2010). Eigen v3. http://eigen.tuxfamily.org/
Johnson SG (2020) The NLopt nonlinear-optimization package, http://github.com/stevengj/nlopt
Johnson DL, Thompson R (1995) Restricted maximum likelihood estimation of variance components for univariate animal models using sparse techniques and average information. J Dairy Sci 78:449–456
Keller MC, Medland SE, Duncan LE, Hatemi PK, Neale MC, Maes HHM, Eaves LJ (2009) Modeling extended twin family data I: description of the cascade model. Twin Res Hum Genet 12(1):8–18
Kendler KS, Neale MC, Sullivan P, Corey LA, Gardner CO, Prescott CA (1999) A population-based twin study in women of smoking initiation and nicotine dependence. Psychol Med 29:299–308
Kirkpatrick RM, McGue M, Iacono WG, Miller MB, Basu S (2014) Results of a “GWAS Plus:” general cognitive ability is substantially heritable and massively polygenic. PLoS ONE. https://doi.org/10.1371/journal.pone.0112390
Kraft D (1994) Algorithm 733: TOMP—Fortran modules for optimal control calculations. ACM Trans Math Softw 20(3):262–281
Lee SA, Cross-Disorder Group of the Psychiatric Genomics Consortium et al (2013) Genetic relationship between five psychiatric disorders estimated from genome-wide SNPs. Nat Genet 45(9):984–994
Lee S, DeCandia TR, Ripke S, Yang J, The Schizophrenia Psychiatric Genome-Wide Association Study Consortium, The International Schizophrenia Consortium et al (2012) Estimating the proportion of variation in susceptibility to schizophrenia captured by common SNPs. Nat Genet 44(3):247–250. https://doi.org/10.1038/ng.1108
Meyer K, Smith SP (1996) Restricted maximum likelihood estimation for animal models using derivatives of the likelihood. Genet Sel Evol 28:23–49
Morandat F, Hill B, Osvald L, Vitek J (2012) Evaulating the design of the R language: objects and functions for data analysis. In: Noble J (ed) ECOOP 2012—Object-Oriented Programming. Springer Science+Business Media, New York
Morris AP, DIAGRAM Consortium et al (2012) Large-scale association analysis provides insights into the genetic architecture and pathophysiology of type 2 diabetes. Nat Genet 44:981–990. https://doi.org/10.1038/ng.2383
Mulaik SA (2010) Foundations of factor analysis, 2nd edn. CRC Press, New York
Neale MC, Cardon L (1992) Methodology for genetic studies of twins and families. Springer Science+Business Media, New York
Neale MC, Hunter MD, Pritikin JN, Zahery M, Brick TR et al (2016) OpenMx 2.0: extended structural equation and statistical modeling. Psychometrika 81(2):535–549
Pawitan Y (2013) In all likelihood: statistical modelling and inference using likelihood. Oxford University Press, Oxford
Posthuma D (2009) Multivariate genetic analysis. In: Kim Y-K (ed) Handbook of behavior genetics. Springer Science+Business Media, New York, pp 47–59. https://doi.org/10.1007/978-0-387-76727-7_4
Purcell S (2002) Variance components models for gene-environment interaction in twin analysis. Twin Res 5(6):554–571
Purcell S, Neale B, Todd-Brown K, Thomas L, Ferreira MAR, Bender D et al (2007) PLINK: a tool set for whole-genome association and population-based linkage analyses. Am J Hum Genet 81:559–575. https://doi.org/10.1086/519795
R Core Team (2018) R: a language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. https://www.R-project.org/
Ripke S, O’Dushlaine C, Chambert K, Moran JL, Kähler AK, Akterin S et al (2013) Genome-wide association analysis identifies 13 new risk loci for schizophrenia. Nat Genet 43(10):1150–1159
Shapland CY, Verhoef E, Davey Smith G, Fisher SE, Verhulst B, Dale PS, St Pourcain B (2020 preprint) The multivariate genome-wide architecture of interrelated literacy, language, and working memory skills reveals distinct etiologies. bioRxiv https://doi.org/10.1101/2020.08.14.251199
Sharma G, Agarwala A, Bhattacharya B (2013) A fast parallel Gauss Jordan algorithm for matrix inversion using CUDA. Comput Struct 128:31–37
Speed D, Hemani G, Johnson MR, Balding DJ (2013) Improved heritability estimation from genome-wide SNPs. Am J Hum Genet 91:1011–1021. https://doi.org/10.1016/j.ajhg.2012.10.010
St Pourcain B, Eaves LJ, Ring SM, Fisher SE, Medland S, Evans DM, Davey Smith G (2018) Developmental changes within the genetic architecture of social communication behavior: a multivariate study of genetic variance in unrelated individuals. Biol Psychiat 83(7):598–606. https://doi.org/10.1016/j.biopsych.2017.09.020
van Dongen J, Slagboom PE, Draisma HHM, Martin NG, Boomsma DI (2012) The continuing value of twin studies in the omics era. Nat Rev Genet 13:640–653
Verhoef E, Shapland CY, Fisher SE, Dale PS, St Pourcain B (2020) The amplification of genetic factors for early vocabulary during children’s language and literacy development. J Child Psychol Psychiatry. https://doi.org/10.1111/jcpp.13327
Wainschtein P, Jain DP, Yengo L, Zheng Z, TOPMed Anthropometry Working Group, Trans-Omics for Precision Medicine Consortium et al (2019 preprint) Recovery of trait heritability from whole genome sequence data. https://doi.org/10.1101/588020
Whaley RC, Petitet A, Dongarra JJ (2001) Automated empirical optimization of software and the ATLAS project. Parallel Comput 27:3–35
Yang J, Benyamin B, McEvoy BP, Gordon S, Henders AK, Nyholt DR et al (2010) Common SNPs explain a large proportion of the heritability for human height. Nat Genet 42(7):565–569
Yang J, Lee SH, Goddard ME, Visscher PM (2011) GCTA: a tool for genome-wide complex trait analysis. Am J Hum Genet 88:76–82. https://doi.org/10.1016/j.ajhg.2010.11.011
Yang J, Lee SH, Goddard ME, Visscher PM (2013) Genome-wide complex trait analysis (GCTA): methods, data analyses, and interpretations. In: Gondro C et al (eds) Genome-wide association studies and genomic prediction, methods in molecular biology, vol 1019. Springer Science+Business Media, New York
Funding
The work reported in this paper was funded by the National Institute on Drug Abuse R25DA026119.
Author information
Authors and Affiliations
Corresponding author
Ethics declarations
Conflict of interest
Robert M. Kirkpatrick, Joshua N. Pritikin, Michael D. Hunter and Michael C. Neale declare they have no conflict of interest.
Human and animal rights and informed consent
This article does not contain any studies with human participants or animals performed by any of the authors.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Edited by David Evans.
Supplementary information
Below is the link to the electronic supplementary material.
Rights and permissions
About this article
Cite this article
Kirkpatrick, R.M., Pritikin, J.N., Hunter, M.D. et al. Combining Structural-Equation Modeling with Genomic-Relatedness-Matrix Restricted Maximum Likelihood in OpenMx. Behav Genet 51, 331–342 (2021). https://doi.org/10.1007/s10519-020-10037-5
Received:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s10519-020-10037-5