Combining Structural-Equation Modeling with Genomic-Relatedness-Matrix Restricted Maximum Likelihood in OpenMx

Abstract

There is a long history of fitting biometrical structural-equation models (SEMs) in the pregenomic behavioral-genetics literature of twin, family, and adoption studies. Recently, a method has emerged for estimating biometrical variance–covariance components based not upon the expected degree of genetic resemblance among relatives, but upon the observed degree of genetic resemblance among unrelated individuals for whom genome-wide genotypes are available—genomic-relatedness-matrix restricted maximum-likelihood (GREML). However, most existing GREML software is concerned with quickly and efficiently estimating heritability coefficients, genetic correlations, and so on, rather than with allowing the user to fit SEMs to multitrait samples of genotyped participants. We therefore introduce a feature in the OpenMx package, “mxGREML”, designed to fit the biometrical SEMs from the pregenomic era in present-day genomic study designs. We explain the additional functionality this new feature has brought to OpenMx, and how the new functionality works. We provide an illustrative example of its use. We discuss the feature’s current limitations, and our plans for its further development.

Notes

  1. Here, “raw data” is meant in the OpenMx sense, i.e. “not covariance-matrix input” (which is commonly used in SEM). Although less than ideal, it is possible to run an mxGREML analysis without raw genotypic or phenotypic data. The data’s owner would need to provide the user with one or more GRMs calculated from raw genotypes, and residuals for one or more phenotypes corrected for covariates. In such a case, the residuals would be what populates y, and X would consist only of constants.

  2. The mxGREML analysis of ordinal-threshold traits is the topic of a forthcoming manuscript.

  3. As pointed out to us by an anonymous referee, one consequence of this design assumption is that it is not straightforward to incorporate regressions among endogenous variables in an mxGREML model, since doing so would require the corresponding regression coefficients to appear in both the model-expected mean vector and covariance matrix. That is a limitation inherent to REML, and is not specific to OpenMx. The referee suggested that there might be ways to circumvent this limitation, such as mean-centering manifest endogenous variables prior to mxGREML analysis; another possibility might be to conduct the desired regressions outside of OpenMx, and analyze the resulting residuals in the mxGREML model. To date, we have not explored such workarounds. One approach to endogenous-variable regression that will certainly work is to analyze y as a dataset with 1 row and np columns, using the pre-existing mxExpectationNormal() and mxFitFunctionML(), as they allow the user to freely and explicitly specify the model-expected mean vector (e.g., Eaves et al. 2014).

  4. See below, under “Customization: Data-handling”.

  5. As of this writing, the GREML fitfunction requires a partial derivative for all (or none) of the model’s explicit free parameters, though that requirement will be relaxed in the future. It is true that providing a derivative of V for every free parameter can require a fair amount of input from the user (see, for example, script #13 in Table 1, which has 16 free parameters).

  6. To give the reader a sense of scale: on a computing cluster (Intel Xeon E5-2680 v4 CPU at 2.4 GHz), we recently ran script #11 (a five-timepoint latent-growth model) from Table 1, except edited to have a sample size of 4000 and to use 8 processing threads. The job used about 55 GB of memory, and OpenMx’s running time was slightly under 20 h.
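
As a rough check on the scale reported in note 6, the following Python sketch computes the storage needed for a single dense covariance matrix V. It assumes, since the note does not say so, that all 4000 participants are observed at all five timepoints (so that V has order 20,000) and that elements are stored as 8-byte doubles. The function name is hypothetical, and the sketch says nothing about how OpenMx actually allocates memory.

```python
def dense_matrix_bytes(n_people, n_traits, bytes_per_element=8):
    """Bytes needed for one dense square matrix of order n_people * n_traits."""
    order = n_people * n_traits          # length of the stacked phenotype vector y
    return order * order * bytes_per_element


# Assumed scenario: 4000 people, 5 timepoints, no missing phenotypes.
one_v = dense_matrix_bytes(4000, 5)
print(f"one dense V: {one_v / 1e9:.1f} GB")
print(f"copies of V that fit in 55 GB: {55e9 / one_v:.1f}")
```

Under these assumptions one copy of V takes about 3.2 GB, so the reported 55 GB corresponds to roughly 17 matrices of that size. The note does not break the total down, so this is only a plausibility check, not an account of where the memory went.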
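
To make concrete what note 5 means by a derivative of V for every free parameter, here is a minimal Python sketch for the simplest single-trait variance-components form, V = va * A + ve * I, where A is a GRM. That parameterization is an assumption chosen for illustration rather than one of the scripts in Table 1, all function names are hypothetical, and this is not OpenMx code. Because V is linear in both parameters, the derivatives are simply A and the identity matrix, which the sketch checks against central finite differences.

```python
def build_v(grm, va, ve):
    """Model-expected covariance matrix V = va * A + ve * I."""
    n = len(grm)
    return [[va * grm[i][j] + (ve if i == j else 0.0) for j in range(n)]
            for i in range(n)]


def analytic_dv(grm):
    """Analytic derivatives of V: dV/dva = A and dV/dve = I."""
    n = len(grm)
    identity = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    return {"va": [row[:] for row in grm], "ve": identity}


def numeric_dv(grm, va, ve, param, step=1e-6):
    """Central finite-difference approximation to dV/d(param)."""
    d_va, d_ve = (step, 0.0) if param == "va" else (0.0, step)
    upper = build_v(grm, va + d_va, ve + d_ve)
    lower = build_v(grm, va - d_va, ve - d_ve)
    return [[(u - l) / (2.0 * step) for u, l in zip(row_u, row_l)]
            for row_u, row_l in zip(upper, lower)]


def max_abs_diff(a, b):
    """Largest elementwise absolute difference between two matrices."""
    return max(abs(x - y)
               for row_a, row_b in zip(a, b)
               for x, y in zip(row_a, row_b))


# Toy 3-person GRM (made-up values, for illustration only).
toy_grm = [[1.00, 0.02, -0.01],
           [0.02, 0.98, 0.03],
           [-0.01, 0.03, 1.01]]

exact = analytic_dv(toy_grm)
for name in ("va", "ve"):
    approx = numeric_dv(toy_grm, va=0.4, ve=0.6, param=name)
    print(name, "matches finite differences:",
          max_abs_diff(exact[name], approx) < 1e-6)
```

With only two parameters the derivatives are trivial. The burden the note describes comes from models such as the 16-parameter script it mentions, where each free parameter needs its own matrix of this kind.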

References

  1. Abadi M, Agarwal A, Barham P, Brevdo E, Chen Z et al (2015) TensorFlow: large-scale machine learning on heterogeneous systems. White paper and software available at tensorflow.org

  2. Benjamin DJ, Cesarini D, van der Loos MJHM, Dawes CT, Koellinger PD, Magnusson PKE et al (2012) The genetic architecture of economic and political preferences. PNAS 109(21):8026–8031. https://doi.org/10.1073/pnas.1120666109

  3. Benyamin B, St Pourcain B, Davis OS, Davies G, Hansell NK et al (2014) Childhood intelligence is heritable, highly polygenic and associated with FNBP1L. Mol Psychiatry 19:253–254. https://doi.org/10.1038/mp.2012.184

  4. Boker S, Neale M, Maes H, Wilde M, Spiegel M et al (2011) OpenMx: an open source extended structural equation modeling framework. Psychometrika 76(2):306–317

  5. Carpenter B, Gelman A, Hoffman MD, Lee D, Goodrich B et al (2017) Stan: a probabilistic programming language. J Stat Softw. https://doi.org/10.18637/jss.v076.i01

  6. Davies G, Tenesa A, Payton A, Yang J, Harris SE et al (2011) Genome-wide association studies establish that human intelligence is highly heritable and polygenic. Mol Psychiatry 16:996–1005

  7. DeFries JC, Fulker DW (1985) Multiple regression analysis of twin data. Behav Genet 15(5):467–473

  8. Eaves LJ, St Pourcain B, Davey Smith G, York TP, Evans DM (2014) Resolving the effects of maternal and offspring genotype on dyadic outcomes in genome wide complex trait analysis (“M-GCTA”). Behav Genet 44:445–455

  9. Gaugler T, Klei L, Sanders SJ, Bodea CA, Goldberg AP et al (2014) Most genetic risk for autism resides with common variation. Nat Genet 46(8):881–885

  10. Gill PE, Murray W, Saunders MA, Wright MH (2001) User’s Guide for NPSOL 5.0: a Fortran Package for Nonlinear Programming. Adapted from Stanford University Department of Operations Research Technical Report SOL 86-1, 1986. http://www.ccom.ucsd.edu/~peg/papers/npdoc.pdf

  11. Gillespie NA, Eaves LJ, Maes H, Silberg JL (2015) Testing models for the contributions of genes and environment to developmental change in adolescent depression. Behav Genet 45:382–393

  12. Gilmour AR, Thompson R, Cullis BR (1995) Average information REML: an efficient algorithm for variance parameter estimation in linear mixed models. Biometrics 51(4):1440–1450

  13. Haworth S, Shapland CY, Hayward C, Prins BP, Felix JF et al (2019) Low-frequency variation in TP53 has large effects on head circumference and intracranial volume. Nat Commun 10:357. https://doi.org/10.1038/s41467-018-07863-x

  14. Jacob B, Guennebaud G et al (2010) Eigen v3. http://eigen.tuxfamily.org/

  15. Johnson SG (2020) The NLopt nonlinear-optimization package, http://github.com/stevengj/nlopt

  16. Johnson DL, Thompson R (1995) Restricted maximum likelihood estimation of variance components for univariate animal models using sparse techniques and average information. J Dairy Sci 78:449–456

  17. Keller MC, Medland SE, Duncan LE, Hatemi PK, Neale MC, Maes HHM, Eaves LJ (2009) Modeling extended twin family data I: description of the cascade model. Twin Res Hum Genet 12(1):8–18

  18. Kendler KS, Neale MC, Sullivan P, Corey LA, Gardner CO, Prescott CA (1999) A population-based twin study in women of smoking initiation and nicotine dependence. Psychol Med 29:299–308

  19. Kirkpatrick RM, McGue M, Iacono WG, Miller MB, Basu S (2014) Results of a “GWAS Plus:” general cognitive ability is substantially heritable and massively polygenic. PLoS ONE. https://doi.org/10.1371/journal.pone.0112390

  20. Kraft D (1994) Algorithm 733: TOMP—Fortran modules for optimal control calculations. ACM Trans Math Softw 20(3):262–281

  21. Lee SH, Cross-Disorder Group of the Psychiatric Genomics Consortium et al (2013) Genetic relationship between five psychiatric disorders estimated from genome-wide SNPs. Nat Genet 45(9):984–994

  22. Lee S, DeCandia TR, Ripke S, Yang J, The Schizophrenia Psychiatric Genome-Wide Association Study Consortium, The International Schizophrenia Consortium et al (2012) Estimating the proportion of variation in susceptibility to schizophrenia captured by common SNPs. Nat Genet 44(3):247–250. https://doi.org/10.1038/ng.1108

  23. Meyer K, Smith SP (1996) Restricted maximum likelihood estimation for animal models using derivatives of the likelihood. Genet Sel Evol 28:23–49

  24. Morandat F, Hill B, Osvald L, Vitek J (2012) Evaluating the design of the R language: objects and functions for data analysis. In: Noble J (ed) ECOOP 2012—Object-Oriented Programming. Springer Science+Business Media, New York

  25. Morris AP, DIAGRAM Consortium et al (2012) Large-scale association analysis provides insights into the genetic architecture and pathophysiology of type 2 diabetes. Nat Genet 44:981–990. https://doi.org/10.1038/ng.2383

  26. Mulaik SA (2010) Foundations of factor analysis, 2nd edn. CRC Press, New York

  27. Neale MC, Cardon L (1992) Methodology for genetic studies of twins and families. Springer Science+Business Media, New York

  28. Neale MC, Hunter MD, Pritikin JN, Zahery M, Brick TR et al (2016) OpenMx 2.0: extended structural equation and statistical modeling. Psychometrika 81(2):535–549

  29. Pawitan Y (2013) In all likelihood: statistical modelling and inference using likelihood. Oxford University Press, Oxford

  30. Posthuma D (2009) Multivariate genetic analysis. In: Kim Y-K (ed) Handbook of behavior genetics. Springer Science+Business Media, New York, pp 47–59. https://doi.org/10.1007/978-0-387-76727-7_4

  31. Purcell S (2002) Variance components models for gene-environment interaction in twin analysis. Twin Res 5(6):554–571

  32. Purcell S, Neale B, Todd-Brown K, Thomas L, Ferreira MAR, Bender D et al (2007) PLINK: a tool set for whole-genome association and population-based linkage analyses. Am J Hum Genet 81:559–575. https://doi.org/10.1086/519795

  33. R Core Team (2018) R: a language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. https://www.R-project.org/

  34. Ripke S, O’Dushlaine C, Chambert K, Moran JL, Kähler AK, Akterin S et al (2013) Genome-wide association analysis identifies 13 new risk loci for schizophrenia. Nat Genet 45(10):1150–1159

  35. Shapland CY, Verhoef E, Davey Smith G, Fisher SE, Verhulst B, Dale PS, St Pourcain B (2020 preprint) The multivariate genome-wide architecture of interrelated literacy, language, and working memory skills reveals distinct etiologies. bioRxiv https://doi.org/10.1101/2020.08.14.251199

  36. Sharma G, Agarwala A, Bhattacharya B (2013) A fast parallel Gauss Jordan algorithm for matrix inversion using CUDA. Comput Struct 128:31–37

  37. Speed D, Hemani G, Johnson MR, Balding DJ (2013) Improved heritability estimation from genome-wide SNPs. Am J Hum Genet 91:1011–1021. https://doi.org/10.1016/j.ajhg.2012.10.010

  38. St Pourcain B, Eaves LJ, Ring SM, Fisher SE, Medland S, Evans DM, Davey Smith G (2018) Developmental changes within the genetic architecture of social communication behavior: a multivariate study of genetic variance in unrelated individuals. Biol Psychiat 83(7):598–606. https://doi.org/10.1016/j.biopsych.2017.09.020

  39. van Dongen J, Slagboom PE, Draisma HHM, Martin NG, Boomsma DI (2012) The continuing value of twin studies in the omics era. Nat Rev Genet 13:640–653

  40. Verhoef E, Shapland CY, Fisher SE, Dale PS, St Pourcain B (2020) The amplification of genetic factors for early vocabulary during children’s language and literacy development. J Child Psychol Psychiatry. https://doi.org/10.1111/jcpp.13327

  41. Wainschtein P, Jain DP, Yengo L, Zheng Z, TOPMed Anthropometry Working Group, Trans-Omics for Precision Medicine Consortium et al (2019 preprint) Recovery of trait heritability from whole genome sequence data. https://doi.org/10.1101/588020

  42. Whaley RC, Petitet A, Dongarra JJ (2001) Automated empirical optimization of software and the ATLAS project. Parallel Comput 27:3–35

  43. Yang J, Benyamin B, McEvoy BP, Gordon S, Henders AK, Nyholt DR et al (2010) Common SNPs explain a large proportion of the heritability for human height. Nat Genet 42(7):565–569

  44. Yang J, Lee SH, Goddard ME, Visscher PM (2011) GCTA: a tool for genome-wide complex trait analysis. Am J Hum Genet 88:76–82. https://doi.org/10.1016/j.ajhg.2010.11.011

  45. Yang J, Lee SH, Goddard ME, Visscher PM (2013) Genome-wide complex trait analysis (GCTA): methods, data analyses, and interpretations. In: Gondro C et al (eds) Genome-wide association studies and genomic prediction, methods in molecular biology, vol 1019. Springer Science+Business Media, New York

Funding

The work reported in this paper was funded by the National Institute on Drug Abuse R25DA026119.

Author information

Corresponding author

Correspondence to Robert M. Kirkpatrick.

Ethics declarations

Conflict of interest

Robert M. Kirkpatrick, Joshua N. Pritikin, Michael D. Hunter and Michael C. Neale declare they have no conflict of interest.

Human and animal rights and informed consent

This article does not contain any studies with human participants or animals performed by any of the authors.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Edited by David Evans.

Supplementary information

Below is the link to the electronic supplementary material.

Supplementary material 1 (PDF 694 kb)

About this article

Cite this article

Kirkpatrick, R.M., Pritikin, J.N., Hunter, M.D. et al. Combining Structural-Equation Modeling with Genomic-Relatedness-Matrix Restricted Maximum Likelihood in OpenMx. Behav Genet 51, 331–342 (2021). https://doi.org/10.1007/s10519-020-10037-5

Keywords

  • Structural equation modeling
  • Genomics
  • Statistical methods
  • Software