Abstract
For quantitative behavior genetic (e.g., twin) studies, Purcell proposed a novel model for testing gene-by-measured environment (GxM) interactions while accounting for gene-by-environment correlation. Rathouz et al. expanded this model into a broader class of non-linear biometric models for quantifying and testing such interactions. In this work, we propose a novel factorization of the likelihood for this class of models and adopt numerical integration techniques to estimate the models, especially those without a closed-form likelihood. The validity of our procedures is established through numerical simulation studies. The new procedures are illustrated in a twin study analysis of the moderating effect of birth weight on the genetic influences on childhood anxiety. A second example is given in an online appendix. Both the extant GxM models and the new non-linear models critically assume normality of all structural components, which implies continuous, but not normal, manifest response variables.
References
Bates D, Mullen KM, Nash JC, Varadhan R (2012) minqa: Derivative-free optimization algorithms by quadratic approximation [Computer software manual]. Retrieved from http://cran.r-project.org/web/packages/minqa/index.html
Bennett A (2008) Gene environment interplay: nonhuman primate models in the study of resilience and vulnerability. Dev Psychobiol 50(1):48–59
Dick DM, Rose RJ, Viken RJ, Kaprio J, Koskenvuo M (2001) Exploring gene-environment interactions: socioregional moderation of alcohol use. J Abnorm Psychol 110(4):625–632
du Toit SH, Cudeck R (2009) Estimation of the nonlinear random coefficient model when some random effects are separable. Psychometrika 74(1):65–82
Eaves L (2006) Genotype x environment interaction in psychopathology: fact or artifact? Twin Res Hum Genet 9(1):1–8
Eaves L, Last K, Martin N, Jinks J (1977) A progressive approach to non-additivity and genotype-environmental covariance in the analysis of human differences. Br J Math Stat Psychol 30(1):1–42
Eaves L, Silberg J, Erkanli A (2003) Resolving multiple epigenetic pathways to adolescent depression. J Child Psychol Psychiatry 44(7):1006–1014
Jinks JL, Fulker DW (1970) Comparison of the biometrical genetical, MAVA, and classical approaches to the analysis of the human behavior. Psychol Bull 73(5):311–349
Johnson W (2007) Genetic and environmental influences on behavior: capturing all the interplay. Psychol Rev 114(2):423–440
Klein A, Moosbrugger H (2000) Maximum likelihood estimation of latent interaction effects with the LMS method. Psychometrika 65(4):457–474
Lahey B, Applegate B, Waldman I, Loft J, Hankin B, Rick J (2004) The structure of child and adolescent psychopathology: generating new hypotheses. J Abnorm Psychol 113(3):358–385
Lahey BB, Waldman ID (2003) A developmental propensity model of the origins of conduct problems during childhood and adolescence. In Lahey BB, Moffitt TE, Caspi A (eds) Causes of conduct disorder and juvenile delinquency. Guilford Press, New York, pp 76–117
Liu Q, Pierce DA (1994) A note on Gauss–Hermite quadrature. Biometrika 81(3):624–629
Loehlin J (1996) The Cholesky approach: a cautionary note. Behav Genet 26(1):65–69
Molenaar D, Dolan CV (2014) Testing systematic genotype by environment interactions using item level data. Behav Genet 44(3):212–231
Muthén L, Muthén B (1998–2012) Mplus user’s guide, 6th edn. Muthén & Muthén, Los Angeles, CA
Naylor JC, Smith AF (1982) Applications of a method for the efficient computation of posterior distributions. Appl Stat 31(3):214–225
Neale M, Cardon L (1992) Methodology for genetic studies of twins and families (No. 67). Springer, Berlin
Pinheiro JC, Bates DM (1995) Approximations to the log-likelihood function in the nonlinear mixed-effects model. J Comput Gr Stat 4(1):12–35
Powell MJ (2009) The BOBYQA algorithm for bound constrained optimization without derivatives. Technical report, Department of Applied Mathematics and Theoretical Physics, University of Cambridge
Purcell S (2002) Variance components models for gene-environment interaction in twin analysis. Twin Res 5(6):554–571
R Core Team (2013) R: A language and environment for statistical computing [Computer software manual]. Retrieved from http://www.R-project.org/
Rabe-Hesketh S, Skrondal A, Pickles A (2005) Maximum likelihood estimation of limited and discrete dependent variable models with nested random effects. J Econom 128(2):301–323
Raftery AE (1995) Bayesian model selection in social research. Sociol Methodol 25:111–164
Rathouz PJ, Van Hulle CA, Rodgers JL, Waldman ID, Lahey BB (2008) Specification, testing, and interpretation of gene-by-measured-environment interaction models in the presence of gene-environment correlation. Behav Genet 38(3):301–315
Rutter M, Moffitt T, Caspi A (2006) Gene-environment interplay and psychopathology: multiple varieties but real effects. J Child Psychol Psychiatry 47(3–4):226–261
Schwarz G (1978) Estimating the dimension of a model. Ann Stat 6(2):461–464
Stroud AH, Secrest D (1966) Gaussian quadrature formulas, vol 374. Prentice-Hall, Englewood Cliffs, NJ
Van Hulle CA, Lahey BB, Rathouz PJ (2013) Operating characteristics of alternative statistical methods for detecting gene-by-measured environment interaction in the presence of gene-environment correlation in twin and sibling studies. Behav Genet 43(1):71–84
Weakliem DL (1999) A critique of the Bayesian information criterion for model selection. Sociol Methods Res 27(3):359–397
Weaver I, Cervoni N, Champagne F, D’Alessio A, Sharma S, Seckl J et al (2004) Epigenetic programming by maternal behavior. Nat Neurosci 7(8):847–854
Zheng H, Rathouz PJ (2013) GxM: Maximum likelihood estimation for gene-by-measured environment interaction models [Computer software manual]. Retrieved from http://cran.r-project.org/web/packages/GxM/index.html
Zheng H, Van Hulle CA, Rathouz PJ (2015) Comparing alternative biometric models with and without gene-by-measured environment interaction in behavior genetic designs: statistical operating characteristics. Behav Genet. doi:10.1007/s10519-015-9710-1
Acknowledgments
This study was funded by the NIH grant R21 MH086099 from the National Institute for Mental Health.
Conflict of interest
The authors declare that they have no conflict of interest.
Human and Animal Rights and Informed Consent
All procedures followed were in accordance with the ethical standards of the responsible committee on human experimentation (institutional and national) and with the Helsinki Declaration of 1975, as revised in 2000. Informed consent was obtained from all patients for being included in the study.
Additional information
Edited by Gitta Lubke.
Appendices
Appendix 1: Likelihood calculation through numerical integration
Adaptive Gauss–Hermite quadrature
When computing a definite integral, it may be difficult to find an antiderivative with a closed-form expression, even when the formula for the integrand is known. In such circumstances, numerical integration methods are often used to obtain approximate results. Gaussian quadrature is one of the most widely used numerical integration techniques. It approximates the integral of a function \(g(x)\) over a specified domain \({\mathcal {D}}\) with respect to a known weighting kernel \(\phi (x)\). A \(k\)-node rule is exact whenever \(g(x)\) is a polynomial of degree \(2k-1\) or less. Hence, if \(g(x)\) is well approximated by such a polynomial, \(k\) nodes suffice for an accurate estimate of the integral,
\[ \int_{\mathcal {D}} g(x)\,\phi (x)\,dx \approx \sum_{i=1}^{k} w_i\, g(x_i). \qquad (9) \]
The nodes \(x_i\) and weights \(w_i\), \(i=1,\ldots ,k\), are uniquely determined by the domain \({\mathcal {D}}\), the weighting kernel \(\phi (x)\) and the number of nodes \(k\) (Stroud and Secrest 1966). When the integration domain is the real line and the weighting kernel is \(\phi (x)=e^{-x^2}\), the resulting quadrature rule is known as Gauss–Hermite quadrature (GHQ).
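The exactness property stated above can be checked numerically. The sketch below is our own illustration, not code from GxM. It uses NumPy's `numpy.polynomial.hermite.hermgauss`, which returns Gauss–Hermite nodes and weights for the kernel \(e^{-x^2}\). It compares the \(k\)-node sum against the closed-form moments \(\int x^n e^{-x^2}dx = \Gamma((n+1)/2)\) for even \(n\), which are zero for odd \(n\). The helper names `ghq` and `exact_moment` are hypothetical.

```python
import math

import numpy as np
from numpy.polynomial.hermite import hermgauss


def ghq(g, k):
    """k-node Gauss-Hermite approximation of the integral of g(x) * exp(-x**2)
    over the real line: sum_i w_i * g(x_i)."""
    nodes, weights = hermgauss(k)  # nodes and weights for the kernel exp(-x**2)
    return float(np.sum(weights * g(nodes)))


def exact_moment(n):
    """Integral of x**n * exp(-x**2): zero for odd n, Gamma((n + 1) / 2) for even n."""
    return 0.0 if n % 2 else math.gamma((n + 1) / 2)


if __name__ == "__main__":
    k = 4
    # Degrees 0..2k-1 agree up to rounding; degree 2k is the first that does not.
    for n in range(2 * k + 1):
        print(n, ghq(lambda x: x**n, k), exact_moment(n))
```

With \(k = 4\), moments of degree 0 through 7 agree to rounding error, but the degree-8 moment does not, which matches the \(2k-1\) bound.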
Because of its close relationship to the normal distribution, GHQ is widely used in statistics. Adaptive GHQ (AGHQ) (Liu and Pierce 1994; Naylor and Smith 1982) shifts and scales the kernel to improve numerical accuracy, placing the nodes \(x_i\) where the integrand has most of its mass. Pinheiro and Bates (1995) and Rabe-Hesketh et al. (2005) demonstrated the advantages of AGHQ over traditional GHQ for estimating latent variable models with nonlinear random effects. In this work, we relocate the nodes according to the location and scale of the normal density, which are readily available. Specifically, if \(Y\sim {\mathcal {N}}(m, \sigma ^2)\) and \(g\) is a known but complicated function, the expectation of \(g(Y)\) can be approximated as
\[ \text {E}\{g(Y)\} = \frac{1}{\sqrt{\pi }} \int_{-\infty }^{\infty } g\left(m + \sqrt{2}\sigma x\right) e^{-x^2}\,dx \approx \frac{1}{\sqrt{\pi }} \sum_{i=1}^{k} w_i\, g\left(m + \sqrt{2}\sigma x_i\right), \qquad (10) \]
using the change of variable \(x=(y-m)/(\sqrt{2}\sigma )\). Although this is not “adaptive” in the strictest sense of Liu and Pierce (1994), we still use the term AGHQ because the nodes are relocated.
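Below is a minimal Python sketch of the relocated one-dimensional rule, assuming only NumPy. The nodes are mapped to \(y_i = m + \sqrt{2}\sigma x_i\), and the weighted sum is divided by \(\sqrt{\pi}\). The function name `normal_expectation` is ours. The check uses the known lognormal mean \(\text{E}(e^Y) = e^{m+\sigma^2/2}\).

```python
import math

import numpy as np
from numpy.polynomial.hermite import hermgauss


def normal_expectation(g, m, sigma, k=10):
    """Approximate E[g(Y)] for Y ~ N(m, sigma^2) with k relocated Gauss-Hermite nodes."""
    x, w = hermgauss(k)
    # Invert x = (y - m) / (sqrt(2) * sigma) to place the nodes on the y scale.
    y = m + math.sqrt(2.0) * sigma * x
    # The Jacobian and the normal constant combine into a single 1/sqrt(pi) factor.
    return float(np.sum(w * g(y)) / math.sqrt(math.pi))


if __name__ == "__main__":
    m, sigma = 0.3, 0.8
    # Known moment for comparison: E[exp(Y)] = exp(m + sigma^2 / 2).
    print(normal_expectation(np.exp, m, sigma), math.exp(m + sigma**2 / 2))
```

Because \(y\) is linear in \(x\), a polynomial of degree \(2k-1\) in \(y\) is still integrated exactly. For example, two nodes already return \(\text{E}(Y^2) = m^2 + \sigma^2\) exactly.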
For a multiple integral, a natural approach is to decompose it into a sequence of nested one-dimensional quadratures and to apply (10) repeatedly. For integration over \({\mathbb {R}^p}\), we use \(k_j\) nodes in the \(j\)th dimension, \(j=1,\ldots ,p\), which yields a multi-dimensional version of AGHQ. Specifically, suppose \({\varvec{Y}}\) is a \(p\)-dimensional random vector that follows a multivariate normal distribution with mean vector \(\mathbf{m}\) and covariance matrix \(\Sigma\). The expectation of \(g({\varvec{Y}})\), where \(g(\cdot )\) is now a multivariate function, is then approximately
\[ \text {E}\{g({\varvec{Y}})\} \approx \pi ^{-p/2} \sum_{i_1=1}^{k_1} \cdots \sum_{i_p=1}^{k_p} w_{i_1} \cdots w_{i_p}\, g\left(\mathbf{m} + \sqrt{2}\,\Sigma ^{1/2} {\varvec{x_{(i)}}}\right), \qquad (11) \]
where \({\varvec{x_{(i)}}} = (x_{1i_1}, \ldots , x_{p\,i_p})^T\); \(x_{j1},\ldots ,x_{jk_j}\) are the nodes for the \(j\)th dimension; the product \(w_{i_1} \cdots w_{i_p}\) is the corresponding weight for node \({\varvec{x_{(i)}}}\); and \(\Sigma ^{1/2}\) is any matrix satisfying \(\Sigma ^{1/2}(\Sigma ^{1/2})^T = \Sigma\), such as the Cholesky factor of \(\Sigma\).
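The tensor-product rule can be sketched in the same way. This is an illustrative implementation under our own assumptions, not the GxM code. It relocates the nodes with the Cholesky factor \(L\) of \(\Sigma\) (\(\Sigma = LL^T\)) and allows a different number of nodes per dimension, so it costs \(k_1 \cdots k_p\) evaluations of \(g\). The function name `mvn_expectation` is hypothetical. The example checks the identity \(\text{E}\{\exp(\mathbf{a}^T{\varvec{Y}})\} = \exp(\mathbf{a}^T\mathbf{m} + \mathbf{a}^T\Sigma\mathbf{a}/2)\) for an arbitrary \(3\times 3\) covariance matrix.

```python
import itertools
import math

import numpy as np
from numpy.polynomial.hermite import hermgauss


def mvn_expectation(g, mean, cov, k):
    """Approximate E[g(Y)] for Y ~ N_p(mean, cov) with a tensor-product
    Gauss-Hermite rule using k[j] nodes in dimension j.

    Each grid point x is relocated to mean + sqrt(2) * L @ x with cov = L L^T,
    and carries the product of its one-dimensional weights.
    """
    mean = np.asarray(mean, dtype=float)
    L = np.linalg.cholesky(np.asarray(cov, dtype=float))  # lower triangular
    p = mean.size
    assert len(k) == p, "need one node count per dimension"
    rules = [hermgauss(kj) for kj in k]  # (nodes, weights) for each dimension
    total = 0.0
    for idx in itertools.product(*(range(kj) for kj in k)):
        x = np.array([rules[j][0][i] for j, i in enumerate(idx)])
        w = math.prod(rules[j][1][i] for j, i in enumerate(idx))
        total += w * g(mean + math.sqrt(2.0) * L @ x)
    return float(total / math.pi ** (p / 2))


if __name__ == "__main__":
    mean = np.array([0.1, -0.2, 0.0])
    cov = np.array([[1.0, 0.3, 0.2], [0.3, 0.5, 0.1], [0.2, 0.1, 0.8]])
    a = np.array([0.4, -0.3, 0.5])
    approx = mvn_expectation(lambda y: math.exp(a @ y), mean, cov, (8, 8, 8))
    print(approx, math.exp(a @ mean + 0.5 * a @ cov @ a))
```

The cost grows multiplicatively with dimension: 8 nodes in each of 3 dimensions already means 512 evaluations of \(g\).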
AGHQ in likelihood calculation
To apply AGHQ to the likelihood \(f(P|M;\theta )\), we incorporate the density functions of the specific models into the integration. To simplify notation, we write \({\varvec{y}} = (A_M,C_M)^T\). Because \(f(A_M,C_M|M)\) is a multivariate normal density, we set \({\mathbf{m}}= \text {E}({\varvec{y}}|M;\theta _M)\) and \(\Sigma = \text {Cov}({\varvec{y}}|M;\theta _M)\). The function \(f(P|{\varvec{y}},M)=f(P|A_M,C_M,M)\) then plays the role of \(g({\varvec{y}})\) in (11). Therefore, with \(p\) the dimension of \({\varvec{y}}\),
\[ f(P|M;\theta ) = \int f(P|{\varvec{y}},M;\theta )\, f({\varvec{y}}|M;\theta _M)\,d{\varvec{y}} \approx \pi ^{-p/2} \sum_{i_1=1}^{k_1} \cdots \sum_{i_p=1}^{k_p} w_{i_1} \cdots w_{i_p}\, f\left(P \mid \mathbf{m} + \sqrt{2}\,\Sigma ^{1/2}{\varvec{x_{(i)}}}, M;\theta \right), \qquad (12) \]
where the conditional density \(f(P|{\varvec{y}},M;\theta )\), evaluated at each relocated node, is computable for all models proposed by Rathouz et al. (2008).
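The sketch below applies this likelihood approximation. It treats \(f(P|M;\theta)\) as the expectation of \(f(P|{\varvec{y}},M)\) over \({\varvec{y}}\sim{\mathcal N}(\mathbf{m},\Sigma)\).

* The conditional density is a hypothetical linear-Gaussian stand-in, \(P|{\varvec{y}} \sim {\mathcal N}(b_0 + \mathbf{b}^T{\varvec{y}}, s^2)\). It was chosen only because its marginal, \({\mathcal N}(b_0+\mathbf{b}^T\mathbf{m}, s^2+\mathbf{b}^T\Sigma\mathbf{b})\), is known in closed form.
* The GxM models would supply their own conditional density.
* The names `cond_density` and `marginal_density` are ours.
* Two latent dimensions are used for brevity.

```python
import itertools
import math

import numpy as np
from numpy.polynomial.hermite import hermgauss


def normal_pdf(x, mu, var):
    return math.exp(-0.5 * (x - mu) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def marginal_density(P, cond_density, m, Sigma, k=20):
    """Approximate f(P | M) = E_y[ f(P | y, M) ] for y ~ N(m, Sigma).

    m and Sigma play the roles of E(y | M) and Cov(y | M); cond_density(P, y)
    is the model's conditional density f(P | y, M). Nodes are relocated to
    m + sqrt(2) * L @ x with Sigma = L L^T, using k nodes per dimension.
    """
    m = np.asarray(m, dtype=float)
    L = np.linalg.cholesky(np.asarray(Sigma, dtype=float))
    p = m.size
    x1, w1 = hermgauss(k)
    total = 0.0
    for idx in itertools.product(range(k), repeat=p):
        x = x1[list(idx)]
        w = float(np.prod(w1[list(idx)]))
        total += w * cond_density(P, m + math.sqrt(2.0) * L @ x)
    return total / math.pi ** (p / 2)


if __name__ == "__main__":
    # Hypothetical linear-Gaussian conditional model, for illustration only.
    b0, b, s2 = 0.5, np.array([0.7, -0.4]), 0.6
    m = np.array([0.2, -0.1])
    Sigma = np.array([[0.9, 0.2], [0.2, 0.4]])
    cond = lambda P, y: normal_pdf(P, b0 + b @ y, s2)
    P = 1.1
    print(marginal_density(P, cond, m, Sigma),
          normal_pdf(P, b0 + b @ m, s2 + b @ Sigma @ b))
```

A non-linear conditional density requires changing only `cond_density`. The relocation and weighting stay the same.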
Appendix 2: Argument options in R package GxM
Model option
We consider both bivariate Cholesky models and bivariate correlated factors models, including Chol, CholGxM, NLMainGxM, CorrGxM, CholNonLin and CorrNonLin. The routines for fitting these models are provided in our \({\mathbf{R}}\) package, GxM. For models that do not admit a closed-form likelihood, we apply numerical integration. For models that do have a closed-form likelihood, the package provides fitting both with the closed formula and with numerical integration. All models are fitted by derivative-free optimization.
Zero set option
This option constrains selected parameters to zero. It greatly expands the set of available nested sub-models and allows specific parameters to be tested via likelihood ratio tests or by comparing BIC values. As explained in the Model section, GxM can be detected by testing statistical hypotheses under which certain parameters are zero. The option “zeroset” lets users fit models with chosen parameter(s) constrained to zero.
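The two comparisons that zero-constrained sub-models enable can be computed from the maximized log-likelihoods alone. The sketch below is generic and does not call GxM. It computes:

* the likelihood ratio statistic;
* its \(\chi^2\) p-value, with degrees of freedom equal to the number of parameters constrained to zero;
* BIC \(= -2\,l(\hat\theta) + d\log n\) (Schwarz 1978), with \(d\) parameters and \(n\) observations.

To avoid external dependencies, the \(\chi^2\) upper tail is evaluated from its closed forms for integer degrees of freedom. The \(\chi^2\) reference assumes that, under the null, the constrained parameters lie in the interior of the parameter space. For a variance component tested at zero, the null distribution is a mixture of \(\chi^2\) distributions, and the plain \(\chi^2\) p-value is conservative. The log-likelihood values in the demo are hypothetical.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for X ~ chi-square(df), integer df >= 1."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{j=0}^{df/2 - 1} (x/2)^j / j!
        term, total = 1.0, 1.0
        for j in range(1, df // 2):
            term *= (x / 2) / j
            total += term
        return math.exp(-x / 2) * total
    # Odd df: erfc(sqrt(x/2)) + sqrt(2x/pi) exp(-x/2) sum_{r=1}^{(df-1)/2} x^(r-1) / (1*3*...*(2r-1))
    total = math.erfc(math.sqrt(x / 2))
    term = math.sqrt(2 * x / math.pi) * math.exp(-x / 2)
    for r in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * r + 1)
    return total


def lrt(loglik_full, loglik_restricted, n_constrained):
    """Likelihood ratio statistic and its chi-square p-value."""
    stat = 2.0 * (loglik_full - loglik_restricted)
    return stat, chi2_sf(stat, n_constrained)


def bic(loglik, n_params, n_obs):
    """Schwarz's BIC; smaller values are preferred."""
    return -2.0 * loglik + n_params * math.log(n_obs)


if __name__ == "__main__":
    # Hypothetical maximized log-likelihoods, for illustration only.
    full, restricted = -1520.4, -1526.1  # restricted model zeroes 2 parameters
    print(lrt(full, restricted, 2))
    print(bic(full, 12, 500), bic(restricted, 10, 500))
```

With these illustrative inputs the two criteria disagree. The likelihood ratio test rejects the restricted model at conventional levels, but BIC slightly favors it. BIC penalizes extra parameters more heavily than a 5% likelihood ratio test does whenever \(\log n\) exceeds the critical value per constrained parameter.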
Initialization and priority option
For optimization problems with high-dimensional parameters and non-concave likelihood surfaces, it is important to have multiple reasonable starting points. When the non-linear latent terms are set to zero, all of our proposed models except Model (4) reduce to a common trivial model. Its parameters can be estimated directly, for example by a method-of-moments estimator, and these estimates serve as a natural starting point. For Model (4), we first use polynomial regression to remove the main effect of \(M\) on \(P\). Once \(P\) is replaced by the regression residuals, the modified model can also be viewed as a case of the common trivial model. For non-linear models, we add an intermediate update using a small number (\(k\) = 3) of AGHQ nodes. Finally, users may supply their own starting values. With priority level 1, user-specified starting values are refined in the intermediate update. With priority level 2, they skip the intermediate update.
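The two initialization ideas can be illustrated as follows. This is a sketch under stated assumptions, not the GxM routine.

* `residualize` removes a polynomial main effect of \(M\) on \(P\) by least squares. The polynomial degree is our choice.
* `falconer_start` is one familiar method-of-moments estimator. It applies Falconer-type formulas for standardized variance shares from MZ and DZ twin correlations: \(a^2 = 2(r_{MZ}-r_{DZ})\), \(c^2 = 2r_{DZ}-r_{MZ}\), \(e^2 = 1-r_{MZ}\).

Clipping negative shares and renormalizing are also our additions. The text does not specify which moment estimator is used for the common trivial model, so treat this as a hypothetical example of the idea.

```python
import numpy as np


def residualize(P, M, degree=2):
    """Remove a least-squares polynomial main effect of M on P and return the
    residuals (the polynomial degree is an illustrative choice)."""
    coefs = np.polyfit(M, P, deg=degree)
    return P - np.polyval(coefs, M)


def falconer_start(r_mz, r_dz):
    """Method-of-moments starting values (a2, c2, e2) from twin correlations.

    The unclipped shares always sum to one; negative values are clipped to
    zero and the shares renormalized so they remain valid proportions.
    """
    a2 = 2.0 * (r_mz - r_dz)
    c2 = 2.0 * r_dz - r_mz
    e2 = 1.0 - r_mz
    shares = np.clip([a2, c2, e2], 0.0, 1.0)
    return shares / shares.sum()


if __name__ == "__main__":
    M = np.linspace(-2.0, 2.0, 50)
    P = 1.0 + 2.0 * M + 0.5 * M**2  # purely quadratic main effect, no residual variation
    print(np.max(np.abs(residualize(P, M))))
    print(falconer_start(0.6, 0.4))
```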
AGHQ nodes number option
This option allows a trade-off between accuracy and computational cost. As one may expect, more AGHQ nodes produce more accurate likelihood values. On the other hand, because the integration is 3-dimensional, the cost grows rapidly. With \(k\) nodes in each dimension, each likelihood contribution requires \(k^3\) evaluations of the integrand.
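A small experiment makes the trade-off concrete, assuming NumPy. For several \(k\), it reports the number of grid points \(k^3\) of a 3-dimensional tensor-product rule. Alongside, it reports the error of the one-dimensional relocated rule for \(\text{E}(e^Y)\), \(Y\sim{\mathcal N}(0, 1.5^2)\), whose exact value is \(e^{1.125}\). The test integrand is our choice, and errors for the actual GxM likelihoods will differ.

```python
import math

import numpy as np
from numpy.polynomial.hermite import hermgauss


def aghq_normal(g, m, sigma, k):
    """1-D relocated Gauss-Hermite approximation of E[g(Y)], Y ~ N(m, sigma^2)."""
    x, w = hermgauss(k)
    return float(np.sum(w * g(m + math.sqrt(2.0) * sigma * x)) / math.sqrt(math.pi))


def tradeoff(ks, m=0.0, sigma=1.5, dim=3):
    """For each k: grid size k**dim of a tensor-product rule, and the absolute
    error of the 1-D rule for E[exp(Y)], whose exact value is exp(m + sigma^2/2)."""
    exact = math.exp(m + sigma**2 / 2)
    rows = []
    for k in ks:
        err = abs(aghq_normal(np.exp, m, sigma, k) - exact)
        rows.append((k, k**dim, err))
    return rows


if __name__ == "__main__":
    for k, n_points, err in tradeoff([3, 5, 10, 20]):
        print(f"k={k:2d}  grid points={n_points:5d}  1-D abs error={err:.2e}")
```

The error falls quickly as \(k\) grows, while the grid grows from 27 to 8000 points. This is why a small \(k\) is useful for the intermediate update and a larger \(k\) for the final fit.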
Parallel computing option
Because \({\mathbf{R}}\) is an interpreted language, its computational speed is not as satisfactory as that of compiled languages. This is a concern when computationally intensive numerical integration is combined with derivative-free optimization. We therefore embed parallel processing to address this challenge.
Parallel computing is directly supported in \({\mathbf{R}}\) beginning with release 2.14.0. The package parallel provides convenient functions for parallel computing in both explicit and implicit modes. For instance, the log-likelihood for GxM models is a sum over individual observations, \(l(\theta ) = \log L(\theta ) = \sum _{i} \log f(M_i,P_i;\theta )\). The per-observation terms can therefore be computed in parallel and then summed. Users can choose whether to use parallel computation and, if so, how many CPU cores to allocate.
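The decomposition \(l(\theta) = \sum_i \log f(M_i,P_i;\theta)\) is what makes this computation embarrassingly parallel. The Python sketch below mirrors the idea. It is not GxM's R implementation; in R, the same pattern is available through the parallel package, for example `mclapply` or `parLapply`. The data are split into one chunk per worker, each worker sums its observations' log-densities, and the partial sums are added. The per-observation density is a placeholder normal model, and the sketch assumes the POSIX `fork` start method is available. The names `chunk_loglik` and `parallel_loglik` are hypothetical.

```python
import math
import multiprocessing as mp
from concurrent.futures import ProcessPoolExecutor

import numpy as np


def chunk_loglik(args):
    """Sum of per-observation log-densities for one chunk of data.
    Placeholder model: independent N(mu, sigma^2) observations."""
    values, mu, sigma = args
    z = (values - mu) / sigma
    return float(np.sum(-0.5 * z**2 - math.log(sigma) - 0.5 * math.log(2.0 * math.pi)))


def parallel_loglik(data, mu, sigma, n_workers=2):
    """l(theta) = sum_i log f(x_i; theta), computed chunk-wise in worker
    processes and summed (assumes the POSIX 'fork' start method)."""
    chunks = np.array_split(np.asarray(data, dtype=float), n_workers)
    ctx = mp.get_context("fork")
    with ProcessPoolExecutor(max_workers=n_workers, mp_context=ctx) as pool:
        return sum(pool.map(chunk_loglik, [(c, mu, sigma) for c in chunks]))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.normal(1.0, 2.0, size=10_000)
    print(parallel_loglik(x, 1.0, 2.0, n_workers=4), chunk_loglik((x, 1.0, 2.0)))
```

Only the summation order differs from the serial computation, so the two results agree up to floating-point rounding. Chunking, rather than dispatching one task per observation, keeps inter-process overhead small relative to the integration work.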
Appendix 3: Configurations for 23 scenarios
The simulation settings for the 23 scenarios in the numerical analysis are shown in Table 6.
Cite this article
Zheng, H., Rathouz, P.J. Fitting Procedures for Novel Gene-by-Measured Environment Interaction Models in Behavior Genetic Designs. Behav Genet 45, 467–479 (2015). https://doi.org/10.1007/s10519-015-9707-9