Introduction

The aim of this paper is to explore how the phenotype of children or adults may influence their own and their family members’ environment. We consider this in the context of the genetic simplex model as applied to longitudinal twin data. The genetic simplex model was proposed to investigate the covariance structure of longitudinal or repeated phenotypic measures in the classical twin design (Eaves et al. 1986; Boomsma and Molenaar 1987a). This involves fitting first order autoregressive models to the additive genetic (A), shared (or common; C), and unshared (or specific; E) covariance matrices of the repeated measures. Usually in its application, the variables A, C, and E are assumed to be uncorrelated, and to contribute additively to the variance of a continuous phenotype, or a liability underlying a discrete phenotype. We denote these assumptions as the absent of ‘GE covariance’ and ‘GxE interaction’, respectively. Note that the G and the E in this shorthand refer to genetic influences (A and/or D (dominance effects)) and environmental effects (unique and common) in general.

The inclusion of GE covariance in the genetic simplex model can be achieved in different ways. We are interested in the process in which the phenotype contributes to the environment (henceforth, Ph->E transmission). We limit ourselves to the effects of the phenotypes of the twins at occasion t on their environmental variables (E) at occasion t + 1 in the twin model, whether the transmission is model within and between twin members. This extension is inspired by sibling interaction models (Eaves 1976; Eaves et al. 1977; Carey 1986), which include mutual effects of the phenotypes of the twins and siblings on each other, and by an extension to the genetic simplex presented by Eaves et al. (‘phenotype-to-phenotype transmission’). We presented a related extension in de Kort et al. (2012), which, as we explain below, is a special case of the present model.

This extension can also be related to the processes of active and passive GE-covariance as discussed by Plomin et al. (1977) and (Scarr and McCartney 1983), and to the process of ‘genotypes selecting environments’ as discussed Eaves et al. (1977). Such processes, and so the GE-covariance arising from them, are considered to be plausible in cognitive development (Dickens and Flynn 2001; Johnson et al. 2011; Haworth et al. 2010), and in developmental psychopathology (Rutter et al. 1997; Rutter et al. 2006; Rutter and Silberg 2002). Given that most twin models do not include explicitly GE-covariance, the present paper may help to bring the practice of longitudinal twin modeling closer to the theory of developmental psychology.

The immediate goal of our paper is to present the phenotype to environment transmission model, to establish that the modeling including Ph->E transmission in the standard genetic simplex model is locally identified, and that the model is empirically viable in terms of resolution and statistical power. We consider identification in given 3 or 4 measurement occasions in the classical twin design, we apply the model to data obtained at 4 occasions.

Below we first present the standard simplex model. We then consider the extension consisting of the path from the phenotype to the unshared environmental influences within and between twin members. Local identification of the models is considered for 3 and 4 occasions analytically. We consider the issue of power to detect the effects associated with our extension, and we consider constraints, which may enhance the power. We apply the models to maternal ratings of anxiety in female twins at ages 3, 7, 10, and 12 years.

The Standard Genetic Simplex Model

Let yijt denote the phenotypic score of member j (j = 1,2) of twin or sibling pair i at time or age t (t = 1,…,T; where T is 3 or 4). The phenotypic score is regressed on the A, C, and E variables: \( {\text{y}}_{\text{ijt}} = {\text{b}}_{{0{\text{t}}}} + {\text{A}}_{\text{ijt}} + {\text{C}}_{\text{ijt}} + {\text{E}}_{\text{ijt}} + \varepsilon_{\text{ijt}} \), where all regressors have zero means, so that the phenotypic mean E[yijt] equals the intercept b0t (see Dolan et al. 1991, for a version with structured means). The occasion-specific residual is subject to the decomposition \( \varepsilon_{\text{ijt}} = {\text{a}}_{\text{ijt}} + {\text{c}}_{\text{ijt}} + {\text{e}}_{\text{ijt}} \), where eijt possibly includes measurement error. The A, C, and E variables are subject to first order autoregressions (t = 2,…,T):

$$ {\text{A}}_{\text{ijt}} = {\text{b}}_{{{\text{At}},{\text{t}} - 1}} {\text{A}}_{{{\text{ijt}} - 1}} + \zeta_{\text{Aijt}} , $$
$$ {\text{C}}_{\text{ijt}} = {\text{b}}_{{{\text{Ct}},{\text{t}} - 1}} {\text{C}}_{{{\text{ijt}} - 1}} + \zeta_{\text{Cijt}} , $$
$$ {\text{E}}_{\text{ijt}} = {\text{b}}_{{{\text{Et}},{\text{t}} - 1}} {\text{E}}_{{{\text{ijt}} - 1}} + \zeta_{\text{Eijt}} , $$

where bAt,t−1, bCt,t−1, and bEt,t−1 are the autoregressive coefficients, and ζAijt, ζCijt, and ζEijt are regression residuals (a.k.a, innovations in this context). At t = 1, we set Aij1 = ζAij1, Cij1 = ζCij1, and Eij1 = ζEij1. The model is depicted in Fig. 1. Identification of the standard genetic simplex is not an issue, as it is based on the decomposition of the phenotypic TxT covariance into the genetic and environmental (A, C and E) covariance matrices. This decomposition poses no problems of identification in the classical twin design and in other genetically informative designs. In simultaneously subjecting these covariance matrices to the simplex model, the standard identification conditions hold (Jöreskog 1970). Notably, given that the autoregressive parameters (bAt,t−1, bAt,t−1, bAt,t−1) are not zero and the variances (σ2At], σ2Ct], σ2Et]) are not zero, the occasion-specific variances, σ2[at], σ2[ct], and σ2[et], are not identified at t = 1 and t = T. This is usually solved by fixing these to zero (e.g., σ2[a1] = σ2[aT] = 0; same applies to the environmental occasion-specific variances), or by equating the variance components at occasions 1 and 2, and at occasions T − 1 and T (e.g., σ2[at] = σ2[at+1], where t = 1 or t = T − 1; same applies to the environmental occasion-specific variances). Assuming identification is achieved by applying such constraints, the associated decomposition of variance is

$$ \sigma^{ 2} [{\text{y}}_{\text{ijt}} ] = \sigma^{ 2} [{\text{A}}_{\text{t}} ] + \sigma^{ 2} [{\text{C}}_{\text{t}} ] + \sigma^{ 2} [{\text{E}}_{\text{t}} ] + \sigma^{ 2} [\varepsilon_{\text{t}} ],\quad \left( {{\text{t}} = 1,{\text{T}}} \right) $$
$$ \sigma^{ 2} [\varepsilon_{\text{t}} ] = \sigma^{ 2} [{\text{a}}_{\text{t}} ] + \sigma^{ 2} \left[ {{\text{c}}_{\text{t}} } \right] + \sigma^{ 2} [{\text{e}}_{\text{t}} ], \quad \left( {{\text{t}} = 1,{\text{T}}} \right) $$
$$ \sigma^{ 2} [{\text{A}}_{\text{t}} ] = {\text{b}}_{{{\text{At}},{\text{t}} - 1}}^{ 2} \sigma^{ 2} [{\text{A}}_{{{\text{t}} - 1}} ] + \sigma^{ 2} [\zeta_{\text{At}} ],\quad \left( {{\text{t}} = 2,{\text{T}}} \right) $$
$$ \sigma^{ 2} [{\text{C}}_{\text{t}} ] = {\text{b}}_{{{\text{Ct}},{\text{t}} - 1}}^{ 2} \sigma^{ 2} [{\text{C}}_{{{\text{t}} - 1}} ] + \sigma^{ 2} [\zeta_{\text{Ct}} ],\quad \left( {{\text{t}} = 2,{\text{T}}} \right) $$
$$ \sigma^{ 2} [{\text{E}}_{\text{t}} ] = {\text{b}}_{{{\text{Et}},{\text{t}} - 1}}^{ 2} \sigma^{ 2} [{\text{E}}_{{{\text{t}} - 1}} ] + \sigma^{ 2} [\zeta_{\text{Et}} ].\quad \left( {{\text{t}} = 2,{\text{T}}} \right) $$
Fig. 1
figure 1

The standard ACE simplex (ACE model). Occasion-specific influences are not shown. The scaling used is shown only at t = 1

At t = 1, we set \( \sigma^{ 2} [{\text{A}}_{ 1} ] = \sigma^{ 2} [\zeta_{\text{A1}} ],\sigma^{ 2} [{\text{C}}_{ 1} ] = \sigma^{ 2} [\zeta_{\text{C1}} ] \), and \( \sigma^{ 2} [{\text{E}}_{ 1} ] = \sigma^{ 2} [\zeta_{\text{E1}} ] \) (given \( {\text{A}}_{\text{ij1}} = \zeta_{\text{Aij1}} ,{\text{ C}}_{\text{ij1}} = \zeta_{\text{Cij1}} ,{\text{ and E}}_{\text{ij1}} = \zeta_{\text{Eij1}} ) \).

The genetic simplex model and variations on this model (e.g.,Hewitt et al. 1988) have been put to good use in studies of personality, cognition, psychophysiology, psychopathology, etc. (e.g., see Hoekstra et al. 2007; Rietveld et al. 2003a, b; Bartels et al. 2004, 2002; Boomsma et al. 1989; Gillespie et al. 2007; Cardon et al. 1992; Petrill et al. 2004). Minica et al. (2010) discussed the inclusion of measured genetic variants in the genetic simplex. In these studies, GE-covariance was assumed to be absent. Below we extend the standard simplex model by considering the possibility that the phenotypes of the twins at occasion t contributes to their environmental influences at occasion t + 1. This extension introduces covariance among the A, C, and E, and necessarily alters the meaning of the latent environmental variables, as we point out below.

Ph->E Transmission, in the Presence of C

Figure 2 depicts the model in which we suppose that the phenotype of a person at occasion t contributes to shaping his or her own environment (say Ei1t+1; parameters αk; k = 1,…,T − 1), and possibly also to the environment of a cotwin or other family members at occasion t + 1 (Ei2t+1; parameters denoted βk; k = 1,…,T − 1). At t = 1, we assume that the environmental variables are not subject to such direct phenotypic influences, so that at t = 1 the latent environment variables have their standard interpretation, which in part is based on their specification as uncorrelated.

Fig. 2
figure 2

The extended ACE simplex model. Occasion-specific influences are not shown. The scaling used is shown only at t = 1. The extension comprises the arrows from the phenotype y* at t to the E variables at t + 1 (i.e., parameters αk and βk). For the distinction between y (Fig. 1) and y* in this Figure, see the text

As above, the phenotype at each time point is related to the intercept b0t (the phenotypic mean at time t) and the zero mean additive genetic (Aijt), environmental variables (Cijt and Eijt), and the time specific residual (εijt):

$$ {\text{y}}_{\text{ijt}} = {\text{b}}_{{0{\text{t}}}} + {\text{y}}_{\text{ijt}}^{*} + \varepsilon_{\text{ijt}} , $$
$$ {\text{y}}_{\text{ijt}}^{*} = {\text{A}}_{\text{ijt}} + {\text{C}}_{\text{ijt}} + {\text{E}}_{\text{ijt}} . $$

The phenotype yijt is decomposed into a part, y *ijt , that is subject to the longitudinal model, and the time specific part, \( {\text{b}}_{{0{\text{t}}}} + \varepsilon_{\text{ijt}} \), which, as above, may include zero mean occasion-specific influences: \( \varepsilon_{\text{ijt}} = {\text{a}}_{\text{ijt}} + {\text{c}}_{\text{ijt}} + {\text{e}}_{\text{ijt}} \). In this manner, the time specific influences εijt are strictly time specific, i.e., not subject to any transmission. As above, the additive genetic variable (A) and shared environmental variable (C) is subject to the first order autoregressions. The environmental variables in twin members 1 and 2 are regressed on the preceding environmental variables and preceding phenotypes:

$$ {\text{E}}_{\text{i1t}} = {\text{b}}_{{{\text{Et}},{\text{t}} - 1}} {\text{E}}_{{{\text{i1t}} - 1}} + \alpha_{\text{k}} {\text{y}}^{*}_{{{\text{i1t}} - 1}} + \beta_{\text{k}} {\text{y}}^{*}_{{{\text{i2t}} - 1}} + \zeta_{\text{Ei1t}} , $$
$$ {\text{E}}_{\text{i2t}} = {\text{b}}_{{{\text{Et}},{\text{t}} - 1}} {\text{E}}_{{{\text{i2t}} - 1}} + \beta_{\text{k}} {\text{y}}^{*}_{{{\text{i1t}} - 1}} + \alpha_{\text{k}} {\text{y}}^{*}_{{{\text{i2t}} - 1}} + \zeta_{\text{Ei2t}} , $$

where the Ph->E transmission parameters are αk (transmission within a twin member) and βk (transmission across twin members). By including the βk parameter, we allow the phenotypic value of one twin member to contribute to the environment of the other twin member.

Application of the tracing rules reveals that Ph->E transmission starting at t = 1 necessarily changes the standard definition of the C and E at t = 2 and onwards, as the environmental influences become correlated. The tracing rules also reveal that the extension introduces GE covariance, i.e., the additive genetic variables (A; at t = 2,…,T), and the unshared environmental influences (E; at t = 2,…,T) are correlated. de Kort et al. (2012) considered this model, but accommodated C by allowing the environmental influences to be correlated (i.e., not modeling E and C, but T, the totality of relevant environmental influences; Carey, 2006). The model depicted in Fig. 2, and de Kort’s model are equivalent if bCt,t−1 = bEt,t−1 (t = 2,…,T).

Local Identification

We know that the standard genetic ACE simplex is locally identified, provided that the occasion-specific variances (σ2[at], σ2[ct], and σ2[et]) at t = 1 and t = T are fixed to zero, or constrained to equal the neighboring variances. Below we considered the identification of the extended model in two ways, numerically and analytically. A model is analytically locally identified if the Jacobian matrix of the model is of full column rank (Bekker et al. 1993). Let σ(θ) denote the (T*(T + 1)/2)-dimensional vector containing the non-redundant elements of the model covariance matrices of the MZ and DZ twins (expressed in terms of the fixed and to-be-estimated parameters), and let θ denote the p-dimensional vector of to-be-estimated parameters. The p × q Jacobian matrix equals J(θ) = ∂σ(θ)/∂θ. As explained by Bekker et al. (1993), the model is locally identified if J(θ) has full column rank. This test may be understood as a generalization of the test of the rank of the design matrix in scaling tests (Mather and Jinks 1977). We used Maple 6 (e.g., Heck 1993) to carry out this test by expressing in Maple the phenotypic covariance matrices in terms of the parameters, organizing these in the vector σ(θ), and defining the vector θ. This is a small programming task. The more complicated operations of calculating the Jacobian and its null space are carried out in Maple. For other of applications of this method in the twin design and in other contexts, see Derks et al. (2006), de Kort et al. (2012), and Bollen and Bauldry (2010).

Identification can also be established numerically by choosing parameters values, calculating the expected covariance matrices given these values, and establishing that the parameters are consistently (given variation in starting values) correctly recovered in fitting the true model to the expected covariance matrices. Analytical identification is computationally more demanding, but preferable as it does not depend on an arbitrary choice of parameter values. However, to reduce the computational burden in Maple, we imposed the following constraints on the occasion-specific variances: (1) \( \sigma^{ 2} [{\text{a}}_{ 1} ] = \sigma^{ 2} [{\text{a}}_{ 2} ] = \sigma^{ 2} [{\text{a}}_{ 3} ] = \sigma^{ 2} [{\text{a}}_{ 4} ] \ne 0 \), (2), σ2[ct]=0 (t=1,…,4), and (3) \( \sigma^{ 2} [{\text{e}}_{ 1} ] = \sigma^{ 2} [{\text{e}}_{ 2} ] = \sigma^{ 2} [{\text{e}}_{ 3} ] = \sigma^{ 2} [{\text{e}}_{ 4} ] \ne 0 \). In the numerical study of identification, we relax these constraints.

Phenotype to E Effects in the Presence of C

Analytical Identification T = 4

Given T = 4, the addition of the parameters αk and βk (k = 1,2,3), rendered the extended simplex model (see Fig. 2) unidentified. In exploring identifying constraints, we first considered the parameters αk and βk. Using Maple, we established that the reduction of two sets of three parameters (αk and βk, k = 1,2,3) to two sets of two parameters rendered the model identified. We considered the equality constraints α2 = α3 and β2 = β3 (leaving α1 and β1 unconstrained), and we considered the imposition of a linear trend αk = b + (k−1)*b, k = 1,2,3, and βk = b + (k−1)*b (k = 1,2,3), where b, b, b, and b are now the free parameters. We note that whereas α1, α2 = α3 and β12 = β3 is identified, we found that α1 = α2, α3 and β1 = β2, β3 is not. We repeated these analyses without common environmental influences (i.e., deleting the C simplex), but this had no bearing on the results, i.e., the equality constraints (or the linear constraints) were still required to render the model identified.

Others constraints are possible. An obvious choice is to constrain the autoregressive parameters. We considered separately bA2,1 = bA3,2 = bA4,3 (henceforth equal A b-coefficients), bC2,1 = bC3,2 = bC4,3 (equal C b-coefficients), and bE2,1 = bE3,2 = bE4,3 (equal E b-coefficients), and found that each set of constraints resulted in model identification with unconstrained αk and βk (k = 1,2,3).

Analytical Identification T = 3

We retained the constraints on the occasion-specific variance (at t = 1,2,3). The T = 3 model with the two sets of two parameter, e.g., α1, α2 and β1, β2, is not identified (deletion of the common environmental simplex did not results in identification). Imposing equality constraints on the parameters αk and βk1 = α2 = α and β1 = β2 = β) rendered the model identified. We then considered the imposition of (1) equal A b-coefficients and equal C b-coefficients; (2) equal A b-coefficients and equal E b-coefficient; (3) equal C b-coefficients and equal E b-coefficients. Each sets of constraints rendered the model identified, with unconstrained αk and βk (k = 1,2).

In sum, we considered identification subject to constraints on the occasion-specific residuals mentioned above. Given T = 4, we conclude the following: (1) The extended model (Fig. 2) with added parameters αk and βk is identified given constraints on αk (e.g., α1, α2 = α3; or the linear constraint) and on βk (e.g., β1, β2 = β3; or the linear constraint). These constraints are required in the presence or absence of the C simplex. (2) It is identified with αk (k = 1,2,3) and βk (k = 1,2,3) given equality constrained autoregressive coefficient, i.e., given equal A b-coefficients, equal C b-coefficients, or equal E b-coefficients). Given T = 3, we conclude the following: (1) The extended model is identified given the equality constraints α1 = α2 and β1 = β2. (2) It is identified with αk (k = 1,2) and βk (k = 1,2) given equal A b-coefficients and equal C b-coefficients, or equal A b-coefficients and E b-coefficients, or equal E b-coefficients and equal C b-coefficients.

Misspecification and Power Given T = 4

Local identification is a necessary, but not a sufficient, condition for a model to be viable. We have to provide some indication of power to resolve the effects of interest (Martin et al. 1978). Given the fairly restrictive identification conditions associated with T = 3, we address these issues only in the case of T = 4. We do this by fitting the true model and misspecified models to the population matrices using exact normal data simulation (van der Sluis et al. 2008). We used Mplus 6.1 (Muthén and Muthén 2007) to fit the models using maximum likelihood (ML) estimation.

We consider only three sets of parameter values. The first set includes the following parameters of the standard genetic simplex (ACE; set 1):

$$ {\text{b}}_{{{\text{At}},{\text{t}} - 1}} = . 7,{\text{ b}}_{{{\text{Ct}},{\text{t}} - 1}} = . 9,{\text{ and b}}_{{{\text{Et}},{\text{t}} - 1}} = . 6, \quad \left( {{\text{t}} = 2, 3, 4} \right) $$
$$ \sigma^{ 2} [\zeta_{\text{A1}} ] = . 4*\rho ,\,\sigma^{ 2} [\zeta_{\text{C1}} ] = . 2*\rho ,\,\sigma^{ 2} [\zeta_{\text{E1}} ] = . 4*\rho , $$
$$ \sigma^{ 2} [\zeta_{\text{At}} ] = . 4*\left( { 1- {\text{ b}}_{{{\text{At}},{\text{t}} - 1}}^{ 2} } \right)*\rho , $$
$$ \sigma^{ 2} [\zeta_{\text{Ct}} ] = . 2*\left( { 1- {\text{ b}}_{{{\text{Ct}},{\text{t}} - 1}}^{ 2} } \right)*\rho , $$
$$ \sigma^{ 2} [\zeta_{\text{Et}} ] = . 4*\left( { 1- {\text{ b}}_{{{\text{Et}},{\text{t}} - 1}}^{ 2} } \right)*\rho , \quad \left( {{\text{t}} = 2, 3, 4} \right) $$
$$ \sigma^{ 2} [{\text{a}}_{\text{t}} ] = \left( { 1- \rho } \right)*. 50,\sigma^{ 2} [{\text{c}}_{\text{t}} ] = 0, \, \;{\text{and}}\;\;\sigma^{ 2} [{\text{e}}_{\text{t}} ] = \left( { 1- \rho } \right)*. 50. \quad \left( {{\text{t}} = 1,.., 4} \right) $$

The second set includes σ2[ct] as shown (ACE; set 2):

$$ {\text{b}}_{{{\text{At}},{\text{t}} - 1}} = . 7,{\text{ b}}_{{{\text{Ct}},{\text{t}} - 1}} = . 9,{\text{ and b}}_{{{\text{Et}},{\text{t}} - 1}} = . 6, \quad \left( {{\text{t}} = 1, 2, 3} \right) $$
$$ \sigma^{ 2} [\zeta_{\text{A1}} ] = . 4*\rho ,\,\sigma^{ 2} [\zeta_{\text{C1}} ] = . 2*\rho ,\,\sigma^{ 2} [\zeta_{\text{E1}} ] = . 4*\rho , $$
$$ \sigma^{ 2} [\zeta_{\text{At}} ] = . 4*\left( { 1- {\text{ b}}_{{{\text{At}},{\text{t}} - 1}}^{ 2} } \right)*\rho , $$
$$ \sigma^{ 2} [\zeta_{\text{Ct}} ] = . 2*\left( { 1- {\text{ b}}_{{{\text{Ct}},{\text{t}} - 1}}^{ 2} } \right)*\rho , $$
$$ \sigma^{ 2} [\zeta_{\text{Et}} ] = . 4*\left( { 1- {\text{ b}}_{{{\text{Et}},{\text{t}} - 1}}^{ 2} } \right)*\rho , \quad \left( {{\text{t}} = 2, 3, 4} \right) $$
$$ \sigma^{ 2} [{\text{a}}_{\text{t}} ] = \left( { 1- \rho } \right)*. 3 3 3,\;\;\sigma^{ 2} [{\text{c}}_{\text{t}} ] = . 3 3 3, \quad \;{\text{and}}\;\;\sigma^{ 2} [{\text{e}}_{\text{t}} ] = \left( { 1- \rho } \right)*. 3 3 3. \quad \left( {{\text{t}} = 1,.., 4} \right) $$

The third set excludes the influence of C altogether, i.e., bCt,t−1 = .0, σ2C1] = 0, σ2Ct] = 0, σ2[ct] = 0 (AE; set 3):

$$ {\text{b}}_{{{\text{At}},{\text{t}} - 1}} = . 7,{\text{ and b}}_{{{\text{Et}},{\text{t}} - 1}} = . 6, \quad \left( {{\text{t}} = 2, 3, 4} \right) $$
$$ \sigma^{ 2} [\zeta_{\text{A1}} ] = . 5*\rho ,\;\;\sigma^{ 2} [\zeta_{\text{E1}} ] = . 5*\rho , $$
$$ \sigma^{2} [\zeta_{\text{At}} ] = .5*\left( {1 - {\text{b}}_{{{\text{At}},{\text{t}} - 1}}^{2} } \right)*\rho, \sigma^{2} [\zeta_{\text{Et}} ] = .5*\left( {1 - {\text{b}}_{{{\text{Et}},{\text{t}} - 1}}^{2} } \right)*\rho,\quad \left( {{\text{t}} = 2,3,4} \right) $$
$$ \sigma^{ 2} [{\text{a}}_{\text{t}} ] = \left( { 1- \rho } \right)*. 50,\;{\text{ and}}\;\;\sigma^{ 2} [{\text{e}}_{\text{t}} ] = \left( { 1- \rho } \right)*. 50. \quad \left( {{\text{t}} = 1,.., 4} \right) $$

Note that the fixed parameter ρ is the ratio of the variance due to the autoregressive processes to the total phenotypic variance (ρ is the reliability, if we conveniently consider the occasion-specific variance as due to error). We chose ρ = .80, so that 20 % of the phenotypic variance is occasion-specific in the standard simplex. Given three sets of parameters, we varied αk and βk as shown Table 1. In fitting the models, we consistently applied the identifying constraints α2 = α31 unconstrained) and β2 = β31 unconstrained). In set 1 and 3, we imposed the constraints mentioned on the occasion-specific residual variances: σ2[a1] = σ2[a2] = σ2[a3] = σ2[a4] ≠ 0, σ2[ct] = 0 (t = 1,…,4), and σ2[e1] = σ2[e2] = σ2[e3] = σ2[e4] ≠ 0. In set 2, we included the occasion-specific variances σ2[c1] = σ2[c2] = σ2[c3] = σ2[c4] ≠ 0 to establish numerically that these are identified.

Table 1 Detection of the phenotype to environment transmission in the ACE simplex and in the AE simplex

The parameter values chosen are quite arbitrary. To get some sense of the resulting summary statistics, we report in the appendix associated phenotypic summary statistics associated with parameter sets 1 and 3. The twin correlations look plausible. The models gives rise to small differences in phenotypic variance in the MZs and DZs (Eaves et al. 1977). In both set 1 and 3, given αk = βk = .1, the within twin member correlations between the As and Es range from .0 to .32 in MZs, and from .0 to .25 in DZs. Given αk = βk = −.1, the correlations range from −.07 to −.28 in MZs and from −.05 to −.21 in DZs. In sum, the results in Table 1 are based on the model with the (over-identifying) constraints on the occasion-specific residual variances, and on the constraints α2 = α31 unconstrained) and β2 = β31 unconstrained).

The results in Table 1 are clear: given the present parameter values, we require prohibitively large sample sizes to resolve Ph->E transmission in the presence of C. This is understandable as Ph->E transmission destroys a design feature of the ACE model: A, C, and E become correlated over time. The resolution in set 2 is slightly lower still, but the differences are relatively small (i.e., the resolution is dismal, regardless). It is important to note that the addition of the occasion-specific residual variances, σ2[ct], did not give rise to any identification problems, judging by the parameter recovery and the Mplus numerical identification test based on the Information matrix.

Ph->E transmission in the AE model, in contrast, fares well in terms of sample size requirements to resolve this feature (see Table 1). The results raise the question how well we can distinguish between the AE model with Ph->E transmission and the ACE model without such transmission. Although these models are not nested, we can still compare the overall χ2 goodness of fit indices, as obtained by fitting the models to the population covariance matrices. These results are shown in Table 2. The generating model is the AE simplex (parameter set 3) with the parameters α1, α2 = α3, β1, and β2 = β3. We fitted the AE simplex model (df = 68) without the parameters αk = βk = 0 (same results as in Table 1, set 3), we fitted the standard ACE models with and with occasion-specific residuals (df = 61 and df = 60, respectively), and the AE model without a rank one C covariance matrix (df = 64; i.e., we estimated σ2[C1] = σ2C1] and bCt,t−1, but fixed σ2Ct] = 0, t = 2,3,4, and σ2[ct] = 0). Judging by the twin correlations in the Table 5 in Appendix (set 3), the inclusion of C makes little sense if βk is negative. However, we proceed with the model fitting results, but return to the model αk = .1 & βk = −.1 below.

Table 2 The generating model is the AE simplex (parameter set 3) with the parameters α1, α2 = α3, β1, and β2 = β3, as shown

The goodness of fit results (df = 61 model vs. df = 60 models) are consistent with our finding that the occasion-specific residuals variances consistently hit the lower bound of zero (σ2[ct] = 0), as did the C innovation variances (i.e., σ2Ct] = 0, t = 2,3,4). Dropping these variance components did not result in any appreciable increase in χ2 (the df = 64 model). The results in Table 2 suggest that (1) the power is good to detect the Ph->E transmission in the AE model (as we know from Table 1; e.g., given αk = βk = .1, chi2 = 4.39 + 34.01 = 38.4, df = 4); (2) the ACE model will fit the AE model with Ph->E transmission quite well as long as βk is positive (e.g., given αk = βk = .1, χ2 = 1.55 + 3.89 = 5.44; DF = 64; Nmz = Ndz = 1000); (3) the C in the misspecified ACE model is almost perfectly rank 1 (compare column 2 and 3 of Table 2); (4) the DZ twins generally provide most information to distinguish these models.

To show that the incorrect df = 64 model (AE simplex, C rank 1) not only fits well, but also produces seemingly sensible parameter values, we report the parameters of this incorrect model, given the data generating AE simplex model with αk = .15 and βk = .15. The point estimates (standard errors in parentheses) are:

$$ \sigma [{\text{a}}_{\text{t}} ] = . 2 8 6 { }\left( {.0 4} \right),\quad \sigma [{\text{e}}_{\text{t}} ] = . 3 1 8 { }\left( {.0 4} \right), $$
$$ \sigma [\zeta_{\text{A1}} ] = . 5 2 8 { }\left( {.0 4} \right),\quad \sigma [\zeta_{\text{A2}} ] = . 3 9 4 { }\left( {.0 6} \right),\quad \sigma [\zeta_{\text{A3}} ] = . 3 9 1 { }\left( {.0 6} \right),\quad \sigma [\zeta_{\text{A4}} ] = . 4 2 4 { }\left( {.0 6} \right), $$
$$ \sigma [\zeta_{\text{E1}} ] = . 6 4 7 { }\left( {.0 2} \right),\quad \sigma [\zeta_{\text{E2}} ] = . 5 1 2 { }\left( {.0 3} \right),\quad \sigma [\zeta_{\text{E3}} ] = . 5 1 2 { }\left( {.0 3} \right),\quad \sigma [\zeta_{\text{E4}} ] = . 50 9 { }\left( {.0 3} \right), $$
$$ \sigma [\zeta_{\text{C1}} ] = . 3 4 7 { }\left( {.0 4} \right), $$
$$ {\text{b}}_{{{\text{A2}}, 1}} = . 9 1 1 { }\left( {. 1 1} \right),{\text{ b}}_{{{\text{A3}}, 2}} = . 7 90 \, \left( {.0 8} \right),{\text{ b}}_{{{\text{A4}}, 3}} = . 7 8 3 { }\left( {.0 9} \right), $$
$$ {\text{b}}_{{{\text{E2}}, 1}} = . 5 8 2 { }\left( {.0 4} \right),{\text{ b}}_{{{\text{E3}}, 2}} = . 5 8 6 { }\left( {.0 4} \right),{\text{ b}}_{{{\text{E4}}, 3}} = . 5 8 9 { }\left( {.0 4} \right), $$
$$ {\text{b}}_{{{\text{C2}}, 1}} = 1. 5 6 { }\left( {. 1 3} \right),{\text{ b}}_{{{\text{C3}}, 2}} = 1. 3 3 { }\left( {.0 9} \right),{\text{ b}}_{{{\text{C4}}, 3}} = 1. 1 5 2 { }\left( {.0 7} \right). $$

These values seem quite sensible. One may object to the estimates of bCt+1,t being greater than one (the parameters being outside the unit circle). However, in this model these parameters are not interpretable as autoregressive coefficients.

We note that the ACE simplex model does not fit quite as well if the parameter βk is negative (i.e., βk = −.10). The summary statistics in the Appendix indicate that parameter set 3, with αk = .10 and βk = −.10, produces correlations that are not consistent with the presence of C. For instance, given αk = .10 and βk = −.10, we have, at t = 2,3,4, MZ correlations of .420, .373, and .345, and DZ correlations of .148, .090, and .054. These resemble twin correlations sometimes observed in personality dimensions. It is well known that negative sibling interaction and non-additive genetic effects may give rise to such disparate correlations (Rietveld et al. 2003a, b; Eaves 1988, 1976). Given that C is unlikely given these correlations, we fitted the standard ADE simplex to the data generated by set 3 with αk = .10 and βk = −.10. We obtained a χ2(60) of 14.37. As the occasion-specific D variances and the D innovation variances were zero, so we fixed these to zero (the D covariance matrix is now rank 1), and again obtained the χ2(64) = 14.37. The parameter values were quite sensible. One may question whether dominance is enough to account for the differences in the correlations (e.g., .345 vs. .054; Eaves, 1988). However, as above, what is puzzling, if one were to take the ADE model seriously, is the rank 1 D covariance matrix.

Finally, we considered the fit of the AE model with Ph->E transmission to expected covariance matrices generated with parameter set 2 (ACE simplex with αk = βk = 0). In Mplus the χ2(64) was 45.7 (Nmz = Ndz = 1000), which is relatively large compared to the values in Table 2. More importantly, we note that the parameter estimates made little sense. For instance, the parameters bEt+1,t assumed negative values, and the parameters bA2,1 and bA3,2 were almost zero. In conclusion, we find that the AE model with Ph->E transmission fits data generated by the standard ACE simplex relatively poorly, and does not produce sensible parameter estimates.

Illustration: Anxiety at 3, 7, 10, and 12

We applied the ACE standard simplex and the AE simplex with Ph->E transmission to mother ratings of anxiety measured at ages 3, 7, 10, and 12 years in female MZ and DZ twins. The data were collected by the Netherlands Twin Register (NT), which includes the Young NTR (YNTR; van Beijsterveldt et al. 2003; Boomsma et al. 2002, 2006) that has recruited newborn twins and multiples at birth since 1987. The parents and teachers of the twins rate anxious depression in the children by age appropriate questionnaires from the Achenbach system of empirical assessment (ASEBA): the Child Behavior Checklist (CBCL/1.5-5; Achenbach 1990, 1992a, b) and CBCL/4-18 (Verhulst et al. 1996).

As Ph->E transmission need not be the same in boys and girls, and as a proper treatment of sex differences is beyond the present scope, we analyzed the data of the MZ and DZ girls. We have 3,480 MZ pairs and 3,145 DZ pairs. The percentages observed at ages 3–12 years are about 89, 54, 45, and 37 % in MZ twins, and 89, 50, 39, and 32 % in the DZ twins. FIML estimates of the MZ phenotypic twin correlations are .71, .58, .58, and .63. The corresponding DZ correlations are .31, .36, .35, and .40. Additional summary statistics are given in Table 6 Appendix. Using FIML estimation in Mplus, we fitted to the raw data the standard ACE simplex model, with occasion-specific residual variances constrained to be equal over time.

The goodness of fit indices are shown in Table 3. We found that the occasions specific residual variances, σ[ct] (t = 1,…,4), and the C innovations were zero, σ[ζCt] (t = 2,3,4). We fixed these to zero, reducing the C covariance matrix to rank 1. We know from the analyses of expected covariance matrices (see above, Table 2) that the rank 1 C covariance matrix is compatible with Ph->E transmission. We therefore removed C altogether by dropping the parameter σ[ζC1], and we added the four Ph->E transmission parameters α1, α2 = α3, β1, and β2 = β3. This resulted in smaller AIC and BIC, but the αk parameters were not significant (alpha = .01). The model with only αk = 0 and βk (2 parameters) estimated produced the smallest value of BIC and a slightly larger AIC. As a check, we fitted the model with αk estimated (2 parameters) and βk = 0, but concluded that this model is not compatible with the data, as it consistently failed to converge. Finally we fitted the standard AE simplex. But this model produced the largest AIC and the third largest BIC. Given the values of αk = 0, we conclude that a twin’s anxious behavior does not influence her own environment, but does contribute to the environment of her co-twin. We report in Table 4 the parameter estimates and robust standard errors.

Table 3 Fit indices (smallest AIC and BIC underlined)
Table 4 Parameter estimates in the analysis of Anxiety from 3y to 12y. ML estimates and robust standard errors in parentheses in the AE simplex with parameter βk, logl = −69522.1)

Table 7 in Appendix contains the correlation matrices among the A and E variables. We note that we observed correlations between the A and E variables after age 3, i.e., GE covariance attributable to the Ph->E transmission. The parameters estimates are β1 = 0.123 (s.e. .041) and β2 = β3 = 0.062 (s.e. .027), and the resulting correlations between A and E range from .05 to .23. From age 3 onwards, we note that the environmental variables become correlated, again due to the Ph->E transmission positive parameters βk. The correlations range from .09 to .23. Both the additive genetic correlations (.45 (3–7y), .87 (7–10y), .87 (10–12y)) and environmental correlations increase over time (.33, .74, and .89). The heritabilities are 0.70 (3y), 0.58 (7y), 0.51 (10y), 0.53 (12y). Given the positive values of the transmission parameters, we may interpret the results in the spirit of cooperative sibling interaction (Eaves 1976; Eaves et al. 1977): manifest anxious behavior of one twin member forms a cause of anxiety in the other twin member, by contributing to the other twin’s environment.

Discussion

We perceive a discrepancy between the practice of longitudinal modeling within the classical twin design and developmental psychological theory. The former usually features the provisional assumption that GE covariance is absent, while the latter places great emphasis on GE covariance arising in plausible notions of genotype–environment interplay or person–environment interplay (Loehlin and DeFries 1987; Plomin et al. 1977; Scarr 1992; Scarr and McCartney 1983). For instance, in developmental psychopathology, GE covariance is accorded an important role (Rutter et al. 1997, 2006; Rutter and Silberg 2002), and is thought to be relevant to the development of treatment (Jaffee and Price 2008). GE-covariance is also thought to be relevant to cognitive development (Johnson et al. 2011; Dickens and Flynn 2001; Haworth et al. 2010; for a recent application of the present model to intelligence data, see Dolan et al. 2014).

In the present paper, we explored, in the longitudinal classical twin design, GE-covariance by positing Ph->E transmission, as discussed by Eaves (1976), Eaves et al. (1977), Carey (1986), and de Kort et al. (2012). Local identification of the model considered posed no great problems. Given T = 4, we established local identification given the constraints reducing the two set of three parameters (αk, βk; k = 1,2,3) to sets of two parameters (α2 = α3 and β2 = β3 or a linear constraint), in otherwise unconstrained genetic and environmental simplex models. The parameters αk, βk (k = 1,2,3) may be rendered identified by introducing other constraints in the simplex (equal autoregressive coefficients), but we did not pursue such constraints in our numerical analyses. Our numerical results—given our limited choice of parameters—suggest that Ph->E transmission in the absence of C is viable with realistic samples sizes (Fig. 3), but, in the presence of C (Fig. 2), well beyond the resolution provided by realistic twin samples in the presence of C (see Table 1). This is understandable, as Ph->E transmission (without explicit C influences) gives rise to correlated environmental effects, which are hard to distinguish from proper C (see Table 6 in Appendix).

Fig. 3
figure 3

The extended AE simplex model. Occasion-specific influences are not shown. The scaling used is shown only at t = 1. The extension comprises the arrows from the phenotype y* at t to the E variables at t + 1 (i.e., parameters αk and βk). For the distinction between y (Fig. 1) and y* in this Figure, see the text

An interesting result is that the AE simplex Ph->E transmission, with positive αk and βk, gives rise to a covariance structure that is quite consistent with an AE simplex plus a rank 1 C covariance matrix. The presence of C is to be expected, given the correlated environmental effects caused by the Ph->E transmission. However, we had not anticipated that the resulting C covariance matrix is rank 1. We contend that a rank 1 covariance matrix (be it due to A, D, C, or E) is in itself a suspicious results (what psychological process generates this?). It is striking that we also observed the rank 1 C covariance matrix in our analyses of the anxiety data. We found the Ph->E transmission model, limited to βk, provided the best fit in terms of the AIC and BIC. In addition, compared to the model with a rank 1 C covariance matrix, we think that this model is substantively more plausible. GE-covariance, as conceived here, necessarily gives rise to correlated environments. So a successful AE model would seem to rule out our GE covariance process. However, we note that an ADE model, notably with a rank 1 D covariance matrix, is compatible with Ph->E transmission with a negative parameter βk. A second symptom of GE covariance in this connection is an overly large discrepancy between the phenotypic MZ and the DZ correlation (e.g., as mentioned above, .345 vs. .054; but see Eaves 1988, for a genetic explanation).

We have considered only Ph->E transmission (Fig. 3), but recognize that there are other possibilities. We considered phenotype to E (within twin member; behavior influences own environment, E) in combination with phenotype to C transmission (Ph->C). This is formally locally identified given constraints to those applied to αk and βk. However, numerically this model generated many problems, which suggested empirical under-identification. We did not consider phenotype to A transmission, although we consider it possible that behavior (e.g., substance abuse, exercise, etc.) may influences gene expression. We do not know whether such feedback is detectable in psychometric data, or on the time scale upon which such data are typically collected. In addition, we do not know how well we can distinguish statistically these various types of feedback models. This question is relevant to our analyses of anxiety. Eaves et al. (1986) interpreted phenotype to phenotype transmission as a test–retest effect. This could apply to the mothers’ ratings, although our results favored our model with phenotype to E. The model with phenotype to phenotype transmission within and between twin members produced a log likelihood of −69,522 (24 parameters), AIC = 139,093, and BIC = 139,256. The model with phenotype to phenotype transmission within twin members produced a log likelihood of −69,537 (22 parameters), AIC = 139,119, and BIC = 139,268. As shown in Table 3, AIC and BIC of our model of choice (22 parameters) equal 139,088 and 139,237, respectively.

Finally, we considered GE covariance in the absence of GE interaction. One form of interaction which would seem to be plausible is differential Ph->E transmission, in which the magnitude of the transmission effects depends on the phenotypic scores. For instance, in theories of cognitive development active Ph->E transmission is often associated with the idea of highly intelligent children seeking out (or creating) “smart” environments (Plomin et al. 1977). This may be true, but it does not necessarily imply that children of intermediate or low intelligence do not engage in any “niche” picking. Moderated Ph->E transmission may be modeled using Bayesian estimation (Eaves and Erkanli 2003).