Bias and Precision of Parameter Estimates from Models Using Polygenic Scores to Estimate Environmental and Genetic Parental Influences

In a companion paper Balbona et al. (Behav Genet, in press), we introduced a series of causal models that use polygenic scores from transmitted and nontransmitted alleles, the offspring trait, and parental traits to estimate the variation due to the environmental influences the parental trait has on the offspring trait (vertical transmission) as well as additive genetic effects. These models also estimate and account for the gene-gene and gene-environment covariation that arises from assortative mating and vertical transmission respectively. In the current study, we simulated polygenic scores and phenotypes of parents and offspring under genetic and vertical transmission scenarios, assuming two types of assortative mating. We instantiated the models from our companion paper in the OpenMx software, and compared the true values of parameters to maximum likelihood estimates from models fitted on the simulated data to quantify the bias and precision of estimates. We show that parameter estimates from these models are unbiased when assumptions are met, but as expected, they are biased to the degree that assumptions are unmet. Standard errors of the estimated variances due to vertical transmission and to genetic effects decrease with increasing sample sizes and with increasing \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$r^2$$\end{document}r2 values of the polygenic score. Even when the polygenic score explains a modest amount of trait variation (\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$r^2=.05$$\end{document}r2=.05), standard errors of these standardized estimates are reasonable (\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$< .05$$\end{document}<.05) for \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$n=16K$$\end{document}n=16K trios, and can even be reasonable for smaller sample sizes (e.g., down to 4K) when the polygenic score is more predictive. These causal models offer a novel approach for understanding how parents influence their offspring, but their use requires polygenic scores on relevant traits that are modestly predictive (e.g., \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$r^2>.025)$$\end{document}r2>.025) as well as datasets with genomic and phenotypic information on parents and offspring. The utility of polygenic scores for elucidating parental influences should thus serve as additional motivation for large genomic biobanks to perform GWAS’s on traits that may be relevant to parenting and to oversample close relatives, particularly parents and offspring. Supplementary information The online version of this article (10.1007/s10519-020-10033-9) contains supplementary material, which is available to authorized users.


Introduction
While behavioral genetics is sometimes viewed as being concerned with cataloging heritability ( h 2 ) and its determinants across traits, there has traditionally been great interest in understanding how family members directly impact each other environmentally. However, disentangling the genetic and environmental factors that cause familial resemblance has proven difficult, and is made all the more so when these factors are correlated. One likely reason for such a correlation is vertical transmission (VT), which occurs when a parental trait has a direct environmental influence on an offspring trait. VT leads to a covariance between the trait's genetic and parental influences-a type of gene-environment Edited by Sarah Medland.
1 3 covariance that has recently been termed genetic nurture by Kong et al. (2018). Kong et al. (2018) showed that genetic nurture can be estimated from the covariance between the offspring's phenotypic value and a polygenic score (PGS) calculated from the alleles not transmitted from parents to offspring. This covariance was denoted as NT in Kong et al. (2018), which we also adopt here. Furthermore, the direct genetic effect of the PGS after removing the influence of genetic nurture can be estimated by subtracting NT from the covariance between the transmitted PGS and the offspring phenotypic value ( T ). However, primary phenotypic assortative mating, which we denote in this paper simply as assortative mating (AM), occurs when mates choose each other based on phenotypic similarity, and complicates the modeling of NT and T considerably. A single generation of AM on a heritable trait leads to a positive "trans" covariance between mates' PGS's (Robinson et al. 2017;Hugh-Jones et al. 2016). Such a covariance is a competing explanation for non-zero observations of NT ; specifically, part of NT may be due to the AM-induced covariance between the transmitted PGS from the mother (which has a direct genetic influence on the offspring), the nontransmitted PGS from the father, and vice-versa. Kong et al. recognized this confounding influence of AM on estimates of genetic nurture. In their study on educational attainment, they found evidence for AM in the parental generation but not before, and their approach to account for AM was therefore restricted to this scenario (disequilibrium AM). However, when AM occurs across multiple generations (equilibrium AM), recombination mixes causal variants of different parental origins on the same haplotype, leading to "cis" covariance between causal variants withinhaplotypes that is eventually (at equilibrium) equal to the trans covariance across mates. In this case, part of NT may also be due to the AM-induced cis covariance between the transmitted PGS and the nontransmitted PGS from the same parent. Therefore, when there is evidence for equilibrium AM, this additional covariance must be accounted for to avoid bias.
In our companion paper (Balbona et al., in press), we introduce a series of causal models that use transmitted and nontransmitted PGS's, along with the offspring phenotypic value, to estimate genetic nurture and direct genetic effects of the PGS under both equilibrium and disequilibrium AM scenarios. Importantly, these models also provide estimates of the full variance due to VT ( V F ). In particular, we showed that the full V F can be estimated when there is no AM regardless of the predictive ability of the PGS ( r 2 PGS ). When there is AM, inclusion of parental phenotypic data or an assumption about h 2 t0 -the total h 2 in the base population (before AM, or at "time 0")-allows estimates of the full V F , the full genetic nurture, and the full additive genetic variation. The causal modeling framework we used allows these models to be easily extended to account for different mechanisms of AM, to incorporate sibling and/or twin data, and to fit two traits bivariately in order to test cross-trait VT and AM.
The goal of this paper is to quantify the performance of the base models described in Balbona et al. (in press) when their assumptions are met, and to understand their sensitivity to assumptions when they are unmet. To do this, we simulated additive genetic effects, vertical transmission, and two types of AM (equilibrium and disequilibrium), and generated polygenic scores and phenotypes of parents and offspring. We instantiated the models in the OpenMx (Boker et al. 2011;Neale et al. 2016) structural equation modeling software, and compared the true values of parameters to maximum likelihood estimates from the simulated data to quantify their bias and precision.

Causal models
In Balbona et al. (in press), we described three causal models that differ in their assumptions and data used, and we derive the expectations of parameters in each. Model 0 uses data on the offspring phenotype ( Y o ) and on PGS's calculated from four sources: the transmitted paternal ( T p ), the nontransmitted paternal ( NT p ), the transmitted maternal ( T m ), and the nontransmitted maternal ( NT m ) haplotypes. Model 0 assumes no AM and no genetic effects other than those due to the PGS. Counter-intuitively, even when this last assumption is violated and the PGS explains little trait h 2 , we show mathematically that the estimate of the variance due to VT is unbiased. We verify this conclusion below. Note that VT is a process, not a score. In our path diagrams (Balbona et al., inpress), we denote F as the "familial" score caused by VT. Hence, the variation due to VT is denoted V F with estimate V F Model 1 uses the same data as Model 0, but incorporates the influence of AM on parameter expectations. As with Model 0, it assumes the PGS explains all genetic variation. Even though it models the influence of AM, Model 1 does not account for the influences of AM on the genetic effects not captured by the PGS. Thus, both Model 0 and Model 1 yield biased estimates when there is AM and the PGS explains less than the full h 2 . Because no PGS currently explains the full h 2 for any trait, these two models should not be used when there is evidence of AM, and the utility of modeling AM in Model 1 is mostly a didactic example of how AM can be modeled.
In addition to the data used in the models above, Model 2 includes observed maternal ( Y m ) and paternal ( Y p ) phenotypic values in order to model the effects of both the PGS (with variance estimate V PGS ) and the latent genetic score (LGS, with variance estimate V LGS ). The LGS is the unobserved genetic score defined to be statistically orthogonal to the PGS in the base population. Furthermore, Model 2 provides estimates of the full genetic nurture effect ( v +ŵ ), due to the covariance of parental effects with both the PGS ( ŵ ) and the LGS ( v ). We also investigated the performance of a modification of Model 2 (Model 2-NP) that uses no parental phenotype information, and instead uses assumed values of the h 2 in the base population, or at "time 0" ( h 2 t0 ). Such assumed values would presumably come from the literature, from models that provide decent estimates of h 2 t0 after accounting for AM and genetic nurture, such as extended twin family models (Keller et al. 2010) or relatedness disequilibrium regression (RDR; Young et al. 2018; see Balbona et al. (in press) for a caveat about using estimates from RDR for this). For Models 1 and 2, we either assumed that AM has reached equilibrium (Models 1e and 2e) or that it is at disequilibrium (Models 1d and 2d), having occurred for only a single generation. Modeling other types of AM is discussed in our companion paper, but these are not examined here.
Details on model assumptions and parameter expectations for each model are in Balbona et al. (in press). Table 1 of Balbona et al. (in press) provides an overview of all parameter names, whereas Table 1 of this paper provides an overview of the principal differences between the models. We translated each of the seven models in Table 1 into OpenMx code in R. We used the NPSOL optimizer (Gill et al. 1986) in OpenMx due to its ability to handle the many nonlinear constraints that were required to fit these models. The OpenMx scripts used here are available at https ://githu b.com/yoki5 348/VT_SEM.

Data simulation
We simulated data using a modified version of the GeneEvolve software (Tahmasbi and Keller 2017), which can simulate all the processes discussed here and create phenotypic data and genotypic data that has the same patterns of physical linkage disequilibrium observed in real SNP or sequence data. Our data did not require realistic patterns of linkage disequilibrium, and given that we needed to generate thousands of simulated datasets, we created a modified version of GeneEvolve in R in which all causal variants (CVs) were in linkage equilibrium in the base population. AM thereafter created realistic directional covariances between CVs.
For each simulation, we drew m = 100 binomially distributed CVs with minor allele frequencies (p) drawn from ∼ U(.1, .5) and with effect sizes ∼ N(0, where m is the number of causal variants. Half of the CVs contributed to the PGS and half to the LGS. We scaled the PGS so that it explained varying proportions of V Y in the base population ( r 2 PGS,t0 ), from h 2 t0 to .025, and we scaled the LGS to explain h 2 t0 − r 2 PGS,t0 proportions of V Y . We summed the PGS and LGS to create a total genetic score and created an environmental score ∼ N(0, V (PGS+LGS),t0 ) such that h 2 to = .50 and standardized Y such that V Y,t0 = 1 exactly in the base population. The same scaling coefficients were then used across all generations, such that the variances of parameters could increase over their base population values as a consequence of VT or AM.
GeneEvolve chose mates such that the mate correlation, r mate , was either 0 or .25 across generations. Each mate couple had two offspring. The two haplotypes of mates recombined at random, leading to four haplotypes (transmitted vs. nontransmitted crossed by maternal vs. paternal origin) for each offspring. To simulate VT, the offspring was constant across generations. It should be noted that V F reaches equilibrium after a single generation, whereas AM takes ∼ 5-10 generations before its consequences reach equilibrium. We therefore ran the modified GeneEvolve for a single generation to simulate disequilibrium AM, and for 20 generations to simulate equilibrium AM.
At the end of each simulation, we generated n fam = 16K trio families such that no siblings existed in the final data, although more distant collateral relatives existed sporadically. Because our models only use within-family information to estimate parameters, we do not expect that non-independence across families due to distant relatives influenced our point estimates, but it may have led to slightly smaller standard errors (SE's) of estimates than would occur if all families were unrelated. For the results presented in Figs. 1, 2, and 3, h 2 t0 = 0.5 , r 2 PGS,t0 = 0.05 or 0.5, V F,t0 = 0.15 , r mate = 0 or 0.25, and AM was at equilibrium or disequilibrium. We also investigated other parameter values of h 2 t0 , r mate , and V F,t0 , but results from these

Results
Figures 1, 2, and 3 show a comparison of estimates from five of the seven models we investigated. We show the results for the two models that used assumed values of h 2 t0 rather than parental phenotypes to derive estimates of latent genetic effects (Models 2e-NP and 2d-NP) in Supplement Figs. 1, 2, 3. When the assumed values of h 2 t0 are correct, estimates from Models 2e-NP and 2d-NP are very similar to estimates from Models 2e and 2d, although their SE's are slightly higher. Of course, estimates from Model 2e-NP and 2d-NP will be biased to the degree that assumed values of h 2 t0 are incorrect. It should be noted that Model 2d-NP is most similar to the approach taken by Kong et al, although they did not attempt to estimate V F . Nevertheless, the results in Figs. 1, 22, 3 from Model 2d should mimic most closely how their approach would perform, under the assumption that the value they assumed for h 2 t0 of educational attainment, taken from RDR , was correct. Figure 1 shows the equilibrium true parameter values (grey dotted lines) and parameter estimates (boxplots) from the five models when there is no AM. In the absence of AM, all the models provide unbiased estimates of V PGS , V F , w, and , both when the PGS explains all the h 2 (Fig. 1a) or only 10% of it (Fig. 1b). That Models 0 and 1 estimate the full V F even when their assumption that r 2 PGS,t0 = h 2 t0 is violated confirms our conclusion in the companion paper, where we explain why this occurs. Thus, in the absence of AM, the full variation due to VT can be estimated simply When there is equilibrium AM and the PGS explains all h 2 , the estimates from the models that assume equilibrium AM (1e and 2e) are unbiased (Fig. 2a). Estimates from Model 2e are unbiased when there is equilibrium AM and the PGS is weakly predictive, but Model 1e's estimates are sensitive to the assumption that r 2 PGS,t0 = h 2 t0 when AM exists (Fig. 2b). This is because AM induces an unmodeled covariance between the PGS and the LGS (i) that inflates NT , which in turn upwardly biases estimates of V F and w from Model 1e. However, Model 1e's estimates of the direct effect of the PGS, which come from T − NT , remain unbiased because i inflates NT and T to the same degree. Estimates from models that assume no AM (Model 0) or that model the wrong type of AM (Model 1d and 2d) do not properly account for the covariances between haplotypes that are induced by equilibrium AM, and therefore yield downwardly biased estimates when r 2 PGS,t0 = h 2 t0 and upwardly biased estimates when r 2 PGS,t0 < h 2 t0 . Nevertheless, the bias and spread of estimates from Model 2d are smaller than those of Model 1d because the observed mate covariance, as well as the covariance between one mate's PGS and the other mate's trait, are used in Model 2d, which decreases the bias in and therefore improves the estimation of other parameters.
When there has been a single generation of AM (disequilibrium AM), estimates from Model 2d are unbiased regardless of the predictive ability of the PGS, and estimates from Model 1d are unbiased as r 2 PGS,t0 approaches h 2 t0 (Fig. 3). When AM is at disequilibrium, estimates from models that assume equilibrium AM are typically biased. In particular, Model 2e's estimates related to VT ( V F , w, and v) are downwardly biased. This occurs because the covariances between haplotypic LGS's and PGS's (g, h, and i) implied by the causal model are larger than their actual values, which leads to expectations of NT that are larger than those observed. To compensate, estimates of V F , w, and v are lowered while estimates of V LGS are increased. Interestingly, when r 2 PGS,t0 = h 2 t0 , estimates of genetic nurture ( ŵ ) are unbiased when the model of AM is incorrect (Model 1e and 2e; Fig. 3b). It is not obvious from the math why this this occurs, but may be because downwardly biasing ̂ compensates for values of ĝ , ĥ , and ̂i implied by the causal model that are too high.
The results for the disequilibrium AM scenario are similar to those from equilibrium AM in that they demonstrate the sensitivity of these models to assumptions about how AM has operated. Fortunately, as we describe in our companion paper, there is a good deal of information in the covariances between the four haplotypic PGS's, the offspring trait, and potentially the two parental traits that allows assumptions regarding AM to be tested with high statistical power.

Correlations and standard errors of parameter estimates
A causal model is considered 'under-identified' when a set of two or more estimates use exactly the same information to estimate their values. All of the models we have reviewed above are identified. However, the information used to estimate different sets of parameters in these models is partially redundant, sometimes highly so, which decreases their precision. Figure 4 shows an example scatter plot between the estimates from the 1K simulated datasets from Model 2e (see Supplementary Fig. 4 for these results for Models 1e and 2e-NP). Much of the information to estimate V F and w comes from NT , with both estimates increasing with higher values of NT , and so it is sensible that these two estimates are highly ( r = .98 ) positively correlated. Furthermore, the assumption that the ratio of genetic nurture to direct genetic effects is the same for observed as for latent genetic effects ( v a = w ) is required for Model 2 to be identified. Thus, v is an increasing function of ŵ and therefore of NT ; hence, v is positively correlated with V F and ŵ . Much of the information to estimate V LGS comes from the residual parent-offspring covariance, cov(Y o , Y * ), after removing the effects having to do with VT ( V F , ŵ , and v ), which explains the negative correlations between V LGS and these three estimates. Finally, V PGS is not an observed variance (i.e., it is not simply synonymous with V Y ×r 2 PGS ). Rather, V PGS is the variance due to the direct effect of the PGS after removing its expected covariance with F, and so V PGS depends to some degree on the values of other estimates.
The high correlation values between parameter estimates suggests that their standard error (SE's) will be high unless large sample sizes are employed. For the results displayed in Fig. 5 where ̂i is the standardized estimate for simulated dataset i, and is the true standardized value of the estimate. Note that unlike results in Figs. 1, 2, 3, 4, SE's shown in Fig. 5 are from standardized estimates for interpretability. Figure 5a shows the SE's of estimates as a function of sample sizes (the number of trio families, n fam ) from 1K to 64K. We compared the SE's of models that provide unbiased estimates of V PGS , V F , and w when the PGS is weakly predictive and there is equilibrium AM (Models 2e and 2e-NP). Model 2e also estimates V LGS whereas its value is assumed in Model 2e-NP, and so we show the SE's of V LGS for Model 2e as well. As one would expect, SE's of these estimates decrease as n fam increases. Because their estimates correspond closely to observed statistics, V PGS and Fig. 3 Comparison of estimates across models when there is VT and disequilibrium AM. For each simulation, h 2 t0 = .50 , r mate = .25 , V F,t0 = .15 , and n fam = 16K . a r 2 PGS,t0 = .50 . b r 2 PGS,t0 = .05 . See Fig. 1 note for additional details w have much smaller SE's than V LGS and V F . The SE's of estimates from Model 2e are typically smaller than those from Model 2e-NP because Model 2e also uses information on parental phenotypes. To achieve a SE(V LGS ) and SE(V F ) smaller than .05, at least n fam = 8K trio families are required. These models can handle incomplete trios (only two of three family members sampled) so long as sufficient numbers of each type of relative pair exist in the sample. Of course, when there is such missingness, larger sample sizes are required. Finally, the SE's of V PGS and V F are slightly smaller for models that assume disequilibrium AM ( Supplementary Fig. 5), and so slightly smaller sample sizes are required for disequilibrium AM models to achieve equivalent statistical power. Figure 5b shows the SE's of estimates as a function of r 2 PGS,t0 when n fam is held constant at 16K. As r 2 PGS,t0 increases, the SE(V PGS ) also increases slightly because the SE is proportionate to the mean value of the estimate. On the other hand, the SE(V F ) and SE(V LGS ) decrease as r 2 PGS,t0 increases. This effect becomes more pronounced at low levels of r 2 PGS,t0 , especially for Model 2e-NP. Information to estimate V F comes primarily from ̂2 NT 2 T (Balbona et al. in press), and the variance of this ratio increases as r 2 PGS,t0 gets smaller. Because V LGS is strongly dependent on V F (Fig. 4), the SE(V LGS ) is similarly influenced by r 2 PGS,t0 . The relationships between r 2 PGS,t0 and the SE's of estimates from models that assume disequilibrium AM are very similar to those shown ( Supplementary Fig. 5B). Table 2 shows the convergence time in seconds for each of the seven models for various sample sizes. We ran the models in OpenMx using the NPSOL optimizer with feasibility tolerance = 1e − 7 and Standard Errors option set to Yes. For timing, we used a laptop with an i5 1.6GHz processor and 16GB RAM and ran each model and sample size combination a single time; hence there is some stochastic noise (e.g., Model 2e-NP took longer for n fam = 4K than n fam = 8K ). As can be seen, despite their apparent complexity, the models run very fast. The slowest model (Model 2d) took only ∼ 2.5 min with n fam = 64K . Thus, computational capacity and time should not be limiting factors for using these models.

Discussion
In this study, we quantified the performance of several causal models introduced in our companion paper that were inspired by Kong et al.'s approach for estimating genetic nurture. Using a Monte Carlo simulation approach to find true parameter values, generate trio datasets, and fit the simulated data in OpenMx, we confirmed that the estimates from these models are unbiased when assumptions are met. Indeed, when there is no AM, estimates from all models are unbiased even when the assumption regarding the predictive ability of the PGS is violated. However, when there is AM, estimates are sensitive to assumptions about the process leading to mate similarity; estimates are biased to the degree these assumptions are unmet. Fortunately, these assumptions do not have to be guessed at as the observed covariances involving the haplotypic PGS's provide information that can be used to differentiate various processes of AM (Balbona et al., in press).
Model assumptions are never met perfectly in real data, and so the violation of an assumption does not mean that the estimates from the model are worthless, but it does mean that it is important to interpret the estimates with the proper nuance. It would be impossible to present results for estimates under all possible scenarios, but we cover some of the major ones in the figures above. In Table 3, we provide an overview of how violations of the principal assumptions influence parameter estimates, including some assumptions that were not covered in the present manuscript. We show the direction of the assumption violation that the biases refer to in the third column; the effect on the estimate(s) would be in the opposite direction for violations in the opposite direction. For example, V F would be underestimated rather than overestimated if â 2 > a 2 . We omit the biases on ̂ , ĝ , â , ĥ , and ̂i because their effects are already included in V PGS and V LGS . Of course, biases increase as violations become more severe, and so it is important to have some idea of how assumptions fare for any given data. When it is clear that assumptions are violated to a degree that makes estimates significantly biased, users should attempt to alter the model to accommodate new assumptions that better fit the data at hand.
Many of the influences of assumption violations covered in Table 3 are derivable from Figs. 1, 2, 3 above. We briefly discuss two here that were not covered in our simulations. First, for some phenotypes, VT influences are likely to be stronger from one parent than the other ).
For example, several studies using a Mendelian randomization paradigm have found that maternal traits (e.g., height) influence neonatal trait values (e.g., gestational age; Lawlor et al. 2017;Zhang et al. 2015). Although not directly investigated, it is unlikely that paternal traits would have a similar impact. When . Therefore, V F will be underestimated. In future study, we will extend the models to allow VT influences to differ by  parental origin ( f p and f m ; see also Evans et al. 2019;Tubbs et al. 2020;Warrington et al. 2018). Second, we discussed in our companion paper how different genetic effects in parents vs. offspring ( * ≠ o ) can be modeled. When this is unmodeled, o = T − NT , as expected, and thus V PGS is unaffected (or minimally so when there is AM). However, NT is a function of * rather than o , meaning that the observed NT (=cov(NT m + NT p , Y o ) ) will be smaller than that implied by the model, leading to underestimates of V F and genetic nurture. Third, our models assume simple additive genetic effects with no dominance or epistasis. Because dominance does not inflate cov(Y o , Y * ) , we expect that dominance influences would go only into the residual variance, V . However, additive-by-additive (and higher order) epistasis would lead to unmodeled parent-offspring resemblance. For Model 2, this residual cov(Y * , Y o ) should upwardly bias V LGS and have second-order effects on V F and v. The primary limitation to the current paper is that we made some of the same assumptions in our simulations as we did in our models, meaning that the influences of several factors on parameter estimates have yet to be investigated. We discuss a few of these in turn. First, both our simulations and our models assume no shared environmental influences, although we believe that there is sufficient information to estimate them. So long as there is no gene-environment covariance due to factors other than genetic nurture (e.g., stratification has been controlled for properly in the GWAS and in the structural equation models), NT should only be influenced by genetic nurture and AM. Any residual differences between the mean phenotypic values of families after accounting for differences due to VT, genetic nurture, and genetics could be used to estimate shared environmental influences. Nevertheless, we have yet to build or test models that do this. Second, our models define the LGS as the genetic component that is statistically orthogonal to the PGS in the base population, and thus all covariance between the LGS and PGS (i) arises only from AM. Our simulations were based on this same assumption. We did not simulate physical linkage disequilibrium nor the process of building PGS's based on estimated effects from GWAS. We do not anticipate any changes to our conclusions had we done this, but it is an issue that awaits confirmation. Finally, we have yet to simulate several scenarios (especially stratification, gene-by-age interactions, epistasis, social homogamy, and genetic homogamy) in Table 3, and so the reported influences on parameter estimates for these in Table 3 should be considered provisional until there is a more formal treatment.
The use of PGS's to understand genetic nurture, as well as the direct genetic effect purged of their covariance with familial environmental effects, is an important advance made by Kong et al. We formalized this approach in a series of models introduced in our companion paper, and showed how this approach can also be used to estimate the total influence of parental traits on offspring traits. In the current paper, we have demonstrated that the models developed in our companion paper work as intended. These models are only the beginning, and they suggest many novel and exciting ways in which measured genetics data can be incorporated into family models to better understand the nature of nurture.