Figure 1 illustrates means and standard deviations for the 30 measures at T1, T2 and for T2 change. The details of the descriptive statistics, along with descriptive statistics further broken down by gender and zygosity, are included in Supplementary Tables 1–7. These results are based on one twin randomly selected from each pair so that the data points are independent. Results for the other twin are virtually identical, as shown in Supplementary Tables 8–10.
Almost as many changes were in a positive direction as in a negative direction. However, the effect sizes are modest as indicated by Cohen’s d statistic, which is the ratio of the mean difference to the standard deviation (Cohen 1988; Fig. 1). The average d across the 30 measures was 0.24, which accounts for less than two percent of the variance and includes as many positive as negative changes.
Cohen (1988) proposed, as a convention, that a large effect size is a d of 0.8, accounting for about 14% of the variance. Only one large negative effect emerged, decreased Volunteering (0.84), which seems likely to be due to fewer opportunities for volunteering during lockdown.
A d of 0.5, considered a medium effect size, accounts for about 6% of the variance. Medium-sized mean differences in the negative direction emerged for three variables. Prosocial Behaviour declined (0.44), which, like Volunteering, might be due in part to reduced opportunity. Achievement Motivation decreased (0.47), which is worrying because emerging adults are our next generation of workers. Hyperactivity-Inattention increased (0.42), which seems to fit with reports that people feel less able to concentrate. In the positive direction, Verbal Victimisation declined (0.58), which again could be explained by a decrease in social interactions, particularly in person, during the lockdown. Other effect sizes were modest (d = 0.20).
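The correspondence between d and variance explained can be checked with the standard conversion from d to a correlation, r = d / sqrt(d² + 4), which assumes two groups of equal size. The sketch below is purely illustrative and is not part of the original analysis; the function name is hypothetical.

```python
import math


def variance_explained_from_d(d: float) -> float:
    """Convert Cohen's d to the proportion of variance explained (r squared).

    Assumes two equally sized groups, so that r = d / sqrt(d**2 + 4).
    """
    r = d / math.sqrt(d ** 2 + 4)
    return r ** 2


# Average d reported in the text, plus Cohen's medium and large conventions.
for d in (0.24, 0.5, 0.8):
    share = variance_explained_from_d(d)
    print(f"d = {d:.2f} -> variance explained = {share:.3f}")
```

Under this assumption the average d of 0.24 corresponds to roughly 1.4% of the variance, in line with the "less than two percent" stated above.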
These mean differences mask a wide range of individual differences. If the COVID-19 crisis affected people in more extreme ways, we would expect to see increased variance at T2. The standard deviations (Supplementary Tables 1 and 2) do not support this hypothesis. The average standard deviation at T2 (1.71) was slightly lower than at T1 (1.79). Out of 30 variables, variance decreased in 17 measures and increased in 13 measures (Supplementary Table 3). Many of these variance differences are significant even after correcting for multiple testing; however, the effect size as indexed by the F value (the ratio between the variances at T1 and T2) is small (average F ratio 1.49, regardless of whether variance increased or decreased from T1 to T2), and the effect sizes were smaller when variance increased from T1 to T2 than when variance decreased. These variance differences were similar for males and females (average F ratio 1.65 for males and 1.52 for females).
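As a minimal sketch of the variance-ratio effect size, the function below places the larger variance in the numerator so that the ratio is at least 1 whether variance rose or fell. That convention is an assumption based on the phrase "regardless of whether variance increased or decreased", and the standard deviations used are hypothetical, not values from the supplementary tables.

```python
def variance_ratio(sd_t1: float, sd_t2: float) -> float:
    """F ratio of two variances with the larger variance on top.

    The result is >= 1 whether variance increased or decreased from T1 to T2.
    """
    var_t1, var_t2 = sd_t1 ** 2, sd_t2 ** 2
    return max(var_t1, var_t2) / min(var_t1, var_t2)


# Hypothetical standard deviations for one measure (not from the paper).
print(round(variance_ratio(2.0, 1.6), 2))  # variance decreased from T1 to T2
print(round(variance_ratio(1.6, 2.0), 2))  # variance increased, same ratio
```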
For these analyses and the following analyses of individual differences, we focused on variables that showed sufficient variability and approached normal distributions, including Achievement motivation, Alcohol use (measured by multiplying frequency by quantity), Community satisfaction, Conduct problems, Depression, Emotional problems, General anxiety, Healthcare, Hyperactivity/inattention, Importance of relationships, Love and relationships, Media use, Money attitudes, Peer problems, Physical activity, Prosocial behaviour, Purpose in life and Volunteering.
If the COVID-19 crisis re-shuffled the rank order of individual differences, we would expect to see little stability from T1 to T2. Pearson correlations from T1 to T2 are shown in Fig. 2 and listed in Supplementary Table 11, separately for males and females. The average correlation is 0.48 across the 2-year gap. The most stable measures include Purpose in Life (0.68), Emotional Problems (0.56), Peer Problems (0.58), General Anxiety (0.57), and Depression (0.56). Stability correlations were generally similar for males and females, with average stability correlations of 0.50 and 0.47, respectively.
Reliability of the measures represents a ceiling for stability. In TEDS, we obtained two-week test–retest reliability from TEDS twins on most measures as part of our preparatory work for the 2018 (T1) assessment (Supplementary Table 12). The average test–retest reliability was 0.71, ranging from 0.47 for Importance of Healthcare to 0.84 for Volunteering. The average stability correlation of 0.48 implies that 48% of the total variance of the measures was stable from T1 to T2. Taking test–retest reliability into account (through dividing the correlation estimate by the test–retest coefficient) suggests that 68% of the reliable variance of the measures was stable from T1 to T2. This finding indicates that there is still some change between T1 and T2 across the range of psychological and behavioural measures studied here.
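The correction for reliability described above is a simple ratio, sketched here with the average values reported in the text. The function name is hypothetical, and the calculation uses the averages rather than per-measure estimates.

```python
def reliable_stability(stability_r: float, retest_r: float) -> float:
    """Stability as a share of reliable variance.

    Divides the T1-T2 correlation by the test-retest reliability,
    as described in the text.
    """
    if not 0 < retest_r <= 1:
        raise ValueError("test-retest reliability must be in (0, 1]")
    return stability_r / retest_r


# Average stability (0.48) and average test-retest reliability (0.71).
print(f"{reliable_stability(0.48, 0.71):.2f}")  # 0.68
```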
Despite the substantial stability from T1 to T2, T2 change scores revealed some individuals who changed dramatically in positive as well as negative directions, as illustrated in Supplementary Fig. 1.
Although phenotypic moderation of the psychological response to the COVID crisis revealed many significant interactions between moderators and outcome variables, these interactions did not survive Bonferroni correction for multiple testing. Moreover, the effect sizes of the interaction terms were small, explaining less than 1% of the variance in all cases (see Supplementary Tables 13–33).
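To illustrate how nominally significant interactions can fail to survive Bonferroni correction, the sketch below compares each p-value with alpha divided by the number of tests. The p-values are hypothetical and chosen only to show the pattern; the actual number of tests is given in the supplementary tables, not here.

```python
def bonferroni_significant(p_values, alpha=0.05):
    """Flag p-values that stay significant after Bonferroni correction,
    i.e. those below alpha divided by the number of tests."""
    threshold = alpha / len(p_values)
    return [p < threshold for p in p_values]


# Hypothetical p-values for five interaction terms (not from the paper).
p_values = [0.012, 0.02, 0.03, 0.04, 0.60]
print([p < 0.05 for p in p_values])      # four are significant uncorrected
print(bonferroni_significant(p_values))  # none pass the 0.01 threshold
```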
Genetic and environmental aetiologies of variances and covariances
Figure 3 depicts intraclass correlations for identical and non-identical twins at T1 and T2 and for T2 change scores. (See Supplementary Tables 34–36 for the correlation coefficients). We will describe the main results of the twin analysis using these twin correlations, although later we show that these results are confirmed by structural equation modelling, which also provides 95% confidence intervals for the genetic and environmental estimates.
At T1, the average twin correlations for identical and non-identical twins were 0.35 and 0.16, respectively. Because identical twins are genetically identical whereas non-identical twins share on average only 50% of their segregating genes, the difference in their correlations indexes genetic influence on individual differences, called heritability. Doubling the difference between these average correlations gives 0.38, but because heritability cannot exceed the identical twin correlation, the rough estimate of heritability at T1 is 35%. At T2, the average twin correlations for identical and non-identical twins were similar, 0.31 and 0.16, as was the average heritability of 30%, despite the COVID-19 crisis and lockdown.
Twin resemblance not explained by genetic factors can be attributed to shared environment (C). In other words, the extent to which heritability does not account for the identical twin correlation is a rough index of C. On average, C was negligible at T1 (2%) and T2 (4%).
The rest of the variance is attributed to a residual component of variance (E) that includes non-shared environment plus unreliability of measurement. The average E was 63% at T1 and 66% at T2. Test–retest reliabilities suggest that non-shared environment accounted for about half of E at T1 and T2.
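The rules of thumb used in the preceding paragraphs can be written out as a short sketch. It applies them to the average twin correlations only, so the C and E values it returns differ slightly from the reported averages, which are presumably means of per-measure estimates. The function name is hypothetical.

```python
def falconer_ace(r_mz: float, r_dz: float) -> dict:
    """Rough A, C and E estimates from identical (MZ) and non-identical (DZ)
    twin correlations, following the rules of thumb in the text."""
    a = 2 * (r_mz - r_dz)        # double the MZ-DZ difference
    a = min(max(a, 0.0), r_mz)   # heritability cannot exceed the MZ correlation
    c = r_mz - a                 # MZ resemblance not explained by heritability
    e = 1 - r_mz                 # residual: non-shared environment plus error
    return {"A": round(a, 2), "C": round(c, 2), "E": round(e, 2)}


print(falconer_ace(0.35, 0.16))  # T1 average correlations
print(falconer_ace(0.31, 0.16))  # T2 average correlations
```

At T1 the doubled difference (0.38) is capped at the identical twin correlation, giving A = 0.35; at T2 the doubled difference gives A = 0.30 directly.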
Deducting the component of variance due to unreliability indicates that about half of the reliable variance at T1 and T2 can be attributed to inherited DNA differences. In other words, of the total variance at T1 and T2, about a third can, on average across the measures, be attributed to genetic factors, about a third to non-shared environmental factors, and about 30% to unreliability of measurement. Shared environmental influence has negligible impact.
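The arithmetic behind this partition can be sketched as follows, treating one minus the average test–retest reliability (0.71) as the share of variance due to measurement error. That treatment, the function name and the use of averaged estimates are assumptions made for illustration.

```python
def partition_with_reliability(a: float, e: float, reliability: float) -> dict:
    """Split E into non-shared environment and measurement error, and
    express A as a share of the reliable variance.

    Treats 1 - reliability as the error share of total variance.
    """
    error = 1 - reliability
    nonshared = e - error
    return {
        "error": round(error, 2),
        "nonshared_env": round(nonshared, 2),
        "nonshared_share_of_E": round(nonshared / e, 2),
        "A_share_of_reliable": round(a / reliability, 2),
    }


print(partition_with_reliability(a=0.35, e=0.63, reliability=0.71))  # T1
print(partition_with_reliability(a=0.30, e=0.66, reliability=0.71))  # T2
```

With these inputs, non-shared environment makes up a little over half of E at both time points, and A accounts for roughly 40% to 50% of the reliable variance.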
T2 change scores show lower heritabilities, 16% on average. Because T2 change is a residualised score independent of scores at T1, stable genetic influence from T1 to T2 is removed from T2 change scores. Thus, heritability of T2 change scores represents novel genetic influence at T2 that does not affect T1. This new genetic influence could be due to gene–environment interplay. One form is gene–environment correlation, in which the environments that young adults chose or were exposed to did not occur at random but were correlated with their genotypes. Another is gene–environment interaction, in which young adults responded differently to the environment (e.g. COVID-19 and the associated lockdown) depending on their genotypes. Alternatively, the new genetic influence could be explained by maturation over the 2-year period. Shared environment, which includes not only shared rearing environment (the twin pairs grew up together in the same family) but also shared experiences during the COVID-19 crisis, has negligible effects on T2 change, 3% on average. Most of the variance of T2 change scores is due to the E component of variance, 81% on average. We cannot separate E of T2 change scores into non-shared environment and unreliability of measurement because test–retest reliability at T1 cannot be assumed to represent the reliability of T2 change scores.
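A residualised change score of the kind described above can be sketched as the residual from a least-squares regression of T2 on T1, which is uncorrelated with T1 by construction. The data below are simulated and the helper names are hypothetical; any additional covariate adjustment applied in the actual analysis is not shown.

```python
import random


def residualised_change(t1, t2):
    """Residuals from the least-squares regression of T2 scores on T1 scores."""
    n = len(t1)
    mean1, mean2 = sum(t1) / n, sum(t2) / n
    sxy = sum((x - mean1) * (y - mean2) for x, y in zip(t1, t2))
    sxx = sum((x - mean1) ** 2 for x in t1)
    slope = sxy / sxx
    intercept = mean2 - slope * mean1
    return [y - (intercept + slope * x) for x, y in zip(t1, t2)]


def pearson(x, y):
    """Pearson correlation between two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


# Simulated scores (not study data): T2 depends partly on T1.
random.seed(1)
t1 = [random.gauss(0, 1) for _ in range(1000)]
t2 = [0.5 * x + random.gauss(0, 0.9) for x in t1]
change = residualised_change(t1, t2)

print(abs(pearson(t1, change)) < 1e-9)  # change is uncorrelated with T1
```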
Univariate model-fitting results
These results about variance and covariance gleaned from the twin correlations are highly similar to the results of univariate model-fitting analyses of variance for T1, T2 and T2 change measures, as shown in Fig. 4. (See Supplementary Table 37 for model-fit statistics, precise ACE estimates and confidence intervals). Although some fit statistics indicate that an AE or ADE model would have fitted better, and the fit statistics are only satisfactory, we report the full ACE model for all traits for completeness. The average broad model-fitting heritability estimates were 32% for T1, 32% for T2 and 15% for T2 change, likely encompassing both additive and non-additive genetic effects. Model-fitting estimates of shared environment were 3% for T1 measures, 3% for T2 measures and 2% for T2 change measures. Average model-fitting estimates of E were 66%, 65% and 82%, respectively.
Bivariate model-fitting results
The bivariate Cholesky decomposition model separates the A, C and E components of variance at T2 into variance in common with T1 and variance at T2 independent of T1. As explained in Methods, the model yields estimates of the extent to which the phenotypic correlation between T1 and T2 is accounted for by A, C and E. The genetic correlations are shown in the top panel of Fig. 5 (see Supplementary Fig. 2 for shared environmental and non-shared environmental correlations). The results of the Cholesky bivariate analysis are illustrated in the bottom panel of Fig. 5, with details in Supplementary Tables 38–43. Genetics accounts for 55% of the T1–T2 phenotypic correlations on average. Shared environment accounts for 4% of the phenotypic correlations on average. E influences shared at T1 and T2 are responsible for the rest of the phenotypic correlations (40%), which could be stable non-shared environmental influences and/or correlated error.
The Cholesky model also estimates A, C and E components of variance at T2 independent of their respective A, C and E components of variance at T1. These A, C and E estimates at T2 independent of those at T1 (Supplementary Tables 38–43) are, as expected, similar to the A, C and E estimates for T2 change shown in Fig. 4.
Figure 5 also shows the genetic correlations between T1 and T2 and shows the proportion of the phenotypic correlations (presented in Fig. 2) that can be explained by genetic, shared-environmental and non-shared environmental factors. As explained in Analyses, the Cholesky model estimates the genetic contribution to phenotypic stability from T1 to T2, which includes the genetic correlation. The genetic correlation is the correlation between genetic effects at T1 and T2 independent of the T1 and T2 heritabilities. The genetic correlations averaged 0.91, and most of their 95% confidence intervals included 1.0, indicating that genetic effects at T2 were substantially correlated with genetic effects at T1, despite the COVID-19 crisis and lockdown, although it should be noted that the heritabilities for diverse traits are modest to moderate.
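The relationship between the genetic correlation, the heritabilities and the genetic share of phenotypic stability can be sketched with the standard bivariate decomposition, in which the genetic contribution to the phenotypic correlation is the product of the square roots of the two heritabilities and the genetic correlation. The function name is hypothetical, and plugging in average estimates gives only a ballpark figure because the reported 55% is an average across measures.

```python
import math


def genetic_share_of_stability(h2_t1: float, h2_t2: float,
                               r_genetic: float, r_phenotypic: float) -> float:
    """Share of the T1-T2 phenotypic correlation carried by genetic effects.

    Genetic contribution = sqrt(h2_t1) * r_genetic * sqrt(h2_t2),
    expressed as a proportion of the phenotypic correlation.
    """
    genetic_path = math.sqrt(h2_t1) * r_genetic * math.sqrt(h2_t2)
    return genetic_path / r_phenotypic


# Average estimates from the text: heritability 0.32 at both time points,
# genetic correlation 0.91, phenotypic stability 0.48.
share = genetic_share_of_stability(0.32, 0.32, 0.91, 0.48)
print(f"{share:.2f}")
```

This shows why a genetic correlation near 1 can coexist with moderate phenotypic stability: modest heritabilities limit how much of the phenotypic correlation the genetic path can carry.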
Twins locked down together vs apart
Finally, we investigated possible moderators of the univariate results. The most novel moderator is whether the twins were locked down together or living apart during lockdown. Lockdown presents a quasi-experimental test of contemporary shared environments by comparing results for the 28% of twins living together during lockdown with results for those living apart. If shared lockdown experiences were important, twins locked down together should be more similar than twins living apart during lockdown. On the basis of the generally weak effects of shared environment, we predicted that environmental effects due to living together during lockdown would be negligible.
At first this prediction seemed wrong because the average twin correlation for twin pairs locked down together (0.30) was higher than the correlation for twin pairs living apart during lockdown (0.23), although this difference was not significant (p = 0.051). However, this possible effect of shared environments might be a genetic effect in disguise because identical twins were locked down together more often than non-identical twins (32% vs 25%). Results of univariate model-fitting separately for twins locked down together vs apart (Fig. 6) are consistent with the notion that the apparent effect of shared environments might be mediated in part genetically (see Supplementary Tables 44–49 for model-fitting results including the 95% confidence intervals). For T2 scores, twins together yielded a slightly higher average estimate of shared environmental influence than twins apart (0.07 vs 0.03), suggesting a very slight increase in true shared environmental influence. Twins together also yielded a slightly higher average estimate of genetic influence than twins apart (0.33 vs 0.30), which could reflect genetically influenced selection for being locked down together, an example of gene–environment correlation. However, a great deal of caution is warranted in these interpretations because the difference in phenotypic correlations for twins locked down together vs apart is not significant, and our design has negligible power to detect significant differences of this magnitude for A and C.
Nonetheless, further support for the hypothesis that the apparent C effect of being locked down together is not really C comes from finding nearly identical A and C estimates pre-existing at T1: A and C are 0.33 and 0.06 for twins together and 0.30 and 0.03 for twins apart. Results for T2 change scores provide additional confirmation in that a similar pattern emerged: A and C are 0.19 and 0.04, respectively, for twins together and 0.14 and 0.02 for twins apart.
We also considered other potential moderators. For example, similar to being locked down together or apart, gender is a dichotomous variable that is the same for both members of a twin pair (when opposite-sex non-identical twins are excluded). Separate univariate analyses for male and female twins yielded similar results. These model-fitting results are presented in Supplementary Tables 50 and 51.
For the continuous moderator of family SES and for moderators that can be discordant for members of a twin pair (losing a job/financial difficulties, living conditions during lockdown, COVID-19 symptoms, impact of COVID-19 on family health and financial situation, worries about infection and impact on health, change in sleep habits), we corrected T2 and T2 change scores for these moderators and repeated the analyses. ACE estimates were similar before and after correction for these moderators. These model-fitting results are included in Supplementary Tables 52–66.