Introduction

Although an active lifestyle is accepted to be a major contributor to health (Garber et al. 2011; Janssen and Leblanc 2010) and the period of childhood and youth likely constitutes a critical phase of life to establish long-term activity habits (Telama et al. 2014), a large proportion of children and adolescents does not meet physical activity guidelines (Colley et al. 2011; Hallal et al. 2012). Regular exercise behavior in leisure time, due to its higher intensity compared to habitual physical activity, is a promising target for interventions (Samitz et al. 2011) and a lot of research has therefore been devoted to the determinants of exercise behavior, with studies traditionally focusing on environmental determinants such as socioeconomic status, access to exercise facilities and social support (Biddle and Mutrie 2008; Sallis et al. 2000; van der Horst et al. 2007). Twin studies provide an important addition to these efforts as they allow for the examination of how much of the population variance in exercise behavior is due to factors shared by family members (as opposed to non-shared environmental factors) and the extent to which these familial factors are shared genetic factors or shared environmental factors.

Most twin studies on exercise behavior have been conducted in adults, with only a handful of studies in younger individuals (for an overview, see Fig. 1 in Huppertz et al. (2012)). The Netherlands Twin Register (NTR) has conducted studies on regular exercise behavior during leisure time in 7-, 10- and 12-year-old twins (Huppertz et al. 2012), as well as in 14-, 16- and 18-year-old twins (Boomsma et al. 1989; de Geus et al. 2003; de Moor et al. 2011; Koopmans et al. 1994; Stubbe et al. 2005; van der Aa et al. 2010). In childhood, shared environmental effects explained most of the variance in exercise behavior, whereas in late adolescence, genetic effects became more important. However, these studies have only reported the relative influence of genes and the environment, while the observed pattern could be caused by different mechanisms. It could arise from a simultaneous decrease in shared environmental variance and an increase in genetic variance, but also from a decrease in shared environmental variance only or an increase in genetic variance only. To elucidate the underlying mechanism, the absolute variance components have to be estimated across ages. Vink et al. (2011) investigated the effect of age on the absolute and relative genetic, shared environmental and non-shared environmental variance in exercise behavior of adult participants of the NTR and found that the genetic variance remained stable from age 19 to 50 years, whereas the non-shared environmental variance increased, giving rise to a gradual decrease in the heritability of adult exercise behavior with increasing age.

Fig. 1 Twin correlations of exercise behavior and 99 % confidence intervals based on fully saturated threshold models. MZM monozygotic male, DZM dizygotic male, MZF monozygotic female, DZF dizygotic female, DOS dizygotic of opposite-sex

Although exercise behavior has been shown to track moderately from childhood to adolescence (Telama 2009; Telama et al. 2014; Twisk et al. 2000), the nature of this stability has not been assessed in longitudinal twin studies. Tracking of exercise behavior from adolescence into young adulthood, however, has been assessed previously in Finnish twins with longitudinal data at the ages of approximately 16, 17, 19 and 26 years (Aaltonen et al. 2013). The genetic correlations across ages ranged between 0.78 and 0.82 for males and between 0.54 and 0.67 for females. The shared environmental correlations ranged between 0.53 and 0.76 for males and between 0.73 and 0.85 for females, indicating that both stable and new genetic and environmental factors affect exercise behavior in this age range, with somewhat greater stability of genetic influences in males and of shared environmental influences in females. The non-shared environmental correlations were lower, in part because they incorporate measurement error, which may be specific to each measurement occasion.

In the above study by Aaltonen et al. (2013), the genetic and shared environmental correlations over time were retrieved by means of a Cholesky decomposition that does not assume any specific underlying structure of the data. So-called transmission or simplex models instead assume that successive measures of exercise behavior are causally linked so that the behavior at each new age builds upon earlier experiences. In addition to the effects of past behavior (“transmission”), new influences may enter the picture at each age to account for changes in exercise behavior (“innovation”). In a genetically informative longitudinal study, it is possible to go one step further and to explore transmission and innovation at the level of the variance components (Neale and Cardon 1992). In such a study, one can account for the fact that genetic and environmental influences may show different patterns of transmission and innovation. For example, the genetic contribution to exercise behavior during leisure time could be largely transmitted from age to age and additionally, new genetic influences could come into play during development. If environmental effects on stability, in turn, were small, this would be reflected in larger innovation compared to the transmission effects. The pattern may be particularly complex for children’s and adolescents’ exercise behavior in view of the large changes in the genetic architecture over time.

This study aims to (1) investigate the effect of age on the absolute and the relative genetic, shared environmental and non-shared environmental variance of exercise behavior in childhood and adolescence and to (2) elucidate the longitudinal genetic structure of exercise behavior by assessing transmission and innovation of the genetic and the shared environmental components over time. For these purposes, we fitted both an age-moderation model and a simplex model on data of twins aged approximately 7, 10, 12, 14, 16 and 18 years.

Methods

Participants

The NTR provided data on exercise behavior of twins aged approximately 7 (“survey 7”), 10 (“survey 10”), 12 (“survey 12”), 14 (“survey 14”), 16 (“survey 16”) and 18 years (“survey 18”) (van Beijsterveldt et al. 2013; Willemsen et al. 2013). After excluding some extreme cases that had filled out the survey more than 2 years after the targeted age, data were available for 7394 individuals at survey 7, 8111 at survey 10, 14,916 at survey 12, 9621 at survey 14, 6585 at survey 16 and 2883 at survey 18. From this dataset, 375 participants were excluded due to diseases or physical handicaps that may prevent them from being physically active (e.g., congenital heart disease, hemiplegia). For the surveys 14, 16 and 18, an injury at the time of assessment led to exclusion of the exercise data for that specific survey (N = 449 for survey 14, N = 490 for survey 16, N = 69 for survey 18). The top 0.1 % of all observations within each survey (that is those with unrealistically high scores on exercise behavior) were excluded as outliers (N = 48 observations). The final sample consisted of 27,332 twins (48.1 % males, 51.9 % females), with two measurements for 6861 individuals, three measurements for 4779 individuals and four measurements for 1341 individuals. The longitudinal structure included 2-year follow-ups (surveys 10 and 12, 12 and 14, 14 and 16, 16 and 18), a 3-year follow-up (7 and 10), 4-year follow-ups (10 and 14, 12 and 16, 14 and 18), a 5-year follow-up (7 and 12), 6-year follow-ups (10 and 16, 12 and 18) and a 7-year follow-up (7 and 14). Supplementary Table 1 depicts the number of twins and complete twin pairs for each combination of surveys, split by zygosity. 
Most data were collected around age 12, because (1) items assessing exercise behavior were first introduced to survey 12 in 1999 and to the other surveys approximately 5 years later (2004/2005) and (2) some participants were too old to provide data on, for instance, survey 7 at the time that the exercise items were included, whereas others were not yet old enough to provide data on, for instance, survey 18 at the time that the data were analyzed. The number of twins and complete twin pairs for each survey, split by zygosity, is presented in Table 1.

Table 1 Number of twins (complete pairs) with data on exercise behavior after applying exclusion criteria, split by survey and zygosity

For 18.5 % of the same-sex twin pairs, zygosity was determined by blood group or DNA typing. For the remaining ones, it was determined by survey items on physical similarities and confusion by family members and strangers. Zygosity classification based on these items has shown 93–97 % agreement with DNA polymorphisms (Rietveld et al. 2000; Willemsen et al. 2005). Parents consented to take part in research of the NTR upon registration. Around the age of 13 years, adolescent twins provided their informed consent to fill out surveys. The data collection protocol was approved by the Medical Research Ethics Committee of the VU University Medical Center (IRB letter May 2007 for parental report and letter no. 2003/182 for adolescent self-report).

Measures

Exercise behavior during leisure time was assessed with similar measures across surveys. In surveys 7, 10 and 12, parents were provided with a list of common exercise activities in the Netherlands (such as athletics, badminton, ballet/dance, basketball, fitness training, gymnastics, handball, jogging/running, hockey, netball, horseback riding, (ice-)skating, tennis, martial arts, soccer, swimming, volleyball), plus the option to add additional activities, and were asked (1) whether or not their children participated in the exercise activities and, if so, (2) for how many years, (3) for how many months a year, (4) how many times a week and (5) how many minutes each time they participated in the respective activity. Adolescents were asked to report on their own behavior in essentially the same way. This study focuses on regular exercise behavior during leisure time. This includes both supervised and unsupervised activities. It excludes physical activities related to transportation (walking, biking), physical education classes and irregular exercise activities that were initiated less than half a year ago or that were performed for less than 3 months per year (e.g., ski holidays).

Exercise behavior was quantified as weekly Metabolic Equivalents of Task (MET) hours. Each activity was assigned a MET score based on the compendium of energy expenditures for youth by Ridley et al. (2008). A MET score represents the energy that is expended to perform a specific activity relative to the standard resting metabolic rate, which would be one MET. For instance, running at a moderate level requires 8.5 times the energy that is used while sitting quietly and thus running has a MET score of 8.5. Individuals who did not participate in any exercise activities received a weekly MET hours score of zero. For the remaining individuals, the product of the MET score, weekly frequency and duration was summed across all exercise activities to obtain “total weekly MET hours that were spent on regular exercise activities during leisure time”.
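The scoring rule described above can be illustrated with a minimal sketch. It assumes that session duration is recorded in minutes and converted to hours; the `Activity` class, the function name and the MET score for swimming are hypothetical and serve only as an illustration (only the running score of 8.5 is taken from the text).

```python
from dataclasses import dataclass


@dataclass
class Activity:
    name: str
    met_score: float            # energy cost relative to rest (1 MET)
    times_per_week: float       # weekly frequency
    minutes_per_session: float  # duration of a single session


def weekly_met_hours(activities):
    """Sum MET score x weekly frequency x session duration (in hours)."""
    return sum(
        a.met_score * a.times_per_week * (a.minutes_per_session / 60.0)
        for a in activities
    )


# Illustrative input, not study data.
example = [
    Activity("running", 8.5, 2, 30),   # 8.5 * 2 * 0.5 h = 8.5 MET hours
    Activity("swimming", 6.0, 1, 60),  # assumed MET score: 6.0 * 1 * 1.0 h
]
total = weekly_met_hours(example)      # 14.5 weekly MET hours
non_exerciser = weekly_met_hours([])   # no activities gives a score of zero
```

An individual without any reported activity receives a score of zero, in line with the rule for non-exercisers stated above.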

For the surveys 7, 10 and 12, both parents reported on exercise behavior of their children for 59.4, 23.1 and 42.1 % of the sample, respectively. For these cases, the average rating of the parents was used as the correlations between mothers’ and fathers’ ratings were high (0.74, 0.88 and 0.89, respectively). In addition, 37.5 % (survey 7), 75.8 % (survey 10) and 56.5 % (survey 12) of the ratings were based on maternal report only, and 3.1, 1.1 and 1.4 % on paternal report only, respectively. After survey 12, self-ratings were analyzed.

Statistical Analyses

The percentage of non-exercisers (individuals with zero MET hours per week) increased with age (13 % for survey 7, 13 % for survey 10, 14 % for survey 12, 21 % for survey 14, 28 % for survey 16 and 40 % for survey 18). This led to a highly skewed distribution of the phenotype for the older ages which could not be corrected by simple transformation. These censored data would have led to downward biases of the shared environmental components and upward biases of the non-shared environmental components (Derks et al. 2004). Therefore, the data were categorized into three groups (coded 0, 1, 2), based on the following cutoffs: (0) ≥0 and <5 weekly MET hours (“low”), (1) ≥5 and <20 weekly MET hours (“middle”) and (2) ≥20 weekly MET hours (“high”). These cutoffs were chosen based on the condition that for each survey, at least 10 % of the individuals should fall into each group. The data were analyzed using liability threshold models (Falconer and Mackay 1960; Wright 1934), with two thresholds separating the three groups. These models assume that a latent continuous liability underlies the skewed distribution of the measured phenotype. The resemblance of twins was thus calculated based on this liability. We expected large changes in means and variances with age. These can either be taken into account by constraining the means and variances and allowing the thresholds to vary, or by constraining the thresholds and allowing the means and variances to vary. The second approach was chosen here for a more straightforward interpretation of the results. The thresholds were constrained to −0.64 and 0.23, respectively, which are the z-scores that correspond to the cumulative percentages of individuals in the three exercise-categories in the cross-sectional dataset.
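The categorization and the link between thresholds and category percentages can be sketched as follows. This is an illustration under the assumption of a standard normal liability; the function names are hypothetical, and the category proportions printed by the sketch are merely those implied by the two reported thresholds, not figures reported in the study.

```python
from statistics import NormalDist


def exercise_category(met_hours):
    """Map weekly MET hours to 0 (low), 1 (middle) or 2 (high)."""
    if met_hours < 0:
        raise ValueError("weekly MET hours cannot be negative")
    if met_hours < 5:
        return 0
    if met_hours < 20:
        return 1
    return 2


def thresholds_from_proportions(p_low, p_middle):
    """Thresholds are z-scores of the cumulative category proportions."""
    std = NormalDist()
    return std.inv_cdf(p_low), std.inv_cdf(p_low + p_middle)


def proportions_from_thresholds(t1, t2):
    """Category proportions implied by two thresholds on a standard normal."""
    std = NormalDist()
    low = std.cdf(t1)
    middle = std.cdf(t2) - low
    return low, middle, 1.0 - low - middle


# Thresholds as reported in the text; the implied proportions are
# roughly 0.26 (low), 0.33 (middle) and 0.41 (high).
low, middle, high = proportions_from_thresholds(-0.64, 0.23)
print(round(low, 2), round(middle, 2), round(high, 2))
```

Fixing the thresholds in this way leaves the mean and variance of the liability free to change with age, which is the second of the two approaches described above.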

The first set of analyses aimed to investigate the effect of age on the absolute and the relative contribution of genetic factors (“A” for “additive genetic”), environmental factors that are shared within twin pairs (“C” for “common environmental”) and non-shared environmental factors (“E”, including measurement error) to the total variance in exercise behavior in childhood and youth. To get a rough impression of the genetic architecture, polychoric twin correlations were calculated for each survey (7, 10, 12, 14, 16 and 18) based on so-called saturated models. These models estimate the twin correlations for each sex-by-zygosity group without attempting to model the correlations as a function of genes or the environment.

Next, a model specifying the genetic and environmental architecture of the liability to exercise behavior was fitted to the data, namely a moderation model with linear and quadratic effects of z-transformed age as moderators on the means and variance components (Medland et al. 2009; Purcell 2002; Purcell and Koenen 2005). A cross-sectional dataset was created by selecting one observation for each individual out of the full, longitudinal dataset. The selection favored data points that were collected for both twins of a pair within the same survey and it was aimed to select approximately the same number of observations for all ages. Based on previous studies, we decided to fit a model to these data that included A-, C- and E- components and that allowed for quantitative and qualitative sex differences. Quantitative sex differences were taken into account by estimating separate parameters for males and females. Based on our previous work (Huppertz et al. 2012; Stubbe et al. 2005), qualitative sex differences were modelled by freely estimating the correlation between the latent shared environmental components in dizygotic twins of opposite-sex (DOS) instead of constraining them to 1, while leaving the correlation between the genetic components constrained at 0.5. In total, 26 parameters were estimated: two grand means (one for males, one for females), six variance components (A, C, E, for males and females separately), one correlation between the shared environmental components of DOS twins, the linear and quadratic effects of age on the means (four parameters) and on the latent variance components (12 parameters), and the linear effect of age on the correlation between the shared environmental components of DOS twins (one parameter). The latter was done to account for changes in qualitative sex differences with age.
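How linear and quadratic age effects translate into absolute and relative variance components can be illustrated with a short sketch. It assumes, in the spirit of Purcell (2002), that moderation acts on the path coefficients and that each variance component is the square of its moderated path; whether the fitted model was parameterized exactly in this way is an assumption, and all numerical values below are hypothetical.

```python
def moderated_path(base, linear, quadratic, z_age):
    """Path coefficient at a given z-transformed age."""
    return base + linear * z_age + quadratic * z_age ** 2


def variance_components(params, z_age):
    """Return (unstandardized, standardized) A, C and E variances.

    params maps 'A', 'C' and 'E' to (base, linear, quadratic) tuples.
    """
    raw = {
        name: moderated_path(base, lin, quad, z_age) ** 2
        for name, (base, lin, quad) in params.items()
    }
    total = sum(raw.values())
    standardized = {name: value / total for name, value in raw.items()}
    return raw, standardized


# Hypothetical parameter values for one sex, for illustration only.
params = {
    "A": (0.5, 0.3, 0.0),
    "C": (1.0, 0.0, -0.1),
    "E": (0.4, 0.1, 0.0),
}

for z in (-1.0, 0.0, 1.0):
    raw, std = variance_components(params, z)
    print(z, {k: round(v, 2) for k, v in raw.items()},
          {k: round(v, 2) for k, v in std.items()})
```

Because the standardized estimates are ratios, the relative contribution of one component can fall even when its absolute variance is stable, simply because another component grows.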

The second set of analyses aimed to elucidate the longitudinal structure of exercise behavior. To get a rough impression of the stability of exercise behavior over time, phenotypic polychoric correlations across the surveys 7, 10, 12, 14, 16 and 18 were calculated with the R-package polycor, based on one randomly selected individual per twin pair. To gain insight into the relative contribution of genetic and environmental factors to these longitudinal correlations, the within- and cross-survey twin correlations were calculated for each of the five sex-by-zygosity groups using the same package.

Finally, a longitudinal genetic model was fitted to the full dataset to decompose the within- and cross-survey (co-) variance into genetic, shared environmental and non-shared environmental effects. The A-components and the C-components were modelled with a simplex structure (Boomsma and Molenaar 1987), whereas the E-components were modelled with a Cholesky structure, where every latent variable that influences one time point also influences subsequent, but not previous, time points (Neale and Cardon 1992). The Cholesky structure can thus be thought of as a “full model” and was chosen for the E-component as no specific underlying structure is to be expected since the E-component is a mixture of “real” non-shared environmental influences and measurement error. The simplex structure, in contrast, explicitly differentiates between transmission and innovation. The analyses were conducted on same-sex twin pairs only and quantitative sex differences were taken into account by estimating separate parameters for males and females. In total, 49 parameters were estimated for each sex: one mean for each survey (six parameters), genetic transmission (five parameters), genetic innovation (six parameters), shared environmental transmission (five parameters), shared environmental innovation (six parameters) and non-shared environmental effects (21 parameters). If not stated otherwise, the genetic analyses were conducted in the software package OpenMx in R (Boker et al. 2011).
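The parameter count given above follows directly from the two structures. The sketch below tallies it for a general number of measurement occasions; the function name is hypothetical, and the only assumptions are that a simplex has one transmission path per pair of adjacent occasions and one innovation per occasion, and that a Cholesky structure has one loading per element of a lower-triangular matrix.

```python
def simplex_parameter_count(n_occasions):
    """Tally free parameters per sex for simplex A and C plus Cholesky E."""
    transmission = n_occasions - 1  # paths linking adjacent occasions
    innovation = n_occasions        # one new influence per occasion
    cholesky = n_occasions * (n_occasions + 1) // 2  # lower-triangular loadings
    return {
        "means": n_occasions,
        "A transmission": transmission,
        "A innovation": innovation,
        "C transmission": transmission,
        "C innovation": innovation,
        "E Cholesky": cholesky,
    }


counts = simplex_parameter_count(6)  # six surveys: 7, 10, 12, 14, 16, 18
total = sum(counts.values())         # 6 + 5 + 6 + 5 + 6 + 21 = 49
print(counts, total)
```

With six surveys, the simplex structure uses 11 parameters per component where a Cholesky structure uses 21, which is the price paid for imposing no particular structure on the E-component.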

Results

Table 2 contains the mean age of the participants for each survey, as well as the number and percentage of individuals engaged in the different levels of exercise behavior. For both sexes, the percentage of individuals with low exercise behavior increased from survey 12 to survey 18. The reverse trend was seen for individuals with a moderate level of exercise behavior of which the relative frequency decreased from survey 7 to 18. With the exception of a smaller percentage at survey 7, the percentage of individuals with a high level of exercise behavior remained fairly constant. In all surveys, males exercised significantly more often at a high intensity level than females (p < 0.001).

Table 2 Mean age (standard deviation) and the number and percentage of individuals engaged in the different levels of exercise behavior, split by sex and survey

Age-moderation Model

The polychoric twin correlations of each survey are depicted in Fig. 1. The MZ twin correlations were only marginally larger than the DZ twin correlations at survey 7, but the difference between MZ and DZ correlations increased with increasing age. At the same time, the MZ correlations were generally smaller than twice the DZ correlations, suggesting shared environmental influence. The same-sex correlations within each zygosity were comparable for males and females, suggesting no quantitative sex differences. The DOS correlations were smaller than what would be expected based on the same-sex DZ correlations which implies qualitative sex differences. The difference decreased with increasing age and disappeared in later adolescence.
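The reasoning from twin correlations to variance components can be made concrete with the classical Falconer approximations. This is a textbook heuristic, not the liability threshold model fitted in this study, and the correlations used below are hypothetical values chosen only to mimic the qualitative pattern described above.

```python
def falconer_estimates(r_mz, r_dz):
    """Rough ACE proportions from MZ and DZ twin correlations."""
    return {
        "A": 2.0 * (r_mz - r_dz),  # MZ twins share twice the additive genetic effects of DZ twins
        "C": 2.0 * r_dz - r_mz,    # resemblance beyond what genes account for
        "E": 1.0 - r_mz,           # whatever makes MZ twins differ
    }


# Hypothetical correlations: MZ only marginally larger than DZ,
# as described for the youngest age group.
young = falconer_estimates(0.90, 0.80)

# Hypothetical correlations with a wider MZ-DZ gap, as described for older ages.
older = falconer_estimates(0.85, 0.45)

print({k: round(v, 2) for k, v in young.items()})
print({k: round(v, 2) for k, v in older.items()})
```

An MZ correlation smaller than twice the DZ correlation yields a positive C estimate, which is why that pattern suggests shared environmental influence.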

The unstandardized genetic, shared environmental and non-shared environmental variance of exercise behavior across surveys is depicted in Fig. 2. Although age was z-transformed for the analyses, the x-axes depict age in years, for clarity (age range: 6.78–19.99 years). Supplementary Table 2 depicts the number of twins and complete twin pairs for the age moderation analyses, split by zygosity. Based on 99 % confidence intervals, the linear effects on the A-components were significant for males (β = 0.50) and females (β = 0.30), whereas the quadratic effects were not. For the C-components, only the quadratic effect in females was significant (β = −0.11). Finally, the linear effects on the E-components were significant for males (β = 0.32) and females (β = 0.17), as were the quadratic effects (β = 0.10 and β = 0.03, respectively). In sum, there was a large increase in genetic variance with age, paired with a more modest increase in non-shared environmental variance. The influence of shared environmental effects showed an inverted U-shape for females, but the effect was small compared to the increase in genetic variance. It should be noted that the total variance was much larger for males than for females. Next, the genetic, shared environmental and non-shared environmental variances were standardized to obtain their relative contribution, for males and females separately. The standardized estimates are depicted in Fig. 3. The A-component increased with age, whereas the C-component decreased and the relative contribution of E remained relatively low at all ages.

Fig. 2 Changes in the absolute contribution of genetic, shared environmental and non-shared environmental factors to variance in exercise behavior as a function of age, for males (left) and females (right) separately. Am A-component for males [linear beta = 0.50 (99 % CI 0.29; 0.69); quadratic beta = 0.03 (−0.13; 0.17)], Cm C-component for males [0.18 (−0.18; 0.37); −0.09 (−0.29; 0.05)], Em E-component for males [0.32 (0.26; 0.39); 0.10 (0.06; 0.15)], Af A-component for females [0.30 (0.22; 0.37); 0.02 (−0.05; 0.09)], Cf C-component for females [−0.00 (−0.12; 0.10); −0.11 (−0.19; −0.03)], Ef E-component for females [0.17 (0.14; 0.19); 0.03 (0.01; 0.06)]

Fig. 3 Changes in the relative contribution of genetic, shared environmental and non-shared environmental factors to variance in exercise behavior as a function of age, for males (left) and females (right) separately

Simplex Model

The phenotypic polychoric correlations across the surveys 7 to 18 are shown in Table 3. The correlations, which reflect tracking of exercise behavior over time, were mostly moderate, ranging from 0.23 to 0.75, with larger correlations between surveys in closer proximity to each other and in older individuals. Supplementary Table 3 depicts the within- and cross-survey twin correlations. MZ cross-survey correlations were generally larger than DZ cross-survey correlations, implying genetic influences on stability. In combination with the lower longitudinal correlations for surveys that were further apart, this reinforces the use of a genetic simplex model (Boomsma and Molenaar 1987).

Table 3 Phenotypic correlations across repeated measurements, for males and females separately (standard error; N)

Figure 4 depicts the path estimates of the genetic and shared environmental components as estimated in the simplex model. The depicted parameters were all freely estimated. Table 4 depicts the relative contribution of genetic, shared environmental and non-shared environmental effects to variance in exercise behavior for each age. The genetic and shared environmental variance components are further separated into the part that is due to transmission and the part that is due to innovation. The transmission part is calculated based on all paths that precede the respective survey, whereas the innovation part is calculated based on the innovation path for the respective survey only. In males, genetic transmission was strong from survey 10 onwards and relatively more important than genetic innovation, with the exception of a strong genetic innovation at survey 18. A different pattern appeared for females in that genetic effects were also transmitted across surveys but new effects consistently emerged for each survey, with approximately the same amount of innovation and transmission for the surveys 16 and 18. The shared environmental effects were marked by both transmission and innovation, with a tendency for innovation being more important in males and transmission in females. For survey 18 in males and the surveys 16 and 18 in females, no new shared environmental effects emerged.
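The split of a variance component into a transmitted and a new part can be illustrated with a short sketch of the simplex recursion. It assumes that latent innovations have unit variance so that squared path coefficients give variances; the function name and all path values are hypothetical and are not the estimates shown in Fig. 4.

```python
def simplex_variance_split(transmission, innovation):
    """Split the variance at each occasion into transmitted and new parts.

    transmission: path coefficients linking occasion t-1 to t (length T-1).
    innovation:   path coefficients of the new influences (length T).
    Returns a list of (transmitted_variance, innovation_variance) tuples.
    """
    if len(transmission) != len(innovation) - 1:
        raise ValueError("need one transmission path fewer than innovations")
    parts = []
    previous_total = 0.0
    for t, innov in enumerate(innovation):
        # Everything present at the previous occasion is carried forward
        # through the squared transmission path.
        transmitted = 0.0 if t == 0 else transmission[t - 1] ** 2 * previous_total
        new = innov ** 2
        parts.append((transmitted, new))
        previous_total = transmitted + new
    return parts


# Hypothetical path values for three occasions.
parts = simplex_variance_split(transmission=[1.0, 0.5], innovation=[1.0, 0.5, 0.0])
for occasion, (transmitted, new) in enumerate(parts, start=1):
    print(occasion, transmitted, new)
```

In this illustration, the third occasion has an innovation path of zero, so all of its variance is transmitted; this mirrors the situation described above in which no new shared environmental effects emerge at the last surveys.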

Fig. 4 The path estimates of the genetic and shared environmental components of exercise behavior as estimated with the simplex model, for males and females separately. The paths of the non-shared environmental components were omitted for clarity

Table 4 The relative contribution of genetic (A) and shared environmental (C) effects (split by transmission (T) and innovation (I)), as well as non-shared environmental (E) effects to variance in exercise behavior based on the simplex model, split by sex and survey

Discussion

This study aimed to investigate the impact of genes and the environment on the development of exercise behavior across childhood and adolescence. In this period, the total variance in exercise behavior increased because relative to children, adolescents were less often engaged in moderate levels of exercise behavior and more often in low levels, whereas the percentage of individuals with high levels of exercise behavior remained fairly constant throughout childhood and youth. Two genetic models were fitted to the data: an age-moderation model and a simplex model. The age-moderation model used the largest available cross-sectional dataset and revealed that the absolute genetic variance in exercise behavior increased with age, whereas the absolute shared environmental variance remained relatively stable. Therefore, the relative contribution of shared environmental factors decreased across incremental age groups. The simplex model used repeated measures within the same persons to detect the sources of developmental changes in exercise behavior and showed that genetic factors influencing exercise behavior were a main source of stability, particularly in males. Shared environmental factors showed marked innovation around the ages of 10 and 12 years in both sexes. The role of new shared environmental effects diminished after age 12 and disappeared around the age of 18 years.

Taken together, the age-moderation model and the simplex model converge on a single pattern. Individual differences in childhood exercise behavior are strongly determined by shared environmental factors, with 80 % of the variance determined by C around age 7. Throughout the development from age 7 to age 18, genetic factors gradually overwhelm the effects of the shared environment, especially in males. Age 14 is a tipping point where the relative influence of genes definitively trumps that of the shared environment. At age 18, heritability of exercise behavior in young men is very high (79 %), whereas it is moderately high in young women (49 %), where the effects of the shared environment still linger (19 %).

Several previous twin studies have explored the heritability of exercise behavior in childhood and youth. Huppertz et al. (2012) investigated the heritability of exercise behavior for the ages 7, 10 and 12 years, based on a subset of the data that were used for the present study. With the exception of 10-year-old boys (A = 66 %, C = 25 %), most of the variance in exercise behavior of 7- to 12-year-olds could be explained by shared environmental factors (C = 50–72 %). There were significant qualitative sex differences for the ages of 7 and 12 years. At age 10, a sudden rise in heritability was reported for boys, which was not found in the present study, probably due to the larger sample size. The large shared environmental influence in childhood is in line with findings of small-scale studies on total physical activity measured with accelerometers (Fisher et al. 2010), respiratory gas exchange and doubly labeled water (Franks et al. 2005), and pedometers (Plomin and Foch 1980), although it should be noted that these studies investigated somewhat different phenotypes, which limits comparability to our study.

Shared environmental factors influencing exercise behavior have also been noted in adolescence before. Maia et al. (2002) calculated the heritability of the sports participation index in 12- to 25-year-old twins (N = 411 pairs) and found that for males, 68 % of the variance was explained by genetic effects and 20 % by shared environmental effects. Estimates were 40 and 28 % for females. In a larger sample (N = 5216 individuals at baseline), Aaltonen et al. (2013) found heritability estimates of around 43–52 % in approximately 16- to 19-year-old twins, with a shared environmental influence of 18–26 %. Two other studies, however, report results that do not appear immediately consistent with our findings. We suspect that this reflects the practice of reporting data on the best fitting AE model rather than a full ACE model when dropping C is found to deliver the most parsimonious model. Non-significance of the C-component does not necessarily mean that it is absent, however, but simply that it is relatively hard to pick up with classical twin studies, unless the sample size is very large (Posthuma and Boomsma 2000). For instance, van der Aa et al. (2010) investigated the heritability of exercise behavior in 14-, 16- and 18-year-old twins on a subset of the dataset that was used for the present study. Genetic effects appeared to be the most important contributors to the total variance in boys and girls (A = 72–85 %), with the exception of 14-year-old girls (A = 38 %, C = 46 %). Likewise, Beunen and Thomis (1999) investigated sports participation in 15-year-old twins (N = 183 individuals) and found that for boys, most of the variance (83 %) was explained by genetic factors after dropping C from the model. For girls, C could not be dropped and only 44 % of the individual differences in sports participation were explained by genetic factors, with 54 % due to shared environmental factors.

Overall, existing studies are well in line with the general pattern in our study in that individual differences in exercise behavior are strongly determined by shared environmental factors in childhood but that in adolescence, genetic factors gradually overwhelm the effects of the shared environment, especially in males. The shared environmental factors affecting exercise behavior may consist mainly of parental influences in children (Huppertz et al. 2012). Parents often act as gatekeepers to children’s exercise activities by providing necessary resources and support. They are also involved in the timing and initial choice of specific exercise activities and might thus affect their children’s skill acquisition and, ultimately, their exercise performance (Timmons et al. 2007). Parents may be partly responsible for the qualitative sex differences seen in childhood. It has been reported that boys tend to receive more parental support to be physically active than girls, although the findings are not unanimous (Anderson et al. 2009; Beets et al. 2010; Gustafson and Rhodes 2006). With increasing age, the social support received from peers starts to supersede that of parents (Chan et al. 2012). The influence of peers may well account for the innovation we noted in the shared environmental variance with increasing age, as well as for the absence of innovation in 18-year-old males and 16- and 18-year-old females, by which time the parental influence on exercise behavior may have disappeared altogether. The quality of coaches and trainers might contribute to both the shared and the non-shared environmental variance throughout childhood and youth (Chan et al. 2012).

The nature of the genetic factors that increasingly affect exercise behavior throughout childhood and adolescence remains uncharted, but a number of testable hypotheses have been put forward (Bryan et al. 2007; de Geus and de Moor 2011). The first one suggests genetic effects on a homeostatic “need to be active” which has been operationalized in rodent studies by spontaneous wheel running (Knab and Lightfoot 2010; Lightfoot et al. 2004). Large strain differences exist in spontaneous running when animals are granted free access to a wheel, and selective breeding confirms that this “activity drive” is a heritable trait (Garland et al. 2011). In humans, the activity drive may be an integral part of personality traits like extraversion, sensation seeking or impulsivity. Other personality traits like neuroticism or conscientiousness may also come into play, e.g. by determining individual differences in attraction to regular exercise behavior and the ability to persist. The personality traits extraversion, sensation seeking and conscientiousness are indeed positively related to exercise behavior, whereas neuroticism is negatively related (de Moor et al. 2006; Rhodes and Smith 2006). Personality may furthermore play a role in the formation of attitudes towards exercise, in particular the perception of the benefits of and the barriers towards exercise behavior. As personality traits as well as exercise attitudes have a partly genetic basis (de Moor et al. 2012; Huppertz et al. 2014; Jang et al. 1996), they are likely to contribute to the genetic variation in exercise behavior (de Geus and de Moor 2011). Furthermore, as personality is considered to be a rather stable trait from early childhood onwards, it may mainly explain the transmission, but not innovation, of genetic effects across ages.

Apart from personality traits and exercise attitudes, large individual differences have been observed in the acute mood response to activity bouts (Ekkekakis 2008; Parfitt and Hughes 2009). Low-intensity exercise evokes rewarding reactions in most individuals, whereas high-intensity exercise evokes aversive reactions in most individuals. The responses to intermediate levels of exercise, however, are much more variable, with some individuals reporting rewarding feelings, whilst others report aversive feelings (Ekkekakis et al. 2005). Individual differences in the acute psychological response to exercise are likely to be largely explained by genetic factors (Knab and Lightfoot 2010). If this response becomes increasingly important for maintaining regular exercise behavior from childhood to adolescence, it could be a source of the genetic innovation that was observed.

Finally, fitness and exercise ability (as in endurance, strength, flexibility, motor coordination, training response and similar) have been shown to be highly heritable traits (Bouchard and Hoffman 2011; Bouchard and Rankinen 2001). As adolescents tend to seek out activities that they are good at and to avoid those that they are not good at, an adolescent endowed to be good at exercising (and/or to improve fast with training) will be more likely to keep pursuing physical exercise on a regular basis (de Geus and de Moor 2008). In males, strong genetic transmission is seen from age 10 onwards which ultimately results in a very high heritability of exercise behavior at age 18 (79 %). The increase in genetic variance is less steep in girls. Exercise ability and trainability might explain part of this difference, as boys are more likely to take part in team sports and competitive sports (implying more comparison among peers) and as perceived athletic ability is culturally more important to boys than to girls. The focus on adolescents here should not detract from the fact that for younger children, perceived competence may also play a role in the maintenance of exercise activities. However, the strong increase in genetic variance suggests that innate differences in competence are more relevant in adolescence than in childhood. Shared environmental influences probably suppress the effects of innate differences in the latter group.

As stated in the introduction, the development of exercise behavior from childhood to adolescence has not been assessed in longitudinal twin studies. In part, this may be due to the difficulty of repeatedly assessing exercise behavior in a large set of twins, especially in young twins. Ideally, one would assess exercise behavior with a combination of objective and subjective measures. As this was not feasible for the present study, we relied on subjective reports only, which may have led to biases. In contrast to total physical activity, however, exercise behavior is structured and clearly defined in time and can therefore be recalled with acceptable accuracy. The correlations between mothers’ and fathers’ ratings of their children’s exercise behavior ranged between 0.74 and 0.89 in this study, and the six-month test–retest reliability of this measure was found to be 0.91 (Stubbe et al. 2007) and 0.82 (de Moor et al. 2008) in our previous work. Furthermore, this measure has been associated with the sweat index and the frequency of being physically active for at least 20 min in the past 6 months (de Moor and de Geus 2013), which are likely to be largely affected by exercise behavior. Finally, the results are in line with previous studies that used objective measures of physical activity (Fisher et al. 2010; Franks et al. 2005; Plomin and Foch 1980).

It should be noted that exercise behavior was assessed through parental report for the surveys 7, 10 and 12, and through self-report for the surveys 14, 16 and 18. This may introduce potential rater effects that may mimic some of the patterns that were found. More specifically, self-report, where two individuals report on their own behavior, may lower twin correlations compared to parental report, where the same individual, namely the parent, reports on both children (Kan et al. 2014). For the self-reports, genetic models will estimate the genetic effects that are common to both raters as “A” and the genetic effects that are specific to each rater as “E”, under the assumption that rater-specific factors are genetically influenced. We indeed found a larger E-component in adolescents compared to children. Unfortunately, we cannot differentiate between the part of the E-component that reflects non-shared environmental effects and the part that reflects measurement error. However, we argue that as opposed to, for instance, ratings on psychopathology (Kan et al. 2014), informant dependency is less of a concern in exercise behavior, as this behavior is less dependent on subjective perceptions, but can be rather objectively reported as weekly frequency and duration. Moreover, in line with a recent study by Telama et al. (2014) and an earlier review (Telama 2009), we found moderate-to-high tracking of exercise behavior across the entire age range, with larger correlations for surveys that were in closer proximity to each other and higher stability in the surveys targeting older twins, with no deviant patterns from survey 12 to 14 (from parental report to self-report).

Although, in general, twin studies are the most elegant method to estimate the contribution of genes and the environment to variance in a trait, a number of critical assumptions have to be met to obtain valid results. First, it is assumed that twins are representative of the general population. As stated in our previous work (Huppertz et al. 2012), a specific limitation of using twins to understand the determinants of exercise behavior is that the findings may not generalize to families with siblings of different ages or a single child. Because twins have the same age, it is more convenient for parents to handle their twins as a pair (and thus to promote exercise behavior) than it would be in the case of siblings with a larger age difference. This might have led to a greater role of tangible support (a shared environmental factor) on the part of the parents compared to families without twins. To check whether there are systematic differences in the percentage of non-exercisers and in the means and variances in weekly MET hours between multiples and singletons, we selected a group of multiples and a group of their siblings of the NTR aged 13–18 years and compared their exercise behavior in narrow age ranges (Supplementary Table 4). Based on this comparison, we conclude that exercise behavior of twins is generalizable to the population-at-large. Second, the modeling assumed that the twins’ parents did not select each other based on the phenotype under study (or a correlated phenotype), whereas various studies have found evidence for significant spousal resemblance in exercise behavior (Aarnio et al. 1997; Boomsma et al. 1989; Perusse et al. 1988, 1989; Seabra et al. 2008). This may have led to a higher resemblance than expected of DZ twins in genes that affect exercise behavior, which would imply an overestimation of shared environmental variance. Third, the so-called equal environments assumption holds that environmental differences between MZ and DZ twins are not related to the phenotype under study. Otherwise, a higher similarity of MZ twins compared to DZ twins could be due to genetic influences, environmental influences, or both, whereas the classical twin design ascribes a difference in similarity to genetic factors only (Kendler 1993). The equal environments assumption has been shown to hold for a wide range of phenotypes (Kendler 1993), including physical activity-related traits (e.g., doing sports) (Eriksson et al. 2006).
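
The direction of the bias that spousal resemblance may introduce, as described under the second assumption above, can be made concrete with a minimal sketch. It assumes standardized variance components, Falconer-style estimates from twin correlations rather than the structural equation models actually fitted, and a purely hypothetical genetic correlation of 0.6 between DZ twins in place of the 0.5 expected under random mating. All names and values are illustrative assumptions.

```python
def expected_twin_correlations(a2: float, c2: float, dz_genetic_corr: float = 0.5):
    """Expected MZ and DZ correlations for standardized components a2 and c2.
    dz_genetic_corr is 0.5 under random mating; assortative mating raises it
    (the value used below is hypothetical)."""
    r_mz = a2 + c2
    r_dz = dz_genetic_corr * a2 + c2
    return r_mz, r_dz

def falconer_estimates(r_mz: float, r_dz: float):
    """Classical estimates that assume DZ twins share half of their segregating genes."""
    a2_hat = 2.0 * (r_mz - r_dz)
    c2_hat = 2.0 * r_dz - r_mz
    e2_hat = 1.0 - r_mz
    return a2_hat, c2_hat, e2_hat

# Hypothetical true values: A = 60 %, C = 20 %, E = 20 %.
true_a2, true_c2 = 0.6, 0.2

for label, g_corr in (("random mating", 0.5), ("assortative mating (assumed)", 0.6)):
    r_mz, r_dz = expected_twin_correlations(true_a2, true_c2, g_corr)
    a2_hat, c2_hat, e2_hat = falconer_estimates(r_mz, r_dz)
    print(f"{label}: A = {a2_hat:.2f}, C = {c2_hat:.2f}, E = {e2_hat:.2f}")
```

In this simplified setting, the extra genetic resemblance of DZ twins is absorbed by the C estimate at the expense of A, while E is unaffected. The size of the shift depends entirely on the assumed DZ genetic correlation, which is not estimated here.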

Notwithstanding its limitations, this study provides an important extension to the literature as it is the largest investigation of exercise behavior in twins aged 7–18 years. More than 27,000 individuals have provided data for this study and almost 13,000 individuals have provided data on more than one measurement moment. Exercise behavior was assessed in narrow ranges around the ages of 7, 10, 12, 14, 16 and 18 years. We modeled both the absolute and the relative contribution of genes, the shared environment and the non-shared environment to variance in exercise behavior as a function of age, and we modeled the underlying developmental structure. Our age-moderation analysis confirmed the major role of shared environmental factors in children’s exercise behavior and genetic factors in adolescents’ exercise behavior, implying that family-based interventions might work to increase exercise behavior in children, whereas individual-based interventions might be better suited for adolescents. Given the enormous complexity of factors that cause individual differences in exercise behavior, it is not surprising that “one-size-fits-all” interventions do not bring about satisfactory changes in behavior. Age-specific shared environmental and genetic determinants of differences between individuals need to be identified in order to develop personalized interventions that take into account human variation.