Introduction

Breeders of crops and livestock have known for centuries that matings between distantly related individuals often produce better offspring than those between closely related individuals (Darwin 1868). This phenomenon has been termed heterosis (since Shull 1914) or hybrid vigor, denoting the superiority of hybrid offspring. When inbred populations are crossed, the offspring often exhibit mean values above the mid-parent value for traits that have shown inbreeding depression, including components of Darwinian fitness such as reproductive success (e.g., Falconer and Mackay 1996; Birchler et al. 2006). This is not always the case, however: outbreeding depression can occur when distantly related populations are crossed, owing to the breakup of coadapted gene complexes that contribute to phenotypes affected by a high degree of epistasis (Lynch 1991, 1994; Burke and Arnold 2001; Birchler et al. 2006).

As noted by Mayr (1961), independent lines (populations) experiencing apparently identical directional selection will often respond at different paces and with different correlated traits. Although directional selection works to increase the frequency of favorable alleles while reducing the frequency of unfavorable alleles, the simultaneous effects of random genetic drift are indifferent to any particular allele’s selective relevance. Therefore, drift potentially fixes alleles whose effects are neutral or even counter to what selection favors. As drift and mutation are stochastic processes, their effects will, on average, cause populations to diverge genetically, and the generation-to-generation response to directional selection will be contingent on existing genetic variation. For these reasons (and others), identical selection may often lead to “multiple solutions” in different populations (Garland and Rose 2009; Garland et al. 2011a) and when these populations are mixed, as during an intentional cross, heterosis for many traits will often occur (e.g., Ehiobu and Goddard 1990; Bult and Lynch 1996; reviews in Lynch and Walsh 1998; Lippman and Zamir 2007).
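The argument that identical selection plus drift can send replicate lines to different places can be illustrated with a minimal Wright-Fisher sketch. All parameter values below (number of breeders, selection coefficient, starting frequency, number of generations) are hypothetical illustrations, not estimates from this selection experiment, and the model tracks a single locus under simple genic selection.

```python
import random


def simulate_replicate_lines(n_lines=4, n_breeders=40, p0=0.5, s=0.05,
                             generations=20, seed=1):
    """Wright-Fisher sketch: identical genic selection plus binomial drift.

    Every line starts at allele frequency p0 and experiences the same
    selection coefficient s favoring allele A; only random sampling differs.
    Parameter values are hypothetical, for illustration only.
    """
    rng = random.Random(seed)
    n_genes = 2 * n_breeders  # diploid gene copies sampled each generation
    freqs = []
    for _ in range(n_lines):
        p = p0
        for _ in range(generations):
            # Deterministic change from genic selection (relative fitness 1 + s for A).
            p_sel = p * (1 + s) / (1 + s * p)
            # Drift: next generation's gene copies are a random sample of the pool.
            p = sum(rng.random() < p_sel for _ in range(n_genes)) / n_genes
        freqs.append(p)
    return freqs


print(simulate_replicate_lines())
```

Running it with different seeds typically shows the four lines ending at different allele frequencies even though selection was identical, which is the sense in which drift makes the response to selection contingent.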

Heterosis has been documented for many traits in many species, including high-temperature growth rate in yeast (Steinmetz et al. 2002), post-weaning success in pigs (Young et al. 1976), and mannose-binding lectin in humans (Hellemann et al. 2007). In house mice, heterosis has been observed for traits including food competition (Manosevitz 1972), motor behavior (Guttman et al. 1980), growth rates (Bhuvanakumar et al. 1985), body size (Lynch et al. 1986), litter size (Peripato et al. 2004), activity rhythms (Beau 1991), and nest-building behavior (Bult and Lynch 1996).

The primary purpose of the present study was to test for heterosis using two (of four) replicate lines of mice that have been bred for high voluntary wheel-running behavior (Swallow et al. 1998, 2005, 2009). Wheel running generally involves aspects of both motivation and ability (Waters et al. 2008; Meek et al. 2009; Garland et al. 2011b). For example, an individual rodent that is highly motivated to run (e.g., because running is highly rewarding in a neurobiological sense) but lacks the inherent endurance capacity to do so simply will not be able to run as much as another individual with both high motivation and high ability. Rodent wheel running has been the subject of numerous studies, with goals ranging widely across behavior, physiology, and genetics (e.g., Slonaker 1912; de Kock and Rohn 1971; Holloszy and Smith 1987; Belke and Garland 2007). Despite a century of study, precisely what wheel running in laboratory rodents represents remains controversial (Mather 1981; Sherwin 1998; Garland et al. 2011b). Heterosis has been observed for wheel-running behavior (and other aspects of locomotor activity, e.g., exploratory behavior) when inbred strains of mice were crossed (Bruell 1964a, b).

The crosses necessary to study heterosis also allowed us to test for line differences. On average, the four replicate High Runner (HR) lines run 2.5- to 3.0-fold more revolutions/day as compared with four non-selected control (C) lines, a differential that has been maintained from approximately generation 16 to the time of the present study at generation 53 (Middleton et al. 2008; Swallow et al. 2009; Kolb et al. 2010). The nature of this selection limit is as yet unknown, but does not appear to be simply an exhaustion of additive genetic variance for wheel running (unpublished results). Phenotypically, the selection limit may be related to availability of lipids to fuel the many hours of running that occur during each 24-h period (Gomes et al. 2009; Kolb et al. 2010; Meek et al. 2010). Whatever the precise phenotypic characteristics that underlie the selection limit, if a cross between two HR lines results in hybrid vigor, then selection applied to a population derived from such a line cross would have the potential to break through the prevailing selection limit (e.g., Bult and Lynch 2000). In addition to measures of wheel running, we report data for masses of four organs, at least three of which (heart ventricles, calf muscles, liver) may have important roles during endurance running (e.g., see Dumke et al. 2001; Garland et al. 2002; Swallow et al. 2005; Rezende et al. 2006c; Meek et al. 2009; and references therein).

Methods

Animals

Mice used in this study were from an ongoing selection experiment for high voluntary wheel running. Full details of the selection experiment are given in Swallow et al. (1998), and only a brief synopsis is presented here. The original progenitors were 224 mice of the outbred, genetically variable (e.g., see Carter et al. 1999) Hsd:ICR strain of house mice (Mus domesticus). This population was randomly mated for two generations and then divided into eight closed lines, four of which were designated High Runner (HR; lab designations 3, 6, 7, 8) and four control (C; lab designations 1, 2, 4, 5). A minimum of ten pairs from each line were used to propagate each subsequent generation. Pregnant dams are given a breeder diet (Harlan Teklad, Madison, WI, Mouse Breeder Diet [S-2335] 7004) through weaning. At other times, standard chow (Harlan Teklad, Madison, WI, Rodent Diet [W] 8604) and water are available ad libitum. Pups are weaned at 21 days of age. Each generation, at 6–8 weeks of age, mice are individually housed with access to a Wahman-type running wheel (circumference = 1.12 m) for 6 days, during which daily wheel running is monitored by a computer-automated system. The selection criterion is the mean number of revolutions run on days 5 and 6 of the 6-day test. In the four HR lines, the highest-running male and female from each family are chosen as breeders (i.e., within-family selection); the second-highest-running male and female are also chosen to provide backup pairings. In the four C lines, two males and two females are randomly chosen from each family without regard to wheel running. Within all lines, breeders are randomly paired, excluding sibling matings.
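The within-family selection and sibling-free pairing described above can be sketched as follows. This is a minimal illustration, not the lab's actual software: the `Mouse` record, its field names, and the reshuffle-until-valid pairing strategy are hypothetical choices made for clarity, and backup (second-highest) breeders are omitted.

```python
import random
from dataclasses import dataclass


@dataclass
class Mouse:
    # Hypothetical record layout for one wheel-tested mouse.
    mouse_id: str
    family: str
    sex: str  # "M" or "F"
    day5_revs: int
    day6_revs: int

    @property
    def score(self) -> float:
        # Selection criterion: mean revolutions on days 5 and 6 of the 6-day test.
        return (self.day5_revs + self.day6_revs) / 2


def within_family_select(mice):
    """Return the highest-scoring male and the highest-scoring female of each family."""
    best = {}
    for m in mice:
        key = (m.family, m.sex)
        if key not in best or m.score > best[key].score:
            best[key] = m
    males = [m for (_, sex), m in best.items() if sex == "M"]
    females = [m for (_, sex), m in best.items() if sex == "F"]
    return males, females


def pair_without_siblings(males, females, rng, max_tries=1000):
    """Randomly pair breeders, reshuffling until no pair shares a family."""
    for _ in range(max_tries):
        shuffled = females[:]
        rng.shuffle(shuffled)
        pairs = list(zip(males, shuffled))
        if all(m.family != f.family for m, f in pairs):
            return pairs
    raise RuntimeError("no sibling-free pairing found")
```

Rejection sampling keeps every sibling-free pairing equally likely, at the cost of occasional reshuffles when there are few families.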

Selected lines 7 and 8 were chosen for this study because they lack the mini-muscle allele, which affects numerous traits, including wheel running and organ masses (see Garland et al. 2002; Swallow et al. 2005; Rezende et al. 2006a, c; Hannon et al. 2008; Hartmann et al. 2008; Middleton et al. 2008; Gomes et al. 2009). All line 7 and line 8 breeders (see previous paragraph) from generation 53 were re-paired to produce mice for the present study (i.e., second litters). Sires were housed individually from removal from their first pairing until their second pairing. Dams were housed 3–4 per cage from weaning of their first litter until pairing for this experiment.

Because of the within-family selection method used to choose breeders for the selection experiment, each breeder used in the present experiment usually had three siblings (one of the same sex, two of the opposite sex) also included in the experiment. Mice were therefore re-paired according to the following guidelines. Sibling mating was disallowed, and all females were mated with a novel male. Of two siblings of the same sex, one was randomly chosen to be mated with a mouse from the same line, while the other was mated with a mouse from the other line. For families represented by three or five siblings rather than four, the odd mouse was randomly assigned as a breeder.
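One reading of these guidelines, for the same-sex siblings of a single family, is sketched below. The function name and the labels "within-line" and "between-line" are hypothetical; the sketch assumes siblings are split evenly between mating types and that any odd sibling is assigned to a mating type at random.

```python
import random


def assign_mating_types(same_sex_siblings, rng):
    """Split same-sex siblings of one family between within-line (purebred)
    and between-line (hybrid) matings.

    Siblings are shuffled and taken two at a time, so one of each pair goes
    to each mating type; an unpaired (odd) sibling is assigned at random.
    """
    sibs = list(same_sex_siblings)
    rng.shuffle(sibs)
    assignment = {}
    for i in range(0, len(sibs) - 1, 2):
        assignment[sibs[i]] = "within-line"
        assignment[sibs[i + 1]] = "between-line"
    if len(sibs) % 2 == 1:
        assignment[sibs[-1]] = rng.choice(["within-line", "between-line"])
    return assignment
```

Shuffling before assigning makes which sibling goes to which mating type random, as the protocol requires.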

This protocol produced a total of 43 breeding pairs: 11 pairs were purebred line 7 × line 7; 10 were purebred line 8 × line 8; 11 were male line 7 × female line 8 hybrids; 11 were male line 8 × female line 7 hybrids. Purebred offspring of the replicate selected lines (7 × 7 and 8 × 8) were used because direct comparison to parental individuals could be compromised due to possible generational effects, which can be substantial (e.g., see figures in Swallow et al. 1998, 2009; Middleton et al. 2008; Kolb et al. 2010). Reciprocal crosses for the hybrids were conducted to test for parental effects. Eighteen days after pairing, the male was removed if the female was visibly pregnant; otherwise, he remained with the female until she appeared pregnant. Mice were weaned at 21 days of age and housed 4 per cage by sex and cross type. Total sample sizes were 171 females and 166 males for wheel-running traits, with the breakdown by cross type as follows: 47 female and 38 male for line 7 × line 7; 42 female and 48 male for line 8 × line 8; 38 female and 37 male for male line 7 × female line 8 hybrids; 45 female and 43 male for male line 8 × female line 7 hybrids. For organ masses, total sample sizes were 177 females (176 for ventricle mass) and 166 males (165 for ventricle mass and triceps surae mass).

Measurement of wheel running and organ masses

F1s were wheel-tested in the same manner as in the regular selection experiment (described above). Rooms were controlled for temperature (~22°C) and photoperiod (12:12 light:dark cycle, lights on at 0700). Wheels were checked daily to ensure freedom of rotation. Wheel running was monitored with a computer-automated system, and revolutions were recorded in 1-min bins (intervals). Wheel running was quantified as means for days 5 and 6 of the 6-day test (Swallow et al. 1998). Following previous studies, we analyzed means for total revolutions per day, the number of 1-min intervals per day with at least one revolution (minutes/day), the mean speed when running (revolutions/minute), and the highest number of revolutions in any single 1-min interval per day (maximum speed) (e.g., Swallow et al. 1998; Hannon et al. 2008; Kelly et al. 2010a, b). We also analyzed body mass at the start of the wheel trial.
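The four wheel-running measures can all be derived from the 1-min bins. The sketch below is an illustration, not the lab's data-acquisition code: it assumes each day is a list of revolution counts per minute, averages each metric across days 5 and 6 (rather than pooling the two days before dividing), and assigns a mean speed of zero to a day with no running, which is an arbitrary choice.

```python
WHEEL_CIRCUMFERENCE_M = 1.12  # Wahman-type wheel circumference from the Methods


def daily_wheel_metrics(minute_bins):
    """Summarize one day of wheel running recorded in 1-min bins."""
    revolutions = sum(minute_bins)
    # Minutes/day: 1-min intervals with at least one revolution.
    minutes_active = sum(1 for r in minute_bins if r >= 1)
    # Mean speed when running: revolutions per active minute (0.0 if no running).
    mean_speed = revolutions / minutes_active if minutes_active else 0.0
    # Maximum speed: highest revolution count in any single 1-min interval.
    max_speed = max(minute_bins, default=0)
    return {"revs_per_day": revolutions, "minutes_per_day": minutes_active,
            "mean_speed": mean_speed, "max_speed": max_speed}


def selection_trait_means(day5_bins, day6_bins):
    """Average each metric over days 5 and 6 of the 6-day wheel test."""
    d5, d6 = daily_wheel_metrics(day5_bins), daily_wheel_metrics(day6_bins)
    return {key: (d5[key] + d6[key]) / 2 for key in d5}


def revs_to_km(revolutions):
    """Convert wheel revolutions to distance in kilometers."""
    return revolutions * WHEEL_CIRCUMFERENCE_M / 1000
```

A real day would contain 1440 bins; short toy lists work the same way.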

Following wheel testing, mice were returned to standard cages without wheels, housed 4 per cage. Approximately 7 days following wheel testing, mice were sacrificed by CO2 inhalation in batches to allow for harvesting of organs and muscle tissue. Mean age at sacrifice was 69 ± 3 (± SD) days. Following sacrifice, mice were weighed and dissected to determine masses of organs that have potential relevance for exercise physiology. The heart was detached and ventricles were removed from the atria and connecting blood vessels. Ventricles were blotted to remove any excess blood prior to weighing. The liver was excised followed by the spleen, then the right and left triceps surae muscles [which include the lateral and medial heads of the gastrocnemius, soleus, and the plantaris, as described in Carter et al. (1999)]. Wet masses of all tissues were recorded to the nearest 0.001 g on an electronic balance (Denver Instruments, Denver CO, USA, model M-220).

Statistical analyses

To test for differences in wheel running, body mass, and organ masses, two-way analysis of covariance (ANCOVA) models were applied using the MIXED procedure in SAS (version 9.1; SAS Institute, Cary, NC, USA). All analyses used age as a covariate. Analyses of organ masses used body mass as an additional covariate. Analyses of wheel-running traits did not include body mass as a covariate, but did use a measure of wheel freeness. To measure wheel freeness, each wheel was accelerated to a constant velocity, and the number of revolutions it spun before stopping was recorded. For analyses, wheel freeness was raised to the 0.4 power to obtain a more homogeneous spread of values. Deviations from linearity were not apparent in plots of the wheel-running traits versus transformed wheel freeness, and preliminary analyses indicated that the interaction between cross type and transformed wheel freeness was not statistically significant (all P > 0.08). Therefore, this interaction term was not included in final statistical models. Family was a random effect, nested within cross type. Preliminary analyses combined the sexes and tested for effects of cross type, sex, and the cross type * sex interaction. Because we found significant interactions (e.g., for revolutions/day, P = 0.0012; see Results), subsequent analyses were done separately by sex.
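The effect of the 0.4-power transformation on wheel freeness can be seen with a few hypothetical spin counts (the values below are chosen for easy arithmetic and are not measurements from this study). Because the exponent is below 1, large values are compressed proportionally more than small ones, so a 1024-fold range in raw values becomes a 16-fold range.

```python
def transform_freeness(spin_revolutions, power=0.4):
    """Raise measured wheel freeness (revolutions spun before stopping)
    to the given power; exponents below 1 compress large values most."""
    return [x ** power for x in spin_revolutions]


raw = [1, 32, 1024]  # hypothetical spin counts
print([round(v, 6) for v in transform_freeness(raw)])  # [1.0, 4.0, 16.0]
```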

The hybrid groups were expected to exhibit greater variance than the parental types. Therefore, we considered a range of models that allowed for different variances among families within types and/or among individuals within families (i.e., the residual variance). Specifically, we considered models with (1) a single estimate for residual variance, (2) a single estimate for residual variance and a single estimate for variance among families (nested random effect), (3) a single estimate for residual variance and separate estimates of family variance for each of the four cross types, (4) a different residual variance for each cross type and no variance among families, (5) a different residual variance for each cross type and a single estimate of variance among families, (6) a different residual variance for each cross type and separate estimates of family variance for each cross type. We used a priori contrasts to compare the two parental types (i.e., test for line differences), the two reciprocal hybrid crosses (test for parental effects), and the two parental groups with the two hybrid groups (test for heterosis). In general, significance levels for these contrasts were similar across the six models listed above. For simplicity and consistency, we report results only for the most parameter-rich model, i.e., number (6) above. In some cases, traits were transformed to improve normality of residuals.
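The three a priori contrasts can be written as coefficient vectors over the four cross types. The sketch below computes only point estimates from a set of group means; in the actual analysis each contrast was estimated and tested within the SAS mixed model, with standard errors reflecting the chosen variance structure. The group labels and example means are hypothetical. With equal weights, the three vectors are mutually orthogonal.

```python
# Cross types as (sire line x dam line); labels are illustrative.
GROUPS = ("7x7", "8x8", "7x8", "8x7")

CONTRASTS = {
    "line difference (7x7 vs 8x8)": (1, -1, 0, 0),
    "parental effect (7x8 vs 8x7)": (0, 0, 1, -1),
    "heterosis (hybrids vs purebreds)": (-0.5, -0.5, 0.5, 0.5),
}


def contrast_estimates(means):
    """Point estimate of each contrast from group means ordered as GROUPS."""
    return {name: sum(c * m for c, m in zip(coefs, means))
            for name, coefs in CONTRASTS.items()}


def percent_midparent_heterosis(means):
    """Hybrid mean relative to the mid-parent (mean of the two purebred groups)."""
    midparent = (means[0] + means[1]) / 2
    hybrid = (means[2] + means[3]) / 2
    return 100 * (hybrid - midparent) / midparent


def are_orthogonal(contrasts):
    """True if every pair of coefficient vectors has a zero dot product."""
    vecs = list(contrasts.values())
    return all(sum(a * b for a, b in zip(u, v)) == 0
               for i, u in enumerate(vecs) for v in vecs[i + 1:])


example_means = (10.0, 8.0, 11.0, 13.0)  # hypothetical group means
print(contrast_estimates(example_means))
```

A positive heterosis contrast means the hybrids exceed the mid-parent value; a parental-effect contrast near zero means the reciprocal crosses do not differ.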

Because we performed a number of tests on closely related data, the experiment-wide Type I error rate may exceed the nominal 5% alpha level. Therefore, we performed a positive false discovery rate (pFDR) analysis using the QVALUE package (Version 1.1; Storey 2002) for R (Version 2.8.0; R Core Development Team 2008), setting pFDR = 0.05 (i.e., an expected 5% of significant results being false positives). Based on analysis of the 60 P values presented in Table 1, those < 0.016 can be considered significant, and we emphasize those results.
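The logic of the q-value approach can be sketched in a few lines. This is a simplified version of Storey's method with a single fixed tuning parameter (lambda = 0.5) for estimating the proportion of true null hypotheses (pi0); the QVALUE package estimates pi0 differently by default (across a range of lambda values), so its output will generally differ. The P values in the test are hypothetical.

```python
def storey_qvalues(pvals, lam=0.5):
    """Simplified Storey q-values with a fixed lambda for estimating pi0."""
    m = len(pvals)
    # pi0: fraction of P values above lambda, rescaled to the interval width.
    pi0 = min(sum(p > lam for p in pvals) / (m * (1 - lam)), 1.0)
    if pi0 <= 0:
        raise ValueError("estimated pi0 <= 0; try a smaller lambda")
    order = sorted(range(m), key=lambda i: pvals[i])
    q = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotone q-values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pi0 * m * pvals[i] / rank)
        q[i] = running_min
    return pi0, q


def significance_cutoff(pvals, qvals, fdr=0.05):
    """Largest P value whose q-value is at or below the chosen pFDR level."""
    passing = [p for p, qv in zip(pvals, qvals) if qv <= fdr]
    return max(passing) if passing else None
```

The cutoff returned by `significance_cutoff` plays the same role as the P < 0.016 threshold described above.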

Table 1 Statistical comparisons of body mass, wheel running, and organ masses (with body mass as a covariate) separated by sex

Results

In preliminary analyses, we found significant sex * cross type interactions for revolutions/day (P = 0.0012), minutes/day (P = 0.0140), and maximum speed in any 1-min interval (P = 0.0255), but not for mean speed (P = 0.0850) or body mass (P = 0.7866). Therefore, subsequent analyses were done separately by sex.

Females

After adjusting for multiple comparisons, purebred females from line 7 ran significantly more revolutions per day (P = 0.0032) and at higher mean (P = 0.0005) and maximum speeds (P = 0.0008) than line 8 females, but did not run for more minutes per day (P = 0.6163) (Tables 1, 2; Fig. 1). Line 7 females were significantly smaller than those from line 8 (Tables 1, 2). Controlling for variation in body mass, lines 7 and 8 differed significantly for ventricle, spleen, and triceps surae mass, but not for liver mass (Tables 1, 2; Fig. 2).

Table 2 Least squares means and (standard errors) for body mass, wheel running, and organ masses (corresponding to statistical analyses in Table 1)
Fig. 1

Wheel-running activity during days 5 and 6 of a 6-day exposure to wheels (1.12 m circumference) attached to standard housing cages. Values are least squares means + SEs from analysis of covariance models in SAS Procedure Mixed (see text and Table 1 for statistical results). 7 × 7 and 8 × 8 denote purebred mice from two different HR lines bred for high voluntary wheel running (Swallow et al. 1998). Values in between these are for reciprocal crosses. See Table 2 for numerical values

Fig. 2

Triceps surae muscle mass, adjusted for body mass. Values are least squares means + SEs from analysis of covariance models in SAS Procedure Mixed (see Table 1 for statistical results and Table 2 for numerical values). Note broken Y-axis to emphasize differences among groups. 7 × 7 and 8 × 8 denote purebred mice from two different HR lines bred for high voluntary wheel running (Swallow et al. 1998). Values in between these are for reciprocal crosses

Female hybrids were intermediate between the purebred lines for body mass at the start of wheel access and for all running traits (Fig. 1; Tables 1, 2). Female hybrids had significantly smaller ventricles (P = 0.0034) than purebreds after adjusting for body mass. Hybrids from the two reciprocal cross populations were not significantly different for any trait (Tables 1, 2; Figs. 1, 2).

Males

Purebred males from HR lines 7 and 8 differed significantly for minutes/day of wheel running, but not for revolutions/day, mean speed or max speed (Tables 1, 2; Fig. 1). Purebred males from line 8 were significantly larger than those from line 7, and they also had significantly larger livers, spleens, and triceps surae muscles (Fig. 2; Tables 1, 2).

Unlike female hybrids, male hybrids showed a significant increase relative to the mean of the purebred lines in revolutions per day (P = 0.0016), mean speed (P = 0.0037), and maximum speed (P = 0.0101), but did not differ in body mass at the start of wheel access (Tables 1, 2). As in females, male hybrids from the reciprocal crosses (7 × 8 vs. 8 × 7) were not significantly different for any trait (Tables 1, 2; Figs. 1, 2).

Discussion

Results of our crosses between two replicate lines bred for high voluntary wheel running, intended primarily to examine heterosis, also show that the two lines differ for a number of traits, often in a sex-specific fashion. For example, revolutions run per day (the target of selective breeding) were higher in purebred HR line 7 than in line 8 for females (14,607 vs. 10,878, respectively, 2-tailed P = 0.0032), but not for males (9,123 vs. 11,257, P = 0.1759) (Fig. 1; Tables 1, 2). Moreover, the patterns of heterosis that we identified differ between males and females. Therefore, we separate much of the subsequent discussion by sex. It is important to note that the higher wheel running of females than males in line 7 is not peculiar to this generation (e.g., see Garland et al. 2011a for results from generation 43).

Males

For males, examination of the two components of wheel revolutions/day indicates that the two HR lines have responded differently to artificial selection (Fig. 1; Tables 1, 2). Line 8 males ran substantially more minutes/day than line 7 males (542 vs. 441 min/day), but the direction of this differential was reversed for mean running speed (18.17 vs. 20.02 revolutions/min). The end result was no statistical difference in revolutions/day (10,086 vs. 9,123), thus demonstrating approximate functional equivalence achieved by "multiple solutions" in response to selective breeding (e.g., Endler et al. 2001; Spitschak et al. 2007; see also Swallow et al. 2009; Garland et al. 2011a). Line 7 males were smaller than those of line 8, and also had significantly smaller body-mass-adjusted spleens, livers, and triceps surae muscles (Tables 1, 2; Fig. 2), but whether this is causally related to the differences in running behavior is unclear (see also Garland et al. 2002).
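Because mean speed is defined as revolutions per active minute, an individual's revolutions/day equals its minutes/day multiplied by its mean speed. A back-of-the-envelope sketch using the male least squares means reported above shows how the two components offset each other: line 8 males ran about 1.23 times as many minutes, but at about 0.91 times the speed. The products only approximate the revolutions/day least squares means, because the mean of a product is not the product of means and each trait was modeled separately.

```python
# Least squares means for purebred males, as reported in the text.
minutes_per_day = {"line 8": 542, "line 7": 441}
mean_speed = {"line 8": 18.17, "line 7": 20.02}  # revolutions/min

# Revolutions/day implied if each line's mean duration were run at its mean speed.
implied_revs = {line: minutes_per_day[line] * mean_speed[line]
                for line in minutes_per_day}

duration_ratio = minutes_per_day["line 8"] / minutes_per_day["line 7"]
speed_ratio = mean_speed["line 8"] / mean_speed["line 7"]
revs_ratio = implied_revs["line 8"] / implied_revs["line 7"]  # = duration x speed ratio

print({line: round(v) for line, v in implied_revs.items()})
print(round(duration_ratio, 3), round(speed_ratio, 3), round(revs_ratio, 3))
```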

Consistent with the partial evolutionary independence of average running speed and duration found in the present study, within an advanced intercross mapping population of HR line #8 and inbred C57BL/6J, two statistically significant QTL were detected for average running speed on days 5 and 6, and a different QTL was detected for time spent running on days 5 and 6 (Kelly et al. 2010b), although a formal test for epistasis was not performed. Similarly, a QTL analysis of an F2 population from a cross between relatively high-running C57L/J and low-running C3H/HeJ inbred strains found two QTL for wheel-running speed, one of which did not colocalize with the single QTL identified for duration (Lightfoot et al. 2008), although a subsequent paper detected considerable epistasis by use of a full genome epistasis scan for all possible interactions of QTL between each pair of 20 chromosomes (Leamy et al. 2008).

Hybrid males showed a significant increase in revolutions/day over purebred males (hybrid vigor), caused mainly by higher running speed, with a trend also for more time spent running (Fig. 1). This result demonstrates that the underlying genetic architecture of high wheel running in males differs between these two HR lines (e.g., Bult and Lynch 1996). In contrast to the results for wheel running, hybrids were intermediate to the parental groups for relative liver, spleen, and triceps surae muscle masses. It is interesting that these lower-level traits do not follow the same pattern of heterosis as the target of selection, which could be explained by their not being functionally necessary to support the higher levels of wheel running and/or by a change in their genetic correlation with wheel running in the cross populations (e.g., see Eisen 1975). In previous publications that reported masses for these organs, no consistent, statistically significant differences were found in comparisons of the four High Runner and four control lines (Dumke et al. 2001; Garland et al. 2002; Swallow et al. 2005; Rezende et al. 2006c; Meek et al. 2009).

Females

Unlike males, purebred line 7 females ran significantly more revolutions/day than line 8 females, almost entirely because the former ran faster, with no statistical difference in duration of running (Fig. 1; Tables 1, 2). Also unlike males, hybrid females were intermediate between the two parental phenotypes for both revolutions/day and speed. In spite of the differences from males, overall these comparisons again indicate different genetic responses to selection.

As with males, females of line 7 were smaller than those of line 8 and had smaller size-adjusted spleens and triceps surae muscles. In contrast to males, line 7 females had relatively larger hearts than their line 8 counterparts (Tables 1, 2), which could contribute to their higher running speeds via positive effects on endurance (Meek et al. 2009) or maximal aerobic capacity (Rezende et al. 2006b, c, 2009). Arguing against this, however, hybrid females had relatively smaller heart ventricles (P = 0.0034) than either purebred line, but exhibited intermediate levels of wheel running (Fig. 1; Tables 1, 2).

Parental effects

In a reciprocal cross between HR line 8 and a control line, we found parent-of-origin effects in the F1 for both body mass and wheel running (R. M. Hannon, S. A. Kelly, B. K. Keeney, J. L. Malisch, and T. Garland, Jr., unpublished results). Similarly, in a cross between HR line 8 and inbred C57BL/6J, we found parent-of-origin effects on body composition and wheel-running traits in a fourth-generation intercross population (Kelly et al. 2010a). In the present cross, however, we found no such effects that were statistically significant. This may reflect the fact that the two replicate HR lines studied here are more similar to each other, both phenotypically and genetically, than HR line 8 is to either a control line or C57BL/6J.

Summary and future directions

The line crosses presented here demonstrate different responses to selection for high voluntary wheel running in two (of four total) replicate HR lines, as well as sex-by-line interactions in the response to selection. In addition, the two HR lines not studied here have shown an increase in the frequency of a Mendelian recessive allele that causes a 50% reduction in hindlimb muscle mass and increased wheel-running speed, among many other identified pleiotropic effects (Garland et al. 2002; Swallow et al. 2005; Rezende et al. 2006a; Hannon et al. 2008; Middleton et al. 2008; Gomes et al. 2009). The “mini-muscle” phenotype was never detected in the two lines studied here, again demonstrating different genetic responses to selection. Thus, overall, results of the long-term selection experiment reinforce the concept that directional selection favoring a particular phenotype, and hence altering the frequencies of alleles that affect the phenotype, will occur simultaneously with other evolutionary processes, especially random genetic drift in the relatively small populations used for rodent selection experiments (e.g., Eisen 1975; Swallow et al. 2009).

Hormonal differences may contribute to the line (or sex: Lightfoot 2008) differences we observed. For example, it has been shown previously that HR lines have higher circulating corticosterone (CORT) concentrations than C, and that differences among replicate lines are also statistically significant (Malisch et al. 2007, 2009). As suggested elsewhere (Malisch et al. 2008), organisms with elevated corticosterone levels could have higher available energy and/or motivation to perform during exercise such as wheel running (Dallman et al. 1993; Pecoraro et al. 2006). However, whether HR lines 7 and 8 show consistent differences in baseline CORT or in levels during wheel running is not yet known (see Malisch et al. 2007, 2009).

Our results show some clear examples of sex-specific heterosis, as has occasionally been reported in the literature. White et al. (1970) report heterosis involving body mass in mice, with both sexes experiencing heterosis, but one sex showing it to a greater degree. Line crosses involving body mass in beef cattle and poultry (Stonaker 1963), fecundity in Drosophila (Brown and Bell 1960), and survival in swine (Cox 1960) also showed one sex to exhibit a greater degree of heterosis. However, the pattern of sex-specific heterosis reported in this study seems to be rare. Unlike the examples cited, we show cases (Fig. 1) in which the F1 of one sex exhibits clear heterosis, whereas the F1 of the other is intermediate between the phenotypic means of the parental populations.

The mechanisms underlying the cases of sex-specific heterosis that we observed are not yet apparent. Using a backcross between a different HR line (#3) and inbred C57BL/6J, Nehrenberg et al. (2010) reported several sex-specific QTL, including for aspects of wheel running. That study probably underestimates the magnitude of such effects, because the cross design used did not allow examination of markers on the sex chromosomes. Kelly et al. (2010b) included markers on the X chromosome in a QTL study that used a large advanced intercross line (G4) population originated from a reciprocal cross between HR line #8 (one of the two used here) and C57BL/6J, but did not detect any QTL on the X chromosome nor any sex-specific QTL. As noted in the Introduction, Leamy et al. (2008) detected a large amount of epistasis using a full genome scan of SNP markers in an F2 population of mice derived from a cross of two inbred strains, and some of the epistatic interactions involved markers on the X chromosome. To date, no study of mouse wheel-running QTL has included markers on the Y chromosome. Molecular imprinting is widespread in the mouse genome (Searle and Beechey 1978; Cattanach and Kirk 1985; Cattanach 1986), and sex-specific molecular imprinting (Hager et al. 2008) could potentially account for the differential heterosis we see between the sexes in the F1 hybrids.

Experimental evidence has shown that both dominance and overdominance play a role in heterosis, with some involvement of epistasis, although the relative contribution of each of these mechanisms is still unclear (Birchler et al. 2006; Lippman and Zamir 2007) and is likely to vary among organisms, strains, and traits. Epistatic contributions can be substantial. For example, in an F2 population of mice derived from a cross of two strains exhibiting large differences in wheel running (C57L/J, high active; C3H/HeJ, low active), a full-genome epistasis scan for all possible interactions of QTL between each pair of 20 chromosomes indicated that epistatic interactions contributed an average of 26% of the total genetic variation for the three measures of daily wheel running (total distance, duration, and average speed) (Leamy et al. 2008). As with most other studies of heterosis in rodent behavior (e.g., Bruell 1964a, b; Lynch et al. 1986; Bult and Lynch 1996, 2000), the present study provides no evidence as to which mechanism(s) account(s) for the observed instances of heterosis. Nonetheless, our results do indicate that crossing replicate selected lines can yield offspring that exceed what was an apparent selection limit, as in Bult and Lynch (1996). The observation that heterosis for wheel running occurred only in male hybrids raises the interesting possibility that female mice might be closer to a true selection limit than males. If so, further selection on a population descended from the hybrids (Bult and Lynch 2000) might be able to break the limit for males but not for females.
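To make the dominance explanation concrete: under a single-locus additive-dominance model without epistasis (Falconer and Mackay 1996), with genotypic values +a, d, and -a for AA, Aa, and aa, the F1 from two populations whose frequencies of allele A differ by y exceeds the mid-parent value by d y squared; summed over loci, heterosis is the sum of d y squared. The sketch below computes this directly from genotype frequencies, so the identity can be checked numerically. The allele frequencies and genotypic values used are hypothetical, and the model ignores epistasis and overdominance beyond what d can represent.

```python
def pop_mean(p, a, d):
    """Mean of a Hardy-Weinberg population at one locus
    (genotypic values: AA = +a, Aa = d, aa = -a)."""
    q = 1 - p
    return p * p * a + 2 * p * q * d - q * q * a


def f1_mean(p1, p2, a, d):
    """Mean of the F1 from crossing populations with A-frequencies p1 and p2."""
    q1, q2 = 1 - p1, 1 - p2
    return p1 * p2 * a + (p1 * q2 + q1 * p2) * d - q1 * q2 * a


def f1_heterosis(p1, p2, a, d):
    """F1 mean minus mid-parent mean; algebraically equal to d * (p1 - p2) ** 2."""
    midparent = (pop_mean(p1, a, d) + pop_mean(p2, a, d)) / 2
    return f1_mean(p1, p2, a, d) - midparent


def total_f1_heterosis(loci):
    """Sum single-locus heterosis over (p1, p2, a, d) tuples; assumes no epistasis."""
    return sum(f1_heterosis(*locus) for locus in loci)
```

Heterosis under this model requires both directional dominance (d not zero) and an allele-frequency difference between the crossed lines, which is why lines that diverged by drift while responding to the same selection can show hybrid vigor.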