Introduction

The gigantic, “bizarre” antlers of the extinct Irish elk, Megaloceros giganteus, have played a role in evolutionary theorizing from Darwin until today (Andersson, 1994; Geist, 1998; Gould, 1974, 1977; Lister, 1994; Moen et al., 1999; Somjee, 2021). The Irish elk even figured in pre-Darwinian debates about extinction and became a standard example of evolution by orthogenesis. The orthogenetic trend toward increasing size and complexity of the antler was suspected as an underlying cause of extinction (Worman & Kimbrell, 2008). Proponents of the modern synthesis such as Huxley (1932) and Simpson (1953) argued against orthogenesis, and explained the assumed maladaptation of the antlers as a result of an allometric constraint where selection on body size may have driven an overdevelopment of the antlers due to a positive allometry between antler size and body size. Gould (1973, 1974) pointed out that this theorizing was not based on quantitative measurement or testing of the involved hypotheses, and set out to obtain and analyze quantitative data on antler size. Gould concluded that the proportionally gigantic antlers of the Irish elk were about as large as predicted by the interspecific, “evolutionary”, allometry among cervine (true) deer, and also found that the within-species, “static”, allometry of a sample of 79 Irish elk showed a strong positive allometry with skull size. These findings support the hypothesis that body size and antler size are constrained to evolve in concert, such that a large-bodied deer necessarily would have proportionally even larger antlers. Gould argued further that this finding is agnostic to the causal basis of selection, which he felt was more likely to be on the antler as a sexual display than on body size per se, as suggested already by Darwin (see Gould, 1974, 1977).

Gould’s (1973, 1974) regression of antler size on body size across cervine deer may have been the first quantitative cross-species study of antler size (Fig. 1). Subsequent comparative work has focused on the role of sexual selection, explaining larger antlers as a result of stronger sexual selection in species with more competitive mating groups (e.g. Bartoszek et al., 2012; Ceacero, 2016; Clutton-Brock et al., 1980; Hansen, 2014; Holman & Bro-Jørgensen, 2016; Lemaître et al., 2014; Plard et al., 2011). Clutton-Brock et al. (1980) even suggested that the positive antler-body allometry could be explained as a side effect of stronger sexual selection in larger-bodied species.

Fig. 1

Gould’s (1974) allometry for the cervine deer: The two lines are based on reduced-major-axis and least-squares regression, respectively, for 18 species of cervine deer. The Irish elk, marked with an M, falls straight on the reduced-major-axis line and only slightly above the least-squares line. It has smaller antlers for its size than its nearest living relative, the fallow deer, marked D. For large deer, it is the relatively small antlers of the European and American moose, marked with A, that are unusual. The moose are not cervine and were not used for computing the regressions. The scales are in inches. Reprinted from Gould (1974) with permission from Oxford University Press

The studies of Gould (1973, 1974), and Clutton-Brock et al. (1980) on sexual selection, have become standard examples of allometry, developmental constraint and adaptation. In particular, the notion that the Irish elk falls on the evolutionary allometry for cervine deer is an iconic illustration for the structuralist viewpoint that traits need to be understood in the context of the whole organism and not as individual parts (Amundson, 2005; Gould, 2002; Gould & Lewontin, 1979). Even without claiming that allometry explains the size of the antlers, this result changes our perspective on what is to be considered unusual and in need of explanation. It is less the gigantic antlers of the Irish elk that deserve our attention than the puny antlers of the present-day moose.

Although the conceptual importance of these studies is well known, the comparative data and methods on which they are built have received scant discussion. In fact, both the data and the methods are poor and ill-suited for the questions at hand (Hansen, 2014). While Gould made a decent effort at collecting new and accurate data for the Irish elk, he used standard literature data for other deer species. Clutton-Brock et al. (1980) used essentially the same data. Despite being reported and used as quantitative measurements of antler size and body size, these data are a mixture of qualitative and semi-quantitative observations originating from records of trophy hunting. As an example, the “91.0 cm” mean shoulder height for the fallow deer reported by Clutton-Brock et al. (1980) and also used by Gould appears to derive from the undocumented statement “Height at shoulder about 3 feet” in Ward (1903, p. 64). Such inaccuracy in comparative data appears common, and there is little reason to expect comparative data for ungulates to be much better than the scandalously error-ridden data on body size that has fueled an industry of comparative research on primates (Sandel et al., 2016; Smith & Jungers, 1997). The deer data are further void of any indication of uncertainty or sample sizes. Later compilations of antler data, such as by Plard et al. (2011), may be of better quality, but are still lacking measures of uncertainty and documentation of origin.

A more subtle problem is that antlers have complex and varied shapes. It is not obvious what measurements are comparable across species, or indeed what underlying property of antlers the measurements are supposed to represent. Most comparative studies are based on main-beam length along the outer curve from the margin of the burr to the tip as a measure of antler size, but without any formal argument for why this is a comparable and reasonable representation of “size” across species. Indeed, if size is supposed to reflect investment by the animal, something like the weight of the antler would be a more reasonable measure, and there has been little attempt at showing how beam length relates to weight or volume of antlers of different shapes (but see Tsuboi et al., 2020). For studies of sexual selection, an assessment of display size or weapon quality may also be relevant, but one would need to study how well a chosen measurement could represent these factors.

Although standard for the times, the statistical analyses of allometry used by Gould (1973, 1974) and Clutton-Brock et al. (1980) are inadequate by current standards by not accounting for phylogeny or reliability of species statistics. Furthermore, rather than a mean or median, Gould (1973, 1974) based his comparative analysis on the maximum reported antler size, a statistic with poor precision and biased by sample size and reporting. An even more serious problem is that Gould (1973) drew his key conclusion that the Irish elk did not deviate from the evolutionary allometry from a reduced-major-axis regression. Despite common use in studies of allometry, the reduced-major-axis slope is not a reasonable estimator of allometric exponents and can easily return estimates that are grossly in error if used for this purpose (Hansen & Bartoszek, 2012; Kelly & Price, 2004; Kilmer & Rodriguez, 2017; Pélabon et al., 2014; Seim & Sæther, 1983; Voje et al., 2014).

Here we revisit Gould’s iconic study with a new, consistently collected morphometric data set on antler and skull size for cervids, including the Irish elk. We compute accurate measures of antler volume as a representation of antler size, we use state-of-the-art phylogenetic comparative methods, and we account for estimation error in species means. We further present a new molecular phylogeny for Cervidae on which we base the comparative analysis.

Methods

Data and Measurements

Morphometric measurements of antlers and skulls were obtained for 568 museum specimens from 57 species or subspecies of cervids (Table 1). The specimens were from the American Museum of Natural History in New York (AMNH), the National Museum of Natural History in Washington (NMNH), the Natural History Museum in London (NHMUK), the Natural History Museum in Vienna (NHMW), the Swedish Museum of Natural History in Stockholm (NHRM), the Norwegian University of Science and Technology–Natural History Museum (NTNU NHM), the National Museum of Ireland–Natural History (NMI), the Museum of Geology and Paleontology at University of Florence (IGF), the Osaka Museum of Natural History (OMNH), and the teaching collection of the Department of Biology, University of Oslo (UiO). All measured specimens were prime-aged individuals as judged by tooth wear (see supplement for details).

Table 1 Taxon means (± standard error) used for comparative analyses

The main morphometric data were obtained by 3D photogrammetry as described in Tsuboi et al. (2020). Briefly, a set of 40–50 photos was taken of each skull with attached antlers using a 20.2-megapixel Canon G7X camera and then processed with RECAP PHOTO® (Autodesk Inc.) to produce a 3D photogrammetry object of the specimen. Three paper scales of 10 mm × 30 mm were placed on the specimen to determine scale. The photogrammetry object was “cleaned” manually to remove artifacts from the background. Rendered images of select specimens, including the extinct Megaloceros and Eucladoceros, are shown in Fig. 2. The original images are available from the authors on request, and the measurements and details of individual specimens are available from Dryad (https://doi.org/10.5061/dryad.kh18932dt).

Fig. 2

Larger antlers in extinct deer? Rendered photogrammetry images of the skull and antlers of (a) a complete specimen of the Irish elk, (b) the Florence specimen of Eucladoceros, and (c) our largest specimen of Alaskan moose. The 22.6 l estimated antler volume of the specimen in (a) is below the average for our sample of Irish elk. The 11 l antler volume of the Eucladoceros specimen represents the largest positive deviation of any species from the evolutionary allometry, and the 21.6 l volume of this specimen of the Alaskan moose is by far the largest set of antlers in our sample excluding the Irish elk. The corresponding 3D images are presented in Supplementary movies

The volume measurements obtained from the photogrammetry object had an average relative error of 8.5% per antler that did not vary with skull size, and appeared unbiased as estimators of weight, as assessed by estimates from 71 loose antlers with known weight and 20 clay models with known weight and volume (Tsuboi et al., 2020). The imprecision was largely due to variation in the images of the scale bars and could be substantially reduced by repeated measures. For this reason, we base all measurements on two replicates of the image processing, which reduced the relative error to 4.3%. These levels of measurement error are microscopic on the level of a comparative analysis across the family with repeatabilities in excess of 99.9% (Tsuboi et al., 2020) but should be taken into account in within-species analyses. For 11 out of the 25 Megaloceros specimens only a single antler could be measured, the other being broken. For these, and also for a few specimens from other species, we estimated the antler volume as twice the volume of the single complete antler. We accounted for the decreased reliability of these specimens in our modeling of observation variance. This does not correct for bias due to potential directional asymmetry, which might have an effect in the case of the Irish elk, as 9 right and only 2 left antlers were missing. There was no indication of substantial directional asymmetry, however, because the mean left and right antler sizes of the 14 complete specimens were 12.3 and 12.1 l, respectively.

In addition to the photogrammetry, we took manual measurements of skull length, main beam length and pedicle circumference from each skull. The main beam length was measured by two observers (BTK and MT) as the length of the main antler beam along the outer curve from the margin of the burr to the tip, for both left and right antlers to the nearest millimeter using a measuring tape. The posterior skull length was measured with a ruler as the distance between the posterior margin of the occipital condyle and the posterior edge of the third upper molar. We were unable to obtain this measure for Dama mesopotamica, Rusa alfredi and Muntiacus putaoensis, because all specimens lacked occipital condyles. In these cases, we estimated their posterior skull lengths using a cross-species regression based on body mass data obtained from the literature. Further details can be found in the supplementary methods (Fig. S1). The pictures and measurements were taken by four observers (MT, BTK, MG, CS), but as Tsuboi et al. (2020) found no observer effect, these were pooled. Image processing was done by one observer (MT). Repeated measures were taken from all specimens for antler volume, and from 115 specimens for posterior skull length and main-beam length. Measurement variances are reported in Tables S1–S3.

To replicate aspects of Gould’s analysis we also used measures of beam length and shoulder height taken from Ward (1903) and reported in Table S4.

Phylogeny

Sequence data for phylogenetic analysis consisted of 12,097 base pairs (bp) representing 12 mitochondrial loci: cyt b (1140 bp), COI (1545 bp), COII (684 bp), COIII (783 bp), ND1 (954 bp), ND2 (1044 bp), ND3 (345 bp), ND4 (714 bp), ND5 (1,821 bp), ATP6 (681 bp), ATP8 (117 bp), d-loop (605 bp); and 3 nuclear loci: αLAlb (460 bp), PRKCI (514 bp), sry (690 bp), for 43 species of extant deer with 2 subspecies of Alces alces, 15 subspecies of Cervus elaphus, 8 subspecies of Cervus nippon, 7 subspecies of Rangifer tarandus, and the extinct Megaloceros giganteus. The sequences for all mitochondrial and one of the nuclear (sry) loci were obtained from Genbank (Dryad: https://doi.org/10.5061/dryad.kh18932dt). These were aligned using the software MAFFT v.7.213 (Katoh & Standley, 2013) for each locus. Unreliably aligned sites, as identified with BMGE v.1.0 (Criscuolo & Gribaldo, 2010), were excluded from the alignments. Codon positions were identified using AliView v.1.18 (Larsson, 2014) for all protein-coding genes, and codons with insertions and gaps were manually removed. For the remaining two nuclear loci (intron 2 of α-lactalbumin; αLAlb and intron 1 of protein kinase C iota; PRKCI), we obtained sequence alignments from Gilbert et al. (2006).

Bayesian molecular phylogenetic analyses were performed on the concatenated alignment using BEAST 2 v.2.4.5 (Drummond et al., 2012). To account for potentially different evolutionary rates, the sequence data were split into five separate partitions for (i) the mitochondrial d-loop, (ii) the combined first and second codon positions of all protein-coding mitochondrial genes, (iii) the third codon position of mitochondrial genes, (iv) the concatenated intron sequences of the αLAlb and PRKCI genes and (v) the sry gene. For each partition, we inferred and averaged over substitution models with the bModelTest package (Bouckaert & Drummond, 2017). This approach accounts for uncertainty in the identification of the best substitution model in a Bayesian framework.

As detailed in the supplement, we used seven fossil taxa to constrain the ages of the following clades: tribe Alceini, tribe Capreolini, genus Capreolus, genus Cervus, genus Dama, tribe Muntiacini and the New World deer (tribes Rangiferini and Odocoileini). We used uniform prior distributions for the divergence times of these clades, with the most recent age of each fossil specimen as the lower boundary and 100 mya as an upper boundary. The upper boundary was inconsequential because we constrained the root age of the family Cervidae with a normal prior with a 14.2 mya mean and a 0.89 myr standard deviation based on a recent time-calibrated molecular phylogeny of Cetartiodactyla (Toljagić et al., 2018).
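The claim that the 100 mya upper bound is inconsequential can be checked with a little arithmetic on the root prior. The following is a minimal sketch using only the prior parameters stated above (mean 14.2 mya, standard deviation 0.89 myr); the 20 mya threshold is an arbitrary illustrative value, not one taken from the analysis.

```python
from statistics import NormalDist

# Root-age prior as stated in the text: normal, mean 14.2 mya, sd 0.89 myr.
root_prior = NormalDist(mu=14.2, sigma=0.89)

# Central 95% interval of the prior on the root age (in mya).
prior_lower = root_prior.inv_cdf(0.025)
prior_upper = root_prior.inv_cdf(0.975)

# Prior probability that the root is older than 20 mya (illustrative threshold).
# Every constrained clade is nested inside the root, so its age cannot exceed
# the root age, and a 100 mya upper bound on a clade prior is never reached.
p_root_older_than_20 = 1.0 - root_prior.cdf(20.0)

print(f"95% prior interval for the root: {prior_lower:.2f}-{prior_upper:.2f} mya")
print(f"P(root > 20 mya) = {p_root_older_than_20:.1e}")
```

Under this prior the root is essentially never older than about 16 mya, so only the lower (fossil) bounds of the uniform clade priors can influence the divergence-time estimates.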

To facilitate comparisons with previously reported phylogenies including the family Cervidae, and in particular with respect to the position of Megaloceros giganteus, we performed three additional analyses: (i) with the GTR + Gamma model instead of the bModelTest approach, (ii) without nuclear data, and (iii) without fossil constraints. For each of these settings, we ran three replicate analyses with random starting trees, each for 300 million Markov-chain Monte Carlo generations. Chain convergence was assessed based on effective sample sizes greater than 200 for all parameters and by comparing parameter traces within and among run replicates using Tracer 1.6.0 (Rambaut et al., 2018). After discarding the first 20% of posterior tree estimates from each replicate, we combined the posterior distributions of replicates into a set of 10,000 posterior phylogenies for each of the four sets of posterior distributions. Maximum clade credibility phylogenies were generated from these sets using TreeAnnotator v.2.4 (Bouckaert et al., 2014), with node heights according to the mean posterior clade age estimates.

Comparative Analysis

The comparative analysis follows Grabowski et al. (2016) and is implemented in the software Slouch (Hansen et al., 2008; Kopperud et al., 2020; http://github.com/kopperud/slouch). Slouch is based on an Ornstein–Uhlenbeck model of delayed adaptation, but here we used a “direct-effect” model as described in Grabowski et al. (2016). This differs from the standard Ornstein–Uhlenbeck model of adaptive evolution in that the response variable (antler size) changes immediately in response to changes in the predictor variable (body size), as would be expected from an allometric constraint. The model still incorporates slow residual changes around the current state according to an Ornstein–Uhlenbeck process, which makes correlations between species decay exponentially with phylogenetic distance. To quantify phylogenetic signal we use the phylogenetic half-life, \(t_{1/2}\), which measures the time it takes before half the correlation with the ancestral state is lost (Hansen, 1997). When tested in isolation (i.e., from a model with only an intercept), log antler volume, log beam length and log posterior skull length had best estimates of \(t_{1/2} = \infty\), indicating a Brownian-motion-like phylogenetic signal (2-unit support intervals extended down to 39%, 46% and 43% of tree height for the three traits, respectively).
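To make the half-life concrete, the sketch below converts a phylogenetic half-life into the rate parameter of an Ornstein–Uhlenbeck process and computes the implied residual correlation between two taxa. It is an illustration only, not the Slouch implementation: it assumes a stationary process, two contemporaneous tips, and hypothetical function names.

```python
import math


def alpha_from_half_life(t_half):
    """Rate of decay implied by a half-life: alpha = ln(2) / t_half."""
    if not (0.0 < t_half < math.inf):
        raise ValueError("this sketch needs a finite, positive half-life")
    return math.log(2.0) / t_half


def residual_correlation(path_length, t_half):
    """Correlation between two tips under a stationary OU process.

    path_length is the total branch length separating the two tips,
    i.e. twice the time since their split for contemporaneous taxa.
    """
    return math.exp(-alpha_from_half_life(t_half) * path_length)


# Two extant taxa that split 5 myr ago, with a half-life of 3.5 myr
# (the Cervidae best estimate); the 5 myr split is a made-up example.
r_example = residual_correlation(path_length=2 * 5.0, t_half=3.5)
# By construction, a path length of one half-life halves the correlation.
r_one_half_life = residual_correlation(path_length=3.5, t_half=3.5)
print(round(r_example, 3), round(r_one_half_life, 3))
```

Short half-lives therefore leave little phylogenetic correlation among all but the most closely related taxa, while the infinite half-life limit corresponds to Brownian motion, where the stationary formula above no longer applies.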

The comparative analyses were based on taxon means as given in Table 1. Some of the species means include specimens from distinct or unspecified subspecies as indicated in the supplementary specimen data file. We incorporated the estimation variance in the species means of both antler- and body-size measures into the analysis. These were computed from our species-specific samples, but as the samples were small for some species, we used the correction presented in Grabowski et al. (2016) to estimate the error variance in the estimated means for each species based on a weighted average of the sample variance of the focal species and the average sample variance of the other species. In this way, the error variances of species with large sample sizes are based mainly on their own sample variances, while the error variances of species with small sample size are based mostly on the average sample variances of other species. We corrected for attenuation of the generalized-least-squares regression as described in Hansen and Bartoszek (2012).
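The logic of the small-sample correction can be sketched as a weighted average of a species' own sample variance and the average sample variance of the other species. The exact weights used by Grabowski et al. (2016) are not given in this text, so the weighting `n / (n + k)` and the constant `k` below are assumptions chosen only to reproduce the limiting behavior described above; the function name is hypothetical.

```python
def mean_error_variance(own_var, n, other_var_mean, k=5.0):
    """Estimation variance of a species mean with variance borrowing.

    own_var        : sample variance of the focal species (None if n < 2)
    n              : sample size of the focal species
    other_var_mean : average sample variance of the other species
    k              : ASSUMED tuning constant, not taken from the paper
    """
    if n < 1:
        raise ValueError("need at least one specimen")
    if n < 2 or own_var is None:
        # A single specimen carries no variance information of its own.
        blended = other_var_mean
    else:
        w = n / (n + k)  # assumed weight; approaches 1 as n grows
        blended = w * own_var + (1.0 - w) * other_var_mean
    # Variance of the mean is the (blended) variance divided by sample size.
    return blended / n


# Hypothetical numbers on a log scale:
single = mean_error_variance(None, 1, other_var_mean=0.04)
large = mean_error_variance(0.10, 1000, other_var_mean=0.04)
print(single, large)
```

With one specimen the error variance equals the average variance of the other species, and with a very large sample it approaches the species' own sample variance divided by its sample size.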

Within-Species Analyses

Static allometries were estimated from standard log–log regression, but we corrected for attenuation due to measurement variation in the predictor variable by dividing the slope by \(1 - \sigma_{m}^{2}/\sigma_{t}^{2}\), where \(\sigma_{m}^{2}\) is the measurement variance and \(\sigma_{t}^{2}\) is the total within-species variance in the predictor (i.e., log posterior skull length). We estimated measurement variance from repeated measures as described in Tsuboi et al. (2020) and assumed it was constant within species. For the Irish elk we additionally weighted the regression by setting the measurement variance of single-antlered specimens to twice the measurement variance of total antler size minus 4 times the measurement covariance between the left and right antlers.
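Both corrections in this paragraph are simple enough to write out. The sketch below uses hypothetical function names and made-up numbers; it also shows why the single-antler rule works: if the left and right antlers have equal measurement variance s² and covariance c, the total has measurement variance 2s² + 2c, and twice that minus 4c gives 4s², the variance of a doubled single antler.

```python
def correct_attenuation(slope, meas_var, total_var):
    """Divide an observed slope by the reliability 1 - meas_var / total_var."""
    reliability = 1.0 - meas_var / total_var
    if not (0.0 < reliability <= 1.0):
        raise ValueError("measurement variance must be below total variance")
    return slope / reliability


def single_antler_meas_var(total_meas_var, left_right_meas_cov):
    """Measurement variance of 'twice one antler' as described in the text."""
    return 2.0 * total_meas_var - 4.0 * left_right_meas_cov


# Made-up example: 10% of the predictor variance is measurement error,
# so an observed slope of 2.7 corresponds to a corrected slope of 3.0.
corrected = correct_attenuation(2.7, meas_var=0.001, total_var=0.01)

# Made-up per-antler measurement variance and left-right covariance.
s2, c = 0.01, 0.002
total_meas_var = 2.0 * s2 + 2.0 * c  # variance of left + right
doubled = single_antler_meas_var(total_meas_var, c)
print(corrected, doubled)
```

The identity for the doubled antler relies on the assumption that left and right antlers are measured with the same precision.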

Even if our measures are from specimens assessed to be of prime age, there could be a component of ontogenetic variation in our sample. We studied this in a sample of 66 red deer of known age, which also included younger individuals. In this sample we found a linear relationship, R² = 68%, between age and the circumference of the pedicle from which the antler grows (Fig. S2). Assuming this extends to other species, we used pedicle circumference as a proxy for age in our analyses. We removed five specimens that were outliers in terms of both antler size and pedicle circumference (see supplement for details). This included the smallest specimen of the Irish elk, which with an antler volume of 12.9 l was 19% smaller than the second-smallest specimen. Hence, all subsequent analyses involving the Irish elk are based on 24 specimens.

Results

Phylogeny

The maximum clade credibility phylogeny based on the full molecular data set is shown in Fig. 3. The Irish elk is supported as a sister species to the fallow deer with an estimated split time of 4.7 mya. The most recent common ancestor of the Cervidae as a whole is estimated to have lived 14.0 mya. The fossil constraints had little effect on the phylogeny, as the node ages estimated from molecular data were always older than the oldest fossil record of the relevant clade. The use of the GTR + Gamma model in place of model averaging led to minor changes in topology within Odocoileini and among the wapiti subspecies of the genus Cervus. Exclusion of nuclear markers resulted in alternative positions for the Alceini and Capreolini. The alternative phylogenies are presented in Fig. S3.

Fig. 3

Molecular phylogeny of Cervidae including the Irish elk: The root of the phylogeny is 14.0 mya with a 95% credible interval of 12.3–15.8 mya, and the first split separates the true deer and the muntjacs from the other cervids. The Irish elk is the sister species of the fallow deer and the split is estimated at 4.7 mya with a 95% credible interval of 3.7–5.8 mya. Nodes with fossil constraints are indicated by numbers corresponding to the descriptions in the supplement. BEAST2 input and result files are available on Dryad: https://doi.org/10.5061/dryad.kh18932dt

Evolutionary Allometries

The morphometric volume estimates of antlers revealed that the Irish elk indeed had the absolutely largest antlers of any measured deer (Table 1). The average antler volume for our 24 mature specimens was 25.5 l, which is more than twice the average volume of any other species. The second and third largest average antler volumes were 12.4 l for the Alaskan moose (Alces alces gigas) and 11.0 l for the single specimen of the smaller-bodied Eucladoceros. Even the specimen of the Irish elk with the smallest antlers, a possibly subadult outlier at 12.9 l, had larger antlers than the average moose, and the moose with the largest antlers, at 21.6 l, was smaller than the average Irish elk. Apart from these two specimens, the antler-volume distribution of the Irish elk did not overlap with the distribution of any other species. As a curiosity, the one specimen of the Japanese giant elk, Sinomegaceros, had much smaller antlers at 8.2 l (extrapolated from a single right antler) despite being of similar body size as the Irish elk. We caution that the antler and skull of this specimen were fragmentary and partially reconstructed (Supplementary methods; Taruno et al., 2019).

The allometric analysis of antler volume relative to posterior skull length across Cervidae shown in Fig. 4 revealed an allometric exponent of 5.95 ± 0.29. This analysis included all taxa shown on the phylogeny in Fig. 3 except Muntiacus atherodes and Elaphodus cephalophus, which have rudimentary antlers that are anatomically distinct from those of other deer in that they are parts of the pedicle and not normally shed (Groves & Grubb, 1990). Dividing the exponent by three to account for the dimensionality of volume relative to length yields a value close to two, which is strong positive allometry. It predicts that doubling body size would quadruple antler size on a comparable scale.
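The arithmetic behind this statement can be written out directly. The sketch below uses only the exponent reported above and treats "body size" as posterior skull length, the predictor in the regression.

```python
# Evolutionary allometric exponent of antler volume on skull length (Cervidae).
exponent_volume = 5.95

# Volume scales as length cubed, so dividing by three puts antler size
# on the same linear scale as the predictor.
exponent_linear = exponent_volume / 3.0

# Predicted fold change when skull length doubles.
linear_factor = 2.0 ** exponent_linear   # antler size on a linear scale
volume_factor = 2.0 ** exponent_volume   # antler volume itself

print(f"linear-scale exponent: {exponent_linear:.2f}")
print(f"doubling skull length: x{linear_factor:.2f} linear, x{volume_factor:.1f} volume")
```

On the linear scale the predicted increase is just under fourfold, which corresponds to a roughly sixtyfold increase in antler volume.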

Fig. 4

Evolutionary allometry of antler volume on skull length: The dashed line is the allometry across 46 taxa from the family Cervidae and the solid line is across 18 taxa from the tribe Cervini. Both regressions are based on as many taxa as possible from the phylogeny in Fig. 3, but exclude Muntiacus atherodes, Elaphodus cephalophus and female Rangifer. Taxa not included in the regressions are marked with open symbols, and extinct taxa are marked with daggers. The regressions are based on phylogenetic generalized least squares including measurement variance and corrected for attenuation due to measurement variance in the predictor. The evolutionary model was an Ornstein–Uhlenbeck process with a direct-effect predictor assumed to follow Brownian motion. The Cervidae model had an allometric exponent of 5.95 ± 0.29, and an intercept (at mean log skull length) of −15.17 ± 0.05 log(l). The phylogenetic half-life was \(t_{1/2}\) = 3.5 myr (24% of tree height, support interval: 0%–266%), the stationary variance was v = 0.30 log(l)², and R² = 90%. The diffusion variance for the predictor was 1.73 log(cm)²/t, where t is tree height. The Cervini model had an allometric exponent of 4.86 ± 0.35, and an intercept of −12.41 ± 0.09 log(l). The phylogenetic half-life was \(t_{1/2}\) = 0 myr (support interval: 0–∞), the stationary variance was v = 0.092 log(l)², and R² = 92%. The diffusion variance for the predictor was 0.46 log(cm)²/t

This evolutionary allometry predicts that an average-sized Irish elk should have an antler volume of 25.3 l. The observed average of the Irish elk is thus only 1% above the prediction, and this analysis supports Gould's hypothesis that the “gigantic” antlers of the Irish elk are as expected for a deer of its size. The Eucladoceros specimen on the other hand, had an antler volume that was more than twice the allometric prediction of 5.2 l for its size.

Gould included only the tribe Cervini in his analysis, however, and restricting our analysis to the Cervini, as marked in Fig. 3, revealed a shallower allometry with an exponent of 4.94 ± 0.37 (Fig. 4). With this allometry, the projected antler volume for the Irish elk becomes 17.5 l, which puts the average observed antler volume almost 50% above the prediction, and more than 100% above if the Irish elk were omitted from the allometric regression. This is inconsistent with Gould’s hypothesis, and also with his results.
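The deviations quoted for the two evolutionary allometries follow from the observed and predicted volumes reported in the text. A minimal check, with a hypothetical helper name:

```python
def percent_above(observed, predicted):
    """Percent by which an observed value exceeds its allometric prediction."""
    return 100.0 * (observed / predicted - 1.0)


observed_irish_elk = 25.5  # mean antler volume in litres, 24 specimens

# Prediction from the Cervidae allometry (25.3 l): "only 1% above".
dev_cervidae = percent_above(observed_irish_elk, 25.3)

# Prediction from the Cervini allometry (17.5 l): "almost 50% above".
dev_cervini = percent_above(observed_irish_elk, 17.5)

# Eucladoceros: observed 11.0 l against a Cervidae prediction of 5.2 l.
ratio_eucladoceros = 11.0 / 5.2

print(f"{dev_cervidae:.1f}%  {dev_cervini:.1f}%  x{ratio_eucladoceros:.2f}")
```

The same mean antler volume is thus unremarkable against the family-wide allometry but a clear positive outlier against the shallower Cervini allometry.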

Comparison with Gould

As illustrated in Fig. 1, Gould found that the Irish elk had antlers with main beam length about 13% above the prediction from his least-squares regression and essentially as predicted by his reduced-major-axis regression. We now present some analyses to identify the sources of the discrepancy between Gould's and our results.

The inclusion of phylogeny did not explain the difference, as the best estimate for the phylogenetic half-life was zero, implying no effect of phylogeny on the Cervini allometry. Due to the low number of species, however, this estimate is highly uncertain, with the likelihood surface essentially flat up to moderately long half-lives. This uncertainty is illustrated by the fact that including the outliers shifted the best estimate of phylogenetic half-life to 26% of the age of the phylogeny (Table S7). For the whole of Cervidae the phylogeny did have a moderate effect, with a best estimate of half-life at 25% of the age of the phylogeny. In this case, the ordinary-least-squares slope was even steeper at 6.38 ± 0.26 (R² = 93%, intercept: −16.23 ± 0.61), leaving the Irish elk with smaller antlers than projected from its size.

The differences between the generalized- and ordinary-least-squares slopes in Fig. 5 were thus due to the inclusion of observation error variance in the former. This can explain some, but not all, of the differences from Gould. The ordinary-least-squares slope for our Cervini data was steeper than our generalized-least-squares slope (Fig. 5a), but still left the Irish elk 32% above the prediction. The attenuation effect due to error variation in the predictor variable was negligible, however, as expected from the large size range among species.

Fig. 5

Comparison of methods and measurements: In (a) we compare our generalized-least-squares (GLS; dotted line) regression for the Cervini data with ordinary least squares (OLS; solid line). In (b) and (c) we do the same analyses but replace either antler volume with main beam length or skull length with shoulder height. Note how the two have opposite effects on the deviation of the Irish elk (bold outline). In (d) we regress main beam length on shoulder height

The different measures of antler and body size had large but opposing effects. Inspection of the panels in Fig. 5 shows that using antler volume in place of beam length tends to make the Irish elk more extreme (compare Fig. 5a with c and b with d), while the use of skull length in place of shoulder height works in the opposite direction (compare Fig. 5a with b and c with d). The net result of regressing our beam-length data on shoulder height is a 1.55 ordinary-least-squares exponent and a 1.47 generalized-least-squares exponent. The former puts the beam length of an Irish elk with 183 cm shoulder height 16% above the regression, and the latter puts it 24% above (Fig. 5d). This is higher than found by Gould, but less than in our analysis with antler volume and skull length, as can be seen by comparing Fig. 5a with d.

Using the mean rather than the maximum beam length did not make a large difference (Fig. 6). Replacing the maximum with the mean with Gould’s data in fact shifts the Irish elk from 13 to 6% above the projected value (Fig. 6a).

Fig. 6

Comparison of allometries based on maximum versus mean beam length: Solid lines are ordinary-least-squares regressions of log maximum beam length on log shoulder height, and dashed lines are ordinary-least-squares regressions of log mean beam length on log shoulder height. Filled circles are maximum and open circles are mean beam length. Shaded circles are cases with a single specimen. (a) Gould’s (1974) data. (b) Our Cervini data. Removing the sambar from our analysis yields regressions with slopes 1.73 for both max and mean antler height

Gould also based his analysis on 18 taxa, but lacked some of our taxa, and included subspecies of Cervus elaphus and C. nippon that we do not have. A major difference between the respective ordinary-least-squares analyses of beam length on shoulder height stems from our inclusion of the sambar, Rusa unicolor, which is a substantial outlier in Figs. 5b, d and 6b that pulls down the regression. Without the sambar our ordinary-least-squares slope for the Cervini becomes 1.73, which is even steeper than Gould's slope of 1.60. This outlier may be caused by the published shoulder-height data for the sambar deriving from the larger-bodied Indian subspecies while our antler measures include smaller Bornean and Sumatran subspecies. We also note that Gould had the mean beam length of the Irish elk at 188 cm while we have it at 161 cm, which may be a result of Gould including private display specimens while we used only museum material. Our value of 161 cm falls well below Gould’s regression line.

Gould’s reduced-major-axis slope was steeper than his least-squares slope. This is also true with our data for the Cervini, for which the reduced-major-axis slope of antler volume against posterior skull length was 5.36, which is substantially steeper than the least-squares slope, but still leaves the Irish elk 17% above the prediction.
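The reduced-major-axis slope is steeper than the least-squares slope by construction: it equals the ratio of the standard deviations of the two variables, which is the least-squares slope divided by the absolute value of the correlation. The sketch below illustrates this standard identity on made-up data; the function name and the data are hypothetical and unrelated to the deer measurements.

```python
import math


def ols_and_rma_slopes(x, y):
    """Return (OLS slope, RMA slope, correlation) for paired data."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    ols = sxy / sxx                       # least-squares slope of y on x
    r = sxy / math.sqrt(sxx * syy)        # Pearson correlation
    rma = math.copysign(math.sqrt(syy / sxx), r)  # sd(y) / sd(x), signed
    return ols, rma, r


# Made-up log-log data with scatter around the line.
log_x = [0.0, 1.0, 2.0, 3.0]
log_y = [0.0, 2.0, 1.0, 3.0]
ols_slope, rma_slope, corr = ols_and_rma_slopes(log_x, log_y)
print(ols_slope, rma_slope, corr)
```

Because the inflation factor 1/|r| depends only on the scatter around the line, the reduced-major-axis slope moves with the amount of residual variation, whatever its source, rather than with the allometric exponent itself.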

Most large-bodied deer have palmated or bifurcated antlers with relatively high volume-to-length ratios, and this may explain why volume has a steeper evolutionary allometry than linear size (Fig. 7). This also tends to put deer with palmated antlers, such as the Irish elk, the fallow deer and the moose, above the evolutionary allometry.

Fig. 7

Effects of antler shape on relation between beam length and volume: Antlers are classified into bifurcated, main beamed or palmated with separate ordinary-least-squares regressions over our Cervidae data for the three types. Bifurcated: Log(volume) = − 10.04 + 2.56 log(beam length), R2 = 96%. Main-beamed: Log(volume) = − 8.42 + 2.05 log(beam length), R2 = 97%. Palmated: Log(volume) = − 10.88 + 2.85 log(beam length), R2 = 90%
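To make the exponents in the Fig. 7 caption concrete, the following minimal sketch computes how much antler volume is predicted to multiply when beam length doubles under each of the three regressions. Only the slopes reported in the caption are used, so the result does not depend on the base of the logarithm or on the intercepts. The function name `volume_factor` is hypothetical and introduced here only for illustration.

```python
# Slopes of log(volume) on log(beam length), as reported in the Fig. 7 caption.
SLOPES = {"bifurcated": 2.56, "main-beamed": 2.05, "palmated": 2.85}


def volume_factor(slope: float, length_ratio: float = 2.0) -> float:
    """Predicted multiplicative change in volume for a given change in length.

    On a log-log regression, volume scales as length ** slope, so the
    intercept and the base of the logarithm cancel out of the ratio.
    """
    return length_ratio ** slope


for shape, slope in SLOPES.items():
    factor = volume_factor(slope)
    print(f"{shape}: doubling beam length multiplies volume by {factor:.1f}")
```

For comparison, geometric similarity (a slope of 3) would give a factor of 8, so all three antler types gain volume more slowly than isometry within their type, with palmated antlers coming closest.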

Static and Ontogenetic Allometries

Static allometries between antler volume and skull length computed for the 21 species with a sample size of at least ten individuals were rather erratic (Fig. 8a). The static exponent averaged 3.00 ± 0.35 (weighted average accounting for attenuation). This point estimate corresponds to isometry, as volume scales with the cube of length, and is much smaller than the evolutionary allometric exponent. For the Irish elk, the static allometric exponent was 0.82 ± 1.96 (Fig. 8b), which is hypoallometric, and dramatically different from Gould’s (1974) finding of strong positive static allometry for various linear measures of antler size on skull length with exponents ranging from 2 to above 3. Also, in contrast to Gould, almost none of the within-species variation in antler volume was explained by the allometry. A poor fit to the allometric equation seems general across deer species with considerably less than 50% of log antler volume variance explained by static allometry in most cases (Table S6). For our data on beam length versus skull length in the Irish elk, we found an exponent of 0.37 explaining 1% of the variance, which is similarly inconsistent with Gould’s analysis.

Fig. 8

Static allometries: a Static allometries for all cervid species with more than ten specimens plotted together with the evolutionary allometry (dashed line); data in Table S6. b Blow-up of the data for the Irish elk with pedicle circumference (an age indicator) indicated in color. Note that the outlier marked with a circle is also a specimen with small pedicle circumference. This specimen was not included in calculating the regression. Static allometries are based on ordinary least squares except for the Irish elk for which weighted regression was used to account for specimens with only a single measurable antler. Slopes are corrected for attenuation due to measurement error in the predictor

Within the red deer there appears to be a breakpoint between 6 and 7 years of age, after which the size of the antler no longer increases with age (Fig. 9a; see also Huxley, 1931; Kruuk et al., 2002; Mattioli et al., 2021; Nussey et al., 2009). Using the sample of 33 young (≤ 6 years) red deer, we found an ontogenetic allometric exponent of 4.91 between antler volume and posterior skull length (R2 = 50%), which is much steeper than the static allometric exponent of 1.60 calculated from the 33 mature red deer (≥ 7 years), and similar to the evolutionary allometry (Fig. 9b).

Fig. 9

Ontogenetic and static allometry of Norwegian red deer Cervus elaphus atlanticus: a Antler volume against age. b Antler volume against posterior skull length. Open circles are animals above 6 years of age and filled circles are animals 6 years and younger. The black solid line is the evolutionary allometry across Cervini. The light solid line is the ontogenetic allometry across animals 6 years and younger (intercept: − 13.11 ± 2.36, slope: 4.91 ± 2.36, R2 = 50%), and the dashed line is the static allometry across animals above 6 years of age (intercept: − 3.25 ± 2.48, slope: 1.60 ± 1.01, R2 = 7.5%)

Discussion

Our evolutionary allometric analysis of deer antler volume both supports and contradicts Gould’s claim that the gigantic antlers of the Irish elk are predictable from allometry. We reject Gould’s claim that the Irish elk falls on the evolutionary allometry across the true deer (tribe Cervini), as its antler volume averages 50% or more above the prediction from this group, but we did find that its antler volume is almost perfectly predicted from the allometry estimated across the whole of Cervidae. There are no obvious biological reasons why one of these analyses is more correct than the other. The difference between them is largely due to a set of small-bodied non-cervine deer with very small antlers, which increases the regression slope across the whole of Cervidae. Arguably, this difference could reflect a nonlinearity in the allometry due to change of function in this small-bodied group, such as the cessation of a social signaling function from the antlers, but there is no firm evidence for this (Groves & Grubb, 1990; Lopez & Stankowich, 2023). Gould did not provide any argument for restricting the allometric analysis to cervines, and even plotted and discussed some non-cervine deer, such as the moose, in relation to the cervine allometry.

Methodological Considerations and Differences from Gould

Why does our analysis of antler allometry in Cervini differ from that of Gould? There are six candidate causes: (1) Accounting for phylogeny in the comparative analysis. (2) Accounting for observation error. (3) Using different measurements of antler and body size. (4) Using mean versus maximum antler size. (5) Inclusion of different species and specimens. (6) Using regression versus reduced major axis. All of these had an impact, but none alone was sufficient to explain the difference, which is a collective outcome of all the factors taken together.

The use of generalized least squares to account for phylogenetic correlations in the residuals and observation variance in the species means did have an impact in that the Irish elk had antlers 50% above the prediction from the generalized-least-squares regression as compared to 32% from the ordinary-least-squares regression, but the latter is still different from Gould's analysis. The entire difference between generalized and ordinary least squares across Cervini was due to observation variance, as the best estimate of phylogenetic signal in the residuals was zero.
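The contrast between ordinary and generalized least squares described above can be sketched as follows. This is an illustration under stated assumptions, not the estimation procedure used in this study: the data and variances are made up, the helper `gls_fit` is hypothetical, phylogenetic signal is set to zero so that the residual covariance matrix is diagonal, and observation error in the predictor is ignored.

```python
import numpy as np


def gls_fit(x, y, V):
    """Return (intercept, slope) of y = a + b*x given residual covariance V."""
    X = np.column_stack([np.ones_like(x), x])
    Vinv = np.linalg.inv(V)
    XtVinv = X.T @ Vinv
    # Standard GLS estimator: (X' V^-1 X)^-1 X' V^-1 y
    return np.linalg.solve(XtVinv @ X, XtVinv @ y)


# Hypothetical species means on the log scale (not the data of this study).
log_skull = np.array([2.0, 2.3, 2.6, 2.9, 3.2])
log_antler = np.array([1.0, 2.6, 3.9, 5.6, 6.8])

# Hypothetical sampling variances of the species means of log antler size,
# and an assumed biological residual variance shared by all species.
obs_var = np.array([0.40, 0.02, 0.02, 0.02, 0.30])
resid_var = 0.05

# With zero phylogenetic signal, V is diagonal: species whose means are
# poorly estimated simply receive less weight.
V = np.diag(resid_var + obs_var)

ols = gls_fit(log_skull, log_antler, np.eye(len(log_skull)))
gls = gls_fit(log_skull, log_antler, V)
print("OLS intercept, slope:", ols)
print("GLS intercept, slope:", gls)
```

Passing an identity matrix recovers ordinary least squares, which makes explicit that the two fits differ only through the weighting implied by the covariance matrix. A nonzero phylogenetic signal would add off-diagonal terms to `V`.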

The use of volume rather than beam length to measure antler size, and posterior skull length rather than shoulder height to measure body size both had effects on the results, but these pulled in different directions that partially cancelled out, leaving the Irish elk with beam lengths 16% above the prediction from shoulder height based on ordinary least squares across Cervini. Gould’s use of maximum rather than mean antler size had less effect on his results, and also made the Irish elk slightly more extreme, as it would have been only 6% above the prediction if Gould had used the mean beam length, as compared to 13% with the maximum beam length. This may be due to Gould's large sample for the Irish elk, n = 79, as the maximum is expected to increase with sample size.
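Statements of the form "X% above the prediction" compare an observed value with the value predicted by a regression fitted on the log scale. A minimal sketch of that conversion is given below, assuming natural logarithms and a simple back-transformation without bias correction. The function names and the numbers in the usage example are hypothetical and are not taken from the data analyzed here.

```python
import math


def percent_above(observed: float, predicted: float) -> float:
    """Percent by which an observed value exceeds a prediction (original scale)."""
    return (observed / predicted - 1.0) * 100.0


def percent_above_from_logs(log_observed: float, log_predicted: float) -> float:
    """Same quantity computed from a residual on the natural-log scale."""
    return (math.exp(log_observed - log_predicted) - 1.0) * 100.0


# Hypothetical example: a prediction of 100 units and an observation of 113.
print(percent_above(113.0, 100.0))
print(percent_above_from_logs(math.log(113.0), math.log(100.0)))
```

Both functions return the same value because a difference on the log scale is a ratio on the original scale.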

The remaining difference between our and Gould’s ordinary-least-squares allometry of beam length on shoulder height must be due to different data, and in particular to our inclusion of the sambar, which was a substantial outlier. Without the sambar, our ordinary-least-squares analysis of the Cervini in fact returned a larger allometric exponent than found by Gould.

Gould’s use of reduced-major-axis regression needs comment. In his 1974 Evolution paper he presented both reduced-major-axis and least-squares regression, and generally put more emphasis on the latter, which he described as more “conservative”. In his brief 1973 paper in Nature, however, he presented only the reduced-major-axis slope, and thus gave the impression of a perfect fit to his hypothesis. It cannot be stated too strongly that reduced major axis is without merit in the study of allometry. The reduced-major-axis slope is simply the ratio between the standard deviations of the response and the predictor variable, and except for determination of the sign of the slope, it does not involve the covariance between the variables at all. The method will consequently return a substantial slope for almost any pair of variables regardless of whether they are related or not. This slope will approach the least-squares slope when the correlation between the variables becomes high, but deviation from the least-squares slope is little more than meaningless error. The method is sometimes presented as accounting for error in the predictor variable, but this is a misunderstanding based on confusing measurement error with biological model deviations (see Hansen & Bartoszek, 2012 for details).
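The point about the reduced-major-axis slope can be illustrated with a small simulation. The sketch below uses simulated data only, and the helper names `ols_slope` and `rma_slope` are hypothetical. It computes both slopes for a response that is unrelated to the predictor and for one that is tightly related to it.

```python
import numpy as np


def ols_slope(x, y):
    """Least-squares slope: cov(x, y) / var(x)."""
    return np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)


def rma_slope(x, y):
    """Reduced-major-axis slope: sign(r) * sd(y) / sd(x)."""
    r = np.corrcoef(x, y)[0, 1]
    return np.sign(r) * np.std(y, ddof=1) / np.std(x, ddof=1)


rng = np.random.default_rng(0)
x = rng.normal(size=1000)

# Response independent of x, with twice the standard deviation of x.
y_unrelated = 2.0 * rng.normal(size=1000)
# Response tightly related to x with a true slope of 2.
y_related = 2.0 * x + 0.1 * rng.normal(size=1000)

for label, y in [("unrelated", y_unrelated), ("related", y_related)]:
    print(f"{label}: OLS = {ols_slope(x, y):.2f}, RMA = {rma_slope(x, y):.2f}")
```

In the unrelated case the least-squares slope is near zero while the reduced-major-axis slope has a magnitude near 2, the ratio of the standard deviations, with a sign set by sampling noise. In the related case the two slopes nearly coincide, since the least-squares slope equals the correlation times the ratio of standard deviations.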

Different measures are not just about accuracy, but also convey different biological meanings. For example, the steeper allometry of volume relative to beam length is due to the fact that palmated antlers, which are more massive relative to their span, are more common among larger deer. Lemaître et al. (2014) argued that the beam-length allometry becomes shallower for deer above 100–120 kg and suggested that this is due to increased costs of relatively larger antlers in large-bodied species. The same argument was made for bovid horns in Tidiere et al. (2017). Our results, extending those of Tsuboi et al. (2020), show that this effect is more likely due to an evolutionary change in shape of antlers with increasing size than to a reduced investment in mass (Fig. 7).

As detailed in Table S5, shoulder heights were obtained from a variety of sources and are of mixed and often unknown quality. It is possible that the shallower slopes on shoulder height are influenced by attenuation due to large observation errors. Our skull-length measures are certainly more accurate, but they are not necessarily better biological measures of the size of the animal. The R2 between the two measures (logged) across Cervini is 93%, which is a poor correlation across such a large range of body sizes. Clearly, choice of measure matters, and obtaining size measures that are both statistically and biologically accurate is no doubt an important step toward more refined analyses of allometry even on the among-species level (cmp. Houle et al., 2011; Smith & Jungers, 1997).
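Attenuation of a regression slope by observation error in the predictor can be illustrated with a short simulation. The sketch below uses made-up values throughout and assumes the error variance is known, which is rarely the case in practice. It applies the textbook reliability-ratio correction, which is not necessarily the correction used for the slopes reported in this study.

```python
import numpy as np


def ols_slope(x, y):
    """Least-squares slope: cov(x, y) / var(x)."""
    return np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)


rng = np.random.default_rng(1)
n = 5000
true_slope = 1.6   # hypothetical allometric exponent
sd_true = 0.5      # assumed sd of true log body size
sd_error = 0.25    # assumed sd of observation error in log body size

x_true = rng.normal(0.0, sd_true, size=n)
y = true_slope * x_true + rng.normal(0.0, 0.1, size=n)
x_obs = x_true + rng.normal(0.0, sd_error, size=n)  # predictor observed with error

# Reliability ratio: share of the observed predictor variance that is true variance.
reliability = sd_true**2 / (sd_true**2 + sd_error**2)

naive = ols_slope(x_obs, y)       # biased toward zero by roughly the reliability ratio
corrected = naive / reliability   # attenuation-corrected slope

print(f"reliability = {reliability:.2f}")
print(f"naive slope = {naive:.2f}, corrected slope = {corrected:.2f}")
```

With these assumed variances the reliability ratio is 0.8, so the uncorrected slope is expected to be about 20% too shallow. A noisier size measure would shrink the estimate further.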

Phylogeny and Evolution of Deer

Our phylogenetic analysis is not the first to include the Irish elk based on ancient DNA. Kuehn et al. (2005), Lister et al. (2005) and Hughes et al. (2006) pioneered this, and our analysis is consistent with the latter two in having the Irish elk as a sister species to the fallow deers (Dama). An association between the Irish elk and the fallow deers has also traditionally been assumed based on morphological arguments (see Geist, 1998; Gould, 1974 for discussion). On the other hand, the molecular analyses of Kuehn et al. (2005) and Agnarsson and May-Collado (2008) separated the Irish elk from the fallow deers. The former aligned it with Cervus and the latter placed it with Pere David’s deer (Elaphurus davidianus) in an early-diverging lineage within the Cervini. The latter association was also a close alternative in the analysis of Hughes et al. (2006), and as the support in Lister et al. (2005) was partially based on morphological data, Agnarsson and May-Collado (2008) concluded that the exact position of Megaloceros remains to be conclusively determined. Later, Immel et al. (2015) again found molecular support for a Megaloceros-Dama association, and our study provides a further step in that direction by providing solid support (posterior probability 99.97%) for Megaloceros as belonging with the fallow deers in a clade within the Cervini. In our analysis a clade combining Axis and Rucervus forms a sister group to all other Cervini.

Our results support earlier molecular phylogenies (e.g., Gilbert et al., 2006; Hassanin et al., 2012; Pitra et al., 2004; Toljagic et al., 2018) in placing all South American deer in one clade, the Odocoileini. We estimate the first split within this clade at a little less than 9 mya, and at least 8 lineages in our phylogeny were in existence 3 mya and thus before the formation of the Isthmus of Panama (see Stange et al., 2018), as also found by Duarte et al. (2008). Our results are therefore consistent with an invasion of South America millions of years before the landbridge appeared (cmp. Simpson, 1980; Stange et al., 2018). The alternative would be a massive radiation of forms in North America that then independently moved south during the great American biotic interchange and, with the exception of Odocoileus, went extinct in North America.

Allometric Constraints and Antler Evolution

The Irish elk is likely to have evolved its large size in less than the 5 myr since it split from the fallow deers. The European fallow deer also has palmated antlers that are considerably larger than the allometric prediction from both the Cervidae and the Cervini, and it is possible that the relative increase in antler size from a Dama-like ancestor has followed an evolutionary allometry. In fact, the evolutionary allometric exponent based on Dama dama and Megaloceros alone is 5.01, which is only slightly steeper than the cervine allometry. This is consistent with Gould’s hypothesis in the sense that the recent evolution of the large antlers may be a simple consequence of allometry. Speaking against this possibility is the general observation of considerable disparity in antler size and shape among megacerine deer (Geist, 1998; Lister, 1994; van der Geer, 2018), as also illustrated by our finding of relatively smaller antlers in Sinomegaceros, but we caution again that this is based on incomplete data, and should be treated as an anecdotal observation.

Given the megacerine variation and the considerable non-allometric variation in antler size across the deer family on display in Fig. 4, it seems unlikely that there are strong allometric constraints on antler evolution on time scales of millions of years. The fallow deer, for example, has antlers that are 35% larger than predicted from the Cervini allometry and 83% larger than predicted from the Cervidae allometry, while the European moose has antlers half or less than half of the predicted volumes from the two allometries.

Allometric constraints may still be important on shorter time scales, and it remains possible that the Irish elk got its gigantic antlers as a side effect of a rapid increase in body size from a large-antlered fallow-deer-like ancestor. Gould (1974) suggested that such constraints could result from within-species allometry, and presented data showing a strong positive static allometry for his Irish elk sample. We were unable to replicate this result, finding no evidence for a positive, or indeed any, static allometry across our 25 specimens of the Irish elk. We are at a loss to explain this discrepancy. Gould may have had a few juvenile specimens in his sample, but not enough to make a big difference. We further found no case of good fit to a static allometry in any of the 21 deer species for which we had sufficient data, and we must conclude that there is no strong relationship between adult antler and body size within species of deer. Our analysis of the red deer data, however, revealed a positive ontogenetic allometry in that antler size increases more rapidly than body size with age in young red deer, but after 6 years of age the relationship between antler size and body size disappears, leaving no meaningful static allometry among adult red deer. Kruuk et al. (2002) and Nussey et al. (2009) found a similar pattern, while Mattioli et al. (2021) found a weaker, but still substantial, relationship between body mass and antler mass among adult red deer. If such patterns generalize to other species, it is possible that the evolution of antler size could happen through heterochronic changes in which juvenile patterns of antler growth are extended or shortened. The age of maturity and growth pattern of the Irish elk are not known, and the hypothesis that the large size of the antlers is due to an extension of juvenile growth patterns remains speculative but is not rejected by our data. The general hypothesis could be tested with comparative growth data from extant deer. We also caution that antler density may vary across species (Tsuboi et al., 2020), and that antler-mass allometry may be different from antler-volume allometry.

In conclusion, we found no evidence for allometric constraints as an explanation for the large antlers of the Irish elk. The influence of a heterochronic shift along an ontogenetic allometry during a recent evolutionary burst remains possible, but the general variability of antler size and shape across deer makes it hard to conceive that the Irish elk would be constrained from decreasing the size of its antlers if they were seriously maladaptive. There is also no evidence that the antlers played a role in the extinction of the species (Lister & Stuart, 2019). We still believe evolutionary allometries are informative in providing a perspective on which species are unusual and in what way, and useful as benchmarks to control for body size in comparative studies of antler adaptation. Our study also illustrates the impact of methodology and data quality in comparative studies. Choice of regression techniques, phylogeny correction, trait measurements, taxon range and sampling all had an impact on the results, and are likely to influence the conclusions of any comparative study of antler evolution.