Tyrannosaurus Osborn (1905)
Tyrannosaurus rex Osborn (1905)
Dynamosaurus imperiosus Osborn (1905)
Gorgosaurus lancensis Gilmore (1946)
Gorgosaurus megagracilis Paul (1988)
Tyrannosaurus imperator Paul et al. (2022)
Tyrannosaurus regina Paul et al. (2022)
Tyrannosaurus incertae sedis Paul et al. (2022)
Tyrannosauridae incertae sedis Paul et al. (2022)
Type species: Tyrannosaurus rex Osborn (1905).
Tyrannosaurus rex can be distinguished from its sister species Ta. bataar on the basis of: giant body size in skeletally mature individuals where the femur length is 1.2 m or greater; orbitotemporal region of the skull is wide (snout width to temporal width measured across quadratojugals jugals is 53% or less); presence of a rostrodorsal process on the rostral ramus of the lacrimal and a corresponding caudolateral process of the nasal, producing an interlocking suture between the bones in lateral view; absence of a lateral swelling on the supraorbital process of the lacrimal; absence of a subcutaneous flange along the ventral margin of the antorbital fossa (= external antorbital fenestra); short ventral process of the prefrontal; absence of a deep ventral flange on the vomer; a minimum of 11 maxillary and 12 dentary teeth in skeletally mature individuals; long arm (humerus to femur ratio is 29%) and metatarsus relative to femur length (metatarsal III to femur ratio is 51%) (Currie, 2003; Fig. 2; Table 1; Brochu, 2003).
Another issue is that the authors could not identify to the species level several well-preserved and nearly complete skulls (AMNH FARB 5027, LACM 23844, MOR 008, UWBM 99000), which they list as incertae sedis. We concede that one of the incertae sedis specimens (AMNH FARB 5027) includes both dentaries, but it is contained in a glass case, making it inaccessible for measurement. In contrast, several incomplete specimens, including one without a skull (TCM 2001.90.1), were identified to species. The authors state that these unidentifiable specimens are of uncertain proportions or stratigraphic position.
Although any taxonomic revision can cause issues, the scientific priority, following the spirit of the ICZN, is to make the taxonomy stable and practicable. Following that principle, the multiple species hypothesis causes more uncertainty than clarity: Of the 53 specimens listed in the taxonomic section, only five are confidently assigned to T. rex, 11 to “T. imperator” and seven to “T. regina.” A further three are T. rex?, one “T. imperator”?, four T. rex or “T. imperator”, seven T. rex or “T. regina”?, five cannot be determined and seven are listed as incertae sedis. So 30 specimens are uncertain and only 23 are assigned to one of the three species. If the commercially-held BHI specimens are excluded, this worsens to only 13 confidently assigned and 25 not—so two thirds of the specimens are of uncertain identity. In effect, their taxonomic revision results in instability: the diagnoses take many specimens that, until now, were easily classified as T. rex and render them uncertain.
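The proportions quoted above can be checked with a few lines of arithmetic. This is a minimal sketch that uses only the totals stated in the text (53 specimens listed, 23 confidently assigned, and 13 versus 25 once the BHI specimens are excluded); the variable and function names are illustrative.

```python
# Totals as stated in the text; they are not recounted from the specimen list.
total_listed = 53
confident_all = 5 + 11 + 7  # T. rex, "T. imperator", "T. regina"
uncertain_all = total_listed - confident_all

# Stated counts once the commercially held BHI specimens are excluded.
confident_public = 13
uncertain_public = 25


def share(part, whole):
    """Return part/whole as a fraction between 0 and 1."""
    return part / whole


uncertain_share_all = share(uncertain_all, total_listed)
uncertain_share_public = share(uncertain_public,
                               confident_public + uncertain_public)

print(confident_all, uncertain_all)
print(round(uncertain_share_all, 3), round(uncertain_share_public, 3))
```

The second share is close to two thirds, in line with the statement in the text.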
Also, several juvenile and subadult specimens are regarded as incertae sedis; based on what is seen in Ta. bataar, at least one species-level autapomorphy is seen throughout growth in this sister taxon (e.g., subcutaneous flange of the maxilla; Tsuihiji et al., 2011). If species characters also appear early in growth in the T. rex hypodigm (and in fact, the autapomorphic skull shape of T. rex is emergent in juveniles—Carr, 2020), then the best preserved fossils of juveniles (BMRP 2002.4.1, CMNH 7541, LACM 238471) should be referable to any one of the three taxa. To test this hypothesis, we obtained the ratio of dentary teeth 3:2 in BMRP 2002.4.1 and compared that with the ratios in the three taxa. The ratio is 1.6, which supports referral to T. imperator if the presence of a slender second dentary tooth is a reliable diagnostic character that emerges early in growth (but see below). Alternatively, we take the point of Wiens et al. (2005) that diagnostic characters tend to emerge late in growth; i.e., if all young T. rex have slender second dentary teeth, it is improbable that this is a diagnostic character of T. imperator; rather, it is the juvenile state of the character that later changes (in this case, the basal tooth length increases) as the animal matures.
If a species diagnosis is to have any value, then every specimen that includes a complete skull, but not the femur, or vice versa, should be identifiable even if the other relevant information is missing. In this case, the diagnostic features are limited to one cranial character (dentary tooth 3:2 ratio) and one postcranial character (femur l:c ratio), which taken separately are not sufficient to distinguish the taxa. For example, isolated dentaries of T. rex and T. regina, and isolated femora of T. rex and T. imperator, would be confused for each other since their ‘diagnostic’ features are identical. It remains to be established if the diagnostic characters, as set out by the authors, are invariant throughout growth; if they are invariant, then their case for multiple species might be strengthened. If stratigraphic data are necessary to identify taxa, as is expressed by the authors, then the identification is circular if morphology is not taken into account (Álvaro & Esteve, 2020). For example, the presumed stratigraphic separation between the troodontids Latenivenatrix mcmasterae and Stenonychosaurus inequalis was the basis of referring specimens to one or the other taxon (van der Reest & Currie, 2017), until it was found that they are not stratigraphically (or morphologically) separate and represent the same taxon, S. inequalis (Cullen et al., 2021).
In terms of statistically detecting dimorphism, sample size is problematically low in Paul et al. (2022). Only 25 specimens were used in the regression analysis of femoral ratios, and only 6 or 8 specimens were used in the analyses of each “species.” Modeling has shown that sample sizes of 100 specimens might not detect sexual differences as great as 28% (Godfrey et al., 1993 in Mallon, 2017). Modeling also shows that small sample size (n < 50) affects the precision of quantitatively estimating sexual dimorphism (Kościński & Pietraszewski, 2004). Hone and Mallon (2017) suggest that a sample size of at least 35 specimens of each sex is required for sexual size differences to be statistically detectable, assuming a high (alligator-like) level of dimorphism.
The diagnoses of the three species are based on two morphological characters, femoral robusticity and the ratio of two dentary teeth (Paul et al., 2022). This is not to say that naming fossil taxa on a limited number of traits is inherently incorrect or flawed; when data are limited because remains are fragmentary, this is understandable in context. However, this low number of diagnostic characters stands in contrast to the numerous well-documented specimens of Tyrannosaurus available and the 1850 anatomical traits that have been scored for more than 40 specimens (Carr, 2020). Furthermore, the tooth count of individual cranial bones is known to be variable in multiple theropod groups, including tyrannosaurines (Carr et al., 2017), and so the presence or absence of a single dentary tooth, a trait that varies intra- and interspecifically, is not a strong basis for taxonomic identity (Carr, 2020).
The hypothesis of Paul et al. (2022) depends on the precise and accurate stratigraphic position of the specimens in their study. However, these data are based on Larson (2008b), personal communication, and a meeting abstract (Kaskes et al., 2016). The data in Larson (2008b) are imprecise (several are personal communications, or vague, e.g., from the “lower half” of the formation) and the personal communication is taken at face value; this makes these stratigraphic assessments difficult for other researchers to verify independently. Precise stratigraphic data are required to put fossils into a robust framework (e.g., Scannella et al., 2014). The stratigraphic framework itself is vague: Paul et al. (2022) block out the Hell Creek Formation into lower, middle, and upper thirds, an approach of insufficient resolution to provide a result comparable to the Triceratops speciation study (Scannella et al., 2014). This is not to say that taxa cannot be separated into broad stratigraphic bins that are taxonomically informative and laterally continuous (Mallon et al., 2012); rather, the stratigraphic data of specimens must be reliably accurate.
Additionally, Paul et al. (2022) did not demonstrate that their “lower, middle, and upper” subdivisions of the Hell Creek and its regional stratigraphic equivalents were comparable to the previous well-established stratigraphy of these units (Eberth & Kamo, 2019; Hartman et al., 2002; Wilson et al., 2014). The Hell Creek is a formation for which considerable research effort has already established a geochronology and zonation based on lithological, macrofloral, and palynological data (Fastovsky & Bercovici, 2016), paleomagnetism, and radioisotopic dating (Hicks et al., 2002). Ideally, any research program focusing on evolutionary changes within this unit should be conducted within this context if it is to be integrated into the overall picture of environmental transformations through the late Maastrichtian Western Interior.
As noted above, the authors were unable to refer several excellent specimens to any one of the three taxa. In some cases they were unable to narrow down identifications owing to “uncertain stratigraphic position” or, conversely, they ambiguously narrowed down identifications to two taxa based on “high stratigraphic placement” (Paul et al., 2022). However, the taxonomic identification of a specimen must be independent of stratigraphic position or the identification becomes circular (Álvaro & Esteve, 2020). Instead, unique and derived anatomical features should be used to identify specimens.
“Skeletal Robusticity is Consistent Within Specimens But is Not Correlated with Absolute Size and Presumed Maturation”
We applied unweighted pair-group arithmetic average (UPGMA) agglomerative hierarchical clustering to the data presented by Paul et al. (2022), implemented via the R package ‘cluster’ (Maechler et al., 2022). We recovered a dendrogram (Fig. 3a) that mostly matches their three putative species, but importantly also indicates that the optimal number of clusters in the dataset is one (Fig. 3c). In other words, there is indeed structure in the dataset (as in any biological dataset), but there is no statistical justification for assigning the specimens to different species corresponding to this structure.
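To make the clustering step concrete, the following is a minimal sketch of UPGMA (average-linkage) agglomeration on one-dimensional ratio data. It is a plain Python stand-in, not the R ‘cluster’ workflow used in the analysis. The input values are hypothetical, the function name `upgma` is illustrative, and the step that selects the optimal number of clusters is not reproduced.

```python
def upgma(values):
    """Average-linkage (UPGMA) clustering of 1-D values.

    Returns the merge history as (members_a, members_b, height) tuples,
    where members are tuples of indices into `values`.
    """
    clusters = [(i,) for i in range(len(values))]

    def cluster_distance(a, b):
        # UPGMA distance: mean of all pairwise distances between members.
        total = sum(abs(values[i] - values[j]) for i in a for j in b)
        return total / (len(a) * len(b))

    merges = []
    while len(clusters) > 1:
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                d = cluster_distance(clusters[x], clusters[y])
                if best is None or d < best[0]:
                    best = (d, x, y)
        d, x, y = best
        a, b = clusters[x], clusters[y]
        merges.append((a, b, d))
        # Replace the two merged clusters with their union.
        clusters = [c for k, c in enumerate(clusters) if k not in (x, y)]
        clusters.append(tuple(sorted(a + b)))
    return merges


# Hypothetical femoral ratios, for illustration only.
ratios = [2.00, 2.10, 2.40, 2.46, 2.70]
merges = upgma(ratios)
for a, b, height in merges:
    print(a, b, round(height, 3))
```

The merge heights define the dendrogram. A dendrogram is always produced, whatever the data, which is why a separate test of the optimal number of clusters is needed before any grouping is interpreted.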
“Tyrannosaurus Femoral Proportions Do Show Unusual Variation”
Calculating the robustness of each extant avian theropod yields a mean intraspecific femur robusticity range of 0.66 ± 0.35—greater than the absolute variation observed in T. rex (0.61) and every other tyrannosaurid included (Fig. 4). The median robustness range is slightly smaller at 0.59. While it is true that T. rex has higher variation in femur robusticity than the other three tyrannosaurids, a taxonomically controlled and statistically large sample shows it to be unexceptional—in fact, some modern birds are over three times as variable in this character. The undesirable statistical properties of ratios have been known for decades (Atchley et al., 1976) – namely, ratios change the structure of the underlying dataset and can create spurious correlations. To ensure that this did not bias our results, we regressed femoral circumference against femur length for all 112 species of theropods in our dataset, and computed the R-squared value for each regression as a metric of dispersion. As in the ratio data, T. rex resolved as having a degree of variation that is entirely typical of modern avian theropods (Suppl. Info. 2) under both ordinary least-squares and standardized major axis regression. Therefore, the argument that the range of robusticity scores for T. rex is abnormal (and thus suggestive of cryptic diversity) can be rejected.
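The two measures of dispersion used above can be sketched as follows: the intraspecific range of a robusticity ratio, and the R-squared of a regression of circumference on length. The ratio is assumed here to be femur length divided by circumference, following the “l:c” notation. All measurements are hypothetical and the function names are illustrative. For a bivariate fit, R-squared equals the squared correlation coefficient under both ordinary least-squares and standardized major axis regression; only the slope differs.

```python
from math import copysign, sqrt


def _sums_of_squares(x, y):
    """Return (Sxx, Syy, Sxy) about the means of x and y."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    return sxx, syy, sxy


def r_squared(x, y):
    """Squared correlation; identical for OLS and SMA bivariate fits."""
    sxx, syy, sxy = _sums_of_squares(x, y)
    return sxy ** 2 / (sxx * syy)


def ols_slope(x, y):
    """Ordinary least-squares slope of y on x."""
    sxx, _, sxy = _sums_of_squares(x, y)
    return sxy / sxx


def sma_slope(x, y):
    """Standardized major axis slope: sign(r) * sd(y) / sd(x)."""
    sxx, syy, sxy = _sums_of_squares(x, y)
    return copysign(sqrt(syy / sxx), sxy)


def robusticity_range(lengths, circumferences):
    """Range (max - min) of the length:circumference ratio."""
    ratios = [l / c for l, c in zip(lengths, circumferences)]
    return max(ratios) - min(ratios)


# Hypothetical femur measurements in mm, for illustration only.
lengths = [1100.0, 1180.0, 1250.0, 1300.0]
circumferences = [480.0, 520.0, 560.0, 575.0]

r2 = r_squared(lengths, circumferences)
print(round(r2, 3), round(robusticity_range(lengths, circumferences), 3))
```

Regressing the raw measurements avoids the spurious structure that ratios can introduce, which is the motivation given in the text for reporting both.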
“Incisiform Dentary Tooth Arrangement Correlates with Femoral Robusticity and Also Appears to Change with Time”
The number of mesial incisiform teeth was used by Paul et al. (2022) in their diagnoses of the three taxa. Specifically, they considered the ratio of the “diameter” of the third tooth to the second tooth as taxonomically meaningful. Although not defined in the article, “diameter” corresponds to the mesiodistal basal crown length (WSP, personal communication to TDC, March 15, 2022). The authors considered a ratio of tooth 3:tooth 2 of 1.2 or greater as indicating the second tooth is small enough to be considered a second incisor; i.e., a ratio closer to (or less than) 1.0 indicates a large second tooth that is not an incisor. However, comparison of the mesial dentition between specimens is not straightforward: (1) owing to ontogenetic tooth loss, the homology of mesial tooth identity between specimens is unclear, (2) it is not clear if the measurements used by Paul et al. (2022) are from successive teeth from the same side, opposite sides, or are taken from tooth sockets, or some combination of the three, and (3) it is not clear if the 1.2 ratio is a statistically meaningful threshold.
Among adult T. rex, the number of dentary teeth is variable and ranges from 12 to 14 teeth (Carr, 2020). In adults, it appears that the first tooth is lost completely and is represented by a small divot at the rostrodorsal corner of the dentary ahead of the first open alveolus (Carr, 2020; Fig. 5). If the divot is correctly identified, then the first three open alveoli represent sockets two to four, not one to three. If the loss of mesial alveoli accounts for the difference in number, then it is difficult to establish the homology between the mesial teeth of specimens with different tooth counts. Therefore, what appears to be the “second” tooth position in two specimens might really be the third tooth in one specimen (first mesial tooth lost developmentally) but the fourth in another (first two mesial teeth lost developmentally). Ergo, the ratios might be incomparable because the teeth are developmentally nonhomologous.
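The homology problem described above amounts to an index offset. As a minimal sketch, assume that open alveoli are counted from the front of the dentary and that the number of developmentally lost mesial teeth is known for each specimen; both function names are illustrative.

```python
def developmental_position(open_alveolus, lost_mesial_teeth):
    """Map the n-th open alveolus (1-based) to its developmental position."""
    if open_alveolus < 1 or lost_mesial_teeth < 0:
        raise ValueError("alveolus index must be >= 1 and losses >= 0")
    return open_alveolus + lost_mesial_teeth


def homologous(open_a, lost_a, open_b, lost_b):
    """True if two apparent positions are the same developmental tooth."""
    return (developmental_position(open_a, lost_a)
            == developmental_position(open_b, lost_b))


# One lost mesial tooth: the first three open alveoli are sockets 2 to 4.
first_three = [developmental_position(n, 1) for n in (1, 2, 3)]

# The apparent "second" tooth in two specimens with different losses.
second_one_lost = developmental_position(2, 1)
second_two_lost = developmental_position(2, 2)

print(first_three, second_one_lost, second_two_lost)
print(homologous(2, 1, 2, 2))
```

Under these assumptions the apparent second tooth is the third developmental tooth in one specimen and the fourth in the other, so a ratio computed from apparent positions compares different teeth.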
It is worth pointing out that in their Table 2, the corresponding columns of tooth diameter measurements are almost certainly transposed: in most cases, the diameters of the second tooth exceed those of the third tooth. In Tyrannosaurus, as in all other tyrannosaurids, dentary tooth size increases sequentially from the first tooth to the fourth or fifth position; therefore, the large measurements under the heading “2nd Dentary tooth base diameter” are really of the third tooth and vice versa. Also, the ratios of the second to third teeth are listed under the heading “Hum ratio,” which implies “Humerus ratio” (Paul et al., 2022).
After rectifying these issues, we tested their results by comparing the measurements we have for the same specimens. We found that in several cases teeth were incomplete (not indicated by the authors) or were completely missing, leaving an empty socket (not indicated by the authors), and we found that all of our corresponding measurements were lower than what the authors published (Table 2). Of note is that they did figure a specimen that is missing the first three teeth (Paul et al., 2022: Fig. 3a), but they give tooth measurements for it in their Table 2. Therefore, we conclude that the tooth data presented in Table 2 of Paul et al. (2022), in the case of the figured specimen and those for which we have data, actually represent measurements of teeth from opposite sides, empty alveoli, or some combination of each (Table 2).
In the one case where we were able to obtain the dentary tooth 3:2 basal crown length ratio for sequential teeth of the same side, the result was significantly different. Paul et al. (2022) obtained a ratio of 0.98 (indicating one incisor) for MOR 980, which they identified as “T. regina” based in part on the ratio; in contrast, we obtained a ratio of 1.26 (indicating two incisors since the third tooth is significantly larger than the second tooth; Table 2), which in their schema would identify it as a “gracile” “T. imperator.”
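The effect of the two ratios on the identification can be expressed as a short rule. This sketch encodes only the threshold stated by Paul et al. (2022), namely that a tooth 3:tooth 2 ratio of 1.2 or greater indicates two incisors, and applies it to the two ratios reported for MOR 980. The function names and the example crown lengths are illustrative.

```python
INCISOR_RATIO_THRESHOLD = 1.2  # threshold stated by Paul et al. (2022)


def tooth_ratio(third_length_mm, second_length_mm):
    """Ratio of basal crown length, dentary tooth 3 to tooth 2."""
    if second_length_mm <= 0:
        raise ValueError("second tooth length must be positive")
    return third_length_mm / second_length_mm


def incisor_count(ratio, threshold=INCISOR_RATIO_THRESHOLD):
    """Return 2 if the ratio meets the threshold, otherwise 1."""
    return 2 if ratio >= threshold else 1


# Ratios reported for MOR 980 in the text.
published = incisor_count(0.98)   # Paul et al. (2022)
remeasured = incisor_count(1.26)  # sequential teeth of the same side

# Hypothetical crown lengths, for illustration only.
example = incisor_count(tooth_ratio(30.0, 24.0))

print(published, remeasured, example)
```

The same specimen falls on either side of the threshold depending on which measurements are used, so the character state assigned to it is not stable.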
When their tooth ratio data are arranged in sequential order, the ratios are continuous; it appears that the 1.2 ratio is an arbitrary threshold that was not based on a biological discontinuity. Consistent with this observation are two specimens, MOR 1125 and RSM 2523.8, which are referred to “T. imperator” and T. rex, respectively. Although their incisor count is not given, based on their high tooth ratios (Paul et al., 2022, Table 2) they have two incisors, which is diagnostic, in part, for “T. imperator,” and so the identification of MOR 1125 is consistent with the diagnosis. However, the presence of two incisors in RSM 2523.8 conflicts with its diagnosis as T. rex, which should only have one incisor. Accordingly, the authors moderate their identification of this specimen with a question mark. The discrepancy of RSM 2523.8 might account for the qualifying nature of the taxon diagnoses, where specific incisor counts are described as “usually” occurring. This issue is emphasized by the agglomerative clustering results where the dendrograms recovered for the femoral and dentary tooth data do not match, indicating character discrepancy (Fig. 3).
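One informal way to look for a discontinuity is to sort the ratios and inspect the gaps between neighbors. The sketch below does this for hypothetical ratios; it is a heuristic illustration only, not a statistical test and not the method used in this study, and the function names are illustrative.

```python
def sorted_gaps(ratios):
    """Return (lower, upper, width) for each pair of neighboring ratios."""
    ordered = sorted(ratios)
    return [(lo, hi, hi - lo) for lo, hi in zip(ordered, ordered[1:])]


def straddling_gap(ratios, threshold):
    """Return the gap that contains the threshold, or None."""
    for lo, hi, width in sorted_gaps(ratios):
        if lo < threshold <= hi:
            return lo, hi, width
    return None


# Hypothetical tooth 3:tooth 2 ratios, for illustration only.
hypothetical_ratios = [1.26, 0.98, 1.18, 1.05, 1.33, 1.11, 1.22]

gaps = sorted_gaps(hypothetical_ratios)
widest = max(width for _, _, width in gaps)
at_threshold = straddling_gap(hypothetical_ratios, 1.2)

print(round(widest, 3), at_threshold[:2], round(at_threshold[2], 3))
```

In data like these, the gap at the 1.2 threshold is no wider than the gaps elsewhere in the series, which is the pattern expected when a cutoff is placed on a continuous distribution.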
Finally, inspection of their diagram of incisors (Paul et al., 2022, Fig. 3) shows one specimen with empty tooth sockets and another with the first two teeth erupted. The exemplar of two small incisors (Paul et al., 2022, Fig. 3a) shows tooth sockets that are nearly the size of the third, whereas the exemplar of one small incisor (CM 9380) indeed shows a first tooth (and alveolus) that is truly a fraction of the size of the second and third alveoli, which are similar in size to each other (Paul et al., 2022, Fig. 3b). Taken at face value, the first example simply has large mesial teeth, whereas the second does indeed have a small first tooth. Therefore, as presented by Paul et al. (2022), the conflicting variation in the mesial dentary tooth data is unresolved and should be considered uninformative for diagnosing taxa.
Agglomerative Hierarchical Clustering Test of Basal Crown Length Ratios
Despite the issues discussed above, we tested their tooth ratio data (Paul et al., 2022, Table 2) using agglomerative hierarchical clustering to determine the optimal number of clusters. We found that the optimal number of clusters in the dataset is one (Fig. 3b, d). Based on their data, there is no statistical justification for assigning the specimens to different clusters.
Nearly half of the specimens (16 out of 37) used in the study of Paul et al. (2022) are held by commercial fossil companies or are privately owned. The practice of obtaining data from fossils that are not in public trusts is strongly discouraged to ensure replication of observations, as set out in the ethics guidelines of the Society of Vertebrate Paleontology and the instructions to authors of the Journal of Vertebrate Paleontology, the professional standard-bearers for vertebrate paleontologists.
The sale of “Stan,” a fossil held by a commercial company until its auction to a private collector in October 2020 (Greshko, 2022; Vogel, 2020), emphasizes the fact that commercial stockrooms are not on the same footing as recognized public trusts. All of the commercially held fossils are also vulnerable to sale, and the privately owned fossils are simply off limits as a matter of scientific practice. The fact that so many of the fossils in the study of Paul et al. (2022) are not in public trusts is highly problematic because their results are not replicable unless the ethics of vertebrate paleontology are transgressed, and in some cases, such as when specimens disappear into a private collection, not at all.
Recommendations and Caveats
For a case such as Tyrannosaurus with a good number of well-preserved specimens we propose the following procedure to test the hypothesis that one fossil species might really be several:
Clearly state the null and alternative hypotheses, and give the rationale for the alternative. In the case of a multiple species hypothesis, the null is one species that lacks sexual dimorphism. Alternatively, if sexual dimorphism is assumed to be plesiomorphic for dinosaurs, then the null would be to expect dimorphism. Untested “key” taxonomic characters are identified in the rationale for the alternative hypothesis; ultimately, all available evidence must be brought to bear on testing the hypothesis.
Complete a quantitative cladistic analysis of variation of individual specimens to test for the presence of two or more synapomorphy- (or synontomorphy-) based groups. A linear cladogram (or ontogram) indicates the test of multiple groups has failed; alternatively, recovery of several distinct groups indicates support. All available morphological evidence must be brought to bear on the analysis; untested key “taxonomic” characters must be avoided.
Complete a rigorous statistical analysis of a large sample of measurement data to test the hypothesis of quantitative (group) differences in the data; the agglomerative hierarchical test is appropriate here.
Obtain precise (meter-level) stratigraphic data to set the specimens in a rigorous geochronological framework incorporating previously established geochronologic, lithostratigraphic and biostratigraphic data to independently test that hypothesized sympatric taxa are spatiotemporally discrete (note that this is supportive evidence for taxonomic separation and should not be used to justify taxonomic constructs, which must be rigorously justified morphologically or molecularly). This approach would distinguish chronospecies (= anagenetic lineages) from morphological species produced by cladogenesis.
When sympatric species and sexual dimorphism are not known a priori, they can be confused for each other; clear criteria must be established to distinguish between them. In the case of size dimorphism in T. rex considered as evidence for sexual dimorphism on the one hand, or for species differences on the other, we offer two heuristics to assess such cases of ambiguity: (1) size dimorphism is the best evidence for sexual dimorphism, in the absence of discrete morphological character differences; (2) size dimorphism is the best evidence for species dimorphism, in the presence of discrete morphological traits throughout growth that differentiate two or more species.