Background

The pinnipeds are a monophyletic group of aquatic carnivores most closely related to either mustelids or ursids. The three monophyletic families – Phocidae (earless or true seals), Otariidae (sea lions and fur seals), and Odobenidae (one extant species of walrus) – last shared a common ancestor within arctoid carnivores > 25 million years ago (mya) [1, 2]. Some morphological studies [3, 4] and virtually all molecular studies (e.g., [5–15]) support a link between otariids and odobenids (Otarioidea). However, several morphologists prefer a phocid-odobenid clade (e.g. [2, 16–18]).

There are 34 extant species of pinniped, including Monachus tropicalis (which is widely believed to have gone extinct recently) and treating Zalophus as being monotypic (Z. californianus) (Table 1). The family Phocidae contains 19 species in two subfamilies: Monachinae or "southern" hemisphere seals (nine species comprising Antarctic, elephant, and monk seals) and Phocinae or "northern" hemisphere seals (10 species that inhabit the Arctic and sub-Arctic) [19]. Some authors have questioned the monophyly of Monachinae [20–22], although recent studies have shown this subfamily to be monophyletic [4, 11, 14, 15, 23, 24]. The monophyly of Phocinae has not been questioned since being established by King [25]; however, there remains considerable debate over inter-group relationships, especially within Phocina (reviewed by [11, 26]). The family Otariidae contains 14 extant species that have been divided traditionally into the subfamilies Arctocephalinae (fur seals) and Otariinae (sea lions) (e.g. [27, 28]). Early suggestions that this subfamilial classification might be incorrect (e.g. [29]) have received increasing support from recent molecular analyses [12, 14, 15, 30–32]. Taken together with a number of reports of both interspecific and intergeneric hybrids within Otariidae (e.g. [19, 33, 34]), a reassessment of otariid taxonomy based on additional phylogenetic evidence is needed. Brunner [31] provides an extensive review of the history of otariid classification. Finally, Odobenidae today comprises only the single species of walrus, Odobenus rosmarus.

Table 1 Indented taxonomy listing the 34 pinniped taxa (including the extinct Monachus tropicalis) included in the analyses.

Several recent genetic studies [11, 12, 14, 15, 24, 26, 32] have advanced our knowledge of relationships within Pinnipedia considerably. Unfortunately, many of these (the exceptions being [14, 24, 26]) did not include divergence-date estimates as required for some types of macroevolutionary studies and phylogenetic comparative analyses. In addition, none included all species. The only study to include divergence-date estimates for all extant pinnipeds was that of Bininda-Emonds et al. [23] as a part of a larger carnivore supertree. This tree has been used extensively in comparative studies of carnivores in general (e.g., [35–42]) and pinnipeds in particular (e.g., [43–47]). However, it remains that the carnivore supertree is nearly a decade old and might no longer reflect current phylogenetic opinion.

Our objective is to address the lack of a recent phylogenetic study that includes all extant pinniped species and to provide date estimates for all nodes. To accomplish this task, we used the supertree method matrix representation with parsimony (MRP, [48, 49]) to derive a complete phylogeny of the group from 50 gene trees (with mtDNA markers analyzed either individually or combined as a single source tree), with corresponding maximum likelihood (ML) and Bayesian (BI) analyses of the concatenated supermatrix serving as a form of topological sensitivity analysis in a global congruence framework [50]. Divergence dates within the supertree topology were estimated using 52 genes calibrated with eight robust fossil dates using two different methods. Together, the use of a larger data set focussed exclusively on the pinnipeds should yield both a more accurate topology and divergence dates than those present in the global carnivore supertree of Bininda-Emonds et al. [23].
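To illustrate the matrix-representation step that underlies MRP, the following minimal Python sketch applies standard Baum-Ragan coding to two toy source trees: each clade in a source tree becomes one column in which members of the clade are scored 1, the other taxa in that tree 0, and taxa absent from that tree as missing (?). The nested-tuple tree format, the function names, and the toy trees are assumptions made for illustration only; the weighting scheme, the all-zero outgroup row that MRP analyses normally add, and the parsimony search itself are not shown.

```python
def clades(tree):
    """Return (leaf set, list of clade leaf sets) for a nested-tuple tree."""
    if isinstance(tree, str):  # a single taxon name is a leaf
        return frozenset([tree]), []
    leaves = frozenset()
    found = []
    for child in tree:
        child_leaves, child_clades = clades(child)
        leaves |= child_leaves
        found.extend(child_clades)
    found.append(leaves)  # the clade subtended by this internal node
    return leaves, found


def mrp_matrix(source_trees):
    """Baum-Ragan coding: one column per non-root clade of each source tree."""
    all_taxa = sorted(set().union(*(clades(t)[0] for t in source_trees)))
    matrix = {taxon: [] for taxon in all_taxa}
    for tree in source_trees:
        tree_taxa, tree_clades = clades(tree)
        for clade in tree_clades:
            if clade == tree_taxa:  # the root clade carries no information
                continue
            for taxon in all_taxa:
                if taxon not in tree_taxa:
                    matrix[taxon].append("?")  # taxon not sampled in this tree
                elif taxon in clade:
                    matrix[taxon].append("1")
                else:
                    matrix[taxon].append("0")
    return {taxon: "".join(row) for taxon, row in matrix.items()}


# Hypothetical source trees (not taken from the data set analyzed here).
toy_trees = [
    (("Phoca", "Pusa"), "Erignathus"),
    (("Halichoerus", "Pusa"), "Phoca"),
]
matrix = mrp_matrix(toy_trees)
for taxon, row in matrix.items():
    print(f"{taxon:12s} {row}")
```

Each row of the resulting matrix is one taxon and each column one source-tree clade; a parsimony analysis of such a matrix is what yields the supertree.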

Results and Discussion

General structure of the supertree

Our preferred hypothesis of pinniped evolution is that derived from the molecular supertree with all genes analyzed individually (Fig. 1; see Methods). It agrees broadly with other recent studies (e.g., [10–15, 23, 24, 26, 32]). In particular, the monophyly of each of Pinnipedia, Otarioidea, Phocidae, Otariidae, and the two phocid subfamilies was supported. Many of these nodes are among the most strongly supported in the supertree. The high level of congruence across numerous studies using different data sources and methodologies would suggest that higher-level pinniped relationships are well resolved. However, many relationships closer to the tips of the tree, particularly those within each of Arctocephalus and Phocina, remain contentious.

Figure 1

Molecular supertree of the world's extant pinnipeds (plus one recently extinct Monachus species) based on a weighted matrix representation with parsimony analysis of 50 maximum likelihood gene trees. Node numbers correspond to divergence dates in Table 2. Branch lengths correspond to time with the scale bar indicating one million years. Boxed subset provides additional detail on branching order for two parts of the supertree where divergences occurred over a short timeframe.

Support values within the supertree (Table 2) were generally much higher than values typically reported for the supertree-specific support measure rQS (see [51, 52]), with an average rQS value (± SD) across the tree of 0.234 ± 0.214. As such, most nodes are directly supported by a majority of the 50 source trees containing all the relevant taxa. The only exception is the node comprising Halichoerus grypus, Pusa caspica and Pusa sibirica, which has a slightly negative rQS value (-0.040). Even so, all more inclusive nodes possess positive rQS values, indicating that the conflict has more to do with the exact placement of Halichoerus within Pusa rather than the placement of it within this genus per se.
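As a rough illustration of how an rQS value can be read, the sketch below assumes that rQS for a supertree node is the mean, over all source trees, of a score of +1 (the source tree contains the node once pruned to shared taxa), -1 (the source tree contains a clade incompatible with it), or 0 (the source tree is uninformative or unresolved for that node); see [51, 52] for the formal definition. The function names, the clade-set representation, and the toy trees are hypothetical and are not the software used for Table 2.

```python
def score_node(node, tree_taxa, tree_clades):
    """Score one supertree node against one source tree: +1, -1 or 0."""
    pruned = node & tree_taxa  # restrict the node to taxa in the source tree
    if len(pruned) < 2 or pruned == tree_taxa:
        return 0  # source tree cannot speak to this node
    if pruned in tree_clades:
        return 1  # identical clade present: support
    for clade in tree_clades:
        overlap = pruned & clade
        if overlap and overlap != pruned and overlap != clade:
            return -1  # partially overlapping clade: conflict
    return 0  # compatible but not resolved: equivocal


def rqs(node, source_trees):
    """Mean score of a node across all source trees (range -1 to +1)."""
    scores = [score_node(node, taxa, cl) for taxa, cl in source_trees]
    return sum(scores) / len(scores)


# Toy data: each source tree is (its taxon set, its set of non-root clades).
H, PC, PV, EB = "Halichoerus", "Pusa caspica", "Phoca vitulina", "Erignathus"
node = frozenset({H, PC})
source_trees = [
    (frozenset({H, PC, PV}), {frozenset({H, PC})}),   # supports
    (frozenset({H, PC, PV}), {frozenset({PC, PV})}),  # conflicts
    (frozenset({H, PV, EB}), {frozenset({H, PV})}),   # lacks Pusa caspica
    (frozenset({H, PC, EB}), {frozenset({H, PC})}),   # supports
]
node_rqs = rqs(node, source_trees)
print(node_rqs)
```

Under this reading, a positive value means that more source trees support a node than contradict it, and a slightly negative value means that conflict marginally outweighs support.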

Table 2 Divergence dates for the world's pinnipeds based on the median of up to 52 relative molecular and/or one fossil date analyzed using the relDate method.

Alternative analyses of the molecular data set (supertree analysis with all mtDNA forming a single source tree or ML or BI analyses of the combined supermatrix; Figures 2 and 3, respectively) yield topologies that agree broadly with that in Figure 1. The rQS support measure across the supertree (0.18 ± 0.11) again showed that most nodes are directly supported by a majority of the 12 source trees containing all the relevant taxa. In all cases, the changes occur in parts of the tree with noticeably weaker support and/or branch lengths, indicating general regions of uncertainty: 1) Neophoca cinerea nests deeper within otariids, either as the sister taxon to Phocarctos hookeri (ML) or to the clade comprising the genera Arctocephalus, Otaria, and Phocarctos (BI), or forms the sister taxon to Callorhinus ursinus (supertree); 2) the formation of a sister-group relationship between Otaria byronia and Arctocephalus pusillus, which were previously adjacent to one another (all analyses); 3) the clades (Arctocephalus townsendi + A. philippii) and (A. gazella + A. tropicalis) trade places (all analyses); and 4) changes to the internal relationships of Phocina, either with Halichoerus grypus and Pusa caspica being pulled basally with respect to the remainder of the group, with Halichoerus forming the sister group to the remaining species (ML), or with Pusa hispida and the clade of Histriophoca fasciata and Pagophilus groenlandicus nesting deeper within the group (BI), or with Pusa hispida moving inside P. sibirica and with a polytomy at the base of Phocini (supertree).

Figure 2

Molecular supertree of the world's extant pinnipeds (excluding the recently extinct Monachus tropicalis) based on a weighted matrix representation with parsimony analysis of 12 maximum likelihood gene trees, where all mtDNA genes were combined to form a single source tree. Support values for each node, as measured by rQS [51, 52] are also provided.

Figure 3

Likelihood-based analyses of the molecular supermatrix of 50 gene trees: a) ML tree with bootstrap proportions and b) BI tree with posterior probabilities. Scale bars indicate average number of substitutions per site per unit time.

In the supertree, nodes 1 and 2 (see Fig. 1) represent the divergences of the canid and ursid lineages, respectively, and nodes 3 to 35 represent the various pinniped divergences. The total sample size (molecular and fossil date estimates) underlying the divergence times for each node ranged from 0 (node 35 – the split between Monachus schauinslandi and M. tropicalis, where the date was interpolated using a constant birth model) to 27 (Table 2). Over half (19) of the pinniped nodes were dated using at least 12 separate estimates. The remaining 14 nodes were dated by five or fewer estimates. Ten of these 14 nodes relate to otariid relationships, and seven concern Arctocephalus species. Divergences within the Pusa + Halichoerus clade were also dated by a comparatively small number of estimates. However, no obvious relationship existed between the variability in a date estimate (given by the coefficient of variation, CV) and the number of estimates it was derived from (R² = 0.02, P = 0.4849, df = 26).
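The comparison of date variability with sample size can be sketched as follows. This is a minimal example under stated assumptions: the CV is taken as the sample standard deviation divided by the mean and expressed as a percentage, R² is the squared Pearson correlation between CV and the number of estimates per node, and the per-node estimates are invented for illustration rather than drawn from Table 2.

```python
from statistics import mean, stdev


def coefficient_of_variation(estimates):
    """CV as a percentage: sample SD relative to the mean."""
    return 100.0 * stdev(estimates) / mean(estimates)


def r_squared(x, y):
    """Squared Pearson correlation between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return (sxy * sxy) / (sxx * syy)


# Hypothetical date estimates (mya) for three nodes.
estimates_by_node = {
    "node A": [9.0, 10.0, 11.0],
    "node B": [4.0, 5.0, 6.0, 5.0],
    "node C": [1.0, 3.0],
}
sample_sizes = [len(v) for v in estimates_by_node.values()]
cvs = [coefficient_of_variation(v) for v in estimates_by_node.values()]
r2 = r_squared(sample_sizes, cvs)
print(cvs, r2)
```

An R² near zero, as reported above, would indicate that nodes dated from few estimates are not systematically more variable than well-sampled ones.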

Our inferred relDate dates for the supertree topology (see Methods) are also significantly correlated with those for comparable nodes (which are restricted largely to Phocidae) in the two major studies to estimate divergence times within pinnipeds, those of Bininda-Emonds et al. [23] (R² = 0.52, P = 0.004) and Arnason et al. [14] (R² = 0.958, P < 0.0001) (df = 12 in both cases using ln-transformed values). However, whereas our dates did not differ significantly from those of Bininda-Emonds et al. [23] (paired-t of ln-transformed values = -1.36, P = 0.197; df = 13), they were significantly more recent than those of Arnason et al. [14] (paired-t of ln-transformed values = -9.82, P < 0.0001; df = 13), probably reflecting their use of only a single and more distant calibration point (the caniform-feliform split at 52 mya) as well as topological differences between the trees and different methodologies used to derive the dates.
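The paired comparison of ln-transformed dates can be sketched in a few lines. The example below computes only the paired t statistic and its degrees of freedom (the P value would additionally require the t distribution, which is not in the Python standard library); the function name and the toy dates are assumptions for illustration and are not values from this study.

```python
import math
from statistics import mean, stdev


def paired_t_ln(dates_a, dates_b):
    """Paired t statistic and df for ln-transformed dates of matching nodes."""
    diffs = [math.log(a) - math.log(b) for a, b in zip(dates_a, dates_b)]
    n = len(diffs)
    standard_error = stdev(diffs) / math.sqrt(n)
    return mean(diffs) / standard_error, n - 1


# Hypothetical dates (mya) for three nodes shared by two studies.
dates_a = [2.0, 4.0, 8.0]
dates_b = [4.0, 4.0, 8.0]
t_stat, df = paired_t_ln(dates_a, dates_b)
print(t_stat, df)
```

With the first set of dates subtracted from the second on the log scale in this order, a negative t indicates that the first set is on average younger, which is the sense in which the negative values above are reported.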

Both sets of multidivtime divergence dates (Table 3) are significantly different from the relDate divergence dates (paired-t of ln-transformed values = -11.39, P < 0.0001; df = 32, for relDate versus multidivtime all genes; paired-t of ln-transformed values = -4.53, P < 0.0001; df = 32, for relDate versus multidivtime mtDNA only). The supertree (relDate) divergence dates underestimate the multidivtime dates from all genes and mtDNA genes by 88% and 51% on average, respectively. With respect to confidence intervals (CIs), only 9 and 7 (of 33) of the relDate dates fall into the range provided by the multidivtime CIs for mtDNA or all genes, respectively. Conversely, only 3 and 4 (of 33) dates for all genes and mtDNA only, respectively, fall within the CIs of the relDate dates. However, it is important to note that the two sets of multidivtime dates themselves are also significantly different from one another (paired-t of ln-transformed values = 2.36, P = 0.02; df = 32). In the following sections, we compare both sets of divergence dates (i.e., the relDate and multidivtime dates) with those from the fossil record and other studies.

Table 3 Divergence dates calculated using the Bayesian relaxed molecular clock method implemented by multidivtime [122, 123] for all genes combined and for mtDNA genes only, each fitted to the preferred supertree topology (Fig. 1).

Origins of major pinniped groups

The split between ursids and pinnipeds is estimated to be 35.7 ± 2.63 (= mean ± SE) mya (relDate, Table 2; the multidivtime dates for this node were similar (Table 3)), although this should not be taken to imply that ursids are the closest living relatives of pinnipeds among arctoid carnivores. Early pinnipeds (pinnipedimorphs) are held to have originated in the North Pacific during the late Oligocene (34-24 mya) ([2, 22, 45, 53], but see [14], who speculate on an origin on the southern shores of North America), which is consistent with our estimate. Thereafter, a substantial lag is apparent, with the basal pinniped split between Phocidae and Otarioidea occurring some 12 million years later at 23.0 ± 1.36 mya (Table 2) (ca. 26 mya with multidivtime, Table 3). Both values are more recent than the 28.1 mya and 33.0 mya estimates obtained by Bininda-Emonds et al. [23] and Arnason et al. [14], respectively.

Odobenidae includes a single extant species and at least 20 fossil species in 14 genera [2], with the most basal taxa known from the late early Miocene (ca. 21-16 mya). Deméré et al. [2] suggest that odobenoids first evolved in the North Pacific region sometime before 18 mya (late early Miocene), and our data indicate the upper bound to be 20.8 mya. The multidivtime dates were similar at ca. 21 mya. Both values are substantially older than the 14.2 mya estimate obtained by Bininda-Emonds et al. [23], but younger than the 26.0 mya estimate of Arnason et al. [14].

Modern fur seals and sea lions are thought to have evolved from the ancestral family Enaliarctidae ca. 11 mya [54–56], with our data showing that the diversification of the crown group occurred shortly thereafter at 8.2 ± 2.09 mya (the dates estimated using multidivtime were again older, ca. 11 mya). Arnason et al. [14] consider the late Oligocene Enaliarctinae [57] to be the oldest otarioid lineage so far described (25–27 mya; [58]). However, Deméré et al. [2] consider this group to be early pinnipedimorphs that originated before the evolution of the modern crown-group pinnipeds.

The first phocid fossils date from the middle Miocene (ca. 16-14 mya) (but see [59, 60]) in the North Atlantic [61], although some authors (e.g., [2, 4, 62]) have speculated over a North Pacific origin. Koretsky and Sanders [59, 60] recently described the "Oligocene seal" from the late Oligocene (ca. 28 mya) in South Carolina as the oldest known true seal, a fossil that predates our estimate for the basal-most split in all pinnipeds. However, because this new description was based on a very small sample (two partial femora), and because Deméré et al. [2] noted that its stratigraphic provenience may be in question, we instead used 23 mya as a conservative fossil calibration point for the split between Phocidae and Otarioidea. Obviously, acceptance of the "Oligocene seal" as the oldest known phocid (and therefore crown-group pinniped) would cause all divergence times within the pinnipeds to be older than the ones that we report.

Otariidae

Phylogeny

The supertree resolved Callorhinus ursinus as sister to all remaining otariids (as is now generally accepted [12–14, 23, 32]), with neither the sea lions nor Arctocephalus forming clades. The five sea lion genera were generally positioned basally to the various Arctocephalus species. The exception was Phocarctos (and possibly Otaria in the supermatrix analyses), which nested within Arctocephalus. Wynen et al. [32] also reconstructed Neophoca as being the next otariid species to diverge (contra the supermatrix results) and found Zalophus + Eumetopias to form the sister clade to the remaining forms (Arctocephalus, Otaria and Phocarctos). These results add to the already large body of evidence, both molecular and morphological, that subfamilial descriptions in Otariidae, traditionally based on the single character of presence or absence of underfur, are incorrect [7, 12, 14, 15, 30–32, 53, 63]. However, resolution of most of the more inclusive otariid clades remains problematic [14, 15, 32], especially the relationships among the various Arctocephalus species, and the placements of the A. australis + A. forsteri + A. galapagoensis and A. philippii + A. townsendi clades in particular. The likelihood-based supermatrix analyses reinforce the generally weak or conflicting phylogenetic signal in the data set for otariids, with both suggesting what is to our knowledge a novel, more nested position for Neophoca (although the inferred location differs between the analyses).

The supertree resolved A. forsteri as the sister to A. australis + A. galapagoensis, with all three as sister to an A. gazella + A. tropicalis clade, an arrangement with relatively moderate support (Table 2). Wynen et al. [32] found a similar result, placing A. gazella as sister to the A. australis + A. forsteri + A. galapagoensis clade, but placed A. tropicalis as sister to A. pusillus on a more basal branch separate from other arctocephaline species. Our results also support a polyphyletic Arctocephalus, but with A. pusillus as the separate lineage. The separation of A. pusillus from other Arctocephalus species (and possible pairing with Otaria as found in both the supermatrix analyses and the combined mtDNA supertree) is perhaps not unexpected in hindsight, given that this species has long been considered as having an 'enigmatic taxonomic position' due to its similarity to sea lions in size, skull morphology, and behaviour [64–66].

Several authors [31, 32] have recently questioned the status of A. philippii and A. townsendi as separate species (also see [67, 68]). Brunner [31] went so far as to suggest that both taxa be removed from Arctocephalus to form subspecies in the previously described genus Arctophoca (Arctophoca philippii philippii and A. p. townsendi [69]). Our results are equivocal on this latter issue, given that these two taxa do form part of the main clade of Arctocephalus, but as sister to the remaining species. The two taxa, however, are indicated to have diverged from one another earlier (0.3 mya; relDate date) than another pair of undisputed Arctocephalus species (namely A. gazella and A. tropicalis at 0.1 mya), a potential argument in favour of them retaining separate species status (regardless of the generic appellation).

The close genetic relationship we found between A. australis, A. forsteri and A. galapagoensis (also [32]) is also congruent with the morphometric results of Brunner [31], who suggested that A. galapagoensis be considered a subspecies of A. australis (as per [66, 67]). Again, the relatively long divergence time between these two taxa (0.7 mya; relDate date) could argue against this arrangement.

Ultimately, relationships within Arctocephalus remain poorly resolved with little agreement between different studies or, as shown in this study, even different analyses of the same base data set. This situation will likely remain at least until additional genes for these taxa are sequenced. We would note that the relationships and divergence times within Arctocephalus presented here are based on the only genetic marker available at the time data were extracted from GenBank (MT-CYB sequences [32]). Additional genetic sequences for these species are desperately required (but see [14, 15]).

Divergence dates

The only recent studies to estimate divergence dates for otariids are those of Bininda-Emonds et al. [23] and Arnason et al. [14]. Here, we compare our estimates to those two studies and the available fossil record, which is unfortunately limited. Our relDate estimate of 8.2 ± 2.09 mya for the root of the otariid crown-group is younger than other recent estimates [14, 23]. The multidivtime dates (ca. 11–12 mya) agree well with Bininda-Emonds et al. [23], but are still younger than that estimated by Arnason et al. [14]. Thereafter, a series of rapid divergences is inferred to have occurred. The origin of Neophoca was estimated at 6.1 mya based on MT-CYB only (ca. 10 mya using multidivtime), followed by the initial radiation of the remaining species at 5.2 ± 1.09 mya (ca. 9 mya using multidivtime), and the origins of Otaria at 4.5 ± 0.21 mya and Arctocephalus pusillus at 4.3 mya (the latter, again, based only on MT-CYB; both divergences ca. 7 mya in the multidivtime analyses). The oldest known record for the southern hemisphere otariids is established by Hydrarctos lomasiensis from the late Pliocene or early Pleistocene (< 3.4 mya, [70, 71]). Fossils from California and Japan suggest that sea lions did not diversify until ca. 3 mya [55, 56, 72]; however, only the late Pleistocene occurrences (< 0.8 mya) of Otaria byronia [73] and Neophoca palatina [74] are considered reliable at present [2]. Our date for the origin of the lineage leading to Otaria as a whole is naturally much older than this and almost three times older than that in Bininda-Emonds et al. [23] (which places Otaria in a very different position). Arnason et al. [14] estimated an older divergence time, but also based on a different phylogeny. We infer Phocarctos to have split from the remaining Arctocephalus species 3.4 ± 0.34 mya. Finally, the divergence between Eumetopias and Zalophus was dated as 4.5 ± 0.37 mya, which is considerably older than the earliest known fossils (Pleistocene, 1.64-0.79 mya [56]), but younger than the 8 mya estimate of Arnason et al. [14] (which is still older than the multidivtime estimate of ca. 6 mya).

Our results similarly indicate a rapid radiation within Arctocephalus, with many species originating within the past 1 million years (both dating methods, Tables 2, 3). Overall, the date estimates showed reasonable levels of variation (relDate median CV of 27.5), although some were highly variable. For example, the split between the clades A. gazella + A. tropicalis and A. australis + A. forsteri + A. galapagoensis had a final date estimate of 3.1 mya but a large SE (3.43 my) and 95% confidence intervals on the input date (-2.76–10.68 mya), possibly reflecting weak signal in this area of the tree (see sensitivity analyses). Arctocephaline species are known in the fossil record only from poorly documented records of A. pusillus and A. townsendi from the Pleistocene (< 0.8 mya) [29]. The origin of Arctocephalus + Phocarctos hookeri was estimated here using MT-CYB data at 4.3 mya, which is younger than other recent estimates based on different topologies [14, 23]. Although our results lend support to previous suggestions [2, 32] that both sea lions and Arctocephalus underwent recent periods of rapid radiation, all the evidence to date tends to be based on a small dataset for most species.

Phocidae

Phylogeny

Compared to otariids, phocid relationships are generally much more agreed upon. The traditional and well-accepted phocid subfamilies Monachinae and Phocinae were both recovered as monophyletic in the supertree and supermatrix analyses (also see [4, 11–15, 23, 26]). Erignathus barbatus was sister to the remaining northern phocids, followed by Cystophora cristata. The next branch of the tree separated Pagophilus groenlandicus and Histriophoca fasciata (= Histriophocina) as the sister group to the remaining taxa (but note the differences in the alternative supertree and the BI supermatrix). Most recent studies [11–15, 23, 26] have found support for this arrangement among the early branches (i.e., involving the lineages Erignathus, Cystophora, and Histriophocina). Of the six Pusa, Phoca, and Halichoerus species (= Phocina), in the preferred tree Pusa hispida was found to be sister to the remaining species in which Phoca vitulina + Phoca largha formed the sister clade to (Pusa sibirica + (Halichoerus + Pusa caspica)) (again note the alternative arrangements in Figures 2 and 3, indicating poor signal in this part of the pinniped phylogeny). The sister-group relationship between Phoca vitulina and P. largha recovered here in all analyses is consistent among and well supported in numerous studies based on diverse data types [4, 11–15, 23, 26], and reflects early suggestions that the latter species represents a subspecies of the former [68, 75].

Arguably the biggest outstanding problem in phocid phylogeny concerns the placement of Halichoerus within Phocina, and there have been long-standing suggestions (e.g., [76]) for taxonomic revision of these taxa. Both Davis et al. [11] and Delisle and Strobeck [12] found the strongest support for Halichoerus as sister to Pusa, with both being sister to Phoca. However, both studies included only Pusa hispida as an exemplar for Pusa. Fulton and Strobeck [15] also recovered a similar result, but did not include Pusa sibirica. Four recent studies have included all three Pusa species [4, 14, 23, 26]. Bininda-Emonds and Russell [4] recovered Halichoerus as sister to Erignathus + Histriophocina + the remaining Phocina using morphological data. Bininda-Emonds et al. [23] recovered an unresolved Pusa as sister to the two Phoca species in their supertree, with Halichoerus being sister to this clade. The molecular results of Arnason et al. [14] and Palo and Väinölä [26] were similar to ours, indicating weak support for a P. caspica + H. grypus clade, and for a basal position for P. hispida within Phocina. Although the precise interrelationships of the species differ slightly, our results support the suggestions of these other recent studies that both Halichoerus and Pusa be included within a redefined Phoca, possibly as subgenera. In fact, Arnason et al. [6] suggested recently that the scientific name for the grey seal be Phoca grypa. This solution also works in light of the continuing uncertainty concerning interrelationships within Phocina (compare Figures 1, 2, and 3 and these with the references above), especially the increasing number of suggestions that Pusa might be paraphyletic (except if it were to be retained as a subgenus).

It is also noteworthy that all the relevant divergences within Phocina apparently occurred in a very short time frame (also see [14, 26]), which might make resolution within this group difficult to obtain even with additional markers. By contrast, there were no negative branch lengths in this part of the supertree (although nodes 23 and 24 in Figure 1 were held to be simultaneous initially), indicating relatively good agreement among the sequence data. Also, except for node 25, all the rQS values in this part of the (preferred) tree are > 0, again indicating more agreement than conflict among the set of gene trees (note the rQS values in Fig. 2; the only negative value in the alternative supertree concerns the sister-group relations of the two Histriophocina species).

Within Monachinae, all analyses recovered a monophyletic Monachus as sister to Miroungini + Lobodontini. Relationships within Monachus and Mirounga recovered here are consistent among and well supported in numerous studies [4, 11–15, 23, 26] (but see [22] regarding Monachus). Relationships within Lobodontini have traditionally been contentious, although recent studies [11–15] all support the sister relationship between Leptonychotes and Hydrurga recovered here (contra [4, 23]). However, the positions of Ommatophoca and Lobodon relative to each other and to the Leptonychotes + Hydrurga clade remain problematic. Many recent studies [11, 12, 14, 15] found the strongest support for an (Ommatophoca, (Lobodon, (Leptonychotes + Hydrurga))) relationship. Our results differed and, similar to Fyler et al. [24], supported Lobodon as being sister to the remaining species. The supermatrix analyses indicated the identical sets of relationships for Monachinae.

Divergence dates

The fossil record suggests that the divergence of the two phocid subfamilies occurred sometime prior to the middle Miocene (> 14.6 mya) [77] and we used 16 mya as a minimum age constraint for the corresponding node (also see [23]). Similarly, Fyler et al. [24] used 15 and 17 mya as calibration points from which to estimate divergence dates in Monachinae. The corresponding molecular estimate of Arnason et al. [14] at 22 mya is older still and in better agreement with our multidivtime dates. The initial divergence in phocines (i.e., the lineage leading to Erignathus) was dated at 13.0 ± 0.90 mya, which is slightly younger than other estimates [14, 23, 24, 26] (the multidivtime dates are again older, ca. 19 mya). Our relDate dates for the origins of Cystophora (8.0 ± 0.42 mya) and Histriophoca + Pagophilus (6.4 ± 0.40 mya) are considerably younger than the corresponding estimates from Bininda-Emonds et al. [23] (which are in closer agreement with the multidivtime dates), but considerably older than the available fossil evidence. Deméré et al. [2] suggested that these basal phocines originated in the Arctic during the Pleistocene and represent the products of a glacioeustatic-forced allopatric speciation event. Arnason et al. [14] estimated a considerably older date (12 mya) for the divergence of Cystophora, again in agreement with both Bininda-Emonds et al. [23] and our multidivtime results, but a comparable 7 mya estimate for the origin of Histriophocina.

The genus Phoca arose 2.2 ± 0.62 mya (using relDate; multidivtime dates ca. 5–6 mya), with both extant species diverging from one another 1.1 ± 0.18 mya. These two nodes were well sampled, with 18 and 12 molecular estimates, respectively. The suggested recent separation and evolution of the two Phoca species (using both dating methods) is in general agreement with other studies [14, 23, 68, 75, 78]. Pusa sibirica arose 2.1 ± 0.21 mya, and Halichoerus grypus and Pusa caspica diverged immediately thereafter at 2.0 ± 0.14 mya; the divergence estimates for these last two nodes were each dated by only three genes apiece, and both are considerably older in the multidivtime analyses. Bininda-Emonds et al. [23], by contrast, estimated the origin of Halichoerus to be 7.1 mya, although this was based on a different topology, with Halichoerus in a more basal position. They also dated a Pusa polytomy to 2.8 mya, whereas we estimate here (using relDate) that the three genera Phoca, Halichoerus, and Pusa all arose over a short time span ranging from 2.2 to 1.1 mya (2–6 mya using multidivtime). Palo and Väinölä [26] similarly estimated that the radiation of the five main Phocini mtDNA lineages occurred ca. 2.5–3.1 mya, whereas Arnason et al. [14] estimated that the basal Phocina radiations occurred at 4.5 mya. Sasaki et al. [79] derived considerably younger estimates for divergences within Pusa, although their calibration was based on an incorrect estimate of the general mammalian substitution rate [26]. In addition, the sister-group relationships on which their dates are based conflict with our results and those of other recent studies [14, 26]. Regardless of the precise relationships upon which the dates are based, the general consensus is that the diversification within Phocina was both rapid and relatively recent, which agrees with biogeographic evidence suggesting that the evolution of the Halichoerus-Pusa-Phoca complex likely occurred in the Greenland Sea/Barents Sea portion of the Arctic [2], and was possibly related to the closing of the Panamanian seaway (the formation of the Isthmus of Panama) 3.2-2.8 mya, which resulted in the freezing over of the Arctic Ocean [80–82].

Among the southern phocids, most nodes (with the obvious exception of the Monachus schauinslandi and M. tropicalis split) were well sampled, with 12–21 date estimates each. The lineage leading to Monachus split from the remaining species 11.3 ± 0.60 mya, which is slightly younger than other recent estimates [23, 24] (and these other estimates are themselves slightly younger than the multidivtime dates). Our relDate estimate of the origin of the lineage leading to M. monachus (9.9 ± 0.28 mya) is considerably older than the 4.8 mya estimate of Bininda-Emonds et al. [23], but in good accord with those of Fyler et al. [24] and Arnason et al. [14]. The multidivtime dates for this node are again older, ca. 15–16 mya. The split between M. schauinslandi and M. tropicalis was interpolated to be 4.9 mya, compared to the 2.8 mya estimate from Bininda-Emonds et al. [23] (also based on interpolation from a pure-birth model).

Our results indicate that the Mirounga lineage split from the lobodontine seals 10.0 ± 0.65 mya (ca. 15–16 mya using multidivtime), which accords well with recent genetic studies [14, 23, 24] and with fossil evidence indicating that the oldest fossils of southern lobodontines are from the late Miocene (6.7-5.2 mya) [71] and suggesting that the divergence occurred sometime before 11 mya [2, 83]. Our relDate date for the split between the two Mirounga species (2.3 ± 0.85 mya) was slightly younger than that in other recent studies [14, 23, 24] (which were all in general agreement with the multidivtime results), but considerably older than the 0.8 mya estimate of Slade et al. [84].

Among the four lobodontine seals, Lobodon diverged first at 7.1 ± 0.34 mya, followed shortly thereafter by Ommatophoca at 6.8 ± 0.26 mya, and finally by Hydrurga + Leptonychotes at 4.3 ± 0.55 mya. The time of origin of the lineage leading to Lobodon is younger than the date estimated by Fyler et al. [24], but older than that of Arnason et al. [14] (who also resolved a different topology). However, both it and the time of origin of the lineage leading to Ommatophoca correspond well to the dates of Bininda-Emonds et al. [23]. The divergence dates determined using multidivtime were again considerably older (Table 3).

Conclusion

Our results add to the growing list of studies that highlight the need for a re-evaluation of pinniped taxonomy, with revisions being required for both otariids (with respect to subfamilial classification and the genus Arctocephalus) and phocids (within Phocina, especially as regards Halichoerus and Pusa), ideally based on additional and especially nuclear genetic markers. The divergence-date estimates herein indicate rapid radiations in both families, particularly the southern hemisphere fur seals (Arctocephalus) and the northern phocids (Phocina), a fact which might account for the historical difficulties in assessing the phylogenetic relationships within each group. The historically unusual, but increasingly suggested nesting of Halichoerus within Pusa (see also [6, 14, 15, 26]) highlights both the utility of large molecular datasets with numerous genes and the value of including all relevant species in phylogenetic analysis (see also [4]). We suggest increased genetic sampling throughout the group as the best approach to further improving our understanding of pinniped phylogenetics. For example, at the time we gathered data, only MT-CYB had been sequenced for most otariid species and only a small number of genes were available for several Pusa species, although additional sequences have since been provided [14, 15]. That being said, the problem areas within Phocina and Arctocephalus that were identified by both supertree and supermatrix analyses might prove resistant to resolution even with increased sampling should the apparent rapid branching in these parts of the tree be real.

Phylogenetic comparative methods have become the standard way for comparing aspects of the biology of a group of species while avoiding statistical problems associated with species not being independent due to their shared evolutionary history [85]. Phylogenetic analyses are improved with appropriate reconstruction of ancestral nodes using divergence-date information [86, 87], and estimates of divergence dates provide conservation biology with a powerful tool in assessing vulnerability to conservation problems and comparative analysis of extinction risk [88, 89]. Our results will allow the use of phylogenetic comparative methods with a robust estimate of pinniped phylogeny and divergence times that includes all species.

Methods

DNA sequence data

The use of large, multigene data sets provides the numerous informative changes required for correct inferences, and may also help to raise weak phylogenetic signals above the noise level [90]. In addition, the best topologies are often resolved when estimates are based on a combination of mitochondrial and nuclear DNA. With these points in mind, we mined GenBank for all available pinniped DNA sequence data to infer a phylogeny based on the largest data set possible. All sequence data were downloaded on January 30, 2006 and mined using the Perl script GenBankStrip v2.0 [91] to retain only those genes that had been sequenced for at least three pinniped species and were longer than 200 bp (except for tRNA genes, which had to be longer than 50 bp). For the 52 genes meeting these criteria (see Table 4), matching sequences for exemplars from Canidae (either Canis lupus or, on one occasion, C. latrans) and/or Ursidae (usually Ursus arctos, but also U. americanus or U. maritimus as needed) were downloaded for outgroup analysis.
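The retention criteria described above can be illustrated with a minimal Python sketch. It is not the GenBankStrip script itself: the function name, the record layout (gene, species, length in bp), and the reading that only sequences exceeding the length threshold count towards the three-species minimum are all assumptions made for illustration.

```python
from collections import defaultdict

MIN_SPECIES = 3          # gene must be sequenced for at least three species
MIN_LEN_DEFAULT = 200    # bp, sequences must be longer than this
MIN_LEN_TRNA = 50        # bp, relaxed threshold for tRNA genes


def select_genes(records, trna_genes=frozenset()):
    """Return the sorted names of genes passing the retention criteria.

    `records` is an iterable of (gene, species, length_bp) tuples
    (a hypothetical layout, not the GenBank format).
    """
    species_by_gene = defaultdict(set)
    for gene, species, length in records:
        min_len = MIN_LEN_TRNA if gene in trna_genes else MIN_LEN_DEFAULT
        if length > min_len:
            species_by_gene[gene].add(species)
    return sorted(
        gene for gene, species in species_by_gene.items()
        if len(species) >= MIN_SPECIES
    )
```

In this sketch a gene is dropped either because too few species carry it or because too few of its sequences are long enough; the original script may apply the two tests in a different order.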

Table 4 Genetic sequences used in this study with their inferred models of evolution.

Sequences in each data set were aligned using ClustalW [92] or with transAlign [93] in combination with ClustalW for the protein-coding sequences, and improved manually where needed. Thereafter, each aligned data set was passed through the Perl script seqCleaner v1.0.2 [91] to standardize the species names, to eliminate inferior sequences (i.e., those with > 5% Ns), and to ensure that all sequences overlapped pairwise by at least 100 bps (or 25 bps for the tRNA genes). Note that although species names were standardized according to Wilson and Reeder [94] for the analyses, those used in the text for Phocini follow the currently accepted International Commission of Zoological Nomenclature (ICZN) taxonomy, which recognizes the five genera Halichoerus, Histriophoca, Pagophilus, Phoca, and Pusa.

The final data set of 52 genes (Table 4) comprises 26818 bps in total, or an average of 515.7 bp per gene (range = 68–1980 bps). On average, each gene was sampled for 11.2 species (range = 3–35); however, only an average of 5.5 species per nuclear gene were available for study. Two genes, LYZ and exon 29 of APOB, contained fewer than three pinniped species and, as such, were uninformative for resolving pinniped interrelationships. However, they were still retained to determine times of divergence. Accession numbers for all sequences used in the final data set are provided as supplementary material (Additional file 1).

The final data set is dominated by mitochondrial genes, which form a single locus due to their common inheritance and general lack of recombination. As such, it must be kept in mind that all the resulting topologies (be they derived in a supertree or supermatrix framework) and divergence times could be biased by any peculiarities related to mitochondrial sequence data (e.g., introgression or linkage) or simply the disproportionately large amount of mitochondrial data. However, the data set represents the "current systematic database" for pinnipeds and so the best possible current data source from which to infer their phylogenetic relationships. To assess the impact of this potential source of bias, we performed a second supertree analysis where all mtDNA genes were combined to form a single source tree (yielding 12 source trees in total). Nevertheless, the collection of additional nuclear markers is desperately needed for this group.

The final data set used for the phylogenetic analyses, together with the supertree and supermatrix trees, is freely available from TreeBASE [95] (study accession number S1911, matrix accession numbers M3516-M3518).

Phylogeny reconstruction and supertree analysis

Our general approach to infer the phylogeny of the pinnipeds involved a divide-and-conquer strategy in which individual gene trees were determined using the best possible methodology for each and then combined as a supertree. Compared to a simultaneous analysis of the multigene "supermatrix", this procedure has been argued to potentially account better for the differential models of evolution that might be present [96] and, for extremely large matrices, looks to be a faster analytical method without any appreciable loss of accuracy [97]. Although the use of mixed models is possible in both maximum likelihood (ML, [98]) and Bayesian frameworks, the accuracy of the resulting tree, at least in a Bayesian framework, has recently been called into question [99], especially when reasonable levels of conflict exist between the different data partitions [100]. Furthermore, Jeffroy et al. [101] have also recently argued that trees derived from multigene, phylogenomic data sets should be treated more cautiously than those from single-gene analyses given that the systematic biases inherent to phylogeny reconstruction become more apparent with larger data sets. Nevertheless, in light of the fierce criticism that the supertree approach has attracted (e.g., [102, 103], but see [104, 105]), we also conducted ML and Bayesian inference (BI) analyses of the concatenated supermatrix to help identify especially problematic regions of the pinniped tree as part of a global congruence framework [50] and to add to the growing body of studies comparing phylogenetic inference under these two frameworks (e.g., [15, 106]).

For the supertree analyses, we used PHYML [106] to determine the ML tree for each of the 50 phylogenetically informative genes after determining their optimal model of evolution according to either AIC or AICc (as appropriate, the latter being a version of the AIC corrected for small sample sizes) using MrAIC [107] and PHYML [106] (Table 4). The 50 gene trees were then used to build a weighted supertree of the group using matrix representation with parsimony (MRP, [48, 49]). In so doing, we have assumed that each gene tree forms an independent unit in our preferred supertree, something that is admittedly debatable for the mitochondrial genes and especially the very small tRNA genes. However, in the absence of any robust linkage information, this assumption seemed more justifiable and objective than the defining of gene partitions based on assumed linkage or for purely practical considerations (e.g., concatenating all the tRNA genes because of their small size). Nonetheless, the sensitivity of these assumptions was assessed using the second supertree in which all mtDNA genes formed a single source tree.
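The model-selection criterion referred to above can be written out explicitly. The following Python sketch uses the standard definitions AIC = 2k − 2 ln L and AICc = AIC + 2k(k + 1)/(n − k − 1); it is not taken from MrAIC, and the function names, the candidate-model dictionary, and the use of alignment length as the sample size n are assumptions for illustration.

```python
def aic(log_likelihood, k):
    """Akaike information criterion for a model with k free parameters."""
    return 2 * k - 2 * log_likelihood


def aicc(log_likelihood, k, n):
    """AIC with the small-sample correction; n is the sample size
    (assumed here to be the number of alignment sites)."""
    if n - k - 1 <= 0:
        raise ValueError("AICc is undefined when n <= k + 1")
    return aic(log_likelihood, k) + (2 * k * (k + 1)) / (n - k - 1)


def select_model(candidates, n_sites, small_sample=False):
    """Pick the candidate with the lowest score.

    `candidates` maps a model name to (log_likelihood, n_parameters).
    Returns (best_model_name, scores_by_model).
    """
    scores = {}
    for name, (lnl, k) in candidates.items():
        scores[name] = aicc(lnl, k, n_sites) if small_sample else aic(lnl, k)
    best = min(scores, key=scores.get)
    return best, scores
```

The correction term vanishes as n grows, so the two criteria agree for long alignments and differ mainly for short genes such as the tRNAs.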

All gene trees were encoded for the MRP analysis using semi-rooted coding [108], whereby only those trees with either a canid and/or ursid outgroup taxon and where the pinnipeds were reconstructed as being monophyletic were held to be rooted. Furthermore, the individual MRP characters, which correspond to a particular node on a gene tree, were weighted according to the bootstrap frequency [109] of that node, as determined using PHYML and based on 1000 replicates. This procedure has been demonstrated to increase the accuracy of MRP supertree construction in simulation [110]. The weighted parsimony analysis of the resulting MRP matrix was accomplished using a branch-and-bound search in PAUP* v4.0b10 [111], with Canidae and Ursidae being specified as a paraphyletic outgroup. Monachus tropicalis, for which no molecular data exist, was added to the supertree manually as the sister species of M. schauinslandi (following [4, 23]).
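The weighted matrix representation described above can be sketched as follows. This Python example implements standard MRP coding only (members of a node scored 1, other taxa in that source tree 0, taxa absent from the tree ?), with one bootstrap weight per character; it does not reproduce the semi-rooted treatment of unrooted trees, and the input layout and function name are assumptions for illustration.

```python
def mrp_matrix(source_trees, all_taxa):
    """Encode source trees as a weighted MRP matrix.

    Each source tree is a dict with:
      "taxa":   set of taxon names present in the tree
      "clades": list of (frozenset_of_taxa, bootstrap_support) per node
    Returns (rows, weights): rows maps taxon -> character string,
    weights holds one bootstrap weight per matrix column.
    """
    columns = []
    weights = []
    for tree in source_trees:
        for clade, support in tree["clades"]:
            column = {}
            for taxon in all_taxa:
                if taxon not in tree["taxa"]:
                    column[taxon] = "?"   # taxon missing from this tree
                elif taxon in clade:
                    column[taxon] = "1"   # descendant of the node
                else:
                    column[taxon] = "0"   # in the tree, outside the node
            columns.append(column)
            weights.append(support)
    rows = {t: "".join(col[t] for col in columns) for t in all_taxa}
    return rows, weights
```

The resulting rows and weights would then be written to a parsimony program's input format with a per-character weight set; that export step is omitted here.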

Support for both supertrees and the relationships in them was quantified with the supertree-specific rQS index [51, 52], which compares the topology of the supertree to that of each of the source trees contributing to it. As such, it is preferable to conventional, character-based support measures such as Bremer support [112] and the bootstrap, which are invalid in this context given that MRP characters for a given source tree are non-independent. Values for rQS range from +1 to -1, with the two values indicating that a given node is directly supported or directly contradicted by all source trees, respectively. The rQS value for the entire tree is simply the average of all the nodal rQS values. Previous applications of the rQS index show that it often tends to negative values [51, 52, 113], indicating that more conflict than agreement generally exists among a set of source trees for a given node. As such, positive values of rQS can be taken to indicate good support in the sense that more source trees support the relationship than contradict it.
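The logic of the index described above can be sketched in Python. This is a simplified illustration rather than the published implementation: it treats all trees as rooted sets of clades, scores a source tree +1 if it contains the supertree node (after pruning to shared taxa), -1 if it contains an incompatible clade, and 0 otherwise, and divides the sum by the number of source trees. The function names and data layout are assumptions.

```python
def node_rqs(clade, source_trees):
    """Simplified rQS for one supertree clade.

    `clade` is a frozenset of taxa; `source_trees` is a list of
    (taxa, clades) pairs, where taxa is a set and clades is a set of
    frozensets giving the rooted clades of that source tree.
    """
    score = 0
    for taxa, clades in source_trees:
        restricted = clade & taxa  # prune the node to the shared taxa
        if len(restricted) < 2 or restricted == taxa:
            continue  # node is uninformative for this source tree
        if restricted in clades:
            score += 1  # source tree contains the node
        elif any(
            (c & restricted) and not c <= restricted and not restricted <= c
            for c in clades
        ):
            score -= 1  # an overlapping, non-nested clade contradicts it
    return score / len(source_trees)


def tree_rqs(supertree_clades, source_trees):
    """rQS for the whole tree: the mean of the nodal values."""
    values = [node_rqs(c, source_trees) for c in supertree_clades]
    return sum(values) / len(values)
```

Under this scheme a node that one source tree supports and another contradicts scores zero, which is why positive values can be read as net support.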

The individual gene data sets were also concatenated to form a single supermatrix that was analyzed using both partitioned ML and BI methods. ML analyses used RAxML VI-HPC v2.2.3 [114]. A GTR + G model was assumed for the data using the CAT approximation of the gamma distribution, with the model parameters being allowed to vary independently for each gene. CAT is both a fast approximation of the gamma model (due to its lower computational and memory costs) and one that appears to yield better log likelihood scores even when calculated under a real gamma model [115], and therefore is ideally suited to large, computationally intensive data matrices such as ours. The ML tree was taken to be the optimal tree over 100 replicates, for which nodal support was estimated using the bootstrap with 1000 replicates and search parameters matching those for the optimality search.

BI used MrBayes v3.1.2 [116], with the individual models specified for each individual gene matching the optimal model determined in the gene-tree analyses as closely as possible. Otherwise, flat priors were used in all cases. Searches employed an MCMC algorithm with two separate runs, each with four chains that were run for 10000000 generations and with the first 5000000 generations being discarded as burn-in. Trees were sampled every 5000 generations to derive the final BI tree and estimates of the posterior probabilities.

Divergence date estimations

Following Bininda-Emonds et al. [117], divergence times on the supertree only were determined using a combination of fossil calibration points and molecular dates under the assumption of a local molecular clock (see [118]). As a first step, the optimal model of evolution for all 52 genes was (re)determined using an AIC in ModelTEST v3.6 [119] in combination with PAUP*, with the appropriately pruned supertree topology being used as the reference tree in place of the default NJ tree. This combination was used here in place of the previous MrAIC/PHYML combination largely because it can be used to test for the applicability of a molecular clock (through PAUP*) using a likelihood-ratio test. The small taxonomic distribution meant that all but six genes (CYP1A1, MT-ND4, MT-ND5, MT-RNR2, OB, and MT-TQ) evolved according to a molecular clock.
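The likelihood-ratio test of the molecular clock mentioned above has a simple general form, sketched here in Python. The statistic is twice the difference in log-likelihood between the unconstrained and clock-constrained trees, compared against a chi-square distribution with n − 2 degrees of freedom for n taxa. The function name is hypothetical, and the critical value is supplied by the caller (for example from a chi-square table) rather than computed, to keep the sketch free of dependencies.

```python
def clock_lrt(lnl_clock, lnl_unconstrained, n_taxa, critical_value):
    """Likelihood-ratio test of a strict molecular clock.

    Returns (statistic, degrees_of_freedom, clock_rejected).
    `critical_value` is the chi-square critical value for the chosen
    significance level and n_taxa - 2 degrees of freedom.
    """
    if n_taxa < 3:
        raise ValueError("the test needs at least three taxa")
    # The unconstrained model has n_taxa - 2 more free branch lengths.
    df = n_taxa - 2
    statistic = 2.0 * (lnl_unconstrained - lnl_clock)
    return statistic, df, statistic > critical_value
```

With few taxa per gene the test has few degrees of freedom and little power to reject the clock, which is consistent with the observation above that most genes were not found to depart from clock-like behaviour.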

Thereafter, we used PAUP* to fit the sequence data for each gene to the (pruned) supertree topology under the optimal model in a ML framework. In line with Purvis' [118] local-clock model, the relative branch lengths for each gene tree relative to the topology of the supertree were determined using the Perl script relDate v2.2.1 [91]. Only the gene trees for the clock-like genes were considered to be rooted and relative branch lengths were calculated with respect to ancestral nodes only (and not also with respect to daughter nodes).

Divergence times were then determined by calibrating the relative branch lengths for each gene tree using a set of fossil dates (Table 5). For a given node, the initial divergence date was taken to be the maximum of 1) the median of all fossil plus molecular estimates and 2) the fossil estimate. In this way, the fossil estimate acts as a minimum age constraint that can overrule the molecular estimates. Upper and lower bounds on any given date estimate took the form of the 95% confidence interval derived from all individual gene and/or fossil estimates for that node. Although error in the branch-length estimation for the individual gene trees can also contribute to uncertainty in the final date estimates [120], it is likely to be less important than the variation present between the different genes themselves. However, together with uncertainties in the fossil dates, it cannot be excluded that our confidence intervals are underestimates of the true values.
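The rule for combining estimates at a node can be made concrete with a short Python sketch. It implements only the stated rule (the larger of the median of all estimates and the fossil date); the function name is hypothetical, the input values in any usage are illustrative rather than data from this study, and the confidence-interval calculation is not reproduced.

```python
from statistics import median


def node_date(molecular_estimates, fossil_estimate=None):
    """Initial divergence date (mya) for one node.

    The date is the median of all molecular plus fossil estimates,
    unless the fossil estimate is older, in which case the fossil
    date is used as a minimum age.
    """
    estimates = list(molecular_estimates)
    if fossil_estimate is not None:
        estimates.append(fossil_estimate)
    if not estimates:
        raise ValueError("node has no date estimates")
    date = median(estimates)
    if fossil_estimate is not None:
        date = max(date, fossil_estimate)
    return date
```

For example, molecular estimates of 5.0 and 6.0 mya with a 9.0 mya fossil give a median of 6.0, so the fossil date of 9.0 mya is returned instead; without a fossil, the median of the molecular estimates stands.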

Table 5 Fossil calibrations used to anchor molecular date estimates.

Finally, the Perl script chronoGrapher v1.3.3 [91] was used to correct for any negative branch lengths and simultaneously to derive a divergence-time estimate for the single node lacking an initial estimate (that linking Monachus schauinslandi and M. tropicalis). The date for this latter node was interpolated from the dates of up to five of its ancestral nodes based on the relative number of species descended from each node, assuming a constant birth model (see [117]).

More details regarding this dating procedure, including its strengths and weaknesses with respect to other relaxed molecular clock methods (recently reviewed in [121]) can be found in Bininda-Emonds et al. [117].

The Bayesian relaxed molecular clock method implemented by multidivtime [122, 123] was also used to calculate divergence dates from the supermatrix data fitted to the preferred supertree topology. General methodology followed Rutschmann [124], with maximum likelihood parameters estimated using PAML version 3.15 [125]. Incomplete overlap of sequences between taxa (in particular the outgroup sequence(s) not being represented in every partition) meant that model partitioning by gene was impossible; instead, a single F84 + gamma model was applied to the entire supermatrix. The root prior rttm (the mean of the prior distribution for the time from the ingroup root to the tips; in other words, the age of the ursid-pinniped split) was specified as 19.5 mya, with the remaining constraints the same as in the supertree dating analysis (Table 5). Other multidivtime parameters were calculated following the recommendations of Rutschmann [124]: rtrate (mean of the prior distribution for the rate at the root node) = X/rttm, where X is the median amount of evolution from the root to tips; rtratesd (standard deviation of rtrate) = 0.5 × rtrate; brownmean (mean of the prior distribution for the autocorrelation parameter, ν) = 1/rttm; brownsd (standard deviation of brownmean) = brownmean. Three independent multidivtime analyses were run for 1 × 10^6 cycles, with samples taken every 100 cycles after a burn-in period of 1 × 10^5 cycles. The dates presented here are mean values for the three runs. The multidivtime analyses were then repeated using only the mitochondrial genes to investigate whether the inclusion of nuclear genes greatly altered the estimated divergence dates.
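The prior settings listed above follow directly from rttm and X, as the following Python sketch shows. The function name is hypothetical and the value of X used in the usage note is purely illustrative, since the median root-to-tip distance is not reported here.

```python
def multidivtime_priors(rttm, median_root_to_tip):
    """Derive the multidivtime prior settings from the root age (rttm)
    and the median root-to-tip amount of evolution (X)."""
    rtrate = median_root_to_tip / rttm   # mean prior rate at the root
    brownmean = 1.0 / rttm               # mean prior for autocorrelation
    return {
        "rttm": rttm,
        "rtrate": rtrate,
        "rtratesd": 0.5 * rtrate,        # half the mean root rate
        "brownmean": brownmean,
        "brownsd": brownmean,            # set equal to brownmean
    }
```

With rttm = 19.5 and an assumed X of 0.39 substitutions per site, this gives rtrate = 0.02, rtratesd = 0.01, and brownmean = brownsd ≈ 0.0513.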