Background

Understanding the diversification history of species-rich clades is a major goal in evolutionary biology [1] as they provide important insights into the evolution of life on earth. With over 10,000 plant species, Magnoliidae, sensu Cantino et al. [2], are the largest clade of flowering plants outside monocots and eudicots and comprise four orders: Canellales (123 spp.), Laurales (3874 spp., incl. avocado), Magnoliales (2978 spp., incl. magnolias), and Piperales (3190 spp., incl. black pepper). Investigating the diversification of Magnoliidae is of primary importance to provide guidance for future hypotheses on the drivers of diversification of flowering plants as a whole. In addition, many organisms (incl. various butterfly and beetle groups) are highly dependent on this group for feeding or reproduction [3,4], and magnoliid species are an important part of tropical ecosystems [5,6]. As a consequence, investigating their tempo and mode of diversification will provide key knowledge to understand not only the evolutionary history of Magnoliidae as a whole, but also that of these related groups [7] and of the environments they live in [8].

It has often been suggested that the biology of angiosperms has a great influence on their own diversification [9,10]. The biology of Magnoliidae differs from that of eudicots and monocots in several respects. For instance, most members of Magnoliidae are mainly pollinated by beetles, flies, or thrips, while, in contrast to eudicots and monocots, bee or wind pollination are rare [3,11]. In addition, floral and vegetative morphology of Magnoliidae is highly variable and several of these variable traits have been shown to be associated with variation of diversification rates in other flowering plant lineages [10].

Previous molecular dating studies at large (e.g., angiosperms) or narrow (e.g., a particular family) taxonomic scale have suggested that the crown node age of Magnoliidae is older than the crown nodes of both eudicots and monocots, ranging from 240.16 to 85.07 Myr [12,13], and that several of the 19 families of Magnoliidae appear to have originated before the end of the Cretaceous [13-22]. In addition, the ages of numerous nodes within Magnoliidae have yet to be estimated [23]. In parallel, few studies have focused on the diversification of the group, being either at the scale of all angiosperms [24] or within families [25,26].

In the present study, we provide for the first time a complete time scale for Magnoliidae at the familial level. We then use this new time scale to test for significant diversification rate shifts during the evolutionary history of the clade. To do so, we take advantage of a recent improvement in the phylogeny of the group [16,27-35] and a new revision of the rich fossil record of Magnoliidae providing reliable minimum-age calibration points [36]. In order to conduct these analyses for nodes above and below the family level, we used a dataset including more than 75% of the existing genera and 12 molecular markers from the three plant genomes (6 plastid, 4 mitochondrial, 2 nuclear).

Results

Temporal analyses

All runs in each of the five BEAST analyses with different calibration scheme (angio-130, −140, −150, −170 and −200) converged effectively. The concatenated post-burn-in samples of the four runs in each analyses showed final effective sampling sizes (ESS) of the likelihood far above 200 [see Additional file 1]. Except for the position of Siparunaceae (Laurales) and Magnoliaceae (Magnoliales), which remained unresolved in all analyses, the great majority of the relationships among the families received high support values [see Additional file 2] and is similar in all analyses [see Additional file 3].

Conducting BEAST analyses without molecular data highlighted the non-uniformity of the temporal constraints effectively used in the analyses (results not shown). However, no effective temporal priors violated the paleontological information used to calibrate the present analyses. In addition, the temporal posterior distributions of calibrated nodes obtained with the molecular data differ from the effective temporal prior distributions obtained without these data. These mismatches for calibrated nodes suggest that molecular data are well informative.

The 95% credibility intervals of the ages obtained with BEAST are larger than those obtained with r8s (Figure 1). When the maximum age of the root varied, only the ages estimated for the crown node of Magnoliidae, and the splits between Canellales and Piperales, and between Laurales and Magnoliales, are significantly different in the BEAST analyses (Figure 1A). On the contrary, in the PL analyses, the majority of age estimates are significantly different when the maximum age of the root is changed (Figure 1B).

Figure 1
figure 1

Age estimates for the root and different nodes within the tree of Magnoliidae. Estimates are in million years, and the root is constrained with a maximum age. Names of families refer to the crown node ages except when SL (stem lineage) is mentioned. A, BEAST analyses with mean age estimates and 95% credibility intervals. B, r8s (penalized likelihood) analyses with best-scoring ML tree age and 95% credibility intervals. *The stem lineage of Siparunaceae, in the BEAST analysis with the maximum age of angiosperms set to 140 million years, did not correspond to the same node in other analyses. Abbreviation: SL, stem lineage.

In general, the ages estimated in BEAST and r8s were similar (Figure 1, Table 1). In the present paper, we take into account the overlapping 95% credibility intervals of all 10 analyses (see Discussion). Our results support an origin of extant Magnoliidae (crown node) between 178.78 and 126.82 Myr, that is, in the Lower Cretaceous or the Jurassic. The crown nodes of Canellales, Laurales, Magnoliales, and Piperales are dated to 143.18-125.90 Myr, 165.62-112.05 Myr, 164.09-114.75 Myr, and 158.11-88.15 Myr, respectively. By the end of the Cretaceous (66 Myr), most families (crown nodes) were probably present, with the exception of Canellaceae, Myristicaceae and Winteraceae. The oldest families in terms of crown node age are Aristolochiaceae and Calycanthaceae (BEAST: 132.66-52.38 Myr, 119.79-91.65 Myr, respectively) or Aristolochiaceae and Lauraceae (PL: 147.26-101.83 Myr, 146.61-97.3 Myr, respectively), depending on the relaxed clock method used (Figure 1, Table 1).

Table 1 Age estimates for selected nodes of Magnoliidae

Diversification analyses

Our MEDUSA analyses on 1000 posterior BEAST trees of the angio-140 and angio-200 calibration schemes resulted in very similar scenarios for the mode of diversification within Magnoliidae (Figures 2A, B). The two maximum clade credibility trees used to summarize results are identical in terms of relationships. We identified six to 13 significant diversification rate shifts across Magnoliidae. In both cases, models including nine shifts were the most often selected among the 1000 trees sampled (Figure 2). The mean of the initial net diversification rate (speciation minus extinction) at the crown node of Magnoliidae is estimated to 0.0401 ± 0.0138 species per million years (sp.myr−1) in angio-140, and 0.0342 ± 0.0134 sp.myr−1 in angio-200. When considering shifts present in more than 50% of the trees (Figure 2), five main shifts in net diversification rates within Magnoliidae are identified (the corresponding nodes were all supported by posterior probabilities of 1.0 and are numbered on Figure 2). Three are decreases and two are increases: 1) at the crown node of the clade of Piperaceae and Saururaceae (magnitude of shift: −0.0052 ± 0.0155 sp.myr−1 in 70% of trees of the angio-140 analysis; −0.0032 ± 0.0130 sp.myr−1 in 68% of trees of the angio-200 analysis); 2) at the crown node of Piperaceae excluding Verhuellia (+0.0317 ± 0.0421 sp.myr−1 in 83% of trees of the angio-140 analysis, +0.0295 ± 0.0265 sp.myr−1 in 80% of trees of the angio-200 analysis); 3) at the crown node of Calycanthaceae (−0.0292 ± 0.0140 sp.myr−1 in 86% of trees of the angio-140 analysis, −0.0238 ± 0.0133 sp.myr−1 in 83% of trees of the angio-200 analysis); 4) at the crown node of the clade of Degeneria and Galbulimima (−0.0366 ± 0.0140 sp.myr−1 in 71% of trees of the angio-140 analysis, −0.0325 ± 0.0130 sp.myr−1 in 90% of trees of the angio-200 analysis); 5) at the stem node of tribe Miliuseae of Annonaceae (+0.1231 ± 0.0306 sp.myr−1 in 71% of trees of the angio-140 analysis, +0.1157 ± 0.0307 sp.myr−1 in 83% of trees of the angio-200 analysis). In addition to those five main shifts, the net diversification rate averaged over the sample of 1000 trees show a general increase both within the Laurales and within the Magnoliales (Figure 2). Our estimates of the relative extinction rate (ratio of extinction over speciation) at the crown node of Magnoliidae are 0.9000 ± 0.1533 (angio-140) and 0.8857 ± 0.1603 (angio-200) [see Additional file 4]. This rate remained relatively high (more than 0.5) in Magnoliidae except in Laurales and part of Piperales.

Figure 2
figure 2

Diversification analyses. Results of the MEDUSA analyses obtained using 1000 posterior trees randomly sampled from the BEAST posterior: A, angio-140, B, angio-200. Topologies are the maximum clade credibility tree of 1000 randomly selected trees (phylogenetic relationships between A and B are identical). Names of terminals refer to compartments defined in the Methods sections and detailed in Additional file 5. Branch colors illustrate the mean net diversification rate (r). Red dots denote significant shifts in r, their size being proportional to their frequency among the 1000 trees tested. Numbered shifts in Figure 2A: 1, crown node of the clade of Piperaceae and Saururaceae; 2, crown node of the clade of Piperaceae excluding Verhuellia; 3, crown node of Calycanthaceae; 4, crown node of the clade of Himantandraceae and Degeneriaceae; 5, crown node of the clade of Miliuseae and Monocarpia. The diagram in the top-right corner represents the frequency of the different model sizes (number of shifts) in the 1000-model collections for the two analyses (angio-140 and angio-200). Abbreviations: sp., species; Myr, million years.

Discussion

Tempo of Magnoliidae evolution

We provide here for the first time a near complete temporal framework for the evolution of Magnoliidae above the generic level. Applying different maximum age constraints to the crown node of angiosperms highlighted the sensitivity of inferred ages estimated for the deepest nodes of Magnoliidae (Figure 1). Although previous studies have recognized a long gap in the fossil record of the angiosperm stem lineage [13,21,37], there is no indisputable argument in favor of one particular maximum age constraint for the crown node of angiosperms. Here we will systematically consider the whole range of age estimates obtained across our five maximum age calibration schemes and both molecular dating approaches used.

We restricted ourselves to conservative fossil age constraints based on phylogenetic analyses of fossil and extant taxa [36]. This led us to consider only 10 out of more than 100 described fossils putatively belonging to Magnoliidae [38]. However, our new age estimates for families and orders of Magnoliidae appear, in general, to be compatible with the putative fossil record attributed to each of these taxa. Future investigations of the phylogenetic placement of other magnoliid fossils will likely provide additional minimum age constraints, which could push back in time several young estimates supported in the present study, and decrease the size of the credibility intervals. This study further illustrates that, just as paleontological dating, molecular dating estimates are often associated with large uncertainties. The latter approach has to be seen as an attempt to reduce the range of the most likely ages for nodes constrained by the age of the fossil record securely placed and dated, and to evaluate the probability of the ages of nodes for which there is no direct fossil record. Because all the ambiguity of our current knowledge has to be taken into account, the molecular dating approach cannot provide exact secure ages, except for exceptionally fossil-rich clades.

Based on this most reliable knowledge in the field of paleobotany of Magnoliidae [36] and chrono-stratigraphy [39], combined with a comprehensive sample of taxa and molecular markers, our data support an origin of extant lineages of Magnoliidae (crown node of the group) during the Jurassic or Lower Cretaceous between 178.78 and 126.82 Myr. This result tend to be older than previously suggested [12,15,21,22,37,40-42]. For example, the angiosperm-wide study of Bell et al. [37], in which 33 magnoliid genera were sampled, estimated the crown node of Magnoliidae to be between 138 and 108 Myr. In another, more recent angiosperm-wide analysis sampling 34 species of Magnoliidae and using a PL approach, Zanne et al. [42] estimated the age of the crown node of Magnoliidae to be 147 Myr without uncertainty associated. Our new estimates for the crown-group age of orders also tend to be older than previously inferred [13,15,16,22,37] (Figure 1). However, some studies have supported much older ages for these nodes [13,14,21,43]. At the familial level, in addition to providing the first estimates for the ages of crown-group Canellaceae (52.34-11.33 Myr), Saururaceae (80.79-46.69 Myr), and Winteraceae (56.73-12.11 Myr; Table 1), the crown nodes of families are also generally dated to an older range of dates than previously estimated [16-20,26,44], with the exception of Atherospermataceae (Laurales) for which we found a younger age than previously suggested [45]. The large amount of combinations of parameters influencing age estimates limits straightforward explanations of these differences with previous studies (e.g., topology used [46]; fossil species and priors used to define and model age constraints [47]; molecular dating method [13]). Despite variation among the ages obtained for the crown ages of families, our results confirm that Magnoliidae diversified into morphologically distinct clades (now identified as families) by the end of the Cretaceous (66 Myr, Figure 1), but probably earlier than thought before [16-20,26,44]. This provides arguments in favor of an earlier diversification of angiosperms altogether, with the presence of more flowering plant lineages in the Cretaceous than previously thought.

Finally, taking into account the whole range of age estimates obtained across our five different calibration schemes, leads to long ranges of age estimates that are compatible with biogeographic scenarios previously suggested in Magnoliidae [17,26,44,45,48-52]. However, viewed in isolation, the different assumptions about the maximum age of angiosperms lead to alternative time scales for the evolutionary history of the families of Magnoliidae. Therefore, biogeographic scenarios will need to be reevaluated in the context of explicit assumptions on the age of the angiosperms.

Mode of diversification of Magnoliidae

The new timetree obtained for Magnoliidae in this study allowed us to detect an average of nine significant diversification rate shifts across lineages, implying that diversification has not been a homogeneous process throughout the history of the clade (Figure 2). Explaining the present day diversity of species-rich clades has generally required several diversification rate shifts, but previous studies using the same statistical approach in other large angiosperm clades found, in general, fewer shifts than in the present analysis. Within eudicots, Beaulieu and Donoghue [53] detected six shifts in Campanulidae (35,000 spp.), Koenen et al. [54] detected nine shifts in Fabaceae (19,500 spp.), Xi et al. [55] found five significant shifts in Malpighiales (16,000 spp.), and Arakaki et al. [8] found seven shifts in Cactaceae (1850 spp.). Within monocots, Escudero and Hipp [56] supported three shifts in the family Cyperaceae (5480 spp.) while Baker and Couvreur [57] identified up to 13 shifts in palms (2500 spp.). Despite the dependence of the number of shifts detected on the taxonomic level of tips in the chronograms used (MEDUSA cannot detect shifts within supraspecific terminal compartments), it seems that Magnoliidae experienced a high variation in diversification rates. Because of their longer evolutionary history in comparison with these other clades and because their fossil record supports a global distribution during the Cretaceous and the Paleogene [38], Magnoliidae as a whole may have been affected by more events potentially influencing the speciation and extinction rates (e.g., climatic shifts [58]; variation in available area [10]; variation in geographic distribution [59]; dynamic of diversification of associated pollinators [60]).

Establishing the causes affecting diversification-rate variation can be difficult. There is a large diversity of intrinsic and extrinsic traits known to influence speciation and extinction [10], all of them being potentially correlated to each other [61]. The influence of a trait on the birth and death of species is dependent on other taxa, other traits of the same organism, and its physical environment [62], implying that a trait could have different effects on different clades. Finally, the actual shift of diversification may happen several nodes after the evolution of an influencing trait. One explanation could be that an isolated trait will influence diversification in combination with other characters, the effect appearing when the entire set of traits needed is present [53]. For these reasons, the investigation of the correlation between traits and diversification rates will need to take into account as many potential factors as possible [63].

The older estimates of the ages in the “angio-200” hypothesis led to slightly lower absolute rates of diversification than in the “angio-140” hypothesis, as expected. Because the general pattern (number and position of shifts) is the same in both analyses, in the present paragraph we refer only to rates of the angio-140 analysis. The background net diversification rate obtained in the tree of Magnoliidae (0.0401 ± 0.0138 sp.myr−1) is comparable to those found in angiosperms [14,24]. In the present study, the lowest net diversification rates, resulting from shifts 3 and 4 (Figure 2), were found in the crown group of Calycanthaceae (0.0175 ± 0.0066 sp.myr−1), and in the branches of Degeneriaceae and Himantandraceae (0.0057 ± 0.0088 sp.myr−1). These three families share distinctive floral features not or seldom found in other Magnoliidae groups, such as the presence of inner staminodes that cover the stigma at the end of the female phase [64], and spiral phyllotaxy of all floral organs [65,66]. However, it is difficult to link these two characters with mechanisms sustaining a low diversification rate (influence on speciation and/or extinction). Within Calycanthaceae, the low rate of net diversification is associated with a low speciation rate for the entire clade (0.0182 sp.myr−1) and near-zero extinction (0.0007 sp.myr−1). The absence of extinction would seem contradictory with the presence of several fossils in the Cretaceous [17] and could be an artefact of the method which does not take into account the extinct diversity (see below). However, because the phylogenetic position of these fossils within the family has not been tested (except for Virginianthus calycanthoides [67]), it is difficult to draw conclusions about extinction within the family. The pollination systems of Calycanthaceae involve trapping of a large variety of beetles and thrips [68,69]. In comparison with specific pollination systems involving a unique relationship for the two partners, this generalist interaction is not favorable for genetic isolation promoting diversification [70]. In addition, Calycanthaceae have a temperate distribution, except for the monospecific genus Idiospermum growing in a restricted area in north-east Australia. The low ecological limits on species richness in temperate areas [10] could provide an explanation for the low diversification of the lineage leading to the genera Calycanthus and Chimonanthus. On the other hand, the apparent lack of diversification of the lineage leading to Idiospermum could be explained by different causes. Individuals of this species are long-living trees [71] involving low fixation rates in diverging populations. The genus is restricted to the very wet humid tropical lowland rainforests of Australia [72]. Idiospermum presents one of the heaviest seeds among Australian plants and is therefore probably not dispersed by animals, limiting its capacity of dispersion [71,73]. If this low capacity of dispersion is ancestral in the lineage leading to the extant species of the genus, it restrained the number of available areas promoting diversification [10]. This latter explanation might be generalized for all Calycanthaceae as the seeds of the family contain secondary metabolites that are toxic at least for mammals [74].

The highest diversification rate inferred in this study was found on the branch leading to the terminal compartment Miliuseae in subfamily Malmeoideae of Annonaceae (0.1527 ± 0.0548 sp.myr−1). A shift of +0.1231 ± 0.0306 sp.myr−1 at the node sustaining the branch of this clade was present in 71% of the trees tested and the associated rate of speciation on the branch is on average equal to 0.4558 sp.myr−1. Couvreur et al. [26] found that major clades in Annonaceae have undergone different rates of diversification, with subfamily Malmeoideae (previously the Short-Branch Clade) having significantly higher rates than subfamily Annonoideae (previously the Long-Branch Clade). However, analyses of family-level lineage-through-time plots (LTTs) did not detect significant shits in diversification rates during most of the evolutionary history of Annonaceae [26]. Erkens et al. [25], using a tree with fewer taxa and two alternative approaches to detect rate shifts, identified up to three diversification rate shifts in Annonaceae, including the one we detected here at the crown of Miliuseae. Reasons for an increase in diversification within Miliuseae are not yet well known but could be related to founder effects after dispersal into South-East Asia or distinctive pollen characters [25].

Canellales and Piperales have, on average, higher speciation rates than Magnoliales and Laurales. Within Piperales, this may be explained by widespread herbaceous habits, resulting in faster life history (shorter generation time) and higher fixation rates in divergent populations [75]. In addition, within this order, Aristolochiaceae typically have zygomorphic flowers, which has been shown to have a positive impact on diversification [70]. However, Canellales and Piperales also share the highest extinction rates, making Laurales and Magnoliales the most productive orders in terms of net diversification. Differences in extinction rates are difficult to explain. For instance, there are no major differences in pollination systems among the four orders [64], and their current geographic distributions are similar [76]. However, Doyle and Endress [77] found several synapomorphies for the clade of Laurales and Magnoliales, specifically the presence of more than two whorls of stamens and more than one whorl or series of carpels. These characters might have played a role in relation to pollination, but the underlying mechanisms remain unknown.

Finally, even though the MEDUSA approach represents a significant conceptual improvement over previous models, it still requires several important assumptions that may have influenced our results. First, MEDUSA does not implement diversification models in which the extinction rate can be higher than speciation rates. Second, the rates are assumed to be constant in each part of the tree, and only abrupt variations are evaluated, probably not in accordance with the biological reality in which rates could gradually fluctuate. Morlon et al., [78] recently developed an approach specifically to relax these two assumptions, but it is not implemented yet to detect shifts automatically nor to analyze higher-level phylogenies with terminally unresolved clades. In addition, Rabosky [79] recently developed a new Bayesian framework to detect multiple diversification processes in a tree while allowing the speciation rate to vary continuously through time, but it is not available yet for a set of multiple trees and therefore cannot take phylogenetic and dating uncertainty into account. Last, because the fossil record is not taken into account in the diversification rate analyses, very low rates of speciation and extremely low extinction rates are always allocated to the old lineages with few species (long branches in chronograms; [1]), which have probably been more diverse in the past (e.g., Idiospermum). New research efforts are currently being made in this area [80,81]. Therefore, our understanding of Magnoliidae diversification may further improve in the future when more realistic methods and more complete phylogenies are available.

Conclusions

The present study suggests that Magnoliidae began to diversify somewhere between the Toarcian (Early Jurassic) and the Barremian (Early Cretaceous), or 178.78-126.82 Myr ago. Several key nodes within Magnoliidae are dated here for the first time. In general, our age estimates suggest a possibly older diversification of the group than previously inferred. The rich fossil record of Magnoliidae may eventually provide additional calibration points to refine the time scale proposed here. However, considerable work remains to be done to securely relate this extinct diversity to the extant one, a task for which an integrated morphological dataset will be essential. The tempo and mode of diversification within this clade appears to be characterized by several increases and decreases of diversification rates, suggesting that Magnoliidae have not undergone constant diversification but have instead been shaped by alternative, yet to be determined evolutionary processes. Our new dated trees provide a solid basis for future biogeographical studies and robust statistical tests of correlation among intrinsic traits, extrinsic factors, and diversification rates within Magnoliidae.

Methods

Molecular dataset

In order to conduct the molecular dating analyses we used the same 12-marker molecular dataset as [33]. This matrix includes 12 coding and non-coding markers from the three genomes: atpB, matK, trnL intron, trnL-trnF spacer, ndhF, rbcL from the chloroplast; atp1, matR, mtSSU, mtLSU from the mitochondrion; and 18S rDNA and 26S rDNA from the nucleus. In this dataset, we used an exemplar approach, in which each genus was represented by one species. The problematic parasitic family Hydnoraceae was excluded because its exact position within Piperales remains difficult to assess [16,33]. All remaining 19 families of Magnoliidae are represented, and more than 75% of the genera were sampled (198 genera out of 262). In addition, we included 23 outgroup taxa sampled from early-diverging angiosperms (Amborellales, Nymphaeales, Austrobaileyales, Chloranthales) as well as eudicots and monocots [see Additional file 3].

Molecular dating analyses

Calibration scheme

All the geological ages presented in this study follow the revised Geological Time Scale of Gradstein et al. [39]. To calibrate the molecular phylogeny we use the calibration scheme proposed by Massoni et al. [36]. This scheme consists of 10 fossils reviewed for both their phylogenetic positions (based exclusively on explicit phylogenetic analyses) and their absolute age based on the latest stratigraphic and geochronological literature (Table 2), resulting in solid minimum age constraints on six nodes of our tree (Figure 3). In addition to these six minimum age constraints, we used two maximum ages. First, the crown node of eudicots was set to a conservative maximum of 126.7 Myr (126.3 ± 0.4 Myr), based on the first appearance of tricolpate pollen grains near the late Barremian-early Aptian boundary [82,83]. This maximum age constraint is justified by the fact that tricolpate pollen is a synapomorphy of eudicots and that the absence of such pollen in well sampled earlier sediments worldwide is well documented. It has been used extensively in previous molecular dating analyses of angiosperms as a whole [14] and of clades nested within eudicots [47]. Second, the crown node of angiosperms was set to five different maximum ages in order to test the sensitivity of the Magnoliidae divergence time scale to this temporal constraint. The age of the crown node of angiosperms is a matter of debate [84]. Several pre-Cretaceous fossils for the angiosperms have been described but their ages and phylogenetic affinities with the angiosperms (stem relatives of angiosperms, crown members of angiosperms, or gymnosperms) are controversial [84]. The oldest fossil pollen confirmed to be angiospermous (either crown or stem), based on columellar exine structure, is dated to the Hauterivian or Valanginian (i.e. 139.4-133.9 Myr [84]). On the other hand, molecular dating studies have generally converged to an age for the crown node of angiosperms between 140 and 200 Myr [12,15,22,37,40,41] with few of them inferring older ages [13,43]. In order to take into account this uncertainty on the crown age of angiosperms, we used five different age constraints: 130 Myr (angio-130 analyses), 140 Myr (angio-140 analyses), 150 Myr (angio-150 analyses), 170 Myr (angio-170 analyses) and 200 Myr (angio-200 analyses) for the maximum age constraint applied to the crown node of angiosperms in all molecular dating analyses.

Table 2 Fossil species used to define the calibration scheme presented in Massoni et al. [36]
Figure 3
figure 3

Chronogram of the angio-140 analysis obtained with BEAST. Maximum clade credibility tree obtained with BEAST and the maximum age constraint for the crown node of angiosperms set to 140 million years. Node bars are 95% credibility intervals. Blue dots symbolize minimum age constraints, and green dots the maximum age constraints applied to crown eudicots and crown angiosperms. The geologic time scale follows Gradstein et al. [39]. Abbreviations: Oligo., Oligocene; Mio., Miocene; Plio., Pliocene; Win., Winteraceae; Can., Canellaceae; Ari., Aristolochiaceae (incl. Lactoris); Sau., Saururaceae; Pip., Piperaceae; Cal., Calycanthaceae; Ath., Atherospermataceae; Her., Hernandiaceae; Lau., Lauraceae; Mon., Monimiaceae; Myr., Myristicaceae; Deg., Degeneriaceae; Him., Himantandraceae; Mag., Magnoliaceae; Eup., Eupomatiaceae; Ann., Annonaceae.

Divergence time estimation

Uncorrelated Lognormal Clock (UCLN) analyses

To evaluate divergence times within Magnoliidae while taking into account phylogenetic uncertainty, we used BEAST v1.7.5 [85] without fixing the tree. In all BEAST analyses, we partitioned our molecular dataset according to the 12 markers (as in Massoni et al. [33]). We used MrModeltest 2.3 [86] to evaluate the best fitted model for each partition. For all partitions, GTR + Gamma was selected as the most appropriate model according to the Akaike Information Criterion, and the evaluation of invariant sites (I) was recommended for all, except for the plastid spacer trnL-trnF. A Birth-Death incomplete sampling prior was specified for the trees [87], and rate heterogeneity was modeled using the UCLN relaxed clock of Drummond et al. [88] with a uniform prior for the mean of the branch rate set between 0 and 1E100. All age constraints were applied using uniform priors (hard minimum and maximum ages) because of the difficulty of parameterizing non-uniform priors, which may involve subjectivity [47]. These uniform distributions were bounded either by the age of the fossil (Table 2) and 1E100 Myr for minimum age constraints, or by the appropriate maximum age (see above) and 0 Myr for maximum age constraints. Because in BEAST the construction of the calibrated tree prior is a multiplicative construction involving the multiplication of the calibration densities (specified by the user) with the tree prior, the effective calibration priors used in analyses can differ from the specified priors [89]. This can cause the effective prior to violate the desired paleontological constraints [90]. We conducted all BEAST analyses without the data in order to estimate the mismatch between the specified and effective calibration priors, but also the effect of data on posterior distributions. Because of initial likelihood problems at the start of the analyses we used a starting tree obtained from the Bayesian analysis of Massoni et al. [33] and rendered ultrametric using Penalized Likelihood (PL) implemented in r8s v1.8 [91]. Because of rounding to a limited number of decimals, branch times in this starting tree summed up to slightly older node ages than its original calibrations. Thus, in order to use this starting tree, we had to apply slightly older (+0.1 Myr) maximum age constraints (126.8, 130.1, 140.1, 150.1, 170.1, and 200.1 Myr). For all analyses, we ran four independent chains of a Markov Chain Monte Carlo procedure for 100 million generations each, sampling parameters and trees every 1000 generations [see Additional file 1]. All analyses were performed on the CIPRES cluster [92]. We evaluated the size of the burnin phase for each run using Tracer v1.5 [93]. The post-burnin posteriors of the four runs of each analysis were then combined using LogCombiner v1.7.5. Because of computational limitations in relation to the large size of generated files, we re-sampled trees and parameters every 10,000 or 15,000 generations [see Additional file 1]. We then used TreeAnnotator v1.7 to select the maximum clade credibility (MCC) tree of each analysis.

Penalized Likelihood (PL) analyses

We also conducted PL analyses for each set of age constraints using r8s v1.8 [91]. The PL relaxed clock assumes some degree of autocorrelation of molecular substitution rates between a parent and its immediate descendants [94]. Although autocorrelated relaxed clock models are also available in Bayesian frameworks, our rationale for using r8s was to test the sensitivity of our age estimates to a fundamentally different approach to molecular dating. In all analyses, we used the best scoring maximum likelihood (ML) phylogram of Massoni et al. [33] obtained with the 12-marker RAxML analysis excluding Hydnora. In order to determine the optimal level of autocorrelation across the tree (smoothing parameter), we conducted cross validation procedures testing 17 different values of smoothing in a range between 0.1 and 10,000,000 for each calibration scheme, using the penalty additive function. For angio-130, angio-140, angio-170 and angio-200 the optimal values were 10, and for angio-150 it was 3.2. In order to provide confidence intervals on age estimates with PL, we conducted non-parametric maximum likelihood bootstrapping of 1000 replicates on the original dataset of Massoni et al. [33] using RAxML v7.3.2 [95] while fixing the tree topology to the best scoring ML tree obtained in this previous study. As RAxML requires the application of the same model to all partitions, we used the GTR + GAMMA + I model for each partition. As mentioned above, this model was selected for all partitions of the present dataset except for the plastid spacer trnL-trnF for which the invariant site was not recommended. Within the resulting 1000-tree collection, only branch lengths varied. We reconstructed the chronograms for both the best ML tree and the 1000 bootstrapped phylograms using the optimal smoothing parameter obtained for the best ML tree and the TN algorithm of r8s. All results were summarized using the software TreeAnnotator v1.7 with the 1000 PL trees as the input file and the ML-PL tree as the target tree.

Diversity dynamics

We used the MEDUSA approach of Alfaro et al. [1] to test for significant diversification rate shifts in an incompletely sampled phylogeny where each terminal is assigned the total number of extant species it represents. MEDUSA detects, using a stepwise AIC approach, significant shifts of diversification rates across a given phylogeny by evaluating the fit of different pairwise birth-death models. In order to select the best model we used the AICc criterion, a birth-death model of diversification, and we allowed the placement of shifts either on stem or on crown nodes. To assess the sensitivity of MEDUSA to age and phylogenetic uncertainty, we conducted the analyses on two collections of 1000 posterior BEAST trees randomly sampled from the post-burnin phase, corresponding to calibration scenarios angio-140 and angio-200. Prior to the analyses, each randomly sampled chronogram was transformed in two ways, using functions in the ape package of R [96]. First, all outgroups of Magnoliidae were pruned. Second, the chronograms were simplified by pruning selected terminal taxa so that each remaining terminal taxon would represent a monophyletic compartment with known extant diversity (see below). All analyses were performed in R using the multiMEDUSA procedure from package MEDUSA v0.93 4.33 available on the website authored by Joseph W. Brown (https://github.com/josephwb/turboMEDUSA).

Extant taxonomic richness data

In order to assign the entire species richness of Magnoliidae to the tips of the phylogeny, it was necessary in several instances to merge tips into a single terminal to represent larger monophyletic groups hereafter referred to as compartments. We proceeded in two steps. First, we maintained genera as tips or created larger compartments, depending on the monophyletic status of genera as tested in previous phylogenetic studies: if the literature supported the monophyly of a genus, this taxon was used as a compartment; if the monophyly of a genus had never been tested before, a compartment including this genus and its sister group was defined (or a larger compartment if the neighborhood relationships were not well supported); if the monophyly of a genus had been challenged by previous studies, we defined a larger compartment to include this genus and all other genera potentially involved in the paraphyly or polyphyly. After this first step, we modified our compartmentalization scheme to take into account the species numbers of genera not sampled in our trees (less than 25% of the total number of accepted genera). If there was enough information in the literature to support an accurate placement (phylogenetic or apomorphy-based) of a missing genus, we added its number of species to that of an earlier defined compartment (generic or supra-generic). In some cases, the missing genus could not be precisely placed within a larger clade of two or more previously defined compartments. As a guideline, we accepted to merge these compartments into a larger one if the number of species would represent more than three percent of the total number of species in the resulting compartment. Otherwise, the missing diversity was ignored in order to maintain enough compartments for conducting a meaningful analysis. In all cases, we defined these supra-generic compartments in such a way that they were present in all 1000 BEAST trees used to conduct the MEDUSA analyses (posterior probabilities [PP] equal to 100%). The resulting compartmentalization scheme consisted in 85 terminal taxa [see Additional file 5]. In total, 31 species (16 genera) were ignored due to unknown phylogenetic placement, representing ca. 0.3% of the 10,209 species of Magnoliidae. For 15 genera (out of 262) the number of species was not clearly indicated in previous publications in which case the Plant List (http://www.theplantlist.org/) was used to estimate the number of extant species currently accepted in the genus. The definition of supra-generic clades and the incorporation of the missing diversity are justified in detail in Additional file 5.

Availability of supporting data

Nexus files including the MCC trees of the BEAST analyses and the PL trees of the r8s analyses (annotated with the 95% credibility intervals of age estimates) are available in Dryad Digital repository, http://dx.doi.org/10.5061/dryad.ct231 [97].