Introduction

Molecular dating is an essential component of contemporary evolutionary studies. The idea that substitutions accumulate in a time-correlated manner in molecular sequences has greatly impacted evolutionary biology since it was proposed in the 1960s [1,2,3,4]. Over the last decades, major breakthroughs in sequencing technologies have allowed the assembly of large molecular datasets to estimate divergence times between species [5,6,7,8]. Such massive datasets pose a computational burden to parameter-rich molecular dating methods that rely on Bayesian Markov chain Monte Carlo (MCMC) sampling, slowing the testing and proposition of evolutionary hypotheses [9,10,11,12]. Because of this, phylogenomic studies have frequently devised alternative strategies to compute biological timescales, including the use of reduced datasets [13,14,15,16,17,18,19] and the summarization of time estimates based on data partitioning schemes [20, 21].

Such limitations prompted the development of rapid methods to date lineage divergences as alternatives to the standard Bayesian molecular dating, hence accelerating evolutionary analysis in the big data era [22, 23]. Like Bayesian approaches, the new methods have their own assumptions, including those related to how substitution rates vary across the phylogenetic tree. Currently, the most frequently used rapid molecular dating approaches are penalized likelihood (PL) [24] and the relative rate framework (RRF) [12, 25]. They have been employed in several branches of the Tree of Life, from prokaryotes to plants and animals [26,27,28,29,30,31,32,33]. Notably, these methodologies are more environmentally friendly than highly parametric Bayesian analyses, as their associated carbon footprints are orders of magnitude smaller [35]. Because of this, they might play an important role in the growing environmental awareness of bioinformatics research, conforming with the green computing standards [34, 35].

Although neither PL nor RRF requires rate constancy, they are fundamentally distinct. PL uses a penalty function to minimize rate changes between adjacent branches globally [24]. Therefore, it assumes autocorrelation of evolutionary rates, which has been suggested as pervasive across the tree of life [36, 37]. A key component of PL is the smoothing parameter (λ), which controls the global level of rate variation and is optimized by a cross-validation method. The lower the value of λ, the greater the rate variation allowed across the phylogeny. PL was first implemented in the r8s software [38], and was later refined to deal with large phylogenies [39, 40]. In turn, RRF minimizes the difference in evolutionary rates of ancestral and descendant lineages individually [12]. This eliminates the need for a global penalty function and still accommodates rate differences between sister lineages [23]. As a result, RRF does not require any additional analytical step, such as the cross-validation procedure, to select an optimal level of rate variation. It is also important to mention that, although the rates estimated by RRF are autocorrelated, RRF deals with lineage rates instead of branch rates [12], the latter being the standard in Bayesian autocorrelated methods [41]. RRF is implemented in the RelTime routine of the software MEGA [42].

As they are currently implemented, PL and RRF also differ in the treatment of calibration information. While PL requires calibration information to be hard-bounded by minimum and/or maximum values [38], RRF via RelTime allows for the use of calibration densities [43]. Additionally, the uncertainty associated with the estimates of node ages is handled differently. PL can be combined with a bootstrap approach to assess uncertainty [38, 44], whereas RelTime adopts an explicit analytical equation to calculate confidence intervals [43]. Both frameworks reduce computational requirements compared to Bayesian relaxed clock methods. Because the algorithms of PL and RRF are different, their results may differ, and their relative performances compared to Bayesian approaches have not yet been evaluated with empirical datasets.

As PL and RRF have been increasingly used to estimate timescales over the last years, it is essential to carry out large-scale evaluation against the popular Bayesian framework. While previous studies investigated both fast dating methods separately [22, 25, 40, 45,46,47,48], a joint assessment of their performance with empirical data is lacking [49]. Moreover, treePL, which is the most popular implementation of PL for large phylogenies, was not extensively compared to any Bayesian method whatsoever, and there is little information on how they behave comparatively with real data. In this regard, the phylogenomic datasets that have been produced in the last years provide the ideal opportunity to investigate the relative performances of rapid and Bayesian methods.

Material and methods

We collected empirical datasets from 23 phylogenomic studies to assess the relative performance of fast dating methods compared to Bayesian methods. Studies were selected based on the availability of Bayesian timetrees or the input files used to carry out Bayesian inference plus molecular sequence alignments deposited in public databases or as supplementary information. Data retrieved comprise DNA and amino acid sequences from diverse taxonomic groups with divergences as old as the Precambrian. The number of sequences ranged from tens to nearly a thousand, and alignment lengths from ~ 5 kb to > 4 Mb. Alignment lengths, data types, number of terminals, calibration information, methodology originally employed, and the labels used to refer to each study, are summarized in Table 1.

Table 1 Detailed information about the phylogenomic datasets analyzed

The original studies have employed a Bayesian relaxed clock methodology as implemented in BEAST, MCMCTree, or PhyloBayes, except for Kuntner et al. (2019), who estimated divergence times using the RRF. In this case, the Bayesian timescale was inferred for the first time. Whenever possible, timetrees were directly obtained from the original works. Otherwise, divergence times were estimated using the input files published. We tried to keep substitution models matching the original studies, but studies that used CAT models of amino acid substitution implemented in PhyloBayes [71] were subjected to model selection in MEGA X [42]. If the original study applied data partitioning with distinct substitution models, we chose the model used in most partitions.

Fast divergence time inference

We used the same alignment and topology as originally employed by the authors to estimate absolute times in RelTime [12, 25] and treePL [40]. Temporal calibration information was also extracted from the studies and applied according to the specificities of each method. To standardize computation, all analyses were carried out on a machine with a 3.2 GHz 6-Core Intel® i7 processor and 64 GB 2667 MHz DDR4 RAM. All branch lengths (in substitutions per site) used by both methods were estimated in MEGA X. RelTime calculations were performed with the command line version of MEGA X, and the confidence intervals (CI) of divergence times were calculated analytically, as implemented by the method.

In treePL, the program was first run using the option ‘prime’ to select the best optimization parameters. Then, a cross-validation procedure was performed to optimize the smoothing parameter values for each dataset [24], totalling 10 optimization iterations and 1017 simulated annealing iterations. The ‘cvstart’ and ‘cvstop’ parameters were set to \(10^{17}\) and \(10^{-19}\), respectively, resulting in 37 smoothing parameter values tested. All analyses were run with the ‘thorough’ option. Confidence intervals of time estimates were calculated from 100 bootstrap replicates summarized in TreeAnnotator [72].
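The count of 37 smoothing values follows if the cross-validation bounds are read as 10^17 and 10^-19 and successive values differ by a factor of 10. The sketch below only illustrates that arithmetic; the function name `smoothing_grid` and the factor-of-10 step are assumptions made for illustration, not settings reported in the text.

```python
# Hypothetical helper: enumerate the smoothing values visited when the
# cross-validation steps from 10**start_exp down to 10**stop_exp.
def smoothing_grid(start_exp=17, stop_exp=-19, step_exp=1):
    """Return the exponents of ten covered by the cross-validation grid.

    Assumption (not stated in the text): consecutive smoothing values
    differ by a constant factor of 10**step_exp.
    """
    return list(range(start_exp, stop_exp - 1, -step_exp))


exponents = smoothing_grid()
lambdas = [10.0 ** e for e in exponents]  # candidate smoothing values
print(len(lambdas))  # 37 candidate values, from 1e17 down to 1e-19
```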

Regarding calibration information, whenever the original studies employed uniform priors, the bounds of the uniform distributions were provided as minimum and maximum boundaries of node age in treePL, while in RelTime, they were set as lower and upper limits of a uniform distribution. When probability distributions other than the uniform were originally used, namely, the normal, lognormal, exponential and skew-t distributions, they were also used in RelTime, except for the skew-t distribution, which is currently unavailable in this software. It was thus approximated by a normal distribution using the sn [73] and fitdistrplus packages [74] in R [75]. As treePL implements only minimum and maximum values as calibrations, we derived minimum and maximum bounds based on the lower 2.5% and upper 97.5% quantiles, respectively, of the density distributions. For the skew-t distribution, we did the same procedure, but using the normal distribution approximated for RelTime.
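As an illustration of how calibration densities can be turned into hard bounds, the following sketch computes the 2.5% and 97.5% quantiles of normal, lognormal and exponential densities using only the Python standard library. The function names, the parameterizations (log-scale mean and standard deviation for the lognormal, mean for the exponential) and the optional `offset` are assumptions made for this example; the original analyses may have parameterized the densities differently.

```python
import math
from statistics import NormalDist


def normal_bounds(mean, sd, lower=0.025, upper=0.975):
    """Quantile-based (min, max) bounds of a normal calibration density."""
    dist = NormalDist(mean, sd)
    return dist.inv_cdf(lower), dist.inv_cdf(upper)


def lognormal_bounds(mu, sigma, offset=0.0, lower=0.025, upper=0.975):
    """Bounds of a lognormal density with log-scale parameters mu, sigma.

    `offset` is a hypothetical shift (e.g. a fossil minimum age).
    """
    lo, hi = normal_bounds(mu, sigma, lower, upper)
    return offset + math.exp(lo), offset + math.exp(hi)


def exponential_bounds(mean, offset=0.0, lower=0.025, upper=0.975):
    """Bounds of an exponential density; quantile(p) = -mean * ln(1 - p)."""
    return (offset - mean * math.log(1.0 - lower),
            offset - mean * math.log(1.0 - upper))


# Example with made-up values: a normal density with mean 100 and SD 10.
min_age, max_age = normal_bounds(100.0, 10.0)
print(round(min_age, 2), round(max_age, 2))
```

The resulting pair can then be supplied as the minimum and maximum age of the calibrated node in a method that accepts only hard bounds.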

Because treePL works with rooted trees, the outgroup was removed before running the analyses. In RelTime, the outgroup was provided only to root the ingroup, but no calibrations were placed within it, and it was later removed from the estimated timetrees.

For the Kuntner et al. (2019) dataset, we inferred a Bayesian timescale in MCMCTree [76, 77] using the same calibration information, employing the independent rates prior with the HKY + G(5) substitution model [78]. The Markov chain Monte Carlo analysis was run twice to check for convergence, and each chain was sampled every 100th cycle until the effective sample size (ESS) values were greater than 200.

Evaluation of relative performance

To contrast RelTime and treePL estimates to those derived with Bayesian methods, we calculated a series of metrics. For Bayesian time estimates, either the means or the medians of the posterior distribution of divergence times were used, depending on which value was reported in the original study. For each dataset, we performed linear regressions of RelTime and treePL estimates against Bayesian estimates. The coefficient of determination (R²) and the slope (β) of the linear regression through the origin were used as summary statistics to assess the strength of the association between fast and Bayesian dating methods.
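A minimal sketch of a regression through the origin is given below, with Bayesian node ages as the predictor and fast-method node ages as the response. The function name is hypothetical, and the uncentered definition of the coefficient of determination is an assumption: the text does not state which definition was used, and conventions differ for models without an intercept.

```python
def regression_through_origin(bayes_ages, fast_ages):
    """Fit fast = beta * bayes (no intercept) by least squares.

    Returns (beta, r_squared). The coefficient of determination uses the
    uncentered total sum of squares, an assumed convention for
    no-intercept models.
    """
    if len(bayes_ages) != len(fast_ages) or not bayes_ages:
        raise ValueError("inputs must be non-empty and of equal length")
    sxx = sum(x * x for x in bayes_ages)
    sxy = sum(x * y for x, y in zip(bayes_ages, fast_ages))
    beta = sxy / sxx
    ss_res = sum((y - beta * x) ** 2 for x, y in zip(bayes_ages, fast_ages))
    ss_tot = sum(y * y for y in fast_ages)
    return beta, 1.0 - ss_res / ss_tot


# Made-up node ages: the fast method is uniformly 10% older.
beta, r2 = regression_through_origin([10.0, 20.0, 40.0], [11.0, 22.0, 44.0])
print(round(beta, 3), round(r2, 3))
```

Under this setup, a slope above 1 means the fast method returns older node ages than the Bayesian analysis, and a slope below 1 means younger ones.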

For each data set, the average difference between fast dating methods and Bayesian time estimates was normalized to become comparable across studies that focused on various depths of the Tree of Life. Given n divergence times in a data set, for each ith node age (t), the average difference was calculated as follows.

$$\overline{D}=\left(\frac{1}{n}\sum_{i=1}^n\frac{\mid {t}_{i, FAST}-{t}_{i, BAYES}\mid }{t_{i, BAYES}}\right)\times 100\%$$
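The normalized mean difference can be computed directly from paired node ages, as in this short sketch. The function name and the example values are illustrative only.

```python
def mean_normalized_difference(fast_ages, bayes_ages):
    """Mean absolute difference between fast and Bayesian node ages,
    each difference scaled by the Bayesian age, as a percentage."""
    if len(fast_ages) != len(bayes_ages) or not fast_ages:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs(f - b) / b for f, b in zip(fast_ages, bayes_ages))
    return 100.0 * total / len(fast_ages)


# Made-up ages: two nodes differ by 10% and one is identical.
d_bar = mean_normalized_difference([11.0, 18.0, 40.0], [10.0, 20.0, 40.0])
print(round(d_bar, 2))
```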

Additionally, the precision of divergence time estimates was also assessed. For Bayesian time estimates, measures of uncertainty were as reported in the original study, either the highest posterior densities (HPDs) or the credibility intervals (CrIs). Because confidence and credibility intervals are fundamentally different from a statistical standpoint, they were not compared directly. In practice, these metrics are generally regarded as the measures of uncertainty associated with the time estimate in empirical studies, and they are required for evolutionary hypothesis testing. Thus, we reported their values for each method. For convenience, RelTime CIs, treePL CIs and HPDs/CrIs from Bayesian analyses will be hereafter referred to simply as metrics of uncertainty.

For each dataset, two values were computed based on uncertainty metrics: the coverage and the median uncertainty width of each method. Coverage is a measure analogous to the success rate, as it indicates the frequency that node age estimates from fast methods were included within the credibility interval of the original Bayesian analyses. This frequency was computed for each dataset. The median uncertainty width of a method for each dataset was calculated as follows. For each ith node age estimate, the difference between the maximum (tmax) and minimum (tmin) limits of the uncertainty metric (U) was normalized by the estimated node age (t).

$$U\ {width}_i=\frac{t_{i,\mathit{\max}}-{t}_{i,\mathit{\min}}}{t_i}$$

Therefore, uncertainty widths of a data set were transformed as fractions of the estimated node ages, and their median value was calculated. Importantly, this measure was computed excluding nodes that presented node ages smaller than \(10^{-10}\). This was done to avoid division by values near zero.
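Both uncertainty summaries can be sketched in a few lines. The function names are hypothetical, and treating interval limits as inclusive when computing coverage is an assumption that the text does not specify.

```python
from statistics import median


def coverage(fast_ages, bayes_intervals):
    """Percentage of fast-method node ages falling inside the Bayesian
    interval of the same node (limits assumed inclusive)."""
    hits = sum(1 for t, (lo, hi) in zip(fast_ages, bayes_intervals)
               if lo <= t <= hi)
    return 100.0 * hits / len(fast_ages)


def median_uncertainty_width(ages, intervals, min_age=1e-10):
    """Median of (t_max - t_min) / t over nodes, skipping nodes whose
    estimated age is below min_age to avoid dividing by near-zero values."""
    widths = [(hi - lo) / t
              for t, (lo, hi) in zip(ages, intervals)
              if t >= min_age]
    return median(widths)


# Made-up example with three nodes; the last has an age of zero.
ages = [10.0, 20.0, 0.0]
intervals = [(8.0, 12.0), (15.0, 25.0), (0.0, 1.0)]
print(median_uncertainty_width(ages, intervals))
print(coverage([9.0, 30.0, 0.5], intervals))
```

In the example, the zero-age node is dropped from the width calculation but still counts toward coverage, since coverage involves no division by the node age.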

We tested whether the number of terminals, the number of sites in the alignment, and the percentage of calibrated nodes (the number of calibrations divided by the number of tree nodes) impacted the association between the Bayesian estimates and those from both fast-dating methods. Linear models were inferred using 1) the absolute deviations of the slope of the regression lines from 1 or 2) the mean squared errors (MSEs) as response variables. Besides MSE, we also tested the R² and the RMSE as measures of goodness of fit with identical results. The importance of each feature was assessed by the varImp function [79] of the caret R package [80].

Results

Fast methods produced time estimates highly correlated with Bayesian time estimates, regardless of the Bayesian method employed. All the recovered R² values of the linear regression between fast methods and Bayesian node ages were ≥ 0.94, with most values higher than 0.98. The slope of the regression lines indicated a great correspondence between rapid methodologies and Bayesian node ages (Fig. 1a). The median slope values were 0.98 and 0.95 for treePL and RelTime, respectively. Nevertheless, the slopes of the regression lines between treePL and Bayesian time estimates presented a larger variance than when we compared RelTime to Bayesian node ages. For instance, in the Peters et al. dataset [67], the comparison of treePL and Bayesian time estimates returned β = 1.99, indicating that node ages were generally 99% older than MCMCTree inferred times. For this same dataset, RelTime node ages led to β = 1.46 when compared to Bayesian divergence times. For three other datasets, treePL estimates showed very high β values when compared to Bayesian estimates: PessoaFilho17 [65] (βtreePL = 1.57, βRelTime = 1.15), Allio20 [50] (βtreePL = 1.58, βRelTime = 1.09) and Peters17 [66] (βtreePL = 1.6, βRelTime = 1.16). On the other hand, treePL produced much younger times for the dataset of Fang18 [58] (βtreePL = 0.54, βRelTime = 0.75). The highest β recovered for RelTime was for the dataset of Ran18 [68] (βRelTime = 1.5), which was very similar to the β recovered for treePL (βtreePL = 1.48). The lowest β values produced by the node ages estimated by RelTime were for the datasets of Hedin19 [60] (βtreePL = 0.54, βRelTime = 0.75) and Fang18 [58] (βtreePL = 0.78, βRelTime = 0.75). Comparisons between time estimates using Bayesian and fast methods per dataset can be accessed through Supporting information 1.

Fig. 1

The performance of fast dating methodologies relative to the Bayesian methods for phylogenomic data. The slopes (β) of the linear regressions through the origin between rapid and Bayesian methods are shown in panel a. The mean normalized differences between RelTime/treePL and Bayesian node ages (\(\overline{D}\)) are shown in b

The distribution of treePL \(\overline{D}\) values was also wider than the distribution of RelTime (Fig. 1b). RelTime estimates were, on average, more similar to Bayesian time estimates, as the mean \(\overline{D}\) was 26.5% for RelTime and 37.24% for treePL. When treePL was used to estimate divergence times, several datasets led to estimates that were, on average, more than 50% different from the Bayesian node ages. Conversely, RelTime molecular dates were, on average, more than 50% different from the Bayesian estimates for a single dataset (Ran18). For this dataset, both treePL and RelTime node ages were approximately 60% different from Bayesian times. For most datasets (70%), RelTime produced time estimates that were, on average, less than 30% different from the Bayesian ones, while treePL estimated node ages that were less than 30% different from Bayesian times for only 39.13% of the datasets (Supporting information 2).

Regarding the uncertainties of time estimates, treePL provided very narrow uncertainty intervals, with the distribution of the median interval widths across all datasets analyzed centered around 19.6%. This same value was centered around 64.3% for Bayesian and 102.3% for RelTime. For some of the datasets (52.17%), treePL uncertainty intervals occasionally did not include the node ages estimated by the method itself. In these cases, up to 9% of the node ages did not fall within the intervals generated by the treePL bootstrap approach. Regarding the frequency with which fast methods’ divergence times were included within the Bayesian credibility intervals, treePL and RelTime presented a similar performance. Mean coverage values for RelTime node ages were centered around 77.3%, while for treePL, they were centered around 75.1% (Fig. 2). The percentage of datasets that led to coverage values that included less than half of the estimated node ages of a phylogeny was 41% for treePL and 27% for RelTime. On the other hand, for 36 and 45% of the studies, time estimates were covered by the Bayesian credibility interval with a frequency of more than 80% when using treePL and RelTime, respectively.

Fig. 2

Frequency with which time estimates from treePL and RelTime were placed within the Bayesian credibility intervals as reported by the original studies (coverage)

For both fast-dating methods, deviation from the slope β = 1 was significantly explained by the three features investigated (p < 0.001 and R² = 0.59 for RelTime and p < 0.005 and R² = 0.40 for treePL). The data feature with the highest importance in determining the deviation from a perfect fit to Bayesian estimates was the number of sites in the alignment (importance of 60% for RelTime and 37% for treePL). For explaining MSEs, the calibration density was the feature with the highest importance for RelTime (69%, p < 0.001 and R² = 0.50), while treePL MSEs were not significantly predicted by any of the features analyzed (p > 0.05). For RelTime, increasing the density of calibrations resulted in time estimates that differed more from those of the Bayesian analysis.

Computational efficiency differed markedly between the two fast methods (Fig. 3). Average running times were 51.8 hours for treePL and 0.9 hours for RelTime. For most datasets, treePL took more than 24 hours to complete the calculations. In fact, RelTime usually took less than 2% of treePL running time, and was often more than 60 times faster than treePL (Fig. 3). Because confidence intervals are essential to retrieving uncertainty measures for divergence time estimates, treePL running times included the estimation of branch lengths for the one hundred bootstrap replicates used to compute confidence intervals.

Fig. 3

Computational speed ratio of RelTime to treePL for the phylogenomic datasets analyzed

Discussion

We provided the first comprehensive analysis of two of the most frequently used fast dating methodologies against Bayesian molecular dating, employing several empirical phylogenomic datasets from distinct biological groups, including up to hundreds of taxa. We measured differences in node age estimates, coverage of the Bayesian credibility intervals, and computational time efficiency. Our findings indicate that RRF, as implemented in RelTime, is a fast alternative to time-consuming molecular dating software. RelTime was much faster and generally provided time estimates closer to the Bayesian node ages than treePL. TreePL, which is considered a fast algorithm for performing molecular dating, required a significant computational time. This was due to the bootstrapping strategy used to compute confidence intervals of time estimates. As measurements of uncertainty are necessary to interpret biological scenarios derived from timetrees, their calculation entailed a running time that was comparable to Bayesian approaches, with some running times of more than one month.

Studies that have evaluated treePL performance against other approaches are scarce. The original work describing its implementation performed an evaluation using simulated and empirical data [40]. However, simulations did not include alignments, as the divergence times were directly inferred from the true tree, and the empirical datasets did not consist of several loci. Previous works employing both Bayesian approaches and treePL compared time estimates for specific taxa [81, 82], and their results are contrasting, with treePL leading either to older time estimates than BEAST in angiosperm evolution [82], or younger node ages than BEAST in a flowering plant family [81]. These works also reported contrasting results regarding the precision of treePL time estimates. In the present study, treePL confidence intervals were consistently narrow for all datasets analyzed. This result is expected because the bootstrap procedure leads to reduced parametric uncertainty as the number of sites increases, which is the case for phylogenomic data. Regarding time estimates, we found that treePL tended to produce older estimates than Bayesian analyses (Fig. 1a). This is in agreement with other works that have compared PL to Bayesian and non-Bayesian approaches [83,84,85,86].

It is already known that PL may provide overly ancient divergence time estimates when there is no calibration information to limit node ages near the root because of optimization issues [87]. The absence of efficient time constraints at deeper nodes was, in fact, common to all the analyses where older estimates were obtained (β > 1.1). For most of these datasets, treePL placed the age of the deep nodes precisely at or very close to the values provided as loose maxima. To test if the PL approach would present a better performance when outgroups and root/outgroup calibrations were kept in the analyses, we repeated all treePL analyses using all ingroup and outgroup sequences and calibrations (when applicable). We did not find any significant performance improvement (Supporting information 3). Additionally, our findings corroborate Barba-Montoya et al. [49], who recovered a better performance for RelTime using simulated data. These authors found treePL to be more impacted by minor deviations from the molecular clock. While we have not quantified the clockness of the empirical datasets, this was probably one of the reasons for the more asymmetrical distributions of \(\overline{D}\) values for treePL, while RelTime presented lower asymmetry (Supporting information 2).

Comparisons between time estimates retrieved by the RRF and Bayesian methods have been carried out in several empirical studies [12, 22, 25, 43, 45, 88,89,90]. Mello et al. (2017) and Tao et al. (2020) employed phylogenomic datasets and found that RelTime produced reliable time estimates compared to BEAST and MCMCTree. Here, we extended these findings to PhyloBayes software, which implements more sophisticated substitution models. Although MEGA does not provide the option to use the site-heterogeneous models implemented in PhyloBayes, times inferred employing the simpler models available in MEGA exhibited good correspondence to PhyloBayes estimates. The equivalence between timescales from simple and complex homogeneous substitution models was reported elsewhere [91]. We confirmed this finding and showed that it could be extended to site-heterogeneous substitution models.

If researchers need a faster alternative to Bayesian dating, our work demonstrated the good performance of RelTime’s RRF when compared to treePL. Besides providing node ages closer to Bayesian estimates, RelTime inferred ages were placed within Bayesian credibility intervals more frequently. Recently, using simulated data, Barba-Montoya et al. [49] also recovered a greater accuracy for RelTime when compared to other fast dating methods, particularly when autocorrelated rates were used. We showed that for empirical phylogenomic datasets, in which the true rate model is unknown, RelTime also performed better than treePL to approximate the standard Bayesian procedure. Additionally, on average, treePL produced rather precise estimates. The narrow confidence intervals of treePL estimates were also previously recovered using simulated data [49]. Simulations also have shown that RelTime confidence intervals exhibit equivalent or greater coverage probabilities than Bayesian approaches [43].

Besides having good statistical properties, we expect fast dating methods to reduce computational time significantly. We demonstrated that, on average, RelTime was 60 times faster than treePL. In the age of big data, such speed-up makes large-scale biological hypothesis testing feasible. Moreover, previous works based on simulations that assessed PL performance against Bayesian approaches and RelTime found that it performed worse than these methods under various scenarios of heterogeneous rates [25, 92]. These findings, together with our results that certified the speed of RelTime, demonstrate the usefulness of the RRF in obtaining biological timescales for large datasets.

The discrepancy between divergence time estimates from fast-dating and Bayesian methods was primarily influenced by the alignment length. Longer alignments resulted in larger differences between methods. This result is expected if methods rely on different modeling assumptions regarding parameters and evolutionary rate variation. Consequently, as the sample size approaches infinity, estimates become significantly different. For RelTime, calibration density significantly impacted the MSE of time estimates, implying that, besides alignment length, increasing the number of time constraints also makes the differences between methods more pronounced [49].

While previous work has argued that RRF may not be suitable to infer divergence times for deep time datasets, leading to overly old time estimates [90], our analyses did not support this claim. Also, in contrast with a previous study [89], our results indicate that the strategy used by RelTime to calibrate timetrees [43] is as appropriate as the Bayesian calibration priors, yielding excellent correspondence between the timescales from both methods for most of the datasets (for ~ 78% of the datasets, β values deviated less than 0.2 from 1).

It is worth mentioning that larger differences between Bayesian analysis and RelTime may be retrieved at nodes connecting branches with lengths close to zero. Such lack of substitutions along branches causes RelTime to estimate more recent node ages. The fact that fast methods use branch lengths to estimate divergence times without relying on priors for node ages implies that when some branches have near zero substitutions, they underestimate times compared to Bayesian analysis. This occurs because divergence time priors assign lengths > 0 even when no substitutions are observed, as in the coalescent prior [93]. This may also affect treePL estimates, as observed for the dataset of Fang18 (Supporting information 2), although treePL may also assign non-zero time values to branches where the number of accumulated substitutions is effectively zero [40], leading to older inferred times than RelTime.

Our comparative analysis using a comprehensive empirical dataset has shown that fast dating methods are a viable alternative to time-consuming Bayesian methods to infer node ages for large-scale datasets. Additionally, we demonstrated that the RRF approach implemented in RelTime performed better, with lower demand in computational times. Thus, we emphasize the efficacy of the RRF in establishing molecular timescales with excellent correspondence to those inferred by Bayesian approaches. Timescales from different dating frameworks were impacted by alignment length, suggesting that their asymptotic properties are different. Furthermore, the quick estimation of confidence intervals of node ages allows for robust testing between several alternate evolutionary hypotheses, eliminating the computational burden brought forth by big data in biology.