Evolutionary Genomics pp 71-117
A Not-So-Long Introduction to Computational Molecular Evolution
Abstract
In this chapter, we give a not-so-long and self-contained introduction to computational molecular evolution. In particular, we present the emergence of the use of likelihood-based methods, review the standard DNA substitution models, and introduce how model choice operates. We also present recent developments in inferring absolute divergence times and rates on a phylogeny, before showing how state-of-the-art models take inspiration from diffusion theory to link population genetics, which traditionally focuses at a taxonomic level below that of the species, and molecular evolution. Although this is not a cookbook chapter, we try and point to popular programs and implementations along the way.
Key words
Likelihood, Bayes, Model choice, Phylogenetics, Divergence times
1 Introduction
Many books [1, 2, 3, 4, 5, 6, 7] and review papers [8, 9, 10] have been published in recent years on the topic of computational molecular evolution, so that updating our previous primer on the very same topic [11] may seem redundant. However, the field is continuously undergoing changes, as both models and algorithms become even more sophisticated, efficient, robust, and accurate. This increase in refinement has not been motivated by a desire to complicate existing models, but rather to make an old wish come true: that of having integrated methods that can take unaligned sequences as an input, and simultaneously output the alignment, the tree, and other estimates of interest, in a sound statistical framework justified by sound principles: those of population genetics.
The aim of this chapter is still to provide readers with the essentials of computational molecular evolution, offering a brief overview of recent progress, both in terms of modeling and algorithm development. Some of the details will be left out as they are dealt with by others in this volume. Likewise, the analysis of genomic-scale data is briefly touched upon, but the details are left to other chapters.
2 Parsimony and Likelihood
2.1 A Brief Overview of Parsimony
s _{1} | ATGACCCCAATACGCAAAACTAACCCCCTAATAAAATTAATTAACCACTCCTTC |
s _{2} | ATGACCCCAATACGGAAAACTAACCCCCAAATAAAATTAATTAACCACTCATTC |
s _{3} | ATGACGCCAATACGCAAAACTAACCGCCTAATAAAATTAATTTACCACTCATTC |
s _{1} | xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxyxxx |
s _{2} | xxxxxxxxxxxxxxyxxxxxxxxxxxxxyxxxxxxxxxxxxxxxxxxxxxxxxx |
s _{3} | xxxxxyxxxxxxxxxxxxxxxxxxxyxxxxxxxxxxxxxxxxyxxxxxxxxxxx |
The winning-site strategy
Site pattern | Supported T_{i} | Count |
---|---|---|
xxx | T _{0} | 48 |
xxy | T _{1} | 3 |
xyx | T _{2} | 2 |
yxx | T _{3} | 1 |
A number of methodological variations exist. A very condensed overview can be found in the books by Durbin [14] or, with more details, Felsenstein [15]. Most computer programs that implement substitution models where sites are iid condense the alignment as an array of site patterns; some, like PAML [16], even output these site patterns.
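As an illustration of how an alignment can be condensed into site patterns, here is a minimal Python sketch applied to the three sequences of the example above. The function names (`site_pattern`, `count_site_patterns`) are ours, chosen for illustration, and the labeling convention (x for the majority state, y for the odd one out) is assumed to match that of the table.

```python
from collections import Counter

# The three aligned sequences from the example above.
ALIGNMENT = [
    "ATGACCCCAATACGCAAAACTAACCCCCTAATAAAATTAATTAACCACTCCTTC",  # s1
    "ATGACCCCAATACGGAAAACTAACCCCCAAATAAAATTAATTAACCACTCATTC",  # s2
    "ATGACGCCAATACGCAAAACTAACCGCCTAATAAAATTAATTTACCACTCATTC",  # s3
]

def site_pattern(column):
    """Label a 3-taxon column: 'x' for the majority state, 'y' for the odd one out."""
    majority, n = Counter(column).most_common(1)[0]
    if n == 1:  # three different states, no majority (does not occur here)
        return "xyz"
    return "".join("x" if base == majority else "y" for base in column)

def count_site_patterns(alignment):
    """Condense an alignment into a table of site-pattern counts."""
    return Counter(site_pattern(column) for column in zip(*alignment))

pattern_counts = count_site_patterns(ALIGNMENT)
for pattern, count in sorted(pattern_counts.items()):
    print(pattern, count)
```

Because sites are assumed iid, only these counts matter for the analysis, not the order of the columns.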
Note that in obtaining this topology estimate, most of the site columns were discarded from our alignment (all the xxx site patterns, representing 89% of the sites in our example above). Most of our data were phylogenetically uninformative (for parsimony). We also failed to take evolutionary time into account, or any process of basic molecular biology, such as the observation that transitions (substitution of a purine [A or G] by a purine, or of a pyrimidine [C or T] by a pyrimidine) are more frequent than transversions (substitution between a purine and a pyrimidine).
2.2 Assessing the Reliability of an Estimate: The Bootstrap
As with any statistical exercise estimating a quantity of interest, we would like to have a confidence interval, taken at a particular level, so that we can gauge the reliability of our estimate. A standard approach to derive confidence intervals is the bootstrap [17], a computational technique that resamples data points with replacement to simulate the distribution of any test statistic under the null hypothesis that is tested. The bootstrap, particularly useful in complicated nonparametric problems where no asymptotic results can be obtained [18], was adapted by Felsenstein to the nonstandard phylogenetic problem [19]. Indeed, the problem is nonstandard in that the object for which we wish to assess accuracy is not a real-valued parameter, but a graph.
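To make the resampling idea concrete, the sketch below bootstraps the columns of the three-sequence example and records how often each topology wins under the winning-site strategy. It is only an illustration under our own assumptions: the function names, the number of replicates, the random seed, and the treatment of ties (reported separately as "tie") are not taken from the references cited above.

```python
import random
from collections import Counter

ALIGNMENT = [
    "ATGACCCCAATACGCAAAACTAACCCCCTAATAAAATTAATTAACCACTCCTTC",  # s1
    "ATGACCCCAATACGGAAAACTAACCCCCAAATAAAATTAATTAACCACTCATTC",  # s2
    "ATGACGCCAATACGCAAAACTAACCGCCTAATAAAATTAATTTACCACTCATTC",  # s3
]
# Site pattern -> supported topology, as in the table of site patterns.
SUPPORTED_TREE = {"xxy": "T1", "xyx": "T2", "yxx": "T3"}

def site_pattern(column):
    """Classify a 3-taxon column by which sequence, if any, is the odd one out."""
    a, b, c = column
    if a == b == c:
        return "xxx"
    if a == b:
        return "xxy"
    if a == c:
        return "xyx"
    if b == c:
        return "yxx"
    return "xyz"

def winning_tree(columns):
    """Return the topology supported by most informative sites, or 'tie'."""
    votes = Counter(SUPPORTED_TREE[p] for p in map(site_pattern, columns)
                    if p in SUPPORTED_TREE)
    ranked = votes.most_common(2)
    if not ranked or (len(ranked) == 2 and ranked[0][1] == ranked[1][1]):
        return "tie"
    return ranked[0][0]

def bootstrap_proportions(alignment, replicates=1000, seed=1):
    """Resample columns with replacement and tabulate the winning topology."""
    rng = random.Random(seed)
    columns = list(zip(*alignment))
    wins = Counter()
    for _ in range(replicates):
        pseudo_sample = rng.choices(columns, k=len(columns))
        wins[winning_tree(pseudo_sample)] += 1
    return {tree: count / replicates for tree, count in wins.items()}

proportions = bootstrap_proportions(ALIGNMENT)
print(proportions)
```

With so few informative sites, no topology is expected to receive overwhelming support, which is precisely what the bootstrap proportions are meant to reveal.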
However, this approach has not been used or cited extensively since 2008 (source: ISI Thomson). One alternative that has gained momentum is based on the approximate likelihood ratio test (aLRT) [22], implemented, for instance, in PhyML [23, 24]. Instead of resampling any quantity (sites or sitewise log-likelihood values), the aLRT tests the null hypothesis that an interior branch length is zero. In spite of being slightly conservative in simulations, the approach is extremely fast and hence highly practical [22].
The meaning of the bootstrap has been a matter of debate for years. As noted before [8] (see also [22]), the bootstrap proportion P can be seen as assessing the correctness of an internal node, and failing to do so [25], or 1 − P can be interpreted as a conservative probability of falsely supporting monophyly [26]. Since bootstrap proportions are either too liberal or too conservative depending on the actual interpretation of P [27], it is difficult to adjust the threshold below which monophyly can be confidently ruled out [28]. Alternatively, an intuitive geometric argument was proposed to explain the conservativeness of bootstrap probabilities [18] and was further developed into the approximately unbiased or AU test, implemented in CONSEL [29]. In spite of these difficulties, the bootstrap is still widely used—and mandatory in all publications featuring a phylogeny—to assess the confidence one can have in the tree estimated from the data under a particular scheme or model (see Subheading 2.9.3 below). Lastly, note that bootstrap support has often been abused [30], as a high value does not necessarily indicate high phylogenetic signal, and can be the result of systematic biases [31] due to the use of the wrong model of evolution, for instance, as detailed below.
2.3 Parsimony and LBA
Now that we have a means of evaluating the support for the different topologies, we can test some of the conditions under which parsimony estimates the correct tree topology. Ideally, a good method should return the correct answer with a probability of one when the number of sites increases to infinity. This desirable statistical property is called consistency. One serious criticism of parsimony is its sensitivity to long branch attraction, or LBA, even in the presence of an infinite amount of data (infinite alignment length) [31]. In other words, parsimony is not statistically consistent.
The LBA artifact has been shown to plague the analysis of numerous data sets, and a number of empirical approaches have been used to detect the artifact [36, 37]. Most recent papers based on multigene analyses (e.g., [38, 39]) now examine carefully the effect of across-site and across-lineage rate variation (in addition to the use of heterogeneous models). For both sites and lineages, the procedure is the same and consists in successively removing either the sites that evolve the fastest or the taxa that show the longest root-to-tip branch lengths.
2.4 Origin of the Problem
Consequently, we would like a tree-reconstruction method that accounts for multiple substitutions. We would also like a method that (1) takes into account less parsimonious as well as most parsimonious state reconstructions (intervals, tests), (2) weights changes differently if they occur on branches of different length (evolutionary time), and (3) weights different kinds of events (transitions, transversions) differently (biological realism). Likelihood methods include such considerations explicitly, as they require modeling the substitution process itself.
2.5 Modeling Molecular Evolution
The process we want to model should describe the substitution process of the different nucleotides of a DNA sequence. Again, we will make the simplifying assumption that sites evolve under a time-homogeneous Markov process and are iid, as above. We can therefore concentrate on one single site for now (e.g., [41]).
Except in the simplest models of evolution, finding analytical solutions for the eigenvalues and associated eigenvectors can be tedious. As a result, numerical procedures are employed to solve Eq. 8. Alternatively, a Taylor expansion can be used to approximate P(t).
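The Taylor expansion can be sketched in a few lines of pure Python, assuming the standard result that the transition probability matrix of a continuous-time Markov chain is P(t) = exp(Qt). We use the Jukes and Cantor (JC69) rate matrix, scaled to one expected substitution per unit time, because its closed-form solution allows us to check the approximation. The function names and the number of terms kept in the series are illustrative choices, not those of any particular program.

```python
import math

def jc69_rate_matrix():
    """JC69 generator, scaled to one expected substitution per unit time."""
    return [[-1.0 if i == j else 1.0 / 3.0 for j in range(4)] for i in range(4)]

def matmul(a, b):
    """Product of two square matrices stored as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def transition_matrix_taylor(q, t, terms=30):
    """Approximate P(t) = exp(Qt) by the truncated series sum_k (Qt)^k / k!."""
    n = len(q)
    qt = [[q[i][j] * t for j in range(n)] for i in range(n)]
    term = [[float(i == j) for j in range(n)] for i in range(n)]  # (Qt)^0 / 0!
    total = [row[:] for row in term]
    for k in range(1, terms):
        term = [[x / k for x in row] for row in matmul(term, qt)]
        total = [[total[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    return total

def jc69_closed_form(t):
    """Analytical JC69 transition probabilities, used here as a check."""
    e = math.exp(-4.0 * t / 3.0)
    same, diff = 0.25 + 0.75 * e, 0.25 - 0.25 * e
    return [[same if i == j else diff for j in range(4)] for i in range(4)]

approx = transition_matrix_taylor(jc69_rate_matrix(), 0.5)
exact = jc69_closed_form(0.5)
max_error = max(abs(approx[i][j] - exact[i][j])
                for i in range(4) for j in range(4))
print(max_error)
```

The truncated series is adequate for short branches; for long branches the terms grow before they shrink, which is one reason why eigendecomposition is usually preferred in practice.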
If all entries in Q are positive, any state or nucleotide can be reached from any other in a finite number of steps (all states “communicate”) and the base frequencies have a stationary distribution π = (π_{T}, π_{C}, π_{A}, π_{G}). This is the steady state reached after an “infinite” amount of time, or long enough for the Markov process to forget its initial state, starting from “random” base frequencies.
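This "forgetting" of the initial state can be checked numerically. The sketch below builds a time-reversible rate matrix with q_ij proportional to π_j, starts the process in state T with probability one, and integrates dp/dt = pQ with small Euler steps. The base frequencies, the transition/transversion ratio, and the step size are arbitrary illustrative values, and the function names are ours.

```python
def reversible_rate_matrix(pi, exchange):
    """Build Q with q_ij = r_ij * pi_j for i != j; each row sums to zero."""
    n = len(pi)
    q = [[exchange[i][j] * pi[j] if i != j else 0.0 for j in range(n)]
         for i in range(n)]
    for i in range(n):
        q[i][i] = -sum(q[i])
    return q

def evolve(p0, q, total_time, dt=0.001):
    """Euler integration of dp/dt = pQ, starting from the distribution p0."""
    p = list(p0)
    n = len(p)
    for _ in range(int(total_time / dt)):
        flow = [sum(p[i] * q[i][j] for i in range(n)) for j in range(n)]
        p = [p[j] + dt * flow[j] for j in range(n)]
    return p

# States are ordered T, C, A, G; all values below are illustrative assumptions.
PI = [0.1, 0.2, 0.3, 0.4]
KAPPA = 2.0  # transitions (T<->C, A<->G) are twice as fast as transversions
EXCHANGE = [
    [0.0, KAPPA, 1.0, 1.0],
    [KAPPA, 0.0, 1.0, 1.0],
    [1.0, 1.0, 0.0, KAPPA],
    [1.0, 1.0, KAPPA, 0.0],
]

Q = reversible_rate_matrix(PI, EXCHANGE)
stationary = evolve([1.0, 0.0, 0.0, 0.0], Q, total_time=50.0)
print(stationary)
```

Starting from any other initial distribution should lead to the same limit, which is the sense in which the process forgets where it started.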
2.6 Computation on a Tree
Note that this example is somewhat artificial: with only two sequences, we can compute the likelihood directly with π_{T}P_{T,C}(t_{0} + t_{1}) = π_{C}P_{C,T}(t_{0} + t_{1}); the full summation over unknown states as in Eq. 9 is required with three sequences or more. When analyzing a multiple-sequence alignment of S sequences, there will be many nodes in the tree for which the character state is unknown, which means that the summation required will involve many terms. Specifically, the sum will be over 4^{S−3} terms. Fortunately, terms can be factored out of the summation, and a dynamic programming algorithm with a complexity of the order of \( \mathcal{O}\left({4}^2S\right) \), called the pruning algorithm [44], can be used (see [15] for details).
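A minimal recursive sketch of the pruning idea for a single site is given below. It assumes the JC69 model with equal base frequencies, so that transition probabilities are available in closed form; the tree encoding (a tip is a nucleotide, an internal node is a list of child and branch-length pairs) and the function names are our own illustrative choices.

```python
import math

STATES = "TCAG"
PI = {s: 0.25 for s in STATES}  # JC69 assumes equal base frequencies

def p_jc69(i, j, t):
    """JC69 probability of going from state i to state j along a branch of length t."""
    e = math.exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * e if i == j else 0.25 - 0.25 * e

def conditional_likelihoods(node):
    """Return {state: P(tip data below node | node is in that state)}.

    A tip is an observed nucleotide such as "T"; an internal node is a
    list of (child, branch_length) pairs.
    """
    if isinstance(node, str):
        return {s: 1.0 if s == node else 0.0 for s in STATES}
    partial = {s: 1.0 for s in STATES}
    for child, t in node:
        below = conditional_likelihoods(child)
        for s in STATES:
            partial[s] *= sum(p_jc69(s, x, t) * below[x] for x in STATES)
    return partial

def site_likelihood(root):
    """Sum over root states, weighted by the stationary frequencies."""
    at_root = conditional_likelihoods(root)
    return sum(PI[s] * at_root[s] for s in STATES)

# Two tips T and C joined at the root by branches t0 = 0.1 and t1 = 0.2.
two_tips = site_likelihood([("T", 0.1), ("C", 0.2)])
direct = PI["T"] * p_jc69("T", "C", 0.1 + 0.2)
print(two_tips, direct)

# Three tips: an internal node joins T and C, and the root joins it to A.
inner = [("T", 0.1), ("C", 0.2)]
three_tips = site_likelihood([(inner, 0.1), ("A", 0.3)])
print(three_tips)
```

Each node is visited once and each visit costs a sum over pairs of states, which is where the linear dependence on the number of sequences comes from.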
2.7 Substitution Models and Instantaneous Rate Matrices Q
Of course, this model is extremely simplistic and neglects a fair amount of basic molecular biology. In particular, it overlooks two observations. First, base frequencies are not all equal in actual DNA sequences, but are rather skewed, and second, transitions are more frequent than transversions (see Subheading 2.5).
In fact, there exist only a few “named” additional substitution models [15], most of which are time-reversible models, while a total of 203 models can be derived from GTR [49]. We have focused solely on DNA models in this chapter, but the problem is similar with amino acid or codon models, except that the number of parameters increases quickly. We have also limited ourselves to time-reversible time-homogeneous models, but irreversible non-homogeneous models were developed some time ago [50] and are used, for instance, to root phylogenies [51] or to help alleviate the effects of LBA [39].
2.8 Some Computational Aspects
2.8.1 Optimization of the Likelihood Function
For a given substitution model, how should parameters be estimated, given the (potentially) high dimensionality of the model? Analytical solutions consist in determining when the first derivative of the likelihood function is equal to zero, with a negative second derivative at that point. However, solving this equation analytically is only possible in the simple case of three sequences of binary characters under the assumption of the molecular clock (see Subheading 3.1) [12]. As a result, numerical solutions must be found to maximize the likelihood function.
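As a toy illustration of numerical maximization, the sketch below estimates a single parameter, the JC69 distance between sequences s_1 and s_2 of Subheading 2.1 (which differ at 3 of their 54 sites), by golden-section search on the log-likelihood. This one-dimensional problem involves no tree topology and is far simpler than a real phylogenetic optimization; the search interval, the tolerance, and the function names are assumptions made for the example.

```python
import math

N_SITES, N_DIFF = 54, 3  # s1 and s2 differ at 3 of 54 sites

def jc69_log_likelihood(d, n_sites=N_SITES, n_diff=N_DIFF):
    """Log-likelihood of a pairwise distance d (substitutions/site) under JC69."""
    e = math.exp(-4.0 * d / 3.0)
    p_same = 0.25 * (0.25 + 0.75 * e)  # pi_i * P_ii(d)
    p_diff = 0.25 * (0.25 - 0.25 * e)  # pi_i * P_ij(d), i != j
    return (n_sites - n_diff) * math.log(p_same) + n_diff * math.log(p_diff)

def golden_section_maximize(f, lo, hi, tol=1e-10):
    """Locate the maximum of a unimodal function f on [lo, hi]."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    while b - a > tol:
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
        if f(c) > f(d):
            b = d  # the maximum lies in [a, d]
        else:
            a = c  # the maximum lies in [c, b]
    return 0.5 * (a + b)

# The lower bound is slightly above zero because log(p_diff) is undefined at d = 0.
d_hat = golden_section_maximize(jc69_log_likelihood, 1e-6, 2.0)
print(d_hat, jc69_log_likelihood(d_hat))
```

Golden-section search uses no derivatives; programs that optimize many parameters typically rely on gradient-based or quasi-Newton methods instead.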
How general is this result? Simultaneously optimizing parameters of the substitution model, while optimizing branch lengths one at a time, was shown to be more effective on large data sets [43], potentially because of the correlation that exists between some of the parameters entering the Q matrix (see Subheading 2.7).
2.8.2 Convergence
In many instances though, different substitution models will give different tree topologies, and therefore different biological conclusions. One difficulty is therefore to know which model should be used to analyze a particular data set.
2.9 Selection of the Appropriate Substitution Model
One important issue in model selection is about the trade-off between bias and variance [55]: a simple model will fail to capture all the sophistication of the actual substitution process, and will therefore be highly biased even if all the parameters can be estimated with tight precision (little variance). Alternatively, a highly parameterized model will “spread” the information available from the data over a large number of parameters, thereby making their estimation difficult (flat likelihood surface; see Subheading 2.8.1), with a large variance, in spite of perhaps being a more realistic model with less bias. The objective of most model selection procedures is therefore to find not the best model in terms of likelihood score, but the most appropriate model, the one that strikes the right balance between bias and variance in terms of number of parameters. However, we argue that optimizing for this bias–variance trade-off works only for statistical procedures, be they, for instance, frequentist (LRT, likelihood ratio test) or Bayesian (BF, Bayes factor). On the other hand, information-theoretic criteria (e.g., AIC, Akaike information criterion) aim at selecting the model that is approximately closest to the “true” biological process.
The bias–variance trade-off mainly concerns the comparison of models that are based on the same underlying rationale, for instance, choosing among the 203 models that can be derived from GTR. We may also be interested in comparing models that are based on very different rationales. The likelihood ratio test is suited for assessing the bias–variance trade-off, while Bayesian and information-theoretic approaches, as well as cross-validation (CV), can be used for more general model comparisons. Here we review four approaches to model selection: LRT, BF, AIC, and CV.
2.9.1 The Likelihood Ratio Test
The substitution models presented above have one key property: it is possible to reduce the most sophisticated time-reversible named model (GTR+Γ+I) to any simpler model by imposing some constraints on parameters. As a result, the models are said to be nested, and statistical theory (the Neyman–Pearson lemma) tells us that there is an optimal (most powerful) way of comparing two nested models (a simple null vs. a simple alternative hypothesis) based on the likelihood ratio test or LRT.
The test statistic of the LRT is twice the log-likelihood difference between the most sophisticated model (which by definition is always the one with the highest likelihood—if this is not the case, there is a convergence issue; see Subheading 2.8.1) and the simpler model. This test statistic follows asymptotically a χ^{2} distribution (under certain regularity conditions), and the degree of freedom of the test is equal to the difference in the number of free parameters between the two models.
The null hypothesis is that the two competing models explain the data equally well. The alternative is that the most sophisticated model explains the data better than the simpler model. If the null hypothesis cannot be rejected at a certain level (type-I error rate), then, based on the argument developed above, the simpler model should be used to analyze the data. Otherwise, if the null hypothesis can be rejected, the more sophisticated model should be used to analyze the data. Note that a test never leads to accepting a null hypothesis; the only outcomes of a test are either reject or fail to reject a null hypothesis.
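The test itself takes only a few lines. The sketch below computes the LRT statistic and its asymptotic p-value using nothing but the Python standard library; the χ² tail probability is obtained from the recurrence of the regularized incomplete gamma function, valid for integer degrees of freedom. The two log-likelihood values are made up for the example and do not come from any data set, and the function names are ours.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) for a chi-square with integer df >= 1."""
    if x <= 0:
        return 1.0
    y = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-y)             # Q(1, y) = exp(-y)
    else:
        a, q = 0.5, math.erfc(math.sqrt(y))  # Q(1/2, y) = erfc(sqrt(y))
    # Recurrence: Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / Gamma(a + 1).
    while a < df / 2.0:
        q += y ** a * math.exp(-y) / math.gamma(a + 1.0)
        a += 1.0
    return q

def likelihood_ratio_test(lnl_simple, lnl_complex, extra_params):
    """Return the LRT statistic and its asymptotic p-value."""
    statistic = 2.0 * (lnl_complex - lnl_simple)
    return statistic, chi2_sf(statistic, extra_params)

# Hypothetical maximized log-likelihoods for two nested models that
# differ by one free parameter (illustrative numbers only).
statistic, p_value = likelihood_ratio_test(-1000.0, -995.0, extra_params=1)
print(statistic, p_value)
```

If the p-value falls below the chosen type-I error rate, the null hypothesis is rejected and the more sophisticated model is retained.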
An approach that has become popular under the widespread adoption of computer programs such as ModelTest [66] and jModelTest [67] is the hierarchical LRT (hLRT). This hierarchy goes from the simplest model (JC) to the set of most complex models (+Γ+I), traversing a tree of models. The issue is that there is more than one way to traverse this tree of models, and that depending on which way is adopted, the procedure may end up selecting different models [68, 69].
2.9.2 Information-Theoretic Approaches
Information theory provides us with a number of solutions to circumvent the three limitations of the LRT (nestedness, continuity, and dependency on the order in which models are compared).
Two points are key to deriving the criterion proposed by Akaike (see [55]). First, we usually want to compare at least two approximating models, g_{0} and g_{1}. We can then measure which one is closest to the “true” process f by taking the difference between their respective Kullback–Leibler distances. In the process, the direct reference to the “true” process cancels out. As a result, the “best” model among g_{0} and g_{1} is the one that is closest to the “true” process f: it is the model that minimizes the distance to f. By setting model parameters to their MLEs, we now deal with estimated distances, but these are still with respect to the unknown f.
A small-sample second-order version of AIC exists, where the penalty for extra parameters (2K in Eq. 20) is slightly modified to account for the trade-off between information content in the data and K (see [55]). In our experience, we find it advisable to use this small-sample correction irrespective of the actual size of the data, since this correction vanishes in large and informative samples, but corrects for proper model ranking when K becomes very large compared to the amount of information (e.g., in phylogenomics where models are partitioned with respect to hundreds of genes).
The AIC has been shown to tend to favor parameter-rich models [71, 72, 73, 74, 75], which has motivated the use and development of alternative approaches in computational molecular evolution. These include the Bayesian information criterion (BIC) [76] and the decision theory or DT approach, which is based on ΔAIC weighted by squared branch length differences [71]. Most of these approaches, including the hLRT, have recently been compared in a simulation study that suggests, in agreement with empirical studies [72, 77], that both BIC and DT have the highest accuracy and precision [75].
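The criteria discussed above are straightforward to compute once the maximized log-likelihood is available. The sketch below uses the standard definitions AIC = −2 ln L + 2K, its small-sample version AICc = AIC + 2K(K + 1)∕(n − K − 1), and BIC = −2 ln L + K ln n. What counts as the sample size n in phylogenetics is itself debated; taking it to be the alignment length is an assumption here, and the candidate models and their scores are invented for illustration.

```python
import math

def aic(lnl, k):
    """Akaike information criterion."""
    return -2.0 * lnl + 2.0 * k

def aicc(lnl, k, n):
    """Small-sample AIC; requires n > k + 1."""
    return aic(lnl, k) + 2.0 * k * (k + 1) / (n - k - 1)

def bic(lnl, k, n):
    """Bayesian information criterion."""
    return -2.0 * lnl + k * math.log(n)

# Hypothetical candidates: (maximized log-likelihood, number of free parameters).
CANDIDATES = {"simple": (-1000.0, 1), "medium": (-995.0, 5), "rich": (-994.0, 10)}
N = 500  # assumed sample size (alignment length)

def rank(criterion):
    """Order candidate models from best (lowest score) to worst."""
    return sorted(CANDIDATES, key=lambda name: criterion(*CANDIDATES[name]))

print("AIC :", rank(aic))
print("AICc:", rank(lambda lnl, k: aicc(lnl, k, N)))
print("BIC :", rank(lambda lnl, k: bic(lnl, k, N)))
```

Because the BIC penalty grows with ln n, it penalizes extra parameters more heavily than AIC as soon as n exceeds about 7, so the two criteria need not rank models identically.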
One particular drawback of these information-theoretic approaches is that they require that every single model of evolution, or at least the most “popular” models (the few named ones), be evaluated. This step can be time-consuming, especially if a full maximum likelihood optimization is performed under each model. A first set of heuristics consists in fixing the tree topology to a tree estimated with a quick distance-based method such as BioNJ [78], and then estimating just the branch lengths and the parameters of the substitution model, as implemented in jModelTest [67]. As the optimizations are independent of each other under each substitution model, these computations are typically forked to multiple cores or processors [79]. Further heuristics exist to avoid all these independent optimizations [79], as implemented in SMS (Smart Model Selection in PhyML), which is reported to be cutting runtimes in half without forfeiting accuracy [80].
Note finally that all these approaches are not limited to selecting the most appropriate or the best model of evolution. Disregarding the hLRT, which requires that models be nested (to be able to use the χ^{2} approximation; otherwise, see [65]), AIC, BIC, etc. allow us to compare non-nested models and, in particular, phylogenetic trees (branch lengths plus topology).
2.9.3 The Bayesian Approach
The Bayesian framework has permitted the development of two main approaches, which are actually two sides of the same coin: one based on finding the model that is the most probable a posteriori, and one based on ranking models and estimating a quantity called the Bayes factor.
In a nutshell, the frequentist approaches developed in the previous sections are based on the likelihood, which is the probability of the data, given the parameters: p(X|θ). However, this approach may not be the most intuitive, since most practitioners are not interested in knowing the conditional probability of their data, as the data were collected to learn more about the processes that generated them. It can therefore be argued that the Bayesian approach, which considers the probability of the parameters given the data or p(θ|X), is more intuitive than the frequentist approach. Unlike likelihood, which relies on the function p(X|θ) and permits point estimation, Bayesian inference is based on the posterior distribution p(θ|X). This distribution is often summarized by a centrality measure such as its mode, mean, or median. Measures of uncertainty are based on credibility intervals, the Bayesian equivalent of confidence intervals. Typically, credibility intervals are taken at the 95% cutoff and are called highest posterior densities (HPDs).
Building on this, two approaches can be formulated to compare models in a Bayesian framework. The first is to treat the model as a “random variable,” and compute its posterior probability. The best model is then the one that has the highest posterior probability. This approach is typically implemented in a reversible-jump MCMC (or rjMCMC) sampler (e.g., see [49]).
A number of approximations to evaluate Eq. 25 exist and are reviewed in [85] (see also [86, 87]). The simplest one is based on the harmonic mean of the likelihood sampled from the posterior distribution [88], also known as the harmonic mean estimator (HME). Deriving this estimator requires understanding how integrals can be approximated by importance sampling. Briefly, to compute \( I=\int g\left(\theta \right)\kern0.3em p\left(\theta \right)\kern0.3em d\theta \), generate a sample θ_{1}, …, θ_{N} from a distribution p^{⋆}(θ) and calculate the simulation-consistent estimator \( \widehat{I}=\sum {w}_i\kern0.3em g\left({\theta}_i\right)/ \sum {w}_i \), where w_{i} = p(θ_{i})∕p^{⋆}(θ_{i}) is the importance weight. Take g = p(X|θ) and p^{⋆}(θ) = p(X|θ) p(θ)∕p(X), so that w_{i} = p(X)∕p(X|θ_{i}); then \( \widehat{I}=\widehat{p}\left(X|{M}_0\right)={\left(\frac{1}{N}\sum_{i=1}^N\frac{1}{p\left(X|{\theta}_i\right)}\right)}^{-1} \) with θ_{i} ∼ p(θ|X), which converges to p(X|M_{0}) as N → ∞ (see supplementary information in [89]). As a result, a very simple way to estimate the marginal likelihood and Bayes factors is to take the output of an MCMC sampler and compute the harmonic mean of the likelihood values (not the log-likelihood values) sampled from the posterior distribution.
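In practice, MCMC samplers report log-likelihoods, and exponentiating them directly underflows. A minimal sketch of the harmonic mean computation carried out in log space, using the log-sum-exp trick, is given below; the function names are ours and the sampled values are invented for illustration.

```python
import math

def log_sum_exp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))

def log_marginal_likelihood_hme(log_likelihoods):
    """Log of the harmonic mean of the likelihoods sampled from the posterior.

    log HME = log N - log(sum_i exp(-lnL_i))
    """
    n = len(log_likelihoods)
    return math.log(n) - log_sum_exp([-lnl for lnl in log_likelihoods])

# Hypothetical posterior samples of the log-likelihood under two models.
samples_m0 = [-1002.3, -1001.7, -1003.1, -1002.0, -1001.9]
samples_m1 = [-998.4, -999.0, -997.8, -998.9, -998.1]

log_bayes_factor = (log_marginal_likelihood_hme(samples_m1)
                    - log_marginal_likelihood_hme(samples_m0))
print(log_bayes_factor)
```

The harmonic mean is dominated by the smallest sampled likelihoods, so a single poorly fitting draw can shift the estimate substantially.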
Because of its simplicity, this estimator is now implemented in most popular programs such as MrBayes [90] or BEAST [91]. However, it might be considered as the worst estimator possible, because its results are unstable [88, 92] and biased towards the selection of parameter-rich models [86]. An alternative and reliable estimator, based on thermodynamic integration (TI; [86]—also known as path sampling; [93, 94]), is much more demanding in terms of computation. Indeed, it requires running MCMC samplers morphing one model into the other (and vice versa), which can increase computation time by up to an order of magnitude [86]. Improvements of the TI estimator are however available. The stepping-stone (SS) approach builds on importance sampling and TI to speed up the computation while maintaining the accuracy of the standard TI estimator [87, 95].
Moving away from the estimation of marginal likelihoods, an analogue of AIC that can be obtained through the output of an MCMC sampler (AICM) was proposed [96]. In essence, it relies on the asymptotic convergence of the posterior distribution of the log-likelihood on a gamma distribution [97]. As such, it becomes possible to estimate the effective number of parameters as twice the sample variance of posterior distribution of the log-likelihood, which itself can be estimated by a resampling procedure [96]. This gives a very elegant means of estimating AIC, from the posterior simulations. However, although AICM seems to be a more stable measure of model ranking than HME, both TI and SS still seem to outperform this estimator, at least in the case of the comparison of demographic and relaxed molecular clock models [96] (see Subheading 3).
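A sketch of this computation follows. If the difference between the maximum log-likelihood and the sampled log-likelihood is gamma-distributed with shape d∕2 and unit scale, matching moments gives d ≈ 2s² and a maximum log-likelihood of about the posterior mean plus s², where s² is the sample variance of the posterior log-likelihoods. Plugging both into the usual AIC formula is our reading of the approach in [96, 97] and should be taken as an assumption; the resampling step used to assess the variability of the estimate is omitted.

```python
from statistics import mean, variance

def aicm(log_likelihoods):
    """Return (AICM, effective number of parameters) from posterior samples."""
    l_bar = mean(log_likelihoods)
    s2 = variance(log_likelihoods)       # sample variance of the log-likelihood
    k_eff = 2.0 * s2                     # effective number of parameters
    lnl_max = l_bar + s2                 # moment-based estimate of the maximum
    return -2.0 * lnl_max + 2.0 * k_eff, k_eff

# Hypothetical posterior samples of the log-likelihood (illustrative only).
samples = [-1002.3, -1001.7, -1003.1, -1002.0, -1001.9, -1002.6]
score, k_eff = aicm(samples)
print(score, k_eff)
```

As with AIC, lower scores indicate a better compromise between fit and dimensionality under this convention.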
2.9.4 Cross-Validation
Cross-validation is another model selection approach, which is extremely versatile in that it can be used to compare any set of models of interest. Besides, the approach is very intuitive. In its simplest form, cross-validation consists in dividing the available data into two sets, one used for “training” and the other one used for “validating.” In the training step (TS), the model of interest is fitted to the training data in order to obtain a set of MLEs. These MLEs are then used to compute the likelihood using the validation data (validation step, VS). Because the validation data were not part of the training data, the likelihood values computed during VS can be directly used to compare models, without requiring any explicit correction for model dimensionality.
The robustness of the cross-validation scores can be explored in various ways, such as repeating the above procedure with a switched labeling of training and validation data (hence the expression cross-validation). Of course, this simple 2-fold cross-validation could be extended to n-fold cross-validation, where the data are subdivided into n subsets, with n − 1 subsets serving for training, and one for validation. Ideally, the procedure is repeated n − 1 additional times.
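The procedure can be sketched on a toy problem: the pair of sequences s_1 and s_2 from Subheading 2.1, a one-parameter JC69 model fitted on the training sites, and a parameter-free alternative in which the two sequences are unrelated, so that every pair of bases has probability 1∕16. The way sites are assigned to folds, the lower bound placed on the distance, and all function names are illustrative assumptions.

```python
import math

S1 = "ATGACCCCAATACGCAAAACTAACCCCCTAATAAAATTAATTAACCACTCCTTC"
S2 = "ATGACCCCAATACGGAAAACTAACCCCCAAATAAAATTAATTAACCACTCATTC"
COLUMNS = list(zip(S1, S2))

def jc69_fit(columns):
    """Training step: MLE of the JC69 distance from pairs of aligned bases."""
    p = sum(a != b for a, b in columns) / len(columns)
    p = min(p, 0.7499)  # keep the logarithm defined
    return max(-0.75 * math.log(1.0 - 4.0 * p / 3.0), 1e-6)

def jc69_log_likelihood(columns, d):
    """Validation step: log-likelihood of the columns for a fixed distance d."""
    e = math.exp(-4.0 * d / 3.0)
    same, diff = 0.25 * (0.25 + 0.75 * e), 0.25 * (0.25 - 0.25 * e)
    return sum(math.log(same if a == b else diff) for a, b in columns)

def unrelated_fit(columns):
    """The alternative model has no free parameter to train."""
    return None

def unrelated_log_likelihood(columns, _):
    """Every pair of bases has probability 1/16 if the sequences are unrelated."""
    return len(columns) * math.log(1.0 / 16.0)

def cross_validation_score(columns, n_folds, fit, log_likelihood):
    """Sum of validation log-likelihoods over n_folds train/validate splits."""
    total = 0.0
    for fold in range(n_folds):
        training = [c for i, c in enumerate(columns) if i % n_folds != fold]
        validation = [c for i, c in enumerate(columns) if i % n_folds == fold]
        total += log_likelihood(validation, fit(training))
    return total

score_jc69 = cross_validation_score(COLUMNS, 3, jc69_fit, jc69_log_likelihood)
score_unrelated = cross_validation_score(COLUMNS, 3, unrelated_fit,
                                         unrelated_log_likelihood)
print(score_jc69, score_unrelated)
```

The model with the higher validation score is preferred, and no penalty for the number of parameters is applied because the validation sites played no part in the fit.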
We know of only two examples of its use in phylogenetics, one in the ML framework [98] and one with a Bayesian approach [99]. Given the increasing size of modern data sets, putting aside some of the data for validation is probably not going to dramatically affect the information content of the whole data set. As a result, model selection via cross-validation, which is statistically sound, could become a very popular approach.
2.10 Finding the Best Tree Topology
2.10.1 Counting Trees
Now that we can select a model of evolution (Subheading 2.9) and estimate model parameters (Subheading 2.8) under a particular model (Subheading 2.5), how do we find the optimal tree? The basic example in Subheading 2.1 suggested that we score all possible tree topologies and choose for inference the one that has the highest score. However, a simple counting exercise shows that an exhaustive examination of all possible topologies is not realistic.
Counting tree topologies
Number of taxa | Unrooted trees | Rooted trees |
---|---|---|
3 | 1 | 3 |
4 | 3 | 15 |
5 | 15 | 105 |
6 | 105 | 945 |
10 | 2,027,025 | 34,459,425 |
20 | 221,643,095,476,699,771,875 | 8,200,794,532,637,891,559,375 |
As a result, the number of possible topologies quickly becomes very large when the number n of sequences increases, even with a very modest n, so that heuristics become necessary to find the best-scoring tree.
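The entries of the table above can be reproduced with the classical double-factorial formulas for strictly bifurcating trees: (2n − 5)!! unrooted and (2n − 3)!! rooted topologies for n taxa. The function names below are ours.

```python
def double_factorial(m):
    """Product m * (m - 2) * (m - 4) * ... down to 1 (returns 1 for m <= 1)."""
    result = 1
    while m > 1:
        result *= m
        m -= 2
    return result

def n_unrooted_trees(n):
    """Number of unrooted bifurcating topologies for n >= 3 taxa."""
    return double_factorial(2 * n - 5)

def n_rooted_trees(n):
    """Number of rooted bifurcating topologies for n >= 2 taxa."""
    return double_factorial(2 * n - 3)

for n in (3, 4, 5, 6, 10, 20):
    print(n, f"{n_unrooted_trees(n):,}", f"{n_rooted_trees(n):,}")
```

Note that the number of rooted trees for n taxa equals the number of unrooted trees for n + 1 taxa, since rooting a tree amounts to adding one more lineage.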
2.10.2 Some Heuristics to Find the Best Tree
The simplest approach builds upon the idea presented in Figs. 9 and 10. Stepwise addition, for instance, starts with three sequences drawn at random among the n sequences to be analyzed, and adds sequences one at a time, keeping only the tree that has the highest score at each step (e.g., [52]). However, there is no guarantee that the final tree is the optimal tree [44]. The idea behind branch-and-bound [102], refined in [103], is to have a look-ahead routine that prevents entrapment in suboptimal trees. This routine sets a bound on the trees selected at each round of additions, such that only the trees that have a score at least as good as that of the trees obtained in the next round are kept in the search algorithm. Solutions found by the branch-and-bound algorithm are optimal, but computing time quickly becomes prohibitive with more than 20 sequences.
As a result, most tree-search algorithms will start with a quickly obtained tree, often reconstructed with an algorithm based on pairwise distances such as neighbor-joining [104] or a related approach [78, 105], and then alter the tree randomly until no further improvement is obtained or after a certain number of unsuccessful attempts are reached. Examples of such algorithms include nearest neighbor interchange (NNI), subtree pruning and regrafting (SPR), or tree bisection and reconnection (TBR), see, for instance, [6] for a full description. While the details are of little importance here, the critical point is the extent of topological rearrangement in each case. With, e.g., NNI, each rearrangement can give rise to two topologies. The result is that exploring the topology space is slow, especially in problems with large n. On the other hand, TBR has, among the three methods cited above, the largest number of neighbors. As a result, the topology space is explored quickly, but the optimal tree can be “missed” simply because a dramatic change is attempted, so that the computational cost increases. Alternatively, the chance of finding the optimal tree \( \widehat{\tau} \) when \( \widehat{\tau} \) is very different from the current tree is higher when the algorithm can create some dramatic rearrangements. Some programs, such as PhyML ver. 3.0, now use a combination of NNI and SPR to address this issue [24]. MCMC samplers that search the tree space implement somewhat similar tree-perturbation algorithms that are either “global,” and modify the topology dramatically, or “local” [106] (see also [107] for a correction of the original local moves). As a result, MCMC samplers are affected by the same issues as traditional likelihood methods. Much of the difficulty therefore comes from this kind of trade-off between larger rearrangements that are expected to improve accuracy and the computational burden associated with these extra computations [108].
2.10.3 Cutting Corners with ABC and AI
As some of the above computations can become quite costly (high runtimes, heavy memory footprints, poor scalability with large data sets, etc.), computational workarounds have been and are being explored. One of these resorts to approximate Bayesian computation (ABC), which is essentially a likelihood-free approach. First developed in the context of population genetics [109, 110], the driving idea is to bypass the optimization procedures and replace them with simulations in the context of a rejection sampler. In population genetics, the problem could be about a gene tree, which is usually appropriately described by a coalescence tree [111, 112], for which we want to estimate some model parameters. As we are able to simulate trees from such a process, it is possible to place prior distributions on these model parameters, and simulate trees by drawing parameters until the simulated trees “look like” the observed tree. The set of parameters thus drawn approximates the posterior distribution of the corresponding variables. This forms the basis of a naïve rejection sampler, which is quite flexible as it does not even require that a probabilistic model be formulated, but which can be inefficient, especially if the posterior distribution is far from the prior distribution, which is usually the case. As a result, a number of variations have been described, trying either to correlate sample draws as in MCMC samplers [113] or to resample sequentially from the past [114, 115]. In spite of recent reviews of the computational promises and deliveries of ABC samplers [116, 117, 118], the few applications in molecular evolution have been, to date, mostly limited to molecular epidemiology [119, 120, 121, 122].
One of the major challenges in estimating a phylogenetic tree from a sequence alignment with ABC is the lack of a proper and efficient simulation strategy. It is possible to simulate trees under various processes (we saw the coalescent above), and it is also possible to simulate an alignment from a given (possibly simulated) tree, so that in theory one could imagine an ABC algorithm that would use this backward process to estimate phylogenetic trees by comparing a simulated alignment to an “actual” alignment. This, however, would most likely be a very inefficient sampler.
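The naïve rejection sampler described above can be sketched on a deliberately small problem that involves no tree search: estimating the JC69 distance between two sequences of 54 sites that differ at 3 of them, as s_1 and s_2 do in Subheading 2.1. The uniform prior on [0, 1], the number of draws, the summary statistic (the number of differing sites), and the zero tolerance are all assumptions made for the example, which is not one of the published applications cited above.

```python
import math
import random

def simulate_differences(d, n_sites, rng):
    """Simulate the number of differing sites between two sequences under JC69."""
    p = 0.75 * (1.0 - math.exp(-4.0 * d / 3.0))  # probability that a site differs
    return sum(rng.random() < p for _ in range(n_sites))

def abc_rejection(observed, n_sites, n_draws=20000, tolerance=0, seed=1):
    """Keep the prior draws whose simulated data 'look like' the observed data."""
    rng = random.Random(seed)
    accepted = []
    for _ in range(n_draws):
        d = rng.uniform(0.0, 1.0)                          # draw from the prior
        simulated = simulate_differences(d, n_sites, rng)  # simulate data
        if abs(simulated - observed) <= tolerance:         # compare summaries
            accepted.append(d)
    return accepted

posterior_sample = abc_rejection(observed=3, n_sites=54)
posterior_mean = sum(posterior_sample) / len(posterior_sample)
print(len(posterior_sample), posterior_mean)
```

Only a small fraction of the draws is accepted even in this one-parameter setting, which gives a sense of why the naïve sampler becomes inefficient when the prior is wide relative to the posterior.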
A second area that holds promise is the use of artificial intelligence (AI), and more specifically of machine learning (ML), in molecular evolution. Here again, attempts have been made to use standard ML approaches such as support vector machines [123] to guide, for instance, the comparison of tree shapes [124], which can then be used in epidemiology [121], but estimating a phylogenetic tree has proved more challenging. In one notable exception, an alignment-free distance-based tree-reconstruction method was proposed [125], but its main legacy seems to be in the development of k-mers, or unaligned sequences chopped into words of length k, to reconstruct phylogenetic trees, in particular in the context of phylogenomics (phylogenetics at a genomics scale) [126, 127]. To the best of our knowledge, nobody has yet tried to train a neural network or even a deep learning algorithm [128, 129, 130] on a database of phylogenetic trees with corresponding alignments such as TreeBASE [131] or PANDIT [132]. As applications of deep learning start emerging in genomics [133] and proteomics [134], it is likely that phylogenetics will come next.
3 Uncovering Processes and Times
3.1 Dating the Tree of Life: Always Deeper?
Similar to the problem of estimating the tree of life, dating the tree of life poses many challenges [135]. First proposed in 1965 [40], the idea of estimating divergence times has since undergone dramatic changes, and new approaches are regularly proposed. Population geneticists have their own approaches, which are either fully Bayesian [136] or based on approximate Bayesian computation in the coalescent framework [137]. All these approaches make it possible to infer divergence times between recently diverged species, as in the case of humans and chimpanzees, or to date demographic events such as the migrations “out of Africa” of early human populations [138].
In the context of molecular evolution, we are usually interested in estimating deeper divergence times, such as those between species, which are available online, for instance, at www.timetree.org [139], recently revamped and extended to cover close to 100k species [140]. While early “molecular dates” were systematically biased towards ages that are too old [135], we argue here that recent developments in the field have led to more accurate methods and also to a better understanding of methodological limitations.
3.1.1 The Strict Molecular Clock
In this example (Fig. 11), the branch length from the fossil-dated node is 0.1 substitutions/site (sub/site), and the fossil was estimated to be present 10 million years ago (MYA). Under the strict molecular clock assumption (equal rates over the whole tree), we can (1) estimate the rate of evolution (0.1∕10 = 0.01 sub/site/my) and (2) date all the other nodes on the tree. For instance, the most recent common ancestor of S2 and S5 is separated from the tips by a branch length of 0.02 sub/site. Its divergence time is therefore 0.02∕0.01 = 2 MYA.
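The arithmetic of this example can be written out as a short sketch. The function names are ours and purely illustrative; the numbers are those quoted in the text (branch lengths in sub/site, ages in million years).

```python
def strict_clock_rate(calibrated_branch_length, fossil_age):
    """Rate of evolution (sub/site/my) implied by one fossil calibration."""
    return calibrated_branch_length / fossil_age


def node_age(node_to_tip_length, rate):
    """Age (my) of a node, given its distance to the tips under a strict clock."""
    return node_to_tip_length / rate


rate = strict_clock_rate(0.1, 10.0)   # calibration: 0.1 sub/site over 10 my
age_s2_s5 = node_age(0.02, rate)      # ancestor of S2 and S5, 0.02 sub/site deep
print(f"rate = {rate:.3f} sub/site/my, age = {age_s2_s5:.1f} MYA")
```

Because a single rate is applied to the whole tree, any error in the calibration age propagates proportionally to every other node age.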
This linear regression model suggested by the molecular clock hypothesis has often been portrayed as a recipe [147], which gave rise in the late twentieth to early twenty-first century to a veritable cottage industry [148, 149, 150, 151], culminating in a paper suggesting that the age of the tree of life might be older than the age of planet Earth [152]. This recipe was discredited by two factors: (1) the publication of a piece written in a rather unusual style for a scientific paper [153], and (2) new methodological developments. The main points made in [153] are that (1) most of the early dating studies relied on one analysis [149] that used a fossil-based calibration point for the divergence of birds at 310 MYA to estimate a number of molecular dates for vertebrates, and that (2) these molecular dates were then used in subsequent studies as a proxy for calibration points, disregarding their uncertainty. As a result, estimation errors were passed on and amplified from study to study, leading to the nonsensical results in [152].
3.1.2 Local Molecular Clocks
This “debacle” has motivated further theoretical developments in the dating field. The simplest idea is that, if a global clock does not hold for the entire tree, then perhaps groups of related species share the same rate. That is, if a global clock does not hold, perhaps the tree can be subdivided into local molecular clocks. An initial idea was proposed in the context of quartets of sequences [154] and was later generalized to a tree of any size with any number of local clocks on the tree [155] (constrained by the number of branches on the tree and calibration points). Because of the arbitrariness of such local clocks, methods have been devised to place the clocks on the tree [156] and to estimate the appropriate number of clocks that should be used [157]. A Bayesian approach now estimates all these parameters and their placement in an integrated statistical framework [158].
3.1.3 Correlated Relaxed Clocks
The idea of a correlated relaxed molecular clock goes back to Sanderson [159] (see also [160]), who considered that rates of evolution can change from branch to branch on a tree. By constraining rates of evolution to vary in an autocorrelated manner on a tree, it is possible to devise a method that minimizes the amount of rate change.
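To make the notion of "amount of rate change" concrete, here is a minimal sketch of one possible penalty: the sum of squared differences between the rate of each branch and that of its parent branch. This is written in the spirit of the rate-smoothing idea described above, not as a reproduction of the objective function used in [159]; the tree encoding, the branch names, and the rates are hypothetical.

```python
def rate_change_penalty(parent, rate):
    """Sum of squared rate differences between each branch and its parent.

    parent: dict mapping a branch name to its parent branch name
            (None for branches attached to the root).
    rate:   dict mapping a branch name to its rate (sub/site/my).
    """
    total = 0.0
    for branch, ancestor in parent.items():
        if ancestor is not None:
            total += (rate[branch] - rate[ancestor]) ** 2
    return total


# Hypothetical four-branch tree: two branches leave the root,
# and two daughter branches descend from the "left" branch.
parent = {"left": None, "right": None, "a": "left", "b": "left"}
clocklike = {"left": 0.010, "right": 0.010, "a": 0.010, "b": 0.010}
relaxed = {"left": 0.010, "right": 0.015, "a": 0.012, "b": 0.008}

print(rate_change_penalty(parent, clocklike))  # no rate change along the tree
print(rate_change_penalty(parent, relaxed))    # positive penalty
```

A dating method built on such a penalty would search for the rates (and node ages) that keep it as small as possible while still explaining the branch lengths; under this encoding, branches attached to the root contribute nothing, which reflects the special treatment the root requires.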
3.1.4 Uncorrelated Relaxed Clocks
Because of the autocorrelation between the rate of each branch and that of its ancestral branch (except for the root, which obviously requires a special treatment), the tree topology is fixed under the autocorrelated models described above. By relaxing this assumption about rate autocorrelation, [172] were able to implement a model that also integrates over topological uncertainty. In spite of the somewhat counter-intuitive nature of the relaxation of the autocorrelated process, as implemented in BEAST [91, 173], empirical studies have found this approach to be one of the best-performing (e.g., [157]).
When the uncorrelated relaxed molecular clock was first published, it was proposed that its use could improve phylogenetic inference [172]. The idea was that calibration points and their placement on the tree could act as additional information. However, a simulation study suggests that relaxed molecular clocks might not improve phylogenetic accuracy [174], a result that might be due to the lack of calibration constraints in this particular simulation study.
3.1.5 Some Applications of Relaxed Clock Models
Since the advent of relaxed molecular clocks, two very exciting developments have seen the light of day. The first concerns the inclusion of spatial statistics into dating models [175, 176]. Spatial statistics are not new in population genetics [177] and have been used with success in combination with analyses in computational molecular evolution (e.g., [178]). However, the originality in [176], for instance, is to combine in a single statistical framework molecular data with geographical and environmental information to infer the diffusion of sequences through both space and time. While these preliminary models seem to deal appropriately with natural barriers to gene flow such as coastlines, a more detailed set of constraints on gene flow may further enhance their current predictive power.
The second development coming from relaxed molecular clocks concerns the mapping of ancestral characters onto uncertain phylogenies. This is not a novel topic, as a Bayesian approach was first described in 2004 [179, 180]. The novelty is that we now have the tools to correlate morphological and molecular evolution in terms of their absolute rates and to allow both molecular and morphological rates of evolution to vary in time [181]. Further development will certainly integrate over topological uncertainty. While there has been a heated controversy about the existence of such a correlation in the past [182], all previous studies were using branch length as a proxy for rate of molecular evolution, which is clearly incorrect because a branch length is the product of a rate and a time duration. We can therefore expect some more accurate results on this topic very soon. More details and examples can be found in recent and extensive reviews [183, 184, 185] that further discuss applications to biogeographic studies [186], or extensions to viral [187, 188], as well as other types of genomic [189] and morphological [190] data.
4 Molecular Population Phylogenomics
Population genetics is rich in theory regarding the relative roles of mutation, drift, and selection. Much research in population genomics is now focusing on using this theory to develop statistical procedures to infer past processes based on population-level data, such as those of the 1000 Genomes Project [191], the UK’s 10,000 genome project [192], and ever more ambitious projects [193]. One limitation of these inference procedures is that they all focus on a thin slice of evolutionary time by studying evolution at the level of populations. If we wish to study longer evolutionary time scales, for example, tens or hundreds of millions of years, we must resort to interspecific data. In such a context, which is becoming intrinsically phylogenetic, the most important event is a substitution, that is, a mutation that has been fixed. Yet substitution rates can be defined from several features. In particular, from a population genetics perspective, it is of interest to model both mutational features and selective effects, combining them multiplicatively to specify substitution rates. We first review briefly how substitution models that invoke codons as the state space lend themselves naturally to these objectives (Subheading 4.1), before explaining the origin (and a shortcoming) of all the approaches developed so far (Subheading 4.2).
4.1 Bridging the Gap Between Population Genetics and Phylogenetics
Assuming a point-mutation process, such that events only change one nucleotide of a codon during a small time interval, Muse and Gaut proposed a codon substitution model with rates specified from the Q_{GTR} nucleotide-level matrix (see Subheading 2.7), along with one parameter that modulates synonymous events and another one that modulates nonsynonymous events [194]. In most subsequent formulations, the parameter associated with synonymous events is assumed to be fixed, such that the model only modulates nonsynonymous rates by means of a parameter denoted ω. This parameter has traditionally been interpreted as the nonsynonymous to synonymous rate ratio, and is generally associated with a different formulation of the codon model proposed by Goldman and Yang [195]. More details on codon models can be found in Chapter 4.1 [196]. There continues to be a debate regarding the interpretation of the ω parameter [197, 198]. Regardless of how this issue is settled, it is clear that ω is aimed at capturing the net overall effects of selection, irrespective of the exact nature of these effects.
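The structure of such a codon model can be sketched as follows. This is a simplified illustration under stated assumptions, not the exact parameterization of [194] or [195]: the synonymous modulator is fixed to 1, stop codons are excluded from the state space, the standard genetic code is used, and `nuc_rate` is a hypothetical stand-in for the off-diagonal entries of Q_{GTR}. Diagonal entries of the codon matrix are omitted.

```python
BASES = "TCAG"
# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, ...).
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
GENETIC_CODE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def codon_rate(codon_i, codon_j, nuc_rate, omega):
    """Off-diagonal rate from codon_i to codon_j.

    nuc_rate: dict mapping (from_base, to_base) to a nucleotide-level rate.
    omega:    modulator applied to nonsynonymous changes.
    """
    diffs = [(x, y) for x, y in zip(codon_i, codon_j) if x != y]
    if len(diffs) != 1:
        return 0.0                      # only single-nucleotide changes allowed
    aa_i, aa_j = GENETIC_CODE[codon_i], GENETIC_CODE[codon_j]
    if "*" in (aa_i, aa_j):
        return 0.0                      # stop codons are not valid states
    mu = nuc_rate[diffs[0]]
    return mu if aa_i == aa_j else omega * mu


# Hypothetical nucleotide rates: all equal to 1 for simplicity.
nuc_rate = {(x, y): 1.0 for x in BASES for y in BASES if x != y}
print(codon_rate("TTT", "TTC", nuc_rate, omega=0.2))  # synonymous (F to F)
print(codon_rate("TTT", "TTA", nuc_rate, omega=0.2))  # nonsynonymous (F to L)
print(codon_rate("TTT", "CCC", nuc_rate, omega=0.2))  # three changes: rate 0
```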
With the intention to model selective effects themselves, Halpern and Bruno proposed a codon substitution model that combines a nucleotide-level layer, as described above, for controlling mutational features, along with a fixation factor that is proportional to the fixation probability of the mutational event [199]. The fixation factor is in turn specified from an account of amino acid or codon preferences. One objective of the model, then, consists in teasing apart mutation and selection. While [199] proposed their model with site-specific fixation factors, later work has explored simpler specifications, where all sites have the same fixation factor [200]. Other models that aimed at capturing across-site heterogeneities in fixation factors were proposed using nonparametric devices and empirical mixtures [201]. Another core idea behind these approaches is to construct a more appropriate null model against which to test for features of the evolutionary process. This idea has been put into practice for the detection of adaptive evolution in protein-coding genes [202, 203]. Recent developments include sequence-wide fixation factors [9, 197, 204, 205], and we predict that these models will play a role in bridging the gap between molecular evolution at the population and at the species levels.
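As an illustration of how a mutational layer and a fixation factor can be combined multiplicatively, the sketch below uses the classical form S / (1 - exp(-S)) for the fixation rate of a mutation relative to a neutral one, where S is a scaled selection coefficient. Treating S as the difference between two scaled fitness values, and the exact scaling of S with population size, are assumptions made here for illustration; the parameterization in [199] may differ in its details.

```python
import math


def fixation_factor(S):
    """Fixation rate of a mutation relative to a neutral mutation.

    S is the scaled selection coefficient; S = 0 gives a factor of 1.
    """
    if abs(S) < 1e-8:
        return 1.0                      # neutral limit of S / (1 - exp(-S))
    if S < -700.0:
        return 0.0                      # strongly deleterious; avoids overflow
    return S / -math.expm1(-S)


def mutation_selection_rate(mu, fitness_from, fitness_to):
    """Substitution rate = mutation rate x fixation factor.

    fitness_from, fitness_to: hypothetical scaled fitness values of the
    current and proposed codons (or amino acids).
    """
    return mu * fixation_factor(fitness_to - fitness_from)


for S in (-2.0, 0.0, 2.0):
    print(S, round(fixation_factor(S), 4))
```

With this form, advantageous changes (S > 0) are accelerated and deleterious ones (S < 0) are slowed down relative to the mutational input, which is what allows mutation and selection to be teased apart.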
4.2 Origin of Mutation–Selection Models: The Genic Selection Model
In order to understand a shortcoming of these models, we need to go back to the development of fixation probabilities that took place in the second half of the twentieth century. The basic unit or quantum of evolution is a change in allele frequency p. Allele frequencies can be affected by four processes: migration, mutation, selection, and drift. Because of the symmetry between migration and mutation [206], which only differ in their magnitude, these two processes can be treated as one. We are left with three forces: mutation, selection, and drift. The question is then, what is the fate of an allele under the combined action of these processes? Our development here follows [207] (but see [208] for a very clear account).
4.2.1 Fixation Probabilities
Of the three processes affecting allele frequencies, mutation and selection can be seen as directional forces in that their action will shift the distribution of allele frequencies towards a particular point, be it an internal equilibrium, or fixation/loss of an allele. On the other hand, drift is a non-directional process that will increase the variance in allele frequencies across populations, and will therefore spread out the distribution of allele frequencies. This distribution is denoted Ψ(p, t). We also must assume that the magnitude of all three processes, mutation, selection, and drift, is small and of the order of \( \frac{1}{2{N}_e} \), where N_{e} is the effective population size. To derive the fate of an allele after a certain number of generations, we also need to define g(p, ε;dt), the probability that allele frequency changes from p to p + ε during a time interval dt.
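The effect of drift on the spread of allele frequencies can be illustrated with a small simulation. The sketch below assumes a Wright-Fisher population (binomial sampling of 2N gene copies each generation, no mutation, no selection), which is our choice of model for illustration; the population size, number of replicates, and function name are hypothetical.

```python
import random


def drift_variances(p0, two_n, generations, replicates, seed=1):
    """Variance of allele frequency across replicate populations over time.

    Each generation, every population resamples its 2N gene copies
    binomially from its current allele frequency (pure drift).
    """
    rng = random.Random(seed)
    freqs = [p0] * replicates
    variances = []
    for _ in range(generations):
        freqs = [
            sum(rng.random() < p for _ in range(two_n)) / two_n
            for p in freqs
        ]
        mean = sum(freqs) / replicates
        variances.append(sum((p - mean) ** 2 for p in freqs) / replicates)
    return variances


if __name__ == "__main__":
    v = drift_variances(p0=0.5, two_n=100, generations=20, replicates=500)
    print("variance after 1 generation:  ", v[0])
    print("variance after 20 generations:", v[-1])
```

The mean frequency across populations stays close to its starting value while the variance grows with time, which is the spreading out of the distribution of allele frequencies described above.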
4.2.2 The Case of Genic Selection
Table 3 The standard selection models, with the fitness of each genotype
Selection model | A_{1}A_{1} | A_{1}A_{2} | A_{2}A_{2} |
Genic (positive) selection | w_{1} = 1 + s | w_{2} = 1 + hs | w_{3} = 1 |
Overdominance | w_{1} = 1 | w_{2} = 1 + s | w_{3} = 1 |
Two critical points should be noted here. First, none of the recent codon models [197, 199, 200, 201, 202, 204, 210, 211] ever investigated the role of dominance h, as they all consider that the allele under (positive) selection is fully dominant. Second, Table 3 shows that another class of selection models, those based on balancing selection, has never been considered so far. The impact of the selection model on the predictions made by the mutation–selection (-drift) models is currently unknown.
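To see how the two selection models in the table differ in their consequences, the sketch below iterates the standard deterministic recursion for the frequency p of allele A_{1}, using the genotype fitnesses w_{1}, w_{2}, and w_{3}. It assumes random mating and an infinite population (no drift, no mutation), which are simplifications of ours; the values of s and h are hypothetical.

```python
def next_allele_frequency(p, w1, w2, w3):
    """One generation of selection on allele A1 at frequency p."""
    q = 1.0 - p
    mean_fitness = p * p * w1 + 2.0 * p * q * w2 + q * q * w3
    return (p * p * w1 + p * q * w2) / mean_fitness


def genic_fitnesses(s, h):
    """Fitnesses of A1A1, A1A2, A2A2 under genic (positive) selection."""
    return (1.0 + s, 1.0 + h * s, 1.0)


def overdominant_fitnesses(s):
    """Fitnesses of A1A1, A1A2, A2A2 under overdominance."""
    return (1.0, 1.0 + s, 1.0)


def iterate(p, fitnesses, generations):
    """Apply the recursion for a given number of generations."""
    for _ in range(generations):
        p = next_allele_frequency(p, *fitnesses)
    return p


if __name__ == "__main__":
    print("genic:       ", iterate(0.1, genic_fitnesses(s=0.1, h=0.5), 1000))
    print("overdominant:", iterate(0.1, overdominant_fitnesses(s=0.1), 1000))
```

Under genic selection the favored allele heads towards fixation, whereas under overdominance both alleles are maintained at an internal equilibrium, so the two models make qualitatively different predictions about which mutations eventually become substitutions.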
5 High-Performance Computing for Phylogenetics
5.1 Parallelization
Because of the dependency of the likelihood computations on the shape of a particular tree (see Subheading 2.6), most phylogenetic computations cannot be parallelized to take advantage of a multiprocessor (or multicore) environment. Nevertheless, two main directions have been explored to speed up computations: first, in computing the likelihood of substitution models that incorporate among-site rate variation and second, in distributing bootstrap replicates to several processors, as both types of computations can be done independently. A third route is explored in Chapter 7.4 [212].
In the first case, among-site rate variation is usually modeled with a Γ distribution [213] that is discretized over a finite (and small) number of categories [214]. The likelihood then takes the form of a weighted sum of likelihood functions, one for each discrete rate category, so that each of these functions can be evaluated independently. The route most commonly used is the plain “embarrassingly parallel” solution, where completely independent computations are farmed out to different processors. Such is the case for bootstrap replicates, for which a version of PhyML [24] exists, or in a Bayesian context for independent MCMC samplers [215] (see Subheading 2.9.3). The PhyloBayes-MPI package implements distributed likelihood calculations across sites over several compute-cores, allowing for a genuinely parallelized MCMC run [216, 217].
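The weighted-sum structure can be illustrated with a deliberately small case: two aligned sequences under the Jukes-Cantor model, with a handful of rate categories. The category rates and weights below are hypothetical values chosen to have mean 1, not a discretized Γ distribution, and the thread pool only illustrates that the per-category computations are independent of each other.

```python
import math
from concurrent.futures import ThreadPoolExecutor


def jc69_site_likelihoods(seq1, seq2, distance, rate):
    """Per-site likelihoods for one rate category (two-sequence JC69)."""
    e = math.exp(-4.0 * distance * rate / 3.0)
    p_same = 0.25 + 0.75 * e
    p_diff = 0.25 - 0.25 * e
    # 0.25 is the stationary frequency of the first sequence's nucleotide.
    return [0.25 * (p_same if a == b else p_diff) for a, b in zip(seq1, seq2)]


def log_likelihood(seq1, seq2, distance, rates, weights):
    """Log-likelihood with among-site rate variation over discrete categories."""
    # Each category is evaluated independently of the others.
    with ThreadPoolExecutor() as pool:
        per_category = list(pool.map(
            lambda r: jc69_site_likelihoods(seq1, seq2, distance, r), rates))
    total = 0.0
    for site in range(len(seq1)):
        # Weighted sum over categories at each site, then product over sites.
        site_lik = sum(w * cat[site] for w, cat in zip(weights, per_category))
        total += math.log(site_lik)
    return total


if __name__ == "__main__":
    s1, s2 = "ACGTACGTAC", "ACGTTCGTAA"
    rates = [0.1, 0.5, 1.0, 2.4]          # hypothetical, mean 1
    weights = [0.25, 0.25, 0.25, 0.25]
    print(log_likelihood(s1, s2, 0.2, rates, weights))
```

In CPython, threads do not speed up such CPU-bound work; a real implementation would distribute the categories (or the sites) over processes or compute nodes, as done by the packages cited above.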
5.2 HPC and Cloud Computing
More recent work has focused on the development of heuristics that make large-scale phylogenetics amenable to high-performance computing (HPC) on computer clusters. Because of the algorithmic complexity of resolving phylogenetic trees, an approach based on “algorithmic engineering” was developed [218]. The underlying idea is akin to the training phase in supervised machine learning [123], except that here the target is not the performance of a classifier but that of search heuristics. All of these heuristics reuse parameter estimates, avoid the computation of the full likelihood function for all the bootstrap replicates, or seed the search algorithm for every nth replicate on the results of previous replicates [218]. For instance, in the “lazy subtree rearrangement” [219], topologies are modified by SPR (see Subheading 2.10.2), but instead of recomputing the likelihood on the whole tree, only the branch lengths around the perturbation are re-optimized. This approximation is used to rank candidate topologies, and the actual likelihood is evaluated on the complete tree only for the best candidates. These heuristics now permit the analysis of thousands of sequences in a probabilistic framework [220], but the actual convergence of these algorithms remains difficult to evaluate, especially on very large data sets (e.g., >10^{4} sequences).
In addition to the reduction of the memory footprint for sparse data matrices [221], an alternative direction to “tweaking likelihood algorithms” has been to take direct advantage of the computing architecture available. One particular effort aims at tapping directly into the computing power of graphics processing units or GPUs, taking advantage of their shared common memory, their highly parallelized architecture, and the comparatively negligible cost of spawning and destroying threads on them. As a result, it is possible to distribute some of the summation entering the pruning algorithm (see Subheading 2.6) to different GPUs [222]. The number of programs taking advantage of these developments is widening and includes popular options such as BEAST [91] and MrBayes [223].
All these fast algorithms can be installed on a local computer cluster, a solution adopted by many research groups since the late 1990s. However, installing a cluster can be demanding and costly because a dedicated room is required with appropriate cooling and power supply (not to mention physically securing the room). Besides, redundancy requirements, both in terms of power supply and data storage, as well as basic software maintenance and user management, may demand hiring a system administrator. An alternative is to run analyses on a remote HPC server, in the “cloud.” Canada, for instance, has a number of such facilities, thanks to national funding bodies (CAC at cac.queensu.ca, SHARCNET at www.sharcnet.ca, or Calcul Quebec at www.calculquebec.ca, just to name a few), and commercial solutions are just a few clicks away (e.g., Amazon Elastic Compute Cloud or EC2). Researchers can obtain access to these HPC solutions according to a number of business models (free, on demand, yearly subscription, etc.) that are associated with a wide spectrum of costs [224]. But in spite of the technical support included in the price, users usually still have to install their preferred phylogenetic software manually or submit a formal request to the team of system administrators managing the HPC facility, neither of which is always convenient.
To make the algorithmic and technological developments described above more accessible, the recent past has seen the emergence of cloud computing [225] dedicated to the phylogenetics community. Examples include the CIPRES Science Gateway (www.phylo.org), or Phylogeny.fr (www.phylogeny.fr, [226]). Many include web portals that do not require that users be well versed in Unix commands, while others may include an application programming interface to cater to the most computer-savvy users. One potential limitation of these services is the bandwidth necessary to transfer large files, along with storage requirements, especially in the context of next-generation sequencing data. The management of relatively large files will remain a potential issue, unless phylogenetics practitioners are ready to discard these files after analysis, the end product of which is a single tree file a few kilobytes in size, in the same way that people involved in genome projects delete the original image files produced by massively parallel sequencers. Data security or privacy might not be a problem in most applications, except in projects dealing with human subjects, or with viruses such as HIV, for which sequence data can expose the sexual practices of subjects. However, once these various hurdles are out of the way, users could very well imagine running their phylogenetic analyses with millions of sequences from their smartphone while commuting.
6 Conclusions
Although most of the initial applications of likelihood-based methods were motivated by the shortcomings of parsimony, they have now become well accepted as they constitute principled inference approaches that rely on probabilistic logic. Moreover, they allow biologists to evaluate more rigorously the relative importance of different aspects of evolution. The models presented in this chapter have the ability to disentangle rates from times (Subheading 3), or mutation from selection (Subheading 4), while in most cases accounting for the uncertainty about nuisance parameters. But the latest developments described above still make a number of restrictive assumptions (Subheading 4.2), and while many variations in model formulations can be envisaged, they still remain to be explored in practice.
Although some progress has been made in developing integrative approaches (e.g., [176, 181]), throughout this chapter we have assumed that a reliable alignment was available as a starting point. A number of methods exist to co-estimate an alignment and a phylogenetic tree (see Part I of this book), but the computational requirements and convergence of some of these approaches can be daunting, even on the smallest data sets by today’s standards.
This brings us, finally, to the issue of tractability of most of these models in the face of very large data sets. The field of phylogenomics is developing quickly (see Part III), at a pace that is ever increasing given the output rate of whole genome sequencing projects. Environmental questions are drawing more and more attention, and metagenomes (see Part VI) will be analyzed in the context of what will soon be called metaphylogenomics. Exploring the numerous available and foreseeable substitution models in such contexts will require continued work in computational methodologies. As such, modeling efforts will continue to go hand-in-hand with, and may be dependent on, algorithmic developments [227]. It is also not impossible that, in the near future, the use of likelihood-free approaches such as ABC or machine learning algorithms in computational molecular evolution will be more thoroughly explored.
Notes
Acknowledgements
We would like to thank Michelle Brazeau, Eric Chen, Ilya Hekimi, Benoît Pagé, and Wayne Sawtell for their critical reading of a draft of the original chapter, as well as Jonathan Dench and George S. Long for their careful reading of the most recent draft. This work was supported by the Natural Sciences Research Council of Canada (SAB, NR).
References
- 1. Nei M, Kumar S (2000) Molecular evolution and phylogenetics. Oxford University Press, Oxford
- 2. Higgs PG, Attwood TK (2005) Bioinformatics and molecular evolution. Blackwell Publishing, Oxford
- 3. Balding DJ, Bishop MJ, Cannings C (2007) Handbook of statistical genetics, 3rd edn. Wiley, Chichester
- 4. Salemi M, Vandamme A-M, Lemey P (2009) The phylogenetic handbook: a practical approach to phylogenetic analysis and hypothesis testing, 2nd edn. Cambridge University Press, Cambridge
- 5. Hall BG (2011) Phylogenetic trees made easy: a how to manual. Sinauer Associates, Sunderland
- 6. Yang Z (2014) Molecular evolution: a statistical approach. Oxford University Press, Oxford
- 7. Drummond AJ, Bouckaert RR (2015) Bayesian evolutionary analysis with BEAST. Cambridge University Press, Cambridge
- 8. Aris-Brosou S, Xia X (2008) Phylogenetic analyses: a toolbox expanding towards Bayesian methods. Int J Plant Genomics 2008:683509
- 9. Rodrigue N, Philippe H (2010) Mechanistic revisions of phenomenological modeling strategies in molecular evolution. Trends Genet 26:248–252
- 10. Yang Z, Rannala B (2012) Molecular phylogenetics: principles and practice. Nat Rev Genet 13:303–314
- 11. Aris-Brosou S, Rodrigue N (2012) The essentials of computational molecular evolution. Methods Mol Biol 855:111–152
- 12. Yang Z (2000) Complexity of the simplest phylogenetic estimation problem. Proc Biol Sci 267:109–116
- 13. Sober E (1988) Reconstructing the past: parsimony, evolution, and inference. MIT Press, Cambridge
- 14. Durbin R, Eddy SR, Krogh A, Mitchison G (1998) Biological sequence analysis: probabilistic models of proteins and nucleic acids. Cambridge University Press, Cambridge
- 15. Felsenstein J (2004) Inferring phylogenies. Sinauer Associates, Sunderland
- 16. Yang Z (2007) PAML 4: phylogenetic analysis by maximum likelihood. Mol Biol Evol 24:1586–1591
- 17. Efron B, Tibshirani R (1993) An introduction to the bootstrap, vol 57. Chapman and Hall, Boca Raton
- 18. Efron B, Halloran E, Holmes S (1996) Bootstrap confidence levels for phylogenetic trees. Proc Natl Acad Sci USA 93:7085–7090
- 19. Felsenstein J (1985) Confidence limits on phylogenies: an approach using the bootstrap. Evolution 39:783–791
- 20. Baldauf SL (2003) Phylogeny for the faint of heart: a tutorial. Trends Genet 19:345–351
- 21. Hasegawa M, Kishino H (1989) Confidence limits of the maximum-likelihood estimate of the hominoid tree from mitochondrial-DNA sequences. Evolution 43:672–677
- 22. Anisimova M, Gascuel O (2006) Approximate likelihood-ratio test for branches: a fast, accurate, and powerful alternative. Syst Biol 55:539–552
- 23. Guindon S, Delsuc F, Dufayard J-F, Gascuel O (2009) Estimating maximum likelihood phylogenies with phyml. Methods Mol Biol 537:113–137
- 24. Guindon S, Dufayard J-F, Lefort V, Anisimova M, Hordijk W, Gascuel O (2010) New algorithms and methods to estimate maximum-likelihood phylogenies: assessing the performance of PhyML 3.0. Syst Biol 59:307–321
- 25. Hillis DM, Bull JJ (1993) An empirical test of bootstrapping as a method for assessing confidence in phylogenetic analysis. Syst Biol 42:182–192
- 26. Felsenstein J, Kishino H (1993) Is there something wrong with the bootstrap on phylogenies? A reply to Hillis and Bull. Syst Biol 42:193–200
- 27. Yang Z, Rannala B (2005) Branch-length prior influences Bayesian posterior probability of phylogeny. Syst Biol 54:455–470
- 28. Berry V, Gascuel O (1996) On the interpretation of bootstrap trees: appropriate threshold of clade selection and induced gain. Mol Biol Evol 13:999
- 29. Shimodaira H, Hasegawa M (2001) CONSEL: for assessing the confidence of phylogenetic tree selection. Bioinformatics 17:1246–1247
- 30. Salichos L, Rokas A (2013) Inferring ancient divergences requires genes with strong phylogenetic signals. Nature 497:327–331
- 31. Felsenstein J (1978) Cases in which parsimony or compatibility methods will be positively misleading. Syst Zool 27:401–410
- 32. Tuffley C, Steel M (1997) Links between maximum likelihood and maximum parsimony under a simple model of site substitution. Bull Math Biol 59:581–607
- 33. Steel M, Penny D (2000) Parsimony, likelihood, and the role of models in molecular phylogenetics. Mol Biol Evol 17:839–850
- 34. Holder MT, Lewis PO, Swofford DL (2010) The Akaike information criterion will not choose the no common mechanism model. Syst Biol 59:477–485
- 35. Editors T (2016) Editorial. Cladistics 32:1. https://doi.org/10.1111/cla.12148
- 36. Philippe H, Zhou Y, Brinkmann H, Rodrigue N, Delsuc F (2005) Heterotachy and long-branch attraction in phylogenetics. BMC Evol Biol 5:50
- 37. Brinkmann H, van der Giezen M, Zhou Y, de Raucourt GP, Philippe H (2005) An empirical assessment of long-branch attraction artefacts in deep eukaryotic phylogenomics. Syst Biol 54:743–757
- 38. Hampl V, Hug L, Leigh JW, Dacks JB, Lang BF, Simpson AG, Roger AJ (2009) Phylogenomic analyses support the monophyly of Excavata and resolve relationships among eukaryotic “supergroups”. Proc Natl Acad Sci USA 106:3859–3864
- 39. Liu H, Aris-Brosou S, Probert I, de Vargas C (2010) A timeline of the environmental genetics of the haptophytes. Mol Biol Evol 27:161–176
- 40. Zuckerkandl E, Pauling L (1965) Evolutionary divergence and convergence in proteins. In: Bryson V, Vogel HJ (eds) Evolving genes and proteins. Academic, Cambridge, pp 97–166
- 41. Galtier N, Gascuel O, Jean-Marie A (2005) Markov models in molecular evolution. In: Nielsen R (ed) Statistical methods in molecular evolution. Statistics for biology and health. Springer, New York, pp 3–24
- 42. Cox DR, Miller HD (1965) The theory of stochastic processes. Chapman and Hall/CRC, Boca Raton
- 43. Yang Z (2000) Maximum likelihood estimation on large phylogenies and analysis of adaptive evolution in human influenza virus A. J Mol Evol 51:423–432
- 44. Felsenstein J (1981) Evolutionary trees from DNA sequences: a maximum likelihood approach. J Mol Evol 17:368–376
- 45. Jukes TH, Cantor CR (1969) Evolution of protein molecules. In: Munro HN (ed) Mammalian protein metabolism. Academic, New York, pp 21–123
- 46.Kimura M (1980) A simple method for estimating evolutionary rates of base substitutions through comparative studies of nucleotide sequences. J Mol Evol 16:111–120CrossRefGoogle Scholar
- 47. Hasegawa M, Kishino H, Yano T (1985) Dating of the human-ape splitting by a molecular clock of mitochondrial DNA. J Mol Evol 22:160–174
- 48. Tavaré S (1986) Some probabilistic and statistical problems in the analysis of DNA sequences. Lect Math Life Sci 17:57–86
- 49. Huelsenbeck JP, Larget B, Alfaro ME (2004) Bayesian phylogenetic model selection using reversible jump Markov chain Monte Carlo. Mol Biol Evol 21:1123–1133
- 50. Yang Z, Roberts D (1995) On the use of nucleic acid sequences to infer early branchings in the tree of life. Mol Biol Evol 12:451–458
- 51. Huelsenbeck JP, Bollback JP, Levine AM (2002) Inferring the root of a phylogenetic tree. Syst Biol 51:32–43
- 52. Yang Z (2006) Computational molecular evolution. Oxford University Press, Oxford
- 53. Aris-Brosou S (2005) Determinants of adaptive evolution at the molecular level: the extended complexity hypothesis. Mol Biol Evol 22:200–209
- 54. Anisimova M, Yang Z (2004) Molecular evolution of the hepatitis delta virus antigen gene: recombination or positive selection? J Mol Evol 59:815–826
- 55. Burnham KP, Anderson DR (1998) Model selection and inference: a practical information-theoretic approach. Springer, Berlin
- 56. Anisimova M, Bielawski JP, Yang Z (2001) Accuracy and power of the likelihood ratio test in detecting adaptive molecular evolution. Mol Biol Evol 18:1585–1592
- 57. Whelan S, Goldman N (2004) Estimating the frequency of events that cause multiple-nucleotide changes. Genetics 167:2027–2043
- 58. Wong WS, Yang Z, Goldman N, Nielsen R (2004) Accuracy and power of statistical methods for detecting adaptive evolution in protein coding sequences and for identifying positively selected sites. Genetics 168:1041–1051
- 59. Massingham T, Goldman N (2005) Detecting amino acid sites under positive selection and purifying selection. Genetics 169:1753–1762
- 60. Zhang J, Nielsen R, Yang Z (2005) Evaluation of an improved branch-site likelihood method for detecting positive selection at the molecular level. Mol Biol Evol 22:2472–2479
- 61. Anisimova M, Yang Z (2007) Multiple hypothesis testing to detect lineages under positive selection that affects only a few sites. Mol Biol Evol 24:1219–1228
- 62. Yang Z (2010) A likelihood ratio test of speciation with gene flow using genomic sequence data. Genome Biol Evol 2:200–211
- 63. Fletcher W, Yang Z (2010) The effect of insertions, deletions, and alignment errors on the branch-site test of positive selection. Mol Biol Evol 27:2257–2267
- 64. Yang Z, dos Reis M (2011) Statistical properties of the branch-site test of positive selection. Mol Biol Evol 28:1217–1228
- 65. Self SG, Liang K-Y (1987) Asymptotic properties of maximum likelihood estimators and likelihood ratio tests under nonstandard conditions. J Am Stat Assoc 82:605–610
- 66. Posada D, Crandall KA (1998) MODELTEST: testing the model of DNA substitution. Bioinformatics 14:817–818
- 67. Posada D (2008) jModelTest: phylogenetic model averaging. Mol Biol Evol 25:1253–1256
- 68. Cunningham CW, Zhu H, Hillis DM (1998) Best-fit maximum-likelihood models for phylogenetic inference: empirical tests with known phylogenies. Evolution 52:978–987
- 69. Pol D (2004) Empirical problems of the hierarchical likelihood ratio test for model selection. Syst Biol 53:949–962
- 70. Kullback S, Leibler RA (1951) On information and sufficiency. Ann Math Stat 22:79–86
- 71. Minin V, Abdo Z, Joyce P, Sullivan J (2003) Performance-based selection of likelihood models for phylogeny estimation. Syst Biol 52:674–683
- 72. Ripplinger J, Sullivan J (2008) Does choice in model selection affect maximum likelihood analysis? Syst Biol 57:76–85
- 73. Posada D, Crandall KA (2001) Selecting the best-fit model of nucleotide substitution. Syst Biol 50:580–601
- 74. Abdo Z, Minin VN, Joyce P, Sullivan J (2005) Accounting for uncertainty in the tree topology has little effect on the decision-theoretic approach to model selection in phylogeny estimation. Mol Biol Evol 22:691–703
- 75. Luo A, Qiao H, Zhang Y, Shi W, Ho SY, Xu W, Zhang A, Zhu C (2010) Performance of criteria for selecting evolutionary models in phylogenetics: a comprehensive study based on simulated datasets. BMC Evol Biol 10:242
- 76. Schwarz G (1978) Estimating the dimension of a model. Ann Stat 6:461–464
- 77. Evans J, Sullivan J (2011) Approximating model probabilities in Bayesian information criterion and decision-theoretic approaches to model selection in phylogenetics. Mol Biol Evol 28:343–349
- 78. Gascuel O (1997) BIONJ: an improved version of the NJ algorithm based on a simple model of sequence data. Mol Biol Evol 14:685–695
- 79. Darriba D, Taboada GL, Doallo R, Posada D (2012) jModelTest 2: more models, new heuristics and parallel computing. Nat Methods 9:772
- 80. Lefort V, Longueville J-E, Gascuel O (2017) SMS: smart model selection in PhyML. Mol Biol Evol 34:2422–2424
- 81. Kleinman CL, Rodrigue N, Bonnard C, Philippe H, Lartillot N (2006) A maximum likelihood framework for protein design. BMC Bioinformatics 7:326
- 82. Rodrigue N, Philippe H, Lartillot N (2007) Exploring fast computational strategies for probabilistic phylogenetic analysis. Syst Biol 56:711–726
- 83. Yang Z (2005) Bayesian inference in molecular phylogenetics. In: Gascuel O (ed) Mathematics of evolution and phylogeny. Oxford University Press, Oxford, pp 63–90
- 84. Jeffreys H (1939) Theory of probability. The International series of monographs on physics. The Clarendon Press, Oxford
- 85. Kass RE, Raftery AE (1995) Bayes factors. J Am Stat Assoc 90:773–795
- 86. Lartillot N, Philippe H (2006) Computing Bayes factors using thermodynamic integration. Syst Biol 55:195–207
- 87. Fan Y, Wu R, Chen MH, Kuo L, Lewis PO (2011) Choosing among partition models in Bayesian phylogenetics. Mol Biol Evol 28:523–532
- 88. Newton MA, Raftery AE (1994) Approximating Bayesian inference with the weighted likelihood bootstrap. J R Stat Soc B 56:3–48
- 89. Aris-Brosou S (2003) How Bayes tests of molecular phylogenies compare with frequentist approaches. Bioinformatics 19:618–624
- 90. Ronquist F, Huelsenbeck JP (2003) MrBayes 3: Bayesian phylogenetic inference under mixed models. Bioinformatics 19:1572–1574
- 91. Drummond AJ, Rambaut A (2007) BEAST: Bayesian evolutionary analysis by sampling trees. BMC Evol Biol 7:214
- 92. Raftery AE (1996) Hypothesis testing and model selection. In: Gilks WR, Richardson S, Spiegelhalter DJ (eds) Markov chain Monte Carlo in practice. Chapman & Hall, Boca Raton, pp 163–187
- 93. Ogata Y (1989) A Monte Carlo method for high dimensional integration. Numer Math 55:137–157
- 94. Gelman A, Meng X-L (1998) Simulating normalizing constants: from importance sampling to bridge sampling to path sampling. Stat Sci 13:163–185
- 95. Xie W, Lewis PO, Fan Y, Kuo L, Chen MH (2011) Improving marginal likelihood estimation for Bayesian phylogenetic model selection. Syst Biol 60:150–160
- 96. Baele G, Lemey P, Bedford T, Rambaut A, Suchard MA, Alekseyenko AV (2012) Improving the accuracy of demographic and molecular clock model comparison while accommodating phylogenetic uncertainty. Mol Biol Evol 29:2157–2167
- 97. Raftery AE, Newton MA, Satagopan JM, Krivitsky PN (2007) Estimating the integrated likelihood via posterior simulation using the harmonic mean identity. Bayesian Stat 8:1–45
- 98. Smyth P (2000) Model selection for probabilistic clustering using cross-validated likelihood. Stat Comput 10:63–72
- 99. Lartillot N, Brinkmann H, Philippe H (2007) Suppression of long-branch attraction artefacts in the animal phylogeny using a site-heterogeneous model. BMC Evol Biol 7(Suppl 1):S4
- 100. Cavalli-Sforza LL, Edwards AW (1967) Phylogenetic analysis. Models and estimation procedures. Am J Hum Genet 19:233–257
- 101. Aris-Brosou S (2003) Least and most powerful phylogenetic tests to elucidate the origin of the seed plants in the presence of conflicting signals under misspecified models. Syst Biol 52:781–793
- 102. Foulds LR, Penny D, Hendy MD (1979) A general approach to proving the minimality of phylogenetic trees illustrated by an example with a set of 23 vertebrates. J Mol Evol 13:151–166
- 103. Hendy MD, Penny D (1982) Branch and bound algorithms to determine minimal evolutionary trees. Math Biosci 59:277–290
- 104. Saitou N, Nei M (1987) The neighbor-joining method: a new method for reconstructing phylogenetic trees. Mol Biol Evol 4:406–425
- 105. Bruno WJ, Socci ND, Halpern AL (2000) Weighted neighbor joining: a likelihood-based approach to distance-based phylogeny reconstruction. Mol Biol Evol 17:189–197
- 106. Larget B, Simon D (1999) Markov chain Monte Carlo algorithms for the Bayesian analysis of phylogenetic trees. Mol Biol Evol 16:750
- 107. Holder MT, Lewis PO, Swofford DL, Larget B (2005) Hastings ratio of the LOCAL proposal used in Bayesian phylogenetics. Syst Biol 54:961–965
- 108. Whelan S (2007) New approaches to phylogenetic tree search and their application to large numbers of protein alignments. Syst Biol 56:727–740
- 109. Pritchard JK, Seielstad MT, Perez-Lezaun A, Feldman MW (1999) Population growth of human Y chromosomes: a study of Y chromosome microsatellites. Mol Biol Evol 16:1791–1798
- 110. Beaumont MA, Zhang W, Balding DJ (2002) Approximate Bayesian computation in population genetics. Genetics 162:2025–2035
- 111. Kingman JFC (1982) The coalescent. Stoch Process Appl 13:235–248
- 112. Hein J, Schierup MH, Wiuf C (2005) Gene genealogies, variation and evolution: a primer in coalescent theory. Oxford University Press, Oxford
- 113. Marjoram P, Molitor J, Plagnol V, Tavaré S (2003) Markov chain Monte Carlo without likelihoods. Proc Natl Acad Sci 100:15324–15328
- 114. Sisson SA, Fan Y, Tanaka MM (2007) Sequential Monte Carlo without likelihoods. Proc Natl Acad Sci 104:1760–1765
- 115. Toni T, Welch D, Strelkowa N, Ipsen A, Stumpf MP (2009) Approximate Bayesian computation scheme for parameter inference and model selection in dynamical systems. J R Soc Interface 6:187–202
- 116. Beaumont MA (2010) Approximate Bayesian computation in evolution and ecology. Annu Rev Ecol Evol Syst 41:379–406
- 117. Sunnåker M, Busetto AG, Numminen E, Corander J, Foll M, Dessimoz C (2013) Approximate Bayesian computation. PLoS Comput Biol 9:e1002803
- 118. Lintusaari J, Gutmann MU, Dutta R, Kaski S, Corander J (2017) Fundamentals and recent developments in approximate Bayesian computation. Syst Biol 66:e66–e82
- 119. Ratmann O, Donker G, Meijer A, Fraser C, Koelle K (2012) Phylodynamic inference and model assessment with approximate Bayesian computation: influenza as a case study. PLoS Comput Biol 8:e1002835
- 120. Zheng Y, Aris-Brosou S (2013) Approximate Bayesian computation algorithms for estimating network model parameters. In: Joint statistical meeting proceedings (2013)—biometrics section, pp 2239–2253
- 121. Poon AF (2015) Phylodynamic inference with kernel ABC and its application to HIV epidemiology. Mol Biol Evol 32:2483–2495
- 122. Ibeh N, Aris-Brosou S (2016) Estimation of sub-epidemic dynamics by means of sequential Monte Carlo approximate Bayesian computation: an application to the Swiss HIV cohort study. https://doi.org/10.1101/085993
- 123. Hastie T, Tibshirani R, Friedman JH (2009) The elements of statistical learning: data mining, inference, and prediction. Springer series in statistics, 2nd edn. Springer, New York
- 124. Poon AF, Walker LW, Murray H, McCloskey RM, Harrigan PR, Liang RH (2013) Mapping the shapes of phylogenetic trees from human and zoonotic RNA viruses. PLoS One 8:e78122
- 125. Schwarz RF, Fletcher W, Förster F, Merget B, Wolf M, Schultz J, Markowetz F (2010) Evolutionary distances in the twilight zone—a rational kernel approach. PLoS One 5:e15788
- 126. Höhl M, Ragan MA (2007) Is multiple-sequence alignment required for accurate inference of phylogeny? Syst Biol 56:206–221
- 127. Sanderson M, Nicolae M, McMahon M (2017) Homology-aware phylogenomics at gigabase scales. Syst Biol 66:590–603
- 128. Jordan MI, Mitchell TM (2015) Machine learning: trends, perspectives, and prospects. Science 349:255–260
- 129. Rusk N (2016) Deep learning. Nat Methods 13:35
- 130. Esteva A, Kuprel B, Novoa RA, Ko J, Swetter SM, Blau HM, Thrun S (2017) Dermatologist-level classification of skin cancer with deep neural networks. Nature 542:115–118
- 131. Morell V (1996) TreeBASE: the roots of phylogeny. Science 273:569
- 132. Whelan S, de Bakker PIW, Quevillon E, Rodriguez N, Goldman N (2006) PANDIT: an evolution-centric database of protein and associated nucleotide domains with inferred trees. Nucleic Acids Res 34:D327–D331
- 133. Zhou J, Troyanskaya OG (2015) Predicting effects of noncoding variants with deep learning-based sequence model. Nat Methods 12:931–934
- 134. Tran NH, Zhang X, Xin L, Shan B, Li M (2017) De novo peptide sequencing by deep learning. Proc Natl Acad Sci. https://doi.org/10.1073/pnas.1705691114
- 135. Benton MJ, Ayala FJ (2003) Dating the tree of life. Science 300:1698–1700
- 136. Rannala B, Yang Z (2007) Inferring speciation times under an episodic molecular clock. Syst Biol 56:453–466
- 137. Wegmann D, Leuenberger C, Excoffier L (2009) Efficient approximate Bayesian computation coupled with Markov chain Monte Carlo without likelihood. Genetics 182:1207–1218
- 138. Reich D, Green RE, Kircher M et al (2010) Genetic history of an archaic hominin group from Denisova Cave in Siberia. Nature 468:1053–1060
- 139. Hedges SB, Dudley J, Kumar S (2006) TimeTree: a public knowledge-base of divergence times among organisms. Bioinformatics 22:2971–2972
- 140. Kumar S, Stecher G, Suleski M, Hedges SB (2017) TimeTree: a resource for timelines, timetrees, and divergence times. Mol Biol Evol 34:1812–1819
- 141. Welch JJ, Bromham L (2005) Molecular dating when rates vary. Trends Ecol Evol 20:320–327
- 142. Kimura M (1983) The neutral theory of molecular evolution. Cambridge University Press, Cambridge
- 143. Sarich VM, Wilson AC (1973) Generation time and genomic evolution in primates. Science 179:1144–1147
- 144. Muse SV, Weir BS (1992) Testing for equality of evolutionary rates. Genetics 132:269–276
- 145. Bromham L, Penny D, Rambaut A, Hendy MD (2000) The power of relative rates tests depends on the data. J Mol Evol 50:296–301
- 146. Rambaut A (2000) Estimating the rate of molecular evolution: incorporating non-contemporaneous sequences into maximum likelihood phylogenies. Bioinformatics 16:395–399
- 147. Martin AP (2001) Molecular clocks. Encyclopedia of life sciences. Wiley, Hoboken, pp 1–6
- 148. Wray GA, Levinton JS, Shapiro LH (1996) Molecular evidence for deep Precambrian divergences among Metazoan phyla. Science 274:568–573
- 149. Kumar S, Hedges SB (1998) A molecular timescale for vertebrate evolution. Nature 392:917–920
- 150. Wang DY, Kumar S, Hedges SB (1999) Divergence time estimates for the early history of animal phyla and the origin of plants, animals and fungi. Proc Biol Sci 266:163–171
- 151. Heckman DS, Geiser DM, Eidell BR, Stauffer RL, Kardos NL, Hedges SB (2001) Molecular evidence for the early colonization of land by fungi and plants. Science 293:1129–1133
- 152. Hedges SB, Chen H, Kumar S, Wang DY, Thompson AS, Watanabe H (2001) A genomic timescale for the origin of eukaryotes. BMC Evol Biol 1:4
- 153. Graur D, Martin W (2004) Reading the entrails of chickens: molecular timescales of evolution and the illusion of precision. Trends Genet 20:80–86
- 154. Rambaut A, Bromham L (1998) Estimating divergence dates from molecular sequences. Mol Biol Evol 15:442–448
- 155. Yoder AD, Yang Z (2000) Estimation of primate speciation dates using local molecular clocks. Mol Biol Evol 17:1081–1090
- 156. Yang Z (2004) A heuristic rate smoothing procedure for maximum likelihood estimation of species divergence times. Acta Zool Sin 50:645–656
- 157. Aris-Brosou S (2007) Dating phylogenies with hybrid local molecular clocks. PLoS One 2:e879
- 158. Drummond AJ, Suchard MA (2010) Bayesian random local clocks, or one rate to rule them all. BMC Biol 8:114
- 159. Sanderson M (1997) A nonparametric approach to estimating divergence times in the absence of rate constancy. Mol Biol Evol 14:1218
- 160. Sanderson MJ (2002) Estimating absolute rates of molecular evolution and divergence times: a penalized likelihood approach. Mol Biol Evol 19:101–109
- 161. Gillespie JH (1991) The causes of molecular evolution. Oxford University Press, Oxford
- 162. Thorne JL, Kishino H, Painter IS (1998) Estimating the rate of evolution of the rate of molecular evolution. Mol Biol Evol 15:1647–1657
- 163. Aris-Brosou S, Yang Z (2002) Effects of models of rate evolution on estimation of divergence dates with special reference to the metazoan 18S ribosomal RNA phylogeny. Syst Biol 51:703–714
- 164. Aris-Brosou S, Yang Z (2003) Bayesian models of episodic evolution support a late Precambrian explosive diversification of the Metazoa. Mol Biol Evol 20:1947–1954
- 165. Rannala B, Yang Z (1996) Probability distribution of molecular evolutionary trees: a new method of phylogenetic inference. J Mol Evol 43:304–311
- 166. Pybus OG, Rambaut A, Harvey PH (2000) An integrated framework for the inference of viral population history from reconstructed genealogies. Genetics 155:1429–1437
- 167. Drummond AJ, Rambaut A, Shapiro B, Pybus OG (2005) Bayesian coalescent inference of past population dynamics from molecular sequences. Mol Biol Evol 22:1185–1192
- 168. Minin VN, Bloomquist EW, Suchard MA (2008) Smooth skyride through a rough skyline: Bayesian coalescent-based inference of population dynamics. Mol Biol Evol 25:1459–1471
- 169. Hedges SB, Kumar S (2004) Precision of molecular time estimates. Trends Genet 20:242–247
- 170. Yang Z, Rannala B (2006) Bayesian estimation of species divergence times under a molecular clock using multiple fossil calibrations with soft bounds. Mol Biol Evol 23:212–226
- 171. Inoue J, Donoghue PCJ, Yang Z (2010) The impact of the representation of fossil calibrations on Bayesian estimation of species divergence times. Syst Biol 59:74–89
- 172. Drummond AJ, Ho SYW, Phillips MJ, Rambaut A (2006) Relaxed phylogenetics and dating with confidence. PLoS Biol 4:e88
- 173. Bouckaert R, Heled J, Kühnert D, Vaughan T, Wu CH, Xie D, Suchard MA, Rambaut A, Drummond AJ (2014) BEAST 2: a software platform for Bayesian evolutionary analysis. PLoS Comput Biol 10:e1003537
- 174. Wertheim JO, Sanderson MJ, Worobey M, Bjork A (2010) Relaxed molecular clocks, the bias-variance trade-off, and the quality of phylogenetic inference. Syst Biol 59:1–8
- 175. Lemey P, Rambaut A, Drummond AJ, Suchard MA (2009) Bayesian phylogeography finds its roots. PLoS Comput Biol 5:e1000520
- 176. Lemey P, Rambaut A, Welch JJ, Suchard MA (2010) Phylogeography takes a relaxed random walk in continuous space and time. Mol Biol Evol 27:1877–1885
- 177. Guillot G, Santos F, Estoup A (2008) Analysing georeferenced population genetics data with Geneland: a new algorithm to deal with null alleles and a friendly graphical user interface. Bioinformatics 24:1406–1407
- 178. Nadin-Davis SA, Feng Y, Mousse D, Wandeler AI, Aris-Brosou ST (2010) Spatial and temporal dynamics of rabies virus variants in big brown bat populations across Canada: footprints of an emerging zoonosis. Mol Ecol 19:2120–2136
- 179. Pagel M, Meade A (2004) A phylogenetic mixture model for detecting pattern-heterogeneity in gene sequence or character-state data. Syst Biol 53:571–581
- 180. Pagel M, Meade A, Barker D (2004) Bayesian estimation of ancestral character states on phylogenies. Syst Biol 53:673–684
- 181. Lartillot N, Poujol R (2011) A phylogenetic model for investigating correlated evolution of substitution rates and continuous phenotypic characters. Mol Biol Evol 28:729–744
- 182. Bromham L, Woolfit M, Lee MS, Rambaut A (2002) Testing the relationship between morphological and molecular rates of change along phylogenies. Evolution 56:1921–1930
- 183. Ho SYW, Duchêne S (2014) Molecular-clock methods for estimating evolutionary rates and timescales. Mol Ecol 23:5947–5965
- 184. dos Reis M, Donoghue PCJ, Yang Z (2016) Bayesian molecular clock dating of species divergences in the genomics era. Nat Rev Genet 17:71–80
- 185. Donoghue PCJ, Yang Z (2016) The evolution of methods for establishing evolutionary timescales. Philos Trans R Soc Lond B Biol Sci. https://doi.org/10.1098/rstb.2016.0020
- 186. Ho SY, Tong KJ, Foster CS, Ritchie AM, Lo N, Crisp MD (2015) Biogeographic calibrations for the molecular clock. Biol Lett 11:20150194
- 187. Kühnert D, Wu C-H, Drummond AJ (2011) Phylogenetic and epidemic modeling of rapidly evolving infectious diseases. Infect Genet Evol 11:1825–1841
- 188. Rieux A, Balloux F (2016) Inferences from tip-calibrated phylogenies: a review and a practical guide. Mol Ecol 25:1911–1924
- 189. Ho SYW, Chen AXY, Lins LSF, Duchêne DA, Lo N (2016) The genome as an evolutionary timepiece. Genome Biol Evol 8:3006–3010
- 190. O’Reilly JE, dos Reis M, Donoghue PCJ (2015) Dating tips for divergence-time estimation. Trends Genet 31:637–650
- 191. 1000 Genomes Project Consortium, Abecasis GR, Altshuler D, Auton A, Brooks LD, Durbin RM, Gibbs RA, Hurles ME, McVean GA (2010) A map of human genome variation from population-scale sequencing. Nature 467:1061–1073
- 192. UK10K Consortium, Walter K, Min JL, Huang J et al (2015) The UK10K project identifies rare variants in health and disease. Nature 526:82–90
- 193. Ledford H (2016) AstraZeneca launches project to sequence 2 million genomes. Nature 532:427
- 194. Muse SV, Gaut BS (1994) A likelihood approach for comparing synonymous and nonsynonymous nucleotide substitution rates, with application to the chloroplast genome. Mol Biol Evol 11:715–724
- 195. Goldman N, Yang Z (1994) A codon-based model of nucleotide substitution for protein-coding DNA sequences. Mol Biol Evol 11:725–736
- 196. Kosiol C, Anisimova M (2011) Methods for detecting natural selection in protein-coding genes. In: Anisimova M (ed) Evolutionary genomics: statistical and computational methods. Methods in molecular biology series. Humana-Springer, New York
- 197. Thorne JL, Choi SC, Yu J, Higgs PG, Kishino H (2007) Population genetics without intraspecific data. Mol Biol Evol 24:1667–1677
- 198. Choi SC, Hobolth A, Robinson DM, Kishino H, Thorne JL (2007) Quantifying the impact of protein tertiary structure on molecular evolution. Mol Biol Evol 24:1769–1782
- 199. Halpern AL, Bruno WJ (1998) Evolutionary distances for protein-coding sequences: modeling site-specific residue frequencies. Mol Biol Evol 15:910–917
- 200. Yang Z, Nielsen R (2008) Mutation-selection models of codon substitution and their use to estimate selective strengths on codon usage. Mol Biol Evol 25:568–579
- 201. Rodrigue N, Philippe H, Lartillot N (2010) Mutation-selection models of coding sequence evolution with site-heterogeneous amino acid fitness profiles. Proc Natl Acad Sci USA 107:4629–4634
- 202. Rodrigue N, Lartillot N (2017) Detecting adaptation in protein-coding genes using a Bayesian site-heterogeneous mutation-selection codon substitution model. Mol Biol Evol 34:204–214
- 203. Bloom JD (2017) Identification of positive selection in genes is greatly improved by using experimentally informed site-specific models. Biol Direct 12:1. https://doi.org/10.1186/s13062-016-0172-z
- 204. Choi SC, Redelings BD, Thorne JL (2008) Basing population genetic inferences and models of molecular evolution upon desired stationary distributions of DNA or protein sequences. Philos Trans R Soc Lond B Biol Sci 363:3931–3939
- 205. Rodrigue N, Kleinman CL, Philippe H, Lartillot N (2009) Computational methods for evaluating phylogenetic models of coding sequence evolution with dependence between codons. Mol Biol Evol 26:1663–1676
- 206. Hartl DL, Clark AG (2007) Principles of population genetics, 4th edn. Sinauer Associates, Sunderland
- 207. Kimura M (1962) On the probability of fixation of mutant genes in a population. Genetics 47:713–719
- 208. Rice SH (2004) Evolutionary theory: mathematical and conceptual foundations. Sinauer Associates, Sunderland
- 209. Kimura M (1978) Change of gene frequencies by natural selection under population number regulation. Proc Natl Acad Sci USA 75:1934–1937
- 210. Tamuri A, dos Reis M, Goldstein R (2012) Estimating the distribution of selection coefficients from phylogenetic data using sitewise mutation-selection models. Genetics 190:1101–1115
- 211. Rodrigue N (2013) On the statistical interpretation of site-specific variables in phylogeny-based substitution models. Genetics 193:557–564
- 212. Prins P, Belhachemi D, Möller S, Smant G (2011) Scalable computing in evolutionary genomics. In: Anisimova M (ed) Evolutionary genomics: statistical and computational methods. Methods in molecular biology series. Humana-Springer, New York
- 213. Yang Z (1993) Maximum-likelihood estimation of phylogeny from DNA sequences when substitution rates differ over sites. Mol Biol Evol 10:1396–1401
- 214. Yang Z (1994) Maximum likelihood phylogenetic estimation from DNA sequences with variable rates over sites: approximate methods. J Mol Evol 39:306–314
- 215. Altekar G, Dwarkadas S, Huelsenbeck JP, Ronquist F (2004) Parallel Metropolis coupled Markov chain Monte Carlo for Bayesian phylogenetic inference. Bioinformatics 20:407–415
- 216. Lartillot N, Rodrigue N, Stubbs D, Richer J (2013) PhyloBayes MPI: phylogenetic reconstruction with infinite mixtures of profiles in a parallel environment. Syst Biol 62:611–615
- 217. Rodrigue N, Lartillot N (2014) Site-heterogeneous mutation-selection models within the PhyloBayes-MPI package. Bioinformatics 30:1020–1021
- 218. Stamatakis A, Hoover P, Rougemont J (2008) A rapid bootstrap algorithm for the RAxML Web servers. Syst Biol 57:758–771
- 219. Stamatakis A, Ludwig T, Meier H (2005) RAxML-III: a fast program for maximum likelihood-based inference of large phylogenetic trees. Bioinformatics 21:456–463
- 220. Stamatakis A, Göker M, Grimm GW (2010) Maximum likelihood analyses of 3,490 rbcL sequences: scalability of comprehensive inference versus group-specific taxon sampling. Evol Bioinform Online 6:73–90
- 221. Stamatakis A, Alachiotis N (2010) Time and memory efficient likelihood-based tree searches on phylogenomic alignments with missing data. Bioinformatics 26:i132–i139
- 222. Suchard MA, Rambaut A (2009) Many-core algorithms for statistical phylogenetics. Bioinformatics 25:1370–1376
- 223. Ronquist F, Teslenko M, van der Mark P, Ayres DL, Darling A, Höhna S, Larget B, Liu L, Suchard MA, Huelsenbeck JP (2012) MrBayes 3.2: efficient Bayesian phylogenetic inference and model choice across a large model space. Syst Biol 61:539–542
- 224. Muir P, Li S, Lou S et al (2016) The real cost of sequencing: scaling computation to keep pace with data generation. Genome Biol 17:53
- 225. Schatz MC, Langmead B, Salzberg SL (2010) Cloud computing and the DNA data race. Nat Biotechnol 28:691–693
- 226. Dereeper A, Guignon V, Blanc G et al (2008) Phylogeny.fr: robust phylogenetic analysis for the non-specialist. Nucleic Acids Res 36:W465–W469
- 227. de Koning AP, Gu W, Pollock DD (2010) Rapid likelihood analysis on large phylogenies using partial sampling of substitution histories. Mol Biol Evol 27:249–265
Copyright information
Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.
The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.