Estimating Phylogenies from Molecular Data

  • Daniele Catanzaro


Phylogenetic estimation from aligned DNA, RNA or amino acid sequences has attracted increasing attention in recent years due to its importance in the analysis of many fine-scale genetic data. Its application fields now range from medical research to drug discovery, epidemiology, systematics and population dynamics. Estimating phylogenies involves solving an optimization problem, called the phylogenetic estimation problem (PEP), whose versions depend on the criterion used to select a phylogeny among plausible alternatives. This chapter offers an overview of the PEP and discusses the most important versions that occur in the literature.



8.1 Introduction

Molecular phylogenetics studies the hierarchical evolutionary relationships among species, or taxa, by means of molecular data such as DNA, RNA, amino acid or codon sequences. These relationships are usually described through a weighted tree, called a phylogeny (Fig. 8.1), whose leaves represent the observed taxa, internal vertices represent the intermediate ancestors, edges represent the estimated evolutionary relationships and edge weights represent measures of the similarity between pairs of taxa.
Fig. 8.1

An example of a phylogeny of primates

Phylogenies provide fundamental information in the analysis of many fine-scale genetic data; for this reason, the use of molecular phylogenetics has become increasingly frequent (and sometimes indispensable) in several research fields such as systematics, medical research, drug discovery, epidemiology and population dynamics [56]. For example, molecular phylogenetics was of considerable assistance in predicting the evolution of human influenza A [8], understanding the relationships between the virulence and the genetic evolution of HIV [55, 66], identifying emerging viruses such as SARS [51], recreating and investigating ancestral proteins [17], designing neuropeptides causing smooth muscle contraction [2] and relating geographic patterns to macroevolutionary processes [36].

Since no one could observe evolution over thousands or millions of years, apart from known phylogenies [57], there is no general way to validate empirically a candidate phylogeny for a set of molecular sequences extracted from taxa. For this reason, the literature proposes a number of criteria for selecting one phylogeny from among plausible alternatives. Each criterion adopts its own set of evolutionary hypotheses, whose ability to describe the evolution of taxa determines the gap between the real and the true phylogeny, i.e., the gap between the real evolutionary process of taxa and the phylogeny that one would obtain under the same set of hypotheses if all molecular data from taxa were available [9].

The criteria of phylogenetic estimation can usually be quantified and expressed in terms of objective functions, giving rise to families of optimization problems whose general paradigm can be stated as follows:

Problem 8.1.

The phylogenetic estimation problem (PEP)
$$\begin{array}{rcl} \mbox{ optimize}& \quad & f(T) \\ \mathrm{s.t.}& \quad & g(\Gamma,T) = 0 \\ & \quad & T \in \mathcal{T}, \\ \end{array}$$

where Γ is the set of molecular sequences from n taxa, T a phylogeny of Γ, \(\mathcal{T}\) the set of \((2n - 5)!! = 1 \times 3 \times 5 \times 7 \times \cdots \times (2n - 5)\) phylogenies of Γ, \(f : \mathcal{T} \rightarrow \mathbb{R}\) a function modeling the selected criterion of phylogenetic estimation, and \(g : \Gamma \times \mathcal{T} \rightarrow \mathbb{R}\) a function correlating the set Γ to a phylogeny T.
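To give a feeling for how quickly the search space \(\mathcal{T}\) grows, the following minimal sketch evaluates the double factorial (2n − 5)!! for a few values of n. The function name `count_phylogenies` is purely illustrative and does not come from the chapter.

```python
def count_phylogenies(n: int) -> int:
    """Number of unrooted binary trees on n labelled taxa, i.e. (2n - 5)!!."""
    if n < 3:
        raise ValueError("the formula applies to n >= 3 taxa")
    count = 1
    # Multiply the odd factors 3, 5, ..., 2n - 5 (the leading factor 1 is implicit).
    for odd in range(3, 2 * n - 4, 2):
        count *= odd
    return count


if __name__ == "__main__":
    for n in (4, 5, 10, 20):
        print(n, count_phylogenies(n))
```

Already for n = 10 taxa there are 2,027,025 candidate phylogenies, which suggests why exhaustive enumeration is impractical for all but the smallest instances.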

A specific optimization problem, or phylogenetic estimation paradigm, is completely characterized by defining the functions f and g. The phylogeny T that optimizes f and satisfies g is referred to as optimal, and if T approaches the true phylogeny as the amount of molecular data from taxa increases, the corresponding criterion is said to be statistically consistent [32]. Statistical consistency is a desirable property in molecular phylogenetics because it measures the ability of a criterion to recover the true (and hopefully the real) phylogeny of the given molecular data. Later in this chapter, we will show that the consistency property changes from criterion to criterion and in some cases may even be absent.

Here, we provide a review of the main estimation criteria that occur in the literature on molecular phylogenetics. Particular emphasis is given to the comparative description of the hypotheses at the core of each criterion and to the optimization aspects related to the phylogenetic estimation paradigms. In Sect. 8.2, we discuss the problem of measuring the similarity among molecular sequences. In Sect. 8.3, we discuss the fundamental least-squares paradigm and formalize the concept of phylogeny. In Sect. 8.4, we present the minimum evolution paradigm, highlighting recent perspectives and computational advances. Finally, in Sect. 8.5 we present the likelihood and the Bayesian paradigms, briefly outlining their benefits and drawbacks.

8.2 Measuring Molecular Similarity

The degree of similarity between a pair of molecular sequences reflects the number of mutation events that occurred since they split from their common ancestor. Quantifying such similarity constitutes the first step in the phylogenetic estimation process [11]. The task involves the investigation and the modeling of the mutation process over time, i.e., the process by which errors occur in molecular data and are inherited between generations.

Different types of mutation may occur in the genome structure, most of which are point mutations, i.e., changes that involve the replacement, or substitution, of one nucleotide for another in the DNA sequence. Point mutations can be classified in two categories: the transitions and the transversions. The transitions occur when a purine nucleotide (adenine or guanine) is substituted for another purine, or when a pyrimidine (cytosine or thymine) is substituted for another pyrimidine. The transversions occur when a pyrimidine is substituted for a purine, or vice versa.
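The classification above can be sketched in a few lines of Python. The helper names (`classify_substitution`, `count_point_substitutions`) are illustrative assumptions, and the sketch presumes two already aligned, gap-free DNA strings of equal length.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify_substitution(a: str, b: str) -> str:
    """Classify the replacement of nucleotide a by nucleotide b."""
    a, b = a.upper(), b.upper()
    for base in (a, b):
        if base not in PURINES | PYRIMIDINES:
            raise ValueError(f"unexpected nucleotide: {base!r}")
    if a == b:
        return "none"
    # Same chemical class (purine/purine or pyrimidine/pyrimidine) -> transition.
    same_class = {a, b} <= PURINES or {a, b} <= PYRIMIDINES
    return "transition" if same_class else "transversion"


def count_point_substitutions(s1: str, s2: str) -> dict:
    """Count observed transitions and transversions between two aligned sequences."""
    if len(s1) != len(s2):
        raise ValueError("sequences must be aligned to the same length")
    counts = {"transition": 0, "transversion": 0}
    for a, b in zip(s1, s2):
        kind = classify_substitution(a, b)
        if kind != "none":
            counts[kind] += 1
    return counts
```

For instance, comparing `"ACGT"` with `"GCTT"` yields one transition (A to G) and one transversion (G to T).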

A second class of point mutations comprises those that lead to insertions and deletions of nucleotides in the genome. This phenomenon mainly occurs in non-coding regions of DNA, but may also affect coding regions of the genome and cause deleterious effects [57].

Finally, a third class of mutations comprises those that involve entire chromosome regions of the genome. Specifically, we may have: (1) a duplication, when a chromosome region is duplicated; (2) a translocation, when a chromosome region is transferred into another chromosome; (3) an inversion, when a chromosome region is broken off, turned upside down and reconnected; (4) a deletion, when a chromosome region is missing or deleted; and (5) a loss of heterozygosity, e.g., when two instances of the same chromosome break and then reconnect but to different end pieces [57].

Modeling the second and the third classes of mutations is generally non-trivial and requires an advanced mathematical background. We refer the interested reader to Felsenstein [29] for an introduction and to Park and Deem [58] for recent advances in the modeling of such classes. Here we shall focus on the first class of mutations and present a fundamental model of molecular evolution which is at the core of the most widely used criteria of phylogenetic estimation. Unless otherwise stated, throughout the chapter we will always assume that the molecular sequences under study have been previously subjected to an alignment process, i.e., a process through which the evolutionary relationships between nucleotides of molecular data are made explicit (see [60] for details).

8.2.1 The Time Homogeneous Markov Model of Molecular Evolution

Let S be a DNA sequence, i.e., a string of fixed length over an alphabet Υ = {A, C, G, T}, where “A” codes for adenine, “C” for cytosine, “G” for guanine, and “T” for thymine. Let \(r_{ij} \geq 0\), \(i \neq j\), be the constant rate of substitution from nucleotide i to nucleotide j. Assume that each character (site) of S evolves independently over time and that, at every instant, the Markov conservative hypothesis [39] holds, i.e.,
$$\begin{array}{rcl}{ r}_{\mathit{ii}}& =& -{\sum \nolimits }_{j\in \Upsilon,\ j\neq i}{r}_{\mathit{ij}}\quad \ \forall \ i \in \Upsilon.\end{array}$$
Let \(p_{ij}(t)\) be the probability that nucleotide i undergoes a substitution to nucleotide j at finite time t. Then, if the superposition principle holds, at time t + dt this probability can be written as:
$${p}_{\mathit{ij}}(t + \mathrm{d}t) ={ \sum \nolimits }_{k\in \Upsilon }{p}_{\mathit{ik}}(t){p}_{\mathit{kj}}(\mathrm{d}t)\quad \ \forall \ i,j \in \Upsilon.$$
Subtracting \(p_{ij}(t)\) from both sides of (8.2) and dividing by dt, we obtain:
$$\begin{array}{rcl} \frac{{p}_{\mathit{ij}}(t + \mathrm{d}t) - {p}_{\mathit{ij}}(t)} {\mathrm{d}t} & =& \frac{{\sum \nolimits }_{k\in \Upsilon,\ k\neq j}{p}_{\mathit{ik}}(t){p}_{\mathit{kj}}(\mathrm{d}t)} {\mathrm{d}t} \\ & & +{p}_{\mathit{ij}}(t)\frac{{p}_{\mathit{jj}}(\mathrm{d}t) - 1} {\mathrm{d}t} \quad \ \forall \ i,j \in \Upsilon, \\ \end{array}$$
$$\begin{array}{rcl} \frac{{p}_{\mathit{ij}}(t + \mathrm{d}t) - {p}_{\mathit{ij}}(t)} {\mathrm{d}t} & =& \frac{{\sum \nolimits }_{k\in \Upsilon,\ k\neq j}{p}_{\mathit{ik}}(t){p}_{\mathit{kj}}(\mathrm{d}t)} {\mathrm{d}t} \\ & & +{p}_{\mathit{ij}}(t)\frac{1 -{\sum \nolimits }_{k\in \Upsilon,\ k\neq j}{p}_{\mathit{jk}}(\mathrm{d}t) - 1} {\mathrm{d}t} \quad \ \forall \ i,j \in \Upsilon.\end{array}$$
Hence, we have
$$\begin{array}{rcl} \dot{{p}}_{\mathit{ij}}(t)& =& {\sum \nolimits }_{k\in \Upsilon,\ k\neq j}{p}_{\mathit{ik}}(t){r}_{\mathit{kj}} + {p}_{\mathit{ij}}(t){r}_{\mathit{jj}}\quad \ \forall \ i,j \in \Upsilon.\end{array}$$
When expressing (8.3) in matrix form, the Chapman–Kolmogorov master equation arises
$$\begin{array}{rcl} \dot{\mathbf{P}}(t) = \mathbf{P}(t)\mathbf{R} = \mathbf{R}\mathbf{P}(t),& & \\ \end{array}$$
whose integral
$$\begin{array}{rcl} \mathbf{P}(t) ={ \mathbf{e}}^{\mathbf{R}t} ={ \sum \nolimits }_{n=0}^{\infty }\frac{{\mathbf{R}}^{n}{t}^{n}} {n!} & &\end{array}$$
is known as the time homogeneous Markov (THM) model of DNA sequence evolution [48, 63]. The THM model is a generalization of the Markov models described in Jukes and Cantor [44], Kimura [46], Hasegawa et al. [37], Tamura and Nei [78], and can be easily adapted to RNA, amino acid and codon sequences as shown in Felsenstein [29] and Schadt and Lange [71, 72]. In the next section, we shall investigate the dynamics of the THM model in order to derive a commonly used formula to quantify the similarity between molecular data.
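As an illustration of the series in (8.4), the sketch below evaluates a truncated version of it in plain Python and compares the result with the well-known closed form of the Jukes and Cantor model, in which every off-diagonal rate equals a single constant α. The function names, the truncation at 40 terms and the values α = 0.1 and t = 2 are assumptions made only for this example.

```python
import math


def mat_mul(a, b):
    """Product of two square matrices stored as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def transition_matrix(rate, t, terms=40):
    """Approximate P(t) = exp(R t) by the first `terms` terms of its power series."""
    n = len(rate)
    rt = [[r * t for r in row] for row in rate]
    term = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]  # (Rt)^0 / 0!
    total = [row[:] for row in term]
    for k in range(1, terms):
        # After this step, term holds (Rt)^k / k!.
        term = [[x / k for x in row] for row in mat_mul(term, rt)]
        total = [[total[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    return total


def jukes_cantor_rates(alpha):
    """Rate matrix with all off-diagonal rates equal to alpha and zero row sums (8.1)."""
    return [[-3.0 * alpha if i == j else alpha for j in range(4)] for i in range(4)]


alpha, t = 0.1, 2.0
p = transition_matrix(jukes_cantor_rates(alpha), t)
# Closed-form Jukes-Cantor probabilities, used here only as a cross-check.
p_same = 0.25 + 0.75 * math.exp(-4.0 * alpha * t)
p_diff = 0.25 - 0.25 * math.exp(-4.0 * alpha * t)
```

A truncated series is adequate here because the entries of Rt are small; for large t or stiff rate matrices, an eigendecomposition or a dedicated matrix-exponential routine would be a more robust choice.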

8.2.2 Estimating Evolutionary Distances from Molecular Data

Two molecular sequences \(S_1\) and \(S_2\), evolving from a common ancestor since time \(t_0\), may be characterized at time t by different numbers of substitution events, some of which are not directly observable. Hence, if we sampled the sequences at time t and measured their similarity, or evolutionary distance, in terms of the number of observed differences, we could underestimate the overall number of substitution events that occurred since \(S_1\) and \(S_2\) split from their common ancestor. A number of authors suggested that the use of time homogeneous Markov models could overcome the underestimation problem in all those cases in which the hypotheses at the core of the model properly describe the real evolutionary process of the analyzed sequences [29]. Moreover, in order to compare the evolutionary distances of different pairs of molecular sequences, the authors also proposed to express the evolutionary distances in terms of the expected number of substitution events per site rather than the time necessary to transform one sequence into another [29]. In this section, we present the most general formula currently known in the literature to compute the evolutionary distance between a pair of molecular sequences. To this end, we now investigate the dynamics of the THM model.

As shown in Zadeh and Desoer [84], (8.4) can also be expressed in closed form as:
$$\begin{array}{rcl} \mathbf{P}(t) ={ \mathbf{e}}^{\mathbf{R}t} = \Omega {\mathbf{e}}^{\Lambda t}{\Omega }^{-1},& &\end{array}$$
where Ω is the eigenvector matrix of R, and Λ is the diagonal matrix of the eigenvalues of R. This fact suggests that the spectrum of P(t) is the exponential spectrum of R, i.e., the dynamics of P(t) is uniquely determined by the spectrum of R [84].
It is worth noting that the Markov conservative hypothesis implies that the determinant of matrix R is equal to zero, i.e., at least one of its eigenvalues is identically zero. Moreover, since any k-leading principal sub-matrix of R, k < 4, has negative determinant, by one of the corollaries of Sylvester's criterion (see [6, p. 409]) all the remaining eigenvalues are negative. Thus, as the spectrum of P(t) is the exponential spectrum of R, matrix P(t) has at least one eigenvalue equal to 1, called the maximal Lyapunov exponent, and three eigenvalues lying in the interval [0, 1]. The maximal Lyapunov exponent prevents the presence of chaotic attractors and guarantees that, as t goes to infinity, the generic entry \(p_{ij}(t)\) is non-zero and independent of the starting state i ∈ Υ. In other words, the maximal Lyapunov exponent guarantees the existence of four positive values \(\pi_A\), \(\pi_C\), \(\pi_G\), and \(\pi_T\), called equilibrium frequencies, such that
$$ \begin{array}{rcl} {\lim }_{t\rightarrow \infty }{p}_{\mathit{ij}}(t) = {\pi }_{j}& \quad & \ \forall \ i,j \in \Upsilon.\end{array}$$
The values \(\pi_j\) constitute a stationary distribution and turn out to be useful to measure the evolutionary distance between \(S_1\) and \(S_2\). In fact, denote by O(t) a matrix whose generic entry \(o_{ij}(t)\), i, j ∈ Υ, represents the probability that at a given site and time t, \(S_1\) is characterized by nucleotide i and \(S_2\) by nucleotide j. Assume that O(0) = Π, where Π denotes a diagonal matrix whose jth diagonal entry is \(\pi_j\). Then it holds that:
$$\begin{array}{rcl}{ o}_{\mathit{ij}}(t) ={ \sum \nolimits }_{k\in \Upsilon }{p}_{\mathit{ik}}'(t){\pi }_{k}{p}_{\mathit{kj}}(t)& \quad & \ \forall \ i,j \in \Upsilon,\ t \geq 0, \\ \end{array}$$
or equivalently
$$\begin{array}{rcl} \mathbf{O}(t) = \mathbf{P}'(t)\Pi \mathbf{P}(t)& \quad & \ t \geq 0,\end{array}$$
where P′(t) denotes the transpose of P(t). Premultiplying both sides of (8.6) by \(\Pi^{-1}\), we have
$$\begin{array}{rcl}{ \Pi }^{-1}\mathbf{O}(t) = {\Pi }^{-1}\mathbf{P}'(t)\Pi \mathbf{P}(t) = {\Pi }^{-1}{\mathbf{e}}^{\mathbf{R}'t}\Pi {\mathbf{e}}^{\mathbf{R}t}.& & \\ \end{array}$$
Since for any matrix function it holds that \(f(\mathbf{A}\mathbf{B}\mathbf{A}^{-1}) = \mathbf{A}f(\mathbf{B})\mathbf{A}^{-1}\), we have
$$\begin{array}{rcl}{ \Pi }^{-1}\mathbf{O}(t) ={ \mathbf{e}}^{{\Pi }^{-1}\mathbf{R}'t\Pi }{\mathbf{e}}^{\mathbf{R}t}.& &\end{array}$$
If we assume that the hypothesis of time-reversibility holds, i.e.:
$$\begin{array}{rcl} \Pi \mathbf{R} = \mathbf{R}'\Pi,& & \\ \end{array}$$
then \(\Pi^{-1}\mathbf{R}'t\,\Pi\) and \(\mathbf{R}t\) commute, and (8.7) becomes:
$$\begin{array}{rcl}{ \Pi }^{-1}\mathbf{O}(t) ={ \mathbf{e}}^{{\Pi }^{-1}\mathbf{R}'t\Pi +\mathbf{R}t }.& &\end{array}$$
By applying the matrix logarithm to both sides of (8.8) and premultiplying by Π, we obtain
$$\begin{array}{rcl} \mathbf{R}'t\Pi + \Pi \mathbf{R}t = \Pi \log ({\Pi }^{-1}\mathbf{O}(t)).& & \\ \end{array}$$
As the negative trace of \(2\Pi\mathbf{R}t\) represents the expected number of substitution events per site between \(S_1\) and \(S_2\), at time t the evolutionary distance \({d}_{{S}_{1},{S}_{2}}\) between \(S_1\) and \(S_2\) can be computed as:
$$\begin{array}{rcl}{ d}_{{S}_{1},{S}_{2}} = -2t\ \mbox{ tr}[\Pi \mathbf{R}] = -\mbox{ tr}[\Pi \log ({\Pi }^{-1}\mathbf{O}(t))].& &\end{array}$$
Equation (8.9) is known as the general time-reversible (GTR) distance [48, 63] and is the most general formula to quantify the similarity between molecular data using a time-reversible Markov model of molecular evolution. It is worth noting that while, on the one hand, the hypothesis of time-reversibility simplifies the formalization of the evolutionary process of a pair of molecular sequences, on the other hand its introduction has important consequences. In fact, the hypothesis of time-reversibility implies that if we compared two molecular sequences whose nucleotide frequencies are in equilibrium, the probability that a nucleotide i undergoes a substitution to nucleotide j would be equal to the probability that a nucleotide j undergoes a substitution to nucleotide i. Thus, given a present-day molecular sequence and its ancestral sequence, it would be impossible to determine which sequence is the present and which is the ancestral one. Hence, the hypothesis of time-reversibility removes the temporality from the evolutionary process. We shall show in the next sections how the paradigms of phylogenetic estimation take advantage of this fact. Below, we provide an example from [79] showing a possible application of (8.9).

Estimating Evolutionary Distances from Molecular Data: A Practical Example

Consider the mitochondrial DNA sequences of human and chimpanzee shown in Horai et al. [40]. The corresponding matrices O(t) and Π are respectively
$$\begin{array}{rcl} \begin{array}{cc} \begin{array}{cccc} \hspace{66.0pt} A\hspace{26.39996pt} &C\hspace{26.39996pt} &G\hspace{26.39996pt} &T\hspace{26.39996pt}\\ \end{array} & \\ \mathbf{O}(t) = \left (\begin{array}{cccc} \ 0.2889\ &\ 0.0012\ &\ 0.0131\ &\ 0.0005\\ \ 0.0012\ &\ 0.2799\ &\ 0.0001\ &\ 0.0266 \\ \ 0.0131\ &\ 0.0001\ &\ 0.1180\ &\ 0.0001\\ \ 0.0005\ &\ 0.0266\ &\ 0.0001\ &\ 0.2299\\ \end{array} \right )&\begin{array}{c} A\\ C \\ G\\ T\\ \end{array}\end{array} & & \\ \end{array}$$
$$\begin{array}{rcl} \begin{array}{cc} \begin{array}{cccc} \hspace{51.60004pt} A\hspace{26.39996pt} &C\hspace{26.39996pt} &G\hspace{26.39996pt} &T\hspace{26.39996pt}\\ \end{array} & \\ \Pi = \left (\begin{array}{cccc} \ 0.3037\ & \ 0\ & \ 0\ & \ 0\\ \ 0\ &\ 0.3079\ & \ 0\ & \ 0 \\ \ 0\ & \ 0\ &\ 0.1313\ & \ 0\\ \ 0\ & \ 0\ & \ 0\ &\ 0.2571\\ \end{array} \right )&\begin{array}{c} A\\ C \\ G\\ T.\\ \end{array}\end{array} & & \\ \end{array}$$
The product \(\Pi^{-1}\mathbf{O}(t)\) is:
$$\begin{array}{rcl} \begin{array}{cc} \begin{array}{cccc} \hspace{78.0pt} A\hspace{26.39996pt} &C\hspace{26.39996pt} &G\hspace{26.39996pt} &T\hspace{26.39996pt}\\ \end{array} & \\ {\Pi }^{-1}\mathbf{O}(t) = \left (\begin{array}{cccc} \ 0.9513\ &\ 0.0040\ &\ 0.0430\ &\ 0.0017\\ \ 0.0040\ &\ 0.9092\ &\ 0.0003\ &\ 0.0865 \\ \ 0.0995\ &\ 0.0008\ &\ 0.8989\ &\ 0.0008\\ \ 0.0030\ &\ 0.1036\ &\ 0.0004\ &\ 0.8940\\ \end{array} \right )&\begin{array}{c} A\\ C \\ G\\ T \end{array}\end{array} & & \\ \end{array}$$
and the corresponding matrix logarithm \(\log(\Pi^{-1}\mathbf{O}(t))\) is:
$$\begin{array}{rcl} \begin{array}{cc} \begin{array}{llll} \hspace{78.0pt} A\hspace{26.39996pt} &C\hspace{26.39996pt} &G\hspace{26.39996pt} &T\hspace{26.39996pt}\\ \end{array} & \\ \log ({\Pi }^{-1}\mathbf{O}(t)) = \left (\begin{array}{cccc} - 0.0524& 0.0042 & 0.0466 & 0.0016\\ 0.0042 & - 0.1008 & 0.0002 & 0.0963 \\ 0.1078 & 0.0006 & - 0.1091& 0.0007\\ 0.00019 & 0.1154 & 0.0004 & - 0.1176 \\ \end{array} \right )&\begin{array}{c} A\\ C \\ G\\ T.\\ \end{array} \end{array} & & \\ \end{array}$$
The product \(\Pi \log(\Pi^{-1}\mathbf{O}(t))\) is:
$$\begin{array}{rcl} \begin{array}{cc} \begin{array}{llll} \hspace{114.0pt} A\hspace{26.39996pt} &C\hspace{26.39996pt} &G\hspace{26.39996pt} &T\hspace{26.39996pt}\\ \end{array} & \\ \Pi \log ({\Pi }^{-1}\mathbf{O}(t)) = \left (\begin{array}{cccc} - 0.0159& 0.0013 & 0.0142 & 0.0005\\ 0.0013 & - 0.0310 & 0.0001 & 0.0297 \\ 0.0142 & 0.0001 & - 0.0143& 0.0001\\ 0.0005 & 0.0293 & 0.0001 & - 0.0302 \\ \end{array} \right )&\begin{array}{c} A\\ C \\ G\\ T\\ \end{array} \end{array} & & \\ \end{array}$$
whose negative trace provides the evolutionary distance \(d = -\mathrm{tr}[\Pi \log(\Pi^{-1}\mathbf{O}(t))] = 0.09152\).
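The computation in this example can be retraced with a short, dependency-free Python sketch of (8.9). Two points are assumptions of the sketch rather than part of the chapter: the matrix logarithm is evaluated through the series log(I + A) = A − A²/2 + A³/3 − …, which converges here because \(\Pi^{-1}\mathbf{O}(t)\) is close to the identity, and the (A, T) and (T, A) entries of O(t) are both taken to be 0.0005, so that O(t) is symmetric and its rows sum to the equilibrium frequencies. All function names are illustrative.

```python
def mat_mul(a, b):
    """Product of two square matrices stored as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_log_near_identity(m, terms=200):
    """log(M) via the series sum_{k>=1} (-1)^(k+1) (M - I)^k / k.

    Only valid when the spectral radius of M - I is below 1.
    """
    n = len(m)
    a = [[m[i][j] - (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    power = [row[:] for row in a]          # holds (M - I)^k
    result = [[0.0] * n for _ in range(n)]
    for k in range(1, terms + 1):
        sign = 1.0 if k % 2 == 1 else -1.0
        for i in range(n):
            for j in range(n):
                result[i][j] += sign * power[i][j] / k
        power = mat_mul(power, a)
    return result


def gtr_distance(o, pi):
    """GTR distance d = -tr[Pi log(Pi^{-1} O)] for a divergence matrix O."""
    n = len(pi)
    m = [[o[i][j] / pi[i] for j in range(n)] for i in range(n)]  # Pi^{-1} O
    log_m = mat_log_near_identity(m)
    return -sum(pi[i] * log_m[i][i] for i in range(n))


# Rows and columns ordered A, C, G, T, as in the example above.
O = [
    [0.2889, 0.0012, 0.0131, 0.0005],
    [0.0012, 0.2799, 0.0001, 0.0266],
    [0.0131, 0.0001, 0.1180, 0.0001],
    [0.0005, 0.0266, 0.0001, 0.2299],
]
PI = [0.3037, 0.3079, 0.1313, 0.2571]

d = gtr_distance(O, PI)
print(round(d, 4))
```

The value obtained this way should agree with the distance reported above up to the rounding of the published matrix entries.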

The reader interested in more sophisticated applications of the GTR distance will find useful examples in Lanave et al. [48], Rodriguez et al. [63], and Catanzaro et al. [10, 11].

8.3 The Least-Squares Paradigm of Phylogenetic Estimation

A paradigm of phylogenetic estimation is a quantitative criterion used to discern a phylogeny from among plausible alternatives. One of the earliest paradigms was introduced by Cavalli-Sforza and Edwards [15] and is known as the additive model or the least-squares model of phylogenetic estimation [9].

Cavalli-Sforza and Edwards observed that, as molecular data provide the most detailed anatomy possible for any organism, the diversity of life on Earth must be reflected in them. Hence, if the evolution of a set of molecular data from taxa could be seen as a tree, then it could be described through a process that changes nucleotides over time. The trajectories described by such a process would split as taxa diverge, unite as taxa hybridize, and end as taxa become extinct; living taxa would be represented by the intersection of the process with the “now” plane (Fig. 8.2).
Fig. 8.2

An evolutionary process and its projection onto the “now” plane – from Cavalli-Sforza and Edwards [15]

In general, we do not have a sampling of such a process over time but only the knowledge of the living taxa. Hence, in the absence of further information, one may only be able to reconstruct the projection of the process onto the “now” plane rather than the process itself. Note that although the evolutionary process over time is “directed,” its projection is not (Fig. 8.2). Thus, when the projection is considered, the direction of evolution is lost. Nevertheless, the projection of the evolutionary process still constitutes an important piece of information for the analyzed taxa; for this reason, Cavalli-Sforza and Edwards proposed a possible paradigm to recover it.

The authors first considered the problem of how to formally represent a projection (phylogeny) of the evolutionary process. To stress the lack of a direction in evolution, the authors proposed to remove the root and the orientation of the edges of a phylogeny and represented it as an unrooted binary tree, i.e., an undirected acyclic graph in which each internal vertex has degree three. The degree constraint does not necessarily have a biological foundation but helped the authors to formalize the evolutionary process. In fact, given n taxa, the degree constraint implies that the number of edges in a phylogeny T is (2n − 3) and the number of internal vertices is (n − 2). To prove the claim, note that as T is a tree, it holds that:
$$\vert {\mathcal{E}}_{i}(T)\vert + \vert {\mathcal{E}}_{e}(T)\vert = \vert {V }_{i}\vert + \vert {V }_{e}\vert - 1,$$
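where \(\mathcal{E}_{e}(T)\) and \(\mathcal{E}_{i}(T)\) are the sets of external and internal edges of T, and \(V_e\) and \(V_i\) are the sets of external vertices (leaves) and internal vertices of T, respectively. Moreover, since internal vertices have degree three, the following property holds: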
$$\begin{array}{rcl} 2\vert {\mathcal{E}}_{i}(T)\vert + 2\vert {\mathcal{E}}_{e}(T)\vert = 3\vert {V }_{i}\vert + \vert {V }_{e}\vert.& &\end{array}$$
Since \(\vert V_e\vert = \vert \mathcal{E}_{e}(T)\vert = n\), combining (8.10) and (8.11) it follows that \(\vert V_i\vert = n - 2\) and \(\vert \mathcal{E}_{i}(T)\vert = n - 3\), for a total of 2n − 3 edges. Thus, a phylogeny \(T \in \mathcal{T}\) can be seen as an unrooted binary tree in which the n taxa are the n leaves of T and the common ancestors are internal vertices of degree three. It is worth noting that dealing with unrooted binary trees does not introduce oversimplifications, since it is easy to see that any m-ary tree can be transformed into a phylogeny by adding “dummy” vertices and edges (e.g., see Fig. 8.3).
Fig. 8.3

The 4-ary tree (on the left) can be transformed into an unrooted binary tree by adding a dummy vertex and edge (dashed, on the right)

Cavalli-Sforza and Edwards encoded a phylogeny in \(\mathcal{T}\) by means of an Edge–Path incidence matrix of a Tree (EPT) (see [53, p. 550]), i.e., a network matrix X having a row for each path between two leaves and a column for each edge. The generic entry \(x_{rs,e}\) of matrix X is equal to 1 if edge e belongs to the path \(p_{rs}\) from leaf r to leaf s, and 0 otherwise. As an example, Fig. 8.4b shows the EPT matrix corresponding to the phylogeny shown in Fig. 8.4a. The authors then proposed a model in which each evolutionary distance \(d_{rs}\), r, s ∈ Γ, between a pair of molecular sequences could be thought of as the sum of the mutation events accumulated on the edges belonging to the path \(p_{rs}\) linking taxa r and s on X. In other words, given a phylogeny X and defining \(w_e\) as the number of mutation events on edge e, Cavalli-Sforza and Edwards asserted that:
$$ \begin{array}{rcl} \mathbf{X}\mathbf{w} ={ \mathbf{D}}^{\bigtriangleup },& & \\ \end{array} $$
where \(\mathbf{w} = \{w_e\}\) is the edge weight vector associated with X, and \(\mathbf{D}^{\bigtriangleup}\) is an n(n − 1)/2 vector whose components are obtained by taking row by row the entries of the strictly upper triangular part of the matrix \(\mathbf{D} = \{d_{rs}\}\). In general, for a fixed matrix X, (8.12) may not admit solutions; for this reason, the authors proposed the use of ordinary least-squares (OLS) to find the entries of vector w. Specifically, the authors suggested that the values \({\rho }_{\mathit{rs}} ={ \sum \nolimits }_{e\in {p}_{\mathit{rs}}}{x}_{\mathit{rs,e}}{w}_{e}\) should minimize the function
$$ \begin{array}{rcl} {\sum \nolimits }_{r,s\in \Gamma :r\neq s}{({d}_{\mathit{rs}} - {\rho }_{\mathit{rs}})}^{2}& =& {\sum \nolimits }_{r,s\in \Gamma :r\neq s}{\left ({d}_{\mathit{rs}} -{\sum \nolimits }_{e\in {p}_{\mathit{rs}}}{x}_{\mathit{rs,e}}{w}_{e}\right )}^{2}, \\ \end{array} $$
i.e., minimize the quadratic error related to the approximation of the evolutionary process with its projection. This condition holds when
$$ \begin{array}{rcl} \mathbf{w} ={ \mathbf{X}}^{\dag }{\mathbf{D}}^{\bigtriangleup },& & \\ \end{array} $$
where \(\mathbf{X}^{\dag}\) is the Moore–Penrose pseudo-inverse of X. Thus, Cavalli-Sforza and Edwards’ paradigm of phylogenetic estimation may be stated in terms of the following NP-hard convex optimization problem [22]:

Problem 8.2.

The ordinary least-squares problem (OLSP)
$$ \begin{array}{rcl} {\min }_{\mathbf{X}\in \mathcal{X},\mathbf{w}\in {\mathbb{R}}^{2n-3}}& \quad & f(\mathbf{X}) ={ \sum \nolimits }_{r,s\in \Gamma :r\neq s}{\left ({d}_{\mathit{rs}} -{\sum \nolimits }_{e\in {p}_{\mathit{rs}}}{x}_{\mathit{rs,e}}{w}_{e}\right )}^{2}, \\ \end{array}$$
where \(\mathcal{X}\) denotes the set of all possible EPT matrices coding phylogenies. We refer the reader interested in a mathematical description of the necessary and sufficient conditions that characterize the set \(\mathcal{X}\) to [14].
Fig. 8.4

(a) An example of a phylogeny of four taxa (modeled as an unrooted binary tree in which each internal vertex has degree 3) and its associated EPT matrix (b)
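For a fixed phylogeny, the least-squares edge weights of (8.12) can be computed directly. The sketch below is a minimal illustration on a hypothetical four-taxon tree in which taxa 1 and 2 lie on one side of the single internal edge and taxa 3 and 4 on the other; it solves the normal equations X′Xw = X′D^△, which coincide with the pseudo-inverse solution when X has full column rank. The tree, the edge numbering, the edge weights and the function names are all assumptions made for the example.

```python
from itertools import combinations


def ept_matrix_quartet():
    """EPT matrix of the tree ((1,2),(3,4)).

    Columns 0..3 are the external edges of taxa 1..4, column 4 is the internal edge.
    Rows follow the order (1,2), (1,3), (1,4), (2,3), (2,4), (3,4).
    """
    side = {1: 0, 2: 0, 3: 1, 4: 1}  # end of the internal edge each leaf hangs from
    rows = []
    for r, s in combinations((1, 2, 3, 4), 2):
        row = [0] * 5
        row[r - 1] = 1
        row[s - 1] = 1
        if side[r] != side[s]:
            row[4] = 1  # the path crosses the internal edge
        rows.append(row)
    return rows


def solve_linear(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def ols_edge_weights(x, d):
    """Least-squares edge weights from the normal equations X'X w = X'd."""
    m = len(x[0])
    xtx = [[sum(row[i] * row[j] for row in x) for j in range(m)] for i in range(m)]
    xtd = [sum(row[i] * di for row, di in zip(x, d)) for i in range(m)]
    return solve_linear(xtx, xtd)


X = ept_matrix_quartet()
w_true = [1.0, 2.0, 4.0, 5.0, 3.0]                       # hypothetical edge weights
d = [sum(x * w for x, w in zip(row, w_true)) for row in X]  # additive distances X w
w_hat = ols_edge_weights(X, d)
```

Because the distances in this toy instance are exactly additive on the chosen tree, the recovered vector `w_hat` reproduces `w_true`; with distances estimated from real data the residual in (8.13) would generally be non-zero.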

8.3.1 Modified Least-Squares Paradigms of Phylogenetic Estimation

A number of authors proposed some modifications to Cavalli-Sforza and Edwards’ model. Specifically, Fitch and Margoliash [31] observed that OLSP implicitly considers the evolutionary distances \(d_{rs}\) between pairs of molecular sequences as uniformly distributed independent random variables, a hypothesis that cannot be considered generally true due to the common evolutionary history of the analyzed taxa and the presence of sampling errors in molecular data. Hence, Fitch and Margoliash proposed to modify Cavalli-Sforza and Edwards’ paradigm by introducing the quantities \(\omega_{rs}\) representing the variances of \(d_{rs}\). They set \(\omega_{rs} = 1/d_{rs}^{2}\), r, s ∈ Γ, and stated the following paradigm:

Problem 8.3.

The weighted least-squares problem (WLSP)
$$ \begin{array}{rcl} {\min }_{\mathbf{X}\in \mathcal{X},\mathbf{w}\in {\mathbb{R}}^{2n-3}}& \quad & f(\mathbf{X}) ={ \sum \nolimits }_{r,s\in \Gamma :r\neq s}{\omega }_{\mathit{rs}}{\left ({d}_{\mathit{rs}} -{\sum \nolimits }_{e\in {p}_{\mathit{rs}}}{x}_{\mathit{rs,e}}{w}_{e}\right )}^{2}.\end{array}$$
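The effect of the weights can be seen by evaluating the objective for given observed distances \(d_{rs}\) and path lengths \(\rho_{rs}\). The following sketch is only illustrative: the function names and the toy numbers are assumptions, and the sum runs over unordered pairs, which differs from the formulation above by a constant factor of two.

```python
def fitch_margoliash_weights(d):
    """Weights omega_rs = 1 / d_rs^2, one per pair of taxa."""
    return [1.0 / (x * x) for x in d]


def squared_error(d, rho, omega=None):
    """Weighted sum of squared residuals; omega=None gives the OLS objective."""
    if omega is None:
        omega = [1.0] * len(d)
    return sum(w * (x - y) ** 2 for w, x, y in zip(omega, d, rho))


d_obs = [2.0, 4.0]   # toy observed distances
rho = [1.0, 5.0]     # toy path lengths on some candidate tree
ols_value = squared_error(d_obs, rho)
wls_value = squared_error(d_obs, rho, fitch_margoliash_weights(d_obs))
```

With these weights the same absolute residual contributes less when it affects a larger distance, which is the intent of the Fitch and Margoliash modification.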

Later, Chakraborty [16] and Hasegawa et al. [38] proposed a very similar paradigm, called the generalized least-squares problem (GLSP), in which the variances \(\omega_{rs}\) are replaced by the covariances of \(d_{rs}\). Nowadays, GLSP has fallen into disuse due to its statistical inconsistency problems [9].

8.3.2 Drawbacks of the Least-Squares Paradigms of Phylogenetic Estimation

Although the least-squares paradigm is a milestone in molecular phylogenetics, it is characterized by a number of drawbacks. For example, Cavalli-Sforza and Edwards’ paradigm returns a tree metric, i.e., a phylogeny whose edge weights are non-negative [73, 80], whenever the distance matrix D satisfies the ultrametric property
$$\begin{array}{rcl}{ d}_{\mathit{rs}} \leq \max \{ {d}_{\mathit{rq}},{d}_{\mathit{qs}}\}& \quad & \ r,s,q \in \Gamma \ :\ r\neq s\neq q \\ \end{array}$$
or the additive property
$$\begin{array}{rcl}{ d}_{\mathit{rs}} + {d}_{\mathit{hk}} \leq \max \{ {d}_{\mathit{rh}} + {d}_{\mathit{sk}},{d}_{\mathit{rk}} + {d}_{\mathit{sh}}\}& \quad & \ r,s,h,k \in \Gamma \ :\ r\neq s\neq h\neq k.\end{array}$$
Specifically, when D is ultrametric or additive, the solution of Problem 8.2 is unique and obtainable in polynomial time through the UPGMA greedy algorithm [74] or the sequential algorithm [80], respectively.
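Both properties can be tested by brute force on a symmetric distance matrix, as in the sketch below. The function names and the numerical tolerance are assumptions introduced for illustration; the checks enumerate all triples and quadruples of distinct taxa and are therefore meant for small instances only.

```python
from itertools import permutations


def is_ultrametric(d, tol=1e-9):
    """Check d_rs <= max{d_rq, d_qs} for all distinct r, s, q."""
    n = len(d)
    return all(d[r][s] <= max(d[r][q], d[q][s]) + tol
               for r, s, q in permutations(range(n), 3))


def is_additive(d, tol=1e-9):
    """Check d_rs + d_hk <= max{d_rh + d_sk, d_rk + d_sh} for all distinct r, s, h, k."""
    n = len(d)
    return all(d[r][s] + d[h][k] <= max(d[r][h] + d[s][k], d[r][k] + d[s][h]) + tol
               for r, s, h, k in permutations(range(n), 4))


# Distances induced by a hypothetical four-taxon tree: additive but not ultrametric.
tree_like = [
    [0, 3, 8, 9],
    [3, 0, 9, 10],
    [8, 9, 0, 9],
    [9, 10, 9, 0],
]
# A small matrix in which the two largest distances of every triple coincide.
clock_like = [
    [0, 2, 6],
    [2, 0, 6],
    [6, 6, 0],
]
```

Here `is_additive(tree_like)` holds while `is_ultrametric(tree_like)` does not, and `clock_like` satisfies the ultrametric property.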

Unfortunately, when D is generic (e.g., when it is obtained by means of the THM model, see Sect. 8.2), the least-squares paradigm may lead to the occurrence of negative entries in the vector w, i.e., to a phylogeny that is not a tree metric [32, 47]. Negative edge weights are infeasible both from a conceptual point of view (a distance, being an expected number of mutation events over time, cannot be negative [45]) and from a biological point of view (evolution cannot proceed backwards [57, 77]). For the latter reason at least, non-tree metric phylogenies are generally not accepted in molecular phylogenetics [35].

In response, some authors investigated the consequences of adding or guaranteeing the positivity constraint of edge weights in the least-squares paradigm.

Gascuel and Levy [33] observed that the presence of the positivity constraint transforms any least-squares model into a non-negative linear regression problem which involves projecting the distance matrix D onto the positive cone defined by the set of tree metrics (see also [5, p. 187]). Thus, the authors designed an iterative polynomial time algorithm able to generate a sequence of least-squares projections of D onto such a set until an additive distance matrix (and the corresponding phylogeny) is obtained.

Farach et al. [26] proposed an alternative approach to impose the positivity constraint. Specifically, the authors proposed to find the minimal perturbation of the distance matrix D that guarantees the satisfaction of the additive or the ultrametric property. They used the \(L_\infty\)-norm and the \(L_1\)-norm to constrain the entries of D to satisfy the additive (ultrametric) property, and proved that such a problem can be solved in polynomial time when D is required to be ultrametric under the \(L_\infty\)-norm. By contrast, the authors proved that the problem becomes hard when an ultrametric or an additive distance matrix is required under the \(L_1\)-norm.

Finally, Barthélemy and Guénoche [3] and Makarenkov and Leclerc [50] proposed a Lagrangian relaxation of the positivity constraint to guarantee metric trees. Both algorithms are iterative and apply to the OLSP and the WLSP, respectively. Specifically, starting from a leaf, the algorithms generate a phylogeny with a growing number of leaves by solving an optimization problem in which the best non-negative edge weights that minimize the OLSP (respectively the WLSP) are found. Both algorithms run in polynomial time and are characterized by a computational complexity of \(O(n^4)\) and \(O(n^5)\), respectively. An algorithm called FITCH was also proposed by Felsenstein [27].

A second and possibly more serious drawback of the least-squares paradigms is the statistical inconsistency of some of them. Specifically, apart from the OLSP, which proves to be statistically consistent [23, 68], the only case in which the WLSP is known to be consistent is when the variances \(\omega_{rs}\) are set to the inverse of the product of two strictly positive constants \(\alpha_i\) and \(\alpha_j\). By contrast, the GLSP is generally inconsistent [35].

8.4 The Minimum Evolution Paradigm of Phylogenetic Estimation

Kidd and Sgaramella-Zonta [45] and Beyer et al. [4] independently proposed an alternative paradigm known as the minimum evolution problem or the minimum evolution paradigm of phylogenetic estimation [9].

The minimum evolution paradigm arises from Cavalli-Sforza and Edwards’ model but differs mainly in the way in which a phylogeny is chosen from among possible alternatives. In fact, the minimum evolution criterion states that if the evolutionary distances \({d}_{\mathit{rs}}\) were unbiased estimates of the true evolutionary distances (i.e., the distances that one would obtain if all the molecular data from the analyzed taxa were available), then the true phylogeny would have an expected length shorter than any other possible phylogeny compatible with D. Hence, the minimum evolution paradigm aims at finding the phylogeny whose sum of edge weights, estimated from the corresponding evolutionary distances, is minimum [9].

It is worth noting that the minimum evolution criterion does not assert that molecular evolution follows minimum paths, but states, according to classical evolutionary theory, that a minimum length phylogeny may properly approximate the real phylogeny of well-conserved molecular data, i.e., data whose basic biochemical function has undergone small change throughout the evolution of the observed taxa [4]. That evolution proceeds by small rather than smallest changes is due to the fact that the neighborhood of possible alleles that are selected at each instant of the life of a taxon is finite and, perhaps more importantly, the selective forces acting on the taxon may not be constant throughout its evolution [4, 80]. Over the long term (periods of environmental change, including the intracellular environment), small changes will not generally provide the smallest change. Thus, a minimum length phylogeny provides a lower bound on the total number of mutation events that could have occurred during the evolution of the observed taxa.

Different versions of the minimum evolution paradigm are discussed in the literature on phylogenetics, and each one is characterized by its own edge weight estimation model [9]. Specifically, we can distinguish between the least-squares edge weight estimation model [24, 68, 69] and the linear programming edge weight estimation model [4, 14, 80]. In the next sections, we shall analyze both families in detail.

8.4.1 The Minimum Evolution Paradigm Under the Least-Squares Edge Weight Estimation Model

The earliest minimum evolution paradigm of phylogenetic estimation was proposed by Kidd and Sgaramella-Zonta [45] and exploits Cavalli-Sforza and Edwards’ model to estimate edge weights. The authors proposed to replace the objective function of the OLSP with
$$\begin{array}{rcl} f(\mathbf{X}) =\parallel \mathbf{w} {\parallel }_{1} =\parallel {\mathbf{X}}^{\dag }{\mathbf{D}}^{\bigtriangleup }{\parallel }_{ 1}& &\end{array}$$
giving rise to the following NP-hard convex optimization problem [9]:

Problem 8.4.

The minimum evolution under least-squares problem (MELSP)
$$ \begin{array}{rcl}{ \min }_{\mathbf{X}\in \mathcal{X}}& \quad & f(\mathbf{X}) =\parallel {\mathbf{X}}^{\dag }{\mathbf{D}}^{\bigtriangleup }{\parallel }_{ 1}.\end{array}$$
Rzhetsky and Nei [68, 69] observed that the MELSP is statistically consistent, and that this property is also guaranteed when considering a relaxed version of the objective function in which edge weights are summed regardless of their sign. However, Swofford et al. [77] criticized the choice of taking into account negative edge weights (or even their absolute value) in the objective function, because negative weights are biologically infeasible. Thus, the authors proposed to replace the objective function (8.13) with
$$\begin{array}{rcl} f(\mathbf{X}) ={ \sum \nolimits }_{e\in \mathcal{E}(T=\mathbf{X})\vert {w}_{e}\geq 0}{w}_{e}.& & \\ \end{array}$$
Gascuel et al. [35] investigated the statistical consistency of the paradigm of Swofford et al. [77] and obtained results analogous to those of Rzhetsky and Nei [68, 69]. At present, the paradigm of Swofford et al. [77] is one of the most used versions of minimum evolution, being implemented in the well-known software for phylogenetic estimation “PAUP” [76]. The software is able to solve exactly instances of the paradigm containing up to 13 taxa and implements a hill-climbing metaheuristic to tackle larger instances of the problem.
Desper and Gascuel [24, 25] formalized the most recent version of the minimum evolution paradigm, called the Balanced Minimum Evolution problem (BME). The paradigm is based on Pauplin’s [59] seminal work, in which the author criticized the biological consideration at the core of the OLSP. In fact, Pauplin noted that when computing the Moore-Penrose pseudo-inverse of the EPT matrix X, some edges can be weighted more than others. Since there is no biological justification for that, Pauplin proposed a new paradigm in which all edges of a phylogeny are weighted in the same way. The resulting objective function does not depend explicitly on edge weights and can be stated as follows:
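
The three variants of the least-squares minimum evolution objective discussed above differ only in how negative edge weights enter the sum. The following minimal sketch makes the difference concrete; the function names and the weight vector are hypothetical, and the weights are assumed to have been estimated beforehand for a fixed phylogeny.

```python
def length_abs(w):
    """MELSP objective: the 1-norm of the edge weight vector."""
    return sum(abs(x) for x in w)

def length_signed(w):
    """Relaxed objective: edge weights summed regardless of their sign."""
    return sum(w)

def length_nonneg(w):
    """Objective of Swofford et al.: only non-negative edge weights are summed."""
    return sum(x for x in w if x >= 0)

# Hypothetical least-squares estimates containing one negative edge weight.
w = [0.30, 0.10, -0.05, 0.20, 0.15]
lengths = (length_abs(w), length_signed(w), length_nonneg(w))
```

When all estimated weights are non-negative the three objectives coincide, so the choice among them matters only for phylogenies that violate the positivity constraint.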
$$\begin{array}{rcl} f(T) ={ \sum \nolimits }_{r,s\in \Gamma :r\neq s} \frac{{d}_{\mathit{rs}}} {{2}^{{\tau }_{\mathit{rs}}}},& & \\ \end{array}$$
where \({\tau }_{\mathit{rs}}\) is called the topological distance and denotes the number of edges belonging to the path between taxa r and s in a phylogeny T [9]. Hence, BME can be stated in terms of the following optimization problem:

Problem 8.5.

The Balanced Minimum Evolution Problem (BME)
$$ \begin{array}{rcl} {\min }_{T\in \mathcal{T}}& \quad & f(T) ={ \sum \nolimits }_{r,s\in \Gamma :r\neq s} \frac{{d}_{\mathit{rs}}} {{2}^{{\tau }_{\mathit{rs}}}}. \end{array}$$
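
As an illustration of the objective function of Problem 8.5, the following sketch evaluates f(T) for a fixed phylogeny. It assumes the phylogeny is given as an adjacency dictionary and the distances as a dictionary of dictionaries; these data structures, the function names and the example values are hypothetical. The sum runs over ordered pairs of distinct taxa, as in the formula above.

```python
from collections import deque

def topological_distances(adj, leaves):
    """Number of edges on the path between each ordered pair of leaves (BFS per leaf)."""
    tau = {}
    for r in leaves:
        dist = {r: 0}
        queue = deque([r])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        for s in leaves:
            if s != r:
                tau[r, s] = dist[s]
    return tau

def bme_length(adj, leaves, d):
    """f(T): sum over ordered pairs r != s of d_rs / 2^tau_rs."""
    tau = topological_distances(adj, leaves)
    return sum(d[r][s] / 2 ** tau[r, s] for r in leaves for s in leaves if r != s)

# Hypothetical additive distances generated by the quartet ((a,b),(c,d))
# with leaf edge weights 1, 2, 3, 4 and internal edge weight 5.
leaves = ["a", "b", "c", "d"]
rows = [[0, 3, 9, 10], [3, 0, 10, 11], [9, 10, 0, 7], [10, 11, 7, 0]]
d = {r: dict(zip(leaves, row)) for r, row in zip(leaves, rows)}

true_tree = {"a": ["u"], "b": ["u"], "c": ["v"], "d": ["v"],
             "u": ["a", "b", "v"], "v": ["c", "d", "u"]}
other_tree = {"a": ["u"], "c": ["u"], "b": ["v"], "d": ["v"],
              "u": ["a", "c", "v"], "v": ["b", "d", "u"]}
f_true = bme_length(true_tree, leaves, d)
f_other = bme_length(other_tree, leaves, d)
```

In this example the generating topology yields a value equal to the sum of its edge weights (15), while the alternative quartet yields a larger value, so the criterion prefers the generating topology.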
BME is known to be statistically consistent [24, 25] and its optimal solution satisfies the positivity constraint whenever the distance matrix satisfies the triangle inequality
$$\begin{array}{rcl}{ d}_{\mathit{rs}} \leq {d}_{\mathit{rq}} + {d}_{\mathit{qs}}\ \forall \ r,s,q \in \Gamma \ :\ r\neq s\neq q.& & \\ \end{array}$$
For the latter reason at least, finding the optimal solution to instances of BME is highly desirable. Unfortunately, this task seems hard, although at present no information about the complexity of BME is known in the literature.
Recent advances in the polyhedral combinatorics of BME have made it possible to solve exactly instances containing up to 20–25 taxa [13]. However, the size of the instances that can be solved to optimality is still far from real needs; for this reason, clustering heuristics (Fig. 8.5), such as the neighbor-joining tree (NJT) [70, 75], are commonly used to tackle large instances of BME. Possibly, future developments in the polyhedral combinatorics of BME will provide fundamental new insights for the development of more efficient exact solution approaches.
Fig. 8.5

Clustering heuristics: initially a star graph is considered; subsequently two vertices (circled) are selected, marked (white vertices) and joined by an internal vertex. The algorithm is iterated on the remaining black vertices until a phylogeny is obtained
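
The clustering scheme of Fig. 8.5 can be sketched in code. The following is a minimal sketch of the classical neighbor-joining agglomeration, assuming the usual selection criterion Q(i, j) = (n − 2) d(i, j) − r(i) − r(j), where r(i) is the sum of the distances from i to all other active vertices. The function name, the labels `v0`, `v1`, … given to internal vertices and the example data are hypothetical.

```python
from itertools import combinations

def neighbor_joining(names, d):
    """Return the edges (u, v, weight) of the tree built by neighbor joining."""
    dist = {a: dict(d[a]) for a in names}   # working copy of the distances
    active = list(names)                    # vertices still attached to the star
    edges, next_id = [], 0
    while len(active) > 2:
        n = len(active)
        r = {a: sum(dist[a][b] for b in active if b != a) for a in active}
        # select the pair minimising the neighbor-joining criterion
        i, j = min(combinations(active, 2),
                   key=lambda p: (n - 2) * dist[p[0]][p[1]] - r[p[0]] - r[p[1]])
        u = f"v{next_id}"                   # new internal vertex joining i and j
        next_id += 1
        w_i = 0.5 * dist[i][j] + (r[i] - r[j]) / (2 * (n - 2))
        edges += [(i, u, w_i), (j, u, dist[i][j] - w_i)]
        dist[u] = {}
        for k in active:
            if k not in (i, j):
                dist[u][k] = dist[k][u] = 0.5 * (dist[i][k] + dist[j][k] - dist[i][j])
        active = [k for k in active if k not in (i, j)] + [u]
    a, b = active
    edges.append((a, b, dist[a][b]))        # connect the last two vertices
    return edges

# Hypothetical additive distances generated by the quartet ((a,b),(c,d)).
names = ["a", "b", "c", "d"]
rows = [[0, 3, 9, 10], [3, 0, 10, 11], [9, 10, 0, 7], [10, 11, 7, 0]]
d = {r: dict(zip(names, row)) for r, row in zip(names, rows)}
tree_edges = neighbor_joining(names, d)
```

On additive input such as this example, the procedure recovers the generating tree and its edge weights; on real data it serves only as a heuristic.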

8.4.2 The Minimum Evolution Paradigm Under the Linear Programming Edge Weight Estimation Model

An alternative model to estimate edge weights in the minimum evolution paradigm is provided by linear programming. The model was introduced by Beyer et al. [4] and is based on the following motivation: if the evolutionary distances between pairs of molecular data have to reflect the number of mutation events required to convert one molecular sequence into another over time, then they must satisfy the triangle inequality. Moreover, since any edge weight of a phylogeny is de facto an evolutionary distance, the entries of vector w must also satisfy the triangle inequality. This last observation requires that, for each path \({p}_{\mathit{rs}}\) between taxa r and s in X, the constraint \({\sum \nolimits }_{e\in {p}_{\mathit{rs}}}{w}_{e}{x}_{\mathit{rs,e}} \geq {d}_{\mathit{rs}}\) is satisfied. Hence, Beyer et al. [4] proposed a possible paradigm of phylogenetic estimation consisting of solving the following mixed integer programming model:

Problem 8.6.

The minimum evolution problem under linear programming (MELP)
$$ \begin{array}{rcl} {\min }_{\mathbf{X}\in \mathcal{X},\mathbf{w}\in {\mathbb{R}}_{{ 0}^{+}}^{2n-3}}& \quad & f(\mathbf{X},\mathbf{w}) =\parallel \mathbf{w} {\parallel }_{1} \\ s.t.& \quad & \mathbf{X}\mathbf{w} \geq {\mathbf{D}}^{\bigtriangleup }.\end{array}$$

MELP is a well-known APX-hard problem [26] for which the current exact algorithms described in the literature provide solutions to instances containing not more than a dozen taxa [14]. To the best of our knowledge, nothing is known about the statistical consistency of MELP.
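
For a fixed topology, the constraints of Problem 8.6 are simple to verify. The following minimal sketch checks whether a candidate edge weight vector is feasible for MELP and, if so, returns its objective value. It assumes that the paths of the phylogeny are given as lists of edge indices; the function name, the data layout and the example are hypothetical, and the sketch does not solve the optimization problem itself.

```python
def melp_check(paths, w, d, tol=1e-9):
    """paths[(r, s)]: indices of the edges on the path between taxa r and s;
    w: candidate edge weights; d[(r, s)]: evolutionary distances.
    Returns (feasible, objective), with objective None when infeasible."""
    if any(x < -tol for x in w):
        return False, None                      # violates w >= 0
    for pair, path_edges in paths.items():
        if sum(w[e] for e in path_edges) < d[pair] - tol:
            return False, None                  # violates Xw >= D for this pair
    return True, sum(w)                         # ||w||_1 for non-negative w

# Hypothetical quartet ((a,b),(c,d)); edges: 0 a-u, 1 b-u, 2 c-v, 3 d-v, 4 u-v.
paths = {("a", "b"): [0, 1], ("c", "d"): [2, 3],
         ("a", "c"): [0, 4, 2], ("a", "d"): [0, 4, 3],
         ("b", "c"): [1, 4, 2], ("b", "d"): [1, 4, 3]}
d = {("a", "b"): 3, ("c", "d"): 7, ("a", "c"): 9,
     ("a", "d"): 10, ("b", "c"): 10, ("b", "d"): 11}
feasible = melp_check(paths, [1, 2, 3, 4, 5], d)
infeasible = melp_check(paths, [1, 2, 3, 4, 4], d)
```

Shortening the internal edge from 5 to 4 makes the path between a and c shorter than the corresponding distance, so the second candidate is rejected.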

8.4.3 Drawbacks of the Minimum Evolution Paradigm of Phylogenetic Estimation

There are mainly two drawbacks that affect the minimum evolution paradigm of phylogenetic estimation: the “rigidity” of its criterion and the hardness of its paradigms.

As regards the first drawback, some authors, notably Felsenstein [29, p. 175], argued that the minimum evolution paradigms could prove unreliable because they neglect rate variation when estimating edge weights. This major criticism could possibly be overcome by using non-homogeneous Markov models. Specifically, in a non-homogeneous Markov model, the Chapman–Kolmogorov master equation becomes [84]:
$$\begin{array}{rcl} \dot{\mathbf{P}}(0,t) = \mathbf{R}(t)\mathbf{P}(0,t),& &\end{array}$$
whose integral is given by
$$\begin{array}{rcl} \mathbf{P}(0,t) = \mathbf{I} +{ \int \nolimits \nolimits }_{0}^{t}\mathbf{R}(\tau )\mathbf{P}(0,\tau )\mathrm{d}\tau,& &\end{array}$$
where I denotes the identity matrix. The integral (8.15) could prove impractical for empirical use. However, note that (8.15) can be approximated through the Peano–Baker sequence
$$\begin{array}{rcl}{ \mathbf{P}}_{0}(0,t)& =& \mathbf{I} \\ {\mathbf{P}}_{k}(0,t)& =& \mathbf{I} +{ \int \nolimits \nolimits }_{0}^{t}\mathbf{R}(\tau ){\mathbf{P}}_{ k-1}(0,\tau )\mathrm{d}\tau,\ \ \mbox{ $k = 1,2,\ldots $}\end{array}$$
since it is possible to prove that (8.16) converges to the matrix P(0, t) when \(k \rightarrow \infty \) [18]. Hence, under a non-homogeneous Markov model, the substitution probability matrix could be easily computed by means of iterative procedures that appropriately approximate (8.15).
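
One possible iterative procedure of this kind is sketched below. It evaluates the Peano–Baker iterates on a uniform time grid and approximates each integral with the cumulative trapezoidal rule; the grid, the quadrature rule, the function names and the example rate matrix are all assumptions made for illustration and are not taken from [18].

```python
import math

def mat_mul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def peano_baker(rate, t, iterations=20, steps=200):
    """Approximate P(0, t) for dP/dt = R(t) P; rate maps tau to the matrix R(tau)."""
    h = t / steps
    r_grid = [rate(m * h) for m in range(steps + 1)]
    n = len(r_grid[0])
    identity = [[float(i == j) for j in range(n)] for i in range(n)]
    p = [identity for _ in range(steps + 1)]      # P_0(0, tau) = I on the grid
    for _ in range(iterations):
        integrand = [mat_mul(r_grid[m], p[m]) for m in range(steps + 1)]
        acc = [[0.0] * n for _ in range(n)]
        new_p = [identity]
        for m in range(1, steps + 1):
            # cumulative trapezoidal rule for the integral from 0 to m * h
            acc = [[acc[i][j] + 0.5 * h * (integrand[m - 1][i][j] + integrand[m][i][j])
                    for j in range(n)] for i in range(n)]
            new_p.append([[identity[i][j] + acc[i][j] for j in range(n)]
                          for i in range(n)])
        p = new_p                                  # P_k replaces P_{k-1}
    return p[-1]

# Hypothetical two-state model whose rates grow linearly in time.
def rate(tau):
    return [[-(1.0 + tau), 1.0 + tau], [1.0 + tau, -(1.0 + tau)]]

p_half = peano_baker(rate, 0.5)
```

For this particular rate matrix the matrices R(τ) commute, so the exact solution is available in closed form and can be used to check the approximation: the diagonal entry equals (1 + exp(−2(t + t²/2)))/2.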

Concerning the second drawback, it is easy to realize that the NP-hardness of the minimum evolution paradigms constitutes a big handicap for the development of exact solution approaches of practical use. Exact approaches are necessary to guarantee the optimality of a given solution and fundamental to investigate whether the hypotheses at the core of a criterion are well suited to describe the evolutionary process of the observed taxa. At present, most molecular datasets involve hundreds of taxa, whereas the current exact solution approaches have difficulty tackling instances containing more than two dozen taxa (even fewer for the linear programming paradigm). Increasing the size of the datasets that can be solved to optimality is possibly one of the most challenging problems in molecular phylogenetics and certainly warrants further research efforts.

8.5 The Likelihood Paradigm of Phylogenetic Estimation

One of the most used criteria of phylogenetic estimation is the likelihood criterion. First formalized by Felsenstein [28], the likelihood criterion states that among many plausible explanations of an observed phenomenon, the one having the highest probability of occurring should be preferred to the others. When the likelihood criterion is applied to phylogenetic estimation, a phylogeny is defined to be optimal (or the most likely) if it has the highest probability of explaining the observed taxa. Thus, the likelihood paradigm consists of finding the phylogeny that maximizes a stochastic function, called the likelihood function, modeling a set of evolutionary hypotheses of the observed taxa.

The fundamental difference that distinguishes the likelihood paradigm from the least-squares and the minimum evolution paradigms is the nature of the information that it aims at finding. Specifically, whereas the least-squares and the minimum evolution paradigms aim at finding the best possible approximation of the projection of the evolutionary process of the observed taxa, the likelihood paradigm aims at reconstructing the most likely evolutionary process that originated the observed taxa. Hence, whereas the phylogeny of the least-squares and the minimum evolution paradigms is an unrooted binary tree, the phylogeny of the likelihood paradigm is a rooted phylogeny, i.e., a full binary tree having (2n − 1) vertices.

Formally, the likelihood function is defined to be a recursive function of a fixed rooted phylogeny T, a model of molecular evolution M and an observed data matrix \(\mathbf{S} =\{ {s}_{\mathit{rc}}\}\), i.e., a matrix whose r-th row represents the molecular sequence of the r-th taxon. Having defined the quantity
$$\begin{array}{rcl}{ L}_{c}^{r}(i)& =& \left \{\begin{array}{ll} 1,&\mbox{ if ${s}_{\mathit{rc}} = i$} \\ 0,&\mbox{ otherwise,}\\ \end{array} \right. \end{array}$$
for each leaf r of T, each column c of S and each \(i \in \Upsilon \), and the quantity
$${L}_{c}^{v}(i) = \left [\ {\sum \nolimits }_{j\in \Upsilon }{L}_{c}^{{v}_{1} }(j){p}_{\mathit{ij}}({t}_{{v}_{1},v})\ \right ]\left [\ {\sum \nolimits }_{j\in \Upsilon }{L}_{c}^{{v}_{2} }(j){p}_{\mathit{ij}}({t}_{{v}_{2},v})\ \right ],$$
for each internal vertex v of T having \({v}_{1}\) and \({v}_{2}\) as children, the likelihood function L(T, S, M) of T can be defined as
$$L(T,\mathbf{S},M) ={ \prod \nolimits }_{c}\left [{\sum \nolimits }_{j\in \Upsilon }{L}_{c}^{\rho }(j){\pi }_{ j}\right ],$$
where ρ denotes the root of T. In the context of the likelihood paradigm, the expected numbers of substitutions per site \({t}_{{v}_{h},{v}_{k}}\) play a role analogous to that of edge weights in the least-squares and minimum evolution paradigms. Hence, when a given model of molecular evolution is assumed to hold (e.g., the THM model), finding the most likely phylogeny for a set of molecular sequences means maximizing the nonlinear and (usually) non-convex stochastic function L(T, S, M) over all the possible rooted phylogenies and, for each rooted phylogeny, over all the possible associated edge weights \({t}_{{v}_{h},{v}_{k}}\) and substitution probabilities \({p}_{\mathit{ij}}({t}_{{v}_{h},{v}_{k}})\).
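
The recursion above can be sketched for a fixed rooted phylogeny with fixed edge weights. The sketch below assumes, purely for illustration, the Jukes-Cantor substitution model with uniform equilibrium frequencies; the nested-tuple representation of the phylogeny, the function names and the example sequences are hypothetical.

```python
import math
from itertools import product

STATES = "ACGT"

def jc69(t):
    """Jukes-Cantor substitution probabilities p_ij(t) (an assumed model)."""
    e = math.exp(-4.0 * t / 3.0)
    same, diff = 0.25 + 0.75 * e, 0.25 - 0.25 * e
    return {i: {j: (same if i == j else diff) for j in STATES} for i in STATES}

def conditional(node, column):
    """L_c^v(i) for every state i. A leaf is a taxon name; an internal vertex
    is a pair ((child_1, t_1), (child_2, t_2))."""
    if isinstance(node, str):
        return {i: 1.0 if column[node] == i else 0.0 for i in STATES}
    result = {i: 1.0 for i in STATES}
    for child, t in node:
        lc, p = conditional(child, column), jc69(t)
        for i in STATES:
            result[i] *= sum(lc[j] * p[i][j] for j in STATES)
    return result

def likelihood(tree, sequences, pi=None):
    """L(T, S, M): product over columns of sum_j L_c^rho(j) * pi_j."""
    pi = pi or {i: 0.25 for i in STATES}
    n_sites = len(next(iter(sequences.values())))
    total = 1.0
    for c in range(n_sites):
        column = {name: seq[c] for name, seq in sequences.items()}
        root = conditional(tree, column)
        total *= sum(root[j] * pi[j] for j in STATES)
    return total

# Hypothetical rooted phylogeny ((a:0.1, b:0.2):0.05, c:0.3) and sequences.
tree = (((("a", 0.1), ("b", 0.2)), 0.05), ("c", 0.3))
value = likelihood(tree, {"a": "ACGT", "b": "ACGA", "c": "ACTT"})
```

In practice the product over columns underflows quickly, so implementations usually work with sums of logarithms; the sketch keeps the product only to mirror the formula.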

The NP-hardness of the likelihood paradigm [62] justified the development of a number of approximate solution approaches typically based on hill climbing strategies. Specifically, the strategies consist of a first phase in which the structure of a best-so-far phylogeny is modified and a second phase in which the nonlinear optimization of edge weights and the substitution probabilities is performed. The two phases are consecutively iterated until a stopping criterion is satisfied (e.g., the number of iterations performed or the elapsed time) [7, 28, 64]. A systematic review of the hill climbing strategies for the likelihood paradigm is out of the scope of the present chapter and can be found in Bryant et al. [7].

Recent mathematical advances on the likelihood paradigm have overcome several limitations of Felsenstein’s initial model, such as the absence of rate variation among sites [81] and the absence of correlated evolution among sites [61]. Moreover, progress has been made in the analysis of its statistical consistency and its identifiability, i.e., the study of the conditions under which the likelihood function is at least injective, an aspect closely related to its consistency [7]. The reader may find it useful to refer to Gascuel [32] and Gascuel and Steel [34] for an overview of these aspects.

8.5.1 The Bayesian Paradigm of Phylogenetic Estimation

Given a dataset of molecular sequences, suppose we have sufficient empirical evidence to assert that the evolution of the observed taxa followed a specific stochastic process. Then, we could try to combine this a priori information with the likelihood function in order to bias the search for the most probable phylogeny towards those solutions that fit the known evolutionary process. This idea is at the core of the most recent likelihood-derived paradigm of phylogenetic estimation, called the bayesian paradigm, and will be briefly described in this section.

Similar to the likelihood paradigm, the bayesian paradigm aims at finding the phylogeny that has the highest probability of recovering the evolutionary process of the observed taxa. However, the selection of the most probable phylogeny is performed in light of the a priori information. Specifically, the a priori information is usually modeled by means of specific probability distributions, called prior distributions, which mainly concern three parameters, namely: the topology (i.e., the structure of the phylogeny), the edge weights and the substitution probabilities. Defining
$$\begin{array}{rcl} \Theta =\{ {t}_{{v}_{h},{v}_{k}} \in {\mathbb{R}}_{{0}^{+}} : ({v}_{h},{v}_{k}) \in T,\ \forall \ T \in \mathcal{T}\,\},& & \\ \end{array}$$
as the edge weight space and
$$\mathcal{R} = \left \{{p}_{\mathit{ij}}(t) \in [0,1] :{ \sum \nolimits }_{j\in \Upsilon }{p}_{\mathit{ij}}(t) = 1,\ \forall \ i,j \in \Upsilon,\ t \in {\mathbb{R}}_{{0}^{+}}\right \},$$
as the substitution probability space, the bayesian paradigm considers the prior distributions γ(T), γ(t), and γ(R) to model the a priori information on \(\mathcal{T}\), Θ, and \(\mathcal{R}\), respectively. Once an appropriate model of molecular evolution M has been selected, the prior distributions are combined with the likelihood function to provide a posterior density function B(T, S, M) that represents the probability distribution of phylogenies conditional on the observed data matrix S, the model M and the prior distributions γ(T), γ(t), and γ(R). Maximizing B(T, S, M) is the goal of the bayesian paradigm.
According to Bayes’ theorem, for a fixed phylogeny \({T}_{i}\), and denoting by \({t}_{i}\) and \({R}_{i}\) the corresponding subspaces of edge weights and substitution probabilities, the posterior probability \(B({T}_{i},\mathbf{S},M)\) of \({T}_{i}\) can be written as:
$$\begin{array}{rcl} B({T}_{i},\mathbf{S},M) = \frac{{L}_{\int \nolimits \nolimits }({T}_{i},\mathbf{S},M)\gamma ({T}_{i})} {{\sum \nolimits }_{{T}_{j}\in \mathcal{T}}{L}_{\int \nolimits \nolimits }({T}_{j},\mathbf{S},M)\gamma ({T}_{j})},& &\end{array}$$
where \(\gamma ({T}_{i})\) denotes the prior probability of \({T}_{i}\), and \({L}_{\int \nolimits \nolimits }({T}_{i},\mathbf{S},M)\) denotes the integral of the likelihood function \(L({T}_{i},\mathbf{S},M)\) over all possible edge weights and substitution probabilities [41], i.e.,
$$\begin{array}{rcl}{ L}_{\int \nolimits \nolimits }({T}_{i},\mathbf{S},M) ={ \int \nolimits \nolimits }_{{t}_{i}}{ \int \nolimits \nolimits }_{{R}_{i}}L({T}_{i},\mathbf{S},M)\gamma (t')\gamma (R')\mathrm{d}t'\mathrm{d}R'.& & \\ \end{array}$$
Hence, finding the optimal solution for the bayesian paradigm means finding the phylogeny \({T}_{i}\), the associated edge weights and the substitution probabilities that globally maximize the posterior probability distribution of phylogenies B(T, S, M). Since finding the maximum a posteriori phylogeny implies being able to solve the likelihood paradigm, solving the bayesian paradigm is NP-hard [29].
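
When the set of candidate phylogenies is small enough to be enumerated, the posterior probability given by Bayes’ theorem reduces to a normalization. The following minimal sketch assumes that the logarithm of the integrated likelihood of each candidate phylogeny has already been computed by some other means; the function name and the numerical values are hypothetical. Working with logarithms and subtracting the maximum avoids numerical underflow.

```python
import math

def posterior(log_marginal, prior):
    """log_marginal[T]: log of the integrated likelihood of phylogeny T;
    prior[T]: prior probability gamma(T). Returns the posterior of each T."""
    log_joint = {T: log_marginal[T] + math.log(prior[T]) for T in log_marginal}
    shift = max(log_joint.values())              # guards against underflow
    weights = {T: math.exp(v - shift) for T, v in log_joint.items()}
    z = sum(weights.values())                    # denominator of Bayes' theorem
    return {T: w / z for T, w in weights.items()}

# Hypothetical values for the three unrooted topologies on four taxa.
log_marginal = {"T1": -1000.0, "T2": -1001.0, "T3": -1003.0}
uniform = {T: 1.0 / 3.0 for T in log_marginal}
post = posterior(log_marginal, uniform)
best = max(post, key=post.get)
```

The number of phylogenies grows super-exponentially with the number of taxa, so this direct computation is feasible only for very small instances.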

The recursive nature of the likelihood function and the intractability of computing the denominator of Bayes’ theorem prevent an analytical approach to the solution of the bayesian paradigm. Hence, the maximum a posteriori phylogeny is usually computed by means of a Markov chain Monte Carlo (MCMC) algorithm [30], i.e., an algorithm that samples B(T, S, M) through a stochastic generation of phylogenies in \(\mathcal{T}\) [49, 52, 83]. Sampling B(T, S, M) is extremely time consuming; therefore, bayesian estimations may take weeks [42]. However, as observed by Yang [82] and Huelsenbeck et al. [41, 43], the sampling process also has the indisputable benefit of providing a measure of the reliability of the best-so-far solution found. In fact, by sampling stochastically around the (best local) maximum a posteriori phylogeny T, the bayesian paradigm can determine support values for the subtrees of T, i.e., measures of the posterior probability that the subtrees are true.
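
The idea of sampling without computing the denominator of Bayes’ theorem can be illustrated on a toy problem. The sketch below runs a Metropolis sampler over a small finite set of states standing in for phylogenies, using only ratios of unnormalized posterior weights. The state names, the weights and the uniform symmetric proposal are assumptions made for illustration; actual phylogenetic samplers propose topological rearrangements and also sample edge weights and substitution parameters.

```python
import random

def metropolis(weights, steps, seed=0):
    """weights[T]: unnormalized posterior (likelihood times prior) of state T.
    Returns the fraction of steps spent in each state."""
    rng = random.Random(seed)
    states = list(weights)
    current = states[0]
    counts = {s: 0 for s in states}
    for _ in range(steps):
        # symmetric proposal: any other state, chosen uniformly at random
        proposal = rng.choice([s for s in states if s != current])
        # accept with probability min(1, ratio); the normalizing constant cancels
        if rng.random() < min(1.0, weights[proposal] / weights[current]):
            current = proposal
        counts[current] += 1
    return {s: c / steps for s, c in counts.items()}

# Hypothetical unnormalized posterior weights of three phylogenies.
weights = {"T1": 1.0, "T2": 2.0, "T3": 7.0}
frequencies = metropolis(weights, steps=100_000)
```

With enough steps, the visit frequencies approach the normalized posterior probabilities (here 0.1, 0.2 and 0.7), and they can be read as support values for the sampled states.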

The bayesian paradigm is possibly the most complex among the phylogenetic estimation paradigms currently available in the literature on molecular phylogenetics. The recent computational advances obtained by Ronquist and Huelsenbeck [65] sped up the execution of the MCMC algorithm and widened the use of the bayesian paradigm. However, the lack of a systematic investigation of its statistical consistency and the unclear dependence of the posterior density function on the a priori information [82] possibly make the bayesian paradigm still immature for phylogenetic estimation [1].

8.5.2 Drawbacks of the Likelihood and the Bayesian Paradigms of Phylogenetic Estimation

The higher the complexity of a paradigm, the higher the number of drawbacks that could arise, and the likelihood and the bayesian paradigms do not escape the rule. Specifically, a number of computational and theoretical drawbacks affect the two paradigms. The computational drawbacks mainly involve (i) the optimization aspects of the likelihood function and (ii) the sampling process in the bayesian paradigm. The theoretical drawbacks concern the evolutionary hypotheses at the core of the likelihood and bayesian criteria.

As regards the computational drawbacks, in Sect. 8.5 we have seen that finding the most likely phylogeny for a set of taxa involves maximizing a nonlinear and generally non-convex stochastic function over all the possible phylogenies in \(\mathcal{T}\) and, for each phylogeny, over all the possible edge weights and substitution probabilities. Notoriously, this task can only be performed in an approximate way, due to a lack of general mathematical conditions that guarantee the global optimality of a solution in nonlinear non-convex programming [21, 54]. Hence, although it is possible (at least for small datasets) to enumerate all the possible phylogenies in \(\mathcal{T}\), it is not possible to optimize globally the edge weights and the substitution probabilities of a fixed phylogeny T. This fact may negatively affect the statistical consistency of the likelihood and the bayesian paradigms. In fact, the number of local optima of the likelihood function grows exponentially as a function of the number of taxa considered [7, 19, 20]. Thus, for a fixed phylogeny T, the global optimum of the likelihood function is generally approximated by means of hill-climbing techniques that jump from one local optimum to another until a stopping criterion is satisfied (e.g., the number of iterations performed or the elapsed time) [7, 28, 64]. Assume that two phylogenies \({T}_{1}\) and \({T}_{2}\) are given, and let \({\mu }_{1}\) and \({\mu }_{2}\) be two vectors whose entries are the edge weights and the substitution probabilities associated with \({T}_{1}\) and \({T}_{2}\), respectively. Let \({z}_{1}\) and \({z}_{2}\) be the likelihood values of \({T}_{1}\) and \({T}_{2}\) for \({\mu }_{1}\) and \({\mu }_{2}\), respectively, and assume, without loss of generality, that \({z}_{1} > {z}_{2}\). Due to the local nature of the optima \({\mu }_{1}\) and \({\mu }_{2}\), there could exist another local optimum, say \(\hat{{\mu }}_{2}\), such that \(\hat{{z}}_{2} >{z}_{1} >{z}_{2}\). If the hill-climbing algorithm finds \(\hat{{\mu }}_{2}\) before \({\mu }_{2}\), then we will consider \({T}_{2}\) a better phylogeny than \({T}_{1}\); otherwise we will discard \({T}_{2}\) in favor of \({T}_{1}\).
Hence, it is easy to realize that if one of the two phylogenies is the true phylogeny, its acceptance depends on the quality of the hill-climbing algorithm used to optimize the likelihood function, and as a result the statistical consistency of the likelihood and bayesian paradigms may be seriously compromised.

Some authors argued that multiple local optima should arise infrequently in real datasets [64], but this conjecture was proved false by Bryant et al. [7] and Catanzaro et al. [12]. Specifically, Bryant et al. [7] observed that changing the model of molecular evolution influences the presence of multiple optima in the likelihood function, and Catanzaro et al. [12] showed a number of real datasets affected by strong multimodality of the likelihood function. Despite the importance of the topic, to the best of our knowledge nobody has been able to propose a plausible solution to this critical aspect.

A second computational drawback concerns the sampling process of the bayesian paradigm. In fact, as shown in Sect. 8.5.1, the approximation of the posterior density function is generally performed by means of an MCMC algorithm (e.g., the Metropolis or the Gibbs sampling algorithm [30]) that performs random walks in \(\mathcal{T}\). The random walk should be sufficiently diversified to sample potentially the whole \(\mathcal{T}\) and avoid doubling back (i.e., sampling phylogenies already visited). Unfortunately, despite the recent computational advances in the bayesian paradigm [65], no technique can guarantee a sufficient diversification of the sampling process. Hence, the convergence to the maximum a posteriori phylogeny in practice becomes the convergence to the best-so-far a posteriori phylogeny, which can be arbitrarily distinct from the true phylogeny (see [29, p. 296]).

As regards the theoretical drawbacks, it is worth noting that the evolutionary hypotheses at the core of the likelihood and bayesian criteria of phylogenetic estimation are at the same time their strength and their weakness. For example, if a proposed model of molecular evolution matches (at least roughly) the real evolutionary process of a set of molecular data, then the likelihood and the bayesian paradigms could succeed in recovering the real phylogeny of the corresponding set of taxa (provided a solution to their computational drawbacks). However, if this is not the case, the paradigms will just provide a (sub)optimal solution for that model that may completely mismatch the real phylogeny. This aspect becomes evident, e.g., in the article of Rydin and Källersjö [67] where, for the same dataset, two different Markov models of molecular evolution are used and two different maximum posterior phylogenies are obtained, both having 100% posterior probability of supporting the true phylogeny. This drawback concerns in general all the paradigms discussed in this chapter, and possibly there is no easy solution for it.

Finally, a second theoretical drawback concerns the prior distributions of the bayesian paradigm. In fact, it is worth noting that while a strength of the bayesian paradigm is the ability to incorporate the a priori information, this information is rarely available; hence, in practical applications the prior distributions are generally modeled as uniform distributions, frustrating the potential strengths of the paradigm [1]. Moreover, it is unclear what type of information is well suited for a prior distribution, how possible conflicts among different sets of a priori information can be resolved, and whether the inclusion of prior distributions strongly biases the estimation process. Huelsenbeck et al. [43] vaguely claimed that “in a typical Bayesian analysis of phylogeny, the results are likely to be rather insensitive to the prior,” but this result was not confirmed by Yang [82], who observed that “[...] the posterior probabilities of trees vary widely over simulated datasets [...] and can be unduly influenced by the prior [...].” Possibly, further research efforts are needed to provide answers to these practical concerns.

8.6 Conclusion

The success of a criterion of phylogenetic estimation is undoubtedly influenced by the quality of the evolutionary hypotheses at its core. If the hypotheses match (at least roughly) the real evolutionary process of a set of taxa, then the criterion will hopefully succeed in recovering the real phylogeny. Otherwise, the criterion will fail, suggesting an optimal phylogeny that mismatches the correct result partially or totally. Since we are far from a complete understanding of the complex facets of evolution, it is not generally possible to assess the superiority of one criterion over the others. Hence, families of estimation criteria coexist in the literature on molecular phylogenetics, providing different perspectives on the evolutionary process of the involved taxa.

In this chapter, we have presented a general introduction to the existing literature on molecular phylogenetics. Our purpose has been to introduce a classification scheme in order to provide a general framework for papers appearing in this area. In particular, three main criteria of phylogenetic estimation have been outlined: the first based on the least-squares paradigm, first proposed by Cavalli-Sforza and Edwards [15]; the second based on the minimum evolution paradigm, independently proposed by Kidd and Sgaramella-Zonta [45] and Beyer et al. [4]; and the third based on the likelihood paradigm, first proposed by Felsenstein [28]. This division has been further broken down into different, approximately homogeneous sub-areas, and the basic aspects of each have been pointed out. For each, the most relevant issues affecting their use in tackling real-world sized problems have also been outlined, as have the most interesting refinements deserving further research effort.



Daniele Catanzaro acknowledges support from the Belgian National Fund for Scientific Research (F.N.R.S.), of which he is “Chargé de Recherches.” He also thanks Raffaele Pesenti and the anonymous reviewers for their valuable comments on previous versions of the manuscript. Finally, thanks to Prof. Mike Steel and Dr. Rosa Maria Lo Presti for helpful and exciting discussions.


  1. 1.
    J. K. Archibald, M. E. Mort, and D. J. Crawford. Bayesian inference of phylogeny: A non-technical primer. Taxon, 52:187–191, 2003CrossRefGoogle Scholar
  2. 2.
    D. A. Bader, B. M. E. Moret, and L. Vawter. Industrial applications of high-performance computing for phylogeny reconstruction. In SPIE ITCom: Commercial application for high-performance computing, pages 159–168. SPIE, WA, 2001Google Scholar
  3. 3.
    J. P. Barthélemy and A. Guénoche. Trees and proximity representations. Wiley, NY, 1991Google Scholar
  4. 4.
    W. A. Beyer, M. Stein, T. Smith, and S. Ulam. A molecular sequence metric and evolutionary trees. Mathematical Biosciences, 19:9–25, 1974CrossRefGoogle Scholar
  5. 5.
    Å. Björck. Numerical methods for least-squares problems. SIAM, PA, 1996CrossRefGoogle Scholar
  6. 6.
    J. Brinkhuis and V. Tikhomirov. Optimization: Insights and applications. Princeton University Press, NJ, 2005Google Scholar
  7. 7.
    D. Bryant, N. Galtier, and M. A. Poursat. Likelihood calculation in molecular phylogenetics. In O. Gascuel, editor, Mathematics of evolution and phylogeny. Oxford University Press, NY, 2005Google Scholar
  8. 8.
    R. M. Bush, C. A. Bender, K. Subbarao, N. J. Cox, and W. M. Fitch. Predicting the evolution of human influenza A. Science, 286(5446):1921–1925, 1999PubMedCrossRefGoogle Scholar
  9. 9.
    D. Catanzaro. The minimum evolution problem: Overview and classification. Networks, 53(2): 112–125, 2009CrossRefGoogle Scholar
  10. 10.
    D. Catanzaro, L. Gatto, and M. Milinkovitch. Assessing the applicability of the GTR nucleotide substitution model through simulations. Evolutionary Bioinformatics, 2:145–155, 2006Google Scholar
  11. 11.
    D. Catanzaro, R. Pesenti, and M. Milinkovitch. A non-linear optimization procedure to estimate distances and instantaneous substitution rate matrices under the GTR model. Bioinformatics, 22(6):708–715, 2006PubMedCrossRefGoogle Scholar
  12.
    D. Catanzaro, R. Pesenti, and M. C. Milinkovitch. A very large-scale neighborhood search to estimate phylogenies under the maximum likelihood criterion. Technical report, G.O.M. – Computer Science Department – Université Libre de Bruxelles (U.L.B.), 2007.
  13.
    D. Catanzaro, M. Labbé, R. Pesenti, and J. J. Salazar-Gonzalez. The balanced minimum evolution problem. Technical report, G.O.M. – Computer Science Department – Université Libre de Bruxelles (U.L.B.), 2009.
  14.
    D. Catanzaro, M. Labbé, R. Pesenti, and J. J. Salazar-Gonzalez. Mathematical models to reconstruct phylogenetic trees under the minimum evolution criterion. Networks, 53(2):126–140, 2009.
  15.
    L. L. Cavalli-Sforza and A. W. F. Edwards. Phylogenetic analysis: Models and estimation procedures. American Journal of Human Genetics, 19:233–257, 1967.
  16.
    R. Chakraborty. Estimation of time of divergence from phylogenetic studies. Canadian Journal of Genetics and Cytology, 19:217–223, 1977.
  17.
    B. S. W. Chang and M. J. Donoghue. Recreating ancestral proteins. Trends in Ecology and Evolution, 15(3):109–114, 2000.
  18.
    L. Chisci. Sistemi Dinamici – Parte I. Pitagora, Italy, 2001.
  19.
    B. Chor, M. D. Hendy, B. R. Holland, and D. Penny. Multiple maxima of likelihood in phylogenetic trees: An analytic approach. Molecular Biology and Evolution, 17(10):1529–1541, 2000.
  20.
    B. Chor, M. D. Hendy, and S. Snir. Maximum likelihood Jukes-Cantor triplets: Analytic solutions. Molecular Biology and Evolution, 23(3):626–632, 2005.
  21.
    A. R. Conn, N. I. M. Gould, and P. L. Toint. Trust-region methods. SIAM, PA, 2000.
  22.
    W. H. E. Day. Computational complexity of inferring phylogenies from dissimilarity matrices. Bulletin of Mathematical Biology, 49:461–467, 1987.
  23.
    F. Denis and O. Gascuel. On the consistency of the minimum evolution principle of phylogenetic inference. Discrete Applied Mathematics, 127:66–77, 2003.
  24.
    R. Desper and O. Gascuel. Fast and accurate phylogeny reconstruction algorithms based on the minimum evolution principle. Journal of Computational Biology, 9(5):687–705, 2002.
  25.
    R. Desper and O. Gascuel. Theoretical foundations of the balanced minimum evolution method of phylogenetic inference and its relationship to the weighted least-squares tree fitting. Molecular Biology and Evolution, 21(3):587–598, 2004.
  26.
    M. Farach, S. Kannan, and T. Warnow. A robust model for finding optimal evolutionary trees. Algorithmica, 13:155–179, 1995.
  27.
    J. Felsenstein. An alternating least-squares approach to inferring phylogenies from pairwise distances. Systematic Biology, 46:101–111, 1997.
  28.
    J. Felsenstein. Evolutionary trees from DNA sequences: A maximum likelihood approach. Journal of Molecular Evolution, 17:368–376, 1981.
  29.
    J. Felsenstein. Inferring phylogenies. Sinauer Associates, MA, 2004.
  30.
    G. S. Fishman. Monte Carlo: Concepts, algorithms, and applications. Springer, NY, 1996.
  31.
    W. M. Fitch and E. Margoliash. Construction of phylogenetic trees. Science, 155:279–284, 1967.
  32.
    O. Gascuel. Mathematics of evolution and phylogeny. Oxford University Press, NY, 2005.
  33.
    O. Gascuel and D. Levy. A reduction algorithm for approximating a (non-metric) dissimilarity by a tree distance. Journal of Classification, 13:129–155, 1996.
  34.
    O. Gascuel and M. A. Steel. Reconstructing evolution. Oxford University Press, NY, 2007.
  35.
    O. Gascuel, D. Bryant, and F. Denis. Strengths and limitations of the minimum evolution principle. Systematic Biology, 50:621–627, 2001.
  36.
    P. H. Harvey, A. J. L. Brown, J. M. Smith, and S. Nee. New uses for new phylogenies. Oxford University Press, Oxford, 1996.
  37.
    M. Hasegawa, H. Kishino, and T. Yano. Evolutionary trees from DNA sequences: A maximum likelihood approach. Journal of Molecular Evolution, 17:368–376, 1981.
  38.
    M. Hasegawa, H. Kishino, and T. Yano. Dating the human-ape splitting by a molecular clock of mitochondrial DNA. Journal of Molecular Evolution, 22:160–174, 1985.
  39.
    D. P. Heyman and M. J. Sobel, editors. Stochastic models, volume 2 of Handbooks in operations research and management science. North-Holland, Amsterdam, 1990.
  40.
    S. Horai, Y. Satta, K. Hayasaka, R. Kondo, T. Inoue, T. Ishida, S. Hayashi, and N. Takahata. Man’s place in the Hominoidea revealed by mitochondrial DNA genealogy. Journal of Molecular Evolution, 35:32–43, 1992.
  41.
    J. P. Huelsenbeck, B. Larget, P. van der Mark, and F. Ronquist. MrBayes: Bayesian inference of phylogenetic trees. Bioinformatics, 17(8):754–755, 2001.
  42.
    J. P. Huelsenbeck, F. Ronquist, R. Nielsen, and J. P. Bollback. Bayesian inference of phylogeny and its impact on evolutionary biology. Science, 294:2310–2314, 2001.
  43.
    J. P. Huelsenbeck, B. Larget, R. E. Miller, and F. Ronquist. Potential applications and pitfalls of Bayesian inference of phylogeny. Systematic Biology, 51:673–688, 2002.
  44.
    T. H. Jukes and C. R. Cantor. Evolution of protein molecules. In H. N. Munro, editor, Mammalian protein metabolism, pages 21–123. Academic Press, NY, 1969.
  45.
    K. K. Kidd and L. A. Sgaramella-Zonta. Phylogenetic analysis: Concepts and methods. American Journal of Human Genetics, 23:235–252, 1971.
  46.
    M. Kimura. A simple method for estimating evolutionary rates of base substitutions through comparative studies of nucleotide sequences. Journal of Molecular Evolution, 16:111–120, 1980.
  47.
    M. K. Kuhner and J. Felsenstein. A simulation comparison of phylogeny algorithms under equal and unequal rates. Molecular Biology and Evolution, 11(3):584–593, 1994.
  48.
    C. Lanave, G. Preparata, C. Saccone, and G. Serio. A new method for calculating evolutionary substitution rates. Journal of Molecular Evolution, 20:86–93, 1984.
  49.
    S. Li, D. Pearl, and H. Doss. Phylogenetic tree construction using Markov chain Monte Carlo. Journal of the American Statistical Association, 95:493–508, 2000.
  50.
    V. Makarenkov and B. Leclerc. An algorithm for the fitting of a tree metric according to a weighted least-squares criterion. Journal of Classification, 16:3–26, 1999.
  51.
    M. A. Marra, S. J. Jones, C. R. Astell, R. A. Holt, A. Brooks-Wilson, Y. S. Butterfield, J. Khattra, J. K. Asano, S. A. Barber, S. Y. Chan, A. Cloutier, S. M. Coughlin, D. Freeman, N. Girn, O. L. Griffith, S. R. Leach, M. Mayo, H. McDonald, S. B. Montgomery, P. K. Pandoh, A. S. Petrescu, A. G. Robertson, J. E. Schein, A. Siddiqui, D. E. Smailus, J. M. Stott, G. S. Yang, F. Plummer, A. Andonov, H. Artsob, N. Bastien, K. Bernard, T. F. Booth, D. Bowness, M. Czub, M. Drebot, L. Fernando, R. Flick, M. Garbutt, M. Gray, A. Grolla, S. Jones, H. Feldmann, A. Meyers, A. Kabani, Y. Li, S. Normand, U. Stroher, G. A. Tipples, S. Tyler, R. Vogrig, D. Ward, B. Watson, R. C. Brunham, M. Krajden, M. Petric, D. M. Skowronski, C. Upton, and R. L. Roper. The genome sequence of the SARS-associated coronavirus. Science, 300(5624):1399–1404, 2003.
  52.
    B. Mau and M. A. Newton. Phylogenetic inference for binary data on dendograms using Markov chain Monte Carlo. Journal of Computational and Graphical Statistics, 6:122–131, 1997.
  53.
    G. L. Nemhauser and L. A. Wolsey. Integer and combinatorial optimization. Wiley-Interscience, NY, 1999.
  54.
    G. L. Nemhauser, A. H. G. Rinnooy Kan, and M. J. Todd, editors. Optimization, volume 1 of Handbooks in operations research and management science. North-Holland, Amsterdam, 1989.
  55.
    C. Y. Ou, C. A. Ciesielski, G. Myers, C. I. Bandea, C. C. Luo, B. T. M. Korber, J. I. Mullins, G. Schochetman, R. L. Berkelman, A. N. Economou, J. J. Witte, L. J. Furman, G. A. Satten, K. A. MacInnes, J. W. Curran, and H. W. Jaffe. Molecular epidemiology of HIV transmission in a dental practice. Science, 256(5060):1165–1171, 1992.
  56.
    L. Pachter and B. Sturmfels. The mathematics of phylogenomics. SIAM Review, 49(1):3–31, 2007.
  57.
    R. D. M. Page and E. C. Holmes. Molecular evolution: A phylogenetic approach. Blackwell Science, Oxford, 1998.
  58.
    J. M. Park and M. W. Deem. Phase diagrams of quasispecies theory with recombination and horizontal gene transfer. Physical Review Letters, 98:058101–058104, 2007.
  59.
    Y. Pauplin. Direct calculation of a tree length using a distance matrix. Journal of Molecular Evolution, 51:41–47, 2000.
  60.
    P. A. Pevzner. Computational molecular biology. MIT, MA, 2000.
  61.
    D. D. Pollock, W. R. Taylor, and N. Goldman. Coevolving protein residues: Maximum likelihood identification and relationship to structure. Journal of Molecular Biology, 287(1):187–198, 1999.
  62.
    S. Roch. A short proof that phylogenetic tree reconstruction by maximum likelihood is hard. IEEE/ACM Transactions on Computational Biology and Bioinformatics, 3(1):92–94, 2006.
  63.
    F. Rodriguez, J. L. Oliver, A. Marin, and J. R. Medina. The general stochastic model of nucleotide substitution. Journal of Theoretical Biology, 142:485–501, 1990.
  64.
    J. S. Rogers and D. Swofford. Multiple local maxima for likelihoods of phylogenetic trees from nucleotide sequences. Molecular Biology and Evolution, 16:1079–1085, 1999.
  65.
    F. Ronquist and J. P. Huelsenbeck. MrBayes 3: Bayesian phylogenetic inference under mixed models. Bioinformatics, 19(12):1572–1574, 2003.
  66.
    H. A. Ross and A. G. Rodrigo. Immune-mediated positive selection drives human immunodeficiency virus type 1 molecular variation and predicts disease duration. Journal of Virology, 76(22):11715–11720, 2002.
  67.
    C. Rydin and M. Källersjö. Taxon sampling and seed plant phylogeny. Cladistics, 18:485–513, 2002.
  68.
    A. Rzhetsky and M. Nei. Theoretical foundations of the minimum evolution method of phylogenetic inference. Molecular Biology and Evolution, 10:1073–1095, 1993.
  69.
    A. Rzhetsky and M. Nei. Statistical properties of the ordinary least-squares, generalized least-squares, and minimum evolution methods of phylogenetic inference. Journal of Molecular Evolution, 35:367–375, 1992.
  70.
    N. Saitou and M. Nei. The neighbor-joining method: A new method for reconstructing phylogenetic trees. Molecular Biology and Evolution, 4:406–425, 1987.
  71.
    E. Schadt and K. Lange. Codon and rate variation models in molecular phylogeny. Molecular Biology and Evolution, 19(9):1534–1549, 2002.
  72.
    E. Schadt and K. Lange. Applications of codon and rate variation models in molecular phylogeny. Molecular Biology and Evolution, 19(9):1550–1562, 2002.
  73.
    C. Semple and M. A. Steel. Phylogenetics. Oxford University Press, NY, 2003.
  74.
    P. H. A. Sneath and R. R. Sokal. Numerical taxonomy. W. H. Freeman and Company, CA, 1963.
  75.
    J. A. Studier and K. J. Keppler. A note on the neighbor-joining algorithm of Saitou and Nei. Molecular Biology and Evolution, 5:729–731, 1988.
  76.
    D. L. Swofford. PAUP* version 4.0. Sinauer Associates, MA, 1997.
  77.
    D. L. Swofford, G. J. Olsen, P. J. Waddell, and D. M. Hillis. Phylogenetic inference. In D. M. Hillis, C. Moritz, and B. K. Mable, editors, Molecular systematics, pages 407–514. Sinauer Associates, MA, 1996.
  78.
    K. Tamura and M. Nei. Estimation of the number of nucleotide substitutions in the control region of mitochondrial DNA in humans and chimpanzees. Molecular Biology and Evolution, 10(3):512–526, 1993.
  79.
    P. J. Waddell and M. A. Steel. General time-reversible distances with unequal rates across sites: Mixing gamma and inverse Gaussian distributions with invariant sites. Molecular Phylogenetics and Evolution, 8:398–414, 1997.
  80.
    M. S. Waterman, T. F. Smith, M. Singh, and W. A. Beyer. Additive evolutionary trees. Journal of Theoretical Biology, 64:199–213, 1977.
  81.
    Z. Yang. Maximum likelihood phylogenetic estimation from DNA sequences with variable rates over sites: Approximate methods. Journal of Molecular Evolution, 39:306–314, 1994.
  82.
    Z. Yang. Bayesian inference in molecular phylogenetics. In O. Gascuel, editor, Mathematics of evolution and phylogeny. Oxford University Press, NY, 2005.
  83.
    Z. Yang and B. Rannala. Bayesian phylogenetic inference using DNA sequences: A Markov chain Monte Carlo method. Molecular Biology and Evolution, 14:717–724, 1997.
  84.
    L. A. Zadeh and C. A. Desoer. Linear system theory. McGraw-Hill, NY, 1963.

Copyright information

© Springer New York 2011

Authors and Affiliations

  1. Service Graphes and Mathematical Optimization, Computer Science Department, Université Libre de Bruxelles (U.L.B.), Brussels, Belgium
