International Journal of Primatology, Volume 35, Issue 1, pp 32–54

The Use (and Misuse) of Phylogenetic Trees in Comparative Behavioral Analyses


    • Department of Anthropology, New York University
    • New York Consortium in Evolutionary Primatology
    • Behavioral Ecology and Sociobiology Unit, German Primate Center
  • Christina M. Bergey
    • Department of Anthropology, New York University
    • New York Consortium in Evolutionary Primatology
  • Andrew S. Burrell
    • Department of Anthropology, New York University

DOI: 10.1007/s10764-013-9701-0


Phylogenetic comparative methods play a critical role in our understanding of the adaptive origin of primate behaviors. To incorporate evolutionary history directly into comparative behavioral research, behavioral ecologists rely on strong, well-resolved phylogenetic trees. Phylogenies provide the framework on which behaviors can be compared and homologies can be distinguished from similarities due to convergent or parallel evolution. Phylogenetic reconstructions are also of critical importance when inferring the ancestral state of behavioral patterns and when suggesting the evolutionary changes that behavior has undergone. Improvements in genome sequencing technologies have increased the amount of data available to researchers. Recently, several primate phylogenetic studies have used multiple loci to produce robust phylogenetic trees that include hundreds of primate species. These trees are now commonly used in comparative analyses and there is a perception that we have a complete picture of the primate tree. But how confident can we be in those phylogenies? And how reliable are comparative analyses based on such trees? Herein, we argue that even recent molecular phylogenies should be treated cautiously because they rely on many assumptions and have many shortcomings. Most phylogenetic studies do not model gene tree diversity and can produce misleading results, such as strong support for an incorrect species tree, especially in the case of rapid and recent radiations. We discuss implications that incorrect phylogenies can have for reconstructing the evolution of primate behaviors and we urge primatologists to be aware of the current limitations of phylogenetic reconstructions when applying phylogenetic comparative methods.


Coalescence · Gene tree-species tree · Molecular phylogenetics · Supermatrix · Supertree


The comparative approach plays a critical role in evolutionary biology. Comparisons at different levels of analysis (individuals, populations, or species) are essential to develop a comprehensive picture of biological processes and to understand better how natural selection has shaped morphological or behavioral traits (Harvey and Pagel 1991; Harvey et al. 1996; Nunn 2011; Nunn and Barton 2001; Pagel 1999). The comparative approach is therefore the basis on which to gather information on the evolution of any phenotypic traits, from the reconstruction of ancestral morphotypes (O’Leary et al. 2013) to the discrimination between homologous and convergent behaviors (MacLean et al. 2012).

Comparative analyses are particularly critical in the study of the evolution of behaviors. Behaviors tend to be more flexible and labile than other traits, e.g., morphological characters, and strong phylogenetic reconstructions are critical to obtain the framework necessary to study their evolution. Moreover, because behaviors rarely leave traces in the fossil record, the comparative approach is often the only tool available to reconstruct ancestral traits in extinct species. The critical role of the comparative approach in behavioral ecology was clearly stated by Niko Tinbergen when he suggested four major questions to study animal behavior (Tinbergen 1959, 1963). Of these, two ultimate questions —function and phylogeny— are difficult, if not impossible, to address successfully without a phylogenetic comparative perspective (Sherman 1988; Tinbergen 1963). Behavioral ecologists need to rely on strong, well-resolved phylogenetic trees to incorporate evolutionary history directly into comparative behavioral research (Arnold et al. 2010; Nunn 2011; Owens 2006; Price et al. 2011).

Despite the historical importance of a phylogenetic perspective and the recognition of the central role that comparative methods play in studying animal behavior, some recent studies have suggested that the use of phylogenies in animal behavioral studies has declined over the years (Owens 2006; Price et al. 2011). These results are even more striking when compared to the overall increase in our knowledge of the tree of life (Delsuc et al. 2005; Thomson and Shaffer 2010): phylogenetic reconstructions have become increasingly more complete and reliable, providing researchers with unprecedented historical perspectives of animal behavior (Nunn 2011; Price et al. 2011).

The field of evolutionary anthropology possibly represents a notable exception to this trend (Mulder 2001; Nunn 2011; Nunn and Barton 2001). Anthropologists frame most of their studies with a comparative approach, from reconstructing the evolution of early hominins in the fossil record (Harrison 2010) to the study of cognitive behavior in great apes (MacLean et al. 2012). The widespread use of comparative methods by evolutionary anthropologists might often be driven by the need to contextualize primate evolutionary history in an anthropocentric perspective. Answering questions such as “what does cognitive behavior in great apes tell us about human intelligence?” or “which primate social system is a good proxy for early hominin societies?” requires a comparison between human behavior and our closest relatives (Jolly 2009). The comparative method therefore has a long history in primatology, and anthropology in general (Jolly 2009; Nunn 2011), and it is not surprising that considerable effort has been made to obtain a well resolved phylogenetic tree for the order Primates (Chatterjee et al. 2009; Fabre et al. 2009; Perelman et al. 2011; Springer et al. 2012).

One of the first phylogenies to be extensively used for comparative studies of primates was published by Purvis (1995). The author compiled a large data set from 112 publications to include 203 primate species in one comprehensive phylogeny. The resulting tree was estimated to be 79% resolved (160 fully bifurcating nodes out of 202). Since Purvis’s seminal study, our knowledge of the primate tree has been dramatically improved. The advent of new molecular techniques has allowed researchers to increase the amount of data and number of species included in phylogenetic studies. Recently, several studies have investigated primate phylogeny by combining multiple loci into a single matrix (Chatterjee et al. 2009; Fabre et al. 2009; Jameson et al. 2011; Perelman et al. 2011; Springer et al. 2012). For instance, Perelman et al. (2011) compiled one of the largest and most complete primate datasets to date containing 191 species. More recently, Springer and colleagues (2012) expanded Perelman’s data set and presented a phylogeny including 367 primate species.

The availability of large and robust phylogenetic trees that include most of the known primate species might provide the false impression that we have a complete picture of “The Tree” of the living primates. However, some methods used to analyze large multilocus data sets, such as the supertree and supermatrix methods, have limitations that are often ignored by evolutionary primatologists (Degnan and Rosenberg 2006, 2009; Edwards 2009; Ting and Sterner 2013). These methods do not account for the independent history of each individual gene and may provide misleading results. Individual gene trees can differ from the actual species tree because of the coalescent process. Supertree and supermatrix methods often assume that all genes (or loci) have the same topology; however, coalescent events do not necessarily coincide with speciation events, and in specific evolutionary scenarios, e.g., rapid radiations or high levels of hybridization, gene tree heterogeneity might be particularly high (Degnan and Rosenberg 2006, 2009; Edwards 2009) (Fig. 1). Recent advances in coalescence methods allow researchers to co-estimate discordant gene trees embedded in a shared species tree. This method, also known as the gene tree-species tree approach, has been advocated to infer phylogenetic trees more accurately when gene tree discordance is high (Edwards 2009; Leaché and Rannala 2011; Ting and Sterner 2013).
Fig. 1

Gene tree-species tree diagram. Two possible sources of discordance between a gene tree (1,(2,3)) and the species tree ((1,2),3). (a) Incomplete lineage sorting: two or more lineages in a population fail to coalesce, so that one of them first coalesces with a lineage from a less closely related population. (b) Hybridization or horizontal gene transfer: a lineage jumps from the population ancestral to species 3 to the population ancestral to species 2.

Here we contend that, despite the amount of data available today, our knowledge of primate phylogeny is still incomplete and this perception of confidence is somewhat unwarranted. Our goal here is to 1) review how phylogenetic trees can be used in primate behavioral studies; 2) explore the challenges in reconstructing phylogenetic trees using multiple loci; 3) provide two examples (Papionini and Macaca) that illustrate the current status of phylogenetic reconstructions within primates; and 4) discuss advantages and disadvantages of the gene tree-species tree approach when reconstructing primate phylogenies.

Phylogenetic Comparative Methods

In this section we review some of the most common applications of comparative methods in the study of primate behavior. This list is not exhaustive and our goal is only to illustrate the importance of complete and well-resolved phylogenies in conducting comparative analyses. We present four main categories of phylogenetic comparative methods to study primate behavior: 1) phylogenetic signal, 2) ancestral state reconstruction, 3) correlated evolution, and 4) phylogenetic targeting. A fully detailed review of these techniques can be found in Nunn (2011) and MacLean et al. (2012).

Phylogenetic signal is the term used to describe morphological or behavioral similarities due to a common, shared evolutionary history. Other terms such as phylogenetic constraint or inertia are also used to identify the same concept (Blomberg et al. 2003). Closely related species often resemble each other in their morphology and behavior more than distantly related species because they inherited the same or similar traits from their common ancestor (Blomberg et al. 2003; Harvey and Pagel 1991; MacLean et al. 2012; Nunn 2011). Phylogenetic trees are thus essential to identify the level of phylogenetic signal within a particular trait and to identify processes such as convergent evolution (two distantly related organisms that resemble each other because of common adaptation) or character displacement (closely related organisms that resemble each other less than expected). Identifying the level of phylogenetic signal for different phenotypic traits allows researchers to estimate how strongly phylogeny can predict behavioral variation across species. For instance, it has been shown that morphological or physiological traits have higher average phylogenetic signal than behavioral traits: behavioral traits are often subjected to environmental acclimation and tend to be more labile than other types of traits (Blomberg et al. 2003). If two sister taxa are incorrectly separated in a phylogenetic tree, comparative studies might misinterpret their similar traits as evidence of adaptation to similar conditions.

Another important use of phylogenetic trees in comparative methods is the reconstruction of ancestral states. For instance, O’Leary et al. (2013) used 4541 characters from both extinct and living species to reconstruct the hypothetical ancestral mammal, including both morphological, e.g., body mass, and behavioral, e.g., feeding and mating behavior, traits. Steiper and Seiffert (2012) used ancestral reconstruction for morphological traits, such as body mass and endocranial capacity, to adjust estimates of the rate of evolution in early primates. The reconstruction of ancestral states is also quite common in paleontology. The fossil record is largely incomplete and many morphological features can be inferred using phylogenetic reconstruction (Benefit and McCrossin 1991; Montgomery et al. 2010). Because behavioral traits do not fossilize, reconstruction of ancestral traits using phylogenetic information is the only tool available to researchers to describe behavioral traits of extinct species. In behavioral studies of living taxa, the reconstruction of ancestral states can be used to understand when in time and where in phylogeny a particular behavior has arisen (MacLean et al. 2012). For instance, phylogenetic comparative methods have been used to reconstruct the evolution of sexual swellings in macaques (Nunn 2011) and to investigate the evolutionary history of primate mating systems (Opie et al. 2012).

A third important evolutionary line of inquiry in the study of primate behavior is the identification of independent variables, such as social or ecological factors, that explain variation in specific behavioral traits. To compare different variables across taxa successfully it is necessary to control for common ancestry. Similar traits in closely related species might be a consequence of common ancestry and not necessarily adaptations to similar selective pressures. Data points used in behavioral analyses thus may not be statistically independent, violating one of the major assumptions in correlation and regression analyses. This is particularly important for traits with a high phylogenetic signal. One of the most widely used approaches to control for evolutionary relatedness in comparative biology is the method of phylogenetic independent contrasts (Felsenstein 1985; Garland et al. 1992; Nunn 2011; Nunn and Barton 2001). By examining the amount of evolutionary divergence since two lineages last shared a common ancestor (usually represented by differences in trait values between closely related lineages), this method addresses the statistical nonindependence of comparative data. Incorporating phylogenetic information in comparative analyses increases the statistical power to detect relationships between variables. It also reduces the risk of erroneously inferring a significant relationship where none exists (Di Fiore and Rendall 1994; MacLean et al. 2012; Nunn 2011; Rendall and Di Fiore 1995). Besides phylogenetic independent contrasts, other methods have been suggested to address correlated evolution across both continuous and discrete phenotypic traits, including phylogenetic generalized least squares (PGLS: Grafen 1989; Pagel 1999) and Pagel’s discrete test (Pagel 1994; Pagel and Meade 2006). For a detailed review of these methods see Nunn (2011).
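The independent contrasts calculation described above can be sketched in a few lines of Python. This is a minimal illustration of the recursive procedure of Felsenstein (1985), not a substitute for established software. The four-species tree, the branch lengths, the trait values, and the function names `pic` and `slope_through_origin` are all hypothetical and chosen only to make the arithmetic easy to follow. The sketch assumes a fully bifurcating tree with strictly positive branch lengths.

```python
from math import sqrt

def pic(node, trait):
    """Return (ancestral estimate, extra branch length, contrasts) for a subtree.

    A tip is a species name (str); an internal node is a pair of
    (child, branch_length) tuples. Branch lengths must be positive.
    """
    if isinstance(node, str):
        return trait[node], 0.0, []
    (left, len_l), (right, len_r) = node
    x_l, extra_l, c_l = pic(left, trait)
    x_r, extra_r, c_r = pic(right, trait)
    # Branches leading to estimated ancestors are lengthened to reflect
    # the uncertainty of those estimates.
    v_l, v_r = len_l + extra_l, len_r + extra_r
    # Standardized contrast: trait difference scaled by expected divergence.
    contrast = (x_l - x_r) / sqrt(v_l + v_r)
    # Ancestral value: average weighted by the inverse of branch length.
    x_anc = (x_l / v_l + x_r / v_r) / (1 / v_l + 1 / v_r)
    extra = v_l * v_r / (v_l + v_r)
    return x_anc, extra, c_l + c_r + [contrast]

def slope_through_origin(cx, cy):
    """Least-squares slope of contrasts in y on contrasts in x, forced through 0."""
    return sum(a * b for a, b in zip(cx, cy)) / sum(a * a for a in cx)

# Hypothetical toy tree ((A:1,B:1):1,(C:1,D:1):1) and made-up trait values.
ab = (("A", 1.0), ("B", 1.0))
cd = (("C", 1.0), ("D", 1.0))
tree = ((ab, 1.0), (cd, 1.0))
trait_x = {"A": 1.0, "B": 3.0, "C": 5.0, "D": 9.0}
trait_y = {name: 2.0 * value for name, value in trait_x.items()}

_, _, contrasts_x = pic(tree, trait_x)
_, _, contrasts_y = pic(tree, trait_y)
slope = slope_through_origin(contrasts_x, contrasts_y)
```

Four tips yield three contrasts, one per internal node, and these contrasts (rather than the raw species values) are the data points entered into a regression through the origin. Because the contrasts depend on both the topology and the branch lengths, an error in either propagates directly into the estimated relationship between traits.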

Finally, a recent application of phylogenetic data to the study of behavior has been developed by Arnold and Nunn (2010). This method, called phylogenetic targeting, allows researchers to choose (or target) the species that can provide the best data to test a particular hypothesis (Arnold and Nunn 2010). By taking into account phylogeny and potential confounding variables, it is possible to select candidate species for future data collection. Phylogenetic targeting can thus guide the selection of species, increase statistical power, and improve the design of comparative studies relative to a priori hypotheses (Arnold and Nunn 2010). The authors also provide a Web-based computer program to implement this approach (PhyloTargeting:

Gene Discordance and Behavior Reconstruction: The Case of Afro-papionins

The Afro-papionins (subtribe Papionina) provide an example of how tree topologies can affect reconstructions of ancestral behaviors. This group is useful because our understanding of their systematics has changed over time and because they display a wide variety of social organizations, ranging from the multilevel societies of Theropithecus and some Papio species, to Mandrillus “hordes,” to the macaque-like multimale, multifemale structure of Lophocebus. The topology of the Afro-papionin tree determines how we infer ancestral traits, and that in turn allows insights into how complex forms of social organization such as multilevel societies evolved (Henzi and Barrett 2003, 2005; Jolly 2007).

A long-standing controversy regarding the systematics of the Afro-papionins is whether mangabeys (Cercocebus and Lophocebus) and baboons (Mandrillus, Papio, and Theropithecus) constitute reciprocally monophyletic lineages (Fig. 2a). This is now considered to be resolved, with considerable evidence for mangabey polyphyly. Papionina is composed of two major clades: Mandrillus+Cercocebus and Lophocebus+Papio+Rungwecebus+Theropithecus (Cronin and Sarich 1976; Fleagle and McGraw 1999; Gilbert 2007; Harris and Disotell 1998) (Fig. 2b). The revised topology allows interesting new insights into the evolution of Afro-papionin behavior. For example, the fact that Cercocebus and Mandrillus are sister taxa makes certain aspects of the evolution of their behavior more clear. Jolly (2007) noted similarities between the genera: unlike any other Afro-papionin, both occupy an ecological niche as forest-floor gleaners and both breed seasonally. Jolly (2007) proposed that these traits were likely present in the common ancestor of the two. Further, Jolly (2007) proposed that seasonal breeding strongly affects the social organization of both genera by allowing males to be absent from social groups during the majority of the year, rejoining only during the mating season. Jolly suggested that the pattern seen in Mandrillus was a more extreme, derived form of that seen in Cercocebus or in their common ancestor.
Fig. 2

Alternative papionin phylogenies. (a) Papionin phylogeny based on morphological data with mangabeys and baboons reciprocally monophyletic (Szalay and Delson 1979). (b) Revised phylogeny based on genetic and morphological data showing mangabeys to be polyphyletic (Cronin and Sarich 1976; Fleagle and McGraw 1999; Gilbert 2007; Harris and Disotell 1998).

Tree topology also affects interpretations about the evolution of multilevel societies in the Afro-papionins. Multilevel societies are uncommon among primates, occurring only occasionally among colobines, Afro-papionins, hominins (Grueter et al. 2012), and possibly Cacajao (Bowler et al. 2012). Because this relatively rare form of social organization is present in three closely related Afro-papionin species, Theropithecus gelada (Dunbar 1984; Kawai et al. 1983), Papio hamadryas (Kummer 1968), and Papio papio (Galat-Luong et al. 2006; Patzelt et al. 2011), these taxa provide a good opportunity to understand how, why, and from what multilevel societies evolve. Because the behavioral “states” of extant taxa, when placed on a tree, can be used to infer the states of ancestral taxa, the topology of a tree can strongly affect the inference of ancestral states. The topology also allows inferences about whether multilevel societies evolved multiple times independently, or whether some or all of these species inherited multilevel organization from a common ancestor.

The relationships among Lophocebus, Papio, Rungwecebus, and Theropithecus, and the relationships within Papio have not yet been confidently resolved (Fig. 3) (Harris and Disotell 1998; Keller et al. 2010; Olson et al. 2008; Perelman et al. 2011; Pozzi et al., unpublished data; Purvis 1995; Zinner et al. 2013). However, there are only a few alternate topologies that have any statistical support, and these have implications for our understanding of the evolution of multilevel societies. One possible topology, illustrated in Fig. 3a, places Theropithecus as sister to Papio (Disotell 1994), and Papio papio as sister to Papio hamadryas within Papio (Jolly 1993). In this instance, the parsimonious reconstruction of ancestral behavioral states would hold that the common ancestor of Papio and Theropithecus had a multilevel society, and that less overtly tiered societies evolved within Papio from an ancestor with a multilevel society. However, this topology is not strongly supported. Rather, topologies such as those in Fig. 3b are more likely (Keller et al. 2010; Olson et al. 2008; Perelman et al. 2011). These imply ancestral forms had single-level societies, and that multilevel societies evolved independently at least twice. This is supported by the fact that there are differences in many features of Theropithecus gelada, Papio hamadryas, and Papio papio societies, possibly indicative of independent origins (Jolly 2007). Perhaps something about the ecology or social organization of Afro-papionins “primes” their societies to evolve more complex, multilevel forms of organization.
Fig. 3

Reconstruction of ancestral states is affected by tree topology. (a) If Theropithecus is sister to Papio, parsimony suggests that their common ancestor had a multilevel society. The transition from single level to multilevel presumably occurred once. (b) If Lophocebus is sister to Papio, parsimony suggests that their common ancestor had a single level society, and that multilevel societies evolved at least twice independently. SL = inferred single level society; ML = inferred multilevel society; ∆ML/SL = change to given state. Extant taxa with multilevel societies are underlined.
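The dependence of a parsimony reconstruction on topology can be illustrated with a short Python sketch of Fitch parsimony for a binary character (SL = single level, ML = multilevel). The two topologies below are simplified stand-ins written for this example and are not the exact trees of Fig. 3: `Papio_sl_1` and `Papio_sl_2` are placeholders for Papio species with single-level societies, `outgroup` is an assumed single-level outgroup, and the arrangement of species within Papio is hypothetical. The function names are likewise illustrative.

```python
def fitch(tree, states):
    """Down-pass: return (candidate state set, minimum number of changes).

    A tree is a tip name (str) or a pair (left, right) of subtrees.
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left, right = tree
    set_l, cost_l = fitch(left, states)
    set_r, cost_r = fitch(right, states)
    shared = set_l & set_r
    if shared:
        return shared, cost_l + cost_r
    # No shared state: take the union and count one change.
    return set_l | set_r, cost_l + cost_r + 1

def tips(tree):
    """Set of tip names below a node, used as a key for that node."""
    if isinstance(tree, str):
        return frozenset([tree])
    return tips(tree[0]) | tips(tree[1])

def reconstruct(tree, states, parent_state=None):
    """Up-pass: one most-parsimonious assignment of states to internal nodes.

    Each node keeps its parent's state when that state is among its
    candidates; remaining ties are broken arbitrarily (alphabetically).
    """
    if isinstance(tree, str):
        return {}
    candidates, _ = fitch(tree, states)
    state = parent_state if parent_state in candidates else min(candidates)
    assigned = {tips(tree): state}
    for child in tree:
        assigned.update(reconstruct(child, states, state))
    return assigned

states = {
    "outgroup": "SL", "Lophocebus": "SL", "Theropithecus": "ML",
    "P_hamadryas": "ML", "P_papio": "ML",
    "Papio_sl_1": "SL", "Papio_sl_2": "SL",
}

# (a) Theropithecus sister to Papio; P. hamadryas sister to P. papio.
topology_a = ("outgroup", ("Lophocebus", ("Theropithecus",
              (("P_hamadryas", "P_papio"), ("Papio_sl_1", "Papio_sl_2")))))

# (b) Lophocebus sister to Papio; multilevel Papio species not sisters.
topology_b = ("outgroup", ("Theropithecus", ("Lophocebus",
              (("P_hamadryas", "Papio_sl_1"), ("P_papio", "Papio_sl_2")))))

papio = {"P_hamadryas", "P_papio", "Papio_sl_1", "Papio_sl_2"}
changes_a = fitch(topology_a, states)[1]
changes_b = fitch(topology_b, states)[1]
ancestor_a = reconstruct(topology_a, states)[frozenset(papio | {"Theropithecus"})]
ancestor_b = reconstruct(topology_b, states)[frozenset(papio | {"Lophocebus"})]
```

Under these assumptions, the same seven tip states give a multilevel common ancestor of Papio and Theropithecus on the first topology and a single-level common ancestor of Papio and Lophocebus on the second, with multilevel societies arising separately in each multilevel lineage. The behavioral data are identical in both cases; only the tree has changed.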

Reconstructing Phylogenetic Trees Using Molecular Data

Comparative methods use phylogenetic trees as their starting points. The main assumption is that phylogenetic relationships among species are well known and can be used to infer ancestral character states, convergent evolution, or character displacement. Different topologies (and in some cases different branch lengths) or the presence of missing taxa might affect the outcome of comparative methods and possibly provide misleading results. Obtaining well-supported and well-sampled phylogenies is therefore the first essential step in any comparative analysis but unfortunately it is often unclear to behavioral ecologists which phylogenetic studies are most reliable.

Traditionally, molecular primatologists have used single-locus data to build phylogenetic trees. In particular, mitochondrial sequences (mtDNA) have been extensively used to reconstruct primate phylogenetic relationships at the genus (Chiou et al. 2011; Meyer et al. 2011; Zinner et al. 2013), family (Chan et al. 2010; Raaum et al. 2005; Sterner et al. 2006), and even higher taxonomic levels (Arnason et al. 2008; Finstermeier et al. 2013; Hodgson et al. 2009; Matsui et al. 2009). Several features make mtDNA especially suitable for reconstructing evolutionary relationships among species, including the lack of recombination, relatively high substitution rates, a large number of copies in a cell (making it easier to obtain in low-quality samples), and haploidy (Moore 1995). In addition, mtDNA shows heterogeneity in base composition and evolutionary rates across the molecule, and different regions of the genome have been used for different levels of analysis, such as the D-loop for population genetic studies (Blair and Melnick 2012; Chaves et al. 2011; Wimmer et al. 2002) or cytochrome b for phylogenetic reconstructions (Meyer et al. 2011; Roos et al. 2004; Zinner et al. 2009b).

Despite the many advantages of mitochondrial markers, single-locus phylogenies can provide misleading information about evolutionary relationships and should always be treated with caution (Edwards 2009). Within primates, some groups that are well supported by nuclear and morphological data are not monophyletic when using mtDNA. For instance, African papionins and lorisoids represent two primate groups in which the mitochondrial tree probably does not reflect the evolutionary history of the species (Finstermeier et al. 2013; Pozzi et al., unpublished data). Such gene trees represent coalescent events for a single locus and do not necessarily coincide with speciation events (Edwards 2009; Maddison 1997; Maddison and Knowles 2006).

Several phenomena might be responsible for the discordance between individual gene trees, both nuclear and mitochondrial, and the species phylogeny, including incomplete lineage sorting, hybridization (or horizontal gene transfer), recombination, gene duplication, and gene loss (Degnan and Rosenberg 2006, 2009; Knowles 2009; Maddison 1997; Maddison and Knowles 2006). For instance, in the case of hybridization, some parts of the genome (either mitochondrial and/or nuclear) are shared between species that are not necessarily closely related; the regions subjected to introgression are thus not representative of the actual species phylogeny (Zinner et al. 2011). A recent example within primates is represented by Rungwecebus kipunji. This species was first discovered in 2005 (Jones et al. 2005) and was initially described as a new member of the genus Lophocebus, but was raised to a new monotypic genus the following year (Davenport et al. 2006). Recently, two independent studies showed signs of hybridization in Rungwecebus: mitochondrial haplotypes from individuals living in the Southern Highlands of Tanzania were nested within geographically adjacent yellow baboon (Papio cynocephalus) populations (Burrell et al. 2009; Zinner et al. 2009a). However, a recent study showed that individuals from a different geographic area (Ndundulu forest in the Udzungwa Mountains) retain the “true” Rungwecebus mitochondrial genome (Roberts et al. 2010). This interesting example shows how the history of a single locus (in this case mtDNA) can differ from that of the species, and how studies can be affected by geographical sampling in cases of localized introgression, thus resulting in misleading conclusions about the history of species relationships.

The recent publication of several complete primate genomes also shows that large parts of the nuclear genome might have an evolutionary history that is not congruent with that of the species. For instance, whole genome analyses revealed signs of introgression from Neanderthals and Denisovans (archaic hominins from southern Siberia) into non-African human populations affecting 1–6% of the entire genome (Green et al. 2010; Reich et al. 2010). More recently, Scally and colleagues (2012) found that 30% of the gorilla genome exhibits incomplete lineage sorting and supports a sister relationship between gorilla and either human or chimpanzee instead of a sister relationship with the human/chimpanzee clade.

For the aforementioned reasons, the use of multiple loci has been deemed critical for proper reconstruction of the phylogenetic relationships among species (Edwards 2009; Maddison and Knowles 2006; Rokas et al. 2003). This is especially important for recent and fast radiations, for which incongruences due to hybridization and incomplete lineage sorting are highly likely (Degnan and Rosenberg 2006, 2009). Two main approaches have been used to analyze multiple loci. Rokas et al. (2003) showed how the concatenation of several loci into a single matrix could provide strong support for a single topology. In their study they analyzed 106 orthologous genes in yeast, and showed that concatenated genes yielded a fully resolved species tree, whereas single genes or small sets of concatenated genes are likely to yield conflicting topologies. This so-called supermatrix approach has been extensively used in primatology, and several researchers have recently employed this technique to build very highly supported phylogenetic trees (Chatterjee et al. 2009; Fabre et al. 2009; Jameson et al. 2011; Perelman et al. 2011; Springer et al. 2012). For instance, Perelman et al. (2011) used 54 concatenated nuclear loci across 191 primate species and found ~96% of the nodes to be highly supported (Bayesian posterior probability > 0.95). More recently, Springer et al. (2012) generated a primate phylogeny based on a supermatrix including 69 nuclear genes and 10 mitochondrial sequences. Although this supermatrix contained a large amount of missing data (~69%), the mean bootstrap support value below the genus level (species and subspecies) was quite high (82.1%).
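To make the supermatrix idea and its missing-data problem concrete, the following Python sketch concatenates per-locus alignments into a single matrix and pads every species that lacks a locus with `?`. The species labels, locus names, and sequences are invented toy data, and `concatenate` and `missing_fraction` are hypothetical helper names; real pipelines would also handle alignment gaps and partition boundaries, which are ignored here.

```python
def concatenate(loci):
    """Join per-locus alignments into one supermatrix.

    `loci` maps a locus name to {species: aligned sequence}. A species
    absent from a locus is padded with '?' for the full locus length.
    """
    taxa = sorted({species for alignment in loci.values() for species in alignment})
    matrix = {species: "" for species in taxa}
    for alignment in loci.values():
        # All sequences within one locus are assumed to be the same length.
        length = len(next(iter(alignment.values())))
        for species in taxa:
            matrix[species] += alignment.get(species, "?" * length)
    return matrix

def missing_fraction(matrix):
    """Proportion of cells in the supermatrix that are '?'."""
    cells = "".join(matrix.values())
    return cells.count("?") / len(cells)

# Invented toy data: three loci sampled unevenly across three species.
loci = {
    "locus1": {"sp1": "ACGT", "sp2": "ACGA", "sp3": "ACTT"},
    "locus2": {"sp1": "GG", "sp3": "GA"},
    "locus3": {"sp2": "TTT"},
}
supermatrix = concatenate(loci)
fraction = missing_fraction(supermatrix)
```

Even in this tiny example, uneven sampling leaves 8 of 27 cells empty. The concatenated matrix is then analyzed as if it were a single locus, which is the step that discards the independent history of each gene.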

Another method for building large phylogenies using multiple loci is the supertree approach (Bininda-Emonds 2004; Bininda-Emonds et al. 2002; Sanderson et al. 1998). In this case, individual trees are obtained separately for each locus, and then a “consensus tree” between all the individual trees is built (Bininda-Emonds 2004; Bininda-Emonds et al. 2007). This technique allows researchers to combine source trees that have only a few taxa in common. Source trees are usually codified into a matrix, e.g., matrix representation with parsimony; this matrix is then optimized using various methods to produce a final supertree (Bininda-Emonds 2004; Sanderson et al. 1998). For instance, Purvis (1995) combined various individual trees from >100 publications to obtain a large phylogeny for primates. More recently, another study conducted by Bininda-Emonds et al. (2007) combined 298 trees from the literature to include a total of 233 primate species.
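The coding step of matrix representation with parsimony can be sketched as follows. Each clade in each source tree becomes one binary column: species inside the clade are scored 1, other species present in that source tree are scored 0, and species absent from that source tree are scored `?`. The two source trees and the taxon labels A to E are hypothetical, as are the function names. The sketch stops at the matrix and does not perform the subsequent parsimony search that produces the supertree.

```python
def clades(tree):
    """Return (all tips below this node, list of clade tip sets in the subtree).

    A tree is a tip name (str) or a tuple of subtrees.
    """
    if isinstance(tree, str):
        return frozenset([tree]), []
    child_tips, found = [], []
    for child in tree:
        below, nested = clades(child)
        child_tips.append(below)
        found.extend(nested)
    here = frozenset().union(*child_tips)
    return here, found + [here]

def mrp_matrix(source_trees):
    """Code a list of source trees as one row of 0/1/? characters per taxon."""
    all_taxa = sorted(set().union(*(clades(tree)[0] for tree in source_trees)))
    rows = {taxon: "" for taxon in all_taxa}
    for tree in source_trees:
        tree_taxa, found = clades(tree)
        for clade in found:
            if clade == tree_taxa:
                continue  # the root clade carries no grouping information
            for taxon in all_taxa:
                if taxon in clade:
                    rows[taxon] += "1"
                elif taxon in tree_taxa:
                    rows[taxon] += "0"
                else:
                    rows[taxon] += "?"
    return rows

# Hypothetical source trees that share only taxa B and C.
tree_1 = (("A", "B"), ("C", "D"))
tree_2 = (("B", "C"), "E")
matrix = mrp_matrix([tree_1, tree_2])
```

The resulting matrix records only the groupings found in the source trees, not the sequence data behind them, which illustrates why the support for an individual node in a supertree can be difficult to interpret.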

The use of supermatrix or supertree phylogenies for comparative studies is quite appealing because it allows researchers to approach complete taxon sampling. This is a critical factor for comparative analyses because the presence of missing taxa in a phylogeny can affect the results for many behavioral analyses. The ability to reconstruct ancestral phenotypes (either morphological or behavioral), for instance, can be affected by taxon representation and phylogenetic uncertainties (Finarelli and Flynn 2006; Montgomery et al. 2010). Incomplete taxon sampling can provide misleading results in reconstructing phylogenetic relationships (Hillis et al. 2003; Nabhan and Sarkar 2012; Plazzi et al. 2010; Townsend and Leuenberger 2011; Zwickl and Hillis 2002), thereby introducing possible inaccuracies in reconstructions of ancestral phenotypic traits. However, supertree and supermatrix approaches also have several flaws. Even phylogenies based on supertrees that are almost complete in terms of taxon representation do not use the original sequence data, making the interpretation of single nodes challenging. Moreover, some areas of the trees might be based on single gene trees and therefore may not be representative of the actual species’ evolutionary history. Supermatrices are more difficult to assemble because they require the creation of a matrix with multiple loci for hundreds of species. As a consequence, supermatrix studies tend to either have a small number of loci (Chatterjee et al. 2009: four loci) or a large amount of missing data (69–75%: Fabre et al. 2009; Perelman et al. 2011; Springer et al. 2012).

All these phylogenetic reconstructions are based on the assumption that all genes have the same or similar evolutionary history and do not account for the individual gene histories. However, both simulation and empirical studies have demonstrated potentially high levels of tree discordance across different loci (Degnan and Rosenberg 2006, 2009; Edwards 2009). Both supermatrix and supertree methods perform quite poorly when gene tree heterogeneity is high. For instance, concatenated data sets can produce misleading phylogenetic reconstructions, with most of the nodes demonstrating high levels of statistical support despite not reflecting the actual evolutionary history of the species (Edwards 2009; Heled and Drummond 2010; Kubatko and Degnan 2007; Leaché and Rannala 2011; Song et al. 2012). Phylogenetic relationships can be particularly difficult to reconstruct when branch lengths are short (a result of short amounts of time between speciation events) and when effective ancestral population sizes are large: in these situations, gene tree discordance can be extremely high and most of the gene trees might differ from the species phylogeny (Degnan and Rosenberg 2006, 2009; Edwards 2009). For instance, in a recent study on eutherian mammals, Song et al. (2012) found that >98% of the gene trees in their study had a distinct topology (440 gene topologies out of 447 total genes).
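The effect of short internal branches and large ancestral populations can be quantified for the simplest case of three species. Under the standard coalescent result for three taxa, the probability that a gene tree differs from the species tree is (2/3)e^(−T), where T is the length of the internal branch in coalescent units (generations divided by 2Ne for a diploid population). The Python sketch below evaluates this expression. It assumes that incomplete lineage sorting is the only source of discordance, that the ancestral effective population size is constant, and that there is no gene flow; the function name and the numerical values are hypothetical.

```python
from math import exp

def p_discordant(generations, ne):
    """Probability that a three-taxon gene tree differs from the species tree.

    `generations` is the time between the two speciation events and `ne`
    is the diploid effective size of the ancestral population.
    """
    branch = generations / (2 * ne)  # internal branch in coalescent units
    return (2 / 3) * exp(-branch)

# Same interval between speciation events, different ancestral population sizes.
small_population = p_discordant(generations=20_000, ne=2_000)
large_population = p_discordant(generations=20_000, ne=10_000)
rapid_radiation = p_discordant(generations=0, ne=10_000)
```

As the interval between speciation events shrinks toward zero, the probability of discordance approaches 2/3, meaning that most loci are expected to support a topology other than the species tree. This is the situation in which concatenating loci is most likely to mislead.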

An alternative approach to obtain valuable phylogenetic information for comparative analyses has been suggested by Nunn et al. (Arnold et al. 2010; Nunn 2011). The 10kTrees Project provides a way for behavioral ecologists to run comparative analyses across a set of trees (up to 10,000 individual trees), so that the results no longer depend on a single tree being correct (Arnold et al. 2010; Nunn 2011). Phylogenetic trees obtained from the 10kTrees website have been used to study numerous aspects of primate behavior, including facial expression (Dobson 2012), lactation (Hinde and Milligan 2011), cognition (MacLean et al. 2012), and sexual behavior (Matthews 2012). Although the 10kTrees approach allows multiple tree topologies to be examined, this method does not account for gene tree discordance. The set of tree topologies (with branch lengths) is in fact sampled from a Bayesian posterior distribution of a single concatenated data set and can thus suffer from drawbacks similar to those of the supermatrix studies, including a low number of loci, missing data, and concatenation biases. The data set assembled in the latest version of the 10kTrees website (version 3) includes 301 primate species but the data are represented by only six independent loci (mitochondrial, Y-chromosome, and four nuclear loci). Analyses are then run using a supermatrix approach in which all loci are concatenated into a single matrix with a high level of missing data (63–69%). Finally, because most species are represented only (or predominantly) by mitochondrial markers (representing one single locus), vast areas of the trees obtained by the 10kTrees website are affected by the same problems as single-locus phylogenies.

The recent availability of large multilocus data sets and the future availability of a growing number of whole genomes open up promising new directions to address phylogenetic reconstructions more effectively (Ting and Sterner 2013). Recently, several coalescence methods that take gene tree heterogeneity into account have been developed to better estimate species phylogenies from multilocus data (Heled and Drummond 2010; Kubatko et al. 2009; Liu 2008). Coalescence methods integrate gene tree phylogenies with population genetic parameters, e.g., ancestral population size, to reconstruct species trees. Although most of these methods assume that incomplete lineage sorting is the only source of gene tree incongruence (Heled and Drummond 2010; Kubatko et al. 2009; Liu 2008), several new methods are now able to estimate hybridization, gene loss, and gene duplication (Meng and Kubatko 2009; Rasmussen and Kellis 2012; Yu et al. 2011). Simulation studies have shown that coalescence methods outperform concatenation methods in recovering species trees when most of the assumptions are met (Edwards 2009; Heled and Drummond 2010; Leaché and Rannala 2011). To date, only two studies have included coalescent methods to reconstruct phylogenetic relationships within primates. Weisrock et al. (2012) investigated the phylogeny of mouse lemurs using 12 independent loci, while Perez et al. (2012) reconstructed New World monkey phylogeny using 54 loci obtained from Perelman et al. (2011). However, the extent and significance of gene tree heterogeneity in reconstructing phylogenies are still largely unexplored within most primate clades. Our knowledge of the primate tree is therefore still incomplete and the confidence we have in current primate phylogenies may not be justified.

Case Studies: Concatenation versus Gene Tree-Species Tree Approach

To explore how confident we can be in recent phylogenetic reconstructions, we reanalyze some data from Perelman et al. (2011). Here we have compiled two different data sets within papionins to 1) explore the accuracy of current multilocus phylogenies in reconstructing species phylogenies at different taxonomic levels and 2) compare the results between concatenation and gene tree-species tree approaches.


Data Sets

The Perelman et al. (2011) data set comprises 54 loci for >190 primate species. However, the supermatrix used in their study includes a high level of missing data, and many loci are not represented for several species. To minimize the amount of missing data, we compiled two different data sets in which all taxa show at least a partial sequence for each selected locus. The first data set is composed of 10 species representing 6 genera within the tribe Papionini. We included 34 independent loci for a total of 25,975 bp (mean of 764 bp per locus) (Table I). To compare the power of these loci at a lower taxonomic scale (often the focus of behavioral comparisons), we also compiled a second data set of 14 species of Macaca with 32 independent loci (20,485 bp total, mean of 640 bp per locus) (Table I).
Table I

Loci used in the two data sets with length in base pairs (bp) for each gene. Data set no. 1 (Papionini): 34 loci, 25,975 bp. Data set no. 2 (Macaca): 32 loci, 20,485 bp.

Each data set was used for concatenated and gene tree-species tree analyses.

a: Two sequences joined together as part of the same locus (AFF2 + AFF2.2; LRPPRC_169 + LRPPRC_171; NPAS3 + NPAS3.2; RAG1 + RAG2).




Phylogenetic Analyses

For both data sets, individual genes were independently aligned using MUSCLE (Edgar 2004). We ran concatenated analyses using both maximum likelihood (ML) and Bayesian inference (MB). ML analyses were run using RAxML version 7.2.6 (Stamatakis et al. 2005, 2006). We partitioned the data set by gene, and a GTR+Γ model was used for each partition. We performed a rapid bootstrap (-f a -x option) with 1000 replications to assess the level of support for each node (Stamatakis et al. 2006, 2008). Bayesian analyses were run using MrBayes v3.2.1 (Ronquist et al. 2012). The data set was partitioned by gene and the model of nucleotide evolution was estimated independently for each partition using the Akaike Information Criterion (AIC) as implemented in MrModelTest 2.3 (Nylander 2004). The analyses were run for 40 million generations for the Papionini data set and 20 million for the Macaca data set with a sampling frequency of 1000 generations. We assessed convergence by checking log likelihood (LnL), potential scale reduction factor (PSRF), and average standard deviation of split frequencies (<0.01) in MrBayes, and visually using Tracer v.1.5 to plot the likelihood versus generation number, to estimate the effective sample size (ESS > 200) of all parameters, and to compare the performance of the four independent analyses. After checking for convergence, we summarized the posterior distribution of trees after discarding the first 10% of generations as burn-in.

Gene tree-species tree analyses were run using BUCKy 1.4.2 (Ané et al. 2007; Larget et al. 2010). This software does not assume that different loci all have the same topology and it performs a Bayesian concordance analysis to estimate how much of the genome supports each possible node in the tree. Whereas many programs assume that tree discordance is due only to incomplete lineage sorting, BUCKy makes no assumption about the reason for such gene tree heterogeneity. We first obtained a posterior gene tree distribution for each locus independently using MrBayes v3.2.1. We ran 40 million generations for two runs and we excluded the first 10% of the trees, for a total of 72,000 trees per locus. The resulting tree files from MrBayes were then used for concordance analyses in a two-step process: 1) we used the software mbsum (part of the BUCKy package) to summarize the number and proportions of distinct topologies for each gene; 2) we used the resulting files as input to BUCKy for Bayesian concordance analysis. The software applies an MCMC (Markov chain Monte Carlo) process to sample from the single-gene posterior probabilities of trees and to produce a joint posterior distribution for each clade. This distribution is usually represented by the clade concordance factor (CF), which is a summary statistic describing the proportion of genes that contain a particular clade. For each data set, we ran the concordance analyses with four MCMC chains for 10 million generations following a 10% burn-in period. BUCKy uses a single prior parameter (α) that represents the expectation for different genes to reconstruct different trees (α = 0: no discordance among gene tree topologies; α = infinity: all gene trees are completely independent). We first estimated α based on the number of taxa, loci, and expected levels of gene tree discordance using an R script provided by the authors of BUCKy. To explore how the results were affected by the prior probability distributions for the number of distinct trees, we ran the analyses using α values of 0.1, 1, 10, 100, and 500 following the approach proposed by Weisrock et al. (2012).
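To make the idea of a concordance factor concrete, the following sketch computes the proportion of gene trees that contain each clade. It is a deliberately naive illustration and not BUCKy's algorithm: BUCKy works from the posterior distribution of trees for each locus and from the prior α, whereas this sketch treats each gene tree as known without error. The function name and the five-locus toy input are hypothetical.

```python
from collections import Counter
from typing import Dict, FrozenSet, List, Set

Clade = FrozenSet[str]


def naive_concordance_factors(gene_trees: List[Set[Clade]]) -> Dict[Clade, float]:
    """Return, for every clade observed, the proportion of gene trees containing it."""
    if not gene_trees:
        raise ValueError("at least one gene tree is required")
    counts: Counter = Counter()
    for clades in gene_trees:
        counts.update(set(clades))  # count each clade once per gene tree
    n_genes = len(gene_trees)
    return {clade: count / n_genes for clade, count in counts.items()}


# Hypothetical toy input: five loci, each reduced to the set of clades it supports.
loph_thero = frozenset({"Lophocebus", "Theropithecus"})
loph_papio = frozenset({"Lophocebus", "Papio"})
toy_gene_trees = [{loph_thero}, {loph_thero}, {loph_thero}, {loph_papio}, set()]

cfs = naive_concordance_factors(toy_gene_trees)
for clade, cf in sorted(cfs.items(), key=lambda item: -item[1]):
    print("+".join(sorted(clade)), round(cf, 2))
```

In this toy case three of five loci contain Lophocebus+Theropithecus and one contains Lophocebus+Papio; the fifth locus is unresolved and contributes to neither clade, which is one reason CFs for competing clades need not sum to 1.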



The Papionini data set included 25,975 bp, of which 151 were parsimony-informative characters (0.58% with a mean of 4.4 single-nucleotide polymorphisms [SNPs] per locus). Maximum likelihood and Bayesian analyses using the concatenated alignments provided a single tree with two highly supported clades represented by Cercocebus+Mandrillus (bootstrap proportion [BP] = 100% and posterior probability [PP] = 1.00) and Lophocebus+Theropithecus+Papio (BP = 100% and PP = 1.00) (Fig. 4a). Within the latter clade, both analyses supported the basal position of Theropithecus, and a sister relationship between Lophocebus and Papio (BP = 81% and PP = 1.00). The relationships within the genus Papio, although identical between ML and MB analyses (P. anubis sister of the clade P. papio-P. hamadryas), were poorly supported, with only 51% bootstrap value and 0.74 posterior probability, respectively.
Fig. 4

Phylogenetic results for the two example data sets with Papionini (34 loci: 25,975 bp) and Macaca (32 loci: 20,485 bp). (a) Papionini and (b) Macaca. The trees on the left were obtained using a supermatrix approach (Bayesian posterior probability values [PP] above the branch and ML bootstrap value [BP] below the branch), while the trees on the right were obtained using a gene tree-species tree approach (concordance values obtained for α = 100).

Bayesian concordance analysis performed with BUCKy supported a topology different from that of the concatenated analyses. As expected, the CFs for each node change slightly according to the value of the prior α. However, the primary concordance tree is the same across all the analyses, supporting the same relationships between taxa. The only difference from the concatenated analyses is the sister relationship between Theropithecus and Lophocebus, to the exclusion of Papio. Surprisingly, the CF for this node is quite high in most analyses (0.376–0.642), with 95% credibility intervals ranging from a low of 0.118 to a high of 0.794. This suggests that a high percentage of loci (e.g., α = 100: 0.537; 95% CI 0.324–0.706) support a topology that differs from the one supported by the concatenation analyses (Fig. 4a). Alternative topologies for the relationships among these taxa were weakly supported: the CF for the sister relationship between Lophocebus and Papio to the exclusion of Theropithecus ranged between 0.224 and 0.356 (e.g., α = 100: 0.283; 95% CI 0.147–0.471), while the CF for the clade Papio+Theropithecus was lower than 0.050 in all the analyses. This result is particularly interesting because it suggests that the topology strongly supported by the concatenation analyses is supported by only ca. 30% of the loci, while the majority of loci often support an alternative topology with Lophocebus as the sister of Theropithecus. However, the interpretation of this result is not straightforward: it is possible, as simulation studies have suggested, that concatenated analyses strongly support an incorrect topology when gene discordance is very high (Edwards 2009; Leaché and Rannala 2011). In this case some nonphylogenetic signal, e.g., a high level of incomplete lineage sorting or hybridization, is driving the result and hiding the actual phylogenetic signal in the concatenated analyses. No matter what is driving the different topologies between the two analyses, it is clear that more than half of the individual gene trees analyzed here agree on a different topology from the one supported by the concatenated analysis, suggesting that concatenated results should be treated with extreme caution.

Within Papio, the concordance analyses provided the same tree as the concatenation analyses with the clade P. hamadryas+P. papio as the sister group of P. anubis, with more than half of the loci (between 0.549 and 0.666) supporting this topology (e.g., α = 100: 0.549; 95% CI 0.324–0.794). All the other nodes in the tree were also well supported with CFs higher than 0.75 across all the analyses. To explore further the phylogenetic signal for each locus, we analyzed the posterior gene tree distribution for each locus independently by calculating the number of trees that were distinct (Fig. 5). For each locus we analyzed 72,000 trees resulting from MrBayes analyses (.t files), representing two different runs of 40 million generations (sampled every 1000 generations) and 10% burn-in. If one locus strongly supported one topology, we would expect most of the trees sampled by MrBayes to be similar to each other (only a small proportion of trees would be distinct); conversely, if one gene was poorly informative (unresolved), we would expect that almost 100% of the trees sampled would have a distinct topology. Across the loci used for the Papionini data set, the percentage of distinct trees was extremely variable, ranging from 0.04% (BRCA2: only 31 distinct topologies) to 98.17% (NEGR1: 70,685 distinct topologies).
Fig. 5

Distinct gene topologies per locus. Histogram representing the percentage of trees that were distinct for each locus (only shared loci between the data sets are shown). For each locus we analyzed 72,000 trees resulting from MrBayes analyses (.t files) and calculated the number of distinct tree topologies using mbsum.
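The figure of 72,000 trees per locus and the percentages of distinct topologies follow from simple arithmetic, which the sketch below reproduces. It assumes that each run contributes exactly generations / sampling-frequency trees (i.e., ignoring the starting tree) and that burn-in is removed as a fraction of the samples in each run; the function names are hypothetical and are not part of mbsum or MrBayes.

```python
def retained_samples(generations: int, sample_every: int,
                     n_runs: int, burnin_fraction: float) -> int:
    """Number of sampled trees kept across all runs after discarding burn-in."""
    per_run = generations // sample_every
    discarded = round(per_run * burnin_fraction)
    return (per_run - discarded) * n_runs


def percent_distinct(n_distinct: int, n_total: int) -> float:
    """Percentage of sampled trees whose topology is distinct."""
    if n_total <= 0:
        raise ValueError("n_total must be positive")
    return 100.0 * n_distinct / n_total


# Settings reported in the text: 40 million generations, sampled every 1000,
# two runs, 10% burn-in.
total_trees = retained_samples(40_000_000, 1_000, 2, 0.10)
print(total_trees)                                      # 72000

# Distinct-topology counts reported for BRCA2 and NEGR1 (Papionini data set).
print(round(percent_distinct(31, total_trees), 2))      # 0.04
print(round(percent_distinct(70_685, total_trees), 2))  # 98.17
```

A value near 0% indicates that the posterior sample for a locus is concentrated on very few topologies, whereas a value near 100% indicates that almost every sampled tree is different, as expected for a locus with little phylogenetic signal.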


The Macaca data set included 20,485 bp, of which only 79 were parsimony-informative characters (0.39%; average of 2.5 SNPs per locus). ML and MB analyses of the concatenated data set resulted in identical tree topologies. Macaca sylvanus was basal in the genus, with high support in both analyses. In MB analyses all nodes were highly supported (PP >0.90). Bootstrap values were lower, with three nodes lower than 80%: the clade Macaca tonkeana–Macaca ochreata (74%); the clade including fascicularis, mulatta, cyclopis, and fuscata (79%); and the position of the clade arctoides and thibetana (79%).
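The summary statistics reported for the two data sets (percentage of parsimony-informative characters, informative sites per locus, and mean locus length) can be recomputed from the totals given in the text. The short sketch below does this; the function and variable names are hypothetical, and the inputs are the totals reported above for the Papionini and Macaca data sets.

```python
def informative_summary(total_bp: int, informative_sites: int, n_loci: int) -> dict:
    """Basic variability statistics for a multilocus alignment."""
    if total_bp <= 0 or n_loci <= 0:
        raise ValueError("total_bp and n_loci must be positive")
    return {
        "percent_informative": 100.0 * informative_sites / total_bp,
        "informative_per_locus": informative_sites / n_loci,
        "mean_locus_length_bp": total_bp / n_loci,
    }


# Totals reported in the text for each data set.
papionini = informative_summary(total_bp=25_975, informative_sites=151, n_loci=34)
macaca = informative_summary(total_bp=20_485, informative_sites=79, n_loci=32)

for name, stats in [("Papionini", papionini), ("Macaca", macaca)]:
    print(
        f"{name}: {stats['percent_informative']:.2f}% informative, "
        f"{stats['informative_per_locus']:.1f} per locus, "
        f"mean locus length {stats['mean_locus_length_bp']:.0f} bp"
    )
```

On average each Macaca locus carries only about half as many informative sites as a Papionini locus, which is consistent with the weaker per-locus resolution at the intrageneric level.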

The topology supported by the Bayesian concordance analysis was identical to the one obtained by the concatenated analyses. However, the CFs across all the nodes were low, ranging from 0.085 (the clade including fascicularis, mulatta, cyclopis, and fuscata) to 0.327 (sister relationship between Macaca nemestrina and Macaca silenus). The analysis of the proportion of posterior gene trees that are distinct for each locus showed that most of the loci within Macaca are actually uninformative, with almost 100% of the sampled trees having distinct topologies (range between 72.5% and 99.9%). Given the very low amount of information in most loci, the low resolution of the concordance analyses is therefore not surprising.


Here we reviewed the current status of phylogenetic reconstructions within primates and discussed the possible limitations for the use of molecular phylogenies in comparative analyses. Recently, many large primate phylogenies containing hundreds of species and tens of loci have been published (Chatterjee et al. 2009; Fabre et al. 2009; Perelman et al. 2011; Springer et al. 2012). Most of these phylogenetic trees were built using supermatrix or supertree approaches that do not account for the independent history of each individual gene (Ting and Sterner 2013). We contend that the confidence we have in those trees is probably overestimated and that the trees might provide misleading results when used in comparative analyses. Phylogenetic methods that do not take into account gene tree discordance tend to provide very highly resolved and supported trees that do not necessarily represent true species histories (Edwards 2009; Leaché and Rannala 2011; Weisrock et al. 2012). Despite the great value of these large phylogenetic trees, it is important to be aware of the limitations of such reconstructions, especially at the lower taxonomic scale. The data sets that we reanalyzed in this article demonstrated that our confidence in current phylogenies might be misplaced, especially at the intrageneric level. For instance, a single tree topology is strongly supported within Macaca in the concatenated analyses but these relationships are based on very few substitutions and the majority of the loci show no variation at all. This problem is not evident when multiple loci are analyzed as a single supermatrix; however, when gene tree-species tree methods are applied, the uncertainty in most of the internal nodes becomes clear. In concatenated analyses a few informative loci can drive the topology and, even if the support value across the nodes is high, it might not be representative of the species’ evolutionary histories.

Coalescence approaches can be used to provide a more realistic picture of the primate tree. Several multilocus data sets are now available for primates, but they all suffer from the same problems and might not be ideal for gene tree-species tree analyses. The loci available, such as those used by Perelman et al. (2011), are usually very short (<1000 bp) and often show very little variation at the intrageneric level. This is particularly relevant in behavioral studies that require phylogenetic trees for closely related species. Including many loci that are not informative may affect our ability to accurately estimate phylogenetic relationships using gene tree-species tree methods (Townsend et al. 2011). Most of these methods rely on the estimation of individual gene trees; however, when a substantial number of loci provide unresolved trees, species tree analyses may fail to reconstruct relationships accurately among species. In concatenated analyses this is not an issue because a few informative loci can drive the strong —but possibly false— resolution of species relationships (Townsend et al. 2011). In this case, the presence of several uninformative loci does not affect the final resolution of the tree.

To obtain strong species trees using a coalescence framework, longer and more variable loci are likely to be necessary (Song et al. 2012). Longer loci might have the disadvantage of being more prone to recombination, violating one of the assumptions of coalescence methods. However, the amount of recombination within a particular locus may be relatively low between closely related species and it can be tested before running the analyses (Song et al. 2012). Another possible drawback of most available multilocus data sets relates to the need for multiple individuals per species. The use of coalescence analyses makes the distinction between phylogenetics and population genetics blurrier, and future studies should aim to include multiple individuals in their analyses (Edwards 2009). This is critical to reconstruct the coalescence of different loci within species and to estimate some critical population parameters, such as the ancestral effective population size (Heled and Drummond 2010; Maddison and Knowles 2006).

Unfortunately, there are also limitations to coalescence methods of species tree inference. Several empirical studies have shown that Bayesian methods, such as BEST (Liu 2008) or *BEAST (Heled and Drummond 2010), often fail to reach convergence and might currently be too computationally intense for most data sets (Perez et al. 2012; Weisrock et al. 2012). Novel partially parametric methods that use only summary statistics of the gene tree topologies have been developed recently, e.g., MP-EST and STAR. These methods can possibly overcome the computational demands of fully parametric methods but they usually require more loci to achieve a high level of confidence in the results (Liu et al. 2009, 2010; Song et al. 2012). Finally, most of the coalescence approaches make strong assumptions regarding the source of the tree discordance. In general, most of the programs available today consider incomplete lineage sorting as the only source of gene heterogeneity (Heled and Drummond 2010; Liu 2008). However, many other factors can result in multiple gene topologies, such as hybridization, recombination, or gene duplication and loss (Degnan and Rosenberg 2009; Edwards 2009). In particular, hybridization seems to play a critical role in primate evolution and many primate species are now known to hybridize in the wild (Burrell et al. 2009; Cortés-Ortiz et al. 2007; Detwiler et al. 2005; Gligor et al. 2009; Zinner et al. 2009b, 2011). New methods that take into account both hybridization and incomplete lineage sorting are therefore necessary to estimate primate species trees under a coalescence framework (Meng and Kubatko 2009; Yu et al. 2011).

Even recent molecular phylogenies should be treated with caution in behavioral analyses because they still suffer from many invalid assumptions and shortcomings. Future studies of primate phylogenetics should aim to build appropriate data sets that do take into account different sources of tree heterogeneity to obtain a better understanding of the primate tree. New advances in sequencing technologies and the advent of new genomic techniques will soon provide large data sets that will allow these questions to be more appropriately addressed in many primate species (Bergey et al. 2013; Ting and Sterner 2013). Until that day, behavioral ecologists should be aware of the current limitations of primate phylogenetic reconstructions when they apply phylogenetic comparative methods in their studies.


We thank James Higham, Lauren Brent, and Amanda Melin for inviting us to contribute to this special issue of the International Journal of Primatology. We are grateful to Lauren Brent and two anonymous reviewers for helpful comments and suggestions. We also thank Bret Larget for support and advice in running BUCKy.

Copyright information

© Springer Science+Business Media New York 2013