Understanding Evolutionary Trees
Cite this article as: Gregory, T.R. Evo Edu Outreach (2008) 1: 121. doi:10.1007/s12052-008-0035-x
Charles Darwin sketched his first evolutionary tree in 1837, and trees have remained a central metaphor in evolutionary biology up to the present. Today, phylogenetics—the science of constructing and evaluating hypotheses about historical patterns of descent in the form of evolutionary trees—has become pervasive within and increasingly outside evolutionary biology. Fostering skills in “tree thinking” is therefore a critical component of biological education. Conversely, misconceptions about evolutionary trees can be very detrimental to one’s understanding of the patterns and processes that have occurred in the history of life. This paper provides a basic introduction to evolutionary trees, including some guidelines for how and how not to read them. Ten of the most common misconceptions about evolutionary trees and their implications for understanding evolution are addressed.
Keywords: Branch, Clade, Common ancestor, Evolution, Node, Phylogeny, Sister taxa, Topology, Trend
Introduction: The Importance of Tree Thinking
As buds give rise by growth to fresh buds, and these, if vigorous, branch out and overtop on all sides many a feebler branch, so by generation I believe it has been with the great Tree of Life, which fills with its dead and broken branches the crust of the earth, and covers the surface with its ever-branching and beautiful ramifications.
Today, evolutionary trees are the subject of detailed, rigorous analysis that seeks to reconstruct the patterns of branching that have led to the diversity of life as we know it (e.g., Cracraft and Donoghue 2004; Hodkinson and Parnell 2007; Lecointre and Le Guyader 2007; Maddison and Schultz 2007). An entire discipline known as phylogenetics (Gr. phyle, tribe + genesis, birth) has emerged, complete with professional societies, dedicated scientific journals, and a complex technical literature that can be impenetrable to many nonspecialists. The output of this profession has become prodigious: It has been suggested that phylogeneticists as a group publish an average of 15 new evolutionary trees per day (Rokas 2006). Little surprise, then, that it has been argued that evolutionary biology as a whole has undergone a shift to “tree thinking” (O’Hara 1988), akin to the earlier movement toward “population thinking” that helped to shape the Neo-Darwinian synthesis around the mid-twentieth century (Mayr and Provine 1980).
Tree thinking does not necessarily entail knowing how phylogenies are inferred by practicing systematists. Anyone who has looked into phylogenetics from outside the field of evolutionary biology knows that it is complex and rapidly changing, replete with a dense statistical literature, impassioned philosophical debates, and an abundance of highly technical computer programs. Fortunately, one can interpret trees and use them for organizing knowledge of biodiversity without knowing the details of phylogenetic inference.
Unfortunately, it is becoming clear that many readers lack a sufficient level of phylogenetic literacy to properly interpret evolutionary patterns and processes. For example, a recent study of undergraduate students who had received at least introductory instruction in evolutionary science revealed a range of common misconceptions about phylogenetic trees that represent “fundamental barriers to understanding how evolution operates” (Meir et al. 2007).[2] Early correction of these misconceptions would be of obvious benefit, and it has been suggested that the importance for biology students of learning how to interpret evolutionary trees is on par with that of geography students being taught how to read maps (O’Hara 1997). Given the growing significance of phylogenetic analyses in forensic, medical, and other applications (e.g., Vogel 1997; Rambaut et al. 2001; Mace et al. 2003; Mace and Holden 2005) in addition to their pervasive influence in evolutionary studies, this claim does not appear to be overstated.
This paper aims to provide a brief introduction to evolutionary trees and some basic details on how they should and should not be read and interpreted. This is followed by a discussion of ten of the most common misconceptions about evolutionary trees, many of which are held simultaneously and any of which can severely impede one’s understanding of evolution.
The Basics of Phylogenetic Literacy
What is an Evolutionary Tree?
In the most general terms, an evolutionary tree (also known as a phylogeny[3]) is a diagrammatic depiction of biological entities that are connected through common descent, such as species or higher-level taxonomic groupings. An overwhelming body of evidence supports the conclusion that every organism alive today and all those who have ever lived are members of a shared heritage that extends back to the origin of life some 3.8 billion years ago. One might therefore expect it to be possible, at least in principle, to reconstruct the Tree of Life, branch by branch and bough by bough, from the current diversity residing at the outermost twigs to a universally shared root. However, this proposition remains controversial, not because there is any scientific doubt about the historical relatedness of species (i.e., the fact of evolution; Gregory 2008) but because of the complex nature of evolutionary processes.
For a start, relatedness among species is a concept that depends on genetics as well as history, and there is ample evidence that even distantly diverged lineages have, at times, experienced significant gene sharing (a process known as lateral or horizontal gene transfer, in contrast to the more typical “vertical” transmission of genes from parent down to offspring). Some authors argue that this was sufficiently rampant in the earliest period of life’s history, and has been common enough throughout the more recent past, to create a “Web of Life” lacking any single root, rather than a strictly bifurcating tree in which branches, once split, remain separate forever (e.g., Doolittle 2000; Doolittle and Bapteste 2007). At the very least, it must be noted that in light of processes such as lateral gene transfer and gene duplication, the history of individual genes may not follow the same historical paths as those of the species in which they reside. In many cases, “gene trees” and “species trees” may not be equivalent, a fact that complicates (but does not preclude) the reconstruction of phylogenies using molecular information (e.g., Wolf et al. 2002; Rokas 2006).
These issues aside, living organisms do have a history, and this does include universal relatedness of one sort or another, be it analogous to a simple tree, a more complex web, or something else. Moreover, there is no fundamental principle that prevents the pattern of ancestry from differing both temporally and taxonomically: it is possible (but by no means confirmed) that a straightforward tree metaphor is inappropriate for, say, ancient (or perhaps even modern) bacteria but is accurate when applied to eukaryotes. In the case of the latter, at least, there may be a “true” phylogeny that accurately depicts the historical patterns of ancestry connecting eukaryote branches to their common root, but the shape of the tree is far from resolved (Baldauf 2003). In fact, except in rare instances where the pattern of evolutionary branching is created in the laboratory and observed directly as it occurs (e.g., Hillis et al. 1992; Sanson et al. 2002), it is impossible to know with certainty that any given phylogeny is historically accurate. As a result, any reconstructed phylogenetic tree is a hypothesis about relationships and patterns of branching and thus is subject to further testing and revision with the analysis of additional data. Fully resolved and uncontroversial phylogenies are rare, and as such, the generation, testing, and updating of phylogenetic hypotheses remain an active and sometimes hotly debated area of research.
Anatomy of a Phylogeny
The old cliché contends that an undue focus on individual trees can prevent one from appreciating the grandeur of a forest. The reverse applies with regard to evolutionary trees, in that their collective importance is obvious, but many people are unfamiliar with the basic features of individual phylogenies. Whether they illustrate relationships among a few species or thousands (e.g., Bininda-Emonds et al. 2007) or of larger groupings of species (genera, families, phyla,[4] etc.), all evolutionary trees provide the same basic information: a historical pattern of ancestry, divergence, and descent. They do so by depicting a series of branches that merge at points representing common ancestors, which themselves are connected through more distant ancestors.
By definition, the more common ancestors that two species share to the exclusion of other species, the more closely related they are. For example, in Fig. 2, from the terminal nodes to the root, species A and B share four common ancestors, species A and D share two common ancestors, and species F shares only one ancestor (the root itself) with any of the other five species. Species A and B are linked through a recent common ancestor that is not shared by any other taxa on the tree and are therefore known as “sister taxa.” The next closest relative of species A and B is species C, with whom they share an ancestor to the exclusion of species D, E, and F. Species D and E are sister taxa and are the next closest relatives of A + B + C. Species F, by contrast, is not linked to any of the other species beyond a single distant ancestor and is known as the “outgroup.” An outgroup is necessary to root a tree (unrooted trees also can be drawn, but these are less informative and are not covered here).
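The counting rule described above can be checked with a minimal sketch in Python. It assumes the tree described for Fig. 2 has the shape (((A, B), C), (D, E)) with F as the outgroup; the internal node names (root, n1 to n4) are hypothetical labels introduced only for this illustration.

```python
# Parent of each node in the tree described for Fig. 2.
# Internal node names (root, n1..n4) are hypothetical labels for this sketch.
PARENT = {
    "A": "n4", "B": "n4",        # A and B are sister taxa
    "n4": "n3", "C": "n3",       # C is the next closest relative of A + B
    "D": "n2", "E": "n2",        # D and E are sister taxa
    "n3": "n1", "n2": "n1",      # A + B + C joins D + E
    "n1": "root", "F": "root",   # F is the outgroup
}


def ancestors(taxon):
    """Return the set of all ancestral nodes from a tip back to the root."""
    found = set()
    node = taxon
    while node in PARENT:
        node = PARENT[node]
        found.add(node)
    return found


def shared_ancestors(x, y):
    """Number of common ancestors that two tips share on this tree."""
    return len(ancestors(x) & ancestors(y))


for pair in [("A", "B"), ("A", "C"), ("A", "D"), ("A", "F")]:
    print(pair, shared_ancestors(*pair))
```

Under these assumptions the counts fall from A and B, through A and C and then A and D, to A and F, which mirrors the order of relatedness given in the text.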
How to Read Evolutionary Trees
Phylogenies as Family Trees
This simple comparison between phylogenies and family pedigrees highlights some other important points regarding the interpretation of evolutionary trees. First, contemporary entities (whether individual family members, species, or larger groupings) are related through common ancestors—they are not themselves ancestors of one another. Thus, the reader is not descended from a sibling; rather, both are descended from a shared parent. Likewise, the reader is not descended from a cousin, but they share a more distant common ancestor, namely their grandparent. Second, not only individual relatedness but also the relatedness of nested and increasingly inclusive groups is indicated on a tree. The reader, his or her sibling, and their shared parent represent an “immediate family,” whereas adding the cousins, the aunt, and the grandparent would also produce a coherent grouping that could be labeled more generally as a “family.” The analogous groups in phylogenetic terms, ones that include an ancestor and all of its descendants, are called “clades” (Fig. 3c). As O’Hara (1994) explained, “If you were to grab hold of the tree at any point, and cut immediately below your grip—below in the sense of toward the root—the chunk of the tree in your hand would by definition be a clade.” In other words, clades are branches that include all the twigs that have sprouted from them. Third, all members of an immediate family are equally related to individuals outside of their immediate family but with whom they share a more distant ancestor. For example, in Fig. 3a, both the reader and his or her sibling are equally related to both cousins. In like fashion, species Y and S in Fig. 3b are equally related to species C and to species K. Indeed, no matter how many descendants a parent and an aunt have, all siblings will be equally related to all of their first cousins. The same is true of species.
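O’Hara’s "grab and cut" description of a clade can be illustrated with a short Python sketch. The tree below reuses the species labels A to F from the description of Fig. 2 and assumes the shape (((A, B), C), (D, E)) with F as the outgroup; the nested-tuple representation and the function names are choices made for this illustration only.

```python
# Each tuple is an internal node (an ancestor); each string is a tip.
TREE = (((("A", "B"), "C"), ("D", "E")), "F")


def tips(node):
    """All tips descended from a node: what stays in the hand after the cut."""
    if isinstance(node, str):
        return frozenset([node])
    return frozenset().union(*(tips(child) for child in node))


def clades(node):
    """Tip sets of every clade in the tree, one per internal node."""
    if isinstance(node, str):
        return set()
    result = {tips(node)}
    for child in node:
        result |= clades(child)
    return result


def is_clade(group, tree=TREE):
    """True if the group is exactly an ancestor's full set of descendant tips."""
    return frozenset(group) in clades(tree)


print(is_clade({"A", "B", "C"}))   # an ancestor and all of its descendants
print(is_clade({"C", "D", "E"}))   # leaves out A and B, so not a clade
```

For simplicity, this sketch counts only groups defined by an internal node; a single tip is not reported as a clade here.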
Types of Trees
How Not to Read Evolutionary Trees
Misunderstandings of evolutionary trees are pervasive among students, in the media, and among other nonspecialists. Even more alarming, they also surface frequently in the peer-reviewed scientific literature, often with significant implications for the conclusions drawn from comparative analyses (see Crisp and Cook 2005 for several examples). The following sections describe and seek to correct ten of the most commonly encountered misconceptions about evolutionary trees. Several of these are interrelated and therefore overlap to an extent, but each can be illustrated using distinct examples. Learning (and teaching students) to avoid these misunderstandings represents a key step toward the development of adequate tree thinking skills.
Misconception #1: Higher and Lower
Notions of a “Great Chain of Being” or scala naturae (scales of nature), in which living species (and, in some cases, nonliving matter and/or the divine) are ranked from lowest to highest, extend back at least as far as Aristotle. Although Darwin (1837) himself noted early on that “It is absurd to talk of one animal being higher than another,” in many respects, his contribution merely shifted the explanation for the perceived rankings, replacing the scales of nature with an “evolutionary scale” or “evolutionary ladder” (Ruse 1996). Talk of “higher” and “lower” organisms, made in reference to contemporaneous species, persists in both public and professional scientific discourse. Not surprisingly, humans typically are (self-)designated as the “highest” organisms, with other living species ranked as higher or lower on the “evolutionary scale” according to how similar they are to this particular terminal node on the phylogeny of animals.
As many prominent authors have noted, there is no scientifically defensible basis on which to rank living species in this way, regardless of how interesting or unique some aspect of their biology may be to human observers (e.g., Dawkins 1992; Gould 1994, 1996). This error does not so much reflect a specific misunderstanding of phylogenetic diagrams per se but a failure to grasp the very concept of common descent. Therefore, the adjustment to be made in this case is from imagining evolution as a linear, progressive process that generates ladder-like ranks to one of branching and diversification of which trees are the result (e.g., O’Hara 1992, 1997; Nee 2005).
Misconception #2: Main Line and Side Tracks
Although it is clearly a critical first step, recognizing evolution as tree-like does not in itself eliminate progressionist interpretations of life’s history. Even those who acknowledge the branching nature of evolutionary change may continue to interpret it as a progressive process in which a “main line” has led to a distinct endpoint (namely Homo sapiens). In this narrative, all other modern species are derivatives of “side tracks,” anomalous offshoots of the main line to humans that all went astray for one reason or another. Even Huxley (1880) fell prey to this line of thinking when he suggested that the teleost fishes “appear to me to be off the main line of evolution—to represent, as it were, side tracks starting from certain points of that line.”
When we come to realize that even among the vertebrates there are 50,000 different ‘vertebrate stories’, each one with a different ending and each one with a different narrative landscape; when we truly think in terms of the diverging tree, instead of the line; when we understand that it is absurd to talk of one animal being higher than another; only then will we see the full grandeur of the historical view of life.
As a matter of fact, it is most likely that evolutionary history will be misconstrued as representing a progressive “main line” when there is only one obvious endpoint available. In what he called “life’s little joke,” Gould (1991) noted that only unsuccessful lineages with very few living representatives are taken as endpoints of a supposed main line.
Misconception #3: Reading across the Tips
As a means of correcting this misinterpretation, one may take the time to identify the clades depicted in the tree (Baum et al. 2005). Humans, cats, and their common mammalian ancestor represent one clade, as do birds, lizards, and their common ancestor. These lineages together with their shared ancestor represent a clade (amniotes) in which the first two clades are nested. Adding frogs and the ancestor linking them to the aforementioned species creates a yet larger clade (tetrapods). Adding fishes and the common ancestor of all species on this tree creates the final and largest clade (vertebrates). Because frogs can be included in a clade with humans before fishes can—in other words, because frogs and humans share a common ancestor that is not shared with fishes—frogs are more closely related to humans than to fishes. Indeed, frogs and humans are exactly equally related to fishes through this common ancestor (recall that two cousins are equally related to a third, more distant relative).
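The clade-by-clade reasoning in this paragraph can be sketched in Python. The nested tuples encode the vertebrate tree as described in the text (mammals and birds plus lizards nested within amniotes, then tetrapods, then vertebrates); the function names are hypothetical and chosen for this illustration.

```python
# (fish, (frog, ((lizard, bird), (cat, human)))) as described in the text.
VERTEBRATES = ("fish", ("frog", (("lizard", "bird"), ("cat", "human"))))


def tips(node):
    """All tips descended from a node."""
    if isinstance(node, str):
        return frozenset([node])
    return frozenset().union(*(tips(child) for child in node))


def smallest_clade(node, a, b):
    """Tips of the smallest clade that contains both a and b."""
    if not isinstance(node, str):
        for child in node:
            if {a, b} <= tips(child):
                # Both taxa sit inside this child, so a smaller clade exists.
                return smallest_clade(child, a, b)
    return tips(node)


frog_human = smallest_clade(VERTEBRATES, "frog", "human")   # tetrapods
frog_fish = smallest_clade(VERTEBRATES, "frog", "fish")     # vertebrates
human_fish = smallest_clade(VERTEBRATES, "human", "fish")   # vertebrates

print(sorted(frog_human))
print(frog_human < frog_fish)    # frogs join humans in a smaller clade
print(frog_fish == human_fish)   # frogs and humans are equally related to fishes
```

The smaller the clade that two taxa first share, the more recent their common ancestor, which is why the comparison of clade sizes stands in for relatedness in this sketch.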
A more rapid approach is to mentally rotate a few internal nodes with no effect on the topology of the tree, as shown in Fig. 11b. In this modified tree, humans are still sister to cats and birds are sister to lizards, frogs are then sister to amniotes, and fishes are the outgroup to the tetrapods. This second tree is identical in topology and is therefore equally accurate as the first tree. However, it should be obvious that humans are not suddenly more closely related to frogs than to reptiles and birds.
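That rotating internal nodes leaves the topology unchanged can be verified with a small Python sketch. It assumes tip labels are unique and that the rotated tree places humans beside frogs; the exact tip order of Fig. 11b is not given in the text, so the order used here is only an assumption.

```python
def topology(node):
    """Order-free form of a tree: the children of each node form an unordered set."""
    if isinstance(node, str):
        return node
    return frozenset(topology(child) for child in node)


def tip_order(node):
    """Tips in the order they would be read across the top of the diagram."""
    if isinstance(node, str):
        return [node]
    return [tip for child in node for tip in tip_order(child)]


ORIGINAL = ("fish", ("frog", (("lizard", "bird"), ("cat", "human"))))
# Assumed rotation: the amniote node and both nodes inside it are flipped.
ROTATED = ("fish", ("frog", (("human", "cat"), ("bird", "lizard"))))

print(tip_order(ORIGINAL))
print(tip_order(ROTATED))
print(topology(ORIGINAL) == topology(ROTATED))   # same branching pattern
```

Only the left-to-right order of the tips differs between the two trees. Because the order-free forms are equal, any conclusion drawn from tip order alone cannot be a statement about relatedness.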
Misconception #4: Similarity versus Relatedness
The modern science of taxonomy is built upon the foundation laid by Carolus Linnaeus in the mid-eighteenth century. His system, which long predated the widespread scientific acceptance of common descent inspired by Darwin, categorized organisms on the basis of physical similarity. Notably, in the first edition of his Systema Naturae of 1735, whales were grouped with fishes—an oversight that he corrected in the tenth edition in 1758 by placing them with the other mammals. Today, the primary criterion for scientific classification is evolutionary relatedness, whereas differences in the degree of physical similarity across lineages are often a confounding variable. This can be so for two major reasons: First, as with whales and fishes, adaptation to similar environments can lead to a superficial convergence of physical appearance. Second, the rates of morphological change can vary considerably among lineages, with some remaining similar to a common ancestor and/or to more distantly related contemporary lineages and others becoming markedly different over the same time span (Baum et al. 2005).
Misconception #5: Sibling versus Ancestor
Mistaken assumptions that the ancestor of two modern groups must have been very similar to, or perhaps even was, one of the modern groups extend well beyond the case of crocodiles and birds. Any claim that two species represent each other’s closest living relative should not be construed as implying that one of the modern groups itself is an ancestor of the other nor even that the common ancestor looked anything like either of the two groups. For example, the hypothesis that whales and hippopotamuses are sister groups (e.g., Boisserie et al. 2005) does not imply that the ancestor of whales was a hippo nor that it would even have been thought of as being similar to a hippo were it encountered when it was alive. Not surprisingly, the fossil record of whales, which is becoming increasingly extensive, shows that the early ancestors of whales (e.g., Pakicetus, Ambulocetus) bore no substantial resemblance to modern hippos at all (Thewissen and Bajpai 2001; Thewissen and Williams 2002).
Nowhere is this misconception more pronounced than in discussions of human evolution. One often hears it expressed in the rhetorical challenges offered by those who exhibit the poorest comprehension of evolutionary concepts: “If humans are descended from chimps,” so the question goes, “then why are there still chimps?” “If humans are descended from monkeys, then why has no one observed a monkey giving birth to a human baby?” The answer is simple because the premise is flawed: Humans are not descended from chimpanzees or monkeys, and no sane biologist suggests otherwise.
The notion that other primates should have disappeared now that humans have evolved is based on a false understanding of species formation. Specifically, it assumes a process in which one species gradually transforms as a whole into another (called “anagenesis”). The reality of species diversification is that it most often proceeds by “cladogenesis,” the branching of new species from common ancestral populations. Chimps continue to exist because they are part of a separate branch that formed through cladogenesis when an ancestral population of a species, which was neither chimp nor human, split into independent lineages. Being confused about the coexistence of humans and chimpanzees is akin to being puzzled by the coexistence of Canada and Australia. Once again, rotating some internal nodes (Fig. 14b) can help to correct two misperceptions: that other living primates are ancestors of humans or offshoots of a main line leading to humans, and that the left- or bottom-most tip represents an ancestor of the species at the terminal nodes of the other branches.
Misconception #6: Long Branch Implies no Change (or “Less Diverse Equals Basal Equals Ancestral”)
When viewing unbalanced trees such as those presented as Figs. 10a, 11a, 13, and 14a, there is a tendency among many people to misinterpret the long branch leading to the lone outgroup taxon in two ways. First, it is sometimes assumed that this species, although actually a contemporary of all others on the tree, is ancestral to the other lineages or at least is more similar to the root ancestor than any of the other species included in the tree (Crisp and Cook 2005). Second, this long branch is often taken to imply that no further branching has occurred along this lineage.
As with several of the other misconceptions discussed here, the problem of “basal equals primitive” is most likely to emerge when the tree under consideration is unbalanced and ladderized. It must be borne in mind that even if the unbalanced nature of a phylogeny reflects real differences in species diversity (which it often does not, as most trees include an incomplete sample of species), the relative diversity of major lineages can change over time, with one being the most diverse now and the other having been so in the past (Crisp and Cook 2005).
Once two lineages have separated, each evolves new characters independently of the other and, with time, each will show a mixture of plesiomorphic [inherited largely unchanged from the ancestor] and apomorphic [newly evolved and thus not possessed by the ancestor] character states. Therefore, extant species in both lineages resemble, to varying degrees, their common ancestor. Consequently, whereas character states can be relatively ancestral (plesiomorphic) or derived (apomorphic), these concepts are nonsensical when applied to whole organisms.
Misconception #7: Different Lineage Ages for Modern Species
Misconception #8: Backwards Time Axes
Misconception #9: More Intervening Nodes Equals More Distantly Related
In the study by Meir et al. (2007), many students demonstrated a tendency to assess relatedness in a phylogeny like the one depicted in Fig. 17a by “counting nodes.” For example, because birds on this tree are separated from mammals by four internal nodes (Z, Y, X, W), whereas the separation of turtles and mammals consists of only two internal nodes (X, W), many students incorrectly concluded that birds must be more distantly related to mammals than are turtles. The important point in calculating relatedness is not the number of intervening nodes along a given branch but the number of shared ancestors.
In Fig. 17a, both turtles and birds share one ancestor with mammals (node W), making them equally closely related to mammals. By contrast, birds share four common ancestors with crocodilians (nodes Z, Y, X, and W) but only two with turtles (nodes X and W), which makes birds and crocodilians more closely related to one another than either is to turtles. To illustrate the basic notion that all modern species in a tree are equally distant from their common ancestor, one can plot the same phylogeny as in Fig. 17a with different patterns for each branch (Fig. 17b) and then splice those branches together to show that the total distance from the root (node W) to any of the terminal nodes is exactly equal (Fig. 17c). The only difference is the number of branching events that occurred within the lineages, whereas the relatedness of the lineages themselves is not affected by this. Misconceptions about relatedness based on node counting also could be countered by balancing the tree, for example by deleting all but one species of birds/reptiles, resulting in a symmetrical V-shaped tree, regardless of which species remains along with mammals, or by adding an equal number of mammals to the sample to even out the diversity along the major branches.
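The contrast between counting nodes and identifying shared ancestry can be made concrete with a brief Python sketch. The node labels W, X, Y, and Z follow the text. The lineage branching from node Y is not named in the text, so "lizards" is an assumed placeholder, and the function names are hypothetical.

```python
# Parent of each node in a ladderized tree like the one described for Fig. 17a.
PARENT = {
    "mammals": "W", "X": "W",
    "turtles": "X", "Y": "X",
    "lizards": "Y", "Z": "Y",            # "lizards" is an assumed placeholder
    "crocodilians": "Z", "birds": "Z",
}


def lineage(taxon):
    """Internal nodes from a tip back to the root, nearest first."""
    path = []
    node = taxon
    while node in PARENT:
        node = PARENT[node]
        path.append(node)
    return path


def mrca(x, y):
    """Most recent common ancestor: first node on x's lineage also on y's."""
    others = set(lineage(y))
    return next(node for node in lineage(x) if node in others)


def intervening_nodes(x, y):
    """Internal nodes passed when tracing from x to y (the misleading count)."""
    shared = mrca(x, y)
    up, down = lineage(x), lineage(y)
    return up[:up.index(shared) + 1] + down[:down.index(shared)][::-1]


print(intervening_nodes("birds", "mammals"))     # many nodes in between
print(intervening_nodes("turtles", "mammals"))   # fewer nodes in between
print(mrca("birds", "mammals"), mrca("turtles", "mammals"))   # same ancestor
```

The node counts differ, yet both comparisons lead back to the same most recent common ancestor, so the two groups are equally related to mammals on this tree.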
Misconception #10: Change Only at Nodes
There is a legitimate debate among professional evolutionary biologists regarding the patterns of species formation, such as whether it occurs comparatively rapidly (in a geological sense) or is more gradual. Proponents of the punctuated equilibrium model of speciation argue that species remain largely unchanged morphologically for the duration of their existence, with most physical diversification occurring concomitant with species formation events (Eldredge and Gould 1972; Gould 2002; Eldredge 2008). If punctuated equilibrium were established conclusively to represent the exclusive mode of species formation in a clade and an accurate and complete phylogenetic tree were available for that clade that included all living and extinct species, then one could reasonably interpret the internal nodes as the points at which most morphological divergence took place among species. As Meir et al. (2007) noted, many students do draw such a conclusion, although of course this is not because they possess the requisite knowledge on which to base it.
The fact is that one should not assume that an internal node indicates the exact moment (again, geologically speaking) when particular physical changes came about, any more than one should interpret a long, node-free branch as indicating that no change has occurred. More accurately, an internal node represents the time at which a formerly cohesive population diverged into two genetically isolated descendant populations, with morphological change possible both at this time and long afterward (Baum et al. 2005).
Finally, one must bear in mind that terminal nodes can also be misinterpreted if the diversity that they sometimes represent is neglected. For example, the tree in Fig. 11 shows only a single fish, a frog, a lizard, a bird, a cat, and a human, but in actuality, these six terminal nodes together represent more than 50,000 species of living vertebrates and an untold number of ancestors. The important point is that any given node, whether internal or at the tips, represents a diverse assemblage of organisms with a complex evolutionary history.
Looking Ahead to Better Understanding the Past
Two points are abundantly clear when it comes to phylogenetic literacy: (1) It is crucial for an understanding of modern evolutionary concepts, and (2) it is insufficiently common. Misconceptions abound regarding evolutionary trees, sometimes because of, and sometimes creating, incorrect preconceptions about how evolution operates. Many are holdovers of progressionist or even pre-evolutionary thinking about life’s diversity. Some, along with widespread misunderstandings of evolutionary mechanisms such as natural selection, undoubtedly contribute to the staggeringly low public acceptance of the principle of common descent in North America (Alters and Nelson 2002; Miller et al. 2006).
The way forward on this issue is unambiguous. Students, members of the public, and other nonspecialists must be better educated about the information that evolutionary trees do and do not convey. To this end, several teaching plans and software exercises for constructing and/or using phylogenetic hypotheses have become available (e.g., Bilardello and Valdes 1998; Gendron 2000; Singer et al. 2001; Goldsmith 2003; Meir et al. 2005). In addition, freely accessible online resources are making it possible for individuals to learn about and interact with evolutionary trees (see Appendix).
More generally, lessons at the high school and undergraduate level should de-emphasize the technical aspects of phylogeny reconstruction in favor of a focus on the concepts underlying tree thinking. In this regard, identifying, confronting, and clarifying misconceptions is perhaps the most important strategy. After all, a misconception corrected is a concept better understood. In few cases is this more relevant or more important than with Darwin’s preferred metaphor of the Tree of Life.
Footnotes
A discussion of phylogenetic methods is well beyond the scope of this article. Introductions to the technical aspects of phylogenetic analysis are provided by Hillis et al. (1996), Page and Holmes (1998), Nei and Kumar (2000), Felsenstein (2003), Salemi and Vandamme (2003), and Hall (2007).
For the purposes of this discussion and regardless of whether this will annoy some specialists, “evolutionary tree,” “phylogenetic tree,” and “phylogeny” are used interchangeably.
Students (including many graduate students) sometimes exhibit confusion regarding the singular and plural forms of terms such as these. “Species” is both the singular and the plural (“specie” is not a biological term—it refers to coins). “Genus” is the singular, whereas “genera” is the plural. “Phylum” is the singular and “phyla” is the plural. Other terms of interest include “taxon” (singular) and “taxa” (plural) and the widely misused “data,” which is the plural form of “datum.” While on the topic, it bears mentioning that one human is still referred to as Homo sapiens, which means “wise man” and does not represent the plural of “Homo sapien.”
Of course, one must not take this analogy too far. Human offspring have two parents, four grandparents, and so on, whereas each species in a phylogenetic tree is usually considered to have descended from a single parental species through a branching event (speciation). In this way, a more appropriate analogy would be to a pedigree showing only the males or only the females of a family or to the family tree of individual organisms that reproduce either through asexual fission or budding.
Acknowledgments
I thank Sarah Adamowicz, Alex Ardila Garcia, Martin Brummell, Niles Eldredge, Bruce Lieberman, Mark Pagel, Andy Purvis, Jillian Smith, Phillip Spinks, and Jonathan Witt for feedback on an early draft of the paper.