Abstract
Phylogenomics aims to describe evolutionary relatedness between organisms by analyzing genomic data. The common practice is to produce phylogenomic trees from molecular information in the sequence, order, and content of genes in genomes. These phylogenies describe the evolution of life and become valuable tools for taxonomy. The recent availability of structural and functional data for hundreds of genomes now offers the opportunity to study evolution using deeper, more conserved, and more reliable sets of molecular features. Here, we reconstruct trees of life from the functions of proteins. We start by inferring rooted phylogenomic trees and networks of organisms directly from Gene Ontology annotations. Phylogenies and networks yield novel insights into the emergence and evolution of cellular life. The ancestor of Archaea originated earlier than the ancestors of Bacteria and Eukarya and was thermophilic. In contrast, basal bacterial lineages were non-thermophilic. We also identified a close relationship between Plants and Metazoa that disagrees with the traditional Fungi-Metazoa grouping. While measures of evolutionary reticulation were lowest in Eukarya and highest in Bacteria, the massive role of horizontal gene transfer in microbes did not materialize in phylogenomic networks. Phylogenies and networks also showed that the best reconstructions were recovered when problematic taxa (i.e., parasitic/symbiotic organisms) and horizontally transferred characters were excluded from analysis. Our results indicate that functionomic data represent a useful addition to the set of molecular characters used for tree reconstruction and that the deep branches of trees of cellular life carry considerable predictive power to explain the evolution of living organisms.
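To illustrate how a genomic census of Gene Ontology annotations can be turned into phylogenetic characters, the following minimal Python sketch builds an organism-by-term count matrix and compares organisms by the GO terms they share. It is not the reconstruction method used in this study: the organism names, the toy annotations, and the helper functions `census`, `character_matrix`, and `jaccard_distance` are hypothetical, and a simple Jaccard distance on term presence stands in for the tree and network inference described above.

```python
from collections import Counter

# Hypothetical toy data: organism -> GO term IDs assigned to its proteins.
# Real analyses would draw these from genome-wide annotation files.
annotations = {
    "archaeon_A": ["GO:0003677", "GO:0005524", "GO:0003916"],
    "bacterium_B": ["GO:0003677", "GO:0005524", "GO:0016787", "GO:0016787"],
    "eukaryote_C": ["GO:0003677", "GO:0005524", "GO:0016787", "GO:0004672"],
}


def census(annotations):
    """Count how often each GO term occurs in each organism."""
    return {org: Counter(terms) for org, terms in annotations.items()}


def character_matrix(counts):
    """Return the sorted term list and one row of term counts per organism."""
    terms = sorted({term for c in counts.values() for term in c})
    matrix = {org: [c.get(term, 0) for term in terms] for org, c in counts.items()}
    return terms, matrix


def jaccard_distance(row_a, row_b):
    """Distance based on presence/absence of terms (assumed stand-in metric)."""
    present_a = {i for i, value in enumerate(row_a) if value > 0}
    present_b = {i for i, value in enumerate(row_b) if value > 0}
    union = present_a | present_b
    if not union:
        return 0.0
    return 1.0 - len(present_a & present_b) / len(union)


terms, matrix = character_matrix(census(annotations))
organisms = sorted(matrix)
distances = {
    (a, b): jaccard_distance(matrix[a], matrix[b])
    for i, a in enumerate(organisms)
    for b in organisms[i + 1:]
}

for (a, b), d in distances.items():
    print(f"{a} vs {b}: {d:.2f}")
```

Each column of the matrix is one candidate character (a GO term) and each row is one taxon. The counts are kept in the matrix so that abundance could be used instead of presence if desired, and the resulting table could be exported to a phylogenetic package for tree or network reconstruction.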
Acknowledgments
We are thankful to the members of the GCA laboratory for fruitful discussions and to the anonymous reviewers whose comments significantly improved the manuscript. Research was supported by grants from the National Science Foundation (MCB-0749836 and OISE-1132791) and the United States Department of Agriculture (ILLU-802-909 and ILLU-483-625) to GCA, and by grants from the KRIBB Research Initiative Program and the Next-Generation BioGreen 21 Program, Rural Development Administration (PJ0090192013) to KMK.
Additional information
Kyung Mo Kim and Arshan Nasir contributed equally to this work.
Cite this article
Kim, K.M., Nasir, A., Hwang, K. et al. A Tree of Cellular Life Inferred from a Genomic Census of Molecular Functions. J Mol Evol 79, 240–262 (2014). https://doi.org/10.1007/s00239-014-9637-9
DOI: https://doi.org/10.1007/s00239-014-9637-9