Abstract
A systematic way of inferring the evolutionary relatedness of microbial organisms from their oligopeptide content, i.e., the frequencies of amino acid K-strings in their complete proteomes, is proposed. The new method circumvents the ambiguity of choosing genes for phylogenetic reconstruction and avoids the need to align sequences of essentially different length and gene content. The only “parameter” in the method is the length K of the oligopeptides, which serves to tune the “resolution power” of the method. The topology of the trees converges as K increases. Applied to a total of 109 organisms, including 16 Archaea, 87 Bacteria, and 6 Eukarya, the method yields an unrooted tree that agrees with the biologists’ “tree of life” based on SSU rRNA comparison in a majority of the basic branchings and, in particular, in all lower taxa.
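As a rough illustration of the composition idea, the following minimal Python sketch counts overlapping K-strings over a set of protein sequences and turns them into relative frequencies. The function names, the toy sequences, and in particular the cosine-based dissimilarity are assumptions made for this example only; the abstract does not specify how frequency vectors are compared, so this should not be read as the authors' actual distance measure.

```python
from collections import Counter
from math import sqrt


def kstring_frequencies(proteins, k):
    """Relative frequencies of overlapping K-strings over all sequences.

    `proteins` is an iterable of amino acid strings (a toy "proteome").
    """
    counts = Counter()
    for seq in proteins:
        # Slide a window of width k along each protein sequence.
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {s: c / total for s, c in counts.items()}


def cosine_dissimilarity(f, g):
    """Illustrative dissimilarity (1 - cos) / 2 between two frequency maps.

    ASSUMPTION: chosen for this sketch, not taken from the abstract.
    """
    dot = sum(v * g.get(s, 0.0) for s, v in f.items())
    norm_f = sqrt(sum(v * v for v in f.values()))
    norm_g = sqrt(sum(v * v for v in g.values()))
    if norm_f == 0 or norm_g == 0:
        raise ValueError("empty composition vector")
    return (1.0 - dot / (norm_f * norm_g)) / 2.0


# Hypothetical toy proteomes; real input would be complete proteomes.
proteome_a = ["MKVLAAGIV", "MKVLSAGIV"]
proteome_b = ["MSTNPKPQR", "MKVLAAGLV"]
d = cosine_dissimilarity(
    kstring_frequencies(proteome_a, 3),
    kstring_frequencies(proteome_b, 3),
)
```

A matrix of such pairwise values for all organisms could then be handed to a distance-based tree-building program; larger K gives longer, more specific strings, which is the sense in which K tunes the resolution.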
References
B Alberts (1994) Molecular biology of the cell, 3rd ed. Garland New York 121
L Aravind RL Tatusov YI Wolf DR Walker EV Koonin et al. (1998) Evidence for massive gene exchange between archaeal and bacterial hyperthermophiles. Trends Genet 14 442–444 doi:10.1016/S0168-9525(98)01553-4 PMID:9825671
SL Baldauf JD Palmer WF Doolittle (1996) The root of the universal tree and the origin of eukaryotes based on elongation factor phylogeny. Proc Natl Acad Sci USA 93 7749–7754 doi:10.1073/pnas.93.15.7749 PMID:8755547
DA Benson I Karsch-Mizrachi DJ Lipman J Ostell DL Wheeler (2003) GenBank. Nucleic Acids Res 31 23–27 doi:10.1093/nar/gkg057 PMID:12519940
Bergey’s Manual Trust (1984–1989) Bergey’s manual of systematic bacteriology, 1st ed, Vols 1–4. Williams & Wilkins Baltimore
Bergey’s Manual Trust (2001) Bergey’s manual of systematic bacteriology, 2nd ed, Vol 1. Springer-Verlag New York
V Brendel JS Beckmann EN Trifonov (1986) Linguistics of nucleotide sequences: Morphology and comparison of vocabularies. J Biomol Struct Dyn 4 11–21 PMID:3078230
K Chu J Qi Z Yu VO Anh (2003) Origin and phylogeny of chloroplasts: A simple correlation analysis of complete genomes. Mol Biol Evol doi:10.1093/molbev/msg021 PMID:12598686
V Daubin M Gouy G Perriere (2001) Bacterial molecular phylogeny using supertree approach. Genome Inform 12 155–164
WF Doolittle (1999) Phylogenetic classification and the universal tree. Science 284 2124–2128 PMID:10381871
WF Doolittle (2000) Uprooting the tree of life. Sci Am February 90–95
J Felsenstein (1993) PHYLIP (phylogeny inference package), version 3.5c. Distributed by the author at http://evolution.genetics.washington.edu/phylip.html
ST Fitz-Gibbon CH House (1999) Whole genome-based phylogenetic analysis of free-living microorganisms. Nucleic Acids Res 27 4218–4222 doi:10.1093/nar/27.21.4218 PMID:10518613
GM Garrity M Winters DB Searles (2001) Taxonomic outline of the prokaryotic genera. In: Bergey’s manual of systematic bacteriology, 2nd ed, Rel 1.0 (Available at http://www.cme.msu.edu/bergeys/april2001-genus.pdf)
BL Hao HM Xie SY Zhang (2001) Compositional representation of protein sequences and the number of Eulerian loops. (Available as arXiv:physics/0103028 at http://lanl.arXiv.org/)
R Hu B Wang (2001) Statistically significant strings are related to regulatory elements in the promoter region of Saccharomyces cerevisiae. Physica A 290 464–474 doi:10.1016/S0378-4371(00)00488-X
MA Huynen B Snel P Bork (1999) Lateral gene transfer, genome surveys, and the phylogeny of prokaryotes. Science 286 1441 doi:10.1126/science.286.5444.1443a
S Karlin C Burge (1995) Dinucleotide relative abundance extremes: A genomic signature. Trends Genet 11 283–290 PMID:7482779
LM Margulis KV Schwartz (1998) Five kingdoms, 3rd ed. WH Freeman New York 60
O Matte-Tailliez C Brochier P Forterre H Philippe (2002) Archaeal phylogeny based on ribosomal proteins. Mol Biol Evol 19 631–639 PMID:11961097
RGE Murray (1989) The higher taxa, or, a place for everything…? ST Williams ME Sharpe JG Holt (Eds) Bergey’s manual of systematic bacteriology, Vol 4. Williams and Wilkins Baltimore 2329–2332
E Pennisi (1998) Genome data shake tree of life. Science 280 672–674 doi:10.1126/science.280.5364.672 PMID:9599142
E Pennisi (1999) Is it time to uproot the tree of life? Science 284 1305–1308 doi:10.1126/science.284.5418.1305 PMID:10383313
MA Ragan (2001) Detection of lateral gene transfer among microbial genomes. Curr Opin Genet Dev 11 620–626 doi:10.1016/S0959-437X(00)00244-6
N Saitou M Nei (1987) The neighbor-joining method: A new method for reconstructing phylogenetic trees. Mol Biol Evol 4 406–425 PMID:3447015
B Snel P Bork MA Huynen (1999) Genome phylogeny based on gene content. Nature Genet 21 108–110 doi:10.1038/5052 PMID:9916801
F Tekaia A Lazcano B Dujon (1999) The genomic tree as revealed from whole genome proteome comparisons. Genome Res 9 550–557 PMID:10400922
JF Tomb O White AR Kerlavage et al. (1997) The complete genome sequence of the gastric pathogen Helicobacter pylori. Nature 388 539–547 PMID:9252185
DL Wheeler DM Church S Federhen et al. (2003) Database resources of the National Center for Biotechnology. Nucleic Acids Res 31 28–33 doi:10.1093/nar/gkg033 PMID:12519941
CR Woese (1998) The universal ancestor. Proc Natl Acad Sci USA 95 6854–6859 PMID:9618502
CR Woese (2000) Interpreting the universal phylogenetic tree. Proc Natl Acad Sci USA 97 8392–8396 PMID:10900003
CR Woese GE Fox (1977) Phylogenetic structure of the prokaryotic domain: The primary kingdoms. Proc Natl Acad Sci USA 74 5088–5090 PMID:270744
YI Wolf IB Rogozin NV Grishin RL Tatusov EV Koonin (2001) Genome trees constructed using five different approaches suggest new major bacterial clades. BMC Evol Biol 1 8 doi:10.1186/1471-2148-1-8 PMID:11734060
E Zuckerkandl L Pauling (1965) Evolutionary divergence and convergence in proteins. V Bryson HJ Vogel (Eds) Evolving genes and proteins. Academic Press New York 97–166
Acknowledgements
The authors thank Drs. Yang Zhong (Fudan University) and Hongya Gu (Peking University) for discussion and comments. We also thank an anonymous referee who pointed out the problem with small genomes. The use of the 64-CPU IBM cluster at Peking University is also gratefully acknowledged. This work was supported in part by grants from the Special Funds for Major State Basic Research Project of China, the Innovation Project of CAS, and Major Innovation Research Project “248” of Beijing Municipality.
Appendix
The list of all prokaryotic genomes used in our study is given in Tables A1 and A2. The species are listed in accordance with their “Bergey Code” in order to make comparison of the trees with Bergey’s Manual easier. The Bergey Code is a shorthand for the classification given in the 2001 edition of Bergey’s Manual of Systematic Bacteriology (Garrity et al. 2001). For example, Lactococcus lactis is listed under Phylum BXIII (Firmicutes), Class III (Bacilli), Order II (Lactobacillales), Family VI (Streptococcaceae), Genus II (Lactococcus). We changed all Roman numerals to Arabic and wrote the lineage as B13.3.2.6.2, dropping the taxonomic units and the Latin names.
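The conversion just described can be sketched in a few lines of Python. The function names and the input layout (a list of rank labels from phylum to genus, with the phylum label carrying its leading letter as in “BXIII”) are assumptions made for this illustration, not part of the original tables.

```python
ROMAN_VALUES = {"I": 1, "V": 5, "X": 10, "L": 50, "C": 100}


def roman_to_int(numeral):
    """Convert a Roman numeral such as 'XIII' to an integer."""
    values = [ROMAN_VALUES[ch] for ch in numeral.upper()]
    total = 0
    for i, v in enumerate(values):
        # A smaller value before a larger one is subtracted (e.g. IV = 4).
        if i + 1 < len(values) and values[i + 1] > v:
            total -= v
        else:
            total += v
    return total


def bergey_code(labels):
    """Build a Bergey Code from rank labels ordered phylum to genus.

    ASSUMPTION: the first label is the phylum with its leading letter
    (e.g. 'BXIII'); the remaining labels are bare Roman numerals.
    """
    prefix, phylum = labels[0][0], labels[0][1:]
    numbers = [roman_to_int(phylum)] + [roman_to_int(r) for r in labels[1:]]
    return prefix + ".".join(str(n) for n in numbers)


# Lactococcus lactis: Phylum BXIII, Class III, Order II, Family VI, Genus II
code = bergey_code(["BXIII", "III", "II", "VI", "II"])  # "B13.3.2.6.2"
```

Writing the lineage this way keeps the full rank path in one sortable string, which is what makes the listing order line up with the Manual.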
The six eukaryotes included are Saccharomyces cerevisiae (Yeast; NC_001133–48), Caenorhabditis elegans (worm; NC_003279–84), Arabidopsis thaliana (Arath; NC_003070.71.74.75.76), Encephalitozoon cuniculi (Enccu; NC_003242.29–38), Plasmodium falciparum (Plafa; NC_000521.910.4314–18.25–31), and Schizosaccharomyces pombe (Schpo; NC_003421.23.24).
Cite this article
Qi, J., Wang, B. & Hao, BI. Whole Proteome Prokaryote Phylogeny Without Sequence Alignment: A K-String Composition Approach. J Mol Evol 58, 1–11 (2004). https://doi.org/10.1007/s00239-003-2493-7
DOI: https://doi.org/10.1007/s00239-003-2493-7