Journal of Molecular Evolution, Volume 58, Issue 1, pp 1–11

Whole Proteome Prokaryote Phylogeny Without Sequence Alignment: A K-String Composition Approach


DOI: 10.1007/s00239-003-2493-7

Cite this article as:
Qi, J., Wang, B. & Hao, BI. J Mol Evol (2004) 58: 1. doi:10.1007/s00239-003-2493-7


A systematic way of inferring the evolutionary relatedness of microbial organisms from their oligopeptide content, i.e., the frequency of amino acid K-strings in their complete proteomes, is proposed. The new method circumvents the ambiguity of choosing genes for phylogenetic reconstruction and avoids the need to align sequences of essentially different length and gene content. The only “parameter” in the method is the length K of the oligopeptides, which serves to tune the “resolution power” of the method. The topology of the trees converges as K increases. Applied to a total of 109 organisms, including 16 Archaea, 87 Bacteria, and 6 Eukarya, the method yields an unrooted tree that agrees with the biologists’ “tree of life” based on SSU rRNA comparison in a majority of basic branchings and, especially, in all lower taxa.
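To make the idea of K-string composition concrete, the following is a minimal sketch in Python, not the authors' implementation. It counts overlapping K-strings across a proteome and compares two proteomes with a cosine-based distance. The function names `kstring_frequencies` and `composition_distance` are hypothetical, and the distance formula is an assumption chosen for illustration: the abstract does not spell out how the compositional distance is computed, and the published method may include further steps (such as correcting for background frequencies) that are omitted here.

```python
from collections import Counter
from math import sqrt


def kstring_frequencies(proteins, k):
    """Return the relative frequency of each overlapping K-string.

    `proteins` is a list of amino acid sequences (one proteome);
    `k` is the oligopeptide length, the method's only parameter.
    """
    counts = Counter()
    for seq in proteins:
        # Slide a window of width k along each protein sequence.
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {s: c / total for s, c in counts.items()}


def composition_distance(freq_a, freq_b):
    """Illustrative distance: (1 - cosine similarity) / 2.

    This formula is an assumption for the sketch, not taken from the
    abstract. For nonnegative frequency vectors it lies in [0, 0.5].
    """
    dot = sum(v * freq_b.get(s, 0.0) for s, v in freq_a.items())
    norm_a = sqrt(sum(v * v for v in freq_a.values()))
    norm_b = sqrt(sum(v * v for v in freq_b.values()))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("empty composition vector; is k longer than every sequence?")
    return (1.0 - dot / (norm_a * norm_b)) / 2.0


if __name__ == "__main__":
    # Toy "proteomes" made up for illustration only.
    proteome_a = ["MKVLAAGIV", "MKVLSAGIV"]
    proteome_b = ["MKVLAAGLV", "MRVLSAGIV"]
    fa = kstring_frequencies(proteome_a, k=3)
    fb = kstring_frequencies(proteome_b, k=3)
    print(composition_distance(fa, fb))
```

A pairwise matrix of such distances over all organisms could then be passed to a standard distance-based tree-building routine. Increasing `k` makes the composition vectors sparser and more organism-specific, which is one way to read the "resolution power" mentioned above.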


Keywords: Prokaryote, Phylogeny, Archaea, K-strings, Compositional distance, Tree of life

Copyright information

© Springer-Verlag New York Inc. 2004

Authors and Affiliations

  1. The Institute of Theoretical Physics, Academia Sinica, Beijing 100080, China
  2. The T-Life Research Center, Fudan University, Shanghai 200433, China
