Journal of Biological Physics, Volume 28, Issue 3, pp 439–447

Phylogeny Based on Whole Genome as inferred from Complete Information Set Analysis

  • W. Li
  • W. Fang
  • L. Ling
  • J. Wang
  • Z. Xuan
  • R. Chen


Previous molecular phylogeny algorithms rely mainly on multi-sequence alignments of carefully selected characteristic sequences and are thus not directly applicable to whole-genome phylogeny, where events such as rearrangements make full-length alignments impossible. We introduce here the concept of the Complete Information Set (CIS) and its implementation as a measure of evolutionary distance without reference to sequence size. As a proof-of-method test, the 16S rRNA sequences of 22 completely sequenced Bacteria and Archaea species are used to reconstruct a phylogenetic tree, which is generally consistent with the commonly accepted one. Applied to whole genomes, the method yields a highly robust whole-genome phylogenetic tree that supports separate monophyletic clusters of species with similar phenotypes, as well as the early evolution of thermophilic Bacteria and the late divergence of Eukarya. The purpose of this work is not to contradict or confirm previous phylogeny standards but rather to bring a new algorithm and tool to the phylogeny research community. The software for estimating sequence distance and the materials used in this study are available upon request from the corresponding author.

comparative genomics, information discrepancy, molecular evolution, sequence analysis
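
The abstract does not define the CIS measure itself, so the following is only a rough illustration of the general idea of an alignment-free sequence distance that does not depend on sequence length. It compares normalized k-mer frequency profiles using the Jensen-Shannon divergence. This is a stand-in technique chosen for illustration, not the authors' information-discrepancy measure; the function names and the default `max_k=3` are assumptions.

```python
from collections import Counter
from math import log2


def kmer_profile(seq, k):
    """Relative frequencies of the overlapping k-mers in seq."""
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence is shorter than k")
    # Normalizing by the total removes any dependence on sequence length.
    return {kmer: n / total for kmer, n in counts.items()}


def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2) between two frequency dicts."""
    div = 0.0
    for key in set(p) | set(q):
        pi, qi = p.get(key, 0.0), q.get(key, 0.0)
        mi = (pi + qi) / 2  # mixture of the two profiles
        if pi > 0:
            div += 0.5 * pi * log2(pi / mi)
        if qi > 0:
            div += 0.5 * qi * log2(qi / mi)
    return div


def profile_distance(a, b, max_k=3):
    """Mean JS divergence over k = 1..max_k (hypothetical stand-in, not CIS)."""
    total = sum(
        js_divergence(kmer_profile(a, k), kmer_profile(b, k))
        for k in range(1, max_k + 1)
    )
    return total / max_k
```

Computing `profile_distance` for every pair of sequences gives a distance matrix that could then be passed to a distance-based tree-building method. Because each profile is normalized, two sequences of very different lengths can be compared directly, which is the property the abstract emphasizes.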





Copyright information

© Kluwer Academic Publishers 2002

Authors and Affiliations

  • W. Li (1)
  • W. Fang (2)
  • L. Ling (1)
  • J. Wang (1)
  • Z. Xuan (1)
  • R. Chen (1)

  1. Laboratory of Bioinformatics, Institute of Biophysics, Chinese Academy of Sciences, Beijing, China
  2. Academy of Mathematical and Systemic Sciences, Chinese Academy of Sciences, Beijing, China
