Fractal and Dynamical Language Methods to Construct Phylogenetic Tree Based on Protein Sequences from Complete Genomes

  • Zu-Guo Yu
  • Vo Anh
  • Li-Quan Zhou
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3612)


Complete genomes of living organisms carry a great deal of information about their phylogenetic relationships. Over the past few years we have proposed three alternative methods to model the noise background in the composition vector of protein sequences from a complete genome. The first is based on the frequencies of the 20 amino acids in the genome and a multiplicative model. The second is based on the iterated function system model from fractal geometry. The third is based on the relationship between a word and its two sub-words in the theory of symbolic dynamics. We introduce these methods here and test them on the complete genomes of selected prokaryotes and eukaryotes. The resulting distance-based phylogenetic tree of prokaryotes and eukaryotes agrees with the biologists’ “tree of life” based on 16S-like rRNA genes in a majority of basic branches and most lower taxa.
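
The abstract gives no formulas, so the following is only a minimal sketch of how the first method could look. It assumes that the noise background of a length-k word is the product of the single-residue frequencies (a multiplicative model), that each component of the composition vector is the relative deviation observed/expected - 1, and that the distance between two genomes is (1 - C)/2, where C is the cosine correlation of their vectors. The function names, the handling of non-standard residues, and the choice of k are hypothetical and chosen for illustration.

```python
from itertools import product
from math import sqrt

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def word_frequencies(proteins, k):
    """Relative frequency of each length-k word made only of standard residues."""
    counts = {}
    for seq in proteins:
        for i in range(len(seq) - k + 1):
            word = seq[i:i + k]
            # Assumption: words containing ambiguous residues (X, B, ...) are skipped.
            if all(ch in AMINO_ACIDS for ch in word):
                counts[word] = counts.get(word, 0) + 1
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()} if total else {}


def composition_vector(proteins, k):
    """Noise-subtracted composition vector over all 20**k words (small k only)."""
    observed = word_frequencies(proteins, k)
    single = word_frequencies(proteins, 1)
    vector = {}
    for letters in product(AMINO_ACIDS, repeat=k):
        word = "".join(letters)
        # Assumed multiplicative background: product of single-residue frequencies.
        expected = 1.0
        for ch in letters:
            expected *= single.get(ch, 0.0)
        if expected > 0.0:
            vector[word] = observed.get(word, 0.0) / expected - 1.0
        else:
            vector[word] = 0.0  # background undefined, component set to zero
    return vector


def correlation_distance(u, v):
    """Assumed distance (1 - C) / 2, with C the cosine correlation of u and v."""
    dot = sum(u[w] * v[w] for w in u)
    norm_u = sqrt(sum(x * x for x in u.values()))
    norm_v = sqrt(sum(x * x for x in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.5  # assumption: treat the correlation as 0 for an all-zero vector
    return (1.0 - dot / (norm_u * norm_v)) / 2.0


if __name__ == "__main__":
    # Toy "genomes": each is a list of protein sequences (made-up data).
    genome_a = ["MKVLAAGIVGL", "MKKLLAAG"]
    genome_b = ["MSTNPKPQRK", "MKVLAAGG"]
    va = composition_vector(genome_a, 2)
    vb = composition_vector(genome_b, 2)
    print(correlation_distance(va, vb))
```

Computing this distance for every pair of genomes yields a distance matrix, which is the kind of input a distance-based tree-building method expects. The dense loop over all 20**k words keeps the sketch simple but is only practical for small k.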


Keywords: Protein Sequence; Complete Genome; Iterated Function System; Multiplicative Model; Simple Correlation Analysis
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2005

Authors and Affiliations

  • Zu-Guo Yu (1, 2)
  • Vo Anh (1)
  • Li-Quan Zhou (2)

  1. Program in Statistics and Operations Research, Queensland University of Technology, Brisbane, Australia
  2. School of Mathematics and Computing Science, Xiangtan University, Hunan, China
