Journal of Molecular Evolution, Volume 64, Issue 1, pp 90–100

Application of the Character Compatibility Approach to Generalized Molecular Sequence Data: Branching Order of the Proteobacterial Subdivisions

  • Radhey S. Gupta
  • Peter H. A. Sneath


The character compatibility approach, which removes all homoplasic characters and involves finding the largest clique of compatible characters in a dataset, in principle provides a powerful means of obtaining the correct topology in difficult-to-resolve cases. However, the usefulness of this approach for phylogeny determination from generalized molecular sequence data has not previously been studied. We have used this approach to determine the topology of 23 proteobacterial species (6 each of α-, β- and γ-, 3 δ-, and 2 ε-proteobacteria) using sequence data for 10 conserved proteins (Hsp60, Hsp70, EF-Tu, EF-G, alanyl-tRNA synthetase, RecA, GyrA, GyrB, RpoB and RpoC). All sites in the sequence alignments of these proteins where only two amino acids were found, with each amino acid present in at least two species, were selected. Mutual compatibility of these binary-state sites was determined in two ways. In the first, all of these sites were combined into a large dataset (Set A; 957 characters) prior to compatibility analysis. In the second, compatibility analysis was carried out on the characters from each individual protein, and all compatible sites were then combined into a large dataset (Set B; 398 characters) for further study. Upon compatibility analysis, the largest cliques obtained from Sets A and B consisted of 337 and 323 compatible characters, respectively. In these cliques, all proteobacterial subgroups were clearly distinguished and the branching orders of most of the species were also resolved. The ε-proteobacteria exhibited the earliest branching, whereas the β- and γ-subgroups were found to have emerged last. The relative placement of the α- and δ-subgroups, however, was not resolved. The topology of these species was also determined from 16S rRNA sequences and from a concatenated dataset of sequences for all 10 proteins by means of the neighbor-joining, maximum likelihood, and maximum parsimony methods. In the protein trees, all proteobacterial groups were reliably resolved and they branched in the following order: (ε(δ(α(β,γ)))). However, in the rRNA trees, the γ- and β-subgroups exhibited polyphyletic branching and many internal nodes were not resolved. These results indicate that character compatibility analysis using generalized molecular sequence data provides a powerful means for evolutionary studies. Based on molecular sequences, it should be possible to obtain very large datasets of compatible characters, which should prove very helpful in clarifying difficult-to-resolve phylogenetic relationships.
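The site selection and clique search described above can be illustrated with a minimal sketch. It is not the DUALSITE or HARMONY programs used in the study; all function names and the toy alignment are hypothetical. It assumes an ungapped alignment stored as a dict of equal-length strings, treats two binary-state sites as compatible when at most three of the four possible state combinations are observed across the species, and finds the largest clique with a Bron–Kerbosch search with pivoting.

```python
from itertools import combinations


def informative_binary_sites(alignment):
    """Columns with exactly two residues, each present in at least two species."""
    seqs = list(alignment.values())
    sites = []
    for col in range(len(seqs[0])):
        counts = {}
        for seq in seqs:
            counts[seq[col]] = counts.get(seq[col], 0) + 1
        if len(counts) == 2 and min(counts.values()) >= 2:
            sites.append(col)
    return sites


def compatible(alignment, i, j):
    """Two binary sites conflict only if all four state combinations occur."""
    combos = {(seq[i], seq[j]) for seq in alignment.values()}
    return len(combos) <= 3


def largest_clique(nodes, adj):
    """Largest clique via Bron-Kerbosch with pivoting (exponential worst case)."""
    best = []

    def expand(r, p, x):
        nonlocal best
        if not p and not x:
            if len(r) > len(best):
                best = sorted(r)
            return
        # Pivot on the vertex with the most neighbours among the candidates.
        pivot = max(p | x, key=lambda v: len(adj[v] & p))
        for v in list(p - adj[pivot]):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(nodes), set())
    return best


def largest_compatible_clique(alignment):
    """Select binary-state sites, build the compatibility graph, return the clique."""
    sites = informative_binary_sites(alignment)
    adj = {s: set() for s in sites}
    for i, j in combinations(sites, 2):
        if compatible(alignment, i, j):
            adj[i].add(j)
            adj[j].add(i)
    return largest_clique(sites, adj)


# Hypothetical six-species toy alignment (not data from the study).
toy_alignment = {
    "sp1": "ALSKDP",
    "sp2": "ALTKDQ",
    "sp3": "AVSKDR",
    "sp4": "GVTKDP",
    "sp5": "GVSKEQ",
    "sp6": "GVTRER",
}

print(informative_binary_sites(toy_alignment))   # selected column indices
print(largest_compatible_clique(toy_alignment))  # columns in the largest clique
```

In the toy alignment, column 3 is rejected because one of its two residues occurs in a single species, and column 5 is rejected because it has three states. Column 2 conflicts with every other selected column, so it falls outside the largest clique. For datasets the size of Set A, an exhaustive clique search of this kind may be slow, since the worst-case running time grows exponentially with the number of characters.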


Character compatibility approach, Clique analysis, Compatibility analysis, Molecular sequences, Branching order, Proteobacteria, Proteobacterial subdivisions



We thank Yan Li for writing the computer algorithms for the DUALSITE and HARMONY programs. The work from R.S.G.’s lab, including support for Yan Li, was funded by a grant from the Natural Sciences and Engineering Research Council of Canada.



Copyright information

© Springer Science+Business Media, Inc. 2006

Authors and Affiliations

  1. Department of Biochemistry and Biomedical Sciences, McMaster University, Hamilton, Canada
  2. Department of Infection, Immunity and Inflammation, University of Leicester, Leicester, England
