Compositional statistics: An improvement of evolutionary parsimony and its application to deep branches in the tree of life

Journal of Molecular Evolution

Summary

We present compositional statistics, a new method of phylogenetic inference, which is an extension of evolutionary parsimony. Compositional statistics takes account of the base composition of the compared sequences by using nucleotide positions that evolutionary parsimony ignores. It shares with evolutionary parsimony the features of rate invariance and the fundamental distinction between transitions and transversions. Of the presently available methods of phylogenetic inference, compositional statistics is based on the fewest and mildest assumptions about the mode of DNA sequence evolution. It is therefore applicable to phylogenetic studies of the most distantly related organisms or molecules. This was illustrated by analyzing conservative positions in the DNA sequences of the large subunit of RNA polymerase from three archaebacterial groups, a eubacterium, a chloroplast, and the three eukaryotic polymerases. Internally consistent results, which are in accord with our knowledge of organelle origin and archaebacterial physiology, were achieved.
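Two notions named in the summary, the base composition of a sequence and the distinction between transitions and transversions, can be made concrete with a short sketch. This is not the authors' compositional statistics or evolutionary parsimony procedure, which the summary does not specify. The function names and the toy sequences are hypothetical and chosen only for illustration. The sketch assumes pre-aligned DNA sequences and skips any position holding a gap or an ambiguous base.

```python
from collections import Counter

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}
BASES = PURINES | PYRIMIDINES


def base_composition(seq):
    """Return the fraction of A, C, G and T among the unambiguous bases of seq."""
    counts = Counter(b for b in seq.upper() if b in BASES)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no unambiguous bases")
    return {b: counts[b] / total for b in "ACGT"}


def classify_difference(x, y):
    """Classify two aligned bases as 'identical', 'transition' or 'transversion'."""
    x, y = x.upper(), y.upper()
    if x not in BASES or y not in BASES:
        raise ValueError("both arguments must be one of A, C, G, T")
    if x == y:
        return "identical"
    # A transition stays within a chemical class (purine <-> purine or
    # pyrimidine <-> pyrimidine); a transversion crosses classes.
    same_class = (x in PURINES) == (y in PURINES)
    return "transition" if same_class else "transversion"


def count_differences(seq_a, seq_b):
    """Tally identical, transition and transversion positions in two aligned sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    tally = Counter()
    for x, y in zip(seq_a.upper(), seq_b.upper()):
        if x in BASES and y in BASES:  # skip gaps and ambiguity codes
            tally[classify_difference(x, y)] += 1
    return tally


if __name__ == "__main__":
    # Toy sequences, invented for illustration only.
    print(base_composition("AACG"))
    print(count_differences("ACGT", "GCTT"))
```

Keeping the purine and pyrimidine sets explicit makes the transition test a single class comparison, and the same sets define which characters count as valid bases.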



About this article

Cite this article

Sidow, A., Wilson, A.C. Compositional statistics: An improvement of evolutionary parsimony and its application to deep branches in the tree of life. J Mol Evol 31, 51–68 (1990). https://doi.org/10.1007/BF02101792

  • DOI: https://doi.org/10.1007/BF02101792
