Journal of Molecular Evolution

Volume 36, Issue 2, pp 182–198

Statistical tests of models of DNA substitution

  • Nick Goldman


Penny et al. have written that “The most fundamental criterion for a scientific method is that the data must, in principle, be able to reject the model. Hardly any [phylogenetic] tree-reconstruction methods meet this simple requirement.” The ability to reject models matters because the results of all phylogenetic analyses depend on their underlying models: to have confidence in the inferences, it is necessary to have confidence in the models. In this paper, a test statistic suggested by Cox is employed to test the adequacy of some statistical models of DNA sequence evolution used in the phylogenetic inference method introduced by Felsenstein. Monte Carlo simulations are used to assess significance levels. The resulting statistical tests provide an objective and very general assessment of all the components of a DNA substitution model; more specific versions of the test are devised to test individual components of a model. In all cases, the new analyses have the additional advantage that values of phylogenetic parameters do not have to be assumed in order to perform the tests.
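The general recipe described in the abstract (fit the model, compute a likelihood-based statistic, then calibrate that statistic by Monte Carlo simulation from the fitted model) can be illustrated with a deliberately small sketch. It rests on several assumptions that are not taken from the paper: only two aligned sequences instead of a tree, the Jukes–Cantor (JC69) model as the hypothesis under test, and an unconstrained multinomial model over the 16 possible site patterns as the alternative. The statistic δ is the difference between the two maximized log-likelihoods, and the p-value is the rank-based Monte Carlo estimate (cf. Hope 1968). All function names are hypothetical; this is not the paper's implementation.

```python
# Illustrative sketch only (assumptions, not the paper's code): two sequences,
# JC69 as the model under test, unconstrained multinomial over site patterns
# as the alternative, Monte Carlo calibration from the fitted JC69 model.
import math
import random
from collections import Counter

BASES = "ACGT"


def _xlogy(x, y):
    """Return x * log(y), with the convention 0 * log(0) = 0."""
    return 0.0 if x == 0 else x * math.log(y)


def _fitted_p(seq1, seq2):
    """Count differing sites k and the JC69 ML proportion p (capped at 3/4)."""
    k = sum(a != b for a, b in zip(seq1, seq2))
    return k, min(k / len(seq1), 0.75)


def jc69_max_loglik(seq1, seq2):
    """Maximized log-likelihood of an aligned pair under JC69 (equal base freqs)."""
    n = len(seq1)
    k, p = _fitted_p(seq1, seq2)
    # Each of the 4 identical site patterns has probability (1 - p) / 4;
    # each of the 12 differing patterns has probability p / 12.
    return _xlogy(n - k, (1 - p) / 4) + _xlogy(k, p / 12)


def unconstrained_max_loglik(seq1, seq2):
    """Maximized log-likelihood of the multinomial model over the 16 site patterns."""
    n = len(seq1)
    counts = Counter(zip(seq1, seq2))
    return sum(c * math.log(c / n) for c in counts.values())


def delta(seq1, seq2):
    """Log-likelihood difference: unconstrained model minus fitted JC69 model."""
    return unconstrained_max_loglik(seq1, seq2) - jc69_max_loglik(seq1, seq2)


def simulate_jc69_pair(n, p, rng):
    """Simulate n aligned sites under JC69 with proportion of differing sites p."""
    s1, s2 = [], []
    for _ in range(n):
        a = rng.choice(BASES)
        # With probability p the second base differs, uniformly among the other 3.
        b = rng.choice([x for x in BASES if x != a]) if rng.random() < p else a
        s1.append(a)
        s2.append(b)
    return "".join(s1), "".join(s2)


def monte_carlo_test(seq1, seq2, n_sim=199, seed=1):
    """Return (observed delta, Monte Carlo p-value) for the JC69 fit."""
    rng = random.Random(seed)
    observed = delta(seq1, seq2)
    _, p_hat = _fitted_p(seq1, seq2)  # parameter estimated, not assumed
    exceed = 0
    for _ in range(n_sim):
        sim1, sim2 = simulate_jc69_pair(len(seq1), p_hat, rng)
        if delta(sim1, sim2) >= observed:
            exceed += 1
    # The observed value counts as one of n_sim + 1 exchangeable ranks.
    return observed, (exceed + 1) / (n_sim + 1)


if __name__ == "__main__":
    # A pair with extreme base composition, which JC69 cannot accommodate.
    obs, p = monte_carlo_test("A" * 200, "A" * 200, n_sim=199)
    print(f"delta = {obs:.2f}, p = {p:.3f}")
```

For the all-`A` pair, no simulated dataset comes close to the observed δ, so the p-value is 1/200, the smallest attainable with 199 simulations. The distance parameter is estimated from the data and plugged into the simulations, which mirrors the abstract's point that parameter values need not be assumed in advance.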

Key words

Phylogenetic inference · Maximum likelihood inference · Evolutionary models · Statistical testing · Hypothesis testing · Molecular clock




References

  1. Atkinson AC (1970) A method for discriminating between models. J R Statist Soc B 32:323–345
  2. Avery PJ (1987) The analysis of intron data and their use in the detection of short signals. J Mol Evol 26:335–340
  3. Bailey WJ, Fitch DFA, Tagle DA, Czelusniak J (1991) Molecular evolution of the ψη-globin gene locus: gibbon phylogeny and the hominoid slowdown. Mol Biol Evol 8:155–184
  4. Bartlett MS (1963) The spectral analysis of point processes. J R Statist Soc B 25:264–296
  5. Bishop MJ, Friday AE (1985) Evolutionary trees from nucleic acid and protein sequences. Proc R Soc Lond B 226:271–302
  6. Bross ID (1990) How to eradicate fraudulent statistical methods: statisticians must do science. Biometrics 46:1213–1225
  7. Bulmer M (1987) A statistical analysis of nucleotide sequences in introns and exons in human genes. Mol Biol Evol 4:395–405
  8. Bulmer M (1989) Estimating the variability of substitution rates. Genetics 123:615–619
  9. Cavender JA (1989) Mechanized derivation of linear invariants. Mol Biol Evol 6:301–316
  10. Churchill GA (1989) Stochastic models for heterogeneous DNA sequences. Bull Math Biol 51:79–94
  11. Cox DR (1961) Tests of separate families of hypotheses. Proceedings of the 4th Berkeley Symposium (University of California Press) 1:105–123
  12. Cox DR (1962) Further results on tests of separate families of hypotheses. J R Statist Soc B 24:406–424
  13. Cox DR, Miller HD (1977) The theory of stochastic processes. Chapman and Hall, London, pp 146–198
  14. Dams E, Hendriks L, Van de Peer Y, Neefs JM, Smits G, Vanderbempt I, de Wachter R (1988) Compilation of small subunit RNA subsequences. Nucl Acids Res 16:r87–r174
  15. Edwards AWF (1972) Likelihood. Cambridge University Press, Cambridge, pp 31, 70–102
  16. Efron B (1982) The jackknife, the bootstrap and other resampling plans. Soc Ind Appl Math CBMS-Natl Sci Found Monogr 38
  17. Efron B, Gong G (1983) A leisurely look at the bootstrap, the jackknife, and cross-validation. Am Statistician 37:36–48
  18. Efron B, Tibshirani R (1986) Bootstrap methods for standard errors, confidence intervals, and other measures of statistical accuracy. Stat Sci 1:54–77
  19. Felsenstein J (1973) Maximum likelihood and minimum-steps methods for estimating evolutionary trees from data on discrete characters. Syst Zool 22:240–249
  20. Felsenstein J (1978) The number of evolutionary trees. Syst Zool 27:27–33
  21. Felsenstein J (1981) Evolutionary trees from DNA sequences: a maximum likelihood approach. J Mol Evol 17:368–376
  22. Felsenstein J (1983) Statistical inference of phylogenies. J R Statist Soc A 146:246–272
  23. Felsenstein J (1988) Phylogenies from molecular sequences: inference and reliability. Ann Rev Genet 22:521–565
  24. Felsenstein J (1991a) Counting phylogenetic invariants in some simple cases. J Theor Biol 152:357–376
  25. Felsenstein J (1991b) PHYLIP (Phylogenetic Inference Package) version 3.4, documentation. University of Washington, Seattle
  26. Gillespie JH (1986) Rates of molecular evolution. Ann Rev Ecol Syst 17:637–665
  27. Gillespie JH (1989) Lineage effects and the index of dispersion of molecular evolution. Mol Biol Evol 6:636–647
  28. Goldman N (1990) Maximum likelihood inference of phylogenetic trees, with special reference to a Poisson process model of DNA substitution and to parsimony analyses. Syst Zool 39:345–361
  29. Goldman N (1991) Statistical estimation of phylogenetic trees. PhD Thesis, University of Cambridge, Cambridge, pp 70–73
  30. Hall P, Wilson SR (1991) Two guidelines for bootstrap hypothesis testing. Biometrics 47:757–762
  31. Hasegawa M, Horai S (1991) Time of the deepest root for polymorphism in human mitochondrial DNA. J Mol Evol 32:37–42
  32. Hasegawa M, Iida Y, Yano T, Takaiwa F, Iwabuchi M (1985a) Phylogenetic relationships among eukaryotic kingdoms inferred from ribosomal RNA sequences. J Mol Evol 22:32–38
  33. Hasegawa M, Kishino H, Yano T (1985b) Dating of the human-ape splitting by a molecular clock of mitochondrial DNA. J Mol Evol 22:160–174
  34. Hasegawa M, Kishino H, Yano T (1987) Man's place in Hominoidea as inferred from molecular clocks of DNA. J Mol Evol 26:132–147
  35. Hasegawa M, Kishino H, Yano T (1988) Phylogenetic inference from DNA sequence data. In: Matusita K (ed) Statistical theory and data analysis II. Elsevier, Holland, pp 1–13
  36. Hasegawa M, Kishino H, Yano T (1989) Estimation of branching dates among primates by molecular clocks of nuclear DNA which slowed down in Hominoidea. J Hum Evol 18:461–476
  37. Hasegawa M, Kishino H, Hayasaka K, Horai S (1990) Mitochondrial DNA evolution in primates: transition rate has been extremely low in lemur. J Mol Evol 31:113–121
  38. Hasegawa M, Yano T, Kishino H (1984) A new molecular clock of mitochondrial DNA and the evolution of hominoids. Proc Jpn Acad B 60:95–98
  39. Holmes EC, Pesole G, Saccone C (1989) Stochastic models of molecular evolution and the estimation of phylogeny and rates of nucleotide substitution in the hominoid primates. J Hum Evol 18:775–794
  40. Hope ACA (1968) A simplified Monte Carlo significance test procedure. J R Statist Soc B 30:582–598
  41. Jukes TH, Cantor CR (1969) Evolution of protein molecules. In: Munro HN (ed) Mammalian protein metabolism, vol 3. Academic Press, New York, pp 21–132
  42. Kendall M, Stuart A (1979) The advanced theory of statistics, vol 2. 4th ed. Charles Griffin, London, pp 240–252
  43. Kimura M (1983) The neutral theory of molecular evolution. Cambridge University Press, Cambridge, pp 65–89
  44. Kishino H, Hasegawa M (1989) Evaluation of the maximum likelihood estimate of the evolutionary tree topologies from DNA sequence data, and the branching order in Hominoidea. J Mol Evol 29:170–179
  45. Kishino H, Hasegawa M (1990) Converting distance to time: application to human evolution. Meth Enz 183:550–570
  46. Koop BF, Goodman M, Xu P, Chan K, Slightom JL (1986) Primate eta-globin DNA sequences and man's place among the great apes. Nature 319:234–238
  47. Lake JA (1987) A rate-independent technique for analysis of nucleic acid sequences: evolutionary parsimony. Mol Biol Evol 4:167–191
  48. Lake JA (1988) Origin of the eukaryotic nucleus determined by rate-invariant analysis of rRNA sequences. Nature 331:184–186
  49. Lanave C, Preparata G, Saccone C, Serio G (1984) A new method for calculating evolutionary substitution rates. J Mol Evol 20:86–93
  50. Langley CH, Fitch WM (1974) An examination of the constancy of the rate of molecular evolution. J Mol Evol 3:161–177
  51. Li W-H, Gojobori T, Nei M (1981) Pseudogenes as a paradigm of neutral evolution. Nature 292:237–239
  52. Lindgren BW (1976) Statistical theory. 3rd ed. Macmillan, New York, pp 307–308, 331, 424
  53. Lindsay JK (1974a) Comparison of probability distributions. J R Statist Soc B 36:38–44
  54. Lindsay JK (1974b) Construction and comparison of statistical models. J R Statist Soc B 36:418–425
  55. Lockhart PJ, Penny D, Hendy MD, Howe CJ, Beanland TJ, Larkum AD (1992) Controversy on chloroplast origins. FEBS Lett 301:127–131
  56. Loh W-Y (1985) A new method for testing separate families of hypotheses. J Am Stat Assoc 80:362–368
  57. Maeda N, Wu CI, Bliska J, Reneke J (1988) Molecular evolution of intergenic DNA in higher primates: pattern of DNA changes, molecular clock, and evolution of repetitive sequences. Mol Biol Evol 5:1–20
  58. Marriott FHC (1979) Barnard's Monte Carlo tests: how many simulations? Appl Statist 28:75–77
  59. McCullagh P, Nelder JA (1989) Generalized linear models. 2nd ed. Chapman and Hall, London, pp 119, 174
  60. Navidi WC, Churchill GA, von Haeseler A (1991) Methods for inferring phylogenies from nucleic acid sequence data by using maximum likelihood and linear invariants. Mol Biol Evol 8:128–143
  61. Oliver JL, Marín A, Medina J-R (1989) SDSE: a software package to simulate the evolution of a pair of DNA sequences. CABIOS 5:47–50
  62. Penny D (1982) Towards a basis for classification: the incompleteness of distance measures, incompatibility analysis and phenetic classification. J Theor Biol 96:129–142
  63. Penny D, Hendy MD (1986) Estimating the reliability of evolutionary trees. Mol Biol Evol 3:403–417
  64. Penny D, Hendy MD, Steel MA (1992) Progress with methods for constructing evolutionary trees. TREE 7:73–79
  65. Pesole G, Bozzetti MP, Lanave C, Preparata G, Saccone C (1991) Glutamine synthetase gene evolution: a good molecular clock. Proc Natl Acad Sci USA 88:522–526
  66. Ripley BD (1987) Stochastic simulation. John Wiley and Sons, New York, pp 171–174, 176
  67. Ritland K, Clegg MT (1987) Evolutionary analysis of plant DNA sequences. Am Nat 130:S74–S100
  68. Rodríguez F, Oliver JL, Marín A, Medina JR (1990) The general stochastic model of nucleotide substitution. J Theor Biol 142:485–501
  69. Silvey SD (1975) Statistical inference. Chapman and Hall, London, pp 108–114
  70. Thorne JL, Kishino H, Felsenstein J (1991) An evolutionary model for maximum likelihood alignment of DNA sequences. J Mol Evol 33:114–124; Erratum, J Mol Evol (1992) 34:91
  71. Thorne JL, Kishino H, Felsenstein J (1992) Inching toward reality: an improved likelihood model of sequence evolution. J Mol Evol 34:3–16
  72. Williams DA (1970) Discrimination between regression models to determine the pattern of enzyme synthesis in synchronous cell cultures. Biometrics 26:23–32
  73. Wilson AC, Carlson SS, White TJ (1977) Biochemical evolution. Ann Rev Biochem 46:573–639

Copyright information

© Springer-Verlag New York Inc 1993

Authors and Affiliations

  • Nick Goldman, University Museum of Zoology, Department of Zoology, University of Cambridge, Cambridge, UK
