Statistical tests of models of DNA substitution
 Nick Goldman
Summary
Penny et al. have written that “The most fundamental criterion for a scientific method is that the data must, in principle, be able to reject the model. Hardly any [phylogenetic] tree-reconstruction methods meet this simple requirement.” The ability to reject models is of such great importance because the results of all phylogenetic analyses depend on their underlying models: to have confidence in the inferences, it is necessary to have confidence in the models. In this paper, a test statistic suggested by Cox is employed to test the adequacy of some statistical models of DNA sequence evolution used in the phylogenetic inference method introduced by Felsenstein. Monte Carlo simulations are used to assess significance levels. The resulting statistical tests provide an objective and very general assessment of all the components of a DNA substitution model; more specific versions of the test are devised to test individual components of a model. In all cases, the new analyses have the additional advantage that values of phylogenetic parameters do not have to be assumed in order to perform the tests.
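The Monte Carlo step mentioned in the summary can be illustrated with a deliberately small sketch. It is not the paper's implementation. It assumes only two aligned sequences, the Jukes-Cantor model, and no tree, and it uses the gap between an unconstrained multinomial log-likelihood and the fitted model's log-likelihood as the test statistic. All function names here are hypothetical. The point it shows is that the model parameter is re-estimated for every simulated data set, so no parameter value has to be assumed in advance.

```python
import math
import random
from collections import Counter

BASES = "ACGT"

def jc_pattern_probs(p_diff):
    """Probabilities of the 16 site patterns for two sequences under
    Jukes-Cantor, given the expected proportion of differing sites."""
    same = (1.0 - p_diff) / 4.0   # each of the 4 identical patterns
    diff = p_diff / 12.0          # each of the 12 differing patterns
    return {(a, b): (same if a == b else diff) for a in BASES for b in BASES}

def delta(patterns):
    """Unconstrained log-likelihood minus Jukes-Cantor log-likelihood.
    Returns the statistic and the fitted proportion of differences."""
    counts = Counter(patterns)
    n = len(patterns)
    # Unconstrained multinomial: each pattern gets its observed frequency.
    ll_free = sum(c * math.log(c / n) for c in counts.values())
    # Jukes-Cantor: the ML proportion of differences, capped at 3/4.
    n_diff = sum(c for (a, b), c in counts.items() if a != b)
    p_hat = min(n_diff / n, 0.75)
    probs = jc_pattern_probs(p_hat)
    ll_jc = sum(c * math.log(probs[pat]) for pat, c in counts.items())
    return ll_free - ll_jc, p_hat

def simulate(p_diff, n, rng):
    """Draw n site patterns from the fitted Jukes-Cantor model."""
    probs = jc_pattern_probs(p_diff)
    pats = list(probs)
    return rng.choices(pats, weights=[probs[p] for p in pats], k=n)

def monte_carlo_test(patterns, n_sims=199, seed=0):
    """Monte Carlo p-value: rank the observed statistic among statistics
    from data simulated under the fitted model, refitting each time."""
    rng = random.Random(seed)
    observed, p_hat = delta(patterns)
    n = len(patterns)
    sims = [delta(simulate(p_hat, n, rng))[0] for _ in range(n_sims)]
    exceed = sum(s >= observed for s in sims)
    return observed, (exceed + 1) / (n_sims + 1)

# Toy data with a strongly skewed base composition, which Jukes-Cantor
# (equal base frequencies) should fit poorly.
skewed = [("A", "A")] * 200 + [("A", "G")] * 20
stat, p_value = monte_carlo_test(skewed)
print(f"delta = {stat:.2f}, Monte Carlo p-value = {p_value:.3f}")
```

A small p-value means the observed data fit the model worse than nearly all data sets generated by the model itself. With 199 simulations the smallest attainable p-value is 1/200.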
 Atkinson AC (1970) A method for discriminating between models. J R Statist Soc B 32:323–345
 Avery PJ (1987) The analysis of intron data and their use in the detection of short signals. J Mol Evol 26:335–340
 Bailey WJ, Fitch DFA, Tagle DA, Czelusniak J (1991) Molecular evolution of the ψη-globin gene locus: gibbon phylogeny and the hominoid slowdown. Mol Biol Evol 8:155–184
 Bartlett MS (1963) The spectral analysis of point processes. J R Statist Soc B 25:264–296
 Bishop MJ, Friday AE (1985) Evolutionary trees from nucleic acid and protein sequences. Proc R Soc Lond B 226:271–302
 Bross ID (1990) How to eradicate fraudulent statistical methods: statisticians must do science. Biometrics 46:1213–1225
 Bulmer M (1987) A statistical analysis of nucleotide sequences in introns and exons in human genes. Mol Biol Evol 4:395–405
 Bulmer M (1989) Estimating the variability of substitution rates. Genetics 123:615–619
 Cavender JA (1989) Mechanized derivation of linear invariants. Mol Biol Evol 6:301–316
 Churchill GA (1989) Stochastic models for heterogeneous DNA sequences. Bull Math Biol 51:79–94
 Cox DR (1961) Tests of separate families of hypotheses. Proceedings of the 4th Berkeley Symposium (University of California Press) 1:105–123
 Cox DR (1962) Further results on tests of separate families of hypotheses. J R Statist Soc B 24:406–424
 Cox DR, Miller HD (1977) The theory of stochastic processes. Chapman and Hall, London, pp 146–198
 Dams E, Hendriks L, Van de Peer Y, Neefs JM, Smits G, Vanderbempt I, de Wachter R (1988) Compilation of small subunit RNA subsequences. Nucl Acids Res 16:r87–r174
 Edwards AWF (1972) Likelihood. Cambridge University Press, Cambridge, pp 31, 70–102
 Efron B (1982) The jackknife, the bootstrap and other resampling plans. Soc Ind Appl Math CBMS-Natl Sci Found Monogr 38
 Efron B, Gong G (1983) A leisurely look at the bootstrap, the jackknife, and cross-validation. Am Statistician 37:36–48
 Efron B, Tibshirani R (1986) Bootstrap methods for standard errors, confidence intervals, and other measures of statistical accuracy. Stat Sci 1:54–77
 Felsenstein J (1973) Maximum likelihood and minimum-steps methods for estimating evolutionary trees from data on discrete characters. Syst Zool 22:240–249
 Felsenstein J (1978) The number of evolutionary trees. Syst Zool 27:27–33
 Felsenstein J (1981) Evolutionary trees from DNA sequences: a maximum likelihood approach. J Mol Evol 17:368–376
 Felsenstein J (1983) Statistical inference of phylogenies. J R Statist Soc A 146:246–272
 Felsenstein J (1988) Phylogenies from molecular sequences: inference and reliability. Ann Rev Genet 22:521–565
 Felsenstein J (1991a) Counting phylogenetic invariants in some simple cases. J Theor Biol 152:357–376
 Felsenstein J (1991b) PHYLIP (Phylogenetic Inference Package) version 3.4, documentation. University of Washington, Seattle
 Gillespie JH (1986) Rates of molecular evolution. Ann Rev Ecol Syst 17:637–665
 Gillespie JH (1989) Lineage effects and the index of dispersion of molecular evolution. Mol Biol Evol 6:636–647
 Goldman N (1990) Maximum likelihood inference of phylogenetic trees, with special reference to a Poisson process model of DNA substitution and to parsimony analyses. Syst Zool 39:345–361
 Goldman N (1991) Statistical estimation of phylogenetic trees. PhD Thesis, University of Cambridge, Cambridge, pp 70–73
 Hall P, Wilson SR (1991) Two guidelines for bootstrap hypothesis testing. Biometrics 47:757–762
 Hasegawa M, Horai S (1991) Time of the deepest root for polymorphism in human mitochondrial DNA. J Mol Evol 32:37–42
 Hasegawa M, Iida Y, Yano T, Takaiwa F, Iwabuchi M (1985a) Phylogenetic relationships among eukaryotic kingdoms inferred from ribosomal RNA sequences. J Mol Evol 22:32–38
 Hasegawa M, Kishino H, Yano T (1985b) Dating of the human-ape splitting by a molecular clock of mitochondrial DNA. J Mol Evol 22:160–174
 Hasegawa M, Kishino H, Yano T (1987) Man's place in Hominoidea as inferred from molecular clocks of DNA. J Mol Evol 26:132–147
 Hasegawa M, Kishino H, Yano T (1988) Phylogenetic inference from DNA sequence data. In: Matusita K (ed) Statistical theory and data analysis II. Elsevier, Holland, pp 1–13
 Hasegawa M, Kishino H, Yano T (1989) Estimation of branching dates among primates by molecular clocks of nuclear DNA which slowed down in Hominoidea. J Hum Evol 18:461–476
 Hasegawa M, Kishino H, Hayasaka K, Horai S (1990) Mitochondrial DNA evolution in primates: transition rate has been extremely low in lemur. J Mol Evol 31:113–121
 Hasegawa M, Yano T, Kishino H (1984) A new molecular clock of mitochondrial DNA and the evolution of hominoids. Proc Jpn Acad B 60:95–98
 Holmes EC, Pesole G, Saccone C (1989) Stochastic models of molecular evolution and the estimation of phylogeny and rates of nucleotide substitution in the hominoid primates. J Hum Evol 18:775–794
 Hope ACA (1968) A simplified Monte Carlo significance test procedure. J R Statist Soc B 30:582–598
 Jukes TH, Cantor CR (1969) Evolution of protein molecules. In: Munro HN (ed) Mammalian protein metabolism, vol 3. Academic Press, New York, pp 21–132
 Kendall M, Stuart A (1979) The advanced theory of statistics, vol 2. 4th ed. Charles Griffin, London, pp 240–252
 Kimura M (1983) The neutral theory of molecular evolution. Cambridge University Press, Cambridge, pp 65–89
 Kishino H, Hasegawa M (1989) Evaluation of the maximum likelihood estimate of the evolutionary tree topologies from DNA sequence data, and the branching order in Hominoidea. J Mol Evol 29:170–179
 Kishino H, Hasegawa M (1990) Converting distance to time: application to human evolution. Meth Enz 183:550–570
 Koop BF, Goodman M, Xu P, Chan K, Slightom JL (1986) Primate eta-globin DNA sequences and man's place among the great apes. Nature 319:234–238
 Lake JA (1987) A rate-independent technique for analysis of nucleic acid sequences: evolutionary parsimony. Mol Biol Evol 4:167–191
 Lake JA (1988) Origin of the eukaryotic nucleus determined by rate-invariant analysis of rRNA sequences. Nature 331:184–186
 Lanave C, Preparata G, Saccone C, Serio G (1984) A new method for calculating evolutionary substitution rates. J Mol Evol 20:86–93
 Langley CH, Fitch WM (1974) An examination of the constancy of the rate of molecular evolution. J Mol Evol 3:161–177
 Li WH, Gojobori T, Nei M (1981) Pseudogenes as a paradigm of neutral evolution. Nature 292:237–239
 Lindgren BW (1976) Statistical theory. 3rd ed. Macmillan, New York, pp 307–308, 331, 424
 Lindsay JK (1974a) Comparison of probability distributions. J R Statist Soc B 36:38–44
 Lindsay JK (1974b) Construction and comparison of statistical models. J R Statist Soc B 36:418–425
 Lockhart PJ, Penny D, Hendy MD, Howe CJ, Beanland TJ, Larkum AD (1992) Controversy on chloroplast origins. FEBS Lett 301:127–131
 Loh WY (1985) A new method for testing separate families of hypotheses. J Am Stat Assoc 80:362–368
 Maeda N, Wu CI, Bliska J, Reneke J (1988) Molecular evolution of intergenic DNA in higher primates: pattern of DNA changes, molecular clock, and evolution of repetitive sequences. Mol Biol Evol 5:1–20
 Marriott FHC (1979) Barnard's Monte Carlo tests: how many simulations? Appl Statist 28:75–77
 McCullagh P, Nelder JA (1989) Generalized linear models. 2nd ed. Chapman and Hall, London, pp 119, 174
 Navidi WC, Churchill GA, von Haeseler A (1991) Methods for inferring phylogenies from nucleic acid sequence data by using maximum likelihood and linear invariants. Mol Biol Evol 8:128–143
 Oliver JL, Marín A, Medina JR (1989) SDSE: a software package to simulate the evolution of a pair of DNA sequences. CABIOS 5:47–50
 Penny D (1982) Towards a basis for classification: the incompleteness of distance measures, incompatibility analysis and phenetic classification. J Theor Biol 96:129–142
 Penny D, Hendy MD (1986) Estimating the reliability of evolutionary trees. Mol Biol Evol 3:403–417
 Penny D, Hendy MD, Steel MA (1992) Progress with methods for constructing evolutionary trees. TREE 7:73–79
 Pesole G, Bozzetti MP, Lanave C, Preparata G, Saccone C (1991) Glutamine synthetase gene evolution: a good molecular clock. Proc Natl Acad Sci USA 88:522–526
 Ripley BD (1987) Stochastic simulation. John Wiley and Sons, New York, pp 171–174, 176
 Ritland K, Clegg MT (1987) Evolutionary analysis of plant DNA sequences. Am Nat 130:S74–S100
 Rodríguez F, Oliver JL, Marín A, Medina JR (1990) The general stochastic model of nucleotide substitution. J Theor Biol 142:485–501
 Silvey SD (1975) Statistical inference. Chapman and Hall, London, pp 108–114
 Thorne JL, Kishino H, Felsenstein J (1991) An evolutionary model for maximum likelihood alignment of DNA sequences. J Mol Evol 33:114–124 and Erratum, J Mol Evol (1992) 34:91
 Thorne JL, Kishino H, Felsenstein J (1992) Inching toward reality: an improved likelihood model of sequence evolution. J Mol Evol 34:3–16
 Williams DA (1970) Discrimination between regression models to determine the pattern of enzyme synthesis in synchronous cell cultures. Biometrics 26:23–32
 Wilson AC, Carlson SS, White TJ (1977) Biochemical evolution. Ann Rev Biochem 46:573–639
 Journal

Journal of Molecular Evolution
Volume 36, Issue 2, pp 182–198
 Cover Date
 1993-02-01
 DOI
 10.1007/BF00166252
 Print ISSN
 0022-2844
 Online ISSN
 1432-1432
 Publisher
 Springer-Verlag
 Keywords

 Phylogenetic inference
 Maximum likelihood inference
 Evolutionary models
 Statistical testing
 Hypothesis testing
 Molecular clock
 Authors

 Nick Goldman (1)
 Author Affiliations

 1. University Museum of Zoology, Department of Zoology, University of Cambridge, Downing Street, CB2 3EJ, Cambridge, UK