Estimation of evolutionary distances between nucleotide sequences

Article
Received: 09 February 1993
Accepted: 14 March 1994
DOI: 10.1007/BF00160155

Cite this article as: Zharkikh, A. J Mol Evol (1994) 39: 315. doi:10.1007/BF00160155
Abstract
The substitution process in nucleotide sequence evolution was analyzed formally as a Markov process. Matrix algebra was used to provide a theoretical foundation for the methods of Barry and Hartigan (Stat. Sci. 2:191–210, 1987) and Lanave et al. (J. Mol. Evol. 20:86–93, 1984). Extensive computer simulation was used to compare the accuracy and effectiveness of various methods for estimating the evolutionary distance between two nucleotide sequences. The multiparameter methods of Lanave et al. (J. Mol. Evol. 20:86–93, 1984), Gojobori et al. (J. Mol. Evol. 18:414–422, 1982), and Barry and Hartigan (Stat. Sci. 2:191–210, 1987) were shown to be preferable to the others for phylogenetic analysis when the sequences are long. However, when the sequences are short and the evolutionary distance is large, the method of Tajima and Nei (Mol. Biol. Evol. 1:269–285, 1984) is superior to the others.

Key words: Nucleotide substitution; Markov process; Substitution matrix; Evolutionary distance

References
Barry D, Hartigan JA (1987) Statistical analysis of hominoid molecular evolution. Stat Sci 2:191–210

Bellman R (1960) Introduction to matrix analysis. McGraw-Hill, New York, p 34

Blaisdell BE (1985) A method of estimating from two aligned present-day DNA sequences their ancestral composition and subsequent rates of substitution, possibly different in the two lineages, corrected for multiple and parallel substitutions at the same site. J Mol Evol 22:69–81

Cavender JA, Felsenstein J (1987) Invariants of phylogenies in a simple case with discrete states. J Classification 4:57–71

DeBry RW (1992) The consistency of several phylogeny-inference methods under varying evolutionary rates. Mol Biol Evol 9:537–551

Felsenstein J (1973) Maximum-likelihood and minimum-steps methods for evolutionary trees from data on discrete characters. Syst Zool 26:77–88

Felsenstein J (1983) Statistical inference of phylogenies. J R Statist Soc A 146:246–272

Felsenstein J (1984) Distance methods for inferring phylogenies: a justification. Evolution 38:16–24

Felsenstein J (1992) Phylogenies from restriction sites: a maximum-likelihood approach. Evolution 46:159–173

Fitch WM (1980) Estimating the total number of nucleotide substitutions since the common ancestor of a pair of homologous genes: comparison of several methods and three beta hemoglobin messenger RNA's. J Mol Evol 16:153–209

Fitch WM (1986) The estimate of total nucleotide substitution from pairwise differences is biased. Philos Trans R Soc Lond Biol 312:317–324

Gojobori T, Ishii K, Nei M (1982) Estimation of average number of nucleotide substitutions when the rate of substitution varies with nucleotide. J Mol Evol 18:414–422

Gojobori T, Moriyama EN, Kimura M (1990) Statistical method for estimating sequence divergence. In: Doolittle RF (ed) Methods in enzymology, vol 183. Molecular evolution: computer analysis of protein and nucleic acid sequences. Academic Press, San Diego, pp 531–550

Gojobori T, Nei M, Ishii K (1981) Mathematical model of nucleotide substitutions with unequal substitution rates. Genetics 97:s43

Hasegawa M, Kishino H, Yano T (1985) Dating of the human-ape splitting by a molecular clock of mitochondrial DNA. J Mol Evol 22:160–174

Holmquist R (1976) Solution to a gene divergence problem under arbitrary stable nucleotide transition probabilities. J Mol Evol 8:337–349

Jukes TH, Cantor CR (1969) Evolution of protein molecules. In: Munro NH (ed) Mammalian protein metabolism. Academic Press, New York, pp 21–123

Kimura M (1980) A simple method for estimating evolutionary rates of base substitutions through comparative studies of nucleotide sequences. J Mol Evol 16:111–120

Kimura M (1981) Estimation of evolutionary differences between homologous nucleotide sequences. Proc Natl Acad Sci USA 78:454–458

Kishino H, Hasegawa M (1990) Converting distance to time: application to human evolution. In: Doolittle RF (ed) Methods in enzymology, vol 183. Molecular evolution: computer analysis of protein and nucleic acid sequences. Academic Press, San Diego, pp 550–570

Lanave C, Preparata G, Saccone C, Serio G (1984) A new method for calculating evolutionary substitution rates. J Mol Evol 20:86–93

Nei M, Tateno Y (1978) Nonrandom amino acid substitution and estimation of the number of nucleotide substitution in evolution. J Mol Evol 11:333–347

Nguyen T, Speed TP (1992) A derivation of all linear invariants for a nonbalanced transversion model. J Mol Evol 35:60–88

Olsen G (1991) Systematic underestimation of tree branch lengths by Lake's operator metrics: an effect of position-dependent substitution rates. Mol Biol Evol 8:592–608

Saccone C, Lanave C, Pesole G, Preparata G (1990) Influence of base composition on quantitative estimates of gene evolution. In: Doolittle RF (ed) Methods in enzymology, vol 183. Molecular evolution: computer analysis of protein and nucleic acid sequences. Academic Press, San Diego, pp 570–583

Saitou N (1990) Maximum likelihood methods. In: Doolittle RF (ed) Methods in enzymology, vol 183. Molecular evolution: computer analysis of protein and nucleic acid sequences. Academic Press, San Diego, pp 584–598

Saitou N, Nei M (1987) The neighbor-joining method: a new method for reconstructing phylogenetic trees. Mol Biol Evol 4:406–425

Tajima F, Nei M (1982) Biases of the estimates of DNA divergence obtained by the restriction enzyme technique. J Mol Evol 18:115–120

Tajima F, Nei M (1984) Estimation of evolutionary distance between nucleotide sequences. Mol Biol Evol 1:269–285

Takahata N, Kimura M (1981) A model of evolutionary base substitution and its application with special reference to rapid change of pseudo-genes. Genetics 98:641–657

Tamura K (1992) Estimation of the number of nucleotide substitutions when there are strong transition-transversion and G+C content biases. Mol Biol Evol 9:678–687

Zharkikh A, Li W-H (1992) Statistical properties of bootstrap estimation of phylogenetic variability from nucleotide sequences. II. Four taxa without a molecular clock. J Mol Evol 35:356–366

Zharkikh A, Li W-H (1993) Inconsistency of the maximum parsimony method: the case of five taxa with a molecular clock. Syst Biology 42:113–125

© Springer-Verlag New York Inc 1994

Authors and Affiliations
1. Center for Demographic and Population Genetics, University of Texas, Houston, USA
2. Institute of Cytology and Genetics, Novosibirsk, Russia
3. Center for Demographic and Population Genetics, University of Texas, Houston, USA