Journal of Molecular Evolution

Volume 39, Issue 3, pp 315–329

Estimation of evolutionary distances between nucleotide sequences

  • Andrey Zharkikh


The substitution process in nucleotide sequence evolution is analyzed formally as a Markov process. Matrix algebra is used to provide a theoretical foundation for the methods of Barry and Hartigan (Stat. Sci. 2:191–210, 1987) and Lanave et al. (J. Mol. Evol. 20:86–93, 1984). Extensive computer simulation was used to compare the accuracy and effectiveness of various methods for estimating the evolutionary distance between two nucleotide sequences. The multiparameter methods of Lanave et al. (J. Mol. Evol. 20:86–93, 1984), Gojobori et al. (J. Mol. Evol. 18:414–422, 1982), and Barry and Hartigan (Stat. Sci. 2:191–210, 1987) are preferable to the others for phylogenetic analysis when the sequences are long. However, when the sequences are short and the evolutionary distance is large, the method of Tajima and Nei (Mol. Biol. Evol. 1:269–285, 1984) is superior to the others.
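To make the notion of an evolutionary distance estimate concrete, the following is a minimal Python sketch of two of the simpler estimators: the one-parameter Jukes and Cantor (1969) correction and the Tajima and Nei (1984) distance, written from the formulas as they are commonly stated in the literature rather than taken from this paper. The function names, the ungapped-alignment assumption, and the toy sequences in the usage note are illustrative assumptions only; the multiparameter methods discussed above are not shown.

```python
import math
from itertools import combinations

BASES = "ACGT"  # assumption: ungapped, aligned sequences over A, C, G, T only


def p_distance(seq1, seq2):
    """Proportion of aligned sites at which the two sequences differ."""
    if len(seq1) != len(seq2) or not seq1:
        raise ValueError("sequences must be aligned, equal-length, non-empty")
    return sum(a != b for a, b in zip(seq1, seq2)) / len(seq1)


def jukes_cantor(seq1, seq2):
    """Jukes-Cantor estimate: d = -(3/4) ln(1 - 4p/3)."""
    p = p_distance(seq1, seq2)
    arg = 1.0 - 4.0 * p / 3.0
    if arg <= 0:
        raise ValueError("distance undefined: sequences too divergent")
    return -0.75 * math.log(arg)


def tajima_nei(seq1, seq2):
    """Tajima-Nei estimate: d = -b ln(1 - p/b), with b from base composition."""
    n = len(seq1)
    p = p_distance(seq1, seq2)
    if p == 0:
        return 0.0
    # g[i]: frequency of base i averaged over the two sequences
    g = {b: (seq1.count(b) + seq2.count(b)) / (2 * n) for b in BASES}
    # c = sum over unordered base pairs (i, j) of x_ij^2 / (2 g_i g_j),
    # where x_ij is the fraction of sites showing the pair {i, j}
    c = 0.0
    for i, j in combinations(BASES, 2):
        x = sum({a, b} == {i, j} for a, b in zip(seq1, seq2)) / n
        if x > 0:
            c += x * x / (2 * g[i] * g[j])
    b = 0.5 * (1.0 - sum(f * f for f in g.values()) + p * p / c)
    arg = 1.0 - p / b
    if arg <= 0:
        raise ValueError("distance undefined: sequences too divergent")
    return -b * math.log(arg)
```

As a sanity check on the sketch, when all four bases are equally frequent and all six kinds of mismatch are equally common, the Tajima-Nei coefficient b equals 3/4 and the two estimates coincide. For example, `tajima_nei("AAACCG" + "ACGT" * 4, "CGTGTT" + "ACGT" * 4)` agrees with `jukes_cantor` on the same pair up to floating-point error.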

Key words

Nucleotide substitution, Markov process, Substitution matrix, Evolutionary distance




  1. Barry D, Hartigan JA (1987) Statistical analysis of hominoid molecular evolution. Stat Sci 2:191–210
  2. Bellman R (1960) Introduction to matrix analysis. McGraw-Hill, New York, p 34
  3. Blaisdell BE (1985) A method of estimating from two aligned present-day DNA sequences their ancestral composition and subsequent rates of substitution, possibly different in the two lineages, corrected for multiple and parallel substitutions at the same site. J Mol Evol 22:69–81
  4. Cavender JA, Felsenstein J (1987) Invariants of phylogenies in a simple case with discrete states. J Classification 4:57–71
  5. DeBry RW (1992) The consistency of several phylogeny-inference methods under varying evolutionary rates. Mol Biol Evol 9:537–551
  6. Felsenstein J (1973) Maximum-likelihood and minimum-steps methods for evolutionary trees from data on discrete characters. Syst Zool 26:77–88
  7. Felsenstein J (1983) Statistical inference of phylogenies. J R Statist Soc A 146:246–272
  8. Felsenstein J (1984) Distance methods for inferring phylogenies: a justification. Evolution 38:16–24
  9. Felsenstein J (1992) Phylogenies from restriction sites: a maximum-likelihood approach. Evolution 46:159–173
  10. Fitch WM (1980) Estimating the total number of nucleotide substitutions since the common ancestor of a pair of homologous genes: comparison of several methods and three beta hemoglobin messenger RNA's. J Mol Evol 16:153–209
  11. Fitch WM (1986) The estimate of total nucleotide substitution from pairwise differences is biased. Philos Trans R Soc Lond Biol 312:317–324
  12. Gojobori T, Ishii K, Nei M (1982) Estimation of average number of nucleotide substitutions when the rate of substitution varies with nucleotide. J Mol Evol 18:414–422
  13. Gojobori T, Moriyama EN, Kimura M (1990) Statistical method for estimating sequence divergence. In: Doolittle RF (ed) Methods in enzymology, vol 183. Molecular evolution: computer analysis of protein and nucleic acid sequences. Academic Press, San Diego, pp 531–550
  14. Gojobori T, Nei M, Ishii K (1981) Mathematical model of nucleotide substitutions with unequal substitution rates. Genetics 97:s43
  15. Hasegawa M, Kishino H, Yano T (1985) Dating of the human-ape splitting by a molecular clock of mitochondrial DNA. J Mol Evol 22:160–174
  16. Holmquist R (1976) Solution to a gene divergence problem under arbitrary stable nucleotide transition probabilities. J Mol Evol 8:337–349
  17. Jukes TH, Cantor CR (1969) Evolution of protein molecules. In: Munro NH (ed) Mammalian protein metabolism. Academic Press, New York, pp 21–123
  18. Kimura M (1980) A simple method for estimating evolutionary rates of base substitutions through comparative studies of nucleotide sequences. J Mol Evol 16:111–120
  19. Kimura M (1981) Estimation of evolutionary differences between homologous nucleotide sequences. Proc Natl Acad Sci USA 78:454–458
  20. Kishino H, Hasegawa M (1990) Converting distance to time: application to human evolution. In: Doolittle RF (ed) Methods in enzymology, vol 183. Molecular evolution: computer analysis of protein and nucleic acid sequences. Academic Press, San Diego, pp 550–570
  21. Lanave C, Preparata G, Saccone C, Serio G (1984) A new method for calculating evolutionary substitution rates. J Mol Evol 20:86–93
  22. Nei M, Tateno Y (1978) Nonrandom amino acid substitution and estimation of the number of nucleotide substitution in evolution. J Mol Evol 11:333–347
  23. Nguyen T, Speed TP (1992) A derivation of all linear invariants for a nonbalanced transversion model. J Mol Evol 35:60–88
  24. Olsen G (1991) Systematic underestimation of tree branch lengths by Lake's operator metrics: an effect of position-dependent substitution rates. Mol Biol Evol 8:592–608
  25. Saccone C, Lanave C, Pesole G, Preparata G (1990) Influence of base composition on quantitative estimates of gene evolution. In: Doolittle RF (ed) Methods in enzymology, vol 183. Molecular evolution: computer analysis of protein and nucleic acid sequences. Academic Press, San Diego, pp 570–583
  26. Saitou N (1990) Maximum likelihood methods. In: Doolittle RF (ed) Methods in enzymology, vol 183. Molecular evolution: computer analysis of protein and nucleic acid sequences. Academic Press, San Diego, pp 584–598
  27. Saitou N, Nei M (1987) The neighbor-joining method: a new method for reconstructing phylogenetic trees. Mol Biol Evol 4:406–425
  28. Tajima F, Nei M (1982) Biases of the estimates of DNA divergence obtained by the restriction enzyme technique. J Mol Evol 18:115–120
  29. Tajima F, Nei M (1984) Estimation of evolutionary distance between nucleotide sequences. Mol Biol Evol 1:269–285
  30. Takahata N, Kimura M (1981) A model of evolutionary base substitution and its application with special reference to rapid change of pseudo-genes. Genetics 98:641–657
  31. Tamura K (1992) Estimation of the number of nucleotide substitutions when there are strong transition-transversion and G+C content biases. Mol Biol Evol 9:678–687
  32. Zharkikh A, Li W-H (1992) Statistical properties of bootstrap estimation of phylogenetic variability from nucleotide sequences. II. Four taxa without a molecular clock. J Mol Evol 35:356–366
  33. Zharkikh A, Li W-H (1993) Inconsistency of the maximum parsimony method: the case of five taxa with a molecular clock. Syst Biology 42:113–125

Copyright information

© Springer-Verlag New York Inc 1994

Authors and Affiliations

  • Andrey Zharkikh (1, 2)
  1. Center for Demographic and Population Genetics, University of Texas, Houston, USA
  2. Institute of Cytology and Genetics, Novosibirsk, Russia
