Summary
It is shown by example, using actual experimental data, that neglecting the phenomena of multiple hits, back mutation, and chance coincidence can lead to errors larger than 100% in the calculated value of the average number of nucleotide base differences to be expected between two homologous polynucleotides. Mathematical formulas are derived to correct quantitatively for these effects. Although these corrections do not change the topology of the phylogenetic trees derived by others, they do materially change the quantitative aspects of these phylogenies, such as the length of the legs of the trees. In particular, the following problems are solved without approximation:
1. Consider a polynucleotide which contains L individual nucleotides. Let exactly X mutagenic events occur randomly along the length of this polynucleotide. After the X mutagenic events have occurred, in general, a number x, which is less than L, nucleotide sites will have been hit; for example, all X mutagenic events might occur at the same nucleotide site. Let N(x) designate the average number of nucleotide sites which have been hit. An explicit formula for N(x) is derived.
2. The average number N′(x) of nucleotide sites that have been altered will in general be less than N(x) because of back mutations. An explicit expression for N′(x) is given.
3. An explicit formula for the average number of nucleotide base differences N(D) between two homologous polynucleotides of the same length is derived, including correction for chance coincidences.
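The summary does not reproduce the formulas themselves. As a rough illustration of the three quantities, the following Python sketch uses standard occupancy-type results under assumptions that are not taken from the paper: every mutagenic event falls on each of the L sites with equal probability, there are four bases, and each hit replaces the current base with one of the other three at random. The function names are hypothetical, and the closed forms may differ from the expressions the paper derives.

```python
import random


def expected_sites_hit(L: int, X: int) -> float:
    """Mean number of distinct sites hit at least once by X random events.

    Assumption: each event lands on any of the L sites with probability 1/L.
    """
    return L * (1.0 - (1.0 - 1.0 / L) ** X)


def expected_sites_altered(L: int, X: int) -> float:
    """Mean number of sites whose base differs from the original after X events.

    Assumption: four bases, and each hit swaps the current base for one of the
    other three with equal probability, so a later hit can undo an earlier one
    (back mutation).
    """
    return 0.75 * L * (1.0 - (1.0 - 4.0 / (3.0 * L)) ** X)


def expected_differences(L: int, X1: int, X2: int) -> float:
    """Mean number of differing sites between two descendants of one ancestor.

    Assumption: the two lineages receive X1 and X2 events independently under
    the same symmetric model, so only the total X1 + X2 matters. Sites that
    changed to the same base in both lineages (chance coincidence) are not
    counted as differences.
    """
    return expected_sites_altered(L, X1 + X2)


def simulate(L: int, X: int, trials: int = 2000, seed: int = 0):
    """Monte Carlo check: returns (mean sites hit, mean sites altered)."""
    rng = random.Random(seed)
    hit_total = 0
    altered_total = 0
    for _ in range(trials):
        seq = [0] * L          # base 0 stands for the original base at each site
        hit = set()
        for _ in range(X):
            site = rng.randrange(L)
            hit.add(site)
            # Move to one of the three other bases, chosen uniformly.
            seq[site] = (seq[site] + rng.randint(1, 3)) % 4
        hit_total += len(hit)
        altered_total += sum(1 for base in seq if base != 0)
    return hit_total / trials, altered_total / trials


if __name__ == "__main__":
    L, X = 100, 150
    print(expected_sites_hit(L, X), expected_sites_altered(L, X))
    print(simulate(L, X))
```

Under these assumptions, with L = 100 and X = 150, roughly 78 sites are hit but only about 65 remain altered, so treating the observed differences as the number of mutagenic events would understate X by more than a factor of two.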
Cite this article
Holmquist, R. Theoretical foundations for a quantitative approach to paleogenetics. J Mol Evol 1, 115–133 (1972). https://doi.org/10.1007/BF01659159