Distribution of the number of matches between nucleotide sequences

Bulletin of Mathematical Biology

Abstract

When two strings of symbols are aligned, it is important to know whether the observed number of matches is better than that expected between two independent sequences with the same frequency of symbols. When strings are of different lengths, nulls need to be inserted in order to align the sequences. One approach is to use simple approximations based on sampling with replacement. We describe an algorithm for determining exactly the frequencies of given numbers of matches when sampling without replacement. This does not lead to a simple closed-form expression. However, we show examples where sampling with and without replacement give very similar results, so the simple approach may be adequate for all but the smallest cases.
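
The contrast between the two sampling schemes can be illustrated with a minimal sketch. It is not the algorithm of the article, whose details are not given in this abstract; it is a brute-force enumeration that is only practical for very short sequences. It assumes two sequences of equal length (no null insertion), and the function names `exact_match_distribution` and `binomial_match_distribution` are hypothetical. Sampling without replacement is modelled by shuffling one sequence against the other; sampling with replacement is modelled by a binomial distribution whose per-position match probability is the sum over symbols of the product of the two symbol frequencies.

```python
from collections import Counter
from fractions import Fraction
from itertools import permutations
from math import comb, factorial


def exact_match_distribution(a, b):
    """P(matches = k), k = 0..n, when b is shuffled uniformly against a.

    Brute force over all n! orderings of b (sampling without replacement).
    Illustrative only: feasible for roughly n <= 9.
    """
    if len(a) != len(b):
        raise ValueError("this sketch assumes sequences of equal length")
    n = len(a)
    counts = Counter(
        sum(x == y for x, y in zip(a, perm)) for perm in permutations(b)
    )
    total = factorial(n)
    return [Fraction(counts.get(k, 0), total) for k in range(n + 1)]


def binomial_match_distribution(a, b):
    """P(matches = k), k = 0..n, treating positions as independent draws.

    Sampling with replacement: each position matches with probability
    p = sum over symbols s of freq_a(s) * freq_b(s).
    """
    if len(a) != len(b):
        raise ValueError("this sketch assumes sequences of equal length")
    n = len(a)
    freq_a, freq_b = Counter(a), Counter(b)
    p = sum(Fraction(freq_a[s], n) * Fraction(freq_b[s], n) for s in freq_a)
    return [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]


if __name__ == "__main__":
    seq_a, seq_b = "AACGT", "ACGGT"  # made-up example sequences
    exact = exact_match_distribution(seq_a, seq_b)
    approx = binomial_match_distribution(seq_a, seq_b)
    for k, (e, q) in enumerate(zip(exact, approx)):
        print(k, float(e), float(q))
```

Both distributions have the same mean number of matches; they differ in shape, and the difference is most visible for short sequences such as the five-symbol example used here.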


Cite this article

Rinsma, I., Hendy, M. & Penny, D. Distribution of the number of matches between nucleotide sequences. Bltn Mathcal Biology 52, 349–358 (1990). https://doi.org/10.1007/BF02458576


  • DOI: https://doi.org/10.1007/BF02458576
