Pattern matching between two non-aligned random sequences

Bulletin of Mathematical Biology

Abstract

Given two independent sequences of letters, we seek the probability distribution of the length of the longest matching word. This word can be in different positions in the two sequences and we consider both perfect and nearly perfect matching. We derive bounds and approximations for the probability and compare them with other bounds and approximations. The results can be applied to DNA sequences in molecular biology and generalized matching between two independent random sequences.
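
To make the quantity in the abstract concrete, the following minimal sketch estimates the tail probability of the longest matching word by simulation. It is not the authors' method, which derives bounds and approximations analytically; it is a brute-force Monte Carlo check covering perfect matching only. The function names `longest_common_word` and `simulate_tail`, the four-letter alphabet, and the assumption of independent, uniformly distributed letters are all illustrative choices that do not come from the article.

```python
import random


def longest_common_word(a, b):
    """Length of the longest contiguous word that occurs in both a and b.

    The word may start at different positions in the two sequences
    (the non-aligned case). Standard dynamic programme, O(len(a) * len(b)).
    """
    best = 0
    prev = [0] * (len(b) + 1)  # match lengths ending at the previous letter of a
    for x in a:
        cur = [0] * (len(b) + 1)
        for j, y in enumerate(b, start=1):
            if x == y:
                # Extend the match that ended one position earlier in both sequences.
                cur[j] = prev[j - 1] + 1
                if cur[j] > best:
                    best = cur[j]
        prev = cur
    return best


def simulate_tail(n1, n2, m, trials=2000, alphabet="ACGT", seed=0):
    """Monte Carlo estimate of P(longest matching word >= m).

    Assumption (not from the article): letters are i.i.d. and uniform
    over `alphabet` in both sequences.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        a = rng.choices(alphabet, k=n1)
        b = rng.choices(alphabet, k=n2)
        if longest_common_word(a, b) >= m:
            hits += 1
    return hits / trials
```

For example, `simulate_tail(100, 100, 8)` gives an empirical estimate of the chance that two random length-100 DNA-like sequences share a word of at least eight letters; an estimate of this kind could serve as a sanity check on an analytic bound or approximation.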


Cite this article

Sheng, KN., Naus, J.I. Pattern matching between two non-aligned random sequences. Bltn Mathcal Biology 56, 1143–1162 (1994). https://doi.org/10.1007/BF02460290
