Approximate matching of regular expressions

Myers, Eugene W.; Miller, Webb

doi:10.1007/BF02458834

Published: January 1989

Volume 51, pages 5–37, (1989)

Bulletin of Mathematical Biology

Abstract

Given a sequence A and regular expression R, the approximate regular expression matching problem is to find a sequence matching R whose optimal alignment with A is the highest scoring of all such sequences. This paper develops an algorithm to solve the problem in time O(MN), where M and N are the lengths of A and R. Thus, the time requirement is asymptotically no worse than for the simpler problem of aligning two fixed sequences. Our method is superior to an earlier algorithm by Wagner and Seiferas in several ways. First, it treats real-valued costs, in addition to integer costs, with no loss of asymptotic efficiency. Second, it requires only O(N) space to deliver just the score of the best alignment. Finally, its structure permits implementation techniques that make it extremely fast in practice. We extend the method to accommodate gap penalties, as required for typical applications in molecular biology, and further refine it to search for substrings of A that strongly align with a sequence in R, as required for typical database searches. We also show how to deliver an optimal alignment between A and R in only O(N + log M) space using O(MN log M) time. Finally, an O(MN(M+N) + N² log N) time algorithm is presented for alignment scoring schemes where the cost of a gap is an arbitrary increasing function of its length.
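
To make the problem statement concrete, the following is a minimal Python sketch of approximate regular expression matching. It is not the algorithm of the paper: it assumes unit edit costs (minimizing cost rather than maximizing a score), represents the regular expression as a hypothetical nested-tuple syntax tree (`lit`, `cat`, `alt`, `star`), and runs Dijkstra's shortest-path search over pairs (automaton state, position in A). That gives the right answer for this simplified cost model but carries an extra logarithmic factor, so it does not attain the O(MN) bound stated above. The names `build_nfa` and `regex_edit_distance` are illustrative only.

```python
import heapq
from itertools import count

def build_nfa(node, edges, fresh):
    """Thompson-style construction; returns (start, final) states.
    edges[s] lists (symbol, target) pairs; symbol None marks an epsilon edge."""
    start, final = next(fresh), next(fresh)
    edges.setdefault(start, [])
    edges.setdefault(final, [])
    kind = node[0]
    if kind == "lit":
        edges[start].append((node[1], final))
    elif kind == "cat":
        s1, f1 = build_nfa(node[1], edges, fresh)
        s2, f2 = build_nfa(node[2], edges, fresh)
        edges[start].append((None, s1))
        edges[f1].append((None, s2))
        edges[f2].append((None, final))
    elif kind == "alt":
        for sub in node[1:]:
            s, f = build_nfa(sub, edges, fresh)
            edges[start].append((None, s))
            edges[f].append((None, final))
    elif kind == "star":
        s, f = build_nfa(node[1], edges, fresh)
        edges[start] += [(None, s), (None, final)]
        edges[f] += [(None, s), (None, final)]
    else:
        raise ValueError(kind)
    return start, final

def regex_edit_distance(a, regex):
    """Smallest unit-cost edit distance between `a` and any string matching `regex`."""
    edges, fresh = {}, count()
    start, final = build_nfa(regex, edges, fresh)
    dist = {(start, 0): 0}
    heap = [(0, start, 0)]  # (cost so far, NFA state, characters of a consumed)
    while heap:
        d, s, i = heapq.heappop(heap)
        if d > dist.get((s, i), float("inf")):
            continue  # stale heap entry
        if s == final and i == len(a):
            return d
        moves = []
        if i < len(a):
            moves.append((d + 1, s, i + 1))      # a[i] aligned with a gap
        for sym, t in edges[s]:
            if sym is None:
                moves.append((d, t, i))          # epsilon edge, free
            else:
                moves.append((d + 1, t, i))      # pattern symbol aligned with a gap
                if i < len(a):
                    moves.append((d + (sym != a[i]), t, i + 1))  # match or substitution
        for nd, ns, ni in moves:
            if nd < dist.get((ns, ni), float("inf")):
                dist[(ns, ni)] = nd
                heapq.heappush(heap, (nd, ns, ni))
    raise AssertionError("final state should always be reachable")

# Pattern a(b|c)*d written as a syntax tree (assumed representation).
pattern = ("cat", ("lit", "a"),
           ("cat", ("star", ("alt", ("lit", "b"), ("lit", "c"))), ("lit", "d")))
print(regex_edit_distance("abxd", pattern))  # 1
```

Here the sequence `abxd` is one edit away from the language of `a(b|c)*d`, for instance by substituting `x` with `b`. Replacing the unit costs with a real-valued scoring table would leave the structure of the search unchanged, since Dijkstra's method only requires non-negative edge costs.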

Literature

Abarbanel, R. M., P. R. Wieneke, E. Mansfield, D. A. Jaffe and D. L. Brutlag. 1984. “Rapid Searches for Complex Patterns in Biological Molecules.” Nucleic Acids Res. 12, 263–280.
Aho, A. 1980. “Pattern Matching in Strings.” In Formal Language Theory, R. Book (Ed.). New York: Academic Press.
Aho, A., J. E. Hopcroft and J. D. Ullman. 1983. Data Structures and Algorithms, pp. 203–208. Reading, MA: Addison-Wesley.
Cohen, F. E., R. M. Abarbanel, I. D. Kuntz and R. J. Fletterick. 1986. “Turn Prediction in Proteins Using a Pattern-Matching Approach.” Biochemistry 25, 266–275.
Fitch, W. M. and T. F. Smith. 1983. “Optimal Sequence Alignments.” Proc. Natn. Acad. Sci. U.S.A. 80, 1382–1386.
Gotoh, O. 1982. “An Improved Algorithm for Matching Biological Sequences.” J. Molec. Biol. 162, 705–708.
Hecht, M. S. 1977. Flow Analysis of Computer Programs. Amsterdam: North-Holland.
Hecht, M. S. and J. D. Ullman. 1975. “A Simple Algorithm for Global Data Flow Analysis Programs.” SIAM J. Computing 4, 519–532.
Hopcroft, J. E. and J. D. Ullman. 1979. Introduction to Automata Theory, Languages, and Computation. Reading, MA: Addison-Wesley.
Kennedy, K. 1975. “Node Listing Techniques Applied to Data Flow Analysis.” Proceedings of the 2nd ACM Conference on Principles of Programming Languages, 10–21.
Levenshtein, V. I. 1966. “Binary Codes Capable of Correcting Deletions, Insertions, and Reversals.” Cybernetics Control Theory 10, 707–710.
Miller, W. 1987. A Software Tools Sampler. New Jersey: Prentice-Hall.
Miller, W. and E. W. Myers. 1988a. “A Simple Row-Replacement Method.” Software-Practice and Experience 18, 597–611.
Miller, W. and E. W. Myers. 1988b. “Sequence Comparison with Concave Weighting Functions.” Bull. Math. Biol. 50, 97–120.
Myers, E. W. and W. Miller. 1988a. “Row Replacement Algorithms for Screen Editors.” ACM Trans. Prog. Lang. Systems (to be published).
Myers, E. W. and W. Miller. 1988b. “Optimal Alignments in Linear Space.” CABIOS 4, 11–17.
Pennello, T. J. 1986. “Very Fast LR Parsing.” Proceedings of the SIGPLAN'86 Symposium on Compiler Construction. ACM SIGPLAN Notices 21, 145–150.
Sankoff, D. and J. B. Kruskal. 1983. Time Warps, String Edits and Macromolecules: The Theory and Practice of Sequence Comparison. Reading, MA: Addison-Wesley.
Sellers, P. H. 1980. “The Theory and Computation of Evolutionary Distances: Pattern Recognition.” J. Algorithms 1, 359–373.
Sellers, P. H. 1984. “Pattern Recognition in Genetic Sequences by Mismatch Density.” Bull. Math. Biol. 46, 501–514.
Thompson, K. 1968. “Regular Expression Search Algorithm.” Comm. ACM 11, 419–422.
Wagner, R. A. 1974. “Order-n Correction of Regular Languages.” Comm. ACM 17, 265–268.
Wagner, R. A. and J. I. Seiferas. 1978. “Correcting Counter-Automaton-Recognizable Languages.” SIAM J. Computing 7, 357–375.
Waterman, M. S. 1984. “General Methods for Sequence Comparison.” Bull. Math. Biol. 46, 473–500.
Waterman, M. S., T. F. Smith and W. A. Beyer. 1976. “Some Biological Sequence Metrics.” Adv. Maths 20, 367–387.

Author information

Authors and Affiliations

Department of Computer Science, University of Arizona, 85721, Tucson, AZ, U.S.A.
Eugene W. Myers
Department of Computer Science, The Pennsylvania State University, 16802, University Park, PA, U.S.A.
Webb Miller

About this article

Cite this article

Myers, E.W., Miller, W. Approximate matching of regular expressions. Bulletin of Mathematical Biology 51, 5–37 (1989). https://doi.org/10.1007/BF02458834

Received: 16 June 1988
Issue Date: January 1989
DOI: https://doi.org/10.1007/BF02458834