Abstract
Dictionary matching for regular expressions has recently attracted interest because of its many applications, including DNA sequence analysis, XML filtering, and network traffic analysis. In some applications it is enough to allow wildcard and character-class gaps in strings, but usually the full expressive power of regular expressions is needed. In this paper we present and analyze a new algorithm for online dictionary matching for regular expressions. The distinguishing feature of our algorithm is that it builds upon an algorithm for dictionary matching of string patterns with wildcard gaps, yet is also capable of handling more complex regular expressions. Our experiments used real data: regular expressions used for filtering spam e-mail. The size of the dictionary, that is, the number of different regular expressions to be matched, varied from one to 3080. To find out how our algorithm scales to much larger numbers of patterns, we made small random changes to these patterns to produce up to 100000 patterns that are similar in style. We found that our algorithm scales very well, performing at its best for 10000–20000 patterns. For large dictionaries our algorithm outperforms the tested competitors: GNU grep already at tens of patterns and Google’s RE2 at hundreds of patterns.
This research was partially supported by the Academy of Finland.
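To make the problem statement in the abstract concrete, the following is a minimal sketch of a naive baseline for dictionary matching with regular expressions. It is not the algorithm of the paper: it compiles each pattern separately and scans the text once per pattern, which is exactly the per-pattern cost a dedicated dictionary-matching algorithm aims to avoid. The function name `match_dictionary` and the sample patterns are hypothetical, chosen only for illustration, and Python's `re.finditer` reports non-overlapping matches only.

```python
import re


def match_dictionary(patterns, text):
    """Naive baseline: report (end_position, pattern_index) for each match.

    Assumption: one independent scan of the text per pattern, so the cost
    grows linearly with the number of patterns in the dictionary.
    """
    compiled = [re.compile(p) for p in patterns]
    hits = []
    for index, regex in enumerate(compiled):
        # finditer yields non-overlapping matches, left to right.
        for match in regex.finditer(text):
            hits.append((match.end(), index))
    # Sort by end position to mimic the order an online matcher would report.
    return sorted(hits)


if __name__ == "__main__":
    # Hypothetical spam-style patterns: a bounded gap, a character class,
    # and an alternation.
    sample_patterns = [r"vi.{0,2}gra", r"fr[e3]{2}", r"(cheap|free) pills"]
    sample_text = "get fr33 vi-gra and cheap pills"
    for end, index in match_dictionary(sample_patterns, sample_text):
        print(end, sample_patterns[index])
```

The first sample pattern uses a bounded wildcard gap between two keywords, the kind of pattern the abstract says suffices in some applications, while the third needs alternation and so goes beyond keywords with gaps.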
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Saikkonen, R., Sippu, S., Soisalon-Soininen, E. (2015). Experimental Analysis of an Online Dictionary Matching Algorithm for Regular Expressions with Gaps. In: Bampis, E. (eds) Experimental Algorithms. SEA 2015. Lecture Notes in Computer Science, vol 9125. Springer, Cham. https://doi.org/10.1007/978-3-319-20086-6_25
DOI: https://doi.org/10.1007/978-3-319-20086-6_25
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-20085-9
Online ISBN: 978-3-319-20086-6
eBook Packages: Computer Science, Computer Science (R0)