Abstract
Multiple string matching must often be performed in the presence of U- or V-uncertain strings, or combinations thereof. Recognizing large numbers of strings that contain U-, V-, and U-V-uncertain strings, including strings in which two or more uncertain strings are interleaved, is important for thoroughly gathering useful information and detecting harmful information. This paper proposes a complete automaton, together with a high-speed construction algorithm, for large-scale U-, V-, and U-V-uncertain multiple strings, including two or more uncertain strings interlaced with one another. The maximum number of parallel complete automata for a V-uncertain string is also given. The paper shows that there are two kinds of pretermissions, namely similarly-connected pretermissions and interlaced-string pretermissions. It also shows that, unless the intersection of the U-uncertain string sets and the homologous subsequent special points in those sets are eliminated from the whole system, errors may occur when matching the regular expressions, or the number of states in the automaton may increase.
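The abstract does not define U- or V-uncertain strings, so the sketch below illustrates only the certain-string baseline that such an automaton generalizes. It assumes "complete" carries its usual automata-theory meaning: every state has a transition on every symbol of the alphabet, so matching never follows failure links at run time. This is the classic Aho-Corasick construction (Aho and Corasick, 1975) turned into a full DFA, not the paper's construction for uncertain strings; the function names `build_complete_automaton` and `search` and the explicit `alphabet` parameter are hypothetical.

```python
from collections import deque


def build_complete_automaton(patterns, alphabet):
    """Hypothetical sketch: a complete Aho-Corasick DFA.

    delta[s][a] is defined for every state s and every symbol a in `alphabet`.
    Assumes every pattern uses only symbols from `alphabet`.
    """
    goto = [{}]      # trie edges: goto[s][ch] -> child state
    out = [set()]    # patterns recognized on entering each state
    for p in patterns:
        s = 0
        for ch in p:
            if ch not in goto[s]:
                goto.append({})
                out.append(set())
                goto[s][ch] = len(goto) - 1
            s = goto[s][ch]
        out[s].add(p)

    n = len(goto)
    delta = [{} for _ in range(n)]
    fail = [0] * n
    queue = deque()
    # Root: symbols with no trie edge loop back to the root.
    for a in alphabet:
        t = goto[0].get(a)
        delta[0][a] = 0 if t is None else t
        if t is not None:
            fail[t] = 0
            queue.append(t)
    # Breadth-first order guarantees delta[fail[s]] is already complete.
    while queue:
        s = queue.popleft()
        out[s] |= out[fail[s]]  # inherit matches that end at a proper suffix
        for a in alphabet:
            t = goto[s].get(a)
            if t is None:
                delta[s][a] = delta[fail[s]][a]  # fold the failure link into the table
            else:
                fail[t] = delta[fail[s]][a]
                delta[s][a] = t
                queue.append(t)
    return delta, out


def search(text, delta, out):
    """Return sorted (start_index, pattern) pairs; one table lookup per symbol."""
    s = 0
    hits = []
    for i, ch in enumerate(text):
        s = delta[s].get(ch, 0)  # symbols outside the alphabet reset to the root
        for p in out[s]:
            hits.append((i - len(p) + 1, p))
    return sorted(hits)


delta, out = build_complete_automaton(["he", "she", "his", "hers"], "ehirs")
print(search("ushers", delta, out))  # [(1, 'she'), (2, 'he'), (2, 'hers')]
```

Because every failure transition is precomputed, each input symbol costs exactly one lookup. The trade-off is a table of size (number of states) × (alphabet size), which is why the number of states, discussed above for uncertain strings, matters in practice.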
Cite this article
Hu, Y., Gao, Q., Guo, L. et al. Giant complete automaton for uncertain multiple string matching and its high speed construction algorithm. Sci. China Inf. Sci. 54, 1562–1571 (2011). https://doi.org/10.1007/s11432-011-4363-z
DOI: https://doi.org/10.1007/s11432-011-4363-z