Giant complete automaton for uncertain multiple string matching and its high speed construction algorithm

  • Research Papers
Science China Information Sciences

Abstract

Multiple string matching must often be carried out in the presence of U-uncertain strings, V-uncertain strings, or combinations of the two. Recognizing large numbers of strings that contain U-, V-, and U-V-uncertain strings, including cases where two or more uncertain strings interleave, is important for gathering useful information thoroughly and for detecting harmful information. This paper proposes a complete automaton, together with a high-speed construction algorithm, for large-scale U-, V-, and U-V-uncertain multiple strings, including two or more uncertain strings interlaced with one another. The maximum number of parallel complete automata for a V-uncertain string is also given. The paper identifies two kinds of pretermission (omission): similarly-connected pretermission and interlaced-string pretermission. It also shows that, unless the intersection of the U-uncertain string sets and the homologous subsequent special point in those sets are eliminated from the whole system, mistakes may appear in the matching of regular expressions, or the number of states in the automaton may increase.
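For orientation, the sketch below builds a classical Aho-Corasick automaton in its complete form, meaning that every state has a transition for every symbol of the alphabet. It is not the construction proposed in this paper: the abstract does not define U- or V-uncertain strings, so only exact multiple string matching is shown. The function names `build_complete_automaton` and `find_all`, and the requirement that the alphabet cover every pattern symbol, are assumptions made for illustration.

```python
from collections import deque


def build_complete_automaton(patterns, alphabet):
    """Classical Aho-Corasick automaton with a transition for every symbol.

    Returns (delta, outputs): delta[state][symbol] is the next state and
    outputs[state] is the set of patterns that end at that state.
    Assumes every pattern symbol is contained in `alphabet`.
    """
    goto = [{}]        # trie edges, one dict per state; state 0 is the root
    outputs = [set()]
    for pat in patterns:
        state = 0
        for ch in pat:
            if ch not in goto[state]:
                goto.append({})
                outputs.append(set())
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        outputs[state].add(pat)

    fail = [0] * len(goto)
    delta = [dict() for _ in goto]
    queue = deque()
    # Root: symbols without a trie edge loop back to the root.
    for ch in alphabet:
        nxt = goto[0].get(ch, 0)
        delta[0][ch] = nxt
        if nxt != 0:
            queue.append(nxt)
    # Breadth-first pass: shallower states are completed before deeper ones,
    # so delta[fail[state]] is always fully defined when it is read.
    while queue:
        state = queue.popleft()
        outputs[state] |= outputs[fail[state]]
        for ch in alphabet:
            if ch in goto[state]:
                child = goto[state][ch]
                fail[child] = delta[fail[state]][ch]
                delta[state][ch] = child
                queue.append(child)
            else:
                delta[state][ch] = delta[fail[state]][ch]
    return delta, outputs


def find_all(text, delta, outputs):
    """Return (start_index, pattern) for every occurrence in text."""
    state = 0
    hits = []
    for i, ch in enumerate(text):
        # Symbols outside the alphabet reset the scan to the root.
        state = delta[state].get(ch, 0)
        for pat in outputs[state]:
            hits.append((i - len(pat) + 1, pat))
    return hits
```

Because every transition is precomputed, scanning costs one table lookup per input symbol, at the price of a table with one entry per state and symbol. Any growth in the number of states, which the abstract names as a risk, therefore translates directly into memory.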

Author information

Corresponding author

Correspondence to Yue Hu.

About this article

Cite this article

Hu, Y., Gao, Q., Guo, L. et al. Giant complete automaton for uncertain multiple string matching and its high speed construction algorithm. Sci. China Inf. Sci. 54, 1562–1571 (2011). https://doi.org/10.1007/s11432-011-4363-z

  • DOI: https://doi.org/10.1007/s11432-011-4363-z
