Multi-pattern matching algorithm with wildcards based on bit-parallelism

  • Algorithm
  • Published:
Wuhan University Journal of Natural Sciences

Abstract

Multi-pattern matching with wildcards is the problem of finding all occurrences of every pattern in a pattern set {p_1, …, p_k} in a given text t. If the percentage of wildcards in the pattern set is not high, this problem can be solved using finite automata. We introduce a multi-pattern matching algorithm with a fixed number of wildcards that copes with a high percentage of wildcards in the patterns. In the proposed method, patterns are matched as bit patterns using a sliding-window approach: a bit window slides along the given text and is matched against the stored bit patterns. The matching process is executed using bitwise operations. The experimental results demonstrate that the percentage of wildcards does not affect the proposed algorithm's performance and that the proposed algorithm is more efficient than algorithms based on the fast Fourier transform. The proposed algorithm is simple to implement and runs efficiently in O(n + d(n/σ)(m/w)) time, where n is the text length, d is the symbol distribution over the k patterns, m is the pattern length, σ is the alphabet size, and w is the machine word length.
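
To make the idea of matching bit patterns with bitwise operations concrete, the following is a minimal sketch of the classic Shift-And technique extended with single-character wildcards. It is not the algorithm proposed in the article, whose details are not given in the abstract; it only illustrates the general bit-parallel approach. The wildcard symbol `?` and the names `build_masks` and `find_all` are assumptions introduced for this illustration, and each pattern is scanned with its own bit state rather than with the authors' shared window.

```python
WILDCARD = "?"  # hypothetical wildcard symbol; the abstract does not name one


def build_masks(pattern):
    """Return (masks, wild) for one pattern.

    masks[c] has bit i set when pattern[i] == c;
    wild has bit i set when pattern[i] is a wildcard.
    """
    masks, wild = {}, 0
    for i, ch in enumerate(pattern):
        if ch == WILDCARD:
            wild |= 1 << i
        else:
            masks[ch] = masks.get(ch, 0) | (1 << i)
    return masks, wild


def find_all(patterns, text):
    """Return sorted (start, pattern_index) pairs for every occurrence."""
    if any(not p for p in patterns):
        raise ValueError("patterns must be non-empty")
    tables = [build_masks(p) for p in patterns]
    states = [0] * len(patterns)  # bit i set: pattern[0..i] matches a suffix of the text read so far
    hits = []
    for pos, ch in enumerate(text):
        for k, (masks, wild) in enumerate(tables):
            m = len(patterns[k])
            # Shift-And step: extend every live prefix by one character.
            # A wildcard position accepts any text character.
            states[k] = ((states[k] << 1) | 1) & (masks.get(ch, 0) | wild)
            if states[k] & (1 << (m - 1)):
                hits.append((pos - m + 1, k))
    return sorted(hits)


if __name__ == "__main__":
    print(find_all(["a?c", "b??"], "abcabd"))  # [(0, 0), (1, 1)]
```

Because the wildcard positions are folded into the mask that is ANDed with the state, the per-character cost of this sketch does not depend on how many wildcards a pattern contains, which is the property the abstract highlights for bit-parallel matching.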

Acknowledgments

I would like to express my gratitude to Jilin University, which has provided me with a full scholarship and an excellent environment for my Ph.D. studies and research. In addition, I would like to extend my sincere appreciation to Dr. Zhang Meng for his continuous support and useful discussions.

Author information

Corresponding author

Correspondence to Liang Hu.

Additional information

Foundation item: Supported by the European Framework Program (FP7) (FP7-PEOPLE-2011-IRSES) and the National Sci-Tech Support Plan of China (2014BAH02F03)

Biography: Ahmed A. F. Saif, male, Ph.D. candidate, research directions: stringology, network security and algorithm design.

About this article

Cite this article

Saif, A.A.F., Hu, L. & Chu, J. Multi-pattern matching algorithm with wildcards based on bit-parallelism. Wuhan Univ. J. Nat. Sci. 22, 178–184 (2017). https://doi.org/10.1007/s11859-017-1232-7

  • DOI: https://doi.org/10.1007/s11859-017-1232-7
