Abstract
Multi-pattern matching with wildcards is the problem of finding all occurrences of the patterns in a pattern set {p₁, …, pₖ} in a given text t. If the percentage of wildcards in the pattern set is not high, this problem can be solved using finite automata. To cope with a high percentage of wildcards in the patterns, we introduce a multi-pattern matching algorithm for patterns with a fixed number of wildcards. In the proposed method, patterns are matched as bit patterns using a sliding-window approach. A bit window slides along the text and is matched against the stored bit patterns, and the matching itself is carried out with bitwise operations. The experimental results demonstrate two things. First, the percentage of wildcards does not affect the performance of the proposed algorithm. Second, the proposed algorithm is more efficient than algorithms based on the fast Fourier transform. The proposed algorithm is simple to implement and runs in O(n + d(n/σ)(m/w)) time, where n is the text length, d is the symbol distribution over the k patterns, m is the pattern length, σ is the alphabet size, and w is the machine word size.
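To illustrate what "matching bit patterns with bitwise operations" can look like, here is a minimal sketch of the classic multi-pattern Shift-And technique (Baeza-Yates and Gonnet) extended with single-character wildcards. It is not the authors' sliding-window algorithm, and it does not reproduce their complexity bound. The wildcard symbol `?` and the function name `shift_and_multi` are hypothetical choices made for this example. All patterns are concatenated into one bit vector. Bit j of a character's mask is set when position j holds that character or a wildcard. Each text character then updates the match state with one shift, one OR and one AND.

```python
from typing import Dict, List, Tuple

WILDCARD = "?"  # hypothetical wildcard symbol; matches any single text character


def shift_and_multi(text: str, patterns: List[str]) -> List[Tuple[int, int]]:
    """Return (pattern_index, start_position) for every occurrence of every pattern.

    Sketch of multi-pattern Shift-And: all patterns are concatenated into one
    bit vector, bit j standing for position j of the concatenation. Python ints
    are unbounded, so the whole vector is held in a single integer here.
    """
    masks: Dict[str, int] = {}  # char -> bits of the positions holding that char
    wild = 0                    # bits of the positions holding the wildcard
    init = 0                    # bit of the first position of each pattern
    final = 0                   # bit of the last position of each pattern
    ends: Dict[int, int] = {}   # last-position bit index -> pattern index
    offset = 0
    for idx, p in enumerate(patterns):
        if not p:
            continue  # empty patterns are skipped in this sketch
        init |= 1 << offset
        for j, ch in enumerate(p):
            bit = 1 << (offset + j)
            if ch == WILDCARD:
                wild |= bit
            else:
                masks[ch] = masks.get(ch, 0) | bit
        last = offset + len(p) - 1
        final |= 1 << last
        ends[last] = idx
        offset += len(p)

    # Wildcard positions accept every character, so fold them into each mask;
    # characters absent from all patterns fall back to the wildcard mask alone.
    masks = {ch: m | wild for ch, m in masks.items()}

    hits: List[Tuple[int, int]] = []
    state = 0
    for pos, ch in enumerate(text):
        # Extend every partial match by one character, start a new attempt at
        # each pattern's first position, and keep only positions accepting ch.
        # A bit carried out of one pattern's last position lands on the next
        # pattern's first position, which `init` sets anyway, so it is harmless.
        state = ((state << 1) | init) & masks.get(ch, wild)
        matched = state & final
        while matched:
            low = matched & -matched            # isolate the lowest set bit
            idx = ends[low.bit_length() - 1]    # which pattern just ended
            hits.append((idx, pos - len(patterns[idx]) + 1))
            matched ^= low
    return hits
```

For example, `shift_and_multi("abcbzd", ["ab?", "b?d"])` returns `[(0, 0), (1, 3)]`. The first pair is `ab?` matching `abc` at position 0, and the second is `b?d` matching `bzd` at position 3. A fixed-width implementation would split the state vector into machine words of w bits. Each text character would then cost on the order of (total pattern length)/w word operations rather than one.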
Acknowledgments
I would like to express my gratitude to Jilin University, which has provided me with a full scholarship and an excellent environment for my Ph.D. studies and research. In addition, I would like to extend my sincere appreciation to Dr. Zhang Meng for his continuous support and useful discussions.
Additional information
Foundation item: Supported by the European Framework Program (FP7) (FP7-PEOPLE-2011-IRSES) and the National Sci-Tech Support Plan of China (2014BAH02F03).
Biography: Ahmed A. F. Saif, male, Ph.D. candidate, research interests: stringology, network security and algorithm design.
About this article
Cite this article
Saif, A.A.F., Hu, L. & Chu, J. Multi-pattern matching algorithm with wildcards based on bit-parallelism. Wuhan Univ. J. Nat. Sci. 22, 178–184 (2017). https://doi.org/10.1007/s11859-017-1232-7
DOI: https://doi.org/10.1007/s11859-017-1232-7