Journal of Computer Science and Technology, Volume 29, Issue 5, pp 740–750

Pattern Matching with Flexible Wildcards

Survey

Abstract

Pattern matching with wildcards (PMW) has great theoretical and practical significance in bioinformatics, information retrieval, and pattern mining. Because wildcards introduce uncertainty, the number of matches is exponential in the maximal gap flexibility and the pattern length, and the matching positions in PMW are hard to choose. Counting the maximal number of matches by enumerating them one by one is computationally infeasible. Therefore, rather than solving the generic PMW problem, many research efforts have defined new problems within PMW according to different application backgrounds. To overcome the limitations of either fixing the number of wildcards or allowing an unbounded number of them, pattern matching with flexible wildcards (PMFW) lets users control the ranges of wildcards. In this paper, we survey the state-of-the-art algorithms for PMFW, with detailed analyses and comparisons, and discuss challenges and opportunities in PMFW research and applications.
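
As a rough illustration of the growth in the number of matches, the sketch below assumes a simplified PMFW setting: each pair of consecutive pattern characters is separated by a user-chosen wildcard range (min_gap, max_gap), every tuple of matching positions counts as one match, and no further constraints apply. The function name count_matches and this encoding of the pattern are illustrative assumptions, not the notation or method of any surveyed algorithm. It counts matches by dynamic programming over text positions instead of listing them one by one.

```python
def count_matches(text, chars, gaps):
    """Count matches of a gapped pattern in text (simplified, assumed setting).

    chars: pattern characters p_0 .. p_{m-1}
    gaps:  list of (min_gap, max_gap) wildcard ranges; gaps[j-1] sits
           between p_{j-1} and p_j, so len(gaps) == len(chars) - 1
    """
    if len(gaps) != len(chars) - 1:
        raise ValueError("need one gap range between consecutive characters")
    n = len(text)
    # ways[i] = number of partial matches whose last character is at text[i]
    ways = [1 if c == chars[0] else 0 for c in text]
    for j in range(1, len(chars)):
        lo, hi = gaps[j - 1]
        nxt = [0] * n
        for i in range(n):
            if text[i] != chars[j]:
                continue
            # the previous pattern character may sit g wildcards to the left
            for g in range(lo, hi + 1):
                k = i - g - 1
                if k < 0:
                    break  # larger gaps would start before the text
                nxt[i] += ways[k]
        ways = nxt
    return sum(ways)


# Two characters 'a' separated by 0 or 1 wildcards in "aaaa"
print(count_matches("aaaa", "aa", [(0, 1)]))  # 5
```

In this sketch the count can grow roughly like the product of the gap range widths, while the running time stays polynomial in the text length, the pattern length, and the widest gap range.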

Keywords

pattern matching; wildcards; bioinformatics; pattern mining

Copyright information

© Springer Science+Business Media New York 2014

Authors and Affiliations

  1. Department of Computer Science, Hefei University of Technology, Hefei, China
  2. Department of Computer Science, University of Vermont, Burlington, U.S.A.
  3. Department of Computer Science and Technology, Hefei Normal University, Hefei, China
