Knowledge and Information Systems, Volume 10, Issue 4, pp 399–419

Efficient string matching with wildcards and length constraints

  • Gong Chen (email author)
  • Xindong Wu
  • Xingquan Zhu
  • Abdullah N. Arslan
  • Yu He
Regular Paper

Abstract

This paper defines a challenging pattern matching problem between a pattern P and a text T, with wildcards and length constraints, and designs an efficient algorithm that returns each pattern occurrence in an online manner. In this problem, the user can specify constraints on the number of wildcards between each pair of consecutive letters of P and on the length of each matching substring in T. We design a complete algorithm, SAIL, that returns each matching substring of P in T as soon as it appears in T, in O(n + klmg) time with O(lm) space overhead, where n is the length of T, k is the number of occurrences of P's last letter in T, l is the user-specified maximum length of each matching substring, m is the length of P, and g is the maximum difference between the user-specified maximum and minimum numbers of wildcards allowed between two consecutive letters in P.
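
To make the problem statement concrete, the following is a minimal brute-force sketch of the matching constraints described in the abstract. It is not the SAIL algorithm and does not achieve its time or space bounds; it simply enumerates position sequences. The function name `find_occurrences`, its parameters, the 0-based positions, and the reading of "length" as the span from the first to the last matched letter are all assumptions made for illustration.

```python
from typing import List, Tuple


def find_occurrences(
    text: str,
    pattern: str,
    gaps: List[Tuple[int, int]],
    min_len: int,
    max_len: int,
) -> List[Tuple[int, ...]]:
    """Naively enumerate occurrences of `pattern` in `text`.

    gaps[i] = (lo, hi) is the allowed number of wildcards between
    pattern[i] and pattern[i + 1] (hypothetical representation).
    An occurrence is reported as a tuple of 0-based text positions,
    and its length is assumed to be last - first + 1.
    """
    assert len(gaps) == len(pattern) - 1
    results: List[Tuple[int, ...]] = []

    def extend(positions: List[int]) -> None:
        i = len(positions)
        if i == len(pattern):
            span = positions[-1] - positions[0] + 1
            if min_len <= span <= max_len:
                results.append(tuple(positions))
            return
        lo, hi = gaps[i - 1]
        prev = positions[-1]
        last = min(prev + hi + 1, len(text) - 1)
        for nxt in range(prev + lo + 1, last + 1):
            # Stop early once the span would exceed the maximum length.
            if nxt - positions[0] + 1 > max_len:
                break
            if text[nxt] == pattern[i]:
                extend(positions + [nxt])

    for start, ch in enumerate(text):
        if ch == pattern[0]:
            extend([start])
    return results
```

For example, under these assumptions, `find_occurrences("aabcb", "ab", [(0, 2)], 2, 4)` returns `[(0, 2), (1, 2), (1, 4)]`; tightening the maximum length to 3 drops the last occurrence.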

Keywords

String matching, Wildcards, Constraints

Copyright information

© Springer-Verlag London Limited 2006

Authors and Affiliations

  • Gong Chen (1), email author
  • Xindong Wu (1)
  • Xingquan Zhu (1)
  • Abdullah N. Arslan (1)
  • Yu He (1)
  1. Department of Computer Science, University of Vermont, Burlington, USA