Applied Intelligence

, Volume 39, Issue 1, pp 57–74 | Cite as

Pattern matching with wildcards and gap-length constraints based on a centrality-degree graph

Article

Abstract

Pattern matching with wildcards is a challenging topic in many domains, such as bioinformatics and information retrieval. This paper focuses on the problem with gap-length constraints and the one-off condition (The one-off condition means that each character can be used at most once in all occurrences of a pattern in the sequence). It is difficult to achieve the optimal solution. We propose a graph structure WON-Net (WON-Net is a graph structure. It stands for a network with the weighted centralization measure based on each node’s centrality-degree. Its details are given in Definition 4.1) to obtain all candidate matching solutions and then design the WOW (WOW stands for pattern matching with wildcards based on WON-Net) algorithm with the weighted centralization measure based on nodes’ centrality-degrees. We also propose an adjustment mechanism to balance the optimal solutions and the running time. We also define a new variant of WOW as WOW-δ. Theoretical analysis and experiments demonstrate that WOW and WOW-δ are more effective than their peers. Besides, the algorithms demonstrate an advantage on running time by parallel processing.

Keywords

Pattern matching Wildcards Length constraints One-off condition Centrality-degree 

References

  1. 1.
    Pisanti N, Crochemore M, Grossi R, Sagot M-F (2005) Bases of motifs for generating repeated patterns with wild cards. IEEE/ACM Trans Comput Biol Bioinform 2:40–50 CrossRefGoogle Scholar
  2. 2.
    On B-W, Lee I (2011) Meta similarity. Appl Intell 35(3):359–374 CrossRefGoogle Scholar
  3. 3.
    Xiao L, Wissmann D, Brown M, Jablonski S (2004) Information extraction from the web: system and techniques. Appl Intell 21(2):195–224 MATHCrossRefGoogle Scholar
  4. 4.
    Bille P, Gørtz IL, Vildhøj HW, Wind DK (2010) String matching with variable length gaps. In: String processing and information retrieval—17th international symposium, vol 6393, pp 385–394 CrossRefGoogle Scholar
  5. 5.
    Zhou B, Pei J (2012) Aggregate keyword search on large relational databases. Knowl Inf Syst 30(2):283–318 MathSciNetCrossRefGoogle Scholar
  6. 6.
    Hofmann K, Bucher P, Falquet L, Bairoch A (1999) The PROSITE database, its status in 1999. Nucleic Acids Res 27:215–219 CrossRefGoogle Scholar
  7. 7.
    Bucher P, Bairoch A (1994) A generalized profile syntax for biomolecular sequence motifs and its function in automatic sequence interpretation. In: Proceedings of the 2nd international conference on intelligent systems for molecular biology, pp 53–61 Google Scholar
  8. 8.
    Navarro G, Raffinot M (2002) Flexible pattern matching in strings—practical on-line search algorithms for texts and biological sequences. Cambridge University Press, Cambridge MATHGoogle Scholar
  9. 9.
    Cole R, Gottlieb L-A, Lewenstein M (2004) Dictionary matching and indexing with errors and don’t cares. In: Proceedings of the 36th ACM symposium on the theory of computing. ACM, New York, pp 91–100 Google Scholar
  10. 10.
    Ménard PA, Ratté S (2011) Classifier-based acronym extraction for business documents. Knowl Inf Syst 29(2):305–334 CrossRefGoogle Scholar
  11. 11.
    Sánchez D, Isern D (2011) Automatic extraction of acronym definitions from the web. Appl Intell 34(2):311–327 CrossRefGoogle Scholar
  12. 12.
    Ahmed CF, Tanbeer SK, Jeong B-S, Lee Y-K (2011) HUC-prune: an efficient candidate pruning technique to mine high utility patterns. Appl Intell 34(2):181–198 CrossRefGoogle Scholar
  13. 13.
    Shie B-E, Yu PS, Tseng VS (2012) Mining interesting user behavior patterns in mobile commerce environments. Appl Intell. doi:10.1007/s10489-012-0379-3 Google Scholar
  14. 14.
    Agrawal R, Srikant R (1995) Mining sequential patterns. In: Proc of ICDE, Taipei, pp 3–14 Google Scholar
  15. 15.
    Chen G, Wu X, Zhu X, Arslan AN, He Y (2006) Efficient string matching with wildcards and length constraints. Knowl Inf Syst 10(4):399–419 CrossRefGoogle Scholar
  16. 16.
    Fischer MJ, Paterson MS (1974) String matching and other products. Technical report, Massachusetts Institute of Technology, Cambridge, MA, USA Google Scholar
  17. 17.
    Zhang M, Kao B, Cheung DW, Yip KY (2005) Mining periodic patterns with gap requirement from sequences. In: Proceedings of ACM SIGMOD, Baltimore, Maryland, USA, pp 623–633 Google Scholar
  18. 18.
    Ding B, Lo D, Han J, Khoo S (2009) Efficient mining of closed repetitive gapped subsequences from a sequence database. In: Proceedings of IEEE 25th international conference on data engineering (ICDE 09), Shanghai, PR China, 2009. IEEE Comput Soc, Los Alamitos, pp 1024–1035 Google Scholar
  19. 19.
    Min F, Wu X, Lu Z (2009) Pattern matching with independent wildcard gaps. In: Eighth IEEE international conference on dependable, autonomic and secure computing (DASC-2009), Chengdu, China, pp 194–199 CrossRefGoogle Scholar
  20. 20.
    Guo D, Hong X, Hu X, Gao J, Liu Y, Wu G, Wu X (2011) A bit-parallel algorithm for sequential pattern matching with wildcards. Cybern Syst 42(6):382–401 CrossRefGoogle Scholar
  21. 21.
    Wang H, Xie F, Hu X, Li P, Wu X (2010) Pattern matching with flexible wildcards and recurring characters. In: 2010 IEEE international conference on granular computing (GrC 2010), Silicon Valley, USA, 2010. IEEE Comput Soc, Los Alamitos, pp 782–786 CrossRefGoogle Scholar
  22. 22.
    Wu Y, Wu X, Jiang H, Min F (2011) A heuristic algorithm for MPMGOOC. Chin J Comput 32(8):1452–1462 MathSciNetCrossRefGoogle Scholar
  23. 23.
    Chang Y-I, Chen J-R, Hsu M-T (2010) A hash trie filter method for approximate string matching in genomic databases. Appl Intell 33(1):21–38 CrossRefGoogle Scholar
  24. 24.
    He D, Wu X, Zhu X (2007) SAIL-APPROX: an efficient on-line algorithm for approximate pattern matching with wildcards and length constraints. In: Proceedings of the IEEE international conference on bioinformatics and biomedicine (BIBM’07), Silicon Valley, USA, pp 151–158 Google Scholar
  25. 25.
    Dorneles CF, Gonçalves R, Mello RS (2011) Approximate data instance matching: a survey. Knowl Inf Syst 27(1):1–21 CrossRefGoogle Scholar
  26. 26.
    Goethals B, Laurent D, Page WL, Dieng CT (2012) Mining frequent conjunctive queries in relational databases through dependency discovery. Knowl Inf Syst. doi:10.1007/s10115-012-0526-5 MATHGoogle Scholar
  27. 27.
    Xiong N, Funk P (2008) Concise case indexing of time series in health care by means of key sequence discovery. Appl Intell 28(3):247–260 CrossRefGoogle Scholar
  28. 28.
    Xie F, Wu X, Hu X, Gao J, Guo D, Fei Y, Hua E (2011) MAIL: mining sequential patterns with wildcards. Int J Data Min Bioinforma. http://www.inderscience.com/coming.php?ji=189&jc=ijdmb&np=9&jn=International%20Journal%20of%20Data%20Mining%20and%20Bioinformatics
  29. 29.
    Martínez-Trinidad JF, Carrasco-Ochoa JA, Ruiz-Shulcloper J (2011) RP-miner: a relaxed prune algorithm for frequent similar pattern mining. Knowl Inf Syst 27(3):451–471 CrossRefGoogle Scholar
  30. 30.
    Wu Y, Wu X, Min F, Li Y (2010) A nettree for pattern matching with flexible wildcard constraints. In: Proceedings of the 2010 IEEE international conference on information reuse and integration (IRI 2010), Las Vegas, USA, pp 109–114 CrossRefGoogle Scholar
  31. 31.
    Liu Y, Wu X, Hu X, Gao J et al (2009) Pattern matching with wildcards based on key character location. In: Proceedings of the 2009 IEEE international conference on information reuse and integration (IRI 2009), Las Vegas, USA, pp 167–170 Google Scholar
  32. 32.
    National Center for Biotechnology Information (2009) GenBank sequences from pandemic (H1N1) 2009 viruses. http://www.ncbi.nlm.nih.gov/genomes/FLU/SwineFlu.html
  33. 33.

Copyright information

© Springer Science+Business Media New York 2012

Authors and Affiliations

  1. 1.College of Computer Science and Information EngineeringHefei University of TechnologyHefeiChina
  2. 2.Department of Computer Science and TechnologyHefei Normal UniversityHefeiChina
  3. 3.Department of Computer ScienceUniversity of VermontBurlingtonUSA

Personalised recommendations