Mining sequential patterns with periodic wildcard gaps

Applied Intelligence

Abstract

Mining frequent patterns with periodic wildcard gaps is an important data mining problem with applications to complex real-world data. The problem can be stated as follows: given a subject sequence, a pre-specified threshold, and a variable gap length with wildcards between every two consecutive letters, find all frequent patterns with periodic wildcard gaps. State-of-the-art mining algorithms, which use matrices or other linear data structures, not only consume a large amount of memory but also run slowly. In this study, we use the Incomplete Nettree of a sub-pattern P (the last layer of a Nettree, which is an extension of a tree) to efficiently create the Incomplete Nettrees of all its super-patterns with prefix pattern P and to compute their supports in a one-way scan. We propose two new algorithms, MAPB (Mining sequentiAl Pattern using incomplete Nettree with Breadth first search) and MAPD (Mining sequentiAl Pattern using incomplete Nettree with Depth first search), to solve the problem effectively with low memory requirements. Furthermore, we design a heuristic algorithm, MAPBOK (MAPB for tOp-K), based on MAPB to mine the Top-K frequent patterns of each length. Experimental results on real-world biological data demonstrate the superiority of the proposed algorithms in running time and space consumption, and also show that the pattern matching approach can be employed to mine special frequent patterns effectively.
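
The idea of reusing the last layer of a prefix pattern can be illustrated with a small sketch. This is not the MAPB or MAPD algorithm from the article, only a simplified breadth-first illustration under stated assumptions: an occurrence is any tuple of positions whose consecutive gaps lie in [min_gap, max_gap], the support of a pattern is its raw occurrence count, the threshold is an absolute count (min_support), and mining stops at a hypothetical max_len. The names extend_layer and mine_patterns are invented for this example.

```python
from collections import defaultdict


def extend_layer(seq, layer, min_gap, max_gap):
    """Build the last layers of all one-letter extensions of a prefix pattern.

    `layer` maps an end position to the number of occurrences of the prefix
    that end there (a stand-in for the "last layer" idea; an assumption of
    this sketch, not the article's exact data structure).
    """
    children = defaultdict(lambda: defaultdict(int))
    for pos, count in layer.items():
        first = pos + min_gap + 1                  # skip at least min_gap wildcards
        last = min(pos + max_gap + 1, len(seq) - 1)  # skip at most max_gap wildcards
        for nxt in range(first, last + 1):
            children[seq[nxt]][nxt] += count
    return children


def mine_patterns(seq, min_gap, max_gap, min_support, max_len):
    """Breadth-first enumeration of patterns up to max_len.

    Only patterns with zero occurrences are pruned, so every pattern whose
    occurrence count reaches min_support (an absolute count, assumed here)
    is reported.
    """
    frequent = {}
    level = defaultdict(dict)
    for i, ch in enumerate(seq):
        level[ch][i] = 1                           # length-1 patterns
    for length in range(1, max_len + 1):
        next_level = {}
        for pattern, layer in level.items():
            support = sum(layer.values())
            if support >= min_support:
                frequent[pattern] = support
            if length < max_len:
                # One pass over the prefix's last layer yields all extensions.
                for ch, child in extend_layer(seq, layer, min_gap, max_gap).items():
                    next_level[pattern + ch] = dict(child)
        level = next_level
    return frequent


result = mine_patterns("ATATA", min_gap=0, max_gap=1, min_support=2, max_len=3)
```

In this sketch each pattern keeps only a map from end positions to occurrence counts, so the supports of all extensions of a prefix come from a single pass over that map rather than from re-matching the whole pattern against the sequence.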

Acknowledgements

This research is supported by the National Natural Science Foundation of China under grant Nos. 61229301, 61170190, and 61370144, the Program for Changjiang Scholars and Innovative Research Team in University (PCSIRT) of the Ministry of Education, China, under grant IRT13059, the National 863 Program of China under grant 2012AA011005, the National 973 Program of China under grant 2013CB329604, the Natural Science Foundation of Hebei Province of China under grant No. F2013202138, the Key Project of the Educational Commission of Hebei Province under grant No. ZH2012038, and the Industrial Science and Technology Pillar Program of Changzhou, Jiangsu, China, under grant CE20120026.

Author information

Corresponding author

Correspondence to Youxi Wu.

About this article

Cite this article

Wu, Y., Wang, L., Ren, J. et al. Mining sequential patterns with periodic wildcard gaps. Appl Intell 41, 99–116 (2014). https://doi.org/10.1007/s10489-013-0499-4

  • DOI: https://doi.org/10.1007/s10489-013-0499-4
