Mining sequential patterns with periodic wildcard gaps

Wu, Youxi; Wang, Lingling; Ren, Jiadong; Ding, Wei; Wu, Xindong

doi:10.1007/s10489-013-0499-4

Published: 22 January 2014

Volume 41, pages 99–116 (2014)

Applied Intelligence

Abstract

Mining frequent patterns with periodic wildcard gaps is a critical data mining problem for dealing with complex real-world problems. The problem can be described as follows: given a subject sequence, a pre-specified threshold, and a variable gap length with wildcards between every two consecutive letters, the task is to find all frequent patterns with periodic wildcard gaps. State-of-the-art mining algorithms, which use matrices or other linear data structures to solve the problem, not only consume a large amount of memory but also run slowly. In this study, we use the Incomplete Nettree structure (the last layer of a Nettree, which is an extension of a tree) of a sub-pattern P to efficiently create the Incomplete Nettrees of all its super-patterns with prefix pattern P and to compute their supports in a one-way scan. We propose two new algorithms, MAPB (Mining sequentiAl Pattern using incomplete Nettree with Breadth first search) and MAPD (Mining sequentiAl Pattern using incomplete Nettree with Depth first search), to solve the problem effectively with low memory requirements. Furthermore, we design a heuristic algorithm, MAPBOK (MAPB for tOp-K), based on MAPB to find the Top-K frequent patterns of each length. Experimental results on real-world biological data demonstrate the superiority of the proposed algorithms in running time and space consumption, and also show that the pattern matching approach can be employed to mine special frequent patterns effectively.
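The idea of reusing the last layer of a sub-pattern to count the supports of its super-patterns can be illustrated with a minimal Python sketch. It is not the authors' MAPB or MAPD implementation, and it rests on assumptions the abstract does not spell out: an occurrence is taken to be a tuple of positions whose letters match the pattern, with between `min_gap` and `max_gap` wildcards between consecutive positions, and the support is the number of such occurrences. The "last layer" is modelled here as a dictionary from sequence position to the number of occurrences ending there. All function and parameter names are hypothetical.

```python
from typing import Dict


def first_layer(seq: str, ch: str) -> Dict[int, int]:
    """Last layer of the length-1 pattern `ch`: each matching position
    ends exactly one occurrence."""
    return {i: 1 for i, c in enumerate(seq) if c == ch}


def extend_layer(seq: str, layer: Dict[int, int], ch: str,
                 min_gap: int, max_gap: int) -> Dict[int, int]:
    """Build the last layer of pattern P+ch from the last layer of P.

    Assumed gap rule: for a previous position i and a new position j,
    min_gap <= j - i - 1 <= max_gap.
    """
    new_layer: Dict[int, int] = {}
    for j, c in enumerate(seq):
        if c != ch:
            continue
        # Sum the occurrence counts of every admissible parent position.
        total = sum(layer.get(i, 0)
                    for i in range(j - max_gap - 1, j - min_gap))
        if total:
            new_layer[j] = total
    return new_layer


def support(seq: str, pattern: str, min_gap: int, max_gap: int) -> int:
    """Number of occurrences of `pattern` in `seq` under the assumed gap rule."""
    layer = first_layer(seq, pattern[0])
    for ch in pattern[1:]:
        layer = extend_layer(seq, layer, ch, min_gap, max_gap)
    return sum(layer.values())


if __name__ == "__main__":
    print(support("aaa", "aa", 0, 1))     # 3
    print(support("AGAGA", "AGA", 0, 1))  # 2
```

Because `extend_layer` needs only the previous layer, the layers of all one-letter extensions of a prefix P can be derived from P's layer without revisiting earlier letters of the pattern. Deciding whether a pattern is frequent also requires comparing the support against the pre-specified threshold, which is not shown here because the abstract does not define how the threshold is applied.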

Acknowledgements

This research is supported by the National Natural Science Foundation of China under grants No. 61229301, 61170190, and 61370144, the Program for Changjiang Scholars and Innovative Research Team in University (PCSIRT) of the Ministry of Education, China, under grant IRT13059, the National 863 Program of China under grant 2012AA011005, the National 973 Program of China under grant 2013CB329604, the Natural Science Foundation of Hebei Province of China under grant No. F2013202138, the Key Project of the Educational Commission of Hebei Province under grant No. ZH2012038, and the Industrial Science and Technology Pillar Program of Changzhou, Jiangsu, China, under grant CE20120026.

Author information

Authors and Affiliations

School of Computer Science and Engineering, Hebei University of Technology, Tianjin, 300130, China
Youxi Wu & Lingling Wang
School of Information Science and Engineering, Yanshan University, Qinhuangdao, 066004, China
Jiadong Ren
Department of Computer Science, University of Massachusetts Boston, Boston, 02125, USA
Wei Ding
School of Computer Science and Information Engineering, Hefei University of Technology, Hefei, 230009, China
Xindong Wu
Department of Computer Science, University of Vermont, Burlington, VT, 05405, USA
Xindong Wu

Corresponding author

Correspondence to Youxi Wu.

About this article

Cite this article

Wu, Y., Wang, L., Ren, J. et al. Mining sequential patterns with periodic wildcard gaps. Appl Intell 41, 99–116 (2014). https://doi.org/10.1007/s10489-013-0499-4

Issue Date: July 2014
DOI: https://doi.org/10.1007/s10489-013-0499-4
