Abstract
Closed contiguous sequential patterns combine the advantages of closedness constraints and contiguity constraints and, in recent years, have been widely used in sequence classification, traffic trajectory visualization, and football player trajectory analysis. Most existing closed contiguous sequential pattern mining algorithms have limitations. For instance, CCSpan, BP-CCSM, and LCCspm cannot mine large-scale sequence databases within reasonable time and memory, while C3Ro, which can mine patterns under multiple constraints, does not take into account the specific properties induced by the contiguity constraint. To address these problems and improve the efficiency of mining closed contiguous sequential patterns, we present an algorithm called CCSMP, which is based on the pattern relation graph. The pattern relation graph is a novel data structure with several key properties related to closed contiguous sequential pattern mining. In the experimental section, we not only conducted extensive experiments on real datasets to evaluate the performance and scalability of CCSMP but also analyzed the running time of each step of CCSMP to verify the effectiveness of the pattern relation graph. The experimental results show that CCSMP outperforms the existing state-of-the-art algorithm in most cases and that the pattern relation graph significantly reduces the time spent on closure checking.
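To make the notion of a closed contiguous sequential pattern concrete, the following is a minimal brute-force sketch. It is not CCSMP and does not use the pattern relation graph. It assumes that each sequence is an ordered collection of single items, that the support of a pattern is the number of sequences containing it as a contiguous run, and that a frequent pattern is closed when no contiguous super-pattern has the same support. The function name and the toy database are hypothetical.

```python
from collections import defaultdict


def closed_contiguous_patterns(database, min_support):
    """Brute-force illustration of the definition (assumed, not CCSMP)."""
    # Map every contiguous subsequence to the ids of the sequences containing it.
    occurrences = defaultdict(set)
    for sid, seq in enumerate(database):
        for start in range(len(seq)):
            for end in range(start + 1, len(seq) + 1):
                occurrences[tuple(seq[start:end])].add(sid)

    # Keep only the frequent patterns, with their support counts.
    support = {
        p: len(sids) for p, sids in occurrences.items() if len(sids) >= min_support
    }

    closed = {}
    for p, sup in support.items():
        # Support never increases when a pattern is extended, so it is enough
        # to look at super-patterns that add one item at the front or the back.
        absorbed = any(
            len(q) == len(p) + 1 and s == sup and (q[1:] == p or q[:-1] == p)
            for q, s in support.items()
        )
        if not absorbed:
            closed[p] = sup
    return closed


# Toy database: three sequences over the items a, b, c, d.
toy_db = ["abcd", "abc", "bcd"]
result = closed_contiguous_patterns(toy_db, min_support=2)
```

On this toy input with a minimum support of 2, the frequent pattern `ab` is not closed because `abc` occurs in the same two sequences, whereas `bc` is closed because it occurs in all three sequences and both of its one-item extensions occur in only two. The sketch enumerates every contiguous subsequence and compares all pairs of frequent patterns, which is the kind of cost that becomes impractical on large databases.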
References
Srikant R, Agrawal R (1996) Mining sequential patterns: generalizations and performance improvements. In: Advances in database technology–EDBT’96: 5th international conference on extending database technology Avignon, France, March 25–29, 1996 Proceedings 5. Springer, pp 1–17
Yang C, Gidófalvi G (2018) Mining and visual exploration of closed contiguous sequential patterns in trajectories. Int J Geogr Inf Sci 32(7):1282–1304
Goo Y-H, Shim K-S, Lee M-S, Kim M-S (2019) Protocol specification extraction based on contiguous sequential pattern algorithm. IEEE Access 7:36057–36074
Li C, Yang Q, Wang J, Li M (2012) Efficient mining of gap-constrained subsequences and its various applications. ACM Trans Knowl Discov Data (TKDD) 6(1):1–39
Zhang M, Kao B, Cheung DW, Yip KY (2007) Mining periodic patterns with gap requirement from sequences. ACM Trans Knowl Discov Data (TKDD) 1(2):7
Wu Y, Tong Y, Zhu X, Wu X (2017) Nosep: nonoverlapping sequence pattern mining with gap constraints. IEEE Trans Cybern 48(10):2809–2822
Abboud Y, Brun A, Boyer A (2019) C3ro: an efficient mining algorithm of extended-closed contiguous robust sequential patterns in noisy data. Expert Syst Appl 131:172–189
Pei J (2004) Mining sequential patterns by pattern-growth: the prefixspan approach. IEEE Trans Knowl Data Eng 16
Ayres J, Flannick J, Gehrke J, Yiu T (2002) Sequential pattern mining using a bitmap representation. In: Proceedings of the eighth ACM SIGKDD international conference on knowledge discovery and data mining, pp 429–435
Fournier-Viger P, Gomariz A, Campos M, Thomas R (2014) Fast vertical mining of sequential patterns using co-occurrence information. In: Advances in knowledge discovery and data mining: 18th Pacific-Asia conference, PAKDD 2014, Tainan, Taiwan, May 13-16, 2014. Proceedings, Part I 18. Springer, pp 40–52
Zaki MJ (2001) Spade: an efficient algorithm for mining frequent sequences. Mach Learn 42:31–60
Pasquier N, Bastide Y, Taouil R, Lakhal L (1999) Discovering frequent closed itemsets for association rules. In: Database theory–ICDT’99: 7th international conference Jerusalem, Israel, January 10–12, 1999 Proceedings 7, Springer, pp 398–416
Yan X, Han J, Afshar R (2003) Clospan: mining closed sequential patterns in large datasets. In: Proceedings of the 2003 SIAM international conference on data mining, SIAM, pp 166–177
Fürnkranz J (1998) A study using n-gram features for text categorization. Aust Res Inst Artif Intell 3(1998):1–10
Chen J, Cook T (2007) Mining contiguous sequential patterns from web logs. In: Proceedings of the 16th international conference on world wide web, pp 1177–1178
Wang J, Han J (2004) Bide: Efficient mining of frequent closed sequences. In: Proceedings. 20th international conference on data engineering. IEEE, pp 79–90
Gomariz A, Campos M, Marin R, Goethals B (2013) Clasp: an efficient algorithm for mining frequent closed sequences. In: Advances in knowledge discovery and data mining: 17th Pacific-Asia conference, PAKDD 2013, Gold Coast, Australia, April 14-17, 2013, Proceedings, Part I 17. Springer, pp 50–61
Fumarola F, Lanotte PF, Ceci M, Malerba D (2016) Clofast: closed sequential pattern mining using sparse and vertical id-lists. Knowl Inf Syst 48:429–463
Zhang J, Wang Y, Yang D (2015) Ccspan: mining closed contiguous sequential patterns. Knowl-Based Syst 89:1–13
Farzana Zerin S, Jeong B-S (2011) A fast contiguous sequential pattern mining technique in dna data sequences using position information. IETE Tech Rev 28(6):511–519
Zhang J, Wang Y, Zhang C, Shi Y (2015) Mining contiguous sequential generators in biological sequences. IEEE/ACM Trans Comput Biol Bioinform 13(5):855–867
Gan S, Deng H, Qiu Y, Alshahrani M, Liu S (2022) Dsae-impute: learning discriminative stacked autoencoders for imputing single-cell rna-seq data. Curr Bioinform 17(5):440–451
Niranjan U, Subramanyam R, Khanaa V (2010) Developing a web recommendation system based on closed sequential patterns. In: Information and communication technologies: international conference, ICT 2010, Kochi, Kerala, India, September 7-9, 2010. Proceedings. Springer, pp 171–179
Bermingham L, Lee I (2020) Mining distinct and contiguous sequential patterns from large vehicle trajectories. Knowl-Based Syst 189:105076
Ding S, Li Z, Zhang K, Mao F (2022) A comparative study of frequent pattern mining with trajectory data. Sensors 22(19):7608
Adeyemo VE, Palczewska A, Jones B (2021) Lccspm: l-length closed contiguous sequential patterns mining algorithm to find frequent athlete movement patterns from gps. In: 2021 20th IEEE international conference on machine learning and applications (ICMLA). IEEE, pp 455–460
Abboud Y, Boyer A, Brun A (2017) Ccpm: a scalable and noise-resistant closed contiguous sequential patterns mining algorithm. In: Machine learning and data mining in pattern recognition: 13th international conference, MLDM 2017, New York, NY, USA, July 15-20, 2017, Proceedings 13. Springer, pp 147–162
Agrawal R, Imieliński T, Swami A (1993) Mining association rules between sets of items in large databases. In: Proceedings of the 1993 ACM SIGMOD international conference on management of data, pp 207–216
Wu Y, Wang X, Li Y, Guo L, Li Z, Zhang J, Wu X (2022) Owsp-miner: self-adaptive one-off weak-gap strong pattern mining. ACM Trans Manag Inf Syst (TMIS) 13(3):1–23
Nawaz MS, Fournier-Viger P, Shojaee A, Fujita H (2021) Using artificial intelligence techniques for covid-19 genome analysis. Appl Intell 51:3086–3103
Greenfeld JS (2002) Matching gps observations to locations on a digital map. In: Transportation research board 81st annual meeting, vol 22, pp 576–582
Huang G, Gan W, Huang S, Chen J (2022) Negative pattern discovery with individual support. Knowl-Based Syst 251:109194
Wu Y, Yuan Z, Li Y, Guo L, Fournier-Viger P, Wu X (2022) Nwp-miner: nonoverlapping weak-gap sequential pattern mining. Inf Sci 588:124–141
Karim MR, Hossain MA, Rashid MM, Jeong B-S, Choi H-J (2012) A mapreduce framework for mining maximal contiguous frequent patterns in large dna sequence datasets. IETE Tech Rev 29(2):162–168
Karim MR, Rashid MM, Jeong B-S, Choi H-J (2012) An efficient approach to mining maximal contiguous frequent patterns from large dna sequence databases. Genom Inform 10(1):51–57
Li Y, Zhang S, Guo L, Liu J, Wu Y, Wu X (2022) Netnmsp: nonoverlapping maximal sequential pattern mining. Appl Intell 1–24
Acknowledgements
This work was supported by the National Natural Science Foundation of China (Grant No. 62102158), the 2021 Foshan support project for promoting the development of the university scientific and technological achievements service industry (2021DZXX05) and the National Innovation and Entrepreneurship Training Program for Undergraduates.
About this article
Cite this article
Hu, H., Zhang, J., Xia, R. et al. CCSMP: an efficient closed contiguous sequential pattern mining algorithm with a pattern relation graph. Appl Intell 53, 29723–29740 (2023). https://doi.org/10.1007/s10489-023-05118-x
DOI: https://doi.org/10.1007/s10489-023-05118-x