CCSMP: an efficient closed contiguous sequential pattern mining algorithm with a pattern relation graph

Applied Intelligence

Abstract

Closed contiguous sequential patterns combine the advantages of closedness and contiguity constraints and, in recent years, have been widely used in sequence classification, traffic trajectory visualization, and football player trajectory analysis. Most existing closed contiguous sequential pattern mining algorithms have limitations. For instance, CCSpan, BP-CCSM, and LCCspm cannot mine large-scale sequence databases within reasonable time and memory, while C3Ro, which can mine patterns under multiple constraints, does not take into account the specific properties induced by the contiguity constraint. To address these problems and improve the efficiency of mining closed contiguous sequential patterns, we present CCSMP, an algorithm based on the pattern relation graph. The pattern relation graph is a novel data structure with several key properties relevant to closed contiguous sequential pattern mining. In the experimental section, we evaluate the performance and scalability of CCSMP through extensive experiments on real datasets and analyze the running time of each step of CCSMP to verify the effectiveness of the pattern relation graph. The results show that CCSMP outperforms the existing state-of-the-art algorithm in most cases and that the pattern relation graph significantly reduces the time spent on closure checking.
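
To make the two constraints concrete, the following is a minimal brute-force sketch in Python. It is not CCSMP and does not use the pattern relation graph, whose construction is not described in this abstract. It assumes the usual definitions: each sequence is a list of single items, a pattern's support is the number of sequences that contain it as a contiguous run, and a frequent pattern is closed if no contiguous super-pattern has the same support. The function names and the toy database are hypothetical.

```python
from collections import defaultdict


def contiguous_supports(database):
    """Count, for every contiguous subsequence, how many sequences contain it."""
    containing = defaultdict(set)
    for seq_id, seq in enumerate(database):
        for start in range(len(seq)):
            for end in range(start + 1, len(seq) + 1):
                # A set of sequence ids avoids double counting repeated occurrences.
                containing[tuple(seq[start:end])].add(seq_id)
    return {pattern: len(ids) for pattern, ids in containing.items()}


def closed_contiguous_patterns(database, min_support):
    """Return {pattern: support} for frequent closed contiguous patterns (naive)."""
    supports = contiguous_supports(database)
    frequent = {p: s for p, s in supports.items() if s >= min_support}

    # Closure check: a pattern is not closed if extending it by one item on
    # the left or right keeps the support unchanged. One-item extensions are
    # enough, because support can only shrink as a contiguous pattern grows.
    non_closed = set()
    for pattern, support in frequent.items():
        if len(pattern) > 1:
            for sub in (pattern[1:], pattern[:-1]):
                if frequent.get(sub) == support:
                    non_closed.add(sub)

    return {p: s for p, s in frequent.items() if p not in non_closed}


if __name__ == "__main__":
    toy_db = [list("abc"), list("abc"), list("bc")]
    print(closed_contiguous_patterns(toy_db, min_support=2))
```

On the toy database with a minimum support of 2, only ('a', 'b', 'c') with support 2 and ('b', 'c') with support 3 remain; patterns such as ('a', 'b') are dropped because a one-item extension has the same support. The sketch enumerates every contiguous subsequence, which is exactly the kind of cost that dedicated algorithms aim to avoid on large databases.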

Notes

  1. https://www.philippe-fournier-viger.com/spmf/datasets/kosarak_sequences.txt

  2. https://www.ncbi.nlm.nih.gov/

  3. https://www.pkbigdata.com/common/zhzgbCmptDetails.html

References

  1. Srikant R, Agrawal R (1996) Mining sequential patterns: generalizations and performance improvements. In: Advances in database technology–EDBT’96: 5th international conference on extending database technology Avignon, France, March 25–29, 1996 Proceedings 5. Springer, pp 1–17

  2. Yang C, Gidófalvi G (2018) Mining and visual exploration of closed contiguous sequential patterns in trajectories. Int J Geogr Inf Sci 32(7):1282–1304

  3. Goo Y-H, Shim K-S, Lee M-S, Kim M-S (2019) Protocol specification extraction based on contiguous sequential pattern algorithm. IEEE Access 7:36057–36074

  4. Li C, Yang Q, Wang J, Li M (2012) Efficient mining of gap-constrained subsequences and its various applications. ACM Trans Knowl Discov Data (TKDD) 6(1):1–39

  5. Zhang M, Kao B, Cheung DW, Yip KY (2007) Mining periodic patterns with gap requirement from sequences. ACM Trans Knowl Discov Data (TKDD) 1(2):7

  6. Wu Y, Tong Y, Zhu X, Wu X (2017) Nosep: nonoverlapping sequence pattern mining with gap constraints. IEEE Trans Cybern 48(10):2809–2822

  7. Abboud Y, Brun A, Boyer A (2019) C3ro: an efficient mining algorithm of extended-closed contiguous robust sequential patterns in noisy data. Expert Syst Appl 131:172–189

  8. Pei J (2004) Mining sequential patterns by pattern-growth: the prefixspan approach. IEEE Trans Knowl Data Eng 16

  9. Ayres J, Flannick J, Gehrke J, Yiu T (2002) Sequential pattern mining using a bitmap representation. In: Proceedings of the eighth ACM SIGKDD international conference on knowledge discovery and data mining, pp 429–435

  10. Fournier-Viger P, Gomariz A, Campos M, Thomas R (2014) Fast vertical mining of sequential patterns using co-occurrence information. In: Advances in knowledge discovery and data mining: 18th Pacific-Asia conference, PAKDD 2014, Tainan, Taiwan, May 13-16, 2014. Proceedings, Part I 18. Springer, pp 40–52

  11. Zaki MJ (2001) Spade: an efficient algorithm for mining frequent sequences. Mach Learn 42:31–60

  12. Pasquier N, Bastide Y, Taouil R, Lakhal L (1999) Discovering frequent closed itemsets for association rules. In: Database theory–ICDT’99: 7th international conference Jerusalem, Israel, January 10–12, 1999 Proceedings 7, Springer, pp 398–416

  13. Yan X, Han J, Afshar R (2003) Clospan: mining closed sequential patterns in large datasets. In: Proceedings of the 2003 SIAM international conference on data mining, SIAM, pp 166–177

  14. Fürnkranz J (1998) A study using n-gram features for text categorization. Aust Res Inst Artif Intell 3(1998):1–10

  15. Chen J, Cook T (2007) Mining contiguous sequential patterns from web logs. In: Proceedings of the 16th international conference on world wide web, pp 1177–1178

  16. Wang J, Han J (2004) Bide: Efficient mining of frequent closed sequences. In: Proceedings. 20th international conference on data engineering. IEEE, pp 79–90

  17. Gomariz A, Campos M, Marin R, Goethals B (2013) Clasp: an efficient algorithm for mining frequent closed sequences. In: Advances in knowledge discovery and data mining: 17th Pacific-Asia conference, PAKDD 2013, Gold Coast, Australia, April 14-17, 2013, Proceedings, Part I 17. Springer, pp 50–61

  18. Fumarola F, Lanotte PF, Ceci M, Malerba D (2016) Clofast: closed sequential pattern mining using sparse and vertical id-lists. Knowl Inf Syst 48:429–463

  19. Zhang J, Wang Y, Yang D (2015) Ccspan: mining closed contiguous sequential patterns. Knowl-Based Syst 89:1–13

  20. Farzana Zerin S, Jeong B-S (2011) A fast contiguous sequential pattern mining technique in dna data sequences using position information. IETE Tech Rev 28(6):511–519

  21. Zhang J, Wang Y, Zhang C, Shi Y (2015) Mining contiguous sequential generators in biological sequences. IEEE/ACM Trans Comput Biol Bioinform 13(5):855–867

  22. Gan S, Deng H, Qiu Y, Alshahrani M, Liu S (2022) Dsae-impute: learning discriminative stacked autoencoders for imputing single-cell rna-seq data. Curr Bioinform 17(5):440–451

  23. Niranjan U, Subramanyam R, Khanaa V (2010) Developing a web recommendation system based on closed sequential patterns. In: Information and communication technologies: international conference, ICT 2010, Kochi, Kerala, India, September 7-9, 2010. Proceedings. Springer, pp 171–179

  24. Bermingham L, Lee I (2020) Mining distinct and contiguous sequential patterns from large vehicle trajectories. Knowl-Based Syst 189:105076

  25. Ding S, Li Z, Zhang K, Mao F (2022) A comparative study of frequent pattern mining with trajectory data. Sensors 22(19):7608

  26. Adeyemo VE, Palczewska A, Jones B (2021) Lccspm: l-length closed contiguous sequential patterns mining algorithm to find frequent athlete movement patterns from gps. In: 2021 20th IEEE international conference on machine learning and applications (ICMLA). IEEE, pp 455–460

  27. Abboud Y, Boyer A, Brun A (2017) Ccpm: a scalable and noise-resistant closed contiguous sequential patterns mining algorithm. In: Machine learning and data mining in pattern recognition: 13th international conference, MLDM 2017, New York, NY, USA, July 15-20, 2017, Proceedings 13. Springer, pp 147–162

  28. Agrawal R, Imieliński T, Swami A (1993) Mining association rules between sets of items in large databases. In: Proceedings of the 1993 ACM SIGMOD international conference on management of data, pp 207–216

  29. Wu Y, Wang X, Li Y, Guo L, Li Z, Zhang J, Wu X (2022) Owsp-miner: self-adaptive one-off weak-gap strong pattern mining. ACM Trans Manag Inf Syst (TMIS) 13(3):1–23

  30. Nawaz MS, Fournier-Viger P, Shojaee A, Fujita H (2021) Using artificial intelligence techniques for covid-19 genome analysis. Appl Intell 51:3086–3103

  31. Greenfeld JS (2002) Matching gps observations to locations on a digital map. In: Transportation research board 81st annual meeting, vol 22, pp 576–582

  32. Huang G, Gan W, Huang S, Chen J (2022) Negative pattern discovery with individual support. Knowl-Based Syst 251:109194

  33. Wu Y, Yuan Z, Li Y, Guo L, Fournier-Viger P, Wu X (2022) Nwp-miner: nonoverlapping weak-gap sequential pattern mining. Inf Sci 588:124–141

  34. Karim MR, Hossain MA, Rashid MM, Jeong B-S, Choi H-J (2012) A mapreduce framework for mining maximal contiguous frequent patterns in large dna sequence datasets. IETE Tech Rev 29(2):162–168

  35. Karim MR, Rashid MM, Jeong B-S, Choi H-J (2012) An efficient approach to mining maximal contiguous frequent patterns from large dna sequence databases. Genom Inform 10(1):51–57

  36. Li Y, Zhang S, Guo L, Liu J, Wu Y, Wu X (2022) Netnmsp: nonoverlapping maximal sequential pattern mining. Appl Intell 1–24

Acknowledgements

This work was supported by the National Natural Science Foundation of China (Grant No. 62102158), the 2021 Foshan support project for promoting the development of the university scientific and technological achievements service industry (2021DZXX05) and the National Innovation and Entrepreneurship Training Program for Undergraduates.

Author information

Corresponding author

Correspondence to Shichao Liu.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Hu, H., Zhang, J., Xia, R. et al. CCSMP: an efficient closed contiguous sequential pattern mining algorithm with a pattern relation graph. Appl Intell 53, 29723–29740 (2023). https://doi.org/10.1007/s10489-023-05118-x

  • DOI: https://doi.org/10.1007/s10489-023-05118-x
