Abstract
Mining closed contiguous sequential patterns has been addressed in the literature only recently, through the CCSpan algorithm. CCSpan mines a set of patterns that contains the same information as traditional sets of closed sequential patterns, while being more compact thanks to the contiguity constraint. Although CCSpan outperforms closed sequential pattern mining algorithms in the general case, it does not scale well on large datasets with long sequences. Moreover, on noisy datasets, the contiguity constraint prevents the mining of a relevant result set. Inspired by BIDE, which has proven to be one of the most efficient closed sequential pattern mining algorithms, we propose CCPM, which mines closed contiguous sequential patterns while remaining scalable. Furthermore, CCPM introduces usable wildcards that address the problem of mining noisy data. Experiments show that CCPM greatly outperforms CCSpan, especially on large datasets with long sequences. In addition, they show that the wildcards make it possible to efficiently tackle the problem of noisy data.
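To make the notion of a closed contiguous sequential pattern concrete, the following is a minimal brute-force sketch. It is neither CCPM nor CCSpan and makes no attempt at scalability. It assumes the usual definitions: the support of a pattern is the number of sequences that contain it as a contiguous run, and a frequent pattern is closed if no longer contiguous pattern containing it has the same support. The function names and the toy database are hypothetical, and wildcards are not modelled.

```python
from collections import defaultdict


def contiguous_subsequences(seq):
    """Yield every non-empty contiguous subsequence of seq as a tuple."""
    n = len(seq)
    for i in range(n):
        for j in range(i + 1, n + 1):
            yield tuple(seq[i:j])


def frequent_contiguous(db, min_support):
    """Return {pattern: support} for contiguous patterns meeting min_support.

    Support counts each sequence at most once per pattern.
    """
    support = defaultdict(int)
    for seq in db:
        for pattern in set(contiguous_subsequences(seq)):
            support[pattern] += 1
    return {p: s for p, s in support.items() if s >= min_support}


def is_contiguous_sub(small, big):
    """True if the tuple small occurs as a contiguous run inside big."""
    m = len(small)
    return any(big[i:i + m] == small for i in range(len(big) - m + 1))


def closed_contiguous(db, min_support):
    """Keep only frequent patterns with no longer super-pattern of equal support."""
    freq = frequent_contiguous(db, min_support)
    closed = {}
    for p, s in freq.items():
        absorbed = any(
            len(q) > len(p) and t == s and is_contiguous_sub(p, q)
            for q, t in freq.items()
        )
        if not absorbed:
            closed[p] = s
    return closed


if __name__ == "__main__":
    toy_db = ["abcd", "abc", "bcd"]  # hypothetical toy database
    for pattern, sup in sorted(closed_contiguous(toy_db, 2).items()):
        print("".join(pattern), sup)
```

On the toy database with a minimum support of 2, nine contiguous patterns are frequent but only three are closed (abc, bc and bcd), which illustrates the compactness the abstract refers to.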
References
Abboud, Y., Boyer, A., Brun, A.: Predict the emergence - application to competencies in job offers. In: ICTAI (2015)
Agrawal, R., Imieliński, T., Swami, A.: Mining association rules between sets of items in large databases. In: ACM SIGMOD, pp. 207–216 (1993)
Agrawal, R., Srikant, R.: Mining sequential patterns. In: Proceedings of the Eleventh International Conference on Data Engineering, pp. 3–14 (1995)
Aggarwal, C.C., Ta, N., Wang, J., Feng, J., Zaki, M.J.: XProj: a framework for projected structural clustering of XML documents. In: Proceedings of the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 46–55 (2007)
Ayres, J., Flannick, J., Gehrke, J., Yiu, T.: Sequential pattern mining using a bitmap representation. In: Proceedings of the Eighth ACM SIGKDD International Conference, pp. 429–435 (2002)
Chen, J., Cook, T.: Mining contiguous sequential patterns from web logs. In: Proceedings of the 16th International Conference on WWW (2007)
Chen, J.: Contiguous item sequential pattern mining using UpDown Tree. Intell. Data Anal. 12(1), 25–49 (2008)
Li, C., Wang, J.: Efficiently mining closed subsequences with gap constraints. In: Proceedings of SIAM International Conference on Data Mining (2008)
Fischer, J., Heun, V., Kramer, S.: Optimal string mining under frequency constraints. In: European Conference on Principles of Data Mining and Knowledge Discovery, pp. 139–150 (2006)
Fürnkranz, J.: A study using n-gram features for text categorization. In: Austrian Research Institute for Artificial Intelligence (1998)
Garofalakis, M., Rastogi, R., Shim, K.: SPIRIT: sequential pattern mining with regular expression constraints. In: Proceedings of the 25th International Conference on Very Large Data Bases (1999)
Han, J., Pei, J., Mortazavi-Asl, B., Chen, Q., Dayal, U., Hsu, M.: FreeSpan: frequent pattern-projected sequential pattern mining. In: Proceedings of the Sixth ACM SIGKDD (2000)
Kang, T.H., Yoo, J.S., Kim, H.Y.: Mining frequent contiguous sequence patterns in biological sequences. In: 2007 IEEE 7th International Symposium on BioInformatics and BioEngineering (2007)
Karim, M., Rashid, M., Jeong, B.S., Choi, H.J.: An efficient approach to mining maximal contiguous frequent patterns from large DNA sequence databases. Genomics Inform. 10(1), 51–57 (2012)
Liao, V.C.C., Chen, M.S.: DFSP: a Depth-First SPelling algorithm for sequential pattern mining of biological sequences. Knowl. Inf. Syst. 38, 623–639 (2014)
Matsui, T., Uno, T., Umemori, J., Koide, T.: A new approach to string pattern mining with approximate match. In: Discovery Science, pp. 110–125 (2013)
Pei, J., Han, J., Mao, R., Chen, Q., Dayal, U., Hsu, M.: CLOSET: an efficient algorithm for mining frequent closed itemsets. In: DMKD 2001 workshop (2001)
Pei, J., Han, J., Mortazavi-Asl, B., Chen, Q., Dayal, U., Hsu, M.: PrefixSpan: mining sequential patterns efficiently by prefix-projected pattern growth. In: Proceedings of the 17th International Conference on Data Engineering (2001)
Pei, J., Han, J., Wang, W.: Constraint-based sequential pattern mining: the pattern-growth methods. J. Intell. Inf. Syst. 28(2), 133–160 (2007)
Wang, J., Han, J.: BIDE: efficient mining of frequent closed sequences. In: Proceedings of the 20th International Conference on Data Engineering (2004)
Yan, X., Han, J., Afshar, R.: CloSpan: mining closed sequential patterns in large datasets. In: Proceedings of SIAM Conference on Data Mining (2003)
Zaki, M., Hsiao, C.: CHARM: an efficient algorithm for closed itemset mining. In: Proceedings of SIAM Conference on Data Mining, vol. 2 (2002)
Zhang, M., Kao, B., Cheung, D., Yip, K.: Mining periodic patterns with gap requirement from sequences. ACM Trans. Knowl. Discov. Data 1(2), Article No. 7 (2007)
Zhang, J., Wang, Y., Yang, D.: CCSpan: mining closed contiguous sequential patterns. Knowl.-Based Syst. 89, 1–13 (2015)
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Abboud, Y., Boyer, A., Brun, A. (2017). CCPM: A Scalable and Noise-Resistant Closed Contiguous Sequential Patterns Mining Algorithm. In: Perner, P. (ed.) Machine Learning and Data Mining in Pattern Recognition. MLDM 2017. Lecture Notes in Computer Science, vol 10358. Springer, Cham. https://doi.org/10.1007/978-3-319-62416-7_11
DOI: https://doi.org/10.1007/978-3-319-62416-7_11
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-62415-0
Online ISBN: 978-3-319-62416-7
eBook Packages: Computer Science, Computer Science (R0)