CCPM: A Scalable and Noise-Resistant Closed Contiguous Sequential Patterns Mining Algorithm

  • Conference paper
Part of the book series: Lecture Notes in Computer Science (LNAI, volume 10358)

Abstract

Mining closed contiguous sequential patterns has been addressed in the literature only recently, through the CCSpan algorithm. CCSpan mines a set of patterns that contains the same information as traditional sets of closed sequential patterns, while being more compact thanks to the contiguity constraint. Although CCSpan outperforms closed sequential pattern mining algorithms in the general case, it does not scale well on large datasets with long sequences. Moreover, on noisy datasets, the contiguity constraint prevents the mining of a relevant result set. Inspired by BIDE, which has proven to be one of the most efficient closed sequential pattern mining algorithms, we propose CCPM, which mines closed contiguous sequential patterns while remaining scalable. Furthermore, CCPM introduces usable wildcards that address the problem of mining noisy data. Experiments show that CCPM greatly outperforms CCSpan, especially on large datasets with long sequences. In addition, they show that the wildcards make it possible to tackle the problem of noisy data efficiently.
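
To make the notion of a closed contiguous sequential pattern concrete, the following is a minimal brute-force sketch in Python. It is not CCPM or CCSpan, and it does not model wildcards. It assumes that the support of a pattern is the number of sequences containing it as a contiguous run, and that a frequent pattern is closed when no one-item extension on its left or right has the same support. The function names and the toy data are illustrative only.

```python
from collections import defaultdict


def contiguous_supports(sequences):
    """Count, for every contiguous subsequence, how many sequences contain it."""
    support = defaultdict(int)
    for seq in sequences:
        seen = set()  # count each pattern at most once per sequence
        n = len(seq)
        for i in range(n):
            for j in range(i + 1, n + 1):
                seen.add(tuple(seq[i:j]))
        for pattern in seen:
            support[pattern] += 1
    return support


def closed_contiguous_patterns(sequences, min_support):
    """Return {pattern: support} for frequent patterns that are closed.

    Checking one-item extensions is enough: if a longer contiguous
    super-pattern has the same support, so does every pattern between them.
    """
    support = contiguous_supports(sequences)
    frequent = {p: s for p, s in support.items() if s >= min_support}
    closed = {}
    for pattern, sup in frequent.items():
        absorbed = any(
            len(other) == len(pattern) + 1
            and other_sup == sup
            and (other[1:] == pattern or other[:-1] == pattern)
            for other, other_sup in frequent.items()
        )
        if not absorbed:
            closed[pattern] = sup
    return closed


if __name__ == "__main__":
    toy = [list("abcd"), list("abce"), list("xbc")]  # hypothetical data
    print(closed_contiguous_patterns(toy, min_support=2))
```

On this toy input, only `b c` (support 3) and `a b c` (support 2) survive: every other frequent pattern has a one-item extension with the same support. The enumeration is quadratic in the sequence length, so this sketch only illustrates the definition and says nothing about how a scalable miner is organised.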

Notes

  1. http://www.philippe-fournier-viger.com/spmf/index.php

References

  1. Abboud, Y., Boyer, A., Brun, A.: Predict the emergence - application to competencies in job offers. In: ICTAI (2015)

  2. Agrawal, R., Imielinski, T., Swami, A.: Mining association rules between sets of items in large databases. In: ACM SIGMOD, pp. 207–216 (1993)

  3. Agrawal, R., Srikant, R.: Mining sequential patterns. In: Proceedings of the Eleventh International Conference on Data Engineering, pp. 3–14 (1995)

  4. Aggarwal, C.C., Ta, N., Wang, J., Feng, J., Zaki, M.J.: XProj: a framework for projected structural clustering of XML documents. In: Proceedings of the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 46–55 (2007)

  5. Ayres, J., Flannick, J., Gehrke, J., Yiu, T.: Sequential pattern mining using a bitmap representation. In: Proceedings of the Eighth ACM SIGKDD International Conference, pp. 429–435 (2002)

  6. Chen, J., Cook, T.: Mining contiguous sequential patterns from web logs. In: Proceedings of the 16th International Conference on WWW (2007)

  7. Chen, J.: Contiguous item sequential pattern mining using UpDown Tree. Intell. Data Anal. 12(1), 25–49 (2008)

  8. Li, C., Wang, J.: Efficiently mining closed subsequences with gap constraints. In: Proceedings of SIAM International Conference on Data Mining (2008)

  9. Fischer, J., Heun, V., Kramer, S.: Optimal string mining under frequency constraints. In: European Conference on Principles of Data Mining and Knowledge Discovery, pp. 139–150 (2006)

  10. Fürnkranz, J.: A study using n-gram features for text categorization. In: Austrian Research Institute for Artificial Intelligence (1998)

  11. Garofalakis, M., Rastogi, R., Shim, K.: SPIRIT: sequential pattern mining with regular expression constraints. In: Proceedings of the 25th International Conference on Very Large Data Bases (1999)

  12. Han, J., Pei, J., Mortazavi-Asl, B., Chen, Q., Dayal, U., Hsu, M.: FreeSpan: frequent pattern-projected sequential pattern mining. In: Proceedings of the Sixth ACM SIGKDD (2000)

  13. Kang, T.H., Yoo, J.S., Kim, H.Y.: Mining frequent contiguous sequence patterns in biological sequences. In: 2007 IEEE 7th International Symposium on BioInformatics and BioEngineering (2007)

  14. Karim, M., Rashid, M., Jeong, B.S., Choi, H.J.: An efficient approach to mining maximal contiguous frequent patterns from large DNA sequence databases. Genomics Inform. 10(1), 51–57 (2012)

  15. Liao, V.C.C., Chen, M.S.: DFSP: a Depth-First SPelling algorithm for sequential pattern mining of biological sequences. Knowl. Inf. Syst. 38, 623–639 (2014)

  16. Matsui, T., Uno, T., Umemori, J., Koide, T.: A new approach to string pattern mining with approximate match. In: Discovery Science, pp. 110–125 (2013)

  17. Pei, J., Han, J., Mao, R., Chen, Q., Dayal, U., Hsu, M.: CLOSET: an efficient algorithm for mining frequent closed itemsets. In: DMKD 2001 workshop (2001)

  18. Pei, J., Han, J., Mortazavi-Asl, B., Chen, Q., Dayal, U., Hsu, M.: PrefixSpan: mining sequential patterns efficiently by prefix-projected pattern growth. In: Proceedings of the 17th International Conference on Data Engineering (2001)

  19. Pei, J., Han, J., Wang, W.: Constraint-based sequential pattern mining: the pattern-growth methods. J. Intell. Inf. Syst. 28(2), 133–160 (2007)

  20. Wang, J., Han, J.: BIDE: efficient mining of frequent closed sequences. In: Proceedings of the 20th International Conference on Data Engineering (2004)

  21. Yan, X., Han, J., Afshar, R.: CloSpan: mining closed sequential patterns in large datasets. In: Proceedings of SIAM Conference on Data Mining (2003)

  22. Zaki, M., Hsiao, C.: CHARM: an efficient algorithm for closed itemset mining. In: Proceedings of SIAM Conference on Data Mining, vol. 2 (2002)

  23. Zhang, M., Kao, B., Cheung, D., Yip, K.: Mining periodic patterns with gap requirement from sequences. ACM Trans. Knowl. Discov. Data 1(2), Article No. 7 (2007)

  24. Zhang, J., Wang, Y., Yang, D.: CCSpan: mining closed contiguous sequential patterns. Knowl.-Based Syst. 89, 1–13 (2015)

Author information

Corresponding author

Correspondence to Yacine Abboud.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Abboud, Y., Boyer, A., Brun, A. (2017). CCPM: A Scalable and Noise-Resistant Closed Contiguous Sequential Patterns Mining Algorithm. In: Perner, P. (ed.) Machine Learning and Data Mining in Pattern Recognition. MLDM 2017. Lecture Notes in Computer Science, vol 10358. Springer, Cham. https://doi.org/10.1007/978-3-319-62416-7_11

  • DOI: https://doi.org/10.1007/978-3-319-62416-7_11

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-62415-0

  • Online ISBN: 978-3-319-62416-7

  • eBook Packages: Computer Science, Computer Science (R0)
