Constraint-Based Sequence Mining Using Constraint Programming

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 9075)

Abstract

The goal of constraint-based sequence mining is to find sequences of symbols that are included in a large number of input sequences and that satisfy some constraints specified by the user. Many constraints have been proposed in the literature, but a general framework is still missing. We investigate the use of constraint programming as a general framework for this task.
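To make the task concrete, here is a minimal Python sketch of the problem statement only, not of the constraint programming formulations proposed in the paper. It assumes that "included in" means subsequence inclusion with gaps allowed. The names `is_embedded`, `support` and `satisfies`, and the maximum-length constraint used as the user constraint, are illustrative assumptions.

```python
from typing import Optional, Sequence


def is_embedded(pattern: Sequence, sequence: Sequence) -> bool:
    """Return True if `pattern` occurs in `sequence` as a subsequence (gaps allowed)."""
    remaining = iter(sequence)
    # `symbol in remaining` advances the iterator, so symbols are matched in order.
    return all(symbol in remaining for symbol in pattern)


def support(pattern: Sequence, database: Sequence[Sequence]) -> int:
    """Count the input sequences that include `pattern`."""
    return sum(1 for sequence in database if is_embedded(pattern, sequence))


def satisfies(pattern: Sequence, database: Sequence[Sequence],
              min_support: int, max_length: Optional[int] = None) -> bool:
    """Check a frequency threshold plus one hypothetical user constraint (maximum length)."""
    if max_length is not None and len(pattern) > max_length:
        return False
    return support(pattern, database) >= min_support


# Toy database, invented for illustration.
database = ["abcb", "acb", "bca", "ab"]
print(support("ab", database))
print(satisfies("ab", database, min_support=3, max_length=2))
```

On this toy database, the pattern `ab` is included in three of the four sequences: it does not occur in `bca`, where no `b` follows the `a`.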

We first identify four categories of constraints that are applicable to sequence mining. We then propose two constraint programming formulations. The first formulation introduces a new global constraint called exists-embedding. This formulation is the most efficient but does not support one type of constraint. To support such constraints, we develop a second formulation that is more general but incurs more overhead. Both formulations can use the projected database technique used in specialised algorithms.
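The projected database technique mentioned above can be illustrated with a short Python sketch in the style of prefix-projection miners such as PrefixSpan. This is an assumption-laden illustration of the general idea, not the paper's exists-embedding constraint or either of its formulations. The function names `project` and `mine` are hypothetical, and each sequence is assumed to be a plain sequence of single symbols.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple


def project(database: Sequence[Sequence], symbol) -> List[Sequence]:
    """Keep, for each sequence containing `symbol`, the suffix after its first occurrence."""
    return [seq[seq.index(symbol) + 1:] for seq in database if symbol in seq]


def mine(database: Sequence[Sequence], min_support: int,
         prefix: Tuple = ()) -> Dict[Tuple, int]:
    """Enumerate frequent patterns depth-first, growing `prefix` one symbol at a time."""
    # Count each symbol once per (projected) sequence.
    counts = Counter()
    for seq in database:
        counts.update(set(seq))

    patterns: Dict[Tuple, int] = {}
    for symbol, count in sorted(counts.items()):
        if count >= min_support:
            extended = prefix + (symbol,)
            patterns[extended] = count
            # Only suffixes after the match can extend the pattern further.
            patterns.update(mine(project(database, symbol), min_support, extended))
    return patterns


# Toy database, invented for illustration.
database = ["abcb", "acb", "bca", "ab"]
for pattern, count in mine(database, min_support=3).items():
    print("".join(pattern), count)
```

Projecting on the first occurrence of each symbol keeps the longest possible suffix, so counting symbols in the projected database gives the exact support of every one-symbol extension of the current prefix.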

Experiments demonstrate the flexibility of the approach in constraint-based settings and compare it to existing methods.

Author information

Corresponding author

Correspondence to Benjamin Negrevergne.

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Negrevergne, B., Guns, T. (2015). Constraint-Based Sequence Mining Using Constraint Programming. In: Michel, L. (ed.) Integration of AI and OR Techniques in Constraint Programming. CPAIOR 2015. Lecture Notes in Computer Science, vol 9075. Springer, Cham. https://doi.org/10.1007/978-3-319-18008-3_20

  • DOI: https://doi.org/10.1007/978-3-319-18008-3_20

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-18007-6

  • Online ISBN: 978-3-319-18008-3

  • eBook Packages: Computer Science, Computer Science (R0)
