An Efficient Algorithm for Mining Frequent Sequence with Constraint Programming

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9852)

Abstract

The main advantage of Constraint Programming (CP) approaches for sequential pattern mining (SPM) is their modularity, which includes the ability to add new constraints (regular expressions, length restrictions, etc.). The current best CP approach for SPM uses a global constraint (module) that computes the projected database and enforces the minimum frequency; it does this with a filtering algorithm similar to the PrefixSpan method. However, the resulting system is not as scalable as some of the most advanced mining systems like Zaki’s cSPADE. We show how, using techniques from both data mining and CP, one can use a generic constraint solver and yet outperform existing specialized systems. This is mainly due to two improvements in the module that computes the projected frequencies: first, computing the projected database can be sped up by pre-computing the positions at which a symbol can become unsupported by a sequence, thereby avoiding to scan the full sequence each time; and second by taking inspiration from the trailing used in CP solvers to devise a backtracking-aware data structure that allows fast incremental storing and restoring of the projected database. Detailed experiments show how this approach outperforms existing CP as well as specialized systems for SPM, and that the gain in efficiency translates directly into increased efficiency for other settings such as mining with regular expressions. The data and software related to this paper are available at http://sites.uclouvain.be/cp4dm/spm/.

References

  1. 1.
    Agrawal, R., Srikant, R.: Mining sequential patterns. In: Proceedings of the Eleventh International Conference on Data Engineering, 1995, pp. 3–14. IEEE (1995)Google Scholar
  2. 2.
    Ayres, J., Flannick, J., Gehrke, J., Yiu, T.: Sequential pattern mining using a bitmap representation. In: ACM SIGKDD, pp. 429–435 (2002)Google Scholar
  3. 3.
    Coquery, E., Jabbour, S., Saïs, L., Salhi, Y.: A SAT-based approach for discovering frequent, closed and maximal patterns in a sequence. In: ECAI (2012)Google Scholar
  4. 4.
    Guns, T., Nijssen, S., De Raedt, L.: Itemset mining: a constraint programming perspective. Artif. Intell. 175(12), 1951–1983 (2011)MathSciNetCrossRefMATHGoogle Scholar
  5. 5.
    Kemmar, A., Loudni, S., Lebbah, Y., Boizumault, P., Charnois, T.: A global constraint for mining sequential patterns with gap constraint. In: CPAIOR16 (2015)Google Scholar
  6. 6.
    Kemmar, A., Loudni, S., Lebbah, Y., Boizumault, P., Charnois, T.: PREFIX-PROJECTION global constraint for sequential pattern mining. In: Pesant, G. (ed.) CP 2015. LNCS, vol. 9255, pp. 226–243. Springer, Heidelberg (2015). doi: 10.1007/978-3-319-23219-5_17 Google Scholar
  7. 7.
    Mabroukeh, N.R., Ezeife, C.I.: A taxonomy of sequential pattern mining algorithms. ACM Comput. Surv. 43(1), 3:1–3:41 (2010)CrossRefGoogle Scholar
  8. 8.
    Negrevergne, B., Guns, T.: Constraint-based sequence mining using constraint programming. In: Michel, L. (ed.) CPAIOR 2015. LNCS, vol. 9075, pp. 288–305. Springer, Heidelberg (2015). doi: 10.1007/978-3-319-18008-3_20 Google Scholar
  9. 9.
    OscaR Team: OscaR: Scala in OR (2012). https://bitbucket.org/oscarlib/oscar
  10. 10.
    Pei, J., Han, J., Mortazavi-Asl, B., Pinto, H., Chen, Q., Dayal, U., Hsu, M.C.: Prefixspan: mining sequential patterns efficiently by prefix-projected pattern growth. In: ICCCN, p. 0215. IEEE (2001)Google Scholar
  11. 11.
    Perez, G., Régin, J.-C.: Improving GAC-4 for table and MDD constraints. In: O’Sullivan, B. (ed.) CP 2014. LNCS, vol. 8656, pp. 606–621. Springer, Heidelberg (2014). doi: 10.1007/978-3-319-10428-7_44 Google Scholar
  12. 12.
    Rossi, F., Van Beek, P., Walsh, T.: Handbook of CP. Elsevier (2006)Google Scholar
  13. 13.
    Schulte, C., Carlsson, M.: Finite domain constraint programming systems. In: Handbook of Constraint Programming, pp. 495–526 (2006)Google Scholar
  14. 14.
    Trasarti, R., Bonchi, F., Goethals, B.: Sequence mining automata: a new technique for mining frequent sequences under regular expressions. In: Eighth IEEE International Conference on Data Mining, 2008, ICDM 2008, pp. 1061–1066. IEEE (2008)Google Scholar
  15. 15.
    Yan, X., Han, J., Afshar, R.: Clospan: mining closed sequential patterns in large datasets. In: SDM, pp. 166–177. SIAM (2003)Google Scholar
  16. 16.
    Yang, Z., Kitsuregawa, M.: LAPIN-SPAM: an improved algorithm for mining sequential pattern. In: International Conference on Data Engineering (2005)Google Scholar
  17. 17.
    Yang, Z., Wang, Y., Kitsuregawa, M.: LAPIN: effective sequential pattern mining algorithms by last position induction for dense databases. In: DAFSAA, pp. 1020–1023 (2007)Google Scholar
  18. 18.
    Zaki, M.J.: Sequence mining in categorical domains: incorporating constraints. In: Proceedings of the Ninth International Conference on Information and Knowledge Management, pp. 422–429. ACM (2000)Google Scholar

Copyright information

© Springer International Publishing AG 2016

Authors and Affiliations

  1. 1.UCLouvainICTEAMLouvain-la-NeuveBelgium
  2. 2.DTAI Research GroupKU LeuvenLeuvenBelgium

Personalised recommendations