
EpisodeSupport: A Global Constraint for Mining Frequent Patterns in a Long Sequence of Events

  • Quentin Cappart
  • John O. R. Aoga
  • Pierre Schaus
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10848)

Abstract

The number of applications generating sequential data is exploding. This work studies the discovery of frequent patterns in a long sequence of events, possibly time-stamped. This problem is known as Frequent Episode Mining (FEM). Like the mining problems recently tackled with Constraint Programming (CP), FEM would benefit from the modularity of CP, which makes it easy to add constraints on the patterns. This modularity, however, does not guarantee efficiency. We therefore introduce two global constraints for solving FEM problems, one without and one with time consideration. The time-stamped version can accommodate gap and span constraints on the matched sequences. Our experiments on real data sets of different levels of complexity show that the introduced constraints are competitive with state-of-the-art methods in terms of execution time and memory consumption, while offering the flexibility of adding constraints on the patterns.
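To make the notions of episode support, gap, and span concrete, here is a minimal brute-force sketch in Python. It is not the global constraint introduced in the paper. The support measure used here (the number of positions from which the pattern can be embedded as a subsequence) and the names `Event`, `can_embed`, `support`, `max_gap`, and `max_span` are assumptions made for illustration only.

```python
from typing import List, Optional, Tuple

Event = Tuple[int, str]  # (timestamp, symbol); timestamps assumed sorted ascending


def can_embed(seq: List[Event], pattern: List[str], start: int,
              max_gap: Optional[int], max_span: Optional[int]) -> bool:
    """True if `pattern` can be matched as a subsequence beginning exactly at `start`."""
    if seq[start][1] != pattern[0]:
        return False
    t0 = seq[start][0]  # timestamp of the first matched event

    def extend(prev_idx: int, k: int) -> bool:
        # Try to match pattern[k:] after the event at prev_idx (backtracking search).
        if k == len(pattern):
            return True
        t_prev = seq[prev_idx][0]
        for j in range(prev_idx + 1, len(seq)):
            t, sym = seq[j]
            # Gap: time between two consecutive matched events.
            if max_gap is not None and t - t_prev > max_gap:
                break  # sorted timestamps: later events only widen the gap
            # Span: time between the first and the current matched event.
            if max_span is not None and t - t0 > max_span:
                break
            if sym == pattern[k] and extend(j, k + 1):
                return True
        return False

    return extend(start, 1)


def support(seq: List[Event], pattern: List[str],
            max_gap: Optional[int] = None, max_span: Optional[int] = None) -> int:
    """Count start positions from which `pattern` can be embedded (assumed measure)."""
    if not pattern:
        return 0
    return sum(can_embed(seq, pattern, i, max_gap, max_span) for i in range(len(seq)))


if __name__ == "__main__":
    seq = [(1, "a"), (2, "b"), (3, "a"), (5, "c"), (6, "b"), (9, "c")]
    print(support(seq, ["a", "b"]))                   # no time constraint
    print(support(seq, ["a", "b"], max_gap=1))        # consecutive matches at most 1 apart
    print(support(seq, ["a", "b", "c"], max_span=4))  # whole match within 4 time units
```

A pattern would be called frequent when this count reaches a user-chosen threshold. The sketch checks one pattern at a time, whereas a CP approach explores the pattern space and prunes it during search.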


Copyright information

© Springer International Publishing AG, part of Springer Nature 2018

Authors and Affiliations

  • Quentin Cappart
    • 1
  • John O. R. Aoga
    • 1
  • Pierre Schaus
    • 1
  1. Université catholique de Louvain, Louvain-La-Neuve, Belgium
