Abstract
This article presents the use of Answer Set Programming (ASP) to mine sequential patterns. ASP is a high-level declarative logic programming paradigm for encoding combinatorial and optimization problems, as well as for knowledge representation and reasoning. It is therefore a good candidate for implementing pattern mining with background knowledge, which has long been an open issue in data mining. We propose encodings of the classical sequential pattern mining tasks using two representations of embeddings (fill-gaps versus skip-gaps) and for various kinds of patterns: frequent, constrained and condensed. We compare the computational performance of these encodings with each other to gain insight into the efficiency of ASP encodings. The results show that the fill-gaps strategy performs better on real problems owing to its lower memory consumption. Finally, compared with CPSM, an approach based on constraint programming (another declarative programming paradigm), our proposal shows comparable performance.
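To make the underlying mining task concrete, here is a minimal Python sketch of frequent sequential pattern mining over sequences of items. It illustrates the task semantics only, not the ASP encodings, and says nothing about how the fill-gaps or skip-gaps strategies represent embeddings. It assumes the usual definition where the support of a pattern is the number of database sequences in which it is embedded, gaps allowed. The names `is_embedded`, `support`, `frequent_patterns` and the `max_length` bound are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Pattern = Tuple[str, ...]


def is_embedded(pattern: Sequence[str], sequence: Sequence[str]) -> bool:
    """True if the items of `pattern` occur in `sequence` in the same order,
    possibly separated by gaps (i.e. `pattern` is a subsequence)."""
    it = iter(sequence)
    # `item in it` consumes the iterator up to and including the first match,
    # so each search resumes right after the previously matched position.
    return all(item in it for item in pattern)


def support(pattern: Sequence[str], database: List[Sequence[str]]) -> int:
    """Number of database sequences in which `pattern` is embedded."""
    return sum(1 for seq in database if is_embedded(pattern, seq))


def frequent_patterns(database: List[Sequence[str]], min_support: int,
                      max_length: int) -> Dict[Pattern, int]:
    """Hypothetical helper: level-wise enumeration of all patterns of length
    <= max_length whose support is at least min_support."""
    items = sorted({item for seq in database for item in seq})
    result: Dict[Pattern, int] = {}
    frontier: List[Pattern] = [()]  # start from the empty pattern
    for _ in range(max_length):
        next_frontier: List[Pattern] = []
        for prefix in frontier:
            # Only frequent prefixes are extended by one item.
            for item in items:
                candidate = prefix + (item,)
                s = support(candidate, database)
                if s >= min_support:
                    result[candidate] = s
                    next_frontier.append(candidate)
        frontier = next_frontier
    return result


if __name__ == "__main__":
    db = [["a", "c", "b", "c"], ["a", "b", "c"], ["c", "a", "b"], ["b", "c"]]
    print(support(("a", "b"), db))  # 3
    print(frequent_patterns(db, min_support=3, max_length=3))
```

Extending only frequent prefixes is complete because support is anti-monotone: every prefix of a frequent pattern is itself frequent.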
Notes
- 1.
Note that the scope of a variable is the rule in which it occurs: every occurrence of a variable within a rule stands for the same value.
- 2.
clingo is fully compliant with the recent ASP standard: https://www.mat.unical.it/aspcomp2013/ASPStandardization.
- 3.
A similar encoding can be obtained for the fill-gaps strategy by applying the same changes as above.
- 4.
- 5.
- 6.
The generator and databases used in our experiments are available at https://sites.google.com/site/aspseqmining.
- 7.
Using the subset-minimal heuristic does not compromise completeness when solving the maximal pattern mining problem.
Acknowledgements
We would like to thank Roland Kaminski and Max Ostrowski for their significant input and comments on the ASP encodings, and Benjamin Negrevergne and Tias Guns for their suggestions about the experimental part. We also thank the anonymous reviewers for their valuable comments and constructive suggestions.
Appendix
Listing 3.10 illustrates how the encoding of the skip-gaps strategy can be transformed to mine sequential patterns that are sequences of itemsets.
The first difference from the encoding of Listing 3.2 concerns pattern generation: the upper-bound constraint of the choice rule in Line 9 has been removed, so that every non-empty subset of \(\mathscr {I}\) can be generated.
The second difference is that new ASP rules check that all the items of a pattern itemset are included in an itemset of the sequence. In Line 14, the conditional literal seq(T,P,I):pat(1,I) states that, for each atom pat(1,I), there must exist an atom seq(T,P,I) for the rule body to be satisfied. A similar expression is used in Line 15.
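The itemset inclusion test described above can be mirrored outside ASP. The following Python sketch is not a translation of Listing 3.10. It assumes that a pattern itemset matches a sequence position when it is a subset of the itemset at that position, and that successive pattern itemsets must match at strictly increasing positions. The subset test `itemset <= sequence[pos]` plays the role of the expression seq(T,P,I):pat(1,I): every item of the pattern itemset must occur at position P of sequence T. `nonempty_subsets` enumerates the candidate itemsets that a choice rule without upper bound can generate. All function names are hypothetical.

```python
from itertools import combinations
from typing import FrozenSet, List, Sequence

Itemset = FrozenSet[str]


def nonempty_subsets(items: Sequence[str]) -> List[Itemset]:
    """All non-empty subsets of the item alphabet (candidate pattern itemsets)."""
    ordered = sorted(items)
    return [frozenset(c)
            for k in range(1, len(ordered) + 1)
            for c in combinations(ordered, k)]


def is_embedded(pattern: Sequence[Itemset], sequence: Sequence[Itemset]) -> bool:
    """True if each pattern itemset is included in a sequence itemset,
    at strictly increasing positions (greedy leftmost matching)."""
    pos = 0
    for itemset in pattern:
        # Skip sequence itemsets that do not contain every item of `itemset`.
        while pos < len(sequence) and not itemset <= sequence[pos]:
            pos += 1
        if pos == len(sequence):
            return False
        pos += 1  # the next pattern itemset must match strictly later
    return True


def support(pattern: Sequence[Itemset], database: List[Sequence[Itemset]]) -> int:
    """Number of database sequences in which `pattern` is embedded."""
    return sum(1 for seq in database if is_embedded(pattern, seq))


if __name__ == "__main__":
    db = [
        [frozenset("a"), frozenset("abc"), frozenset("c")],
        [frozenset("ab"), frozenset("c")],
        [frozenset("b"), frozenset("ac")],
    ]
    print(len(nonempty_subsets("abc")))                   # 7
    print(support([frozenset("a"), frozenset("c")], db))  # 2
    print(support([frozenset("b"), frozenset("c")], db))  # 3
```

Matching each pattern itemset at the earliest possible position is safe: it leaves the longest possible suffix of the sequence for the remaining itemsets.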
Copyright information
© 2018 Springer International Publishing AG
About this chapter
Cite this chapter
Guyet, T., Moinard, Y., Quiniou, R., Schaub, T. (2018). Efficiency Analysis of ASP Encodings for Sequential Pattern Mining Tasks. In: Pinaud, B., Guillet, F., Cremilleux, B., de Runz, C. (eds) Advances in Knowledge Discovery and Management. Studies in Computational Intelligence, vol 732. Springer, Cham. https://doi.org/10.1007/978-3-319-65406-5_3
DOI: https://doi.org/10.1007/978-3-319-65406-5_3
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-65405-8
Online ISBN: 978-3-319-65406-5
eBook Packages: Engineering, Engineering (R0)