Abstract
Frameworks based on constraint programming (CP) and propositional satisfiability (SAT) for modeling and solving pattern mining tasks have gained a considerable audience in recent years. However, this declarative and generic approach encounters a scaling problem: the huge size of the constraint networks or propositional formulas encoding large datasets is identified as the main bottleneck of most existing approaches. In this paper, we propose a parallel SAT-based framework for the itemset mining problem to improve solving efficiency. The proposed approach follows a divide-and-conquer paradigm in which the transaction database is partitioned using item-based guiding paths. This decomposition allows us to derive smaller, independent Boolean formulas that can be solved in parallel. The performance and scalability of the proposed algorithm are evaluated through extensive experiments on several datasets. We demonstrate that our partition-based parallel SAT approach outperforms other CP approaches even in the sequential case, while significantly reducing the performance gap with specialized approaches.
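The item-based decomposition described in the abstract can be illustrated with a small sketch. This is not the SAT encoding from the paper: brute-force enumeration stands in for the SAT solver, and all names (`mine_subproblem`, `mine_closed_itemsets`, `workers`) are hypothetical. It assumes a fixed item order a_1, ..., a_n, where sub-problem i looks only for closed frequent itemsets that contain a_i and none of a_1, ..., a_(i-1), so the sub-problems are independent and their results are disjoint.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations


def mine_subproblem(db, items, i, min_support):
    """Closed frequent itemsets containing items[i] and none of items[:i].

    Brute-force stand-in for solving one sub-formula (assumption: the
    paper uses a SAT solver here, this sketch does not).
    """
    pivot, excluded = items[i], set(items[:i])
    # Only transactions containing the pivot item can support these itemsets.
    projected = [t for t in db if pivot in t]
    if len(projected) < min_support:
        return set()
    candidates = sorted(set().union(*projected) - excluded - {pivot})
    found = set()
    for size in range(len(candidates) + 1):
        for extra in combinations(candidates, size):
            itemset = frozenset((pivot, *extra))
            covering = [t for t in projected if itemset <= t]
            if not covering or len(covering) < min_support:
                continue
            # Closed: the itemset equals the intersection of its covering
            # transactions. An excluded item in that intersection rejects
            # the itemset; its closure belongs to an earlier sub-problem.
            if frozenset.intersection(*covering) == itemset:
                found.add(itemset)
    return found


def mine_closed_itemsets(transactions, min_support, workers=4):
    """Run one sub-problem per item concurrently and merge the results."""
    db = [frozenset(t) for t in transactions]
    items = sorted(set().union(*db))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = pool.map(
            lambda i: mine_subproblem(db, items, i, min_support),
            range(len(items)),
        )
        return set().union(*parts)


if __name__ == "__main__":
    toy_db = ["abc", "abc", "ab", "c"]
    for itemset in sorted(mine_closed_itemsets(toy_db, 2), key=sorted):
        print(sorted(itemset))
```

Threads are used only to keep the sketch short; in CPython they do not give real speedup for this CPU-bound loop, so a process pool or separate solver processes would be the natural choice in practice. The point of the sketch is the decomposition itself: each sub-problem sees only the transactions containing its pivot item, which is what makes the individual problems smaller.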
© 2018 Springer Nature Switzerland AG
About this paper
Cite this paper
Dlala, I.O., Jabbour, S., Raddaoui, B., Sais, L. (2018). A Parallel SAT-Based Framework for Closed Frequent Itemsets Mining. In: Hooker, J. (ed.) Principles and Practice of Constraint Programming. CP 2018. Lecture Notes in Computer Science, vol 11008. Springer, Cham. https://doi.org/10.1007/978-3-319-98334-9_37
DOI: https://doi.org/10.1007/978-3-319-98334-9_37
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-98333-2
Online ISBN: 978-3-319-98334-9
eBook Packages: Computer Science, Computer Science (R0)