A Parallel SAT-Based Framework for Closed Frequent Itemsets Mining

  • Conference paper
Principles and Practice of Constraint Programming (CP 2018)

Part of the book series: Lecture Notes in Computer Science (LNPSE, volume 11008)

Abstract

Frameworks based on constraint programming (CP) and propositional satisfiability (SAT) for modeling and solving pattern mining tasks have gained a considerable audience in recent years. However, these declarative and generic frameworks encounter a scaling problem. The huge size of the constraint networks and propositional formulas that encode large datasets is identified as the main bottleneck of most existing approaches. In this paper, we propose a parallel SAT-based framework for the itemset mining problem to improve solving efficiency. The proposed approach is based on a divide-and-conquer paradigm, in which the transaction database is partitioned using item-based guiding paths. This decomposition allows us to derive smaller, independent Boolean formulas that can be solved in parallel. The performance and scalability of the proposed algorithm are evaluated through extensive experiments on several datasets. We demonstrate that our partition-based parallel SAT approach outperforms other CP approaches even in the sequential case, while significantly reducing the performance gap with specialized approaches.
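
The abstract does not spell out the decomposition, so the following is only a minimal sketch of one common reading of item-based guiding paths: with the items in a fixed order, subproblem i looks for itemsets that contain item i and none of the earlier items, and it only needs the transactions that contain item i. This sketch makes several assumptions that are not from the paper: it enumerates candidates by brute force instead of building and solving a Boolean formula, the names `closure`, `solve_subproblem`, and `mine_closed` are hypothetical, and a thread pool is used only to show that the subproblems are independent.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations


def closure(itemset, transactions):
    """Items shared by every transaction that contains `itemset`."""
    covering = [t for t in transactions if itemset <= t]
    return frozenset.intersection(*covering) if covering else frozenset()


def solve_subproblem(i, items, transactions, min_support):
    """Closed frequent itemsets containing items[i] and none of items[:i].

    Brute-force stand-in for solving one sub-formula (assumption: the
    paper uses a SAT solver here, which this sketch does not model).
    """
    pivot, excluded = items[i], set(items[:i])
    # Only transactions that contain the pivot can support these itemsets.
    projected = [t for t in transactions if pivot in t]
    if len(projected) < min_support:
        return set()
    candidates = sorted(set().union(*projected) - excluded - {pivot})
    found = set()
    for r in range(len(candidates) + 1):
        for rest in combinations(candidates, r):
            itemset = frozenset(rest) | {pivot}
            if sum(itemset <= t for t in projected) < min_support:
                continue
            # An itemset whose closure pulls in an excluded item is not
            # closed, so the equality test rejects it automatically.
            if closure(itemset, projected) == itemset:
                found.add(itemset)
    return found


def mine_closed(transactions, min_support):
    """Union of the results of all item-based subproblems (non-empty itemsets)."""
    transactions = [frozenset(t) for t in transactions]
    items = sorted(set().union(*transactions))
    with ThreadPoolExecutor() as pool:
        parts = pool.map(
            lambda i: solve_subproblem(i, items, transactions, min_support),
            range(len(items)),
        )
        return set().union(*parts)


if __name__ == "__main__":
    db = [{"a", "b"}, {"a", "b", "c"}, {"b", "c"}]
    for itemset in sorted(mine_closed(db, min_support=2), key=sorted):
        print(sorted(itemset))
```

Because every non-empty itemset has exactly one smallest item under the chosen order, each closed frequent itemset is produced by exactly one subproblem, which is what makes the pieces independent in this reading.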

Notes

  1. http://fimi.ua.ac.be/data/
  2. http://dtai.cs.kuleuven.be/CP4IM/datasets/
  3. http://www.cril.univ-artois.fr/decMining/
  4. http://www.lirmm.fr/~lazaar/cpminer.html
  5. https://lemierev.users.greyc.fr/closedpattern/
  6. https://www.info.ucl.ac.be/~pschaus
  7. http://research.nii.ac.jp/~uno/code/lcm.html

Corresponding author

Correspondence to Lakhdar Sais.

Copyright information

© 2018 Springer Nature Switzerland AG

About this paper

Cite this paper

Dlala, I.O., Jabbour, S., Raddaoui, B., Sais, L. (2018). A Parallel SAT-Based Framework for Closed Frequent Itemsets Mining. In: Hooker, J. (ed.) Principles and Practice of Constraint Programming. CP 2018. Lecture Notes in Computer Science, vol 11008. Springer, Cham. https://doi.org/10.1007/978-3-319-98334-9_37

  • DOI: https://doi.org/10.1007/978-3-319-98334-9_37

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-98333-2

  • Online ISBN: 978-3-319-98334-9

  • eBook Packages: Computer Science, Computer Science (R0)