Sets of Robust Rules, and How to Find Them

  • Conference paper
Machine Learning and Knowledge Discovery in Databases (ECML PKDD 2019)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 11906)

Abstract

Association rules are among the most important concepts in data mining. Rules of the form \(X \rightarrow Y\) are simple to understand and simple to act upon, yet can model important local dependencies in data. The problem, however, is that there are so many of them. Both traditional and state-of-the-art frameworks typically yield millions of rules, rather than identifying a small set of rules that captures the most important dependencies in the data. In this paper, we define the problem of association rule mining in terms of the Minimum Description Length principle. That is, we identify the best set of rules as the one that most succinctly describes the data. We show that the resulting optimization problem does not lend itself to exact search, and hence propose Grab, a greedy heuristic to efficiently discover good sets of noise-resistant rules directly from data. Through extensive experiments we show that, unlike the state of the art, Grab reliably recovers the ground truth. On real-world data we show that it finds reasonable numbers of rules that, upon close inspection, give clear insight into the local distribution of the data.
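
To make the idea of "the rule set that most succinctly describes the data" concrete, the following is a minimal sketch of a two-part description length for a single rule \(X \rightarrow Y\) on binary transaction data. It is an illustration only and not the encoding or search used by Grab: the function names (`data_bits`, `baseline_bits`, `rule_bits`), the flat per-item model cost, and the toy data are all assumptions made for this example.

```python
from math import comb, log2

def data_bits(n, k):
    """Bits to say which k of n rows contain an item.

    Assumed code: log2(n + 1) bits to send k, then log2 C(n, k) bits
    to pick the k rows. This is a generic choice, not the paper's.
    """
    return log2(n + 1) + log2(comb(n, k))

def baseline_bits(rows, y):
    """Cost of encoding item y on its own, without any rule."""
    k = sum(1 for r in rows if y in r)
    return data_bits(len(rows), k)

def rule_bits(rows, x, y, n_items):
    """Cost of encoding item y given the rule x -> y.

    Rows are split into those where the head x holds and the rest;
    y is encoded separately in each part. The model cost is a
    hypothetical flat log2(n_items) bits per item named in the rule.
    """
    covered = [r for r in rows if x <= r]
    rest = [r for r in rows if not x <= r]
    k_cov = sum(1 for r in covered if y in r)
    k_rest = sum(1 for r in rest if y in r)
    model = (len(x) + 1) * log2(n_items)
    return model + data_bits(len(covered), k_cov) + data_bits(len(rest), k_rest)

# Toy data (made up): b follows a with one noisy exception, c is unrelated.
rows = []
for i in range(20):
    r = set()
    if i < 11:
        r.add("a")
    if i < 10:
        r.add("b")
    if i % 2 == 0:
        r.add("c")
    rows.append(frozenset(r))

base = baseline_bits(rows, "b")
with_a = rule_bits(rows, frozenset({"a"}), "b", n_items=3)
with_c = rule_bits(rows, frozenset({"c"}), "b", n_items=3)
print(f"baseline: {base:.2f} bits")
print(f"a -> b:   {with_a:.2f} bits")
print(f"c -> b:   {with_c:.2f} bits")
```

On this toy data the rule a → b shortens the description despite the noisy row, while c → b makes it longer, so under this simplified score only the former would be worth adding to the rule set.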

Notes

  1. http://eda.mmci.uni-saarland.de/grab/.

  2. http://eda.mmci.uni-saarland.de/grab/.

  3. No relation to the first author.

Author information

Corresponding author

Correspondence to Jonas Fischer.

Copyright information

© 2020 Springer Nature Switzerland AG

About this paper

Cite this paper

Fischer, J., Vreeken, J. (2020). Sets of Robust Rules, and How to Find Them. In: Brefeld, U., Fromont, E., Hotho, A., Knobbe, A., Maathuis, M., Robardet, C. (eds) Machine Learning and Knowledge Discovery in Databases. ECML PKDD 2019. Lecture Notes in Computer Science, vol 11906. Springer, Cham. https://doi.org/10.1007/978-3-030-46150-8_3

  • DOI: https://doi.org/10.1007/978-3-030-46150-8_3

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-46149-2

  • Online ISBN: 978-3-030-46150-8

  • eBook Packages: Computer Science, Computer Science (R0)
