
Journal of Intelligent Information Systems, Volume 45, Issue 3, pp 299–317

Objectively evaluating condensed representations and interestingness measures for frequent itemset mining

Albrecht Zimmermann

Abstract

Although itemset mining approaches have been studied for more than 15 years, they have been evaluated on only a handful of data sets. In particular, they have never been evaluated on data sets for which the ground truth is known. It is therefore currently unknown whether itemset mining techniques actually recover the underlying patterns. Since the weakness of the algorithmically attractive support/confidence framework became apparent early on, a number of interestingness measures have been proposed. Their utility, however, has not been evaluated beyond attempts to establish congruence with expert opinion. Using an extension of the Quest generator proposed in the original itemset mining paper, we propose to evaluate these measures objectively for the first time and show how many non-relevant patterns slip through the cracks.
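The weakness of the support/confidence framework mentioned above can be illustrated with a classic toy example. The sketch below is a minimal illustration, assuming binary transactions represented as Python sets; the item names, the data, and the function names are hypothetical, and lift (the "interest" measure of Brin et al. 1997) stands in for interestingness measures in general. It is not the paper's generator or evaluation protocol.

```python
# Hypothetical toy database: 10 baskets, coffee appears in 9 of them.
transactions = [frozenset(t) for t in [{"tea", "coffee"}] * 3 + [{"tea"}] + [{"coffee"}] * 6]


def support(itemset, db):
    """Fraction of transactions that contain every item in `itemset`."""
    items = frozenset(itemset)
    return sum(items <= t for t in db) / len(db)


def confidence(antecedent, consequent, db):
    """Estimate of P(consequent | antecedent); 0.0 if the antecedent never occurs."""
    base = support(antecedent, db)
    return support(set(antecedent) | set(consequent), db) / base if base else 0.0


def lift(antecedent, consequent, db):
    """Confidence divided by the consequent's support; 1.0 indicates independence."""
    cons = support(consequent, db)
    return confidence(antecedent, consequent, db) / cons if cons else 0.0


s = support({"tea", "coffee"}, transactions)
c = confidence({"tea"}, {"coffee"}, transactions)
lf = lift({"tea"}, {"coffee"}, transactions)
print(f"support={s:.2f} confidence={c:.2f} lift={lf:.2f}")
# prints: support=0.30 confidence=0.75 lift=0.83
```

Although the rule tea → coffee reaches 75% confidence, its lift is below 1: buying tea makes coffee slightly less likely than its 90% base rate, which a support/confidence filter on its own would not reveal.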

Keywords

Result verification · Data generation · Interestingness measures

Notes

Acknowledgments

We are grateful to Christian Borgelt and Tijl De Bie for their support w.r.t. the FPGrowth implementation and the MaxEnt Database Generator, respectively, and to our colleagues Matthijs van Leeuwen and Tias Guns, and the participants of Qimie 2013 for helpful discussions. Finally, we thank the anonymous reviewers for their help in improving the manuscript. The author is supported by a postdoctoral grant from the Fonds Wetenschappelijk Onderzoek Vlaanderen (FWO).

References

  1. Agrawal, R., & Srikant, R. (1994). Fast algorithms for mining association rules in large databases. In 20th VLDB (pp. 487–499). Chile: Morgan Kaufmann.
  2. Bayardo, R.J. Jr., Goethals, B., Zaki, M.J. (Eds.) (2004). FIMI 04, proceedings of the IEEE ICDM workshop on FIM implementations. Brighton.
  3. De Bie, T. (2011). Maximum entropy models and subjective interestingness: an application to tiles in binary databases. Data Mining and Knowledge Discovery, 23(3), 407–446.
  4. Blake, C., & Merz, C. (1998). UCI repository of machine learning databases. http://www.ics.uci.edu/mlearn/MLRepository.html.
  5. Blanchard, J., Guillet, F., Gras, R., Briand, H. (2005). Using information-theoretic measures to assess association rule interestingness. In J. Han, B.W. Wah, V. Raghavan, X. Wu, R. Rastogi (Eds.), ICDM (pp. 66–73). Houston: IEEE.
  6. Boulicaut, J.F., & Jeudy, B. (2001). Mining free itemsets under constraints. In M.E. Adiba, C. Collet, B.C. Desai (Eds.), IDEAS ’01 (pp. 322–329).
  7. Brin, S., Motwani, R., Silverstein, C. (1997). Beyond market baskets: generalizing association rules to correlations. In J. Peckham (Ed.), SIGMOD 1997 (pp. 265–276). Tucson: ACM Press.
  8. Carvalho, D., Freitas, A., Ebecken, N. (2005). Evaluating the correlation between objective rule interestingness measures and real human interest. In A. Jorge, L. Torgo, P. Brazdil, R. Camacho, J. Gama (Eds.), PKDD (pp. 453–461). Springer.
  9. Cooper, C., & Zito, M. (2007). Realistic synthetic data for testing association rule mining algorithms for market basket databases. In J.N. Kok, J. Koronacki, R.L. de Mántaras, S. Matwin, D. Mladenic, A. Skowron (Eds.), PKDD (pp. 398–405). Springer.
  10. Gouda, K., & Zaki, M.J. (2005). Genmax: an efficient algorithm for mining maximal frequent itemsets. Data Mining and Knowledge Discovery, 11(3), 223–242.
  11. Han, J., Pei, J., Yin, Y. (2000). Mining frequent patterns without candidate generation. In W. Chen, J.F. Naughton, P.A. Bernstein (Eds.), SIGMOD conference (pp. 1–12). ACM.
  12. Heikinheimo, H., Seppänen, J.K., Hinkkanen, E., Mannila, H., Mielikäinen, T. (2007). Finding low-entropy sets and trees from binary data. In P. Berkhin, R. Caruana, X. Wu (Eds.), KDD (pp. 350–359). ACM.
  13. Lenca, P., Meyer, P., Vaillant, B., Lallich, S. (2008). On selecting interestingness measures for association rules: user oriented description and multiple criteria decision aid. European Journal of Operational Research, 184(2), 610–626.
  14. Mampaey, M., & Vreeken, J. (2013). Summarizing categorical data by clustering attributes. Data Mining and Knowledge Discovery, 26(1), 130–173.
  15. Pasquier, N., Bastide, Y., Taouil, R., Lakhal, L. (1999). Discovering frequent closed itemsets for association rules. In C. Beeri, P. Buneman (Eds.), ICDT (pp. 398–416). Springer.
  16. Peckham, J. (Ed.) (1997). SIGMOD 1997, May 13–15. Tucson: ACM Press.
  17. Pei, J., Han, J., Mao, R. (2000). Closet: an efficient algorithm for mining frequent closed itemsets. In ACM SIGMOD workshop on research issues in data mining and knowledge discovery (pp. 21–30).
  18. Pei, Y., & Zaïane, O. (2006). A synthetic data generator for clustering and outlier analysis. Tech. rep.
  19. Piatetsky-Shapiro, G. (1991). Discovery, analysis, and presentation of strong rules. In Knowledge discovery in databases (pp. 229–248). AAAI/MIT Press.
  20. Ramesh, G., Zaki, M.J., Maniatty, W. (2005). Distribution-based synthetic database generation techniques for itemset mining. In IDEAS (pp. 307–316). IEEE.
  21. Tan, P.N., Kumar, V., Srivastava, J. (2002). Selecting the right interestingness measure for association patterns. In KDD (pp. 32–41). ACM.
  22. Vaillant, B., Lenca, P., Lallich, S. (2004). A clustering of interestingness measures. In E. Suzuki, S. Arikawa (Eds.), Discovery science (pp. 290–297). Springer.
  23. Vreeken, J., van Leeuwen, M., Siebes, A. (2007). Preserving privacy through data generation. In N. Ramakrishnan, O. Zaiane (Eds.), ICDM (pp. 685–690). IEEE.
  24. Wu, T., Chen, Y., Han, J. (2010). Re-examination of interestingness measures in pattern mining: a unified framework. Data Mining and Knowledge Discovery, 21(3), 371–397.
  25. Zaki, M.J. (2000). Scalable algorithms for association mining. IEEE Transactions on Knowledge and Data Engineering, 12(3), 372–390.
  26. Zaki, M.J., & Hsiao, C.J. (1999). Charm: an efficient algorithm for closed association rule mining. Tech. rep., CS Department, Rensselaer Polytechnic Institute.
  27. Zaki, M.J., & Hsiao, C.J. (2002). Charm: an efficient algorithm for closed itemset mining. In Grossman, Han, Kumar, Mannila, Motwani (Eds.), SDM. SIAM.
  28. Zheng, Z., Kohavi, R., Mason, L. (2001). Real world performance of association rule algorithms. In KDD (pp. 401–406).

Copyright information

© Springer Science+Business Media New York 2013

Authors and Affiliations

  1. KU Leuven, Leuven, Belgium
