Objectively evaluating condensed representations and interestingness measures for frequent itemset mining

Journal of Intelligent Information Systems

Abstract

Itemset mining approaches, while having been studied for more than 15 years, have been evaluated only on a handful of data sets. In particular, they have never been evaluated on data sets for which the ground truth was known. Thus, it is currently unknown whether itemset mining techniques actually recover underlying patterns. Since the weakness of the algorithmically attractive support/confidence framework became apparent early on, a number of interestingness measures have been proposed. Their utility, however, has not been evaluated, except for attempts to establish congruence with expert opinions. Using an extension of the Quest generator proposed in the original itemset mining paper, we propose to evaluate these measures objectively for the first time, showing how many non-relevant patterns slip through the cracks.
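To make the weakness of the support/confidence framework concrete, the following minimal sketch computes support, confidence, and lift on a toy transaction database. The data, the item names, and the helper functions `support`, `confidence`, and `lift` are illustrative assumptions, not taken from the article or from its extended Quest generator. Lift stands in here for the interestingness measures the abstract mentions.

```python
from fractions import Fraction

def support(transactions, itemset):
    """Fraction of transactions that contain every item of `itemset`."""
    itemset = frozenset(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return Fraction(hits, len(transactions))

def confidence(transactions, antecedent, consequent):
    """Estimated P(consequent | antecedent) for the rule antecedent -> consequent."""
    joint = support(transactions, set(antecedent) | set(consequent))
    return joint / support(transactions, antecedent)

def lift(transactions, antecedent, consequent):
    """Confidence divided by the consequent's support; 1 means independence."""
    return confidence(transactions, antecedent, consequent) / support(transactions, consequent)

# Hypothetical toy database of 100 transactions (an assumption for illustration).
transactions = (
    [frozenset({"tea", "coffee"})] * 20
    + [frozenset({"tea"})] * 5
    + [frozenset({"coffee"})] * 70
    + [frozenset({"milk"})] * 5
)

if __name__ == "__main__":
    print("support(tea, coffee)      =", support(transactions, {"tea", "coffee"}))
    print("confidence(tea -> coffee) =", confidence(transactions, {"tea"}, {"coffee"}))
    print("lift(tea -> coffee)       =", lift(transactions, {"tea"}, {"coffee"}))
```

In this toy data the rule tea -> coffee has support 1/5 and confidence 4/5, so it would pass typical thresholds, yet its lift is 8/9, below 1: buying tea makes coffee slightly less likely than it is overall. A pattern of this kind is frequent and confident without reflecting a genuine positive association.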

References

  • Agrawal, R., & Srikant, R. (1994). Fast algorithms for mining association rules in large databases. In 20th VLDB (pp. 487–499). Chile: Morgan Kaufmann.

  • Bayardo, R.J. Jr., Goethals, B., Zaki, M.J. (Eds.) (2004). FIMI 04, proceedings of the IEEE ICDM workshop on FIM implementations. Brighton.

  • De Bie, T. (2011). Maximum entropy models and subjective interestingness: an application to tiles in binary databases. Data Mining and Knowledge Discovery, 23(3), 407–446.

  • Blake, C., & Merz, C. (1998). UCI repository of machine learning databases. http://www.ics.uci.edu/mlearn/MLRepository.html.

  • Blanchard, J., Guillet, F., Gras, R., Briand, H. (2005). Using information-theoretic measures to assess association rule interestingness. In J. Han, B.W. Wah, V. Raghavan, X. Wu, R. Rastogi (Eds.), ICDM (pp. 66–73). Houston: IEEE.

  • Boulicaut, J.F., & Jeudy, B. (2001). Mining free itemsets under constraints. In M.E. Adiba, C. Collet, B.C. Desai (Eds.), IDEAS ’01 (pp. 322–329).

  • Brin, S., Motwani, R., Silverstein, C. (1997). Beyond market baskets: generalizing association rules to correlations. In J. Peckham (Ed.), SIGMOD 1997 (pp. 265–276).

  • Carvalho, D., Freitas, A., Ebecken, N. (2005). Evaluating the correlation between objective rule interestingness measures and real human interest. In A. Jorge, L. Torgo, P. Brazdil, R. Camacho, J. Gama (Eds.), PKDD (pp. 453–461). Springer.

  • Cooper, C., & Zito, M. (2007). Realistic synthetic data for testing association rule mining algorithms for market basket databases. In J.N. Kok, J. Koronacki, R.L. de Mántaras, S. Matwin, D. Mladenic, A. Skowron (Eds.), PKDD (pp. 398–405). Springer.

  • Gouda, K., & Zaki, M.J. (2005). Genmax: an efficient algorithm for mining maximal frequent itemsets. Data Mining and Knowledge Discovery, 11(3), 223–242.

  • Han, J., Pei, J., Yin, Y. (2000). Mining frequent patterns without candidate generation. In W. Chen, J.F. Naughton, P.A. Bernstein (Eds.), SIGMOD conference (pp. 1–12). ACM.

  • Heikinheimo, H., Seppänen, J.K., Hinkkanen, E., Mannila, H., Mielikäinen, T. (2007). Finding low-entropy sets and trees from binary data. In P. Berkhin, R. Caruana, X. Wu (Eds.), KDD (pp. 350–359). ACM.

  • Lenca, P., Meyer, P., Vaillant, B., Lallich, S. (2008). On selecting interestingness measures for association rules: user oriented description and multiple criteria decision aid. European Journal of Operational Research, 184(2), 610–626.

  • Mampaey, M., & Vreeken, J. (2013). Summarizing categorical data by clustering attributes. Data Mining and Knowledge Discovery, 26(1), 130–173.

  • Pasquier, N., Bastide, Y., Taouil, R., Lakhal, L. (1999). Discovering frequent closed itemsets for association rules. In C. Beeri, P. Buneman (Eds.), ICDT (pp. 398–416). Springer.

  • Peckham, J. (Ed.) (1997). SIGMOD 1997, May 13–15. Tucson: ACM Press.

  • Pei, J., Han, J., Mao, R. (2000). Closet: an efficient algorithm for mining frequent closed itemsets. In ACM SIGMOD workshop on research issues in data mining and knowledge discovery (pp. 21–30).

  • Pei, Y., & Zaïane, O. (2006). A synthetic data generator for clustering and outlier analysis. Tech. rep.

  • Piatetsky-Shapiro, G. (1991). Discovery, analysis, and presentation of strong rules. In Knowledge discovery in databases (pp. 229–248). AAAI/MIT Press.

  • Ramesh, G., Zaki, M.J., Maniatty, W. (2005). Distribution-based synthetic database generation techniques for itemset mining. In IDEAS (pp. 307–316). IEEE.

  • Tan, P.N., Kumar, V., Srivastava, J. (2002). Selecting the right interestingness measure for association patterns. In KDD (pp. 32–41). ACM.

  • Vaillant, B., Lenca, P., Lallich, S. (2004). A clustering of interestingness measures. In E. Suzuki, S. Arikawa (Eds.), Discovery science (pp. 290–297). Springer.

  • Vreeken, J., van Leeuwen, M., Siebes, A. (2007). Preserving privacy through data generation. In N. Ramakrishnan, O. Zaiane (Eds.), ICDM (pp. 685–690). IEEE.

  • Wu, T., Chen, Y., Han, J. (2010). Re-examination of interestingness measures in pattern mining: a unified framework. Data Mining and Knowledge Discovery, 21(3), 371–397.

  • Zaki, M.J. (2000). Scalable algorithms for association mining. IEEE Transactions on Knowledge and Data Engineering, 12(3), 372–390.

  • Zaki, M.J., & Hsiao, C.J. (1999). Charm: an efficient algorithm for closed association rule mining. Tech. rep., CS Department, Rensselaer Polytech Institute.

  • Zaki, M.J., & Hsiao, C.J. (2002). Charm: an efficient algorithm for closed itemset mining. In Grossman, Han, Kumar, Mannila, Motwani (Eds.), SDM. SIAM.

  • Zheng, Z., Kohavi, R., Mason, L. (2001). Real world performance of association rule algorithms. In KDD (pp. 401–406).

Acknowledgments

We are grateful to Christian Borgelt and Tijl De Bie for their support w.r.t. the FPGrowth implementation and the MaxEnt Database Generator, respectively, and to our colleagues Matthijs van Leeuwen and Tias Guns, and the participants of Qimie 2013 for helpful discussions. Finally, we thank the anonymous reviewers for their help in improving the manuscript. The author is supported by a post-doctoral grant by the Fonds Wetenschappelijk Onderzoek Vlaanderen (FWO).

Author information

Correspondence to Albrecht Zimmermann.

About this article

Cite this article

Zimmermann, A. Objectively evaluating condensed representations and interestingness measures for frequent itemset mining. J Intell Inf Syst 45, 299–317 (2015). https://doi.org/10.1007/s10844-013-0297-9

  • DOI: https://doi.org/10.1007/s10844-013-0297-9
