Abstract
A number of studies, theoretical, empirical, or both, have been conducted to provide insight into the properties and behavior of interestingness measures for association rule mining. While each has value in its own right, most are either limited in scope or, more importantly, ignore the purpose for which interestingness measures are intended, namely the ultimate ranking of discovered association rules. This paper, therefore, focuses on an analysis of the rule-ranking behavior of 61 well-known interestingness measures tested on the rules generated from 110 different datasets. By clustering based on ranking behavior, we highlight, and formally prove, previously unreported equivalences among interestingness measures. We also show that there appear to be distinct clusters of interestingness measures, but that there remain differences among clusters, confirming that domain knowledge is essential to the selection of an appropriate interestingness measure for a particular task and business objective.
Additional information
Responsible editor: Bart Goethals.
Cite this article
Tew, C., Giraud-Carrier, C., Tanner, K. et al. Behavior-based clustering and analysis of interestingness measures for association rule mining. Data Min Knowl Disc 28, 1004–1045 (2014). https://doi.org/10.1007/s10618-013-0326-x
DOI: https://doi.org/10.1007/s10618-013-0326-x