Behavior-based clustering and analysis of interestingness measures for association rule mining

Data Mining and Knowledge Discovery

Abstract

A number of studies, theoretical, empirical, or both, have been conducted to provide insight into the properties and behavior of interestingness measures for association rule mining. While each has value in its own right, most are either limited in scope or, more importantly, ignore the purpose for which interestingness measures are intended, namely the ultimate ranking of discovered association rules. This paper, therefore, focuses on an analysis of the rule-ranking behavior of 61 well-known interestingness measures tested on the rules generated from 110 different datasets. By clustering based on ranking behavior, we highlight, and formally prove, previously unreported equivalences among interestingness measures. We also show that there appear to be distinct clusters of interestingness measures, but that there remain differences among clusters, confirming that domain knowledge is essential to the selection of an appropriate interestingness measure for a particular task and business objective.
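As a rough illustration of what comparing measures by their rule-ranking behavior can look like, the sketch below scores a handful of rules with three measures and turns the agreement between the resulting rankings into a pairwise distance. It is a minimal sketch under stated assumptions, not the procedure used in the paper: the rule probabilities are invented, the choice of Spearman rank correlation and of `1 - rho` as a distance is an assumption, and only three measures (confidence, Ganascia, lift) are shown. Ganascia, defined as `2 * confidence - 1`, is a strictly increasing transformation of confidence, so the two must rank every rule set identically; this is the kind of equivalence that a ranking-based comparison exposes.

```python
from itertools import combinations


def ranks(values):
    """Return average ranks (1 = largest value); tied values share the mean rank."""
    order = sorted(range(len(values)), key=lambda i: -values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman rank correlation, computed as the Pearson correlation of the ranks."""
    rx, ry = ranks(x), ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)


# Hypothetical rules A -> B, each given as (P(A), P(B), P(AB)). Invented values.
RULES = [
    (0.40, 0.50, 0.30),
    (0.20, 0.10, 0.08),
    (0.60, 0.70, 0.48),
    (0.10, 0.30, 0.09),
    (0.50, 0.20, 0.12),
]

# Three standard measures written in terms of the probabilities above.
MEASURES = {
    "confidence": lambda pa, pb, pab: pab / pa,
    "ganascia": lambda pa, pb, pab: 2 * pab / pa - 1,
    "lift": lambda pa, pb, pab: pab / (pa * pb),
}

scores = {name: [f(*rule) for rule in RULES] for name, f in MEASURES.items()}

# Assumed distance: 1 - rho, so identical rankings are at distance 0.
distance = {
    (a, b): 1 - spearman(scores[a], scores[b])
    for a, b in combinations(MEASURES, 2)
}

for pair, d in distance.items():
    print(pair, round(d, 6))
```

A distance matrix of this kind could then be handed to any standard clustering routine; measures at distance zero on every dataset are candidates for a formal proof of equivalence, while measures such as confidence and lift, which depend differently on P(B), can disagree substantially.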

Author information

Corresponding author

Correspondence to C. Giraud-Carrier.

Additional information

Responsible editor: Bart Goethals.

About this article

Cite this article

Tew, C., Giraud-Carrier, C., Tanner, K. et al. Behavior-based clustering and analysis of interestingness measures for association rule mining. Data Min Knowl Disc 28, 1004–1045 (2014). https://doi.org/10.1007/s10618-013-0326-x

  • DOI: https://doi.org/10.1007/s10618-013-0326-x
