Behavior-based clustering and analysis of interestingness measures for association rule mining

Tew, C.; Giraud-Carrier, C.; Tanner, K.; Burton, S.

doi:10.1007/s10618-013-0326-x

Published: 26 June 2013

Volume 28, pages 1004–1045, (2014)

Data Mining and Knowledge Discovery

Abstract

A number of studies, theoretical, empirical, or both, have been conducted to provide insight into the properties and behavior of interestingness measures for association rule mining. While each has value in its own right, most are either limited in scope or, more importantly, ignore the purpose for which interestingness measures are intended, namely the ultimate ranking of discovered association rules. This paper, therefore, focuses on an analysis of the rule-ranking behavior of 61 well-known interestingness measures tested on the rules generated from 110 different datasets. By clustering based on ranking behavior, we highlight, and formally prove, previously unreported equivalences among interestingness measures. We also show that there appear to be distinct clusters of interestingness measures, but that there remain differences among clusters, confirming that domain knowledge is essential to the selection of an appropriate interestingness measure for a particular task and business objective.
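
To make the idea of ranking behavior concrete, the sketch below scores a handful of made-up rules with three measures, compares the resulting rankings with Spearman rank correlation, and groups measures that rank the rules identically. It is an illustration only, not the authors' procedure: the rule counts, the choice of Spearman correlation, the grouping threshold, and the helper names (`ranks`, `spearman`, `cluster`) are all assumptions. The measure labelled `ganascia` is taken here to be 2 * confidence - 1, a strictly increasing function of confidence, so the two should always produce the same ranking.

```python
from itertools import combinations

def ranks(values):
    """Average ranks (1 = smallest); tied values share the mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result

def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)

# Hypothetical rules A -> B, each given as counts (n_ab, n_a, n_b, n).
RULES = [(30, 40, 60, 100), (10, 50, 20, 100), (45, 50, 90, 100),
         (5, 10, 8, 100), (20, 80, 25, 100)]

MEASURES = {
    "confidence": lambda ab, a, b, n: ab / a,
    "ganascia": lambda ab, a, b, n: 2 * ab / a - 1,  # monotone in confidence
    "lift": lambda ab, a, b, n: ab * n / (a * b),
}

scores = {name: [f(*rule) for rule in RULES] for name, f in MEASURES.items()}
corr = {(p, q): spearman(scores[p], scores[q])
        for p, q in combinations(scores, 2)}

def cluster(names, corr, threshold=1e-9):
    """Merge measures whose rank distance (1 - rho) is within the threshold."""
    groups = [{name} for name in names]
    for (p, q), rho in corr.items():
        if 1 - rho <= threshold:
            gp = next(g for g in groups if p in g)
            gq = next(g for g in groups if q in g)
            if gp is not gq:
                gp |= gq
                groups = [g for g in groups if g is not gq]
    return groups

groups = cluster(list(MEASURES), corr)
for pair, rho in corr.items():
    print(pair, round(rho, 3))
print(groups)
```

On this toy data, confidence and the Ganascia-style measure fall into one group because they order the rules identically, while lift stays apart. A looser threshold would merge measures that are merely similar rather than rank-equivalent, which is closer in spirit to clustering by behavior.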

Author information

Authors and Affiliations

Department of Computer Science, Brigham Young University, Provo, UT, 84602, USA
C. Tew, C. Giraud-Carrier, K. Tanner & S. Burton

Corresponding author

Correspondence to C. Giraud-Carrier.

Additional information

Responsible editor: Bart Goethals.

About this article

Cite this article

Tew, C., Giraud-Carrier, C., Tanner, K. et al. Behavior-based clustering and analysis of interestingness measures for association rule mining. Data Min Knowl Disc 28, 1004–1045 (2014). https://doi.org/10.1007/s10618-013-0326-x

Received: 10 July 2012
Accepted: 06 June 2013
Published: 26 June 2013
Issue Date: July 2014
DOI: https://doi.org/10.1007/s10618-013-0326-x
