A General Measure of Rule Interestingness

  • Szymon Jaroszewicz
  • Dan A. Simovici
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 2168)


The paper presents a new general measure of rule interestingness. Many known measures, such as the chi-square statistic, Gini gain, and entropy gain, can be obtained from it by setting numerical parameters, including the amount of trust we place in the estimate of the probability distribution of the data. Moreover, we show that there is a continuum of measures having the chi-square statistic, Gini gain, and entropy gain as boundary cases; our measure therefore generalizes both conditional and unconditional classical measures of interestingness. Properties and an experimental evaluation of the new measure are also presented.
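The three boundary cases named in the abstract are all standard and can be computed from a rule's contingency table of counts. The sketch below is illustrative only: it implements the classical chi-square statistic, entropy (information) gain, and Gini gain for a hypothetical rule A → B, not the paper's parameterized general measure, and the counts in `table` are invented for the example.

```python
import math

def entropy(p):
    """Shannon entropy (bits) of a probability vector, with 0*log 0 = 0."""
    return -sum(x * math.log2(x) for x in p if x > 0)

def gini(p):
    """Gini index of a probability vector."""
    return 1.0 - sum(x * x for x in p)

def chi_square(table):
    """Pearson chi-square statistic of a 2D contingency table of counts."""
    n = sum(sum(row) for row in table)
    row_sums = [sum(row) for row in table]
    col_sums = [sum(col) for col in zip(*table)]
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_sums[i] * col_sums[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat

def gain(table, impurity):
    """Drop in impurity of the column variable after conditioning on the row variable."""
    n = sum(sum(row) for row in table)
    col_sums = [sum(col) for col in zip(*table)]
    prior = impurity([c / n for c in col_sums])
    conditional = sum(
        (sum(row) / n) * impurity([x / sum(row) for x in row]) for row in table
    )
    return prior - conditional

# Hypothetical counts for a rule A -> B: rows = A / not-A, columns = B / not-B.
table = [[40, 10], [20, 30]]
print(chi_square(table))       # chi-square statistic
print(gain(table, entropy))    # entropy (information) gain
print(gain(table, gini))       # Gini gain
```

All three functions score how strongly the antecedent changes the distribution of the consequent; the paper's contribution is a single parameterized family that interpolates between such measures.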


Keywords: interestingness · measure · distribution · Kullback-Leibler divergence · Csiszár divergence · rule



Copyright information

© Springer-Verlag Berlin Heidelberg 2001

Authors and Affiliations

  • Szymon Jaroszewicz (1)
  • Dan A. Simovici (1)
  1. Department of Mathematics and Computer Science, University of Massachusetts at Boston, Boston, USA
