Evaluation of Interestingness Measures for Ranking Discovered Knowledge

  • Conference paper
Advances in Knowledge Discovery and Data Mining (PAKDD 2001)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 2035)

Abstract

When mining a large database, the number of patterns discovered can easily exceed the capabilities of a human user to identify interesting results. To address this problem, various techniques have been suggested to reduce and/or order the patterns prior to presenting them to the user. In this paper, our focus is on ranking summaries generated from a single dataset, where attributes can be generalized in many different ways and to many levels of granularity according to taxonomic hierarchies. We theoretically and empirically evaluate thirteen diversity measures used as heuristic measures of interestingness for ranking summaries generated from databases. These thirteen diversity measures have previously been used in various disciplines, such as information theory, statistics, ecology, and economics. We describe five principles that any measure must satisfy to be considered useful for ranking summaries. Theoretical results show that only four of the thirteen diversity measures satisfy all of the principles. We then analyze the distribution of the index values generated by each of the thirteen diversity measures. Empirical results, obtained using synthetic data, show that the distributions of the generated index values tend to be highly skewed about the mean, median, and middle index values. The objective of this work is to gain some insight into the behaviour that can be expected from each of the measures in practice.
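To make the idea of ranking summaries by a diversity measure concrete, the sketch below treats each summary as a list of tuple counts (one count per generalized tuple), converts the counts to proportions, and scores each summary with two classic diversity measures: Shannon entropy from information theory and Simpson's index from ecology. These two measures, the summary names, and the counts are assumptions chosen purely for illustration; the abstract does not list the thirteen measures, and this is not presented as the authors' implementation. The convention that a more concentrated (less even) distribution ranks as more interesting is also an assumption of this sketch.

```python
import math
from typing import Callable, Dict, List

# Hypothetical summaries: each maps to the counts of its generalized tuples.
SUMMARIES: Dict[str, List[int]] = {
    "A": [25, 25, 25, 25],  # perfectly even distribution
    "B": [70, 10, 10, 10],  # moderately concentrated
    "C": [97, 1, 1, 1],     # highly concentrated
}


def proportions(counts: List[int]) -> List[float]:
    """Convert raw tuple counts into proportions p_i = n_i / N."""
    total = sum(counts)
    return [c / total for c in counts]


def shannon(counts: List[int]) -> float:
    """Shannon entropy in bits, -sum(p_i * log2 p_i); lower means less even."""
    return -sum(p * math.log2(p) for p in proportions(counts) if p > 0)


def simpson(counts: List[int]) -> float:
    """Simpson's index, sum(p_i^2); higher means more concentrated."""
    return sum(p * p for p in proportions(counts))


def rank(summaries: Dict[str, List[int]],
         measure: Callable[[List[int]], float],
         descending: bool) -> List[str]:
    """Order summary names by the given measure (ties keep insertion order)."""
    return sorted(summaries, key=lambda name: measure(summaries[name]),
                  reverse=descending)


if __name__ == "__main__":
    # Assumed convention: concentrated summaries are ranked first.
    print("Simpson (descending):", rank(SUMMARIES, simpson, descending=True))
    print("Shannon (ascending): ", rank(SUMMARIES, shannon, descending=False))
    for name, counts in SUMMARIES.items():
        print(name, round(simpson(counts), 4), round(shannon(counts), 4))
```

On these toy counts both measures produce the same ordering, but different diversity measures need not agree in general, which is part of why comparing their behaviour, as the paper does, is worthwhile.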

Copyright information

© 2001 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Hilderman, R.J., Hamilton, H.J. (2001). Evaluation of Interestingness Measures for Ranking Discovered Knowledge. In: Cheung, D., Williams, G.J., Li, Q. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2001. Lecture Notes in Computer Science, vol 2035. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45357-1_28

  • DOI: https://doi.org/10.1007/3-540-45357-1_28

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-41910-5

  • Online ISBN: 978-3-540-45357-4

  • eBook Packages: Springer Book Archive