Data Mining and Knowledge Discovery, Volume 24, Issue 1, pp 288–309

Entropy on covers

  • Zhimin Wang


As a generalization of partitions, covers allow overlaps between their members. In this paper, we propose a family of entropy-like measures on covers that are antimonotonic with respect to the partial order defined by the refinement relation on covers. In parallel with classical entropy theory, we also develop their conditional forms, which in turn lead to a family of semi-metrics on covers. These results make it possible to apply entropy-based techniques from data mining and machine learning to problems that are naturally modelled by covers.
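The abstract does not reproduce the paper's actual definition, but the idea of an entropy-like measure that is defined on covers and reduces to ordinary Shannon entropy on partitions can be illustrated with a minimal sketch. In the hypothetical variant below (an assumption for illustration only, not the paper's measure), each block of the cover receives a mass proportional to its size; overlapping elements are counted once per block, so for a partition the masses are exactly the usual block probabilities |B|/|U|:

```python
from math import log2

def cover_entropy(cover):
    """Entropy-like quantity on a cover (hypothetical illustration, not the
    paper's definition): each block B gets mass |B| / (sum of all block
    sizes), and we take the Shannon entropy of that distribution. When the
    cover is a partition of U, block sizes sum to |U|, so this reduces to
    the ordinary partition entropy H = -sum_B (|B|/|U|) log2(|B|/|U|)."""
    sizes = [len(block) for block in cover]
    total = sum(sizes)
    return -sum((s / total) * log2(s / total) for s in sizes if s > 0)

# A partition is the special case of a cover with no overlaps.
partition = [{1, 2}, {3, 4}]
# In a genuine cover, blocks may overlap (here on element 3).
cover = [{1, 2, 3}, {3, 4}]

print(cover_entropy(partition))  # 1.0: two equally sized blocks
print(cover_entropy(cover))
```

Refining a cover (splitting blocks into smaller ones) spreads the mass over more blocks, which is the intuition behind the antimonotonicity property claimed in the abstract; verifying it rigorously would of course require the paper's actual definitions.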


Keywords: Entropy · Cover · Feature selection · Soft clustering





Copyright information

© The Author(s) 2011

Authors and Affiliations

  1. Harvard University Herbaria, Cambridge, USA
