Asymmetric and Sample Size Sensitive Entropy Measures for Supervised Learning

  • Djamel A. Zighed
  • Gilbert Ritschard
  • Simon Marcellin
Part of the Studies in Computational Intelligence book series (SCI, volume 265)


Many machine learning algorithms use an entropy measure as an optimization criterion. Among the widely used entropy measures, Shannon's is one of the most popular. In some real-world applications, however, using such entropy measures without precautions can lead to inconsistent results, because these measures rest on assumptions that are not fulfilled in many real cases. For instance, in supervised learning with decision trees, the misclassification costs of the classes are not explicitly taken into account in the tree-growing process; the costs are thus implicitly assumed to be equal for all classes. When the costs differ across classes, the maximum of the entropy should lie elsewhere than at the uniform probability distribution. Likewise, when the classes do not share the same a priori probability distribution, the worst case (the maximum of the entropy) should also lie away from the uniform distribution. In this paper, starting from real-world problems, we show that classical entropy measures are not suitable for building a predictive model. We then examine the main axioms that define an entropy and discuss their inadequacy for machine learning. This leads us to propose a new entropy measure that possesses more suitable properties. Finally, we carry out evaluations on data sets that illustrate the performance of the new entropy measure.
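The key idea above, that the entropy maximum should sit at a chosen reference distribution rather than at the uniform one, can be illustrated with a small sketch. The asymmetric formula below follows the form proposed in the authors' earlier work (Marcellin, Zighed, and Ritschard, IPMU 2006); it is a reconstruction for illustration, not a quotation from this chapter, and the reference distribution `w` is a free parameter chosen by the analyst.

```python
import math

def shannon_entropy(p):
    """Shannon entropy (in bits); maximal at the uniform distribution."""
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)

def asymmetric_entropy(p, w):
    """Asymmetric entropy sketch, following the form of the authors'
    earlier IPMU 2006 proposal (an assumption, not quoted from this
    chapter): each class term p_i(1-p_i) / ((1-2w_i)p_i + w_i^2)
    reaches its maximum when p_i == w_i, so the measure peaks at the
    chosen reference distribution w instead of the uniform one."""
    return sum(pi * (1 - pi) / ((1 - 2 * wi) * pi + wi ** 2)
               for pi, wi in zip(p, w))

# Shannon's entropy treats the uniform distribution as the worst case:
print(shannon_entropy([0.5, 0.5]))   # maximal here
print(shannon_entropy([0.9, 0.1]))   # lower

# With an a priori reference w = (0.9, 0.1), the asymmetric measure
# instead peaks at p = w, not at the uniform distribution:
print(asymmetric_entropy([0.9, 0.1], [0.9, 0.1]))  # maximal here
print(asymmetric_entropy([0.5, 0.5], [0.9, 0.1]))  # lower
```

In a cost-sensitive or class-imbalanced tree-growing setting, this shift means a node whose class distribution matches the a priori (or cost-weighted) reference is treated as maximally impure, which is exactly the behaviour the abstract argues for.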


Keywords: Asymmetric entropy · Unbalanced data · Cost-sensitive learning · Decision trees





Copyright information

© Springer-Verlag Berlin Heidelberg 2010

Authors and Affiliations

  • Djamel A. Zighed (1)
  • Gilbert Ritschard (2)
  • Simon Marcellin (1)

  1. ERIC Lab., University of Lyon 2, Bron, France
  2. Department of Econometrics, University of Geneva, Geneva 4, Switzerland
