Theoretical Comparison between the Gini Index and Information Gain Criteria

  • Laura Elena Raileanu
  • Kilian Stoffel


Knowledge Discovery in Databases (KDD) is an active and important research area with the promise of a high payoff in many business and scientific applications. One of the main tasks in KDD is classification. A particularly efficient method for classification is decision tree induction. The selection of the attribute used at each node of the tree to split the data (the split criterion) is crucial for classifying objects correctly. Several split criteria have been proposed in the literature (Information Gain, Gini Index, etc.), and it is not obvious which of them will produce the best decision tree for a given data set. A large number of empirical tests have been conducted to answer this question, but no conclusive results were found. In this paper we introduce a formal methodology that allows us to compare multiple split criteria. This permits us to present fundamental insights into the decision process. Furthermore, we are able to give a formal description of how to select between split criteria for a given data set. As an illustration we apply the methodology to two widely used split criteria: the Gini Index and Information Gain.

Keywords: decision trees, classification, Gini Index, Information Gain, theoretical comparison
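As background for the comparison the abstract describes, the two criteria can be stated concretely: the Gini Index of a node is 1 − Σ p_k², the entropy is −Σ p_k log₂ p_k, and a split is scored by the weighted impurity reduction it achieves. The following is a minimal illustrative sketch (not the authors' formalization; all function names are our own) of how both criteria score the same binary split:

```python
from collections import Counter
from math import log2

def gini(labels):
    """Gini index of a label multiset: 1 - sum_k p_k^2."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def entropy(labels):
    """Shannon entropy in bits: -sum_k p_k * log2(p_k)."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def split_scores(parent, left, right):
    """Impurity reduction of a binary split under both criteria."""
    n = len(parent)
    wl, wr = len(left) / n, len(right) / n
    gini_gain = gini(parent) - (wl * gini(left) + wr * gini(right))
    info_gain = entropy(parent) - (wl * entropy(left) + wr * entropy(right))
    return gini_gain, info_gain

# A candidate split of an evenly mixed two-class node:
parent = ["a"] * 4 + ["b"] * 4
left, right = ["a", "a", "a", "b"], ["a", "b", "b", "b"]
g_gain, i_gain = split_scores(parent, left, right)
```

Both criteria assign this split a positive score, but the two scales differ, and (as the paper investigates) the criteria need not rank all candidate splits identically.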





Copyright information

© Kluwer Academic Publishers 2004

Authors and Affiliations

  • Laura Elena Raileanu (1)
  • Kilian Stoffel (1)

  1. Computer Science Department, University of Neuchâtel, Neuchâtel, Switzerland
