GD: A Measure Based on Information Theory for Attribute Selection

  • Javier Lorenzo
  • Mario Hernández
  • Juan Méndez
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 1484)


This work presents GD, a measure for attribute selection. GD is defined between a set of attributes and a class, and it generalizes the Mántaras distance so that interdependencies between attributes can be detected. The measure also makes it possible to rank the attributes by their importance in defining the concept, and it does not exhibit a noticeable bias in favor of attributes with many values. The quality of the attributes selected with GD is evaluated by comparing it with two other attribute selection methods on 19 datasets.
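The abstract does not give the definition of GD itself, so the following is only a minimal sketch of the quantity it generalizes: the normalized Mántaras distance between a single attribute and the class, taken here as 1 - I(A;C)/H(A,C). The function names (`entropy`, `mantaras_distance`, `rank_attributes`) and the toy data are assumptions made for illustration and do not come from the paper; the extension to attribute sets that GD provides is not reproduced.

```python
from collections import Counter
from math import log2


def entropy(symbols):
    """Shannon entropy, in bits, of a sequence of hashable symbols."""
    n = len(symbols)
    return -sum((c / n) * log2(c / n) for c in Counter(symbols).values())


def mantaras_distance(attribute, target):
    """Normalized Mantaras distance 1 - I(A;C) / H(A,C).

    Returns 0 when the attribute and the class induce the same partition,
    and 1 when they are statistically independent in the sample.
    """
    h_a = entropy(attribute)
    h_c = entropy(target)
    h_ac = entropy(list(zip(attribute, target)))  # joint entropy H(A,C)
    if h_ac == 0.0:
        return 0.0  # both partitions are trivial (a single block each)
    mutual_information = h_a + h_c - h_ac
    return 1.0 - mutual_information / h_ac


def rank_attributes(columns, target):
    """Hypothetical helper: attribute names ordered by increasing distance."""
    return sorted(columns, key=lambda name: mantaras_distance(columns[name], target))


if __name__ == "__main__":
    # Toy data (an assumption, not taken from the paper).
    target = ["no", "no", "yes", "yes"]
    columns = {
        "noise": [0, 1, 0, 1],  # independent of the class in this sample
        "copy": [0, 0, 1, 1],   # induces the same partition as the class
    }
    for name in rank_attributes(columns, target):
        print(name, mantaras_distance(columns[name], target))
```

Ranking single attributes this way treats each one in isolation; according to the abstract, GD is defined over an attribute set precisely so that interdependencies between attributes can be taken into account.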


Keywords: Machine learning; Intelligent information retrieval; Feature selection





Copyright information

© Springer-Verlag Berlin Heidelberg 1998

Authors and Affiliations

  • Javier Lorenzo (1)
  • Mario Hernández (1)
  • Juan Méndez (1)
  1. Dpto. de Informática y Sistemas, Univ. de Las Palmas de Gran Canaria, Las Palmas, Spain
