
GD: A Measure Based on Information Theory for Attribute Selection

  • Conference paper
  • In: Progress in Artificial Intelligence — IBERAMIA 98 (IBERAMIA 1998)
  • Part of the book series: Lecture Notes in Computer Science (LNAI, volume 1484)

Abstract

In this work a measure called GD is presented for attribute selection. The measure is defined between a set of attributes and a class, and is a generalization of the Mántaras distance that makes it possible to detect interdependencies between attributes. Likewise, the proposed measure makes it possible to rank the attributes by their importance in defining the concept. The measure does not exhibit a noticeable bias in favor of attributes with many values. The quality of the attributes selected with the GD measure is assessed by comparing it with two other attribute selection methods over 19 datasets.

This work was supported in part by the Spanish Ministry of Education under project TAP95-0288.
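
The formal definition of GD is given in the paper itself; as a rough illustration of the idea the abstract describes, the sketch below computes the Mántaras distance d(A, C) = 1 - I(A;C)/H(A,C) between an attribute subset A and the class C, extended to subsets by treating the joint values of the subset as a single compound attribute. The function names, the compound-attribute extension, and the toy data are assumptions of this sketch, not the paper's definition.

```python
# Sketch only: the Mántaras distance between an attribute subset and the
# class, which the abstract says GD generalizes. Not the authors' code.
from collections import Counter
from math import log2

def entropy(values):
    """Shannon entropy H(X), in bits, of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())

def mantaras_distance(rows, attr_idxs, class_idx):
    """Normalized distance d(A, C) = 1 - I(A;C) / H(A,C), where A is the
    joint value of the attributes in attr_idxs and C is the class column.
    Lower values mean the subset is more informative about the class;
    d = 0 only when A and C determine each other completely."""
    A = [tuple(row[i] for i in attr_idxs) for row in rows]
    C = [row[class_idx] for row in rows]
    joint = entropy(list(zip(A, C)))           # H(A, C)
    mutual = entropy(A) + entropy(C) - joint   # I(A; C)
    return 1.0 - mutual / joint if joint > 0 else 0.0

# Toy usage on an XOR-style concept: neither attribute alone predicts the
# class, but the pair does, which is the interdependence the abstract targets.
data = [(0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 0)] * 4
print(mantaras_distance(data, [0], 2))     # 1.0: attribute 0 alone is useless
print(mantaras_distance(data, [0, 1], 2))  # 0.5: the pair defines the class
```

Under this reading, scoring attribute subsets rather than single attributes is what lets the measure expose interdependencies such as the XOR case above, which any per-attribute score would miss.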

References

  1. David W. Aha and Richard L. Bankert. Feature selection for case-based classification of cloud types: An empirical comparison. In Proc. of the 1994 AAAI Workshop on Case-Based Reasoning, pages 106–112. AAAI Press, 1994.

  2. David W. Aha, Dennis Kibler, and Marc K. Albert. Instance-based learning algorithms. Machine Learning, 6:37–66, 1991.

  3. H. Almuallim and T. G. Dietterich. Learning with many irrelevant features. In Proc. of the Ninth National Conference on Artificial Intelligence, pages 547–552. AAAI Press, 1991.

  4. Michael R. Anderberg. Cluster Analysis for Applications. Academic Press Inc., New York, 1973.

  5. Rich Caruana and Dayne Freitag. Greedy attribute selection. In Proc. of the 11th International Machine Learning Conference, pages 28–36, New Brunswick, NJ, 1994. Morgan Kaufmann.

  6. T. M. Cover and J. A. Thomas. Elements of Information Theory. John Wiley & Sons Inc., 1991.

  7. Walter Daelemans and Antal van den Bosch. Generalization performance of backpropagation learning on a syllabification task. In Proc. of the Third Twente Workshop on Language Technology, pages 27–38, 1992.

  8. P. A. Devijver and J. Kittler. Pattern Recognition: A Statistical Approach. Prentice-Hall, Englewood Cliffs, New Jersey, 1982.

  9. R. Duda and P. Hart. Pattern Classification and Scene Analysis. John Wiley and Sons, 1973.

  10. G. H. John, R. Kohavi, and K. Pfleger. Irrelevant features and the subset selection problem. In William W. Cohen and Haym Hirsh, editors, Proc. of the Eleventh International Conference on Machine Learning, pages 121–129. Morgan Kaufmann, San Francisco, CA, 1994.

  11. Kenji Kira and Larry A. Rendell. The feature selection problem: Traditional methods and a new algorithm. In Proc. of the 10th National Conf. on Artificial Intelligence, pages 129–134, 1992.

  12. Ron Kohavi and George H. John. Wrappers for feature subset selection. Artificial Intelligence, 97(1–2):273–324, December 1997.

  13. Ron Kohavi, Dan Sommerfield, and James Dougherty. Data mining using MLC++: A machine learning library in C++. In Tools with Artificial Intelligence, pages 234–245. IEEE Computer Society Press, 1996. Received the best paper award.

  14. Igor Kononenko. Estimating attributes: Analysis and extensions of RELIEF. In F. Bergadano and L. de Raedt, editors, Machine Learning: ECML-94, pages 171–182, Berlin, 1994. Springer.

  15. Nick Littlestone. Learning quickly when irrelevant attributes abound: A new linear-threshold algorithm. Machine Learning, 2:285–318, 1988.

  16. R. López de Mántaras. A distance-based attribute selection measure for decision tree induction. Machine Learning, 6:81–92, 1991.

  17. Javier Lorenzo and Mario Hernández. Sobre el uso de conceptos de teoría de la información en la selección de características [On the use of information theory concepts in feature selection]. Technical Report GIAS-TR-006, Grupo de Inteligencia Artificial y Sistemas, Dpto. de Informática y Sistemas, Univ. de Las Palmas de Gran Canaria, 1996.

  18. David J. C. MacKay. Information theory, inference and learning algorithms. http://wol.ra.phy.cam.ac.uk/mackay/itprnn/book.ps.gz, 1997.

  19. C. J. Merz and P. M. Murphy. UCI Repository of machine learning databases. Irvine, CA: University of California, Department of Information and Computer Science, 1996.

  20. J. R. Quinlan. Induction of decision trees. Machine Learning, 1:81–106, 1986.

  21. M. Scherf and W. Brauer. Feature selection by means of a feature weighting approach. Technical Report FKI-221-97, Institut für Informatik, Technische Universität München, 1997.

  22. Dietrich Wettschereck and David W. Aha. Weighting features. In Proc. of the First Int. Conference on Case-Based Reasoning, pages 347–358, 1995.

  23. Dietrich Wettschereck and Thomas G. Dietterich. An experimental comparison of the nearest-neighbor and nearest-hyperrectangle algorithms. Machine Learning, 19:5–27, 1995.

  24. Allan P. White and Wei Zhong Liu. Bias in information-based measures in decision tree induction. Machine Learning, 15:321–329, 1994.




Copyright information

© 1998 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Lorenzo, J., Hernández, M., Méndez, J. (1998). GD: A Measure Based on Information Theory for Attribute Selection. In: Coelho, H. (eds) Progress in Artificial Intelligence — IBERAMIA 98. IBERAMIA 1998. Lecture Notes in Computer Science, vol 1484. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-49795-1_11


  • DOI: https://doi.org/10.1007/3-540-49795-1_11

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-64992-2

  • Online ISBN: 978-3-540-49795-0
