Abstract
The notion of relevance is used in many technical fields. In machine learning and data mining, for example, relevance is frequently used as a measure in feature subset selection (FSS). In previous studies the interpretation of relevance has varied, and its connection to FSS has been loose. In this paper a rigorous mathematical formalism is proposed for relevance, one that is quantitative and normalized. To apply the formalism to FSS, a characterization of FSS is proposed: preservation of learning information and minimization of joint entropy. Based on this characterization, a tight connection between relevance and FSS is established: FSS amounts to maximizing the relevance of the features to the decision attribute and the relevance of the decision attribute to the features. This connection is then used to design an FSS algorithm that is linear in the number of instances and quadratic in the number of features. The algorithm is evaluated on 23 public datasets, improving prediction accuracy on 16 of them and losing accuracy on only 1. This provides evidence that both the formalism and its connection to FSS are sound.
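As a rough illustration of the connection the abstract describes, the sketch below scores discrete features by a normalized information measure and grows a feature subset greedily, preferring lower joint entropy among ties. This is a minimal sketch under assumptions: relevance is taken here as mutual information divided by the entropy of the decision attribute, which may differ from the paper's exact formalism, and the function names (entropy, relevance, select_features) and greedy search are illustrative, not the authors' algorithm.

```python
# Sketch of an information-theoretic feature-subset selector in the spirit
# of the abstract: keep features that preserve the information the data
# carries about the decision attribute while keeping joint entropy low.
# ASSUMPTION: relevance here is I(X;Y)/H(Y); the paper's exact normalized
# relevance measure may be defined differently.
from collections import Counter
from math import log2

def entropy(columns):
    """Joint Shannon entropy (bits) of one or more discrete columns."""
    joint = list(zip(*columns))
    n = len(joint)
    return -sum((c / n) * log2(c / n) for c in Counter(joint).values())

def relevance(xs, y):
    """Normalized relevance of feature columns `xs` to decision column `y`:
    I(X;Y) / H(Y), the fraction of Y's uncertainty that X removes."""
    h_y = entropy([y])
    if h_y == 0:
        return 1.0  # a constant decision attribute is trivially predicted
    return (h_y + entropy(xs) - entropy(xs + [y])) / h_y

def select_features(X, y):
    """Greedy forward selection: grow the subset while relevance to y
    improves, breaking ties in favour of lower joint entropy. Each entropy
    call scans the instances once, and candidates are evaluated over pairs
    of features, roughly matching the linear-in-instances,
    quadratic-in-features cost the abstract reports."""
    remaining = list(range(len(X)))
    chosen, best = [], 0.0
    while remaining:
        scored = [(relevance([X[j] for j in chosen + [i]], y),
                   -entropy([X[j] for j in chosen + [i]]), i)
                  for i in remaining]
        r, _, i = max(scored)
        if r <= best:
            break  # no candidate adds information about y
        best = r
        chosen.append(i)
        remaining.remove(i)
    return chosen

# Toy usage: feature 0 determines y, feature 1 is noise.
X = [[0, 0, 1, 1, 0, 1], [1, 0, 1, 0, 0, 1]]
y = [0, 0, 1, 1, 0, 1]
print(select_features(X, y))  # -> [0]
```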
Cite this article
Bell, D.A., Wang, H. A Formalism for Relevance and Its Application in Feature Subset Selection. Machine Learning 41, 175–195 (2000). https://doi.org/10.1023/A:1007612503587