Relevance Approach to Feature Subset Selection

  • Chapter

Part of the book series: The Springer International Series in Engineering and Computer Science (SECS, volume 453)

Abstract

In this chapter an axiomatic characterisation of feature subset selection is presented. Two axioms are proposed: a sufficiency axiom (preservation of learning information) and a necessity axiom (minimisation of encoding length). The sufficiency axiom concerns the existing dataset and rests on the following understanding: any selected feature subset should describe the training dataset without loss of information, i.e., it should be consistent with the training dataset. The necessity axiom concerns predictability and is derived from Occam's razor, which holds that the simplest of competing alternatives is preferred for prediction. The two axioms are then restated concisely in terms of relevance: maximise both the r(X; Y) relevance and the r(Y; X) relevance. Based on this relevance characterisation, a heuristic selection algorithm is presented and evaluated experimentally. The results support the axioms.
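To make the two axioms concrete, the sketch below shows one minimal way to realise them, assuming a simple exhaustive smallest-first search rather than the heuristic algorithm developed in the chapter: the sufficiency axiom is checked as consistency (no two training samples agree on the selected features yet disagree on the class), and the necessity axiom is approximated by preferring the smallest consistent subset. The function names select_features and is_consistent are illustrative, not taken from the chapter.

    from itertools import combinations

    def is_consistent(X, y, features):
        # Sufficiency check: no two rows may agree on the selected
        # features while carrying different class labels.
        seen = {}
        for row, label in zip(X, y):
            key = tuple(row[f] for f in features)
            if key in seen and seen[key] != label:
                return False
            seen[key] = label
        return True

    def select_features(X, y):
        # Necessity (Occam's razor) approximated by trying the smallest
        # subsets first and returning the first consistent one found.
        n_features = len(X[0])
        for size in range(1, n_features + 1):
            for subset in combinations(range(n_features), size):
                if is_consistent(X, y, subset):
                    return list(subset)
        return list(range(n_features))  # fall back to all features

    # Toy usage: feature 0 alone already determines the class.
    X = [[0, 1], [0, 0], [1, 1], [1, 0]]
    y = [0, 0, 1, 1]
    print(select_features(X, y))  # -> [0]

Such an exhaustive search is exponential in the number of features (finding a smallest consistent feature set is known to be computationally hard), which is why a heuristic algorithm guided by the relevance characterisation is of practical interest.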

Copyright information

© 1998 Springer Science+Business Media New York

About this chapter

Cite this chapter

Wang, H., Bell, D., Murtagh, F. (1998). Relevance Approach to Feature Subset Selection. In: Liu, H., Motoda, H. (eds) Feature Extraction, Construction and Selection. The Springer International Series in Engineering and Computer Science, vol 453. Springer, Boston, MA. https://doi.org/10.1007/978-1-4615-5725-8_6

  • DOI: https://doi.org/10.1007/978-1-4615-5725-8_6

  • Publisher Name: Springer, Boston, MA

  • Print ISBN: 978-1-4613-7622-4

  • Online ISBN: 978-1-4615-5725-8

  • eBook Packages: Springer Book Archive
