From here on, we study feature selection for classification. Focusing on this setting lets us examine many common perspectives on feature selection, develop a deep understanding of its basic issues, appreciate a wide range of methods, and, later in the book, move on to related topics. The problem of feature selection can be examined from many perspectives. The four major ones are: (1) how should we search for the "best" features? (2) what criteria should be used to evaluate features and determine which are best? (3) how should new candidate features be generated for selection: by adding or deleting one feature at a time from the existing subset, or by changing a whole subset of features at once? (That is, is feature generation conducted sequentially or in parallel?) and (4) how do applications shape feature selection? Applications differ in their requirements for computational time, quality of results, and so on. For instance, the focus of machine learning (Dietterich, 1997) differs from that of data mining (Fayyad et al., 1996).
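To make perspectives (1) and (2) concrete, the sketch below pairs one common evaluation criterion, information gain, with a greedy sequential search that adds one feature at a time. This is only an illustrative filter-style example under our own assumptions (discrete features, a toy dataset, and function names of our choosing); it is not the chapter's prescribed method.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def info_gain(rows, labels, feature):
    """Information gain of splitting the data on one discrete feature:
    H(labels) minus the value-weighted entropy after the split."""
    n = len(labels)
    by_value = {}
    for row, y in zip(rows, labels):
        by_value.setdefault(row[feature], []).append(y)
    conditional = sum(len(ys) / n * entropy(ys) for ys in by_value.values())
    return entropy(labels) - conditional

def forward_select(rows, labels, k):
    """Sequential search: greedily add the remaining feature with the
    highest information gain until k features have been chosen."""
    remaining = set(range(len(rows[0])))
    chosen = []
    while remaining and len(chosen) < k:
        best = max(remaining, key=lambda f: info_gain(rows, labels, f))
        chosen.append(best)
        remaining.remove(best)
    return chosen

# Toy data (hypothetical): feature 0 predicts the class perfectly,
# while features 1 and 2 carry no information about it.
rows = [(1, 0, 1), (1, 1, 0), (0, 0, 1), (0, 1, 0)]
labels = [1, 1, 0, 0]
print(forward_select(rows, labels, 1))  # feature 0 is selected first
```

Swapping `info_gain` for a different criterion, or replacing the one-at-a-time loop with a search over whole subsets, changes the answer to perspectives (2) and (3) respectively while leaving the overall framework intact.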






  1. Aha, D. W. (1998). Feature weighting for lazy learning algorithms, pages 13–32. In (Liu and Motoda, 1998).
  2. Almuallim, H. and Dietterich, T. (1994). Learning boolean concepts in the presence of many irrelevant features. Artificial Intelligence, 69(1–2):279–305.
  3. Ben-Bassat, M. (1982). Pattern recognition and reduction of dimensionality. In Krishnaiah, P. R. and Kanal, L. N., editors, Handbook of Statistics II, pages 773–791. North Holland.
  4. Blum, A. and Langley, P. (1997). Selection of relevant features and examples in machine learning. Artificial Intelligence, 97:245–271.
  5. Blumer, A., Ehrenfeucht, A., Haussler, D., and Warmuth, M. (1990). Occam's razor. In Shavlik, J. and Dietterich, T., editors, Readings in Machine Learning, pages 201–204. Morgan Kaufmann.
  6. Breiman, L., Friedman, J., Olshen, R., and Stone, C. (1984). Classification and Regression Trees. Wadsworth & Brooks/Cole Advanced Books & Software.
  7. Dash, M. and Liu, H. (1997). Feature selection methods for classifications. Intelligent Data Analysis: An International Journal, 1(3).
  8. Dietterich, T. (1997). Machine learning research: Four current directions. AI Magazine, pages 97–136.
  9. Domingos, P. and Pazzani, M. (1996). Beyond independence: Conditions for the optimality of the simple Bayesian classifier. In Saitta, L., editor, Machine Learning: Proceedings of the Thirteenth International Conference, pages 105–112. Morgan Kaufmann.
  10. Fayyad, U., Piatetsky-Shapiro, G., and Smyth, P. (1996). From data mining to knowledge discovery: An overview. In Fayyad, U., Piatetsky-Shapiro, G., Smyth, P., and Uthurusamy, R., editors, Advances in Knowledge Discovery and Data Mining, pages 495–515. AAAI Press / The MIT Press.
  11. Friedman, J. (1997). On bias, variance, 0/1-loss, and the curse-of-dimensionality. Data Mining and Knowledge Discovery, 1(1).
  12. Hagan, M., Demuth, H., and Beale, M. (1996). Neural Network Design. PWS Publishing Company.
  13. Hecht-Nielsen, R. (1990). Neurocomputing. Addison-Wesley Publishing Company.
  14. John, G., Kohavi, R., and Pfleger, K. (1994). Irrelevant features and the subset selection problem. In Machine Learning: Proceedings of the Eleventh International Conference, pages 121–129. Morgan Kaufmann Publishers.
  15. Kira, K. and Rendell, L. (1992). The feature selection problem: Traditional methods and a new algorithm. In Proceedings of the Ninth National Conference on Artificial Intelligence, pages 129–134. AAAI Press / The MIT Press.
  16. Kohavi, R. (1995). Wrappers for Performance Enhancement and Oblivious Decision Graphs. PhD thesis, Department of Computer Science, Stanford University, Stanford, CA.
  17. Kohavi, R. and John, G. (1997). Wrappers for feature subset selection. Artificial Intelligence, 97(1–2):273–324.
  18. Kohavi, R. and John, G. (1998). The wrapper approach, pages 33–50. In (Liu and Motoda, 1998).
  19. Koller, D. and Sahami, M. (1996). Toward optimal feature selection. In Saitta, L., editor, Machine Learning: Proceedings of the Thirteenth International Conference, pages 284–292. Morgan Kaufmann Publishers.
  20. Liu, H. and Motoda, H., editors (1998). Feature Extraction, Construction and Selection: A Data Mining Perspective. Kluwer Academic Publishers.
  21. Liu, H. and Setiono, R. (1996). A probabilistic approach to feature selection: a filter solution. In Saitta, L., editor, Proceedings of the International Conference on Machine Learning (ICML-96), pages 319–327. Morgan Kaufmann Publishers.
  22. Mingers, J. (1989a). An empirical comparison of pruning methods for decision tree induction. Machine Learning, 4:227–243.
  23. Mingers, J. (1989b). An empirical comparison of selection measures for decision-tree induction. Machine Learning, 3:319–342.
  24. Narendra, P. and Fukunaga, K. (1977). A branch and bound algorithm for feature subset selection. IEEE Transactions on Computers, C-26(9):917–922.
  25. Quinlan, J. (1986). Induction of decision trees. Machine Learning, 1(1):81–106.
  26. Quinlan, J. (1993). C4.5: Programs for Machine Learning. Morgan Kaufmann.
  27. Russell, S. and Norvig, P. (1995). Artificial Intelligence: A Modern Approach. Prentice Hall.
  28. Schlimmer, J. C. (1993). Efficiently inducing determinations: a complete and systematic search algorithm that uses optimal pruning. In Proceedings of the Tenth International Conference on Machine Learning, pages 284–290.
  29. Siedlecki, W. and Sklansky, J. (1988). On automatic feature selection. International Journal of Pattern Recognition and Artificial Intelligence, 2:197–220.
  30. Weiss, S. M. and Kulikowski, C. A. (1991). Computer Systems That Learn. Morgan Kaufmann Publishers, San Mateo, California.
  31. Zilberstein, S. (1996). Using anytime algorithms in intelligent systems. AI Magazine, pages 73–83.

Copyright information

© Springer Science+Business Media New York 1998

Authors and Affiliations

  • Huan Liu, National University of Singapore, Singapore
  • Hiroshi Motoda, Osaka University, Osaka, Japan
