Journal of Intelligent Information Systems, Volume 30, Issue 3, pp 273–292

Consistency measures for feature selection

  • Antonio Arauzo-Azofra
  • Jose Manuel Benitez
  • Juan Luis Castro

Abstract

The use of feature selection can improve the accuracy, efficiency, applicability, and understandability of a learning process. For this reason, many methods of automatic feature selection have been developed. Some of these methods are based on searching for the features that allow the data set to be considered consistent. In a search problem we usually evaluate the search states; in the case of feature selection, we measure the possible feature sets. This paper reviews the state of the art of consistency-based feature selection methods, identifying the measures used for feature sets. An in-depth study of these measures is conducted, including the definition of a new measure necessary for completeness. After that, we perform an empirical evaluation of the measures, comparing them with the highly reputed wrapper approach. Consistency measures achieve results similar to those of the wrapper approach with much better efficiency.
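As an illustration of how a feature set can be "measured" for consistency, the sketch below implements an inconsistency rate in the spirit of the consistency-based search literature: examples that agree on every selected feature but carry different class labels are counted as inconsistencies, and a subset with rate zero leaves the data set consistent. This is a minimal sketch assuming discrete-valued features; the function name and toy data are illustrative, not taken from the paper.

```python
from collections import Counter

def inconsistency_rate(X, y, feature_subset):
    """Fraction of examples that are inconsistent when the data
    set is projected onto `feature_subset` (illustrative sketch)."""
    groups = {}
    for row, label in zip(X, y):
        pattern = tuple(row[f] for f in feature_subset)
        groups.setdefault(pattern, Counter())[label] += 1
    # Within each pattern group, every example beyond the majority
    # class is counted as an inconsistency.
    inconsistent = sum(sum(c.values()) - max(c.values())
                       for c in groups.values())
    return inconsistent / len(y)

# Toy data: feature 0 alone cannot separate the classes,
# while the pair {0, 1} leaves the data set fully consistent.
X = [(0, 0), (0, 1), (1, 0), (1, 1)]
y = ['a', 'b', 'b', 'a']
print(inconsistency_rate(X, y, [0]))     # 0.5 -> inconsistent
print(inconsistency_rate(X, y, [0, 1]))  # 0.0 -> consistent
```

A search procedure would then evaluate candidate subsets with a measure of this kind and prefer the smallest subset whose rate falls below a chosen threshold.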

Keywords

Feature selection · Attribute evaluation · Consistency · Measures

Copyright information

© Springer Science+Business Media, LLC 2007

Authors and Affiliations

  • Antonio Arauzo-Azofra (1)
  • Jose Manuel Benitez (2)
  • Juan Luis Castro (2)

  1. Department of Rural Engineering, University of Cordoba, Cordoba, Spain
  2. Department of Computer Science and Artificial Intelligence, University of Granada, Granada, Spain
