Best Subset Feature Selection for Massive Mixed-Type Problems

  • Eugene Tuv
  • Alexander Borisov
  • Kari Torkkola
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4224)

Abstract

We address the problem of identifying a non-redundant subset of important variables. All modern feature selection approaches, including filters, wrappers, and embedded methods, run into difficulties in very general settings: massive mixed-type data and complex relationships between the inputs and the target. We propose an efficient ensemble-based approach that measures statistical independence between the target and a potentially very large number of inputs, including any meaningful order of interactions among them, removes redundancies from the relevant variables, and finally ranks the variables in the identified minimum feature set. Experiments with synthetic data illustrate the sensitivity and selectivity of the method, and its scalability is demonstrated on a real car sensor database.
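
To make the relevance-testing idea concrete, the following is a minimal Python sketch of ranking features against artificial contrasts (independently shuffled copies of the original columns) with a tree ensemble, in the spirit of the approach summarized above. The function name, parameters, and the use of scikit-learn's GradientBoostingClassifier are illustrative assumptions, not the authors' implementation, which additionally handles mixed-type inputs natively, removes redundant variables, and iterates to find the minimum feature set.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

def contrast_based_selection(X, y, n_rounds=10, alpha=0.05, seed=0):
    """Illustrative sketch: keep a feature only if its ensemble importance
    beats a high quantile of the importances of random 'contrast' features
    in most rounds. Assumes X is already numerically encoded."""
    rng = np.random.default_rng(seed)
    n_samples, n_features = X.shape
    wins = np.zeros(n_features)
    for _ in range(n_rounds):
        # Contrasts: each original column shuffled independently, so they
        # keep their marginal distributions but have no relation to y.
        contrasts = rng.permuted(X, axis=0)
        X_aug = np.hstack([X, contrasts])
        model = GradientBoostingClassifier(max_depth=3, subsample=0.7,
                                           random_state=seed)
        model.fit(X_aug, y)
        imp = model.feature_importances_
        # Relevance cutoff: the (1 - alpha) quantile of contrast importances.
        threshold = np.quantile(imp[n_features:], 1.0 - alpha)
        wins += imp[:n_features] > threshold
    # Select features that beat the contrast cutoff in a majority of rounds.
    return np.where(wins > n_rounds / 2)[0]
```

Comparing each real feature against the contrast importances gives a data-driven relevance cutoff instead of a fixed importance threshold, which is what makes this kind of test usable when the number of inputs is very large.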

Keywords

Feature Selection · Variable Importance · Importance Score · Subset Feature Selection · Redundant Variable

Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Eugene Tuv, Intel, Analysis and Control Technology, Chandler, USA
  • Alexander Borisov, Intel, Analysis and Control Technology, N. Novgorod, Russia
  • Kari Torkkola, Motorola, Intelligent Systems Lab, Tempe, USA
