Best Subset Feature Selection for Massive Mixed-Type Problems
We address the problem of identifying a non-redundant subset of important variables. All modern feature selection approaches, including filters, wrappers, and embedded methods, experience problems in very general settings with massive mixed-type data and complex relationships between the inputs and the target. We propose an efficient ensemble-based approach that measures statistical independence between a target and a potentially very large number of inputs, including any meaningful order of interactions between them, removes redundancies from the relevant inputs, and finally ranks the variables in the identified minimum feature set. Experiments with synthetic data illustrate the sensitivity and selectivity of the method, and its scalability is demonstrated on a real car sensor database.
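The abstract names three steps (relevance against the target, redundancy removal, ranking of the minimal set) without giving the algorithm. As a rough illustration only, the Python sketch below runs those steps on small discrete data. It is not the authors' ensemble-based method. It substitutes plug-in mutual information for the importance score, decides relevance by comparing each score with the scores of randomly permuted copies of the inputs, and treats an input as redundant when its symmetric uncertainty with a stronger, already kept input reaches an assumed threshold of 0.9. Because it scores each input on its own, it cannot detect interaction-only effects (for example, a target equal to the XOR of two inputs), which the described method is meant to handle. All function names and parameters are hypothetical, and numeric inputs would have to be binned before scoring.

```python
# Hypothetical sketch: plug-in mutual information stands in for the paper's
# ensemble-based importance; thresholds and names are illustrative assumptions.
import math
import random
from collections import Counter
from itertools import product


def entropy(x):
    """Plug-in Shannon entropy (nats) of a discrete sequence."""
    n = len(x)
    return -sum((c / n) * math.log(c / n) for c in Counter(x).values())


def mutual_information(x, z):
    """Plug-in estimate of I(X;Z) in nats; values may be ints, strings, any hashable."""
    n = len(x)
    px, pz, pxz = Counter(x), Counter(z), Counter(zip(x, z))
    # p(a,b) / (p(a) p(b)) simplifies to c * n / (count(a) * count(b))
    return sum((c / n) * math.log(c * n / (px[a] * pz[b])) for (a, b), c in pxz.items())


def symmetric_uncertainty(x, z):
    """2 I(X;Z) / (H(X) + H(Z)), in [0, 1]; 1 means the two columns determine each other."""
    h = entropy(x) + entropy(z)
    return 0.0 if h == 0.0 else 2.0 * mutual_information(x, z) / h


def relevant_features(X, y, n_contrasts=20, seed=0):
    """Step 1: keep inputs whose score beats every score of a permuted (contrast) copy."""
    rng = random.Random(seed)
    scores = {name: mutual_information(col, y) for name, col in X.items()}
    contrast_scores = []
    for _ in range(n_contrasts):
        for col in X.values():
            shuffled = list(col)
            rng.shuffle(shuffled)  # breaks the pairing with y, keeps the marginal distribution
            contrast_scores.append(mutual_information(shuffled, y))
    threshold = max(contrast_scores)
    return {name: s for name, s in scores.items() if s > threshold}


def select_features(X, y, n_contrasts=20, su_threshold=0.9, seed=0):
    """Steps 2-3: greedily drop inputs redundant with a stronger kept one; return a ranking."""
    scores = relevant_features(X, y, n_contrasts, seed)
    kept = []
    for name in sorted(scores, key=lambda k: -scores[k]):  # stable sort: ties keep input order
        if all(symmetric_uncertainty(X[name], X[k]) < su_threshold for k in kept):
            kept.append(name)
    return [(name, scores[name]) for name in kept]


# Toy mixed-type data: y = 1 if a == 1 or b == 2; "a_label" re-encodes a as strings
# (redundant) and c is independent of y by construction (all combinations balanced).
rows = [r for r in product([0, 1], [0, 1, 2], [0, 1]) for _ in range(25)]
X = {
    "a": [a for a, _, _ in rows],
    "a_label": ["yes" if a else "no" for a, _, _ in rows],
    "b": [b for _, b, _ in rows],
    "c": [c for _, _, c in rows],
}
y = [int(a == 1 or b == 2) for a, b, _ in rows]
print(select_features(X, y))  # expected: a and b kept, ranked in that order
```

On the toy data, `a_label` carries exactly the same information as `a` and is dropped as redundant, while `c`, which is independent of `y` by construction, does not beat the permuted-contrast threshold.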
Keywords: Feature Selection; Variable Importance; Importance Score; Subset Feature Selection; Redundant Variable