Abstract
Feature subset selection is a key problem in data-mining classification tasks: it helps to obtain more compact and understandable models without degrading their performance. This paper deals with supervised, wrapper-based feature subset selection in data sets with a very large number of attributes and a low sample size. In this setting, standard wrapper algorithms cannot be applied because of their computational complexity. We propose a new hybrid (filter-wrapper) approach based on instance learning, with the main goal of accelerating the feature subset selection process by reducing the number of wrapper evaluations. In our method, named Hybrid Instance Based Sequential Backward Search (HIB-SBS), instance learning is used to weight features and generate candidate feature subsets; SBS and K-nearest neighbours (KNN) then form the wrapper evaluation system. The method is experimentally tested and compared with state-of-the-art algorithms on four high-dimensional, low-sample-size data sets. The results show a marked reduction in execution time compared to the wrapper approach, and our proposal outperforms the other methods in terms of accuracy and cardinality of the selected subset.
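The abstract gives no algorithmic detail, so the following is only a rough sketch of how a filter-then-wrapper pipeline of this kind could look, not the authors' HIB-SBS implementation. It assumes a Relief-style nearest-hit/nearest-miss weighting as the instance-based filter, takes the top-ranked features as a single candidate subset, and then runs a greedy backward search scored by leave-one-out KNN accuracy. All function names and the `n_candidates` parameter are hypothetical.

```python
def sq_dist(a, b, feats):
    """Squared Euclidean distance restricted to the feature indices in feats."""
    return sum((a[f] - b[f]) ** 2 for f in feats)

def relief_weights(X, y):
    """Relief-style weights: an ASSUMED stand-in for the paper's instance-based filter."""
    n_feats = len(X[0])
    all_feats = range(n_feats)
    w = [0.0] * n_feats
    for i, xi in enumerate(X):
        hits = [j for j in range(len(X)) if j != i and y[j] == y[i]]
        misses = [j for j in range(len(X)) if y[j] != y[i]]
        if not hits or not misses:
            continue  # cannot form a hit/miss pair for this instance
        hit = min(hits, key=lambda j: sq_dist(xi, X[j], all_feats))
        miss = min(misses, key=lambda j: sq_dist(xi, X[j], all_feats))
        for f in all_feats:
            # Reward features that separate the classes, penalise within-class spread.
            w[f] += abs(xi[f] - X[miss][f]) - abs(xi[f] - X[hit][f])
    return w

def loo_knn_accuracy(X, y, feats, k=1):
    """Wrapper score: leave-one-out accuracy of k-NN using only the features in feats."""
    correct = 0
    for i, xi in enumerate(X):
        neighbours = sorted((j for j in range(len(X)) if j != i),
                            key=lambda j: sq_dist(xi, X[j], feats))[:k]
        votes = [y[j] for j in neighbours]
        pred = max(set(votes), key=votes.count)  # majority vote
        correct += pred == y[i]
    return correct / len(X)

def hybrid_sbs(X, y, n_candidates, k=1):
    """Filter step (ranking) followed by a backward wrapper search (hypothetical helper)."""
    w = relief_weights(X, y)
    ranked = sorted(range(len(w)), key=lambda f: w[f], reverse=True)
    subset = ranked[:n_candidates]  # candidate subset, highest weight first
    best = loo_knn_accuracy(X, y, subset, k)
    improved = True
    while improved and len(subset) > 1:
        improved = False
        for f in reversed(subset):  # try dropping the lowest-weighted feature first
            trial = [g for g in subset if g != f]
            score = loo_knn_accuracy(X, y, trial, k)
            if score >= best:  # ties favour the smaller subset
                subset, best, improved = trial, score, True
                break
    return subset, best
```

With `X` as a list of numeric feature vectors and `y` as the class labels, `hybrid_sbs(X, y, n_candidates=20)` returns the retained feature indices and their leave-one-out accuracy. The wrapper is only ever evaluated on subsets of the pre-filtered candidates, which is where a hybrid scheme of this kind would save time relative to running the backward search over all attributes.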
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Ben Brahim, A., Limam, M. (2015). Sequential Instance Based Feature Subset Selection for High Dimensional Data. In: Prasath, R., Vuppala, A., Kathirvalavakumar, T. (eds) Mining Intelligence and Knowledge Exploration. MIKE 2015. Lecture Notes in Computer Science, vol 9468. Springer, Cham. https://doi.org/10.1007/978-3-319-26832-3_5
DOI: https://doi.org/10.1007/978-3-319-26832-3_5
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-26831-6
Online ISBN: 978-3-319-26832-3
eBook Packages: Computer Science, Computer Science (R0)