Mining Intelligence and Knowledge Exploration, pp. 37–46

Sequential Instance Based Feature Subset Selection for High Dimensional Data

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9468)

Abstract

Feature subset selection is a key problem in the data-mining classification task: it helps to obtain more compact and understandable models without degrading their performance. This paper deals with supervised wrapper-based feature subset selection in data sets with a very large number of attributes and a low sample size, where standard wrapper algorithms cannot be applied because of their computational cost. We propose a new hybrid (filter-wrapper) approach based on instance learning, whose main goal is to accelerate feature subset selection by reducing the number of wrapper evaluations. In our method, named Hybrid Instance Based Sequential Backward Search (HIB-SBS), instance learning is used to weight features and to generate candidate feature subsets, which are then refined by a wrapper stage that combines sequential backward search (SBS) with a K-nearest neighbours (KNN) evaluator. The method is experimentally tested and compared with state-of-the-art algorithms on four high-dimensional, low-sample-size datasets. The results show a substantial reduction in execution time with respect to the pure wrapper approach, and our proposal outperforms the other methods in terms of accuracy and cardinality of the selected subsets.
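
As an illustration of the kind of pipeline the abstract describes, the Python sketch below combines a Relief-style instance-based weighting filter with a sequential backward search whose subsets are scored by cross-validated KNN accuracy. It is a minimal sketch rather than the authors' implementation: the function names (relief_weights, hybrid_sbs), the candidate-pool size n_candidates, the tie-breaking and stopping rules, and the synthetic data are all assumptions made for the example.

    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.model_selection import cross_val_score
    from sklearn.neighbors import KNeighborsClassifier

    def relief_weights(X, y):
        # Relief-style instance-based weighting: for every instance, reward
        # features that differ from its nearest miss and penalise features
        # that differ from its nearest hit (Manhattan distances).
        n, d = X.shape
        w = np.zeros(d)
        for i in range(n):
            dist = np.abs(X - X[i]).sum(axis=1)
            dist[i] = np.inf                      # never match the instance itself
            same = (y == y[i])
            hit = np.argmin(np.where(same, dist, np.inf))
            miss = np.argmin(np.where(~same, dist, np.inf))
            w += np.abs(X[i] - X[miss]) - np.abs(X[i] - X[hit])
        return w

    def hybrid_sbs(X, y, n_candidates=20, k=3, cv=5):
        # Filter stage: keep the n_candidates highest-weighted features.
        candidates = list(np.argsort(relief_weights(X, y))[::-1][:n_candidates])
        knn = KNeighborsClassifier(n_neighbors=k)
        best = cross_val_score(knn, X[:, candidates], y, cv=cv).mean()
        # Wrapper stage: sequential backward search, each candidate subset
        # scored by cross-validated KNN accuracy; stop when no removal helps.
        improved = True
        while improved and len(candidates) > 1:
            improved = False
            scores = [cross_val_score(knn, X[:, [c for c in candidates if c != f]],
                                      y, cv=cv).mean() for f in candidates]
            j = int(np.argmax(scores))
            if scores[j] >= best:                 # prefer the smaller subset on ties
                best = scores[j]
                candidates = [c for i, c in enumerate(candidates) if i != j]
                improved = True
        return candidates, best

    if __name__ == "__main__":
        # Toy high-dimensional, low-sample-size data: 80 samples, 500 features.
        X, y = make_classification(n_samples=80, n_features=500, n_informative=10,
                                   random_state=0)
        subset, acc = hybrid_sbs(X, y)
        print("selected %d features, CV accuracy %.3f" % (len(subset), acc))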

Keywords

Feature selection · Sequential · Hybrid · Wrapper · High dimensional data


Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  1. ISG, University of Tunis, Tunis, Tunisia
  2. Dhofar University, Salalah, Oman
