Sequential Instance Based Feature Subset Selection for High Dimensional Data

  • Conference paper
Mining Intelligence and Knowledge Exploration (MIKE 2015)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 9468)

Abstract

Feature subset selection is a key problem in data-mining classification tasks: it helps to obtain more compact and understandable models without degrading their performance. This paper deals with supervised wrapper-based feature subset selection in data sets with a very large number of attributes and a low sample size. In this setting, standard wrapper algorithms cannot be applied because of their computational complexity. We propose a new hybrid (filter-wrapper) approach based on instance learning, whose main goal is to accelerate the feature subset selection process by reducing the number of wrapper evaluations. In our hybrid feature selection method, named Hybrid Instance Based Sequential Backward Search (HIB-SBS), instance learning is used to weight features and generate candidate feature subsets; SBS and K-nearest neighbours (KNN) then form the wrapper evaluation system. The method is experimentally tested and compared with state-of-the-art algorithms on four high-dimensional, low-sample-size datasets. The results show a large reduction in execution time compared to the wrapper approach, and our proposal outperforms the other methods in terms of accuracy and cardinality of the selected subset.
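
The filter-then-wrapper idea summarised in the abstract can be illustrated with a small sketch. This is not the authors' HIB-SBS implementation, which the abstract does not specify in detail. It assumes a Relief-style instance-based weighting as the filter, a fixed number of top-ranked candidates, Manhattan distance, and leave-one-out KNN accuracy as the wrapper criterion. All function names, parameters (`n_candidates`, `k`) and the toy data are hypothetical.

```python
import random
from collections import Counter


def manhattan(a, b, feats):
    """L1 distance between two instances restricted to the given features."""
    return sum(abs(a[j] - b[j]) for j in feats)


def relief_weights(X, y):
    """Relief-style filter (assumed here): a feature gains weight when it
    separates an instance from its nearest miss (other class) and loses
    weight when it separates the instance from its nearest hit (same class)."""
    n, d = len(X), len(X[0])
    all_feats = range(d)
    w = [0.0] * d
    for i in range(n):
        others = [t for t in range(n) if t != i]
        hit = min((t for t in others if y[t] == y[i]),
                  key=lambda t: manhattan(X[i], X[t], all_feats))
        miss = min((t for t in others if y[t] != y[i]),
                   key=lambda t: manhattan(X[i], X[t], all_feats))
        for j in all_feats:
            w[j] += abs(X[i][j] - X[miss][j]) - abs(X[i][j] - X[hit][j])
    return w


def knn_loo_accuracy(X, y, feats, k=3):
    """Leave-one-out accuracy of a k-NN classifier that sees only `feats`."""
    correct = 0
    for i in range(len(X)):
        neighbours = sorted((t for t in range(len(X)) if t != i),
                            key=lambda t: manhattan(X[i], X[t], feats))[:k]
        vote = Counter(y[t] for t in neighbours).most_common(1)[0][0]
        correct += (vote == y[i])
    return correct / len(X)


def backward_search(X, y, candidates, k=3):
    """Sequential backward search over `candidates` (ordered by decreasing
    filter weight): drop one feature at a time while accuracy does not fall."""
    selected = list(candidates)
    best = knn_loo_accuracy(X, y, selected, k)
    while len(selected) > 1:
        trial_best, trial_feat = -1.0, None
        # Visit the lowest-weighted features first so that ties remove them.
        for f in reversed(selected):
            acc = knn_loo_accuracy(X, y, [g for g in selected if g != f], k)
            if acc > trial_best:
                trial_best, trial_feat = acc, f
        if trial_best < best:
            break  # every possible removal hurts accuracy
        selected.remove(trial_feat)
        best = trial_best
    return selected, best


def hybrid_select(X, y, n_candidates=3, k=3):
    """Filter stage ranks features; wrapper stage searches only the top ones."""
    w = relief_weights(X, y)
    ranked = sorted(range(len(w)), key=lambda j: w[j], reverse=True)
    return backward_search(X, y, ranked[:n_candidates], k)


# Toy data (hypothetical): feature 0 encodes the class, features 1-5 are noise.
rng = random.Random(0)
y = [i % 2 for i in range(20)]
X = [[10.0 * label] + [rng.random() for _ in range(5)] for label in y]

selected, accuracy = hybrid_select(X, y)
print(selected, accuracy)
```

Because the wrapper only searches the `n_candidates` features kept by the filter, the number of KNN evaluations depends on that small candidate set rather than on the full dimensionality, which is the kind of saving the abstract refers to.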

Author information

Corresponding author

Correspondence to Afef Ben Brahim.

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Ben Brahim, A., Limam, M. (2015). Sequential Instance Based Feature Subset Selection for High Dimensional Data. In: Prasath, R., Vuppala, A., Kathirvalavakumar, T. (eds) Mining Intelligence and Knowledge Exploration. MIKE 2015. Lecture Notes in Computer Science, vol 9468. Springer, Cham. https://doi.org/10.1007/978-3-319-26832-3_5

  • DOI: https://doi.org/10.1007/978-3-319-26832-3_5

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-26831-6

  • Online ISBN: 978-3-319-26832-3

  • eBook Packages: Computer Science, Computer Science (R0)
