A Novel Criterion to Obtain the Best Feature Subset from Filter Ranking Methods

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 10880)

Abstract

The amount of data available in every field is constantly increasing, and so is the dimensionality of the datasets that describe it. This high dimensionality complicates the treatment of a dataset, since algorithms require more complex internal processes. To address the problem of dimensionality reduction, multiple feature selection techniques have been developed. However, most of these techniques only produce an ordered list of features ranked by relevance; they do not indicate which feature subset best represents the data. Additional strategies are therefore needed to find this best feature subset. This paper proposes a novel criterion, based on sequential search methods, for choosing feature subsets automatically without exhaustively evaluating the rankings derived from filter selectors. Experimental results on 27 real datasets, applying eight selectors and six classifiers to evaluate their output, show that the best feature subsets are reached.
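The paper's exact criterion is not reproduced on this page, but the general scheme the abstract describes, walking a filter ranking and stopping early instead of scoring every prefix, can be sketched as follows. This is a minimal illustrative sketch, not the authors' method: the names `best_prefix`, `evaluate`, and `patience` are assumptions, and in practice `evaluate` would be a classifier's cross-validated accuracy on the candidate subset rather than a lookup table.

```python
def best_prefix(ranking, evaluate, patience=2):
    """Grow a prefix of a relevance-ranked feature list; stop once
    `patience` consecutive prefixes fail to beat the best score seen.

    This avoids evaluating all n prefixes of the ranking exhaustively.
    """
    best_score = float("-inf")
    best_k = 0
    misses = 0
    for k in range(1, len(ranking) + 1):
        score = evaluate(ranking[:k])
        if score > best_score:
            best_score, best_k, misses = score, k, 0
        else:
            misses += 1
            if misses >= patience:
                break  # stopping criterion: no recent improvement
    return ranking[:best_k], best_score

# Toy stand-in for classifier accuracy: it peaks at three features,
# then degrades as irrelevant features add noise.
scores = {1: 0.70, 2: 0.81, 3: 0.86, 4: 0.84, 5: 0.83, 6: 0.82}
ranking = ["f3", "f1", "f7", "f2", "f5", "f4"]
subset, acc = best_prefix(ranking, lambda s: scores[len(s)])
# subset -> ["f3", "f1", "f7"], acc -> 0.86
```

With `patience=2` the search stops after the fifth prefix, so the sixth is never evaluated; the trade-off between early stopping and the risk of missing a later peak is exactly what a stopping criterion over a ranking must balance.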



Author information

Correspondence to Lauro Vargas-Ruíz.



Copyright information

© 2018 Springer International Publishing AG, part of Springer Nature

About this paper

Cite this paper

Vargas-Ruíz, L., Franco-Arcega, A., Alonso-Lavernia, MdlÁ. (2018). A Novel Criterion to Obtain the Best Feature Subset from Filter Ranking Methods. In: Martínez-Trinidad, J., Carrasco-Ochoa, J., Olvera-López, J., Sarkar, S. (eds) Pattern Recognition. MCPR 2018. Lecture Notes in Computer Science, vol 10880. Springer, Cham. https://doi.org/10.1007/978-3-319-92198-3_2

  • DOI: https://doi.org/10.1007/978-3-319-92198-3_2

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-92197-6

  • Online ISBN: 978-3-319-92198-3

  • eBook Packages: Computer Science, Computer Science (R0)
