A Novel Criterion to Obtain the Best Feature Subset from Filter Ranking Methods

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 10880)

Abstract

The amount of data available in every field is constantly increasing, and so is the dimensionality of the datasets that describe it. This high dimensionality complicates the treatment of a dataset, since algorithms require more complex internal processes. To address the problem of dimensionality reduction, multiple feature selection techniques have been developed. However, most of these techniques only produce an ordered list of features ranked by relevance; they do not indicate which feature subset best represents the data. Additional strategies are therefore needed to find this best feature subset. This paper proposes a novel criterion, based on sequential search methods, for choosing feature subsets automatically without exhaustively evaluating the rankings derived from filter selectors. Experimental results on 27 real datasets, applying eight selectors and six classifiers to evaluate their output, show that the best feature subsets are reached.
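The paper's exact criterion is not reproduced on this page, but the general scheme the abstract describes, walking a filter ranking and stopping early instead of scoring every prefix, can be sketched as follows. This is a minimal illustrative sketch, not the authors' method: the names `best_prefix`, `evaluate`, and `patience` are assumptions, and in practice `evaluate` would be a classifier's cross-validated accuracy on the candidate subset rather than a lookup table.

```python
def best_prefix(ranking, evaluate, patience=2):
    """Grow a prefix of a relevance-ranked feature list; stop once
    `patience` consecutive prefixes fail to beat the best score seen.

    This avoids evaluating all n prefixes of the ranking exhaustively.
    """
    best_score = float("-inf")
    best_k = 0
    misses = 0
    for k in range(1, len(ranking) + 1):
        score = evaluate(ranking[:k])
        if score > best_score:
            best_score, best_k, misses = score, k, 0
        else:
            misses += 1
            if misses >= patience:
                break  # stopping criterion: no recent improvement
    return ranking[:best_k], best_score

# Toy stand-in for classifier accuracy: it peaks at three features,
# then degrades as irrelevant features add noise.
scores = {1: 0.70, 2: 0.81, 3: 0.86, 4: 0.84, 5: 0.83, 6: 0.82}
ranking = ["f3", "f1", "f7", "f2", "f5", "f4"]
subset, acc = best_prefix(ranking, lambda s: scores[len(s)])
# subset -> ["f3", "f1", "f7"], acc -> 0.86
```

With `patience=2` the search stops after the fifth prefix, so the sixth is never evaluated; the trade-off between early stopping and the risk of missing a later peak is exactly what a stopping criterion over a ranking must balance.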



Author information

Correspondence to Lauro Vargas-Ruíz.



Copyright information

© 2018 Springer International Publishing AG, part of Springer Nature

About this paper

Cite this paper

Vargas-Ruíz, L., Franco-Arcega, A., Alonso-Lavernia, MdlÁ. (2018). A Novel Criterion to Obtain the Best Feature Subset from Filter Ranking Methods. In: Martínez-Trinidad, J., Carrasco-Ochoa, J., Olvera-López, J., Sarkar, S. (eds) Pattern Recognition. MCPR 2018. Lecture Notes in Computer Science, vol 10880. Springer, Cham. https://doi.org/10.1007/978-3-319-92198-3_2

  • DOI: https://doi.org/10.1007/978-3-319-92198-3_2

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-92197-6

  • Online ISBN: 978-3-319-92198-3

  • eBook Packages: Computer Science, Computer Science (R0)
