Filter-Based Feature Selection Methods Using Hill Climbing Approach

Part of the Unsupervised and Semi-Supervised Learning book series (UNSESUL)

Abstract

Feature selection remains one of the most important steps in building usable models for both supervised and unsupervised classification. For a dataset with n features, the number of possible feature subsets is 2^n, so even for moderate n the search space undergoes a combinatorial explosion. Feature selection is an NP-hard problem; hence finding the optimal solution exhaustively is not feasible, and various intelligent and metaheuristic search techniques are typically employed instead. Hill climbing is arguably the simplest such technique. It has many variants based on (a) the trade-off between greediness and randomness, (b) the direction of the search, and (c) the size of the neighborhood. Consequently, it may not be trivial for a practitioner to choose a suitable method for the task at hand. In this chapter, we attempt to address this issue in the context of feature selection. The descriptions of the methods are followed by an extensive empirical study over 20 publicly available datasets. Finally, a comparison with a genetic algorithm demonstrates the effectiveness of hill climbing methods for feature selection.
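The basic procedure the abstract describes, greedy hill climbing over feature subsets driven by a filter-style evaluation, can be sketched as follows. This is an illustrative sketch, not the chapter's exact algorithms: the `score` function, the random starting subset, and the single-flip neighborhood are assumptions standing in for the filter measures and variants the chapter studies.

```python
import random

def hill_climb_feature_selection(score, n_features, max_iters=1000, seed=0):
    """Greedy hill climbing over feature subsets (illustrative sketch).

    `score` is any filter-style evaluation mapping a frozenset of feature
    indices to a real number (higher is better); it stands in for measures
    such as correlation-based merit.
    """
    rng = random.Random(seed)
    # Start from a random subset of the n features.
    current = frozenset(i for i in range(n_features) if rng.random() < 0.5)
    best = score(current)
    for _ in range(max_iters):
        # Neighborhood: all subsets reachable by flipping one feature in or out.
        neighbors = [current ^ {i} for i in range(n_features)]
        rng.shuffle(neighbors)
        improved = False
        for cand in neighbors:
            s = score(cand)
            if s > best:  # first-improvement (greedy) acceptance
                current, best = cand, s
                improved = True
                break
        if not improved:  # no neighbor improves: a local optimum
            break
    return current, best
```

The three design axes from the abstract map directly onto this skeleton: accepting the first improving neighbor versus the best one (or occasionally a random one) tunes greediness against randomness; restricting flips to additions or to deletions fixes the direction of the search; and flipping k features at a time instead of one enlarges the neighborhood.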

Keywords

  • Hill climbing
  • Filter
  • Feature selection
  • Heuristic
  • Classification



Copyright information

© 2019 Springer International Publishing AG, part of Springer Nature

About this chapter

Cite this chapter

Goswami, S., Chakraborty, S., Guha, P., Tarafdar, A., Kedia, A. (2019). Filter-Based Feature Selection Methods Using Hill Climbing Approach. In: Li, X., Wong, KC. (eds) Natural Computing for Unsupervised Learning. Unsupervised and Semi-Supervised Learning. Springer, Cham. https://doi.org/10.1007/978-3-319-98566-4_10

  • DOI: https://doi.org/10.1007/978-3-319-98566-4_10

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-98565-7

  • Online ISBN: 978-3-319-98566-4

  • eBook Packages: Engineering; Engineering (R0)