Feature selection is important but often computationally expensive, especially on datasets with many instances. This cost can be reduced by training on a small subset of instances, i.e. a surrogate set. In this work, we propose a hierarchical clustering method to build various surrogate sets, which allows us to analyze how surrogate sets of different quality and size affect the selected feature subsets. Further, a dynamic surrogate model is proposed that automatically adjusts the surrogate set for each dataset. Based on this idea, a feature selection system is developed using particle swarm optimization as the search mechanism. The experiments show that the hierarchical clustering method builds better surrogate sets that reduce computational time, improve feature selection performance, and alleviate overfitting. The dynamic method automatically chooses suitable surrogate sets to further improve classification accuracy.
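To make the surrogate-set idea concrete, here is a minimal, illustrative sketch (not the paper's implementation): a naive single-linkage agglomerative clustering groups the training instances, and the medoid of each cluster is kept as the surrogate training set. The function names, linkage choice, and toy data are assumptions for illustration.

```python
# Illustrative sketch: build a surrogate training set by hierarchical
# (single-linkage agglomerative) clustering, keeping one medoid per cluster.

def euclidean(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def agglomerative(points, n_clusters):
    """Naive single-linkage agglomerative clustering; returns index clusters."""
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > n_clusters:
        best = None
        # Merge the two clusters whose closest members are nearest.
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(euclidean(points[a], points[b])
                        for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] += clusters[j]
        del clusters[j]
    return clusters

def surrogate_set(points, n_clusters):
    """Pick the medoid of each cluster as a surrogate training instance."""
    reps = []
    for cluster in agglomerative(points, n_clusters):
        medoid = min(cluster, key=lambda a: sum(
            euclidean(points[a], points[b]) for b in cluster))
        reps.append(medoid)
    return sorted(reps)

data = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0), (9.0, 0.0)]
print(surrogate_set(data, 3))  # → [0, 2, 4]
```

Varying `n_clusters` yields surrogate sets of different sizes, which is how one could study the quality/quantity trade-off the abstract describes.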
Cite this article
Nguyen, H.B., Xue, B. & Andreae, P. PSO with surrogate models for feature selection: static and dynamic clustering-based methods. Memetic Comp. 10, 291–300 (2018). https://doi.org/10.1007/s12293-018-0254-9
- Surrogate model
- Feature selection
- Particle swarm optimization
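The abstract names particle swarm optimization as the search mechanism; the sketch below shows the standard binary PSO update (sigmoid of velocity as bit-selection probability). The fitness function is a stand-in I invented for illustration: it rewards a hypothetical set of "informative" features and penalizes subset size, whereas the paper evaluates feature subsets with a classifier trained on a surrogate set.

```python
# Illustrative sketch of binary PSO for feature selection; toy fitness only.
import math
import random

def binary_pso(n_features, fitness, n_particles=10, iters=30, seed=1):
    rng = random.Random(seed)
    pos = [[rng.randint(0, 1) for _ in range(n_features)]
           for _ in range(n_particles)]
    vel = [[0.0] * n_features for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_fit = [fitness(p) for p in pos]
    g = max(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]
    w, c1, c2 = 0.7, 1.5, 1.5  # inertia and acceleration coefficients
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(n_features):
                vel[i][d] = (w * vel[i][d]
                             + c1 * rng.random() * (pbest[i][d] - pos[i][d])
                             + c2 * rng.random() * (gbest[d] - pos[i][d]))
                # Sigmoid of velocity gives the probability of selecting bit d.
                pos[i][d] = 1 if rng.random() < 1 / (1 + math.exp(-vel[i][d])) else 0
            f = fitness(pos[i])
            if f > pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], f
                if f > gbest_fit:
                    gbest, gbest_fit = pos[i][:], f
    return gbest, gbest_fit

# Toy fitness: features 0-2 are "informative", the rest are noise.
def toy_fitness(mask):
    return sum(mask[:3]) - 0.2 * sum(mask[3:])

best, score = binary_pso(10, toy_fitness)
```

In a surrogate-based setting, `fitness` would train and evaluate a classifier on the surrogate set rather than the full training data, which is where the speed-up comes from.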