Pattern Analysis and Applications, Volume 21, Issue 1, pp 57–66

A new feature subset selection using bottom-up clustering

Theoretical Advances


Feature subset selection and/or dimensionality reduction is an essential preprocessing step before performing any data mining task, especially when the problem space contains too many features. In this paper, a clustering-based feature subset selection (CFSS) algorithm is proposed for discriminating the more relevant features. At each level of agglomeration, it uses a similarity measure among features to merge the two most similar clusters of features. By gathering similar features into clusters and then introducing a representative feature for each cluster, it removes some redundant features. To identify the representative features, a criterion based on mutual information is proposed. Since CFSS specifies the representatives in a filter manner, it is noticeably fast. As an advantage of hierarchical clustering, it does not need the number of clusters to be determined in advance. In CFSS, the clustering process is repeated until all features are distributed among some clusters. However, to distribute the features into a reasonable number of clusters, a recently proposed approach is used to obtain a suitable level for cutting the clustering tree. To assess the performance of CFSS, we have applied it to several UCI datasets and compared it with some popular feature selection methods. The experimental results reveal the efficiency and speed of our proposed method.
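The pipeline the abstract describes (bottom-up clustering of features, then one mutual-information-chosen representative per cluster) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the feature distance (1 − |Pearson correlation|), the average-linkage merging rule, and cutting the tree at a fixed cluster count are all assumptions standing in for the paper's specific similarity measure and tree-cutting approach.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

def mutual_information(x, y, bins=8):
    """Histogram estimate of I(x; y) in nats, for a numeric feature x
    and integer class labels y."""
    edges = np.histogram_bin_edges(x, bins=bins)
    xd = np.digitize(x, edges[1:-1])            # discretise the feature
    joint, _, _ = np.histogram2d(xd, y, bins=(len(np.unique(xd)),
                                              len(np.unique(y))))
    pxy = joint / joint.sum()
    px = pxy.sum(axis=1, keepdims=True)
    py = pxy.sum(axis=0, keepdims=True)
    nz = pxy > 0
    return float((pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])).sum())

def cfss_like_selection(X, y, n_clusters):
    """Cluster the columns of X bottom-up, then keep one representative
    per cluster: the feature with maximal MI with the labels y."""
    # Feature-feature distance: 1 - |Pearson correlation| (an assumption;
    # the paper's own similarity measure may differ).
    dist = 1.0 - np.abs(np.corrcoef(X, rowvar=False))
    condensed = dist[np.triu_indices_from(dist, k=1)]
    Z = linkage(condensed, method='average')    # agglomerative merging
    # Cut the tree; the paper derives the cut level automatically,
    # here it is simply fixed at n_clusters.
    labels = fcluster(Z, t=n_clusters, criterion='maxclust')
    selected = []
    for c in np.unique(labels):
        members = np.flatnonzero(labels == c)
        mi = [mutual_information(X[:, j], y) for j in members]
        selected.append(int(members[int(np.argmax(mi))]))
    return sorted(selected)
```

Because correlated features end up in the same cluster and only one representative per cluster survives, redundant features are discarded while each cluster's most class-relevant feature is kept.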


Keywords: Dimensionality reduction · Feature selection · Hierarchical clustering · Feature clustering



Copyright information

© Springer-Verlag London 2016

Authors and Affiliations

School of Electrical and Computer Engineering, Shiraz University, Shiraz, Iran
