Abstract
Feature selection plays an important part in improving the quality of learning algorithms in machine learning and data mining. It has been widely studied in supervised learning but remains comparatively rarely studied in unsupervised learning. This work proposes a clustering-based framework for unsupervised feature selection. The framework addresses the problem of identifying and choosing important features from the original feature set without class information; features are selected by ranking them according to their importance measure scores. Theoretical analysis indicates that the time complexity of each algorithm is nearly linear in the size of the dataset and the number of features. Experimental results on UCI datasets show that the algorithms in the framework, each using a different score, are able to identify the important features through clustering, and that they obtain competitive results in terms of classification error rate and degree of dimensionality reduction when compared with state-of-the-art supervised and unsupervised feature selection approaches.
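The general idea of ranking features with the help of clustering, without class labels, can be illustrated with a minimal sketch. It is not the authors' algorithm: the abstract does not define the importance measures, so the score used here (the share of each feature's variance that lies between clusters) is an assumption, as are the plain k-means step and the names `kmeans`, `feature_scores`, and `rank_features`.

```python
import random


def kmeans(rows, k, iters=20, seed=0):
    """Plain Lloyd's k-means (an assumed clustering step); returns one label per row."""
    rng = random.Random(seed)
    centers = [list(r) for r in rng.sample(rows, k)]
    labels = [0] * len(rows)
    for _ in range(iters):
        # Assign each row to its nearest center (squared Euclidean distance).
        labels = [
            min(range(k),
                key=lambda c: sum((x - m) ** 2 for x, m in zip(row, centers[c])))
            for row in rows
        ]
        # Recompute each center as the mean of its members; keep it if empty.
        for c in range(k):
            members = [row for row, lab in zip(rows, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def feature_scores(rows, labels):
    """Hypothetical importance score: between-cluster variance / total variance."""
    n = len(rows)
    scores = []
    for j in range(len(rows[0])):
        col = [row[j] for row in rows]
        mean = sum(col) / n
        total = sum((x - mean) ** 2 for x in col)
        between = 0.0
        for c in set(labels):
            members = [x for x, lab in zip(col, labels) if lab == c]
            cmean = sum(members) / len(members)
            between += len(members) * (cmean - mean) ** 2
        scores.append(between / total if total > 0 else 0.0)
    return scores


def rank_features(rows, k, top):
    """Cluster the unlabeled rows, score every feature, return the top indices."""
    labels = kmeans(rows, k)
    scores = feature_scores(rows, labels)
    order = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return order[:top], scores


# Toy data: feature 0 separates two groups, feature 1 does not.
rows = [
    (0.0, 0.5), (0.2, 0.1), (0.1, 0.9), (0.3, 0.4),
    (10.0, 0.6), (10.2, 0.2), (9.9, 0.8), (10.1, 0.3),
]
top, scores = rank_features(rows, k=2, top=1)
print(top, scores)
```

On this toy data the feature that drives the cluster structure receives a score close to 1 and the uninformative feature a score close to 0, so a ranking keeps the former. Each pass over the data touches every row and feature a fixed number of times, which is in the spirit of the near-linear complexity stated in the abstract, though the paper's own analysis may rest on a different clustering method.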
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Jiang, Sy., Wang, Lx. (2012). An Unsupervised Feature Selection Framework Based on Clustering. In: Cao, L., Huang, J.Z., Bailey, J., Koh, Y.S., Luo, J. (eds) New Frontiers in Applied Data Mining. PAKDD 2011. Lecture Notes in Computer Science, vol 7104. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-28320-8_29
DOI: https://doi.org/10.1007/978-3-642-28320-8_29
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-28319-2
Online ISBN: 978-3-642-28320-8
eBook Packages: Computer Science, Computer Science (R0)