An Unsupervised Feature Selection Framework Based on Clustering

  • Conference paper
New Frontiers in Applied Data Mining (PAKDD 2011)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 7104)

Abstract

Feature selection plays an important part in improving the quality of learning algorithms in machine learning and data mining. It has been widely studied in supervised learning, but it has received relatively little attention in unsupervised learning. In this work, a clustering-based framework built around an unsupervised feature selection algorithm is proposed. The framework addresses the problem of identifying and choosing important features from the original feature set without class information: features are ranked according to their importance measure scores, and the top-ranked features are selected. Theoretical analysis indicates that the time complexity of each algorithm is nearly linear in the size of the dataset and in its number of features. Experimental results on UCI datasets show that the algorithms using different scores within the framework are able to identify the important features through clustering, and that the proposed algorithm obtains competitive results in terms of classification error rate and degree of dimensionality reduction when compared with state-of-the-art supervised and unsupervised feature selection approaches.
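
The general idea of ranking features with the help of a clustering can be illustrated with a small sketch. The abstract does not specify the clustering algorithm or the importance measures used in the paper, so everything below is an assumption made for illustration: plain k-means with a deterministic initialisation stands in for the clustering step, and the ratio of between-cluster variance to total variance stands in for an importance score. The function names (`kmeans`, `feature_scores`, `rank_features`) and the toy data are hypothetical, and the features are assumed to be numeric and on comparable scales.

```python
from math import fsum


def kmeans(rows, k, iters=20):
    """Plain Lloyd's k-means; centroids start at evenly spaced rows (an assumption)."""
    step = max(1, len(rows) // k)
    centroids = [list(rows[i * step]) for i in range(k)]
    labels = [0] * len(rows)
    for _ in range(iters):
        # Assign each row to its nearest centroid (squared Euclidean distance).
        labels = [
            min(range(k),
                key=lambda c: fsum((x - m) ** 2 for x, m in zip(row, centroids[c])))
            for row in rows
        ]
        # Move each centroid to the mean of its members; empty clusters stay put.
        for c in range(k):
            members = [row for row, lab in zip(rows, labels) if lab == c]
            if members:
                centroids[c] = [fsum(col) / len(members) for col in zip(*members)]
    return labels


def feature_scores(rows, labels):
    """Illustrative score per feature: between-cluster variance / total variance."""
    n = len(rows)
    scores = []
    for col in zip(*rows):
        mean = fsum(col) / n
        total = fsum((x - mean) ** 2 for x in col)
        between = 0.0
        for c in set(labels):
            vals = [x for x, lab in zip(col, labels) if lab == c]
            between += len(vals) * (fsum(vals) / len(vals) - mean) ** 2
        scores.append(between / total if total > 0 else 0.0)
    return scores


def rank_features(rows, k):
    """Cluster without class labels, then rank feature indices by descending score."""
    labels = kmeans(rows, k)
    scores = feature_scores(rows, labels)
    order = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return order, scores


# Toy data (hypothetical): feature 0 separates two groups, feature 1 does not.
rows = [(0.0, 5.0), (0.2, 1.0), (0.1, 3.0), (10.0, 1.1), (10.2, 4.9), (10.1, 3.0)]
order, scores = rank_features(rows, k=2)
print(order, [round(s, 3) for s in scores])
```

On this toy data the feature that separates the two groups is ranked first and the other scores close to zero. A score of this kind needs one pass over the data per feature once the clustering is fixed, which matches the flavour of the near-linear complexity claim, although the paper's own measures may differ.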

Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Jiang, Sy., Wang, Lx. (2012). An Unsupervised Feature Selection Framework Based on Clustering. In: Cao, L., Huang, J.Z., Bailey, J., Koh, Y.S., Luo, J. (eds) New Frontiers in Applied Data Mining. PAKDD 2011. Lecture Notes in Computer Science, vol 7104. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-28320-8_29

  • DOI: https://doi.org/10.1007/978-3-642-28320-8_29

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-28319-2

  • Online ISBN: 978-3-642-28320-8

  • eBook Packages: Computer Science, Computer Science (R0)
