Cluster-Dependent Feature Selection through a Weighted Learning Paradigm

Advances in Knowledge Discovery and Management

Part of the book series: Studies in Computational Intelligence (SCI, volume 292)

Abstract

This paper addresses the problem of selecting a subset of the most relevant features from a dataset through a weighted learning paradigm. We propose two automated feature selection algorithms for unlabeled data. Compared with the supervised setting, automated feature selection and feature weighting are challenging in unsupervised learning, because label information is either unavailable or not used to guide the selection. The algorithms combine two steps: they introduce unsupervised local feature weights, which identify the relevant features of the data, and they suppress the irrelevant features through unsupervised selection. The algorithms described in this paper provide topographic clustering in which each cluster is associated with a prototype and a weight vector that reflects the relevance of each feature. The proposed methods require only simple computational techniques and are based on the self-organizing map (SOM) model. Empirical results on both synthetic datasets and real datasets from the UCI repository are given and discussed.
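To make the idea of cluster-dependent feature weights concrete, the following is a minimal sketch, not the chapter's algorithms, whose update rules are not given in the abstract. It assumes a batch SOM on a one-dimensional map in which every unit carries a prototype and a weight vector, sets the local weights from inverse within-cluster variance (a swapped-in heuristic that presumes features on comparable scales), and keeps a feature for a cluster when its weight exceeds the uniform value 1/d. All function names, parameters, the initialisation, and the threshold are hypothetical.

```python
import math
import random


def weighted_dist(x, proto, weights):
    """Squared Euclidean distance with one weight per feature."""
    return sum(w * (a - b) ** 2 for w, a, b in zip(weights, x, proto))


def train_local_weight_som(data, n_units=2, n_iter=10, sigma0=1.0, sigma_end=0.1):
    """Batch SOM on a 1-D map; returns (prototypes, local weights, BMU indices)."""
    n, d = len(data), len(data[0])
    # Assumed initialisation: prototypes copied from evenly spaced samples.
    protos = [list(data[j * (n - 1) // max(n_units - 1, 1)]) for j in range(n_units)]
    weights = [[1.0 / d] * d for _ in range(n_units)]
    bmus = [0] * n
    for t in range(n_iter):
        # Neighbourhood radius shrinks geometrically from sigma0 to sigma_end.
        sigma = sigma0 * (sigma_end / sigma0) ** (t / max(n_iter - 1, 1))
        # 1) Assign each sample to its best-matching unit, using that unit's weights.
        bmus = [min(range(n_units),
                    key=lambda j: weighted_dist(x, protos[j], weights[j]))
                for x in data]
        # 2) Batch prototype update: neighbourhood-weighted mean of all samples.
        for j in range(n_units):
            h = [math.exp(-((j - b) ** 2) / (2 * sigma ** 2)) for b in bmus]
            total = sum(h)
            if total > 0:
                protos[j] = [sum(hi * x[k] for hi, x in zip(h, data)) / total
                             for k in range(d)]
        # 3) Local weights (assumed heuristic): inverse within-cluster variance,
        #    normalised to sum to 1 for each unit.
        for j in range(n_units):
            members = [x for x, b in zip(data, bmus) if b == j]
            if len(members) < 2:
                continue  # too few samples: keep the previous weights
            inv = []
            for k in range(d):
                mean = sum(x[k] for x in members) / len(members)
                var = sum((x[k] - mean) ** 2 for x in members) / len(members)
                inv.append(1.0 / (var + 1e-9))
            s = sum(inv)
            weights[j] = [v / s for v in inv]
    return protos, weights, bmus


def select_features(weights):
    """Per cluster, keep features whose weight exceeds the uniform value 1/d."""
    d = len(weights[0])
    return [[k for k, w in enumerate(wj) if w > 1.0 / d] for wj in weights]


def make_demo_data(seed=0, n_per_cluster=50):
    """Two clusters separated on feature 0; feature 1 is uniform noise."""
    rng = random.Random(seed)
    data = []
    for centre in (0.0, 5.0):
        for _ in range(n_per_cluster):
            data.append([rng.gauss(centre, 0.2), rng.uniform(0.0, 2.0)])
    return data


data = make_demo_data()
protos, weights, bmus = train_local_weight_som(data)
selected = select_features(weights)
```

On this toy data, feature 0 is compact within each cluster while feature 1 is spread evenly, so each unit's weight vector concentrates on feature 0 and only that feature is retained. Because the weights are stored per unit, different clusters may in general retain different feature subsets, which is the sense in which the selection is cluster-dependent.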

Copyright information

© 2010 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Grozavu, N., Bennani, Y., Lebbah, M. (2010). Cluster-Dependent Feature Selection through a Weighted Learning Paradigm. In: Guillet, F., Ritschard, G., Zighed, D.A., Briand, H. (eds) Advances in Knowledge Discovery and Management. Studies in Computational Intelligence, vol 292. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-00580-0_8

  • DOI: https://doi.org/10.1007/978-3-642-00580-0_8

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-00579-4

  • Online ISBN: 978-3-642-00580-0

  • eBook Packages: Engineering (R0)
