Unsupervised Feature Selection in High Dimensional Spaces and Uncertainty

  • José R. Villar
  • María R. Suárez
  • Javier Sedano
  • Felipe Mateos
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5572)


Developing models and methods to manage data vagueness is an active research field. Some work has addressed supervised problems, but unsupervised problems under uncertainty have not yet been studied. This work outlines an extension of the Fuzzy Mutual Information Feature Selection algorithm to unsupervised problems. The proposal is a two-stage procedure. The first stage uses the fuzzy mutual information measure, Battiti’s feature selection algorithm, and a genetic algorithm to analyze the relationships between feature subspaces in a high-dimensional space. The second stage uses a simple ad hoc heuristic to extract the most relevant relationships. The experiments carried out in this preliminary work indicate that it is possible to apply frequent pattern mining or similar methods in the second stage to reduce the dimensionality of the data set.
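
As background for the first stage, the following is a minimal sketch of the greedy selection rule from Battiti's mutual-information feature selection for the crisp, supervised case: at each step it picks the feature with the highest relevance to the target minus a penalty proportional to its mutual information with the features already chosen. It is not the fuzzy, unsupervised extension outlined in the abstract, and it omits the genetic algorithm. The function names `mutual_information` and `mifs`, the weight `beta`, and the toy data are assumptions made for illustration only.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Crisp mutual information I(X;Y) in bits for two discrete sequences."""
    n = len(xs)
    px, py = Counter(xs), Counter(ys)
    pxy = Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), c in pxy.items():
        # p(x,y) * log2( p(x,y) / (p(x) * p(y)) ), written with raw counts
        mi += (c / n) * math.log2(c * n / (px[x] * py[y]))
    return mi


def mifs(features, target, k, beta=0.5):
    """Greedy selection: relevance to target minus beta * redundancy.

    `features` maps a feature name to its list of discrete values.
    """
    relevance = {name: mutual_information(col, target)
                 for name, col in features.items()}
    selected, remaining = [], set(features)
    while remaining and len(selected) < k:
        def score(name):
            redundancy = sum(mutual_information(features[name], features[s])
                             for s in selected)
            return relevance[name] - beta * redundancy
        # sorted() makes tie-breaking deterministic (alphabetical order)
        best = max(sorted(remaining), key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy data (hypothetical): the target is determined by a and c; b duplicates a.
a = [0, 0, 1, 1, 0, 0, 1, 1]
c = [0, 1, 0, 1, 0, 1, 0, 1]
toy_features = {"a": a, "b": list(a), "c": c}
toy_target = [2 * ai + ci for ai, ci in zip(a, c)]

chosen = mifs(toy_features, toy_target, k=2, beta=0.5)
print(chosen)
```

In this toy setting all three features are equally relevant on their own, so the redundancy penalty is what steers the second pick away from the duplicate `b` and towards `c`. With `beta=0` the penalty vanishes and the duplicate is no longer discouraged.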


Unsupervised feature selection; genetic algorithms; data uncertainty; frequent pattern mining




  1. Alcala-Fdez, J., Sanchez, L., Garcia, S., Jesus, M.J.D., Ventura, S., Garrell, J.M., Otero, J., Romero, C., Bacardit, J., Rivas, V.M., Fernandez, J.C., Herrera, F.: KEEL: A Software Tool to Assess Evolutionary Algorithms to Data Mining Problems. Soft Computing 13(3), 307–318 (2009)
  2. Battiti, R.: Using mutual information for selecting features in supervised neural net learning. IEEE Transactions on Neural Networks 5(4), 537–550 (1994)
  3. Casillas, J., Cordon, O., Jesus, M.J.D., Herrera, F.: Genetic feature selection in a fuzzy rule-based classification system learning process for high-dimensional problems. Information Sciences 136, 135–157 (2001)
  4. Chow, T.W.S., Wang, P., Ma, E.W.M.: A New Feature Selection Scheme Using a Data Distribution Factor for Unsupervised Nominal Data. IEEE Transactions on Systems, Man and Cybernetics - Part B: Cybernetics 38(2), 499–509 (2008)
  5. Conaire, C.O., Connor, N.E.: Unsupervised feature selection for detection using mutual information thresholding. In: Ninth International Workshop on Image Analysis for Multimedia Interactive Services (2008)
  6. Ho, T.K.: The random subspace method for constructing decision forests. IEEE Transactions on Pattern Analysis and Machine Intelligence 20(8), 832–844 (1998)
  7. Hong, Y., Kwong, S., Chang, Y., Ren, Q.: Consensus unsupervised feature ranking from multiple views. Pattern Recognition Letters 29(5), 595–602 (2008)
  8. Hu, Q., Yu, D., Xie, Z., Liu, J.: Fuzzy Probabilistic Approximation Spaces and Their Information Measures. IEEE Transactions on Fuzzy Systems 14(2), 191–201 (2006)
  9. Jensen, R., Shen, Q.: Fuzzy-rough sets assisted attribute selection. IEEE Transactions on Fuzzy Systems 15(1), 73–89 (2007)
  10. Marcelloni, F.: Feature selection based on a modified fuzzy c-means algorithm with supervision. Information Sciences 151 (2003)
  11. Mitra, P., Murthy, C.A., Pal, S.K.: Unsupervised Feature Selection using Feature Similarity. IEEE Transactions on Pattern Analysis and Machine Intelligence 24(3), 301–312 (2002)
  12. Roubos, J.A., Setnes, M., Abonyi, J.: Learning fuzzy classification rules from labelled data. Information Sciences 150, 77–93 (2003)
  13. Sanchez, L., Suarez, M.R., Villar, J.R., Couso, I.: Mutual Information-based Feature Selection and Fuzzy Discretization of Vague Data. International Journal of Approximate Reasoning (2008)
  14. Sanchez, L., Villar, J.R., Couso, I.: Genetic Feature Selection for Fuzzy Discretized Data. In: Proceedings of the 12th International Conference on Information Processing and Management of Uncertainty in Knowledge-Based Systems. EUSFLAT (2008)
  15. Sedano, J., Villar, J.R., Corchado, E.S., Curiel, L., Bravo, P.M.: The application of a two-step AI model to an Automated Pneumatic Drilling Process. Accepted to be published in the International Journal of Computer Mathematics (2008)
  16. Thangavel, K., Pethalakshmi, A.: Dimensionality reduction based on rough set theory: A review. Applied Soft Computing 9(1), 1–12 (2009)
  17. Uncu, O., Turksen, I.: A novel feature selection approach: Combining feature wrappers and filters. Information Sciences 177, 449–466 (2007)

Copyright information

© Springer-Verlag Berlin Heidelberg 2009

Authors and Affiliations

  • José R. Villar (1)
  • María R. Suárez (1)
  • Javier Sedano (2)
  • Felipe Mateos (3)

  1. Computer Science Department, University of Oviedo, Gijón, Spain
  2. Electromechanic Engineering Department, University of Burgos, Spain
  3. Electric, Electronic, Computers and Systems Engineering Department, University of Oviedo, Gijón, Spain
