Unsupervised Feature Selection for Noisy Data

  • Kaveh Mahdavi
  • Jesus Labarta
  • Judit Gimenez
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11888)


Feature selection techniques are widely applied in a variety of data analysis tasks to reduce dimensionality. According to the type of learning, feature selection algorithms are categorized as supervised or unsupervised. In unsupervised learning scenarios, selecting features is a much harder problem because there are no class labels to guide the search for relevant features. The difficulty of selecting features is amplified when the data is corrupted by different kinds of noise. Almost all traditional unsupervised feature selection methods are not robust against noise in the samples. These approaches have no explicit mechanism for detaching and isolating the noise, and thus they cannot produce an optimal feature subset. In this article, we propose an unsupervised approach for feature selection on noisy data, called Robust Independent Feature Selection (RIFS). Specifically, we choose a feature subset that contains most of the underlying information, using the same criteria as independent component analysis (ICA). Simultaneously, the noise is separated as an independent component. The isolation of representative noise samples is achieved using oblique factor rotation, whereas noise identification is performed using factor pattern loadings. Extensive experimental results over diverse real-life data sets show the efficiency and advantage of the proposed algorithm.
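
The general idea of picking features by their loadings on independent components can be illustrated with a short sketch. This is not the RIFS algorithm: the abstract does not specify its steps, so the oblique rotation and the noise-identification stage are not attempted here. The sketch assumes a plain symmetric FastICA with a tanh nonlinearity, defines loadings as feature-to-component correlations, and uses a greedy "strongest unused feature per component" rule. The function names `ica_loadings` and `select_features` and the synthetic data are hypothetical.

```python
import numpy as np

def _sym_decorrelate(W):
    # W <- (W W^T)^(-1/2) W, which makes the rows of W orthonormal.
    s, u = np.linalg.eigh(W @ W.T)
    return (u / np.sqrt(np.clip(s, 1e-12, None))) @ u.T @ W

def ica_loadings(X, k, seed=0, max_iter=200, tol=1e-6):
    """Return a (n_features, k) matrix of feature-to-component correlations."""
    n = X.shape[0]
    Xc = X - X.mean(axis=0)
    # Whiten: project onto the top-k principal directions, scaled to unit variance.
    eigvals, eigvecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
    top = np.argsort(eigvals)[::-1][:k]
    K = (eigvecs[:, top] / np.sqrt(eigvals[top])).T   # shape (k, n_features)
    Z = Xc @ K.T                                      # shape (n_samples, k)
    # Symmetric FastICA fixed-point iterations with g = tanh.
    rng = np.random.default_rng(seed)
    W = _sym_decorrelate(rng.standard_normal((k, k)))
    for _ in range(max_iter):
        G = np.tanh(Z @ W.T)
        W_new = (G.T @ Z) / n - np.diag((1.0 - G ** 2).mean(axis=0)) @ W
        W_new = _sym_decorrelate(W_new)
        converged = np.max(np.abs(np.abs(np.sum(W_new * W, axis=1)) - 1.0)) < tol
        W = W_new
        if converged:
            break
    S = Z @ W.T                                       # estimated components, unit variance
    std = Xc.std(axis=0, ddof=1)
    std[std == 0] = 1.0                               # guard against constant features
    return (Xc.T @ S) / (n - 1) / std[:, None]

def select_features(X, k, seed=0):
    """Greedy rule (an assumption): one not-yet-chosen feature per component."""
    loadings = ica_loadings(X, k, seed=seed)
    selected = []
    for comp in range(k):
        for j in np.argsort(-np.abs(loadings[:, comp])):
            if int(j) not in selected:
                selected.append(int(j))
                break
    return selected, loadings

# Synthetic demo: 6 observed features mixed from 3 non-Gaussian sources plus small noise.
rng = np.random.default_rng(42)
sources = np.column_stack([rng.uniform(-1, 1, 1000),
                           rng.laplace(size=1000),
                           rng.exponential(size=1000)])
X_demo = sources @ rng.standard_normal((3, 6)) + 0.05 * rng.standard_normal((1000, 6))
selected, loadings = select_features(X_demo, k=3)
print(selected)
```

Correlation-based loadings keep every entry in [-1, 1], so features measured on different scales remain comparable. The choice of `k` is left to the caller here.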


Feature selection, Independent component analysis, Oblique rotation, Noise separation



We gratefully acknowledge the support of the Comisión Interministerial de Ciencia y Tecnología (CICYT) under contract No. TIN2015-65316-P, which has partially funded this work.



Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  • Kaveh Mahdavi (1, 2)
  • Jesus Labarta (1, 2)
  • Judit Gimenez (1, 2)
  1. Barcelona Supercomputing Center (BSC), Barcelona, Spain
  2. Universitat Politècnica de Catalunya, Barcelona, Spain
