Unsupervised Elimination of Redundant Features Using Genetic Programming

  • Kourosh Neshatian
  • Mengjie Zhang
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5866)

Abstract

While most feature selection algorithms focus on finding relevant features, few take the redundancy issue into account. We propose a nonlinear redundancy measure which uses genetic programming to find the redundancy quotient of a feature with respect to a subset of features. The proposed measure is unsupervised and works with unlabeled data. We introduce a forward selection algorithm which can be used along with the proposed measure to perform feature selection over the output of a feature ranking algorithm. The effectiveness of the proposed method is assessed by applying it to the output of the Chi-square (χ²) feature ranker on a classification task. The results show significant improvements in the performance of decision tree and SVM classifiers.
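The abstract only outlines the method, so what follows is a minimal Python sketch under explicit assumptions, not the authors' algorithm. It assumes the redundancy quotient of a feature is the fraction of its variance that can be reconstructed from the already-selected features (1 − MSE / variance). It also swaps the paper's genetic-programming regressor for ordinary least squares, so it detects only linear redundancy. The forward selection walks a ranked list of column indices (for example, the output of a χ² ranker, which is not implemented here) and keeps a feature only if its redundancy with respect to the kept set stays below a threshold. The names `redundancy`, `forward_select`, `threshold` and `RIDGE` are hypothetical.

```python
from typing import List, Sequence

RIDGE = 1e-8  # hypothetical tiny ridge term so collinear predictor sets stay solvable


def _solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve the square system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def redundancy(target: Sequence[float], predictors: Sequence[Sequence[float]]) -> float:
    """Fraction of the target's variance explained by a least-squares fit on the predictors.

    Assumed stand-in for the paper's GP-based measure: only *linear* redundancy is
    detected. Returns a value in [0, 1]; 1.0 means the target is fully reconstructable.
    """
    if not predictors:
        return 0.0  # nothing selected yet, so nothing can make the target redundant
    n = len(target)
    rows = [[1.0] + [p[i] for p in predictors] for i in range(n)]  # intercept + predictors
    k = len(rows[0])
    # Normal equations (X^T X + RIDGE * I) w = X^T y
    xtx = [[sum(r[i] * r[j] for r in rows) + (RIDGE if i == j else 0.0) for j in range(k)]
           for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, target)) for i in range(k)]
    w = _solve(xtx, xty)
    mean = sum(target) / n
    var = sum((y - mean) ** 2 for y in target) / n
    if var < 1e-12:
        return 1.0  # a constant feature carries no information of its own
    mse = sum((y - sum(wi * xi for wi, xi in zip(w, r))) ** 2
              for r, y in zip(rows, target)) / n
    return max(0.0, 1.0 - mse / var)


def forward_select(columns: Sequence[Sequence[float]], ranking: Sequence[int],
                   threshold: float) -> List[int]:
    """Walk a ranked feature list; keep each feature unless its redundancy with
    respect to the features already kept reaches the threshold."""
    selected: List[int] = []
    for idx in ranking:
        kept = [columns[j] for j in selected]
        if redundancy(columns[idx], kept) < threshold:
            selected.append(idx)
    return selected


if __name__ == "__main__":
    a = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
    b = [2 * v + 1 for v in a]          # an exact linear copy of a
    c = [1.0, 0.0, 1.0, 0.0, 1.0, 0.0]  # only weakly related to a
    print(forward_select([a, b, c], ranking=[0, 1, 2], threshold=0.9))  # [0, 2]
```

Because no class labels enter `redundancy`, this step is unsupervised, in line with the abstract; replacing the least-squares fit with an evolved nonlinear regressor is what would let a measure of this shape capture nonlinear dependencies as well.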

Keywords

Feature Selection, Mean Square Error, Feature Subset, Ranking Algorithm, Redundant Feature
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag Berlin Heidelberg 2009

Authors and Affiliations

  • Kourosh Neshatian (1)
  • Mengjie Zhang (1)
  1. School of Engineering and Computer Science, Victoria University of Wellington, Wellington, New Zealand