Speeding Up the Wrapper Feature Subset Selection in Regression by Mutual Information Relevance and Redundancy Analysis

  • Gert Van Dijck
  • Marc M. Van Hulle
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4131)


A hybrid filter/wrapper feature subset selection algorithm for regression is proposed. First, features are filtered by means of a relevance and redundancy filter based on the mutual information between the regression variables and the target variable. We introduce permutation tests to identify statistically significant relevant and redundant features. Second, a wrapper searches for good candidate feature subsets while taking the regression model into account. The advantage of a hybrid approach is threefold. First, the filter provides interesting features independently of the regression model and hence allows for easier interpretation. Second, because the filter stage is computationally less expensive, the overall algorithm yields good candidate subsets faster than a stand-alone wrapper approach. Third, the wrapper takes the bias of the regression model into account, because the regression model guides the search for optimal features. Results are shown for the ‘Boston housing’ and ‘orange juice’ benchmarks using a multilayer perceptron regression model.
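
To make the two-stage idea concrete, the following is a minimal sketch, not the authors' implementation: it assumes scikit-learn's mutual_info_regression as a stand-in for the paper's mutual information estimator, a simple per-feature permutation test for relevance (the redundancy filter is omitted for brevity), and a greedy forward wrapper scored by the cross-validated mean square error of a multilayer perceptron regressor.

# Minimal sketch of a hybrid filter/wrapper feature selection for regression.
# Assumptions: scikit-learn estimators stand in for the paper's MI estimator
# and regression model; the permutation test and greedy forward search below
# are simplified illustrations of the filter and wrapper stages.
import numpy as np
from sklearn.feature_selection import mutual_info_regression
from sklearn.neural_network import MLPRegressor
from sklearn.model_selection import cross_val_score

def relevance_filter(X, y, n_perm=100, alpha=0.05, seed=None):
    """Keep features whose MI with y exceeds a permutation-based null."""
    rng = np.random.default_rng(seed)
    mi = mutual_info_regression(X, y)
    keep = []
    for j in range(X.shape[1]):
        null = np.array([
            mutual_info_regression(X[:, [j]], rng.permutation(y))[0]
            for _ in range(n_perm)
        ])
        # A feature is kept if its MI is significant at level alpha.
        if np.mean(null >= mi[j]) < alpha:
            keep.append(j)
    return keep

def forward_wrapper(X, y, candidates, cv=5):
    """Greedy forward search scored by cross-validated MSE of an MLP."""
    selected, best_mse = [], np.inf
    remaining = list(candidates)
    while remaining:
        scores = {}
        for j in remaining:
            cols = selected + [j]
            model = MLPRegressor(hidden_layer_sizes=(10,), max_iter=2000)
            mse = -cross_val_score(model, X[:, cols], y, cv=cv,
                                   scoring="neg_mean_squared_error").mean()
            scores[j] = mse
        j_best = min(scores, key=scores.get)
        if scores[j_best] >= best_mse:
            break  # stop when adding a feature no longer helps
        best_mse = scores[j_best]
        selected.append(j_best)
        remaining.remove(j_best)
    return selected, best_mse

A typical call would first run relevance_filter(X, y) and then pass the surviving indices to forward_wrapper, mirroring the filter-then-wrapper ordering that makes the hybrid approach faster than a stand-alone wrapper.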


Keywords: Feature Selection, Mean Square Error, Mutual Information, Orange Juice, Feature Subset





Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Gert Van Dijck 1
  • Marc M. Van Hulle 1
  1. Computational Neuroscience Research Group, Laboratorium voor Neuro- en Psychofysiologie, K.U. Leuven, Leuven, Belgium
