A Performance Evaluation of Mutual Information Estimators for Multivariate Feature Selection
Mutual information is one of the most popular criteria used in feature selection, and many techniques have been proposed to estimate it. Most of these are based on probability density estimation and perform badly when faced with high-dimensional data, because of the curse of dimensionality. However, the ability to robustly evaluate the mutual information between a subset of features and an output vector is of great interest in feature selection, particularly when some features are only jointly redundant or relevant. In this paper, different mutual information estimators are compared according to criteria that are important for feature selection, and the interest of a nearest-neighbors-based estimator is shown.
Keywords: Mutual information estimation · Feature selection · Density estimation · Nearest neighbors
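A widely used nearest-neighbors-based estimator of the kind the abstract refers to is the Kraskov–Stögbauer–Grassberger (KSG) estimator, which avoids explicit density estimation by combining k-nearest-neighbor distances in the joint space with neighbor counts in the marginal spaces. The sketch below is illustrative, not the authors' own implementation; the function name and the brute-force O(n²) distance computation are choices made here for clarity.

```python
# Minimal sketch of KSG estimator (1) for mutual information, in nats.
# Assumes continuous-valued samples; uses brute-force pairwise distances,
# which is fine for a few thousand points but not for large datasets.
import numpy as np
from scipy.special import digamma

def ksg_mi(x, y, k=3):
    """Estimate I(X; Y) from paired samples x, y of length n.

    x, y may be 1-D or (n, d) arrays; k is the number of nearest
    neighbors used in the joint (x, y) space.
    """
    x = x.reshape(len(x), -1)
    y = y.reshape(len(y), -1)
    n = len(x)
    # Chebyshev (max-norm) pairwise distances in each marginal space.
    dx = np.max(np.abs(x[:, None, :] - x[None, :, :]), axis=-1)
    dy = np.max(np.abs(y[:, None, :] - y[None, :, :]), axis=-1)
    # Joint-space distance under the max norm is the max of the marginals.
    dz = np.maximum(dx, dy)
    np.fill_diagonal(dz, np.inf)  # exclude each point from its own neighbors
    # eps[i]: distance from point i to its k-th nearest neighbor jointly.
    eps = np.sort(dz, axis=1)[:, k - 1]
    # n_x, n_y: neighbors strictly closer than eps in each marginal space.
    np.fill_diagonal(dx, np.inf)
    np.fill_diagonal(dy, np.inf)
    nx = np.sum(dx < eps[:, None], axis=1)
    ny = np.sum(dy < eps[:, None], axis=1)
    # KSG formula (estimator 1):
    # I(X;Y) ~ psi(k) + psi(n) - <psi(n_x + 1) + psi(n_y + 1)>
    return digamma(k) + digamma(n) - np.mean(digamma(nx + 1) + digamma(ny + 1))
```

Because it relies only on distance ranks rather than binned or kernel density estimates, this estimator degrades more gracefully as the dimension of the feature subset grows, which is the property that matters for the multivariate feature-selection setting studied in the paper.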