Pattern Analysis and Applications

Volume 11, Issue 3–4, pp 309–319

Feature selection, mutual information, and the classification of high-dimensional patterns

Applications to image classification and microarray data analysis
  • Boyan Bonev (corresponding author)
  • Francisco Escolano
  • Miguel Cazorla
Theoretical Advances


We propose a novel feature selection filter for supervised learning, which relies on the efficient estimation of the mutual information between a high-dimensional set of features and the classes. We bypass the estimation of the probability density function with the aid of the entropic-spanning-graphs approximation of the Rényi entropy and its subsequent extrapolation to the Shannon entropy. Thus, the complexity depends not on the number of dimensions but on the number of patterns/samples, and the curse of dimensionality is circumvented. We show that it is then possible to outperform algorithms that rank features individually, as well as a greedy algorithm based on the maximal-relevance, minimal-redundancy criterion. We successfully test our method in the contexts of both image classification and microarray data classification. For most of the tested data sets, we obtain better classification results than those reported in the literature.
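The estimator behind the abstract can be illustrated with a short sketch. The Python code below is an illustrative reconstruction, not the authors' implementation: it estimates the Rényi α-entropy from the length functional of a Euclidean minimum spanning tree over the samples (the entropic-spanning-graph approach of Hero and Michel), omits the additive bias constant log β (which cancels when entropies of equally-dimensioned sets are compared), fixes α = 0.5 for simplicity instead of extrapolating toward the Shannon limit, and plugs the resulting entropies into a forward greedy filter that maximises the estimated mutual information I(S; C) = H(S) − Σ_c p(c) H(S | C = c). All function names are ours.

```python
import numpy as np
from scipy.sparse.csgraph import minimum_spanning_tree
from scipy.spatial.distance import pdist, squareform

def renyi_entropy_mst(X, alpha=0.5):
    """MST-based (entropic-spanning-graph) estimate of the Renyi
    alpha-entropy of the rows of X (n samples, d features).
    The additive bias constant log(beta) is omitted: it cancels when
    entropies of sets of equal dimension are compared."""
    n, d = X.shape
    if n < 2:
        return 0.0
    gamma = d * (1.0 - alpha)                  # edge-weight exponent
    mst = minimum_spanning_tree(squareform(pdist(X)))
    L = np.sum(mst.data ** gamma)              # weighted MST length
    return (np.log(L) - alpha * np.log(n)) / (1.0 - alpha)

def mutual_information(X, y, alpha=0.5):
    """I(X; C) ~ H(X) - sum_c p(c) H(X | C = c), with every entropy
    estimated by the MST functional above (alpha -> 1 would approach
    the Shannon entropy)."""
    h_cond = 0.0
    for c in np.unique(y):
        Xc = X[y == c]
        h_cond += (len(Xc) / len(X)) * renyi_entropy_mst(Xc, alpha)
    return renyi_entropy_mst(X, alpha) - h_cond

def greedy_select(X, y, k, alpha=0.5):
    """Forward greedy filter: repeatedly add the feature whose inclusion
    maximises the estimated mutual information with the class labels."""
    selected, remaining = [], list(range(X.shape[1]))
    for _ in range(k):
        best = max(remaining, key=lambda f:
                   mutual_information(X[:, selected + [f]], y, alpha))
        selected.append(best)
        remaining.remove(best)
    return selected
```

Because the candidate subsets compared within one greedy step all have the same dimension and sample count, the dropped bias constant and the fixed α do not affect which feature wins the step, only the absolute entropy values.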


Keywords: Filter feature selection · Mutual information · Entropic spanning graphs · Microarray



This research is funded by project DPI2005-01280 of the Spanish Government.



Copyright information

© Springer-Verlag London Limited 2008

Authors and Affiliations

  • Boyan Bonev (corresponding author)
  • Francisco Escolano
  • Miguel Cazorla
  1. Departamento de Ciencia de la Computación e Inteligencia Artificial, Universidad de Alicante, Alicante, Spain
