A Gaussian Mixture Based Maximization of Mutual Information for Supervised Feature Extraction

  • José M. Leiva-Murillo
  • Antonio Artés-Rodríguez
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3195)


In this paper, we propose a new method for linear feature extraction and dimensionality reduction in classification problems. The method is based on maximizing the Mutual Information (MI) between the extracted features and the class labels. A Gaussian mixture is used to model the distribution of the data. From this model, the entropy of the data is estimated, and from it the MI at the output. A gradient descent algorithm is provided for the optimization. Experiments are reported in which the method is compared with other popular linear feature extractors.
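
The idea in the abstract can be illustrated with a small sketch. It relies on the standard identity I(Y;C) = H(Y) − Σ_c p(c) H(Y|c), where Y is the projected feature and C the class. Everything below is an assumption made for illustration and is not taken from the paper: the feature is one-dimensional, each class-conditional density is a single Gaussian (so the marginal is a Gaussian mixture with one component per class), entropies are resubstitution estimates, and the projection angle is tuned by finite-difference gradient ascent rather than the authors' own gradient expressions. The names `estimate_mi` and `maximise_mi` are hypothetical.

```python
import math
import random


def gauss_logpdf(y, mean, var):
    """Log-density of a 1-D Gaussian N(mean, var) evaluated at y."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (y - mean) ** 2 / var)


def estimate_mi(X, labels, theta):
    """Estimate I(Y; C) in nats for Y = w^T x with w = (cos theta, sin theta).

    Assumption (not from the paper): p(y|c) is a single Gaussian per class,
    so p(y) is a Gaussian mixture with one component per class.
    """
    w = (math.cos(theta), math.sin(theta))
    y = [w[0] * x[0] + w[1] * x[1] for x in X]
    n = len(y)
    params = {}  # class -> (prior, mean, variance), fitted by maximum likelihood
    for c in set(labels):
        yc = [yi for yi, ci in zip(y, labels) if ci == c]
        mean = sum(yc) / len(yc)
        var = sum((yi - mean) ** 2 for yi in yc) / len(yc)
        params[c] = (len(yc) / n, mean, var)
    h_y = 0.0          # resubstitution estimate of H(Y)
    h_y_given_c = 0.0  # resubstitution estimate of H(Y|C)
    for yi, ci in zip(y, labels):
        mix = sum(p * math.exp(gauss_logpdf(yi, m, v)) for p, m, v in params.values())
        h_y -= math.log(mix) / n
        _, m, v = params[ci]
        h_y_given_c -= gauss_logpdf(yi, m, v) / n
    return h_y - h_y_given_c


def maximise_mi(X, labels, theta, lr=0.1, steps=50, eps=1e-4):
    """Gradient ascent on the MI estimate, using a central finite difference."""
    for _ in range(steps):
        grad = (estimate_mi(X, labels, theta + eps)
                - estimate_mi(X, labels, theta - eps)) / (2.0 * eps)
        theta += lr * grad
    return theta


# Synthetic two-class data: the classes differ only along the first axis.
rng = random.Random(0)
X, labels = [], []
for c, centre in ((0, -3.0), (1, 3.0)):
    for _ in range(200):
        X.append((rng.gauss(centre, 1.0), rng.gauss(0.0, 1.0)))
        labels.append(c)

theta_start = 1.2  # initial direction, close to the uninformative second axis
theta_opt = maximise_mi(X, labels, theta_start)
mi_start = estimate_mi(X, labels, theta_start)
mi_opt = estimate_mi(X, labels, theta_opt)
mi_uninformative = estimate_mi(X, labels, math.pi / 2.0)
print(f"MI at start: {mi_start:.3f} nats, after ascent: {mi_opt:.3f} nats")
```

With two balanced classes the estimate cannot exceed log 2 nats, the entropy of the class variable, so the value after ascent can be read against that ceiling. A full Gaussian mixture per class, fitted for instance with EM, and a multi-dimensional projection matrix would be natural extensions of this sketch.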


Feature Extraction, Mutual Information, Partial Little Square, Independent Component Analysis, Gaussian Mixture Model
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2004

Authors and Affiliations

  • José M. Leiva-Murillo (1)
  • Antonio Artés-Rodríguez (1)
  1. Department of Signal Theory and Communications, Universidad Carlos III de Madrid, Leganés-Madrid, Spain
