Auxiliary Variational Information Maximization for Dimensionality Reduction

  • Felix Agakov
  • David Barber
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3940)


Mutual Information (MI) is a long-studied measure of information content, and many attempts have been made to apply it to feature extraction and stochastic coding. In general, however, MI is computationally intractable to evaluate, and most previous studies replace the criterion with approximations. We recently described properties of a simple lower bound on MI and discussed its links to some popular dimensionality reduction techniques [1]. Here we introduce a richer family of auxiliary variational bounds on MI, which generalizes our previous approximations. Our specific focus is on applying the bound to extract informative lower-dimensional projections in the presence of irreducible Gaussian noise. We show that our method produces significantly tighter bounds than the well-known as-if Gaussian approximations of MI. We also show that the auxiliary variable method may significantly improve reconstructions from noisy lower-dimensional projections.
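
The simple lower bound referred to in the abstract has the form I(x, y) ≥ H(x) + ⟨log q(x|y)⟩ for any decoder distribution q(x|y), with the average taken under p(x, y) [1]. The sketch below is an illustration only, not the authors' implementation, and it does not cover the auxiliary variational bounds introduced in this paper. It assumes a linear Gaussian channel y = Wx + ε with Gaussian sources, chosen because every term then has a closed form. The function names, dimensions, and the use of NumPy are assumptions made for this example.

```python
import numpy as np

def exact_mi(W, Sigma, noise_var):
    """Exact I(x; y) in nats for x ~ N(0, Sigma), y = W x + eps, eps ~ N(0, noise_var * I)."""
    m = W.shape[0]
    _, logdet = np.linalg.slogdet(np.eye(m) + W @ Sigma @ W.T / noise_var)
    return 0.5 * logdet

def variational_bound(W, Sigma, noise_var, U, S):
    """H(x) + E_{p(x,y)}[log q(x|y)] for a Gaussian decoder q(x|y) = N(x; U y, S)."""
    d = Sigma.shape[0]
    _, logdet_sigma = np.linalg.slogdet(Sigma)
    _, logdet_s = np.linalg.slogdet(S)
    entropy_x = 0.5 * (d * np.log(2 * np.pi * np.e) + logdet_sigma)
    # Second moment of the reconstruction error x - U y under p(x, y).
    R = np.eye(d) - U @ W
    C = R @ Sigma @ R.T + noise_var * U @ U.T
    expected_log_q = -0.5 * (d * np.log(2 * np.pi) + logdet_s
                             + np.trace(np.linalg.solve(S, C)))
    return entropy_x + expected_log_q

def posterior_decoder(W, Sigma, noise_var):
    """Mean map and covariance of the exact posterior p(x|y) for the Gaussian channel."""
    m = W.shape[0]
    U = Sigma @ W.T @ np.linalg.inv(W @ Sigma @ W.T + noise_var * np.eye(m))
    S = Sigma - U @ W @ Sigma
    return U, S

# Hypothetical problem sizes: 5-dimensional source, 2-dimensional noisy projection.
rng = np.random.default_rng(0)
d, m, noise_var = 5, 2, 0.5
A = rng.standard_normal((d, d))
Sigma = A @ A.T + np.eye(d)          # positive-definite source covariance
W = rng.standard_normal((m, d))      # projection (encoder) weights

mi = exact_mi(W, Sigma, noise_var)
U_opt, S_opt = posterior_decoder(W, Sigma, noise_var)
tight = variational_bound(W, Sigma, noise_var, U_opt, S_opt)        # decoder = true posterior
loose = variational_bound(W, Sigma, noise_var, 0.8 * U_opt, S_opt)  # mis-specified decoder
trivial = variational_bound(W, Sigma, noise_var, np.zeros((d, m)), Sigma)  # decoder ignores y

print(f"exact MI: {mi:.4f}  tight: {tight:.4f}  loose: {loose:.4f}  trivial: {trivial:.4f}")
```

In this setting the bound equals the exact MI when the decoder is the true posterior, falls below it for any other decoder, and is zero when the decoder ignores y and returns the prior p(x).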


Keywords (added by machine, not by the authors): Mutual Information; Dimensionality Reduction; Auxiliary Variable; Conditional Entropy; Rich Family




References

  1. Barber, D., Agakov, F.V.: The IM Algorithm: A Variational Approach to Information Maximization. In: NIPS, MIT Press, Cambridge (2003)
  2. Linsker, R.: An Application of the Principle of Maximum Information Preservation to Linear Systems. In: Touretzky, D. (ed.) Advances in Neural Information Processing Systems, vol. 1. Morgan-Kaufmann, San Francisco (1989)
  3. Bell, A.J., Sejnowski, T.J.: An information-maximization approach to blind separation and blind deconvolution. Neural Computation 7(6), 1129–1159 (1995)
  4. Brunel, N., Nadal, J.P.: Mutual Information, Fisher Information and Population Coding. Neural Computation 10, 1731–1757 (1998)
  5. Agakov, F.V., Barber, D.: Variational Information Maximization for Neural Coding. In: International Conference on Neural Information Processing. Springer, Heidelberg (2004)
  6. Agakov, F.V.: Variational Information Maximization in Stochastic Environments. PhD thesis, School of Informatics, University of Edinburgh (2005)
  7. Bishop, C., Svensen, M., Williams, C.K.I.: GTM: The Generative Topographic Mapping. Neural Computation 10(1), 215–234 (1998)
  8. Tipping, M.E., Bishop, C.M.: Mixtures of Probabilistic Principal Component Analyzers. Neural Computation 11(2), 443–482 (1999)
  9. Linsker, R.: Deriving Receptive Fields Using an Optimal Encoding Criterion. In: Hanson, S., Cowan, J., Giles, L. (eds.) Advances in Neural Information Processing Systems, vol. 5. Morgan Kaufmann, San Francisco (1993)
  10. Neal, R.M., Hinton, G.E.: A View of the EM Algorithm That Justifies Incremental, Sparse, and Other Variants. In: Jordan, M. (ed.) Learning in Graphical Models. Kluwer Academic, Dordrecht (1998)
  11. Cover, T.M., Thomas, J.A.: Elements of Information Theory. Wiley, Chichester (1991)
  12. Arimoto, S.: An Algorithm for computing the capacity of arbitrary discrete memoryless channels. IT 18 (1972)
  13. Blahut, R.: Computation of channel capacity and rate-distortion functions. IT 18 (1972)
  14. Jacobs, R.A., Jordan, M.I., Nowlan, S.J., Hinton, G.E.: Adaptive Mixtures of Local Experts. Neural Computation 3 (1991)
  15. Tishby, N., Pereira, F.C., Bialek, W.: The information bottleneck method. In: Proc. of the 37-th Annual Allerton Conference on Communication, Control and Computing, pp. 368–377 (1999)
  16. LeCun, Y., Cortes, C.: The MNIST database of handwritten digits (1998)
  17. Cardoso, J.F.: Infomax and maximum likelihood for blind source separation. IEEE Signal Processing Letters 4 (1997)

Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Felix Agakov (1)
  • David Barber (2)
  1. University of Edinburgh, Edinburgh, UK
  2. IDIAP, Martigny, Switzerland
