Maximum Entropy and Gaussian Models for Image Object Recognition

  • Daniel Keysers
  • Franz Josef Och
  • Hermann Ney
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 2449)


The principle of maximum entropy is a powerful framework that can be used to estimate class posterior probabilities for pattern recognition tasks. In this paper, we show how this principle is related to the discriminative training of Gaussian mixture densities using the maximum mutual information criterion. This leads to a relaxation of the constraint that the covariance matrices be positive (semi-)definite. Thus, we arrive at a conceptually simple model that allows a large number of free parameters to be estimated reliably. We compare the proposed method with other state-of-the-art approaches in experiments on the well-known US Postal Service handwritten digit recognition task.
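The relaxation described in the abstract can be illustrated with a small sketch. A maximum-entropy (log-linear) model over first- and second-order features of the input has the same functional form as the posterior of a Gaussian classifier, but its quadratic coefficient matrix is a free parameter, not constrained to be positive (semi-)definite. The sketch below trains such a model by plain gradient ascent on synthetic data; the paper itself uses generalized iterative scaling and the USPS task, so the data, learning rate, and optimizer here are illustrative assumptions only.

```python
import numpy as np

# Maximum-entropy (log-linear) classifier with quadratic features.
# With phi(x) = [1, x, vec(x x^T)], the class posterior has the same
# functional form as that of a Gaussian classifier, but the matrix of
# quadratic coefficients is unconstrained -- it need not be positive
# (semi-)definite. This is the relaxation the abstract refers to.

rng = np.random.default_rng(0)

def phi(X):
    """First- and second-order features: [1, x, vec(x x^T)]."""
    n, d = X.shape
    quad = np.einsum('ni,nj->nij', X, X).reshape(n, d * d)
    return np.hstack([np.ones((n, 1)), X, quad])

def softmax(Z):
    Z = Z - Z.max(axis=1, keepdims=True)
    E = np.exp(Z)
    return E / E.sum(axis=1, keepdims=True)

# Toy two-class data: class 0 clustered at the origin, class 1 on a
# ring around it, so only a quadratic boundary separates the classes.
n = 200
X0 = rng.normal(0.0, 0.5, (n, 2))
angles = rng.uniform(0.0, 2.0 * np.pi, n)
X1 = np.stack([2.5 * np.cos(angles), 2.5 * np.sin(angles)], axis=1)
X1 += rng.normal(0.0, 0.3, (n, 2))
X = np.vstack([X0, X1])
y = np.array([0] * n + [1] * n)

F = phi(X)                        # (2n, 7) feature matrix
W = np.zeros((F.shape[1], 2))     # one weight vector per class
Y = np.eye(2)[y]                  # one-hot targets

# Maximize the conditional log-likelihood by gradient ascent
# (a stand-in for the generalized iterative scaling used in the paper).
for _ in range(500):
    P = softmax(F @ W)
    W += 0.01 * F.T @ (Y - P) / len(y)

acc = (softmax(F @ W).argmax(axis=1) == y).mean()
print(f"training accuracy: {acc:.3f}")
```

The quadratic weight block that gradient ascent produces here plays the role of the (negated, halved) inverse covariance matrix in a Gaussian classifier, yet no step of the training enforces definiteness.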


Keywords: Maximum Entropy · Covariance Matrix · Gaussian Model · Neural Information Processing Systems · Relevance Vector Machine





Copyright information

© Springer-Verlag Berlin Heidelberg 2002

Authors and Affiliations

  • Daniel Keysers (1)
  • Franz Josef Och (1)
  • Hermann Ney (1)

  1. Lehrstuhl für Informatik VI, Computer Science Department, RWTH Aachen — University of Technology, Aachen, Germany
