Maximum Entropy and Gaussian Models for Image Object Recognition
The principle of maximum entropy is a powerful framework that can be used to estimate class posterior probabilities for pattern recognition tasks. In this paper, we show how this principle is related to the discriminative training of Gaussian mixture densities using the maximum mutual information criterion. This relation relaxes the constraint that the covariance matrices be positive (semi-)definite. Thus, we arrive at a conceptually simple model that allows a large number of free parameters to be estimated reliably. We compare the proposed method with other state-of-the-art approaches in experiments on the well-known US Postal Service handwritten digit recognition task.
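The relation described above can be illustrated with a minimal sketch. It assumes a one-dimensional feature, a single Gaussian per class, and the feature set 1, x, x^2; these simplifications and all function names are illustrative assumptions, not the paper's formulation. Under these assumptions, the class posterior of a Gaussian classifier coincides with a log-linear (maximum entropy style) posterior over those features, and the log-linear form remains a valid posterior even when the quadratic weight is positive, a case that no Gaussian with positive variance can produce.

```python
import math

def gaussian_posterior(x, priors, means, variances):
    """Class posteriors p(k|x) from 1-D Gaussian class-conditionals via Bayes' rule."""
    joint = [
        p * math.exp(-0.5 * (x - m) ** 2 / v) / math.sqrt(2 * math.pi * v)
        for p, m, v in zip(priors, means, variances)
    ]
    total = sum(joint)
    return [j / total for j in joint]

def gaussian_to_loglinear(priors, means, variances):
    """Map Gaussian parameters to log-linear weights for the features 1, x, x^2.

    Expanding log(p_k * N(x; m_k, v_k)) gives
    alpha_k + beta_k * x + gamma_k * x^2 with the weights below.
    """
    params = []
    for p, m, v in zip(priors, means, variances):
        alpha = math.log(p) - 0.5 * math.log(2 * math.pi * v) - m * m / (2 * v)
        beta = m / v
        gamma = -1.0 / (2 * v)  # always negative for a valid variance
        params.append((alpha, beta, gamma))
    return params

def loglinear_posterior(x, params):
    """Softmax over the class scores alpha_k + beta_k * x + gamma_k * x^2."""
    scores = [a + b * x + c * x * x for a, b, c in params]
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical two-class example; the numbers are arbitrary.
priors, means, variances = [0.3, 0.7], [-1.0, 2.0], [0.5, 1.5]
params = gaussian_to_loglinear(priors, means, variances)
x = 0.4
print(gaussian_posterior(x, priors, means, variances))
print(loglinear_posterior(x, params))  # agrees with the line above up to rounding

# Relaxed model: the first class has a positive quadratic weight, which
# would correspond to a negative variance, yet the posterior is still valid.
relaxed = [(0.0, 1.0, 0.2), (0.0, -1.0, -0.1)]
print(loglinear_posterior(x, relaxed))
```

In the multivariate case the quadratic weight becomes a matrix, and the analogous relaxation is that this matrix need not correspond to a positive (semi-)definite covariance matrix.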
Keywords: Maximum Entropy, Covariance Matrix, Gaussian Model, Neural Information Processing System, Relevance Vector Machine