Unifying Divergence Minimization and Statistical Inference Via Convex Duality

Altun, Yasemin; Smola, Alex

doi:10.1007/11776420_13

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 4005)

Included in the following conference series:

International Conference on Computational Learning Theory

Abstract

In this paper we unify divergence minimization and statistical inference by means of convex duality. In the process of doing so, we prove that the dual of approximate maximum entropy estimation is maximum a posteriori estimation as a special case. Moreover, our treatment leads to stability and convergence bounds for many statistical learning problems. Finally, we show how an algorithm by Zhang can be used to solve this class of optimization problems efficiently.
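As a small numerical illustration of the duality described in the abstract, the sketch below uses one well-known instance: maximum entropy over a finite domain with moment constraints relaxed to an ℓ∞ ball of radius `eps`, whose dual is ℓ1-regularized maximum likelihood in an exponential family. The domain, the feature map `phi`, the sample, the value of `eps`, and the helper name `maxent_dual_l1` are all assumptions made for this example and do not come from the paper. The solver is plain proximal gradient (soft-thresholding), not the algorithm by Zhang mentioned above.

```python
import numpy as np

def maxent_dual_l1(phi, mu, eps, steps=5000, lr=1.0):
    """Minimize log Z(theta) - <theta, mu> + eps * ||theta||_1 by proximal gradient.

    phi: (n_states, n_features) feature matrix (hypothetical toy features)
    mu:  (n_features,) empirical feature means
    Returns the dual solution theta and the primal distribution p_theta.
    """
    theta = np.zeros(phi.shape[1])
    for _ in range(steps):
        logits = phi @ theta
        p = np.exp(logits - logits.max())  # numerically stable softmax
        p /= p.sum()
        grad = phi.T @ p - mu              # gradient of the smooth part: E_p[phi] - mu
        theta = theta - lr * grad
        # Soft-thresholding is the proximal operator of eps * ||.||_1.
        theta = np.sign(theta) * np.maximum(np.abs(theta) - lr * eps, 0.0)
    logits = phi @ theta
    p = np.exp(logits - logits.max())
    p /= p.sum()
    return theta, p

# Toy problem (assumed for illustration): five states, features x and x^2 scaled to [0, 1].
xs = np.arange(5)
phi = np.stack([xs / 4.0, (xs / 4.0) ** 2], axis=1)
sample = np.array([0, 1, 1, 2, 4])
mu = phi[sample].mean(axis=0)
eps = 0.05

theta, p = maxent_dual_l1(phi, mu, eps)
gap = np.abs(phi.T @ p - mu).max()
print("theta:", theta)
print("largest moment mismatch:", gap, "allowed slack:", eps)
```

At a dual optimum the optimality conditions force every coordinate of E_p[phi] - mu to lie within `eps`, so the recovered distribution satisfies the relaxed primal constraint. Reading the ℓ1 penalty as a Laplace log-prior on `theta` gives the maximum a posteriori interpretation.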

References

Altun, Y., Hofmann, T., Smola, A.J.: Exponential families for conditional random fields. In: Uncertainty in Artificial Intelligence UAI, pp. 2–9 (2004)
Borgwardt, K., Gretton, A., Smola, A.J.: Kernel discrepancy estimation. Technical report, NICTA, Canberra (2006)
Borwein, J., Zhu, Q.J.: Techniques of Variational Analysis. Springer, Heidelberg (2005)
Borwein, J.M.: Semi-infinite programming: How special is it? In: Fiacco, A.V., Kortanek, K.O. (eds.) Semi-Infinite Programming and Applications. Springer, Heidelberg (1983)
Bousquet, O., Boucheron, S., Lugosi, G.: Theory of classification: a survey of recent advances. In: ESAIM: Probability and Statistics (submitted, 2004)
Bousquet, O., Elisseeff, A.: Stability and generalization. JMLR 2, 499–526 (2002)
Chen, S., Rosenfeld, R.: A Gaussian prior for smoothing maximum entropy models. Technical Report CMUCS-99-108, Carnegie Mellon University (1999)
Collins, M., Schapire, R.E., Singer, Y.: Logistic regression, AdaBoost and Bregman distances. In: COLT 2000, pp. 158–169 (2000)
Cortes, C., Vapnik, V.: Support vector networks. Machine Learning 20, 273–297 (1995)
Dudík, M., Phillips, S.J., Schapire, R.E.: Performance guarantees for regularized maximum entropy density estimation. In: Shawe-Taylor, J., Singer, Y. (eds.) COLT 2004. LNCS, vol. 3120, pp. 472–486. Springer, Heidelberg (2004)
Dudík, M., Schapire, R.E.: Maximum entropy distribution estimation with generalized regularization. In: Lugosi, G., Simon, H.U. (eds.) COLT 2006. LNCS, vol. 4005, pp. 123–138. Springer, Heidelberg (2006)
Friedlander, M.P., Gupta, M.R.: On minimizing distortion and relative entropy. IEEE Transactions on Information Theory 52(1) (2006)
Kivinen, J., Warmuth, M.: Boosting as entropy projection. In: COLT 1999 (1999)
Lafferty, J.: Additive models, boosting, and inference for generalized divergences. In: COLT 1999, pp. 125–133. ACM Press, New York (1999)
Le, Q.V., Smola, A.J., Canu, S.: Heteroscedastic Gaussian process regression. In: International Conference on Machine Learning ICML 2005 (2005)
Morozov, V.A.: Methods for solving incorrectly posed problems. Springer, Heidelberg (1984)
Neal, R.: Priors for infinite networks. Technical report, U. Toronto (1994)
Nemenman, I., Bialek, W.: Occam factors and model independent Bayesian learning of continuous distributions. Physical Review E 65(2), 6137 (2002)
Rätsch, G., Mika, S., Warmuth, M.K.: On the convergence of leveraging. In: Advances in Neural Information Processing Systems (NIPS) (2002)
Rockafellar, R.T.: Convex Analysis. Princeton University Press, Princeton (1970)
Ruderman, D.L., Bialek, W.: Statistics of natural images: Scaling in the woods. Phys. Rev. Letters (1994)
Schölkopf, B., Smola, A.: Learning with Kernels. MIT Press, Cambridge (2002)
Shawe-Taylor, J., Cristianini, N.: Kernel Methods for Pattern Analysis. Cambridge University Press, Cambridge (2004)
Tikhonov, A.N., Arsenin, V.Y.: Solutions of Ill-Posed Problems. Wiley, Chichester (1977)
Wainwright, M.J., Jordan, M.I.: Graphical models, exponential families, and variational inference. Technical Report 649, UC Berkeley (September 2003)
Zhang, T.: Sequential greedy approximation for certain convex optimization problems. IEEE Transactions on Information Theory 49(3), 682–691 (2003)

Author information

Authors and Affiliations

Toyota Technological Institute at Chicago, Chicago, IL, 60637, USA
Yasemin Altun
National ICT Australia, North Road, Canberra, 0200, ACT, Australia
Alex Smola

Editor information

Editors and Affiliations

ICREA and Department of Economics, Universitat Pompeu Fabra, Ramon Trias Fargas 25-27, 08005, Barcelona, Spain
Gábor Lugosi
Ruhr-Universität Bochum, Germany
Hans Ulrich Simon

About this paper

Cite this paper

Altun, Y., Smola, A. (2006). Unifying Divergence Minimization and Statistical Inference Via Convex Duality. In: Lugosi, G., Simon, H.U. (eds) Learning Theory. COLT 2006. Lecture Notes in Computer Science(), vol 4005. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11776420_13

DOI: https://doi.org/10.1007/11776420_13
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-35294-5
Online ISBN: 978-3-540-35296-9
eBook Packages: Computer Science, Computer Science (R0)
