Skip to main content

Unifying Divergence Minimization and Statistical Inference Via Convex Duality

  • Conference paper
Learning Theory (COLT 2006)

Part of the book series: Lecture Notes in Computer Science ((LNAI,volume 4005))

Included in the following conference series:

Abstract

In this paper we unify divergence minimization and statistical inference by means of convex duality. In the process of doing so, we prove that the dual of approximate maximum entropy estimation is maximum a posteriori estimation as a special case. Moreover, our treatment leads to stability and convergence bounds for many statistical learning problems. Finally, we show how an algorithm by Zhang can be used to solve this class of optimization problems efficiently.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 84.99
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 109.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.

References

  1. Altun, Y., Hofmann, T., Smola, A.J.: Exponential families for conditional random fields. In: Uncertainty in Artificial Intelligence UAI, pp. 2–9 (2004)

    Google Scholar 

  2. Borgwardt, K., Gretton, A., Smola, A.J.: Kernel discrepancy estimation. Technical report, NICTA, Canberra (2006)

    Google Scholar 

  3. Borwein, J., Zhu, Q.J.: Techniques of Variational Analysis. Springer, Heidelberg (2005)

    MATH  Google Scholar 

  4. Borwein, J.M.: Semi-infinite programming: How special is it? In: Fiacco, A.V., Kortanek, K.O. (eds.) Semi-Infinite Programming and Applications. Springer, Heidelberg (1983)

    Google Scholar 

  5. Bousquet, O., Boucheron, S., Lugosi, G.: Theory of classification: a survey of recent advances. In: ESAIM: Probability and Statistics (submitted, 2004)

    Google Scholar 

  6. Bousquet, O., Elisseeff, A.: Stability and generalization. JMLR 2, 499–526 (2002)

    Article  MathSciNet  MATH  Google Scholar 

  7. Chen, S., Rosenfeld, R.: A Gaussian prior for smoothing maximum entropy models. Technical Report CMUCS-99-108, Carnegie Mellon University (1999)

    Google Scholar 

  8. Collins, M., Schapire, R.E., Singer, Y.: Logistic regression, adaboost and bregman distances. In: COLT 2000, pp. 158–169 (2000)

    Google Scholar 

  9. Cortes, C., Vapnik, V.: Support vector networks. Machine Learning 20, 273–297 (1995)

    MATH  Google Scholar 

  10. Dudík, M., Phillips, S.J., Schapire, R.E.: Performance guarantees for regularized maximum entropy density estimation. In: Shawe-Taylor, J., Singer, Y. (eds.) COLT 2004. LNCS, vol. 3120, pp. 472–486. Springer, Heidelberg (2004)

    Chapter  Google Scholar 

  11. Dudík, M., Schapire, R.E.: Maximum entropy distribution estimation with generalized regularization. In: Lugosi, G., Simon, H.U. (eds.) COLT 2006. LNCS, vol. 4005, pp. 123–138. Springer, Heidelberg (2006)

    Chapter  Google Scholar 

  12. Friedlander, M.P., Gupta, M.R.: On minimizing distortion and relative entropy. IEEE Transactions on Information Theory 52(1) (2006)

    Google Scholar 

  13. Kivinen, J., Warmuth, M.: Boosting as entropy projection. In: COLT 1999 (1999)

    Google Scholar 

  14. Lafferty, J.: Additive models, boosting, and inference for generalized divergences. In: COLT 1999, pp. 125–133. ACM Press, New York (1999)

    Chapter  Google Scholar 

  15. Le, Q.V., Smola, A.J., Canu, S.: Heteroscedastic gaussian process regression. In: International Conference on Machine Learning ICML 2005 (2005)

    Google Scholar 

  16. Morozov, V.A.: Methods for solving incorrectly posed problems. Springer, Heidelberg (1984)

    Google Scholar 

  17. Neal, R.: Priors for infinite networks. Technical report, U. Toronto (1994)

    Google Scholar 

  18. Nemenman, I., Bialek, W.: Occam factors and model independent bayesian learning of continuous distributions. Physical Review E 65(2), 6137 (2002)

    Article  Google Scholar 

  19. Rätsch, G., Mika, S., Warmuth, M.K.: On the convergence of leveraging. In: Advances in Neural Information Processing Systems (NIPS) (2002)

    Google Scholar 

  20. Rockafellar, R.T.: Convex Analysis. Princeton University Press, Princeton (1970)

    MATH  Google Scholar 

  21. Ruderman, D.L., Bialek, W.: Statistics of natural images: Scaling in the woods. Phys. Rev. Letters (1994)

    Google Scholar 

  22. Schölkopf, B., Smola, A.: Learning with Kernels. MIT Press, Cambridge (2002)

    Google Scholar 

  23. Shawe-Taylor, J., Cristianini, N.: Kernel Methods for Pattern Analysis. Cambridge University Press, Cambridge (2004)

    Google Scholar 

  24. Thikhonov, A.N., Arsenin, V.Y.: Solutions of Ill-Posed Problems. Wiley, Chichester (1977)

    Google Scholar 

  25. Wainwright, M.J., Jordan, M.I.: Graphical models, exponential families, and variational inference. Technical Report 649, UC Berkeley (September 2003)

    Google Scholar 

  26. Zhang, T.: Sequential greedy approximation for certain convex optimization problems. IEEE Transactions on Information Theory 49(3), 682–691 (2003)

    Article  MATH  Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2006 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Altun, Y., Smola, A. (2006). Unifying Divergence Minimization and Statistical Inference Via Convex Duality. In: Lugosi, G., Simon, H.U. (eds) Learning Theory. COLT 2006. Lecture Notes in Computer Science(), vol 4005. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11776420_13

Download citation

  • DOI: https://doi.org/10.1007/11776420_13

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-35294-5

  • Online ISBN: 978-3-540-35296-9

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics