Performance Guarantees for Regularized Maximum Entropy Density Estimation

  • Conference paper
Learning Theory (COLT 2004)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 3120)

Included in the following conference series: COLT (Conference on Learning Theory)

Abstract

We consider the problem of estimating an unknown probability distribution from samples using the principle of maximum entropy (maxent). To alleviate overfitting with a very large number of features, we propose applying the maxent principle with relaxed constraints on the expectations of the features. By convex duality, this turns out to be equivalent to finding the Gibbs distribution minimizing a regularized version of the empirical log loss. We prove non-asymptotic bounds showing that, with respect to the true underlying distribution, this relaxed version of maxent produces density estimates that are almost as good as the best possible. These bounds are in terms of the deviation of the feature empirical averages relative to their true expectations, a number that can be bounded using standard uniform-convergence techniques. In particular, this leads to bounds that drop quickly with the number of samples, and that depend very moderately on the number or complexity of the features. We also derive and prove convergence for both sequential-update and parallel-update algorithms. Finally, we briefly describe experiments on data relevant to the modeling of species geographical distributions.
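
To make the duality statement concrete, the following is a minimal sketch of the relaxed formulation in illustrative notation (the symbols $\tilde{\pi}$, $f_j$, $\beta_j$, $\lambda$ and the box-shaped relaxation are expository assumptions, not quoted from the paper). Relaxed maxent maximizes entropy subject to each feature expectation staying within a slack $\beta_j$ of its empirical average:

\[
\max_{p}\; H(p)
\quad\text{subject to}\quad
\bigl|\,\mathbb{E}_{p}[f_j] - \mathbb{E}_{\tilde{\pi}}[f_j]\,\bigr| \le \beta_j,
\qquad j = 1, \dots, n,
\]

where $\tilde{\pi}$ denotes the empirical distribution of the $m$ samples. Under a box-shaped relaxation of this kind, convex duality turns the problem into minimizing, over Gibbs distributions $q_\lambda(x) \propto e^{\lambda \cdot f(x)}$, an $\ell_1$-style penalized empirical log loss

\[
\min_{\lambda \in \mathbb{R}^n}\;
-\frac{1}{m}\sum_{i=1}^{m}\log q_\lambda(x_i)
\;+\;
\sum_{j=1}^{n}\beta_j\,\lvert\lambda_j\rvert,
\]

which is the "regularized version of the empirical log loss" referred to in the abstract.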

Copyright information

© 2004 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Dudík, M., Phillips, S.J., Schapire, R.E. (2004). Performance Guarantees for Regularized Maximum Entropy Density Estimation. In: Shawe-Taylor, J., Singer, Y. (eds) Learning Theory. COLT 2004. Lecture Notes in Computer Science, vol. 3120. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-27819-1_33

  • DOI: https://doi.org/10.1007/978-3-540-27819-1_33

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-22282-8

  • Online ISBN: 978-3-540-27819-1

  • eBook Packages: Springer Book Archive
