Abstract
We consider the problem of estimating an unknown probability distribution from samples using the principle of maximum entropy (maxent). To alleviate overfitting when the number of features is very large, we propose applying the maxent principle with relaxed constraints on the expectations of the features. By convex duality, this turns out to be equivalent to finding the Gibbs distribution minimizing a regularized version of the empirical log loss. We prove non-asymptotic bounds showing that, with respect to the true underlying distribution, this relaxed version of maxent produces density estimates that are almost as good as the best possible. These bounds are in terms of the deviation of the empirical feature averages from their true expectations, a quantity that can be bounded using standard uniform-convergence techniques. In particular, this leads to bounds that drop quickly with the number of samples and that depend only moderately on the number or complexity of the features. We also derive and prove convergence for both sequential-update and parallel-update algorithms. Finally, we briefly describe experiments on data relevant to the modeling of species geographical distributions.
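To make the duality claim in the abstract concrete, the following is a minimal sketch of the relaxed formulation and its dual, in our own notation (the features $f_j$, relaxation widths $\beta_j$, weight vector $\lambda$, and partition function $Z_\lambda$ are notational assumptions following standard regularized-maxent presentations, not the paper's verbatim statement): relaxing each moment constraint by a width $\beta_j$ yields a box-constrained entropy maximization whose convex dual is an $\ell_1$-regularized log-loss minimization over Gibbs distributions.

% Relaxed maxent primal (assumed notation): maximize entropy over the
% simplex subject to box constraints of width beta_j on each feature
% expectation, measured against the empirical averages.
\max_{p \in \Delta} \; H(p)
\quad \text{s.t.} \quad
\bigl|\, \mathbb{E}_{p}[f_j] - \hat{\mathbb{E}}[f_j] \,\bigr| \le \beta_j,
\qquad j = 1, \dots, n.

% Convex dual: minimize the empirical log loss of the Gibbs
% distribution q_lambda plus an l1 penalty with per-coordinate
% weights beta_j, over m samples x_1, ..., x_m.
\min_{\lambda \in \mathbb{R}^n} \;
\Biggl( -\frac{1}{m} \sum_{i=1}^{m} \log q_\lambda(x_i)
        + \sum_{j=1}^{n} \beta_j \, \lvert \lambda_j \rvert \Biggr),
\qquad
q_\lambda(x) = \frac{\exp\bigl(\lambda \cdot f(x)\bigr)}{Z_\lambda}.

Under this reading, setting every $\beta_j = 0$ recovers standard maxent, and wider boxes $\beta_j$ correspond to stronger $\ell_1$ penalties on the weights $\lambda_j$, which is the regularization whose effect the paper's bounds quantify.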
Copyright information
© 2004 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Dudík, M., Phillips, S.J., Schapire, R.E. (2004). Performance Guarantees for Regularized Maximum Entropy Density Estimation. In: Shawe-Taylor, J., Singer, Y. (eds) Learning Theory. COLT 2004. Lecture Notes in Computer Science, vol 3120. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-27819-1_33
DOI: https://doi.org/10.1007/978-3-540-27819-1_33
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-22282-8
Online ISBN: 978-3-540-27819-1