Abstract
We consider the problem of estimating an unknown probability distribution from samples using the principle of maximum entropy (maxent). To alleviate overfitting when the number of features is very large, we propose applying the maxent principle with relaxed constraints on the expectations of the features. By convex duality, this turns out to be equivalent to finding the Gibbs distribution minimizing a regularized version of the empirical log loss. We prove non-asymptotic bounds showing that, with respect to the true underlying distribution, this relaxed version of maxent produces density estimates that are almost as good as the best possible. These bounds are in terms of the deviation of the empirical feature averages from their true expectations, a quantity that can be bounded using standard uniform-convergence techniques. In particular, this leads to bounds that drop quickly with the number of samples and that depend only moderately on the number or complexity of the features. We also derive and prove convergence for both sequential-update and parallel-update algorithms. Finally, we briefly describe experiments on data relevant to the modeling of species geographical distributions.
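To make the duality claim in the abstract concrete, the following is a minimal sketch of the relaxed formulation and its dual, in our own notation (the features $f_j$, relaxation widths $\beta_j$, weight vector $\lambda$, and partition function $Z_\lambda$ are notational assumptions following standard regularized-maxent presentations, not the paper's verbatim statement): relaxing each moment constraint by a width $\beta_j$ yields a box-constrained entropy maximization whose convex dual is an $\ell_1$-regularized log-loss minimization over Gibbs distributions.

% Relaxed maxent primal (assumed notation): maximize entropy over the
% simplex subject to box constraints of width beta_j on each feature
% expectation, measured against the empirical averages.
\max_{p \in \Delta} \; H(p)
\quad \text{s.t.} \quad
\bigl|\, \mathbb{E}_{p}[f_j] - \hat{\mathbb{E}}[f_j] \,\bigr| \le \beta_j,
\qquad j = 1, \dots, n.

% Convex dual: minimize the empirical log loss of the Gibbs
% distribution q_lambda plus an l1 penalty with per-coordinate
% weights beta_j, over m samples x_1, ..., x_m.
\min_{\lambda \in \mathbb{R}^n} \;
\Biggl( -\frac{1}{m} \sum_{i=1}^{m} \log q_\lambda(x_i)
        + \sum_{j=1}^{n} \beta_j \, \lvert \lambda_j \rvert \Biggr),
\qquad
q_\lambda(x) = \frac{\exp\bigl(\lambda \cdot f(x)\bigr)}{Z_\lambda}.

Under this reading, setting every $\beta_j = 0$ recovers standard maxent, and wider boxes $\beta_j$ correspond to stronger $\ell_1$ penalties on the weights $\lambda_j$, which is the regularization whose effect the paper's bounds quantify.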
Copyright information
© 2004 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Dudík, M., Phillips, S.J., Schapire, R.E. (2004). Performance Guarantees for Regularized Maximum Entropy Density Estimation. In: Shawe-Taylor, J., Singer, Y. (eds) Learning Theory. COLT 2004. Lecture Notes in Computer Science, vol 3120. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-27819-1_33
DOI: https://doi.org/10.1007/978-3-540-27819-1_33
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-22282-8
Online ISBN: 978-3-540-27819-1