Performance Guarantees for Regularized Maximum Entropy Density Estimation

  • Miroslav Dudík
  • Steven J. Phillips
  • Robert E. Schapire
Conference paper

DOI: 10.1007/978-3-540-27819-1_33

Part of the Lecture Notes in Computer Science book series (LNCS, volume 3120)
Cite this paper as:
Dudík M., Phillips S.J., Schapire R.E. (2004) Performance Guarantees for Regularized Maximum Entropy Density Estimation. In: Shawe-Taylor J., Singer Y. (eds) Learning Theory. COLT 2004. Lecture Notes in Computer Science, vol 3120. Springer, Berlin, Heidelberg

Abstract

We consider the problem of estimating an unknown probability distribution from samples using the principle of maximum entropy (maxent). To alleviate overfitting with a very large number of features, we propose applying the maxent principle with relaxed constraints on the expectations of the features. By convex duality, this turns out to be equivalent to finding the Gibbs distribution minimizing a regularized version of the empirical log loss. We prove non-asymptotic bounds showing that, with respect to the true underlying distribution, this relaxed version of maxent produces density estimates that are almost as good as the best possible. These bounds are in terms of the deviation of the feature empirical averages relative to their true expectations, a number that can be bounded using standard uniform-convergence techniques. In particular, this leads to bounds that drop quickly with the number of samples, and that depend very moderately on the number or complexity of the features. We also derive and prove convergence for both sequential-update and parallel-update algorithms. Finally, we briefly describe experiments on data relevant to the modeling of species geographical distributions.
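The duality mentioned in the abstract can be illustrated on a small finite domain. The sketch below is not the authors' algorithm: it assumes the regularizer is an l1 penalty `beta * sum(|lam_j|)` (the form that per-feature box constraints of width `beta` would give) and minimizes the regularized empirical log loss by plain proximal gradient descent rather than by the sequential-update or parallel-update procedures the paper derives. All names (`gibbs`, `feature_means`, `fit_relaxed_maxent`, `beta`, `step`, `iters`) are hypothetical and chosen for illustration only.

```python
import math

def gibbs(lam, features):
    """Gibbs distribution q(x) proportional to exp(sum_j lam[j] * f_j(x)).

    `features[x]` is the feature vector of domain point x (finite domain).
    """
    scores = [sum(l * f for l, f in zip(lam, fx)) for fx in features]
    top = max(scores)  # subtract the max score for numerical stability
    weights = [math.exp(s - top) for s in scores]
    z = sum(weights)
    return [w / z for w in weights]

def feature_means(dist, features):
    """Expectation of each feature under a distribution over the domain."""
    n_feat = len(features[0])
    return [sum(p * fx[j] for p, fx in zip(dist, features)) for j in range(n_feat)]

def soft_threshold(v, t):
    """Proximal operator of t * |v|: shrink v toward zero by t."""
    return math.copysign(max(abs(v) - t, 0.0), v)

def fit_relaxed_maxent(features, samples, beta, step=0.5, iters=2000):
    """Minimize empirical log loss + beta * ||lam||_1 (assumed regularizer).

    `samples` is a list of domain indices drawn from the unknown distribution.
    """
    n_feat = len(features[0])
    emp = [sum(features[x][j] for x in samples) / len(samples) for j in range(n_feat)]
    lam = [0.0] * n_feat
    for _ in range(iters):
        model = feature_means(gibbs(lam, features), features)
        # Gradient of the empirical log loss in lam_j is model[j] - emp[j].
        lam = [soft_threshold(l - step * (m - e), step * beta)
               for l, m, e in zip(lam, model, emp)]
    return lam

# Toy domain of four points described by two binary features.
features = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]
samples = [3, 3, 3, 1, 2, 3, 3, 1]
lam = fit_relaxed_maxent(features, samples, beta=0.05)
q = gibbs(lam, features)
```

Under these assumptions, the fitted Gibbs distribution `q` has feature expectations that stay within `beta` of the empirical averages, which is the relaxed constraint of the primal problem. Choosing `beta` large enough leaves every weight at zero, so `q` is the uniform (maximum entropy) distribution.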

Copyright information

© Springer-Verlag Berlin Heidelberg 2004

Authors and Affiliations

  • Miroslav Dudík (1)
  • Steven J. Phillips (2)
  • Robert E. Schapire (1)

  1. Department of Computer Science, Princeton University, Princeton, USA
  2. AT&T Labs – Research, Florham Park, USA
