Boosting conditional probability estimators

Annals of Mathematics and Artificial Intelligence

Abstract

In the standard agnostic multiclass model, <instance, label> pairs are sampled independently from some underlying distribution. This distribution induces a conditional probability over the labels given an instance, and our goal in this paper is to learn this conditional distribution. Since even unconditional densities are quite challenging to learn, we give our learner access to <instance, conditional distribution> pairs. Assuming a base learner oracle in this model, we might seek a boosting algorithm for constructing a strong learner. Unfortunately, without further assumptions, this is provably impossible. However, we give a new boosting algorithm that succeeds in the following sense: given a base learner guaranteed to achieve some average accuracy (i.e., risk), we efficiently construct a learner that achieves the same level of accuracy with arbitrarily high probability. We give generalization guarantees of several different kinds, including distribution-free accuracy and risk bounds. None of our estimates depend on the number of boosting rounds and some of them admit dimension-free formulations.
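The abstract does not spell out the construction, so the following is not the paper's algorithm. It is a minimal sketch of a generic, textbook confidence-amplification scheme that illustrates the flavor of the claim: by Markov's inequality, a learner whose expected risk is ε has risk at most 2ε with probability at least 1/2, so k independent runs all exceed 2ε with probability at most 2^(-k), and a held-out set can be used to pick a good run. All names here (`total_variation`, `empirical_risk`, `amplify_confidence`, `base_learner`) and the choice of total variation distance as the loss are assumptions made for illustration. Note that this simple scheme only reaches roughly 2ε plus a validation error term, which is weaker than the "same level of accuracy" guarantee stated above.

```python
def total_variation(p, q):
    """Total variation distance between two distributions over the same labels."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


def empirical_risk(h, sample):
    """Average TV distance between h(x) and the supplied conditional distribution.

    `sample` is a list of (instance, conditional_distribution) pairs, mirroring
    the <instance, conditional distribution> access model of the abstract.
    """
    return sum(total_variation(h(x), p) for x, p in sample) / len(sample)


def amplify_confidence(base_learner, sample, k):
    """Generic amplification sketch (an assumption, not the paper's method).

    Trains k hypotheses on k disjoint chunks of `sample`, then returns the one
    with the smallest empirical risk on the remaining held-out examples.
    `base_learner` is a hypothetical callable: list of pairs -> hypothesis.
    """
    chunk = len(sample) // (k + 1)
    if chunk == 0:
        raise ValueError("need at least k + 1 examples")
    holdout = sample[k * chunk:]
    candidates = [
        base_learner(sample[i * chunk:(i + 1) * chunk]) for i in range(k)
    ]
    return min(candidates, key=lambda h: empirical_risk(h, holdout))
```

Here a hypothesis is any callable mapping an instance to a sequence of label probabilities. Using disjoint chunks keeps the k runs independent, which is what the 2^(-k) failure bound relies on.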

Author information

Corresponding author

Correspondence to Aryeh Kontorovich.

Additional information

A preliminary version was invited to ISAIM 2014. A.K. was partially supported by the Israel Science Foundation (grant No. 1141/12) and a Yahoo Faculty award.

About this article

Cite this article

Gutfreund, D., Kontorovich, A., Levy, R. et al. Boosting conditional probability estimators. Ann Math Artif Intell 79, 129–144 (2017). https://doi.org/10.1007/s10472-015-9465-7

  • DOI: https://doi.org/10.1007/s10472-015-9465-7
