Boosting conditional probability estimators

  • Dan Gutfreund
  • Aryeh Kontorovich (corresponding author)
  • Ran Levy
  • Michal Rosen-Zvi


Abstract

In the standard agnostic multiclass model, (instance, label) pairs are sampled independently from some underlying distribution. This distribution induces a conditional probability over the labels given an instance, and our goal in this paper is to learn this conditional distribution. Since even unconditional densities are quite challenging to learn, we give our learner access to (instance, conditional distribution) pairs. Assuming a base learner oracle in this model, we might seek a boosting algorithm for constructing a strong learner. Unfortunately, without further assumptions, this is provably impossible. However, we give a new boosting algorithm that succeeds in the following sense: given a base learner guaranteed to achieve some average accuracy (i.e., risk), we efficiently construct a learner that achieves the same level of accuracy with arbitrarily high probability. We give generalization guarantees of several different kinds, including distribution-free accuracy and risk bounds. None of our estimates depend on the number of boosting rounds and some of them admit dimension-free formulations.
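The abstract's guarantee, turning a base learner that is accurate on *average* into one that is accurate *with high probability*, is reminiscent of a generic confidence-amplification construction: run several independent copies of the base learner and select the estimate closest to the majority cluster. The sketch below is illustrative only and is not the paper's algorithm; `base_learner`, its failure rate, and all other names are hypothetical, chosen solely to show why such a selection step drives the failure probability down.

```python
import random

def base_learner(true_p, flip_prob=0.3):
    """Toy base learner for a binary conditional distribution.
    Hypothetical: usually returns the truth, but with probability
    flip_prob returns a badly wrong estimate -- so it is good on
    average, yet not good with high probability."""
    if random.random() < flip_prob:              # bad draw
        return [1.0 - true_p[0], true_p[0]]      # far from the truth
    return true_p[:]                             # good draw

def l1(p, q):
    """Total variation-style L1 distance between two distributions."""
    return sum(abs(a - b) for a, b in zip(p, q))

def confidence_boost(true_p, k=15):
    """Run k independent copies of the base learner and return the
    estimate with the smallest total L1 distance to all the others
    (a 'medoid').  If a majority of the k copies are accurate, the
    selected estimate lies in that majority cluster, so the failure
    probability decays with k (a binomial tail bound)."""
    ests = [base_learner(true_p) for _ in range(k)]
    return min(ests, key=lambda e: sum(l1(e, f) for f in ests))

random.seed(0)
true_p = [0.8, 0.2]
trials = 2000
fail_single = sum(l1(base_learner(true_p), true_p) > 0.5
                  for _ in range(trials)) / trials
fail_boosted = sum(l1(confidence_boost(true_p), true_p) > 0.5
                   for _ in range(trials)) / trials
print(fail_single, fail_boosted)   # boosted failure rate is far smaller
```

With a per-run failure rate near 0.3, the medoid over 15 runs fails only when at least 8 of the 15 draws are bad, which the binomial tail makes a much rarer event; this is the flavor of guarantee the abstract promises, though the paper's actual construction and its distribution-free bounds are not reproduced here.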


Keywords: Boosting, Conditional density

Mathematics Subject Classification (2010)






Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  • Dan Gutfreund, IBM Research, Rueschlikon, Switzerland
  • Aryeh Kontorovich (corresponding author), Ben-Gurion University of the Negev, Beer Sheva, Israel
  • Ran Levy, IBM Research, Rueschlikon, Switzerland
  • Michal Rosen-Zvi, IBM Research, Rueschlikon, Switzerland
