Boosting conditional probability estimators
In the standard agnostic multiclass model, ⟨instance, label⟩ pairs are sampled independently from some underlying distribution. This distribution induces a conditional probability over the labels given an instance, and our goal in this paper is to learn this conditional distribution. Since even unconditional densities are quite challenging to learn, we give our learner access to ⟨instance, conditional distribution⟩ pairs. Assuming a base learner oracle in this model, we might seek a boosting algorithm for constructing a strong learner. Unfortunately, without further assumptions, this is provably impossible. However, we give a new boosting algorithm that succeeds in the following sense: given a base learner guaranteed to achieve some average accuracy (i.e., risk), we efficiently construct a learner that achieves the same level of accuracy with arbitrarily high probability. We give generalization guarantees of several different kinds, including distribution-free accuracy and risk bounds. None of our estimates depend on the number of boosting rounds, and some of them admit dimension-free formulations.
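The core idea of amplifying a guarantee that holds only on average (or with some modest probability) into one that holds with arbitrarily high probability can be illustrated with a toy simulation. The sketch below is our own illustrative stand-in, not the paper's algorithm: it assumes a base learner that returns a low-risk estimator only with probability `P_SUCCESS` per run, and amplifies confidence by repeating the run and keeping the best candidate (in practice, the selection would be made via risk estimates on held-out data). All names and numbers here are assumptions for illustration.

```python
import random

random.seed(0)

EPS = 0.1          # target risk level the base learner sometimes achieves
P_SUCCESS = 0.3    # chance a single base-learner run lands at risk <= EPS

def base_learner():
    """Return the risk of one base-learner run (smaller is better).
    Hypothetical stand-in for an oracle with only a weak guarantee."""
    if random.random() < P_SUCCESS:
        return random.uniform(0.0, EPS)   # a good run
    return random.uniform(EPS, 1.0)       # a bad run

def boost_confidence(k):
    """Run the base learner k times and keep the best candidate.
    The probability that all k runs fail is (1 - P_SUCCESS) ** k,
    which decays exponentially in k."""
    return min(base_learner() for _ in range(k))

# With k = 20 runs, P(all runs bad) = 0.7 ** 20, roughly 8e-4.
trials = 1000
hits = sum(boost_confidence(20) <= EPS for _ in range(trials))
print(f"success rate over {trials} trials: {hits / trials:.3f}")
```

The exponential decay in the failure probability is what "arbitrarily high probability" buys: the number of repetitions needed scales only logarithmically in the desired confidence.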
Keywords: Boosting · Conditional density
Mathematics Subject Classification (2010): 65C50