AdaBoost was introduced for binary classification tasks by Freund and Schapire in 1995. Since its publication, numerous results have revealed surprising links between AdaBoost and related fields, such as information geometry, game theory, and convex optimization. This remarkably comprehensive set of connections suggests that AdaBoost is a unique approach that may, in fact, arise from axiomatic principles. In this paper, we prove that this is indeed the case. We show that three natural axioms on adaptive re-weighting and combining algorithms, also called arcing, suffice to construct AdaBoost and, more generally, the multiplicative weight update procedure as the unique family of algorithms that meets those axioms. Informally speaking, our three axioms only require that the arcing algorithm satisfy elementary notions of additivity, objectivity, and utility. We prove that any method satisfying these axioms must minimize the composition of an exponential loss with an additive function, and that the weights must be updated according to the multiplicative weight update procedure. This conclusion holds in the general setting of learning, which encompasses regression, classification, ranking, and clustering.
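As a concrete illustration of the multiplicative weight update at the heart of AdaBoost, the following is a minimal sketch on a toy one-dimensional binary classification task. The dataset, the decision-stump weak learner, and all names here are illustrative assumptions, not taken from the paper: each round picks the weak hypothesis with the lowest weighted error, then multiplies the weight of each example by an exponential factor that grows for misclassified points.

```python
# A hedged sketch of AdaBoost's multiplicative weight update.
# The toy data and decision-stump learner are illustrative assumptions.
import math

# Toy 1-D dataset: inputs and labels in {-1, +1}. The point at 0.9
# is deliberately hard for any single stump, so boosting is needed.
X = [0.1, 0.2, 0.3, 0.6, 0.7, 0.9]
y = [1, 1, 1, -1, -1, 1]

def stump(threshold, sign):
    """Decision stump: h(x) = sign if x < threshold, else -sign."""
    return lambda x: sign if x < threshold else -sign

# A small pool of candidate weak hypotheses.
candidates = [stump(t, s) for t in (0.15, 0.25, 0.5, 0.8) for s in (1, -1)]

n = len(X)
w = [1.0 / n] * n   # uniform initial weights
ensemble = []       # list of (alpha, h) pairs

for _ in range(5):
    # Weighted error of a hypothesis under the current weights.
    def werr(h):
        return sum(wi for wi, xi, yi in zip(w, X, y) if h(xi) != yi)
    h = min(candidates, key=werr)
    eps = werr(h)
    if eps == 0 or eps >= 0.5:
        break  # perfect or no-better-than-chance weak learner
    alpha = 0.5 * math.log((1 - eps) / eps)
    ensemble.append((alpha, h))
    # Multiplicative weight update: misclassified points are scaled
    # by e^{+alpha}, correct ones by e^{-alpha}; then renormalize.
    w = [wi * math.exp(-alpha * yi * h(xi)) for wi, xi, yi in zip(w, X, y)]
    Z = sum(w)
    w = [wi / Z for wi in w]

def predict(x):
    """Sign of the weighted vote of the ensemble."""
    return 1 if sum(a * h(x) for a, h in ensemble) >= 0 else -1

print([predict(x) for x in X])
```

Each update is exactly a gradient-style step on the exponential loss of the additive combination, which is the structure the paper's axioms single out. On this toy data the ensemble combines several stumps to fit almost all points, though the hardest one may still be misclassified after only five rounds.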
Keywords: Ensemble methods · Boosting · AdaBoost · Axioms