Abstract
We consider a recursive algorithm that constructs an aggregated estimator from a finite number of base decision rules in the classification problem. The estimator approximately minimizes a convex risk functional under an ℓ1-constraint. It is defined by a stochastic version of the mirror descent algorithm, which performs gradient-type descent in the dual space combined with an additional averaging step. The main result of the paper is an upper bound on the expected accuracy of the proposed estimator. This bound is of order \(C\sqrt{(\log M)/t}\) with an explicit and small constant factor C, where M is the dimension of the problem and t is the sample size. A similar bound is proved for a more general setting, which covers, in particular, the regression model with squared loss.
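To make the procedure concrete, here is a minimal sketch of a stochastic mirror descent update with averaging on the simplex (the ℓ1-constrained case). It is an illustration under stated assumptions, not the authors' exact procedure: it takes the entropy potential (so the mirror map is a softmax), the logistic loss as the convex risk functional, and a fixed step size of order \(\sqrt{(\log M)/t}\); the function and variable names (`entropic_smd_average`, `base_preds`, and so on) are ours.

```python
import numpy as np

def entropic_smd_average(base_preds, labels, eta=None):
    """Stochastic mirror descent with entropy potential and iterate averaging.

    base_preds : (t, M) array, base_preds[i, j] = h_j(X_i), base rules valued in [-1, 1]
    labels     : (t,) array with entries in {-1, +1}
    Returns the averaged weight vector theta_bar on the simplex.
    """
    t, M = base_preds.shape
    if eta is None:
        eta = np.sqrt(np.log(M) / t)   # step size matching the sqrt((log M)/t) rate
    zeta = np.zeros(M)                 # dual variable: accumulated negative gradients
    theta_bar = np.zeros(M)            # running average of the primal iterates
    for i in range(t):
        # primal iterate = mirror map of the dual point, here softmax(eta * zeta)
        w = eta * zeta
        w -= w.max()                   # subtract the max for numerical stability
        theta = np.exp(w)
        theta /= theta.sum()
        theta_bar += (theta - theta_bar) / (i + 1)   # incremental averaging
        # stochastic subgradient of the logistic loss log(1 + exp(-y * f_theta(x)))
        margin = labels[i] * (base_preds[i] @ theta)
        grad = -labels[i] * base_preds[i] / (1.0 + np.exp(margin))
        zeta -= grad                   # gradient-type descent step in the dual space
    return theta_bar
```

With the entropy potential, the descent in the dual space reduces to accumulating negative gradients in `zeta`; it is the averaged primal iterate `theta_bar`, not the last one, that carries the \(O(\sqrt{(\log M)/t})\) guarantee for the aggregated estimator.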
Additional information
Translated from Problemy Peredachi Informatsii, No. 4, 2005, pp. 78–96.
Original Russian Text Copyright © 2005 by Juditsky, Nazin, Tsybakov, Vayatis.
This work was carried out within the framework of the ACI NIM “BIOCLASSIF” and ACI MD “OPSYC” projects, France.
The research was done during visits to the Paris-VI and Grenoble-I Universities (France) in 2004–2005.
Cite this article
Juditsky, A.B., Nazin, A.V., Tsybakov, A.B. et al. Recursive Aggregation of Estimators by the Mirror Descent Algorithm with Averaging. Probl Inf Transm 41, 368–384 (2005). https://doi.org/10.1007/s11122-006-0005-2