Abstract
We consider a recursive algorithm that constructs an aggregated estimator from a finite number of base decision rules in the classification problem. The estimator approximately minimizes a convex risk functional under an ℓ1-constraint. It is defined by a stochastic version of the mirror descent algorithm, which performs gradient-type descent in the dual space combined with an additional averaging step. The main result of the paper is an upper bound on the expected accuracy of the proposed estimator. This bound is of order \(C\sqrt{(\log M)/t}\) with an explicit and small constant factor C, where M is the dimension of the problem and t is the sample size. A similar bound is proved for a more general setting, which covers, in particular, the regression model with squared loss.
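To make the procedure concrete, here is a minimal sketch of a stochastic mirror descent update with averaging on the simplex (the ℓ1-constrained case). It is an illustration under stated assumptions, not the authors' exact procedure: it takes the entropy potential (so the mirror map is a softmax), the logistic loss as the convex risk functional, and a fixed step size of order \(\sqrt{(\log M)/t}\); the function and variable names (`entropic_smd_average`, `base_preds`, and so on) are ours.

```python
import numpy as np

def entropic_smd_average(base_preds, labels, eta=None):
    """Stochastic mirror descent with entropy potential and iterate averaging.

    base_preds : (t, M) array, base_preds[i, j] = h_j(X_i), base rules valued in [-1, 1]
    labels     : (t,) array with entries in {-1, +1}
    Returns the averaged weight vector theta_bar on the simplex.
    """
    t, M = base_preds.shape
    if eta is None:
        eta = np.sqrt(np.log(M) / t)   # step size matching the sqrt((log M)/t) rate
    zeta = np.zeros(M)                 # dual variable: accumulated negative gradients
    theta_bar = np.zeros(M)            # running average of the primal iterates
    for i in range(t):
        # primal iterate = mirror map of the dual point, here softmax(eta * zeta)
        w = eta * zeta
        w -= w.max()                   # subtract the max for numerical stability
        theta = np.exp(w)
        theta /= theta.sum()
        theta_bar += (theta - theta_bar) / (i + 1)   # incremental averaging
        # stochastic subgradient of the logistic loss log(1 + exp(-y * f_theta(x)))
        margin = labels[i] * (base_preds[i] @ theta)
        grad = -labels[i] * base_preds[i] / (1.0 + np.exp(margin))
        zeta -= grad                   # gradient-type descent step in the dual space
    return theta_bar
```

With the entropy potential, the descent in the dual space reduces to accumulating negative gradients in `zeta`; it is the averaged primal iterate `theta_bar`, not the last one, that carries the \(O(\sqrt{(\log M)/t})\) guarantee for the aggregated estimator.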
Additional information
Translated from Problemy Peredachi Informatsii, No. 4, 2005, pp. 78–96.
Original Russian Text Copyright © 2005 by Juditsky, Nazin, Tsybakov, Vayatis.
This work was carried out within the framework of the ACI NIM “BIOCLASSIF” and ACI MD “OPSYC” projects, France.
The research was done during visits to the Paris-VI and Grenoble-I Universities (France) in 2004–2005.
Cite this article
Juditsky, A.B., Nazin, A.V., Tsybakov, A.B. et al. Recursive Aggregation of Estimators by the Mirror Descent Algorithm with Averaging. Probl Inf Transm 41, 368–384 (2005). https://doi.org/10.1007/s11122-006-0005-2