
Recursive Aggregation of Estimators by the Mirror Descent Algorithm with Averaging

  • Methods of Signal Processing
  • Published in: Problems of Information Transmission

Abstract

We consider a recursive algorithm to construct an aggregated estimator from a finite number of base decision rules in the classification problem. The estimator approximately minimizes a convex risk functional under an ℓ1-constraint. It is defined by a stochastic version of the mirror descent algorithm, which performs a gradient-type descent in the dual space with additional averaging. The main result of the paper is an upper bound for the expected accuracy of the proposed estimator. This bound is of the order \(C\sqrt{(\log M)/t}\) with an explicit and small constant factor C, where M is the dimension of the problem and t stands for the sample size. A similar bound is proved for a more general setting, which covers, in particular, the regression model with squared loss.
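Since the abstract only sketches the procedure, the following is a minimal illustration of the kind of algorithm it describes: stochastic mirror descent over the probability simplex with an entropic proximal (softmax) map and averaging of the primal iterates, applied to the aggregation of M base rules. Everything in the sketch is an assumption made for illustration: the function name mirror_descent_aggregate, the logistic loss, and the step-size and temperature choices are placeholders, not the algorithm or constants derived in the paper.

import numpy as np

def mirror_descent_aggregate(H, y, loss_grad, step=1.0, beta=None):
    # Minimal sketch (not the paper's exact algorithm or constants):
    # stochastic mirror descent with the entropic proximal function on the
    # probability simplex, plus averaging of the primal iterates.
    #   H[i, j]         : prediction of base rule j on example i, shape (t, M)
    #   y[i]            : label of example i
    #   loss_grad(u, y) : derivative of the convex loss in its first argument
    #   step, beta      : illustrative tuning constants (assumptions)
    t, M = H.shape
    if beta is None:
        beta = np.sqrt(np.log(M))      # "temperature" of the softmax map
    zeta = np.zeros(M)                 # dual variable: accumulated gradients
    theta_sum = np.zeros(M)            # running sum of primal iterates

    for i in range(t):
        # primal iterate: stabilized softmax of the scaled, negated dual point
        s = -zeta / beta
        theta = np.exp(s - s.max())
        theta /= theta.sum()

        # stochastic gradient of the risk at theta from a single observation
        u = H[i] @ theta               # aggregated prediction
        g = loss_grad(u, y[i]) * H[i]  # gradient with respect to the weights

        # gradient-type step in the dual space, then averaging in the primal
        zeta += (step / np.sqrt(i + 1)) * g
        theta_sum += theta

    return theta_sum / t               # averaged aggregation weights

# Toy usage with the logistic loss l(u, y) = log(1 + exp(-y u))
rng = np.random.default_rng(0)
t, M = 1000, 20
H = np.sign(rng.standard_normal((t, M)))             # toy base classifiers
y = np.sign(H[:, 0] + 0.5 * rng.standard_normal(t))
logistic_grad = lambda u, yy: -yy / (1.0 + np.exp(yy * u))
theta_bar = mirror_descent_aggregate(H, y, logistic_grad)
print(np.round(theta_bar, 3))

The structural points match the abstract: the gradient step is taken in the dual space (the accumulated gradient vector zeta), the softmax map keeps the primal iterate in the simplex induced by the ℓ1-constraint, and the returned estimator is the average of the primal iterates.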




Additional information


Translated from Problemy Peredachi Informatsii, No. 4, 2005, pp. 78–96.

Original Russian Text Copyright © 2005 by Juditsky, Nazin, Tsybakov, Vayatis.

This work was carried out within the framework of the projects ACI NIM “BIOCLASSIF” and ACI MD “OPSYC,” France.

The research was done during visits to the Paris VI and Grenoble I Universities (France) in 2004–2005.


About this article

Cite this article

Juditsky, A.B., Nazin, A.V., Tsybakov, A.B. et al. Recursive Aggregation of Estimators by the Mirror Descent Algorithm with Averaging. Probl Inf Transm 41, 368–384 (2005). https://doi.org/10.1007/s11122-006-0005-2
