Machine Learning, Volume 69, Issue 2–3, pp 115–142

A primal-dual perspective of online learning algorithms



We describe a novel framework for the design and analysis of online learning algorithms based on the notion of duality in constrained optimization. We cast a sub-family of universal online bounds as an optimization problem. Using the weak duality theorem we reduce the process of online learning to the task of incrementally increasing the dual objective function. The amount by which the dual increases serves as a new and natural notion of progress for analyzing online learning algorithms. We are thus able to tie the primal objective value and the number of prediction mistakes using the increase in the dual.
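The idea of treating dual ascent as a measure of progress can be illustrated with a minimal sketch. The concrete instance below is an assumption chosen for illustration and is not taken from the abstract: the primal objective is the familiar soft-margin form P(w) = 0.5*||w||^2 + C * sum_i max(0, 1 - y_i <w, x_i>), and its dual is D(alpha) = sum_i alpha_i - 0.5*||sum_i alpha_i y_i x_i||^2 with each alpha_i in [0, C]. On round t the sketch sets only alpha_t, choosing the value that maximizes the dual with all other coordinates held fixed. The names `online_dual_ascent`, `primal`, `target`, and the synthetic data are hypothetical.

```python
import random

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def primal(w, data, C):
    """Assumed primal: 0.5*||w||^2 + C * (sum of hinge losses)."""
    hinge = sum(max(0.0, 1.0 - y * dot(w, x)) for x, y in data)
    return 0.5 * dot(w, w) + C * hinge

def online_dual_ascent(data, C):
    """One online pass; returns (w, dual value after each round, mistakes)."""
    w = [0.0] * len(data[0][0])
    dual = 0.0
    dual_values = [dual]
    mistakes = 0
    for x, y in data:
        margin = y * dot(w, x)
        if margin <= 0.0:          # a non-positive margin counts as a mistake
            mistakes += 1
        sq = dot(x, x)
        # Maximize the dual over alpha_t alone, clipped to [0, C].
        alpha = 0.0 if sq == 0.0 else min(C, max(0.0, (1.0 - margin) / sq))
        # Exact change in the dual caused by setting alpha_t = alpha.
        dual += alpha * (1.0 - margin) - 0.5 * alpha * alpha * sq
        # Keep w equal to sum_i alpha_i * y_i * x_i.
        w = [wi + alpha * y * xi for wi, xi in zip(w, x)]
        dual_values.append(dual)
    return w, dual_values, mistakes

random.seed(0)
target = [1.0, -2.0, 0.5]          # hypothetical labeling direction
data = []
for _ in range(200):
    x = [random.uniform(-1.0, 1.0) for _ in target]
    y = 1.0 if dot(target, x) >= 0.0 else -1.0
    data.append((x, y))

C = 1.0
w, dual_values, mistakes = online_dual_ascent(data, C)
print("mistakes:", mistakes)
print("final dual value:", dual_values[-1])
print("primal value at w:", primal(w, data, C))
```

In this instance the dual never decreases, and by weak duality its final value is at most P(u) for every comparator u. Each mistake round with a nonzero x raises the dual by at least min(1/(2R^2), C/2), where R^2 bounds ||x||^2, so the number of mistakes is tied to the primal value of any competitor through the dual. The per-round update used here resembles passive-aggressive-style updates; other choices that still increase the dual would fit the same argument.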


Keywords: Online learning, Mistake bounds, Duality, Regret bounds



Copyright information

© Springer Science+Business Media, LLC 2007

Authors and Affiliations

  1. School of Computer Science & Engineering, The Hebrew University, Jerusalem, Israel
  2. Google Inc., Mountain View, USA
