In this paper we consider two online multi-class classification problems: classification with linear models and classification with kernelized models. The predictions can be thought of as probability distributions, and their quality is measured by the Brier loss function. We suggest two computationally efficient algorithms for these problems; the second algorithm is derived by considering a new class of linear prediction models. We prove theoretical guarantees on the cumulative losses of the algorithms. We kernelize one of the algorithms and prove theoretical guarantees on the loss of the kernelized version. We perform experiments and compare our algorithms with logistic regression.
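
As an illustration of the loss used to score predictions, the following minimal sketch computes the multi-class Brier loss of one probability forecast and the cumulative loss over a sequence of online trials. It assumes the standard definition (the sum over classes of the squared difference between the predicted probability and the 0/1 indicator of the observed class); the function names `brier_loss` and `cumulative_brier_loss` are hypothetical and are not taken from the paper.

```python
def brier_loss(probs, true_class):
    """Brier loss of one forecast (assumed standard multi-class form).

    probs      -- predicted probabilities, one per class
    true_class -- index of the class that actually occurred
    """
    # Squared distance between the forecast and the one-hot outcome vector.
    return sum(
        (p - (1.0 if i == true_class else 0.0)) ** 2
        for i, p in enumerate(probs)
    )


def cumulative_brier_loss(predictions, outcomes):
    """Total Brier loss over a sequence of trials (hypothetical helper)."""
    return sum(brier_loss(p, y) for p, y in zip(predictions, outcomes))
```

Under this definition a forecast that puts all its mass on the correct class incurs loss 0, and one that puts all its mass on a wrong class incurs loss 2.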


Keywords: online prediction, classification, linear regression, Aggregating Algorithm



Copyright information

© IFIP 2010

Authors and Affiliations

  • Fedor Zhdanov (1)
  • Yuri Kalnishkan (1)
  1. Computer Learning Research Centre and Department of Computer Science, Royal Holloway, University of London, Egham, UK
