Online Multitask Learning

  • Ofer Dekel
  • Philip M. Long
  • Yoram Singer
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4005)


Abstract

We study the problem of online learning of multiple tasks in parallel. On each online round, the algorithm receives an instance and makes a prediction for each of the parallel tasks. We consider the case where these tasks all contribute toward a common goal. We capture the relationship between the tasks with a single global loss function that evaluates the quality of the multiple predictions made on each round. Specifically, each individual prediction is associated with its own individual loss, and these loss values are then combined by the global loss function. We present several families of online algorithms that can use any absolute norm as the global loss function, and we prove worst-case relative loss bounds for all of them.
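To make the setup concrete, the hypothetical Python sketch below keeps one linear predictor per task, combines the per-task hinge losses with the L2 norm (an absolute norm) into a single global loss, and scales each task's update by the gradient of the global loss with respect to that task's individual loss. This is a simplified illustration of the setting, not the authors' algorithms (which are passive-aggressive-style updates analyzed via the dual norm); the class name, method names, and the learning-rate parameter are assumptions made for this example.

    import numpy as np

    class MultitaskLearner:
        # Hypothetical sketch of the online multitask setting: one linear
        # predictor per task, per-task hinge losses combined by the L2 norm
        # (an absolute norm) into a single global loss. The update is a
        # plain gradient step, a simplified stand-in for the paper's
        # dual-norm-based algorithms.

        def __init__(self, num_tasks, dim, lr=0.1):  # lr is an assumed parameter
            self.W = np.zeros((num_tasks, dim))
            self.lr = lr

        def round(self, X, y):
            # X: (num_tasks, dim), one instance per task on this round
            # y: (num_tasks,), labels in {-1, +1}
            margins = y * np.einsum("kd,kd->k", self.W, X)
            losses = np.maximum(0.0, 1.0 - margins)      # individual hinge losses
            global_loss = np.linalg.norm(losses, ord=2)  # absolute norm combines them
            if global_loss > 0:
                # For the L2 norm, d(global)/d(loss_k) = loss_k / ||losses||_2,
                # so tasks suffering larger individual losses get larger updates.
                tau = losses / global_loss
                self.W += self.lr * (tau * y)[:, None] * X
            return global_loss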



Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Ofer Dekel (1)
  • Philip M. Long (2)
  • Yoram Singer (1, 2)

  1. School of Computer Science and Engineering, The Hebrew University, Jerusalem, Israel
  2. Google, Mountain View, USA
