Focused Multi-task Learning Using Gaussian Processes

  • Gayle Leen
  • Jaakko Peltonen
  • Samuel Kaski
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6912)


Given a learning task for a data set, learning it together with related tasks (data sets) can improve performance. Gaussian process models have been applied to such multi-task learning scenarios, based on joint priors for functions underlying the tasks. In previous Gaussian process approaches, all tasks have been assumed to be of equal importance, whereas in transfer learning the goal is asymmetric: to enhance performance on a target task given all other tasks. In both settings, transfer learning and joint modelling, negative transfer is a key problem: performance may actually decrease if the tasks are not related closely enough. In this paper, we propose a Gaussian process model for the asymmetric setting, which learns to “explain away” non-related variation in the additional tasks, in order to focus on improving performance on the target task. In experiments, our model improves performance compared to single-task learning, symmetric multi-task learning using hierarchical Dirichlet processes, and transfer learning based on predictive structure learning.
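
The abstract does not give the model's equations. The sketch below only illustrates the general idea of an asymmetric joint prior in which auxiliary tasks can "explain away" unrelated variation. It assumes a hypothetical decomposition: every task contains a shared latent function g, and each non-target task adds its own independent function h_t. The squared-exponential kernel, the fixed hyperparameters, and the helper names `rbf`, `joint_covariance` and `predict_target` are all assumptions made for illustration. The paper's actual model, hyperparameter learning and inference may differ.

```python
import numpy as np


def rbf(xa, xb, lengthscale=1.0, variance=1.0):
    """Squared-exponential covariance between two 1-D arrays of inputs."""
    d = xa[:, None] - xb[None, :]
    return variance * np.exp(-0.5 * (d / lengthscale) ** 2)


def joint_covariance(x, task, target=0, shared_var=1.0, specific_var=1.0, ls=1.0):
    """Prior covariance of all training outputs under a HYPOTHETICAL asymmetric prior.

    f_t(x) = g(x)            for the target task
    f_t(x) = g(x) + h_t(x)   for every auxiliary task t,
    with g, h_1, h_2, ... independent zero-mean GPs (an assumption, not the paper's model).
    """
    K = rbf(x, x, ls, shared_var)                      # shared g links every pair of points
    same_task = task[:, None] == task[None, :]
    is_aux = (task != target)[:, None]                 # row point belongs to an auxiliary task
    # h_t only correlates points inside the same auxiliary task
    K = K + np.where(same_task & is_aux, rbf(x, x, ls, specific_var), 0.0)
    return K


def predict_target(x, y, task, x_star, target=0, shared_var=1.0,
                   specific_var=1.0, ls=1.0, noise=0.1):
    """Posterior mean of the target task's function at x_star (fixed hyperparameters)."""
    K = joint_covariance(x, task, target, shared_var, specific_var, ls)
    K = K + noise * np.eye(len(x))                     # i.i.d. Gaussian observation noise
    # The target task has no h_t, so it covaries with every training point through g only
    k_star = rbf(x_star, x, ls, shared_var)
    return k_star @ np.linalg.solve(K, y)


# Usage: one target observation far away, one auxiliary observation at the query point.
x = np.array([0.0, 3.0])
task = np.array([0, 1])
y = np.array([0.0, 1.0])
focused = predict_target(x, y, task, np.array([3.0]), specific_var=1.0)
pooled = predict_target(x, y, task, np.array([3.0]), specific_var=0.0)
```

With `specific_var=0.0` the auxiliary task is treated as if it were the same function as the target (full pooling), and the prediction at x = 3 is roughly 0.91. With `specific_var=1.0` part of the auxiliary observation can be attributed to h_t, so the target borrows less and the prediction drops to roughly 0.48. In this toy setting, the size of the task-specific component is what controls how strongly the target relies on a possibly unrelated task.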


Keywords

Gaussian processes, multi-task learning, asymmetric setting, negative transfer



Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  • Gayle Leen (1, 2)
  • Jaakko Peltonen (1, 2)
  • Samuel Kaski (1, 2, 3)

  1. Aalto University School of Science, Department of Information and Computer Science, Finland
  2. Helsinki Institute of Information Technology HIIT, Finland
  3. University of Helsinki, Department of Computer Science, Finland
