Machine Learning, Volume 73, Issue 3, pp 221–242

Flexible latent variable models for multi-task learning

  • Jian Zhang
  • Zoubin Ghahramani
  • Yiming Yang


Abstract

Given multiple prediction problems such as regression or classification, we are interested in a joint inference framework that can effectively share information between tasks to improve prediction accuracy, especially when the number of training examples per task is small. In this paper we propose a probabilistic framework that supports a family of latent variable models for different multi-task learning scenarios. We show that the framework generalizes standard learning methods for single prediction problems and can effectively model the shared structure among different prediction tasks. Furthermore, we present efficient algorithms for both the empirical Bayes method and point estimation. Our experiments on simulated datasets and real-world classification datasets demonstrate the effectiveness of the proposed models in two evaluation settings: a standard multi-task learning setting and a transfer learning setting.
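The benefit of sharing structure across tasks with few examples each can be illustrated with a toy sketch. The model and numbers below are hypothetical, not the paper's actual framework: task weight vectors are assumed to lie near a shared low-dimensional latent subspace (w_t = Λ s_t), and a simple point estimate recovers that subspace from independently fitted tasks before refitting each task inside it.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy setup: T regression tasks whose weight vectors share a
# k-dimensional latent subspace, w_t = Lambda @ s_t, with n << d examples
# per task so independent fitting is badly underdetermined.
T, d, k, n = 50, 20, 2, 5
Lambda_true = rng.normal(size=(d, k))
W_true = Lambda_true @ rng.normal(size=(k, T))            # d x T weights
X = [rng.normal(size=(n, d)) for _ in range(T)]
Y = [X[t] @ W_true[:, t] + 0.05 * rng.normal(size=n) for t in range(T)]

def fit_independent(X, Y, reg=1e-3):
    """Ridge regression per task, ignoring any shared structure."""
    d = X[0].shape[1]
    return np.column_stack([
        np.linalg.solve(x.T @ x + reg * np.eye(d), x.T @ y)
        for x, y in zip(X, Y)])

def fit_shared(X, Y, k, reg=1e-3):
    """Point estimation with a shared latent subspace: the SVD of the
    pooled independent estimates gives a d x k loading basis, then each
    task is refit inside that subspace."""
    W0 = fit_independent(X, Y, reg)
    Lam = np.linalg.svd(W0, full_matrices=False)[0][:, :k]
    cols = []
    for x, y in zip(X, Y):
        Z = x @ Lam                                       # n x k reduced design
        alpha = np.linalg.solve(Z.T @ Z + reg * np.eye(k), Z.T @ y)
        cols.append(Lam @ alpha)                          # back to d dims
    return np.column_stack(cols)

err_ind = np.linalg.norm(fit_independent(X, Y) - W_true)
err_sh = np.linalg.norm(fit_shared(X, Y, k) - W_true)
print(f"independent error {err_ind:.2f}, shared error {err_sh:.2f}")
```

With only 5 examples per 20-dimensional task, each independent estimate loses the weight components outside its own data's span; pooling 50 tasks lets the SVD average those errors out and recover the shared subspace, so the shared fit is substantially closer to the truth.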


Keywords: Multi-task learning · Latent variable models · Hierarchical Bayesian models · Model selection · Transfer learning



Copyright information

© Springer Science+Business Media, LLC 2008

Authors and Affiliations

  1. Department of Statistics, Purdue University, West Lafayette, USA
  2. Department of Engineering, University of Cambridge, Cambridge, UK
  3. School of Computer Science, Carnegie Mellon University, Pittsburgh, USA
