Abstract
Multi-task learning solves multiple related learning problems simultaneously, sharing common structure across them to improve generalization performance. A promising approach to multi-task learning is joint feature selection, in which a sparsity pattern is shared across task-specific feature representations. In this paper, we propose a novel Gaussian Process (GP) approach to multi-task learning based on joint feature selection. The novelty of the proposed approach is that it captures task similarity by sharing a sparsity pattern over the kernel hyper-parameters associated with each task. This is achieved through a hierarchical model that imposes a multi-Laplacian prior over the kernel hyper-parameters. The result is a flexible GP model which can handle a wide range of multi-task learning problems and can identify the features relevant across all tasks. Hyper-parameter estimation leads to an optimization problem which is solved using a block coordinate descent algorithm. Experimental results on synthetic and real-world multi-task learning data sets demonstrate that the flexibility of the proposed model yields better generalization performance.
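To make the mechanism in the abstract concrete, the following is a minimal sketch (not the authors' implementation) of joint feature selection with per-task ARD kernel hyper-parameters. It assumes squared-exponential ARD kernels, GP regression, and an L2,1-style group penalty over the per-task feature weights as a MAP surrogate for the multi-Laplacian prior, optimized by block coordinate descent over tasks. All names and parameters (`fit_multitask_gp`, `lam`, the smoothing constant) are illustrative assumptions.

```python
import numpy as np
from scipy.linalg import cholesky, cho_solve
from scipy.optimize import minimize

def ard_kernel(X1, X2, w, noise=None):
    """Squared-exponential kernel with per-feature ARD weights w >= 0."""
    d = (X1[:, None, :] - X2[None, :, :]) ** 2        # pairwise squared diffs
    K = np.exp(-0.5 * np.tensordot(d, w, axes=([2], [0])))
    if noise is not None:                             # observation noise on diagonal
        K += noise * np.eye(len(X1))
    return K

def neg_log_marginal(w, X, y, noise=1e-2):
    """GP regression negative log marginal likelihood (constant term dropped)."""
    K = ard_kernel(X, X, w, noise)
    L = cholesky(K, lower=True)
    alpha = cho_solve((L, True), y)
    return 0.5 * y @ alpha + np.log(np.diag(L)).sum()

def fit_multitask_gp(tasks, n_features, lam=1.0, n_sweeps=20):
    """Block coordinate descent: one block per task, coupled by a smoothed
    group penalty lam * sum_d ||W[:, d]||_2 over the tasks-by-features
    weight matrix W, so features are switched off jointly across tasks."""
    T = len(tasks)
    W = np.ones((T, n_features))
    for _ in range(n_sweeps):
        for t, (X, y) in enumerate(tasks):            # update task t's block
            def objective(w_t):
                W_trial = W.copy()
                W_trial[t] = w_t
                # smoothed L2,1 norm keeps the objective differentiable for L-BFGS-B
                penalty = lam * np.sqrt((W_trial ** 2).sum(axis=0) + 1e-12).sum()
                return neg_log_marginal(w_t, X, y) + penalty
            res = minimize(objective, W[t], method="L-BFGS-B",
                           bounds=[(0.0, None)] * n_features)
            W[t] = res.x
    return W   # rows: tasks; columns with near-zero norm: jointly pruned features
```

Given `tasks = [(X_1, y_1), ..., (X_T, y_T)]` over a common feature space, columns of the returned `W` whose norm shrinks toward zero mark features deemed irrelevant for all tasks. The paper itself works with the multi-Laplacian prior directly rather than this smoothed surrogate; the sketch only illustrates how a shared sparsity pattern over kernel hyper-parameters can be recovered by blockwise updates.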
Cite this paper
Srijith, P.K., Shevade, S. (2014). Gaussian Process Multi-task Learning Using Joint Feature Selection. In: Calders, T., Esposito, F., Hüllermeier, E., Meo, R. (eds) Machine Learning and Knowledge Discovery in Databases. ECML PKDD 2014. Lecture Notes in Computer Science, vol. 8726. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-44845-8_7
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-662-44844-1
Online ISBN: 978-3-662-44845-8