Expectation Propagation for Bayesian Multi-task Feature Selection

  • Daniel Hernández-Lobato
  • José Miguel Hernández-Lobato
  • Thibault Helleputte
  • Pierre Dupont
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6321)


In this paper we propose a Bayesian model for multi-task feature selection. This model is based on a generalized spike and slab sparse prior distribution that enforces the selection of a common subset of features across several tasks. Since exact Bayesian inference in this model is intractable, approximate inference is performed through expectation propagation (EP). EP approximates the posterior distribution of the model using a parametric probability distribution. This posterior approximation is particularly useful for identifying the features that are relevant for prediction. We focus on problems in which the number of features d is significantly larger than the number of instances available for each task. We propose an efficient parametrization of the EP algorithm whose computational complexity is linear in d. Experiments on several multi-task datasets show that the proposed model outperforms baseline approaches that either learn each task separately or pool the data of all tasks, as well as two state-of-the-art multi-task learning approaches. Additional experiments confirm the stability of the proposed feature selection with respect to various sub-samplings of the training data.
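EP replaces each intractable factor of the posterior by a Gaussian whose first two moments match those of the "tilted" distribution (the current Gaussian cavity times the exact factor). The Python/NumPy sketch below illustrates one such moment-matching step for the prior factor. It assumes a prior in which each feature j has a single binary relevance indicator shared by all K tasks, a zero-mean Gaussian slab of variance `v_slab`, a prior inclusion probability `p0`, and Gaussian cavities with means `m` and variances `v`. The function name, the parameter names and this exact form of the prior are illustrative assumptions, not the paper's parametrization.

```python
import numpy as np


def log_normal_pdf(x, var):
    """Elementwise log density of N(x | 0, var)."""
    return -0.5 * (np.log(2.0 * np.pi * var) + x ** 2 / var)


def match_shared_spike_slab(m, v, p0, v_slab):
    """Hypothetical EP moment-matching step for a shared spike and slab prior.

    Assumed prior (illustrative, not necessarily the paper's exact model):
    with probability p0, feature j is relevant and every task's coefficient
    w_jk ~ N(0, v_slab); otherwise w_jk = 0 for ALL tasks k.

    m, v   : arrays of shape (d, K), cavity means and variances of w_jk.
    Returns the matched means and variances, shape (d, K), and the
    posterior probability that each feature is selected, shape (d,).
    """
    # Log evidence of each hypothesis, summed over the K tasks.
    # Slab: each coefficient is Gaussian, so m_k ~ N(0, v_k + v_slab).
    log_z_slab = np.log(p0) + log_normal_pdf(m, v + v_slab).sum(axis=1)
    # Spike: each coefficient is exactly zero, so m_k ~ N(0, v_k).
    log_z_spike = np.log1p(-p0) + log_normal_pdf(m, v).sum(axis=1)
    # Posterior inclusion probability, normalized stably in log space.
    r = np.exp(log_z_slab - np.logaddexp(log_z_slab, log_z_spike))
    # Slab component of the tilted distribution: cavity times slab Gaussian.
    v_post = 1.0 / (1.0 / v + 1.0 / v_slab)
    m_post = v_post * m / v
    # Moments of the two-component mixture; the spike contributes zeros.
    mean = r[:, None] * m_post
    second_moment = r[:, None] * (v_post + m_post ** 2)
    var = second_moment - mean ** 2
    return mean, var, r
```

Because the indicator is shared, the per-task log evidence ratios are summed before computing `r`, so weak evidence spread over several tasks can still select a feature; this is one way such a prior couples the selection decisions across tasks. Each feature is processed independently, so this prior update costs O(dK). The likelihood-term updates, to which the paper's linear-in-d parametrization refers, are not shown here.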


Keywords: multi-task learning, feature selection, expectation propagation, approximate Bayesian inference



Copyright information

© Springer-Verlag Berlin Heidelberg 2010

Authors and Affiliations

  • Daniel Hernández-Lobato (1)
  • José Miguel Hernández-Lobato (2)
  • Thibault Helleputte (1)
  • Pierre Dupont (1)
  1. Machine Learning Group, ICTEAM Institute, Université catholique de Louvain, Louvain-la-Neuve, Belgium
  2. Computer Science Department, Universidad Autónoma de Madrid, Madrid, Spain
