Joint Autoencoders: A Flexible Meta-learning Framework

  • Baruch Epstein
  • Ron Meir
  • Tomer Michaeli
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11051)


We develop a framework for learning multiple tasks simultaneously by sharing the features that are common to all of them. The framework uses a modular deep feedforward neural network made up of shared branches, which handle the features common to all tasks, and private branches, which learn the aspects unique to each task. Once an appropriate weight-sharing architecture has been established, learning proceeds through standard algorithms for feedforward networks, e.g., stochastic gradient descent and its variations. The method treats meta-learning problems (such as domain adaptation, transfer learning and multi-task learning) in a unified fashion, and can handle data arising from different modalities. Numerical experiments demonstrate the effectiveness of learning in domain adaptation and transfer learning setups, and provide evidence for the flexible and task-oriented representations arising in the network. In particular, we handle transfer learning between multiple tasks in a straightforward manner, unlike many competing state-of-the-art methods, which cannot handle more than two tasks. We also illustrate the network’s ability to distill task-specific and shared features.
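
The shared/private branch idea in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' architecture: the layer sizes, the single-layer branches, the `tanh` activation, the linear decoders, the task names, and the assumption that both tasks share one input dimension are all hypothetical choices made only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: input dimension, shared code width, private code width.
D, H_SHARED, H_PRIVATE = 8, 3, 2

def init_linear(n_in, n_out):
    """Return (weights, bias) for one dense layer."""
    return rng.normal(0.0, 0.1, size=(n_in, n_out)), np.zeros(n_out)

def dense(x, params):
    """Dense layer followed by tanh (activation chosen for illustration)."""
    w, b = params
    return np.tanh(x @ w + b)

# One shared encoder branch is reused by every task;
# each task also owns a private encoder branch and its own decoder.
shared_enc = init_linear(D, H_SHARED)
tasks = {
    name: {
        "private_enc": init_linear(D, H_PRIVATE),
        "dec": init_linear(H_SHARED + H_PRIVATE, D),
    }
    for name in ("task_a", "task_b")  # hypothetical task names
}

def reconstruct(x, task):
    """Encode x with the shared and the task's private branch, then decode."""
    z_shared = dense(x, shared_enc)                    # features common to all tasks
    z_private = dense(x, tasks[task]["private_enc"])   # task-specific features
    code = np.concatenate([z_shared, z_private], axis=1)
    w, b = tasks[task]["dec"]
    return code @ w + b                                # linear decoder

def joint_loss(batches):
    """Sum of per-task mean squared reconstruction errors."""
    return sum(np.mean((reconstruct(x, t) - x) ** 2) for t, x in batches.items())

batches = {"task_a": rng.normal(size=(4, D)), "task_b": rng.normal(size=(5, D))}
loss = joint_loss(batches)
```

Because `shared_enc` appears in every task's forward pass, gradients of `joint_loss` from all tasks would flow into the same shared weights, while each private branch is updated only by its own task. The sketch stops at the loss; a training loop with stochastic gradient descent would need gradients, which are omitted here.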


Keywords: Autoencoders, Meta-learning, Weakly-supervised learning




Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. Viterbi Faculty of Electrical Engineering, Technion - Israel Institute of Technology, Haifa, Israel
