
Learning Latent Representations Across Multiple Data Domains Using Lifelong VAEGAN

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 12365)

Abstract

Catastrophic forgetting occurs in deep learning models trained sequentially on multiple databases. Recently, generative replay mechanisms (GRMs) have been proposed to reproduce previously learned knowledge in order to reduce forgetting. However, such approaches lack an appropriate inference model and therefore cannot provide latent representations of the data. In this paper, we propose a novel lifelong learning approach, the Lifelong VAEGAN (L-VAEGAN), which not only induces a powerful generative replay network but also learns meaningful latent representations, benefiting representation learning. L-VAEGAN automatically embeds the information associated with different domains into several clusters in the latent space, while also capturing semantically meaningful latent variables shared across different data domains. The proposed model supports many downstream tasks that traditional generative replay methods cannot, including interpolation and inference across different data domains.
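The abstract combines two ideas: amortized inference (an encoder mapping data to latent variables, which plain generative replay lacks) and generative replay (samples from a frozen earlier generator stand in for past domains' data). The toy sketch below illustrates only these two mechanisms on scalar data; every function, constant, and dimension here is an illustrative assumption, not the paper's actual architecture or training objective.

```python
import math
import random

random.seed(0)

def encode(x):
    """Toy amortized inference: map an observation to Gaussian latent
    parameters (mu, log_var). A fixed log-variance keeps the sketch simple."""
    return 0.5 * x, -1.0

def reparameterize(mu, log_var):
    """The VAE reparameterization trick: z = mu + sigma * eps."""
    eps = random.gauss(0.0, 1.0)
    return mu + math.exp(0.5 * log_var) * eps

def old_generator():
    """Stand-in for the frozen generator trained on previous domains:
    sample a latent z and decode it with a toy affine 'decoder'."""
    z = random.gauss(0.0, 1.0)
    return 2.0 * z + 1.0

def replay_batch(real_batch, n_replay):
    """Generative replay: augment the current domain's data with samples
    replayed from the previous generator, so old knowledge is rehearsed."""
    return real_batch + [old_generator() for _ in range(n_replay)]

# Usage: mix 4 "new domain" samples with 4 replayed ones, then infer latents.
real = [random.gauss(5.0, 1.0) for _ in range(4)]
batch = replay_batch(real, n_replay=4)
latents = [reparameterize(*encode(x)) for x in batch]
```

Because the encoder runs on both real and replayed data, the latent space is shaped jointly by all domains seen so far, which is the property that enables the cross-domain interpolation and inference the abstract mentions.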

Keywords

Lifelong learning · Representation learning · Generative modeling · VAEGAN model

Supplementary material

Supplementary material 1: 504476_1_En_46_MOESM1_ESM.pdf (PDF, 7.3 MB)


Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

Department of Computer Science, University of York, York, UK
