Disentangling Factors of Variation with Cycle-Consistent Variational Auto-encoders

  • Ananya Harsh Jha (corresponding author)
  • Saket Anand
  • Maneesh Singh
  • VSR Veeravasarapu
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11207)

Abstract

Generative models that learn disentangled representations for different factors of variation in an image can be very useful for targeted data augmentation: by sampling from the disentangled latent subspace of interest, we can efficiently generate new data necessary for a particular task. Learning disentangled representations is a challenging problem, however, especially when certain factors of variation are difficult to label. In this paper, we introduce a novel architecture that disentangles the latent space into two complementary subspaces using only weak supervision in the form of pairwise similarity labels. Inspired by the recent success of cycle-consistent adversarial architectures, we use cycle-consistency in a variational auto-encoder framework. Our non-adversarial approach contrasts with recent works that combine adversarial training with auto-encoders to disentangle representations. We show compelling results of disentangled latent subspaces on three datasets and compare with recent works that leverage adversarial training.
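The abstract's idea can be illustrated with a toy sketch. The linear encoder/decoder below are hypothetical stand-ins for the paper's networks, and the dimensions, weights, and loss weighting are illustrative assumptions, not the authors' implementation. The encoder splits the latent code into a specified part s and an unspecified part z (sampled with the usual VAE reparameterization); a forward pass swaps s within a pair of images known only to share the specified factor, and a reverse pass checks that a sampled z can be recovered after decoding and re-encoding, which is the cycle-consistency constraint.

```python
import numpy as np

rng = np.random.default_rng(0)
D, S, Z = 8, 2, 3          # image dim, specified dim, unspecified dim (toy sizes)

# Hypothetical linear encoder/decoder weights (stand-ins for the conv nets).
W_enc = rng.normal(size=(S + 2 * Z, D)) * 0.1   # outputs [s, mu_z, logvar_z]
W_dec = rng.normal(size=(D, S + Z)) * 0.1

def encode(x):
    h = W_enc @ x
    s, mu, logvar = h[:S], h[S:S + Z], h[S + Z:]
    z = mu + np.exp(0.5 * logvar) * rng.normal(size=Z)   # reparameterization trick
    return s, z, mu, logvar

def decode(s, z):
    return W_dec @ np.concatenate([s, z])

# A pair (x1, x2) known only to share the same specified factor (weak pairwise label).
x1, x2 = rng.normal(size=D), rng.normal(size=D)

# Forward pass: swap the specified codes within the pair; both decodings should
# still reconstruct the originals if s truly carries only the shared factor.
s1, z1, *_ = encode(x1)
s2, z2, *_ = encode(x2)
forward_loss = (np.mean((decode(s2, z1) - x1) ** 2)
                + np.mean((decode(s1, z2) - x2) ** 2))

# Reverse (cycle-consistency) pass: decode two images from one sampled z paired
# with different s, re-encode, and require the recovered unspecified code to match.
z_star = rng.normal(size=Z)
_, _, mu_a, _ = encode(decode(s1, z_star))
_, _, mu_b, _ = encode(decode(s2, z_star))
reverse_loss = np.mean((mu_a - z_star) ** 2) + np.mean((mu_b - z_star) ** 2)

print(forward_loss, reverse_loss)
```

In training, both losses would be minimized jointly with the standard VAE reconstruction and KL terms; here they are only evaluated once to show the two cycle directions.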


Keywords: Disentangling factors of variation · Cycle-consistent architecture · Variational auto-encoders



We are thankful for the insightful feedback from the anonymous ECCV reviewers. We acknowledge the Infosys Center for AI at IIIT-Delhi for partially supporting this research, and we appreciate the support from Verisk Analytics in its successful execution.

Supplementary material

Supplementary material 1 (PDF, 1.2 MB)



Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  • Ananya Harsh Jha (1, corresponding author)
  • Saket Anand (1)
  • Maneesh Singh (2)
  • VSR Veeravasarapu (2)
  1. IIIT-Delhi, Delhi, India
  2. Verisk Analytics, Jersey City, USA