
PCA-AE: Principal Component Analysis Autoencoder for Organising the Latent Space of Generative Networks

Journal of Mathematical Imaging and Vision

Abstract

Autoencoders and generative models produce some of the most spectacular deep learning results to date. However, understanding and controlling the latent space of these models presents a considerable challenge. Drawing inspiration from principal component analysis and autoencoders, we propose the principal component analysis autoencoder (PCA-AE). This is a novel autoencoder whose latent space satisfies two properties. Firstly, the dimensions are organised in decreasing importance with respect to the data at hand. Secondly, the components of the latent space are statistically independent. We achieve this by progressively increasing the size of the latent space during training, and by applying a covariance loss to the latent codes. The resulting autoencoder produces a latent space which separates the intrinsic attributes of the data into different components, in a completely unsupervised manner. We also describe an extension of our approach to the case of powerful, pre-trained GANs. We show results both on synthetic examples of shapes and on a state-of-the-art GAN. For example, we are able to separate the colour shade scale of hair, the pose of faces, and gender, without accessing any labels. We compare PCA-AE with other state-of-the-art approaches, in particular with respect to the ability to disentangle attributes in the latent space. We hope that this approach will contribute to a better understanding of the intrinsic latent spaces of powerful deep generative models.
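To make the description above more concrete, the following is a minimal, hypothetical PyTorch sketch of one plausible reading of the covariance loss: penalising the off-diagonal entries of the empirical covariance matrix of a batch of latent codes, which pushes the latent components towards decorrelation. The function name, the weighting factor lambda_cov, and the placement of the term in the training objective are illustrative assumptions, not the authors' implementation.

```python
# Hedged sketch (not the authors' code): a batch covariance penalty on latent codes.
import torch

def covariance_loss(z: torch.Tensor) -> torch.Tensor:
    """Penalise off-diagonal entries of the covariance of a batch of latent codes.

    z: latent codes of shape (batch_size, latent_dim).
    """
    z_centered = z - z.mean(dim=0, keepdim=True)          # centre each latent dimension
    cov = z_centered.T @ z_centered / (z.shape[0] - 1)     # (latent_dim, latent_dim) covariance
    off_diag = cov - torch.diag(torch.diag(cov))           # zero out the diagonal
    return (off_diag ** 2).sum()

# Illustrative use inside a training step (reconstruction term assumed to be an MSE):
#   z = encoder(x)
#   x_hat = decoder(z)
#   loss = torch.nn.functional.mse_loss(x_hat, x) + lambda_cov * covariance_loss(z)
#
# In PCA-AE the number of latent dimensions is also grown progressively during
# training; only the decorrelation term is sketched here.
```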


Notes

  1. The following argmin operations are to be understood up to a sign.

  2. https://github.com/YannDubs/disentangling-vae.

  3. PyTorch GAN zoo: https://github.com/facebookresearch/pytorch_GAN_zoo.

  4. StyleGAN Code: https://github.com/rosinality/style-based-gan-pytorch.


Author information


Corresponding author

Correspondence to Chi-Hieu Pham.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This work was funded by the Labex DIGICOSME.


About this article


Cite this article

Pham, CH., Ladjal, S. & Newson, A. PCA-AE: Principal Component Analysis Autoencoder for Organising the Latent Space of Generative Networks. J Math Imaging Vis 64, 569–585 (2022). https://doi.org/10.1007/s10851-022-01077-z
