Abstract
In recent years, the selective manipulation of data attributes by editing the latent codes of auto-encoders has received considerable scholarly attention. However, the representation an auto-encoder learns for the data cannot be observed visually. Furthermore, attribute values and individual latent dimensions do not follow a linear, monotonic relationship. From a practical point of view, we propose a novel method that uses an encoder–decoder architecture to disentangle data into two visualizable representations encoded as latent spaces. The encoded latent spaces can then be used to manipulate data attributes in a simple and intuitive way. Experiments on an image dataset and a music dataset show that the proposed approach produces fully interpretable latent spaces, which can be used to manipulate a wide range of data attributes and to generate realistic music via analogy.
Acknowledgements
The authors would like to acknowledge the support of the National Natural Science Foundation of China (Grant No. 61471124) and the Key Industrial Guidance Projects of the Fujian Science and Technology Department (Grant No. 2020H0007).
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Appendices
Appendix 1: Network architecture
Image-based models: For the MNIST digits dataset, a stacked convolutional encoder–decoder architecture is used. The encoder consists of four two-dimensional convolutional layers followed by a stack of three linear layers. The decoder mirrors the encoder: a stack of three linear layers followed by four two-dimensional convolutional layers. The network details are shown in Table 3.
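As an illustration of this layout, the following PyTorch sketch builds a comparable stacked convolutional encoder–decoder for 28×28 MNIST inputs. The channel counts, kernel sizes, and latent dimension below are assumptions made for the sketch, not the values given in Table 3.

```python
import torch
import torch.nn as nn


class ConvEncoder(nn.Module):
    """Four 2-D conv layers followed by a stack of three linear layers."""

    def __init__(self, latent_dim=16):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 32, 3, stride=2, padding=1), nn.ReLU(),   # 28x28 -> 14x14
            nn.Conv2d(32, 32, 3, stride=2, padding=1), nn.ReLU(),  # 14x14 -> 7x7
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),  # 7x7   -> 4x4
            nn.Conv2d(64, 64, 3, stride=2, padding=1), nn.ReLU(),  # 4x4   -> 2x2
        )
        self.fc = nn.Sequential(
            nn.Linear(64 * 2 * 2, 128), nn.ReLU(),
            nn.Linear(128, 64), nn.ReLU(),
            nn.Linear(64, latent_dim),
        )

    def forward(self, x):
        return self.fc(self.conv(x).flatten(1))


class ConvDecoder(nn.Module):
    """Mirror of the encoder: three linear layers, then four transposed convs."""

    def __init__(self, latent_dim=16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(latent_dim, 64), nn.ReLU(),
            nn.Linear(64, 128), nn.ReLU(),
            nn.Linear(128, 64 * 2 * 2), nn.ReLU(),
        )
        self.deconv = nn.Sequential(
            nn.ConvTranspose2d(64, 64, 3, stride=2, padding=1, output_padding=1), nn.ReLU(),   # 2  -> 4
            nn.ConvTranspose2d(64, 32, 3, stride=2, padding=1), nn.ReLU(),                     # 4  -> 7
            nn.ConvTranspose2d(32, 32, 3, stride=2, padding=1, output_padding=1), nn.ReLU(),   # 7  -> 14
            nn.ConvTranspose2d(32, 1, 3, stride=2, padding=1, output_padding=1), nn.Sigmoid(), # 14 -> 28
        )

    def forward(self, z):
        h = self.fc(z).view(-1, 64, 2, 2)
        return self.deconv(h)
```

Note the use of `output_padding` in the transposed convolutions, which resolves the ambiguity of strided downsampling (both 7×7 and 8×8 inputs map to 4×4 under stride 2) so the decoder reproduces the encoder's spatial sizes exactly.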
Music-based models: For the music dataset, the model architecture follows previous work: a hierarchical architecture of recurrent GRU layers is used. Figure 6 shows a schematic of the decoder architecture, and the network details are shown in Table 4.
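A minimal PyTorch sketch of such a hierarchical recurrent decoder is given below, in the spirit of the two-level conductor/decoder design used in prior hierarchical music models. All layer sizes, the bar count, the steps per bar, and the output vocabulary are assumptions for illustration, not the configuration in Table 4.

```python
import torch
import torch.nn as nn


class HierarchicalGRUDecoder(nn.Module):
    """Two-level GRU decoder: a top-level "conductor" GRU unrolls the latent
    code into one embedding per bar, and a bottom-level GRU decodes the note
    steps of each bar from that bar's embedding."""

    def __init__(self, latent_dim=32, hidden=64, n_bars=4, steps_per_bar=16, vocab=130):
        super().__init__()
        self.n_bars, self.steps_per_bar = n_bars, steps_per_bar
        self.conductor = nn.GRU(latent_dim, hidden, batch_first=True)
        self.note_rnn = nn.GRU(hidden, hidden, batch_first=True)
        self.out = nn.Linear(hidden, vocab)

    def forward(self, z):
        b = z.size(0)
        # Feed the latent code at every conductor step to get bar embeddings.
        cond_in = z.unsqueeze(1).expand(b, self.n_bars, -1).contiguous()
        bar_embs, _ = self.conductor(cond_in)            # (b, n_bars, hidden)
        bars = []
        for i in range(self.n_bars):
            # Repeat the i-th bar embedding across that bar's time steps.
            step_in = bar_embs[:, i:i + 1, :].expand(b, self.steps_per_bar, -1).contiguous()
            notes, _ = self.note_rnn(step_in)            # (b, steps, hidden)
            bars.append(self.out(notes))                 # (b, steps, vocab)
        return torch.cat(bars, dim=1)                    # (b, n_bars*steps, vocab)
```

The hierarchy is the key design choice: the conductor captures long-range, bar-level structure, while the bottom-level GRU only has to model short within-bar note sequences.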
Appendix 2: Additional results
Additional generated examples of MNIST handwritten digits are shown in Fig. 21.
Cite this article
Huang, R., Zheng, Q. & Zhou, H. Visualization-based disentanglement of latent space. Neural Comput & Applic 33, 16213–16228 (2021). https://doi.org/10.1007/s00521-021-06223-z