This paper presents a new general framework for turning any auto-encoder into a generative model. We focus on one instantiation of the auto-encoder, in which the encoder is the Short-Time Fourier Transform (STFT) and the decoder is the composition of the Griffin-Lim algorithm and the pseudo-inverse of the STFT. To allow sampling from this model, we propose to use probabilistic Principal Component Analysis (PPCA). We show preliminary results on the UrbanSound8K dataset.
- Generative modelling
- Fourier Transform
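
The pipeline summarized in the abstract can be illustrated with a minimal NumPy sketch. It is not the paper's implementation and rests on several assumptions: the encoded representation is taken to be the STFT magnitude, PPCA is fitted in closed form to flattened magnitude spectrograms, negative sampled magnitudes are clipped to zero, and random sinusoids stand in for UrbanSound8K clips. All function names and hyperparameters (`n_fft`, `hop`, `q`, `n_iter`) are hypothetical choices made for illustration.

```python
import numpy as np

def stft(x, n_fft=256, hop=64):
    """Encoder: complex STFT with a Hann window, shape (frames, n_fft // 2 + 1)."""
    win = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop:i * hop + n_fft] * win for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)

def istft(S, n_fft=256, hop=64):
    """Least-squares (pseudo-inverse) STFT via windowed overlap-add."""
    win = np.hanning(n_fft)
    frames = np.fft.irfft(S, n=n_fft, axis=1) * win
    n = n_fft + hop * (len(S) - 1)
    x, norm = np.zeros(n), np.zeros(n)
    for i, frame in enumerate(frames):
        x[i * hop:i * hop + n_fft] += frame
        norm[i * hop:i * hop + n_fft] += win ** 2
    return x / np.maximum(norm, 1e-8)

def griffin_lim(mag, n_iter=30, n_fft=256, hop=64, seed=0):
    """Decoder: estimate a phase for `mag` by alternating projections."""
    rng = np.random.default_rng(seed)
    phase = np.exp(2j * np.pi * rng.random(mag.shape))
    for _ in range(n_iter):
        x = istft(mag * phase, n_fft, hop)
        phase = np.exp(1j * np.angle(stft(x, n_fft, hop)))
    return istft(mag * phase, n_fft, hop)

def fit_ppca(Y, q):
    """Closed-form maximum-likelihood PPCA fit on the rows of Y."""
    N, d = Y.shape
    mu = Y.mean(axis=0)
    _, s, Vt = np.linalg.svd(Y - mu, full_matrices=False)
    lam = s ** 2 / N                      # eigenvalues of the sample covariance
    sigma2 = max((lam.sum() - lam[:q].sum()) / (d - q), 0.0)
    W = Vt[:q].T * np.sqrt(np.maximum(lam[:q] - sigma2, 0.0))
    return mu, W, sigma2

def sample_ppca(mu, W, sigma2, rng):
    """Draw y = W z + mu + noise with z ~ N(0, I) and noise ~ N(0, sigma2 I)."""
    z = rng.standard_normal(W.shape[1])
    return mu + W @ z + np.sqrt(sigma2) * rng.standard_normal(mu.shape)

# Synthetic stand-in data (assumption): 20 sinusoids with random frequencies.
rng = np.random.default_rng(0)
sr, n = 8000, 4096
t = np.arange(n) / sr
clips = [np.sin(2 * np.pi * rng.uniform(200, 2000) * t) for _ in range(20)]

mags = np.stack([np.abs(stft(c)) for c in clips])        # (clips, frames, bins)
mu, W, sigma2 = fit_ppca(mags.reshape(len(mags), -1), q=5)

# Sample a new magnitude spectrogram, clip it to be non-negative, and decode.
new_mag = np.maximum(sample_ppca(mu, W, sigma2, rng), 0.0).reshape(mags.shape[1:])
audio = griffin_lim(new_mag)
```

In this sketch the generative step happens entirely in the spectrogram domain: PPCA supplies a Gaussian latent variable model over encoder outputs, and the fixed decoder maps each sample back to a waveform. A real experiment would replace the synthetic clips with dataset audio and would likely need a different latent dimension `q`.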