Deep Kernelized Autoencoders
Abstract
In this paper we introduce the deep kernelized autoencoder, a neural network model that allows an explicit approximation of (i) the mapping from an input space to an arbitrary, user-specified kernel space and (ii) the back-projection from that kernel space to the input space. The proposed method is based on traditional autoencoders and is trained through a new unsupervised loss function. During training, we optimize both the reconstruction accuracy of input samples and the alignment between a kernel matrix given as a prior and the inner products of the hidden representations computed by the autoencoder. Kernel alignment provides control over the hidden representation learned by the autoencoder. Experiments evaluate both reconstruction and kernel alignment performance. Additionally, we applied our method to emulate kernel PCA (kPCA) on a denoising task, obtaining promising results.
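The training objective described above can be sketched as follows. This is a minimal NumPy illustration, not the authors' implementation: the function names, the normalized-Frobenius form of the alignment, and the weighting scheme `lam` are assumptions made for clarity, and the paper's exact formulation may differ.

```python
import numpy as np

def kernel_alignment(C, K):
    """Normalized Frobenius inner product between the Gram matrix of the
    hidden codes, C C^T, and a prior kernel matrix K (assumed form;
    higher values mean the codes better match the prior kernel)."""
    G = C @ C.T
    return np.sum(G * K) / (np.linalg.norm(G) * np.linalg.norm(K))

def dkae_loss(X, X_rec, C, K, lam=0.1):
    """Sketch of the combined unsupervised objective: reconstruction MSE
    plus a (negatively weighted) alignment term, so minimizing the loss
    rewards both accurate reconstruction and high kernel alignment.
    `lam` is a hypothetical trade-off hyperparameter."""
    mse = np.mean((X - X_rec) ** 2)
    return mse - lam * kernel_alignment(C, K)
```

In a real model, `C` would be the mini-batch of hidden representations produced by the encoder and `X_rec` the decoder's output, with gradients flowing through both terms.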
Keywords
Autoencoders · Kernel methods · Deep learning · Representation learning
Acknowledgments
We gratefully acknowledge the support of NVIDIA Corporation with the donation of the GPU used for this research. This work was partially funded by the Norwegian Research Council FRIPRO grant no. 239844 on developing the Next Generation Learning Machines.