Abstract
Although deep learning achieves high performance in pattern recognition and machine learning, the reasons for its success remain poorly understood. To address this problem, we computed information-theoretic quantities of the representations in the hidden layers and analyzed their relationship to performance. We found that the entropy and the mutual information decrease in different ways as the layers get deeper. This suggests that these information-theoretic quantities may serve as a criterion for determining the number of layers in deep learning.
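The quantities named in the abstract can be estimated from sampled activations. The following is a minimal sketch, not the authors' method: it assumes histogram binning of a one-dimensional projection of each layer's representation, and estimates entropy and mutual information from the resulting empirical distributions. The network, bin count, and projection choice are all illustrative assumptions.

```python
import numpy as np

def entropy(x, bins=10):
    """Shannon entropy (bits) of a 1-D sample, estimated by histogram binning."""
    counts, _ = np.histogram(x, bins=bins)
    p = counts[counts > 0] / counts.sum()
    return -np.sum(p * np.log2(p))

def mutual_information(x, y, bins=10):
    """I(X;Y) = H(X) + H(Y) - H(X,Y), estimated from a 2-D histogram."""
    pxy, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = pxy / pxy.sum()
    px, py = pxy.sum(axis=1), pxy.sum(axis=0)
    hx = -np.sum(px[px > 0] * np.log2(px[px > 0]))
    hy = -np.sum(py[py > 0] * np.log2(py[py > 0]))
    hxy = -np.sum(pxy[pxy > 0] * np.log2(pxy[pxy > 0]))
    return hx + hy - hxy

# Toy example: propagate random inputs through a few tanh layers and
# track entropy of a projection and its mutual information with the input.
rng = np.random.default_rng(0)
inputs = rng.normal(size=(1000, 20))
h = inputs
for layer in range(3):
    W = rng.normal(size=(h.shape[1], h.shape[1])) / np.sqrt(h.shape[1])
    h = np.tanh(h @ W)          # saturating nonlinearity compresses activations
    proj = h[:, 0]              # illustrative 1-D projection of the layer
    print(layer, entropy(proj), mutual_information(inputs[:, 0], proj))
```

With such estimators one can plot both quantities against layer depth, which is the kind of layer-wise comparison the abstract describes.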
Cite this paper
Furusho, Y., Kubo, T., Ikeda, K. (2015). Information Theoretical Analysis of Deep Learning Representations. In: Arik, S., Huang, T., Lai, W., Liu, Q. (eds) Neural Information Processing. ICONIP 2015. Lecture Notes in Computer Science(), vol 9489. Springer, Cham. https://doi.org/10.1007/978-3-319-26532-2_66
Print ISBN: 978-3-319-26531-5
Online ISBN: 978-3-319-26532-2