Abstract
We present a novel method for training predictive Gaussian distributions p(z|x) for regression problems with neural networks. While most approaches either ignore the variance or model it explicitly as an additional response variable, our method trains it implicitly. Stochasticity is established by injecting noise into the input and hidden units; the resulting output distributions are then approximated as Gaussians via the forward propagation method introduced for fast dropout [1]. The loss function is designed to respect this probabilistic interpretation of the output units. We evaluate the method on a synthetic task and an inverse robot dynamics task, where it yields better likelihoods than plain neural networks, Gaussian processes and LWPR.
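The core idea of the abstract — propagate a mean and a variance through the network and score the targets with a Gaussian likelihood — can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the single linear layer, and the fixed input-noise level are illustrative assumptions, and the variance propagation shown is only exact for linear maps with uncorrelated inputs (fast dropout [1] extends it approximately through nonlinearities).

```python
import numpy as np

def inject_noise(mu, var, noise_std):
    """Additive input noise raises the variance but leaves the mean unchanged."""
    return mu, var + noise_std ** 2

def propagate_linear(mu, var, W, b):
    """Push mean and variance through a linear layer.
    Exact for a linear map, assuming uncorrelated input units."""
    out_mu = W @ mu + b
    out_var = (W ** 2) @ var
    return out_mu, out_var

def gaussian_nll(y, mu, var):
    """Negative log-likelihood of targets under the predicted Gaussian,
    i.e. a loss that respects the probabilistic output interpretation."""
    return 0.5 * np.mean(np.log(2 * np.pi * var) + (y - mu) ** 2 / var)

# Toy usage with an illustrative 3-to-1 regression layer.
rng = np.random.default_rng(0)
W = rng.normal(size=(1, 3)) * 0.1
b = np.zeros(1)
x = np.array([0.5, -0.2, 1.0])

mu, var = inject_noise(x, np.zeros(3), noise_std=0.1)  # deterministic input + noise
mu, var = propagate_linear(mu, var, W, b)
loss = gaussian_nll(np.array([0.3]), mu, var)
```

Because the injected noise is the only source of variance here, the predicted variance — and with it the likelihood — is shaped implicitly by the weights rather than by a separate variance output, which is the sense in which the variance is "trained implicitly".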
References
Wang, S., Manning, C.: Fast dropout training. In: Proceedings of the 30th International Conference on Machine Learning (ICML 2013), pp. 118–126 (2013)
Krizhevsky, A., Sutskever, I., Hinton, G.: ImageNet classification with deep convolutional neural networks. Advances in Neural Information Processing Systems 25, 1106–1114 (2012)
Dahl, G.E., Yu, D., Deng, L., Acero, A.: Context-dependent pre-trained deep neural networks for large-vocabulary speech recognition. IEEE Transactions on Audio, Speech, and Language Processing 20(1), 30–42 (2012)
Larochelle, H., Erhan, D., Courville, A., Bergstra, J., Bengio, Y.: An empirical evaluation of deep architectures on problems with many factors of variation. In: Proceedings of the 24th International Conference on Machine Learning, pp. 473–480. ACM (2007)
Hinton, G.E., Salakhutdinov, R.R.: Reducing the dimensionality of data with neural networks. Science 313(5786), 504–507 (2006)
Ciresan, D., Meier, U., Schmidhuber, J.: Multi-column deep neural networks for image classification. In: 2012 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 3642–3649. IEEE (2012)
Zeiler, M., Ranzato, M., Monga, R., Mao, M., Yang, K., Le, Q., Nguyen, P., Senior, A., Vanhoucke, V., Dean, J., et al.: On rectified linear units for speech processing. In: ICASSP (2013)
Hinton, G.E., Srivastava, N., Krizhevsky, A., Sutskever, I., Salakhutdinov, R.R.: Improving neural networks by preventing co-adaptation of feature detectors. arXiv preprint arXiv:1207.0580 (2012)
Neal, R.M.: Connectionist learning of belief networks. Artificial Intelligence 56(1), 71–113 (1992)
Bengio, Y., Thibodeau-Laufer, É.: Deep generative stochastic networks trainable by backprop (2013)
Rumelhart, D.E., Hinton, G.E., Williams, R.J.: Learning representations by back-propagating errors. Nature 323(6088), 533–536 (1986)
Tang, Y., Salakhutdinov, R.: A new learning algorithm for stochastic feedforward neural nets (2013)
Bengio, Y.: Estimating or propagating gradients through stochastic neurons. arXiv preprint arXiv:1305.2982 (2013)
Salakhutdinov, R., Hinton, G.: Using deep belief nets to learn covariance kernels for Gaussian processes. Advances in Neural Information Processing Systems 20, 1249–1256 (2008)
Uria, B., Murray, I., Renals, S., Richmond, K.: Deep architectures for articulatory inversion. In: Proceedings of Interspeech (2012)
Bishop, C.M.: Mixture density networks (1994)
Werbos, P.: Beyond regression: New tools for prediction and analysis in the behavioral sciences (1974)
Le Cun, Y.: Learning process in an asymmetric threshold network. In: Disordered Systems and Biological Organization, pp. 233–240. Springer (1986)
Bishop, C.M., et al.: Pattern recognition and machine learning, vol. 1. Springer, New York (2006)
Julier, S.J., Uhlmann, J.K.: New extension of the Kalman filter to nonlinear systems. In: AeroSense 1997, International Society for Optics and Photonics, pp. 182–193 (1997)
Vijayakumar, S., D’souza, A., Schaal, S.: Incremental online learning in high dimensions. Neural Computation 17(12), 2602–2634 (2005)
Rasmussen, C.E., Williams, C.K.I.: Gaussian Processes for Machine Learning. MIT Press (2006)
LeCun, Y.A., Bottou, L., Orr, G.B., Müller, K.-R.: Efficient BackProp. In: Orr, G.B., Müller, K.-R. (eds.) NIPS-WS 1996. LNCS, vol. 1524, pp. 9–50. Springer, Heidelberg (1998)
Bergstra, J., Bengio, Y.: Random search for hyper-parameter optimization. The Journal of Machine Learning Research 13, 281–305 (2012)
Tieleman, T., Hinton, G.: Lecture 6.5 - rmsprop: Divide the gradient by a running average of its recent magnitude. In: COURSERA: Neural Networks for Machine Learning (2012)
Sutskever, I.: Training Recurrent Neural Networks. PhD thesis, University of Toronto (2013)
Sutskever, I., Martens, J., Dahl, G., Hinton, G.: On the importance of initialization and momentum in deep learning. In: Proceedings of the 30th International Conference on Machine Learning (ICML 2013) (2013)
Le, Q.V., Smola, A.J., Canu, S.: Heteroscedastic Gaussian process regression. In: Proceedings of the 22nd International Conference on Machine Learning, pp. 489–496. ACM (2005)
© 2013 Springer-Verlag Berlin Heidelberg
Bayer, J., Osendorfer, C., Urban, S., van der Smagt, P. (2013). Training Neural Networks with Implicit Variance. In: Lee, M., Hirose, A., Hou, ZG., Kil, R.M. (eds) Neural Information Processing. ICONIP 2013. Lecture Notes in Computer Science, vol 8227. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-42042-9_17
Print ISBN: 978-3-642-42041-2
Online ISBN: 978-3-642-42042-9