Training Neural Networks with Implicit Variance

Conference paper
Neural Information Processing (ICONIP 2013)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 8227)

Abstract

We present a novel method for training predictive Gaussian distributions p(z|x) for regression problems with neural networks. While most approaches either ignore the variance or model it explicitly as an additional response variable, in our case it is trained implicitly. Stochasticity is established by injecting noise into the input and hidden units; the outputs are then approximated with a Gaussian distribution via the forward propagation method introduced for fast dropout [1]. The loss function is designed to respect this probabilistic interpretation of the output units. The method is evaluated on a synthetic task and on an inverse robot dynamics task, yielding performance superior to plain neural networks, Gaussian processes, and LWPR in terms of likelihood.
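To make the mechanism in the abstract concrete, the following is a minimal NumPy sketch of the two ingredients it describes: propagating a mean and a variance through a linear layer whose inputs carry dropout-style Bernoulli noise, in the moment-matching spirit of fast dropout [1], and scoring the resulting Gaussian prediction with its negative log-likelihood. This is an illustrative sketch only, not the authors' implementation; the function names, the keep probability, and the restriction to a single linear layer (no propagation through hidden nonlinearities, no training loop) are assumptions made for brevity.

```python
import numpy as np


def moment_propagate_linear(W, b, mean_in, var_in, keep_prob=0.9):
    """Propagate a diagonal Gaussian through a linear layer whose inputs are
    corrupted by Bernoulli (dropout-style) noise, in the moment-matching
    spirit of fast dropout [1].

    mean_in, var_in: per-unit mean and variance of the layer input.
    Returns the per-unit mean and variance of the pre-activation.
    """
    # First two moments of the noisy input d * x with d ~ Bernoulli(keep_prob):
    #   E[d x]   = keep_prob * mean_in
    #   Var[d x] = keep_prob * (var_in + mean_in**2) - (keep_prob * mean_in)**2
    m = keep_prob * mean_in
    v = keep_prob * (var_in + mean_in ** 2) - (keep_prob * mean_in) ** 2
    # Weighted sum of (approximately) independent units: means combine with W,
    # variances combine with W squared.
    return W @ m + b, (W ** 2) @ v


def gaussian_nll(y, mean, var, eps=1e-8):
    """Negative log-likelihood of targets y under N(mean, var) -- the loss
    that makes the implicitly predicted variance matter during training."""
    var = var + eps
    return 0.5 * (np.log(2.0 * np.pi * var) + (y - mean) ** 2 / var)


# Toy usage: a single linear "network" mapping a 3-d input to a 1-d output.
rng = np.random.default_rng(0)
W, b = rng.normal(size=(1, 3)), np.zeros(1)
x = np.array([0.5, -1.0, 2.0])
mean, var = moment_propagate_linear(W, b, mean_in=x, var_in=np.zeros(3))
print(gaussian_nll(np.array([0.3]), mean, var))
```

In the full method, the same moment propagation is carried through the hidden layers as well, and the network weights are trained by backpropagating through the resulting likelihood.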

References

  1. Wang, S., Manning, C.: Fast dropout training. In: Proceedings of the 30th International Conference on Machine Learning (ICML 2013), pp. 118–126 (2013)

  2. Krizhevsky, A., Sutskever, I., Hinton, G.: ImageNet classification with deep convolutional neural networks. Advances in Neural Information Processing Systems 25, 1106–1114 (2012)

  3. Dahl, G.E., Yu, D., Deng, L., Acero, A.: Context-dependent pre-trained deep neural networks for large-vocabulary speech recognition. IEEE Transactions on Audio, Speech, and Language Processing 20(1), 30–42 (2012)

  4. Larochelle, H., Erhan, D., Courville, A., Bergstra, J., Bengio, Y.: An empirical evaluation of deep architectures on problems with many factors of variation. In: Proceedings of the 24th International Conference on Machine Learning, pp. 473–480. ACM (2007)

  5. Hinton, G.E., Salakhutdinov, R.R.: Reducing the dimensionality of data with neural networks. Science 313(5786), 504–507 (2006)

  6. Ciresan, D., Meier, U., Schmidhuber, J.: Multi-column deep neural networks for image classification. In: 2012 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 3642–3649. IEEE (2012)

  7. Zeiler, M., Ranzato, M., Monga, R., Mao, M., Yang, K., Le, Q., Nguyen, P., Senior, A., Vanhoucke, V., Dean, J., et al.: On rectified linear units for speech processing. In: ICASSP (2013)

  8. Hinton, G.E., Srivastava, N., Krizhevsky, A., Sutskever, I., Salakhutdinov, R.R.: Improving neural networks by preventing co-adaptation of feature detectors. arXiv preprint arXiv:1207.0580 (2012)

  9. Neal, R.M.: Connectionist learning of belief networks. Artificial Intelligence 56(1), 71–113 (1992)

  10. Bengio, Y., Thibodeau-Laufer, É.: Deep generative stochastic networks trainable by backprop (2013)

  11. Rumelhart, D.E., Hinton, G.E., Williams, R.J.: Learning representations by back-propagating errors. Nature 323(6088), 533–536 (1986)

  12. Tang, Y., Salakhutdinov, R.: A new learning algorithm for stochastic feedforward neural nets (2013)

  13. Bengio, Y.: Estimating or propagating gradients through stochastic neurons. arXiv preprint arXiv:1305.2982 (2013)

  14. Salakhutdinov, R., Hinton, G.: Using deep belief nets to learn covariance kernels for Gaussian processes. Advances in Neural Information Processing Systems 20, 1249–1256 (2008)

  15. Uria, B., Murray, I., Renals, S., Richmond, K.: Deep architectures for articulatory inversion. In: Proceedings of Interspeech (2012)

  16. Bishop, C.M.: Mixture density networks (1994)

  17. Werbos, P.: Beyond regression: New tools for prediction and analysis in the behavioral sciences (1974)

  18. Le Cun, Y.: Learning process in an asymmetric threshold network. In: Disordered Systems and Biological Organization, pp. 233–240. Springer (1986)

  19. Bishop, C.M., et al.: Pattern recognition and machine learning, vol. 1. Springer, New York (2006)

  20. Julier, S.J., Uhlmann, J.K.: New extension of the Kalman filter to nonlinear systems. In: AeroSense 1997, pp. 182–193. International Society for Optics and Photonics (1997)

  21. Vijayakumar, S., D’souza, A., Schaal, S.: Incremental online learning in high dimensions. Neural Computation 17(12), 2602–2634 (2005)

  22. Rasmussen, C.E.: Gaussian processes for machine learning. Citeseer (2006)

  23. LeCun, Y.A., Bottou, L., Orr, G.B., Müller, K.-R.: Efficient BackProp. In: Orr, G.B., Müller, K.-R. (eds.) NIPS-WS 1996. LNCS, vol. 1524, pp. 9–50. Springer, Heidelberg (1998)

  24. Bergstra, J., Bengio, Y.: Random search for hyper-parameter optimization. The Journal of Machine Learning Research 13, 281–305 (2012)

  25. Tieleman, T., Hinton, G.: Lecture 6.5 - rmsprop: Divide the gradient by a running average of its recent magnitude. In: COURSERA: Neural Networks for Machine Learning (2012)

  26. Sutskever, I.: Training Recurrent Neural Networks. PhD thesis, University of Toronto (2013)

  27. Sutskever, I., Martens, J., Dahl, G., Hinton, G.: On the importance of initialization and momentum in deep learning (2013)

  28. Le, Q.V., Smola, A.J., Canu, S.: Heteroscedastic Gaussian process regression. In: Proceedings of the 22nd International Conference on Machine Learning, pp. 489–496. ACM (2005)

Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Bayer, J., Osendorfer, C., Urban, S., van der Smagt, P. (2013). Training Neural Networks with Implicit Variance. In: Lee, M., Hirose, A., Hou, ZG., Kil, R.M. (eds) Neural Information Processing. ICONIP 2013. Lecture Notes in Computer Science, vol 8227. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-42042-9_17

  • DOI: https://doi.org/10.1007/978-3-642-42042-9_17

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-42041-2

  • Online ISBN: 978-3-642-42042-9
