Training Neural Networks with Implicit Variance

  • Justin Bayer
  • Christian Osendorfer
  • Sebastian Urban
  • Patrick van der Smagt
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8227)


We present a novel method to train predictive Gaussian distributions p(z|x) for regression problems with neural networks. While most approaches either ignore the variance or model it explicitly as another response variable, in our case it is trained implicitly. We establish stochasticity by injecting noise into the input and hidden units, and approximate the outputs with a Gaussian distribution using the forward propagation method introduced for fast dropout [1]. We have designed our method to respect that probabilistic interpretation of the output units in the loss function. The method is evaluated on a synthetic task and an inverse robot dynamics task, yielding superior performance to plain neural networks, Gaussian processes and LWPR in terms of likelihood.
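The idea of propagating a Gaussian approximation through a noisy layer and scoring it with a likelihood-based loss can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a single linear output unit, independent inputs, Bernoulli (dropout-style) noise with a hypothetical keep probability `keep_prob`, and a Gaussian negative log-likelihood as the loss. The function names `linear_moments` and `gaussian_nll` are invented for this illustration.

```python
import math


def linear_moments(means, variances, weights, bias, keep_prob):
    """Mean and variance of a = sum_i w_i * d_i * x_i + bias.

    Assumes (for illustration) that the inputs x_i are independent with the
    given means and variances, and that d_i ~ Bernoulli(keep_prob) is
    independent multiplicative noise on each input.
    """
    out_mean = bias
    out_var = 0.0
    for m, s, w in zip(means, variances, weights):
        # E[d * x] = keep_prob * m
        out_mean += w * keep_prob * m
        # Var[d * x] = keep_prob * (1 - keep_prob) * m^2 + keep_prob * s
        out_var += w * w * (keep_prob * (1.0 - keep_prob) * m * m + keep_prob * s)
    return out_mean, out_var


def gaussian_nll(target, mean, var):
    """Negative log-likelihood of target under a Gaussian N(mean, var)."""
    return 0.5 * math.log(2.0 * math.pi * var) + (target - mean) ** 2 / (2.0 * var)


# Deterministic inputs (zero variance): all output variance comes from the noise.
mu, var = linear_moments(
    means=[1.0, 2.0], variances=[0.0, 0.0],
    weights=[0.5, -0.25], bias=0.1, keep_prob=0.5,
)
loss = gaussian_nll(0.2, mu, var)
```

In this sketch the output variance is never predicted by a separate output unit. It arises only from the injected noise and the weights, so minimising the likelihood-based loss adjusts the variance through the same parameters that determine the mean.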


neural networks, predictive distributions, deep learning




References

  1. Wang, S., Manning, C.: Fast dropout training. In: Proceedings of the 30th International Conference on Machine Learning (ICML 2013), pp. 118–126 (2013)
  2. Krizhevsky, A., Sutskever, I., Hinton, G.: ImageNet classification with deep convolutional neural networks. Advances in Neural Information Processing Systems 25, 1106–1114 (2012)
  3. Dahl, G.E., Yu, D., Deng, L., Acero, A.: Context-dependent pre-trained deep neural networks for large-vocabulary speech recognition. IEEE Transactions on Audio, Speech, and Language Processing 20(1), 30–42 (2012)
  4. Larochelle, H., Erhan, D., Courville, A., Bergstra, J., Bengio, Y.: An empirical evaluation of deep architectures on problems with many factors of variation. In: Proceedings of the 24th International Conference on Machine Learning, pp. 473–480. ACM (2007)
  5. Hinton, G.E., Salakhutdinov, R.R.: Reducing the dimensionality of data with neural networks. Science 313(5786), 504–507 (2006)
  6. Ciresan, D., Meier, U., Schmidhuber, J.: Multi-column deep neural networks for image classification. In: 2012 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 3642–3649. IEEE (2012)
  7. Zeiler, M., Ranzato, M., Monga, R., Mao, M., Yang, K., Le, Q., Nguyen, P., Senior, A., Vanhoucke, V., Dean, J., et al.: On rectified linear units for speech processing. In: ICASSP (2013)
  8. Hinton, G.E., Srivastava, N., Krizhevsky, A., Sutskever, I., Salakhutdinov, R.R.: Improving neural networks by preventing co-adaptation of feature detectors. arXiv preprint arXiv:1207.0580 (2012)
  9. Neal, R.M.: Connectionist learning of belief networks. Artificial Intelligence 56(1), 71–113 (1992)
  10. Bengio, Y., Thibodeau-Laufer, É.: Deep generative stochastic networks trainable by backprop (2013)
  11. Rumelhart, D.E., Hinton, G.E., Williams, R.J.: Learning representations by back-propagating errors. Nature 323(6088), 533–536 (1986)
  12. Tang, Y., Salakhutdinov, R.: A new learning algorithm for stochastic feedforward neural nets (2013)
  13. Bengio, Y.: Estimating or propagating gradients through stochastic neurons. arXiv preprint arXiv:1305.2982 (2013)
  14. Salakhutdinov, R., Hinton, G.: Using deep belief nets to learn covariance kernels for Gaussian processes. Advances in Neural Information Processing Systems 20, 1249–1256 (2008)
  15. Uria, B., Murray, I., Renals, S., Richmond, K.: Deep architectures for articulatory inversion. In: Proceedings of Interspeech (2012)
  16. Bishop, C.M.: Mixture density networks (1994)
  17. Werbos, P.: Beyond regression: New tools for prediction and analysis in the behavioral sciences (1974)
  18. Le Cun, Y.: Learning process in an asymmetric threshold network. In: Disordered Systems and Biological Organization, pp. 233–240. Springer (1986)
  19. Bishop, C.M., et al.: Pattern recognition and machine learning, vol. 1. Springer, New York (2006)
  20. Julier, S.J., Uhlmann, J.K.: New extension of the Kalman filter to nonlinear systems. In: AeroSense 1997 International Society for Optics and Photonics, pp. 182–193 (1997)
  21. Vijayakumar, S., D’souza, A., Schaal, S.: Incremental online learning in high dimensions. Neural Computation 17(12), 2602–2634 (2005)
  22. Rasmussen, C.E.: Gaussian processes for machine learning. Citeseer (2006)
  23. LeCun, Y.A., Bottou, L., Orr, G.B., Müller, K.-R.: Efficient BackProp. In: Orr, G.B., Müller, K.-R. (eds.) NIPS-WS 1996. LNCS, vol. 1524, pp. 9–50. Springer, Heidelberg (1998)
  24. Bergstra, J., Bengio, Y.: Random search for hyper-parameter optimization. The Journal of Machine Learning Research 13, 281–305 (2012)
  25. Tieleman, T., Hinton, G.: Lecture 6.5 - rmsprop: Divide the gradient by a running average of its recent magnitude. In: COURSERA: Neural Networks for Machine Learning (2012)
  26. Sutskever, I.: Training Recurrent Neural Networks. PhD thesis, University of Toronto (2013)
  27. Sutskever, I., Martens, J., Dahl, G., Hinton, G.: On the importance of initialization and momentum in deep learning (2013)
  28. Le, Q.V., Smola, A.J., Canu, S.: Heteroscedastic Gaussian process regression. In: Proceedings of the 22nd International Conference on Machine Learning, pp. 489–496. ACM (2005)

Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Justin Bayer (1)
  • Christian Osendorfer (1)
  • Sebastian Urban (1)
  • Patrick van der Smagt (1)

  1. Fakultät für Informatik, Lehrstuhl für Robotik und Echtzeitsysteme, Technische Universität München, München, Germany
