Uncertainty Modelling in Deep Networks: Forecasting Short and Noisy Series

  • Axel Brando
  • Jose A. Rodríguez-Serrano
  • Mauricio Ciprian
  • Roberto Maestre
  • Jordi Vitrià
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11053)


Deep Learning is a consolidated, state-of-the-art Machine Learning tool for fitting a function \(y=f(x)\) given large data sets of examples \(\{(x_i, y_i)\}\). In regression tasks, however, the straightforward application of Deep Learning models provides only a point estimate of the target and does not account for the uncertainty of a prediction. This is a serious limitation for tasks where communicating an erroneous prediction carries a risk. In this paper we tackle a real-world problem: forecasting impending financial expenses and income of customers, with the predicted monetary amounts displayed in a mobile app. In this context, we investigate whether Deep Learning models benefit from a heteroscedastic model of the variance of the network's output. Experimentally, we achieve higher accuracy than non-trivial baselines. More importantly, we introduce a mechanism to discard low-confidence predictions so that they are never shown to users, which should enhance the user experience of our product.
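The heteroscedastic approach summarised above can be sketched as follows. This is a minimal NumPy illustration, not the authors' implementation: a network is assumed to output a per-input mean `mu` and log-variance `log_sigma2`, trained with the Gaussian negative log-likelihood, and the threshold `max_sigma` used to hide low-confidence forecasts is a hypothetical parameter chosen for illustration.

```python
import numpy as np

def gaussian_nll(y, mu, log_sigma2):
    """Heteroscedastic Gaussian negative log-likelihood (constant term dropped).

    The network predicts both a mean mu and a log-variance log_sigma2 for
    each input, so residuals on noisy points are down-weighted by their
    predicted variance instead of being penalised uniformly.
    """
    return 0.5 * (log_sigma2 + (y - mu) ** 2 / np.exp(log_sigma2))

def filter_by_confidence(mu, log_sigma2, max_sigma):
    """Keep only predictions whose predicted standard deviation is at most
    max_sigma, mirroring the idea of discarding low-confidence forecasts
    before they reach the user."""
    sigma = np.exp(0.5 * log_sigma2)   # std dev recovered from log-variance
    keep = sigma <= max_sigma
    return mu[keep], keep

# Toy example: two predictions, the second with high predicted variance.
y = np.array([1.0, 2.0])
mu = np.array([1.0, 0.0])
log_s2 = np.array([0.0, np.log(4.0)])   # variances 1.0 and 4.0

loss = gaussian_nll(y, mu, log_s2)
shown, keep = filter_by_confidence(mu, log_s2, max_sigma=1.5)
```

In a real model, `gaussian_nll` would be the training loss of a network with two output heads (mean and log-variance), as in the mixture density network literature the paper builds on.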


Keywords: Deep Learning · Uncertainty · Aleatoric models · Time-series



We gratefully acknowledge the Industrial Doctorates Plan of the Generalitat de Catalunya for funding part of this research. The UB acknowledges the support of NVIDIA Corporation with the donation of a Titan X Pascal GPU, and the research described in this chapter was partially funded by TIN2015-66951-C2 and SGR 1219. We also thank Alberto Rúbio and César de Pablo for insightful comments, as well as BBVA Data and Analytics for sponsoring the industrial PhD.



Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. BBVA Data and Analytics, Barcelona, Spain
  2. BBVA Data and Analytics, Madrid, Spain
  3. Departament de Matemàtiques i Informàtica, Universitat de Barcelona, Barcelona, Spain
