
TRADI: Tracking Deep Neural Network Weight Distributions

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 12362)

Abstract

During training, the weights of a Deep Neural Network (DNN) are optimized from a random initialization towards a near-optimal value that minimizes a loss function. Typically, only this final state of the weights is kept for testing, while the wealth of information on the geometry of the weight space, accumulated over the descent towards the minimum, is discarded. In this work we propose to make use of this knowledge and leverage it to compute the distributions of the weights of the DNN. These distributions can in turn be used to estimate the epistemic uncertainty of the DNN by aggregating predictions from an ensemble of networks sampled from them. To this end we introduce a method for tracking the trajectory of the weights during optimization that requires no change to the architecture or to the training procedure. We evaluate our method, TRADI, on standard classification and regression benchmarks, and on out-of-distribution detection for classification and semantic segmentation. We achieve competitive results while preserving computational efficiency in comparison to ensemble approaches.
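To make the idea in the abstract concrete, the sketch below (not the authors' implementation) illustrates it in PyTorch: per-weight statistics are accumulated along the training trajectory, and at test time an ensemble of networks is sampled from the resulting per-weight Gaussians to aggregate predictions. The class name WeightTracker, the exponential-moving-average update, the momentum value, and the toy model are illustrative assumptions; TRADI's actual tracking rule differs from this simple moving-average stand-in, so treat this only as a sketch of the general approach.

import copy
import torch
import torch.nn as nn

class WeightTracker:
    """Track an exponential-moving-average estimate of the mean and variance
    of every parameter along the optimization trajectory."""

    def __init__(self, model, momentum=0.99):
        self.momentum = momentum
        self.mean = {n: p.detach().clone() for n, p in model.named_parameters()}
        self.var = {n: torch.zeros_like(p) for n, p in model.named_parameters()}

    @torch.no_grad()
    def update(self, model):
        # Called after each optimizer step: update running mean and variance.
        m = self.momentum
        for n, p in model.named_parameters():
            self.mean[n].mul_(m).add_(p.detach(), alpha=1.0 - m)
            self.var[n].mul_(m).add_((p.detach() - self.mean[n]) ** 2, alpha=1.0 - m)

    @torch.no_grad()
    def sample_network(self, model):
        # Return a copy of `model` whose weights are drawn from N(mean, var).
        sampled = copy.deepcopy(model)
        for n, p in sampled.named_parameters():
            p.copy_(self.mean[n] + self.var[n].sqrt() * torch.randn_like(p))
        return sampled

# Toy usage: track the weights of a small classifier during SGD, then
# aggregate the predictions of a few sampled networks at test time.
model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 2))
tracker = WeightTracker(model)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
for _ in range(100):
    x, y = torch.randn(64, 10), torch.randint(0, 2, (64,))
    loss = nn.functional.cross_entropy(model(x), y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    tracker.update(model)

ensemble = [tracker.sample_network(model) for _ in range(5)]
with torch.no_grad():
    x_test = torch.randn(8, 10)
    probs = torch.stack([net(x_test).softmax(dim=-1) for net in ensemble]).mean(dim=0)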

Keywords

Deep Neural Networks · Weight distribution · Uncertainty · Ensembles · Out-of-distribution detection

Supplementary material

Supplementary material 1: 504472_1_En_7_MOESM1_ESM.pdf (PDF, 2.4 MB)


Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  1. ENSTA Paris, Institut Polytechnique de Paris, Palaiseau, France
  2. SATIE, Université Paris-Sud, Université Paris-Saclay, Gif-sur-Yvette, France
  3. valeo.ai, Paris, France
  4. CNRS, LIS, Aix Marseille University, Marseille, France
  5. LTCI, Télécom Paris, Institut Polytechnique de Paris, Palaiseau, France
