Towards Efficient MCMC Sampling in Bayesian Neural Networks by Exploiting Symmetry

  • Conference paper
  • In: Machine Learning and Knowledge Discovery in Databases: Research Track (ECML PKDD 2023)

Abstract

Bayesian inference in deep neural networks is challenging due to the high-dimensional, strongly multi-modal parameter posterior density landscape. Markov chain Monte Carlo approaches asymptotically recover the true posterior but are considered prohibitively expensive for large modern architectures. Local methods, which have emerged as a popular alternative, focus on specific parameter regions that can be approximated by functions with tractable integrals. While these often yield satisfactory empirical results, they fail, by definition, to account for the multi-modality of the parameter posterior. Such coarse approximations can be detrimental in practical applications, notably safety-critical ones. In this work, we argue that the dilemma between exact-but-unaffordable and cheap-but-inexact approaches can be mitigated by exploiting symmetries in the posterior landscape. These symmetries, induced by neuron interchangeability and certain activation functions, manifest in different parameter values leading to the same functional output value. We show theoretically that the posterior predictive density in Bayesian neural networks can be restricted to a symmetry-free parameter reference set. By further deriving an upper bound on the number of Monte Carlo chains required to capture the functional diversity, we propose a straightforward approach for feasible Bayesian inference. Our experiments suggest that efficient sampling is indeed possible, opening up a promising path to accurate uncertainty quantification in deep learning.

J. G. Wiese and L. Wimmer—Equal contribution.
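
To make the symmetry described in the abstract concrete, the following minimal sketch (not part of the paper; the architecture, seed, and variable names are illustrative assumptions) checks numerically that a one-hidden-layer tanh network yields identical outputs under two parameter transformations: permuting the hidden neurons together with their weights, and flipping the sign of one neuron's incoming and outgoing weights.

```python
import numpy as np

rng = np.random.default_rng(0)
D, M, K = 3, 5, 2                      # input dim, hidden width, output dim
W1, b1 = rng.normal(size=(M, D)), rng.normal(size=M)
W2, b2 = rng.normal(size=(K, M)), rng.normal(size=K)
x = rng.normal(size=D)

def forward(W1, b1, W2, b2, x):
    # One-hidden-layer tanh network: f(x) = W2 tanh(W1 x + b1) + b2
    return W2 @ np.tanh(W1 @ x + b1) + b2

y = forward(W1, b1, W2, b2, x)

# Symmetry 1: permute the hidden neurons, i.e. the rows of W1 and the entries
# of b1, and permute the columns of W2 consistently.
perm = rng.permutation(M)
y_perm = forward(W1[perm], b1[perm], W2[:, perm], b2, x)

# Symmetry 2: tanh(-a) = -tanh(a), so negating one neuron's incoming weights
# and bias as well as its outgoing weights leaves the output unchanged.
s = np.ones(M)
s[0] = -1.0
y_flip = forward(s[:, None] * W1, s * b1, W2 * s[None, :], b2, x)

print(np.allclose(y, y_perm), np.allclose(y, y_flip))  # True True
```

All three parameter vectors realize the same function and hence the same predictions; this functional redundancy in the parameter posterior is what the proposed restriction to a symmetry-free reference set removes.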


Notes

  1. We assume the likelihood to be parameterized by a single parameter vector. In the case of neural networks (NNs), the parameter contains all weights and biases.

  2. https://github.com/jgwiese/mcmc_bnn_symmetry/.../sub_44_supplementary_material.pdf.

  3. Recall that the pre-activation of neuron i in layer l is \(o_{li} = \sum _{j = 1}^{M_{l-1}} w_{lij}z_{(l-1)j} + b_{li}\). By the commutative property of sums, any permutation \(\pi : J \rightarrow J\) of elements from the set \(J = \{1, \dots , M_{l - 1} \}\) will lead to the same pre-activation: \(o_{li} = \sum _{j \in J} w_{lij}z_{(l-1)j} + b_{li} = \sum _{j \in \pi (J)} w_{lij}z_{(l-1)j} + b_{li}\). (A layer-level version of this argument, added for illustration, is spelled out after these notes.)

  4. [10] demonstrate that finding invariant representations for groups acting on the input space is an NP-hard problem. While we are not aware of such a result for the parameter space, the NP-hardness in [10] for permutations of the inputs suggests, but does not establish, a similar property in our case.

  5. https://github.com/jgwiese/mcmc_bnn_symmetry
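
For illustration, the reindexing argument in note 3 can be written at the level of a whole layer. If the neurons of layer \(l-1\) are relabelled by \(\pi\) and the incoming weights of layer \(l\) are relabelled accordingly (the quantities \(\tilde{z}\) and \(\tilde{w}\) below are introduced here only for this sketch), every pre-activation in layer \(l\) is unchanged:

\(\tilde{z}_{(l-1)j} := z_{(l-1)\pi (j)}, \quad \tilde{w}_{lij} := w_{li\pi (j)}, \quad o_{li} = \sum _{j = 1}^{M_{l-1}} \tilde{w}_{lij}\tilde{z}_{(l-1)j} + b_{li} = \sum _{j = 1}^{M_{l-1}} w_{li\pi (j)}z_{(l-1)\pi (j)} + b_{li} = \sum _{j = 1}^{M_{l-1}} w_{lij}z_{(l-1)j} + b_{li},\)

where the last equality reindexes the sum over \(j\). Applying the same relabelling to the weights and biases that produce \(z_{(l-1)}\) therefore yields a different parameter vector with identical network output.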

References

  1. Agrawal, D., Ostrowski, J.: A classification of G-invariant shallow neural networks. In: Advances in Neural Information Processing Systems (2022)
  2. Ainsworth, S., Hayase, J., Srinivasa, S.: Git Re-Basin: merging models modulo permutation symmetries. In: The Eleventh International Conference on Learning Representations (2023)
  3. Bardenet, R., Kégl, B.: An adaptive Monte-Carlo Markov chain algorithm for inference from mixture signals. J. Phys. Conf. Ser. 368, 012044 (2012)
  4. Bona-Pellissier, J., Bachoc, F., Malgouyres, F.: Parameter identifiability of a deep feedforward ReLU neural network (2021)
  5. Van den Broeck, G., Kersting, K., Natarajan, S., Poole, D.: An Introduction to Lifted Probabilistic Inference. MIT Press, Cambridge (2021)
  6. Chen, A.M., Lu, H.M., Hecht-Nielsen, R.: On the geometry of feedforward neural network error surfaces. Neural Comput. 5(6), 910–927 (1993)
  7. Daxberger, E., Kristiadi, A., Immer, A., Eschenhagen, R., Bauer, M., Hennig, P.: Laplace redux - effortless Bayesian deep learning. In: 35th Conference on Neural Information Processing Systems (NeurIPS 2021) (2021)
  8. Draxler, F., Veschgini, K., Salmhofer, M., Hamprecht, F.: Essentially no barriers in neural network energy landscape. In: Proceedings of the 35th International Conference on Machine Learning, pp. 1309–1318. PMLR (2018)
  9. Dua, D., Graff, C.: UCI Machine Learning Repository (2017)
  10. Ensign, D., Neville, S., Paul, A., Venkatasubramanian, S.: The complexity of explaining neural networks through (group) invariants. In: Proceedings of Machine Learning Research, vol. 76 (2017)
  11. Eschenhagen, R., Daxberger, E., Hennig, P., Kristiadi, A.: Mixtures of Laplace approximations for improved post-hoc uncertainty in deep learning. In: Bayesian Deep Learning Workshop, NeurIPS 2021 (2021)
  12. Garipov, T., Izmailov, P., Podoprikhin, D., Vetrov, D.P., Wilson, A.G.: Loss surfaces, mode connectivity, and fast ensembling of DNNs. In: Proceedings of the 32nd Conference on Neural Information Processing Systems (NeurIPS 2018) (2018)
  13. Gelman, A., Hwang, J., Vehtari, A.: Understanding predictive information criteria for Bayesian models. Stat. Comput. 24(6), 997–1016 (2014)
  14. Graf, S., Luschgy, H.: Foundations of Quantization for Probability Distributions. Springer, Heidelberg (2007). https://doi.org/10.1007/BFb0103945
  15. Hecht-Nielsen, R.: On the algebraic structure of feedforward network weight spaces. In: Advanced Neural Computers, pp. 129–135. Elsevier, Amsterdam (1990)
  16. Hoffman, M.D., Gelman, A.: The No-U-Turn sampler: adaptively setting path lengths in Hamiltonian Monte Carlo. J. Mach. Learn. Res. 15(47), 1593–1623 (2014)
  17. Hüllermeier, E., Waegeman, W.: Aleatoric and epistemic uncertainty in machine learning: an introduction to concepts and methods. Mach. Learn. 110 (2021)
  18. Izmailov, P., Vikram, S., Hoffman, M.D., Wilson, A.G.: What are Bayesian neural network posteriors really like? In: Proceedings of the 38th International Conference on Machine Learning, vol. 139. PMLR (2021)
  19. Kipf, T.N., Welling, M.: Semi-supervised classification with graph convolutional networks. In: ICLR 2017 (2017)
  20. Kůrková, V., Kainen, P.C.: Functionally equivalent feedforward neural networks. Neural Comput. 6(3), 543–558 (1994)
  21. Lakshminarayanan, B., Pritzel, A., Blundell, C.: Simple and scalable predictive uncertainty estimation using deep ensembles. In: Proceedings of the 31st Conference on Neural Information Processing Systems (NIPS 2017) (2017)
  22. MacKay, D.J.C.: Bayesian interpolation. Neural Comput. 4, 415–447 (1992)
  23. Margossian, C.C., Hoffman, M.D., Sountsov, P., Riou-Durand, L., Vehtari, A., Gelman, A.: Nested \(\hat{R}\): assessing the convergence of Markov chain Monte Carlo when running many short chains (2022)
  24. Nalisnick, E.T.: On priors for Bayesian neural networks. Ph.D. thesis, University of California, Irvine (2018)
  25. Niepert, M.: Markov chains on orbits of permutation groups. In: Proceedings of the Twenty-Eighth Conference on Uncertainty in Artificial Intelligence (UAI 2012), pp. 624–633. AUAI Press, Arlington (2012)
  26. Niepert, M.: Symmetry-aware marginal density estimation. In: Proceedings of the Twenty-Seventh AAAI Conference on Artificial Intelligence (AAAI 2013), pp. 725–731. AAAI Press (2013)
  27. Papamarkou, T., Hinkle, J., Young, M.T., Womble, D.: Challenges in Markov chain Monte Carlo for Bayesian neural networks. Stat. Sci. 37(3) (2022)
  28. Pearce, T., Leibfried, F., Brintrup, A.: Uncertainty in neural networks: approximately Bayesian ensembling. In: Proceedings of the Twenty-Third International Conference on Artificial Intelligence and Statistics, Proceedings of Machine Learning Research, vol. 108, pp. 234–244. PMLR (2020)
  29. Petzka, H., Trimmel, M., Sminchisescu, C.: Notes on the symmetries of 2-layer ReLU-networks. In: Northern Lights Deep Learning Workshop, vol. 1 (2020)
  30. Pittorino, F., Ferraro, A., Perugini, G., Feinauer, C., Baldassi, C., Zecchina, R.: Deep networks on toroids: removing symmetries reveals the structure of flat regions in the landscape geometry. In: Proceedings of the 39th International Conference on Machine Learning, vol. 162. PMLR (2022)
  31. Pourzanjani, A.A., Jiang, R.M., Petzold, L.R.: Improving the identifiability of neural networks for Bayesian inference. In: Second Workshop on Bayesian Deep Learning (NIPS) (2017)
  32. Rosenthal, J.S.: Parallel computing and Monte Carlo algorithms. Far East J. Theor. Stat. 4, 207–236 (2000)
  33. Sen, D., Papamarkou, T., Dunson, D.: Bayesian neural networks and dimensionality reduction (2020). arXiv:2008.08044
  34. Sussmann, H.J.: Uniqueness of the weights for minimal feedforward nets with a given input-output map. Neural Netw. 5(4), 589–593 (1992)
  35. Tatro, N.J., Chen, P.Y., Das, P., Melnyk, I., Sattigeri, P., Lai, R.: Optimizing mode connectivity via neuron alignment. In: Proceedings of the 34th Conference on Neural Information Processing Systems (NeurIPS 2020) (2020)
  36. Vlačić, V., Bölcskei, H.: Affine symmetries and neural network identifiability. Adv. Math. 376, 107485 (2021)
  37. Wilson, A.G., Izmailov, P.: Bayesian deep learning and a probabilistic perspective of generalization. In: Proceedings of the 34th International Conference on Neural Information Processing Systems (NeurIPS 2020). Curran Associates Inc., Red Hook (2020)

Acknowledgments

LW is supported by the DAAD programme Konrad Zuse Schools of Excellence in Artificial Intelligence, sponsored by the German Federal Ministry of Education and Research.

Author information

Corresponding author

Correspondence to David Rügamer.



Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Wiese, J.G., Wimmer, L., Papamarkou, T., Bischl, B., Günnemann, S., Rügamer, D. (2023). Towards Efficient MCMC Sampling in Bayesian Neural Networks by Exploiting Symmetry. In: Koutra, D., Plant, C., Gomez Rodriguez, M., Baralis, E., Bonchi, F. (eds) Machine Learning and Knowledge Discovery in Databases: Research Track. ECML PKDD 2023. Lecture Notes in Computer Science, vol. 14169. Springer, Cham. https://doi.org/10.1007/978-3-031-43412-9_27


  • DOI: https://doi.org/10.1007/978-3-031-43412-9_27


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-43411-2

  • Online ISBN: 978-3-031-43412-9

