Abstract
Recent work on adversarial attacks against ReLU neural networks has shown that unbounded linear regions, as well as regions of sufficiently large volume, are prone to containing adversarial samples. Finding a representation of the linear regions and identifying their properties are challenging tasks: in practice one works with deep networks and high-dimensional input data, which leads to polytopes described by a large number of inequalities and hence to high computational costs. Any approach should therefore be scalable, feasible, and numerically stable. We discuss an algorithm that finds the H-representation of each linear region of a neural network and identifies whether the region is bounded or not.
Keywords
- Neural network
- Unbounded polytope
- Linear programming
- ReLU activation function
The research reported in this paper has been partly funded by BMK, BMDW, and the Province of Upper Austria in the frame of the COMET Programme managed by FFG in the COMET Module S3AI.
References
Büeler, B., Enge, A., Fukuda, K.: Exact volume computation for polytopes: a practical study. In: Polytopes-Combinatorics and Computation, pp. 131–154. Springer, Heidelberg (2000). https://doi.org/10.1007/978-3-0348-8438-9_6
Caswell, T.A., et al.: matplotlib/matplotlib: Rel: v3.5.1 (2021)
Emiris, I.Z., Fisikopoulos, V.: Efficient random-walk methods for approximating polytope volume. In: Proceedings of the Thirtieth Annual Symposium on Computational Geometry, pp. 318–327 (2014)
Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330. PMLR (2017)
Harris, C.R., et al.: Array programming with NumPy. Nature 585(7825), 357–362 (2020)
Hein, M., Andriushchenko, M., Bitterwolf, J.: Why ReLU networks yield high-confidence predictions far away from the training data and how to mitigate the problem. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 41–50 (2019)
Hendrycks, D., Gimpel, K.: A baseline for detecting misclassified and out-of-distribution examples in neural networks. arXiv preprint arXiv:1610.02136 (2016)
Hsu, Y.C., Shen, Y., Jin, H., Kira, Z.: Generalized ODIN: detecting out-of-distribution image without learning from out-of-distribution data. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2020)
Huangfu, Q., Hall, J.A.J.: Parallelizing the dual revised simplex method. Math. Program. Comput. 10(1), 119–142 (2017). https://doi.org/10.1007/s12532-017-0130-5
Leibig, C., Allken, V., Ayhan, M.S., Berens, P., Wahl, S.: Leveraging uncertainty information from deep neural networks for disease detection. Sci. Rep. 7(1), 1–14 (2017)
Mangoubi, O., Vishnoi, N.K.: Faster polytope rounding, sampling, and volume computation via a sub-linear ball walk. In: 2019 IEEE 60th Annual Symposium on Foundations of Computer Science (FOCS), pp. 1338–1357. IEEE (2019)
Minderer, M., et al.: Revisiting the calibration of modern neural networks. Adv. Neural Inf. Process. Syst. 34, 1–13 (2021)
Nguyen, A., Yosinski, J., Clune, J.: Deep neural networks are easily fooled: high confidence predictions for unrecognizable images. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2015)
Paszke, A., et al.: PyTorch: an imperative style, high-performance deep learning library. Adv. Neural Inf. Process. Syst. 32, 8024–8035 (2019)
Potra, F.A., Wright, S.J.: Interior-point methods. J. Comput. Appl. Math. 124(1–2), 281–302 (2000)
Shepeleva, N., Zellinger, W., Lewandowski, M., Moser, B.: ReLU code space: a basis for rating network quality besides accuracy. In: ICLR 2020 Workshop on Neural Architecture Search (NAS 2020) (2020)
Stiemke, E.: Über positive Lösungen homogener linearer Gleichungen. Mathematische Annalen 76(2), 340–342 (1915)
Virtanen, P., et al.: SciPy 1.0: fundamental algorithms for scientific computing in Python. Nat. Methods 17(3), 261–272 (2020)
Appendices
Appendix A: Polytope Calculation for an Input Point \(\mathbf {x}\)
Recall that the ReLU neural network \( F(\mathbf {x}) = f_{L} \circ \sigma \circ f_{L-1} \circ \ldots \circ \sigma \circ f_{1}(\mathbf {x})\) is a composition of L affine functions \(f_{i}(\mathbf {x}) = \mathbf {A}_{i}\mathbf {x}+ \mathbf {b}_{i}\), where \(\mathbf {A}_{i} \in \mathbb {R}^{n_{i} \times n_{i-1}}\) and \(\mathbf {b}_{i} \in \mathbb {R}^{n_{i}}\) for all \(i \in \{1, \ldots , L\}\), interleaved with the point-wise non-linear function \(\sigma (x) = \max (x, 0)\). We denote the i-th hidden layer of the network \(F(\mathbf {x})\) by \(\mathbf {a}_{i}(\mathbf {x}) = \sigma \circ f_{i}(\mathbf {x})\).
A binary activation state for an input vector \(\mathbf {x}\in \mathbb {R}^{n_{0}}\) is the function
$$\beta_{k}^{i_{k}}(\mathbf{x}) = \begin{cases} 1 & \text{if } a_{k}^{i_{k}}(\mathbf{x}) > 0,\\ 0 & \text{otherwise,} \end{cases}$$
where \(a_{k}^{i_{k}}(\mathbf {x})\) is the \(i_{k}\)-th output of the k-th hidden layer \(\mathbf {a}_{k}\), for all \(k \in \{1, \ldots , L\}\) and \(i_{k} \in \{1, \ldots , n_{k} \}\).
A polar activation state for an input vector \(\mathbf {x}\in \mathbb {R}^{n_{0}}\) is the function
$$\pi_{k}^{i_{k}}(\mathbf{x}) = \begin{cases} 1 & \text{if } a_{k}^{i_{k}}(\mathbf{x}) > 0,\\ -1 & \text{otherwise,} \end{cases}$$
for all \(k \in \{1, \ldots , L\}\) and \(i_{k} \in \{1, \ldots , n_{k} \}\). Note that we have defined two functions whose codomains are the sets \(\{0, 1\}\) and \(\{-1, 1\}\), respectively. By using \(\beta ^{i_{k}}_{k}(\mathbf {x})\) and \(\pi _{k}^{i_{k}}(\mathbf {x})\), we now collect all states of a layer into diagonal matrices:
$$\mathbf{Q}_{k}^{\beta}(\mathbf{x}) = \operatorname{diag}\bigl(\beta_{k}^{1}(\mathbf{x}), \ldots, \beta_{k}^{n_{k}}(\mathbf{x})\bigr), \qquad \mathbf{Q}_{k}^{\pi}(\mathbf{x}) = \operatorname{diag}\bigl(\pi_{k}^{1}(\mathbf{x}), \ldots, \pi_{k}^{n_{k}}(\mathbf{x})\bigr),$$
where \(k\in \{1, \ldots , L\}\). We will use the matrix \(\mathbf {Q}_{k}^{\beta }(\mathbf {x})\) to model the behavior of the activation function in the k-th layer. For each input vector \(\mathbf {x}\in \mathbb {R}^{n_{0}}\), the matrices \(\mathbf {Q}_{k}^{\pi }(\mathbf {x})\) and \(\mathbf {Q}_{k}^{\beta }(\mathbf {x})\) allow us to derive an H-representation of the corresponding polytope \(\mathbf {H}(\mathbf {x})\in \{ \mathbf {H}_{i}\}_{i=1}^{J}\) in explicit form. More precisely, the H-representation is given as the set of inequalities
$$\mathbf{Q}_{k}^{\pi}(\mathbf{x})\bigl(\mathbf{W}_{k}(\mathbf{x})\,\mathbf{x}' + \mathbf{v}_{k}(\mathbf{x})\bigr) \ge 0 \quad \text{for all } k \in \{1, \ldots , L\}, \qquad (3)$$
in the variable \(\mathbf{x}' \in \mathbb{R}^{n_{0}}\),
where
$$\mathbf{W}_{1}(\mathbf{x}) = \mathbf{A}_{1}, \quad \mathbf{v}_{1}(\mathbf{x}) = \mathbf{b}_{1}, \qquad \mathbf{W}_{k}(\mathbf{x}) = \mathbf{A}_{k}\,\mathbf{Q}_{k-1}^{\beta}(\mathbf{x})\,\mathbf{W}_{k-1}(\mathbf{x}), \quad \mathbf{v}_{k}(\mathbf{x}) = \mathbf{A}_{k}\,\mathbf{Q}_{k-1}^{\beta}(\mathbf{x})\,\mathbf{v}_{k-1}(\mathbf{x}) + \mathbf{b}_{k} \quad \text{for } k \ge 2,$$
such that \(\mathbf {W}_{k}(\mathbf {x}) \in \mathbb {R}^{n_{k} \times n_{0}}\) and \(\mathbf {v}_{k}(\mathbf {x}) \in \mathbb {R}^{n_{k}}\). According to (3), the polytope \(\mathbf {H}(\mathbf {x})\) is defined by exactly \(N = n_{1} + \ldots + n_{L}\) inequalities. However, in practice, the number of half-spaces whose intersection yields the polytope \(\mathbf {H}(\mathbf {x})\) is typically smaller than N, so that the above representation is not minimal in general.
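The recursion above translates directly into code. The following NumPy sketch is our own illustration, not the authors' implementation: the layer sizes, the random initialization, and the function name region_h_representation are assumptions made for the example. It computes the binary and polar state matrices layer by layer and stacks the resulting inequalities into a single system describing the region that contains a given input point.

```python
import numpy as np

def region_h_representation(x, weights, biases):
    """Return (W_all, v_all) such that the linear region containing x is
    { x' : W_all @ x' + v_all >= 0 }  (a sketch of the construction in
    Appendix A; not the authors' code)."""
    rows_W, rows_v = [], []
    W = v = Q_beta = None
    for A, b in zip(weights, biases):
        if W is None:                                   # first layer: pre-activation is A_1 x + b_1
            W, v = A.copy(), b.copy()
        else:                                           # propagate through the previous ReLU pattern
            W, v = A @ (Q_beta @ W), A @ (Q_beta @ v) + b
        pre = W @ x + v                                 # pre-activation of this layer, affine in x
        Q_beta = np.diag((pre > 0).astype(float))       # binary states on the diagonal
        Q_pi = np.diag(np.where(pre > 0, 1.0, -1.0))    # polar states on the diagonal
        rows_W.append(Q_pi @ W)                         # inequalities  Q_pi (W x' + v) >= 0
        rows_v.append(Q_pi @ v)
    return np.vstack(rows_W), np.concatenate(rows_v)

# toy example: a 2 -> 5 -> 4 -> 3 network with random weights (illustrative only)
rng = np.random.default_rng(0)
sizes = [2, 5, 4, 3]
weights = [rng.standard_normal((m, n)) for n, m in zip(sizes[:-1], sizes[1:])]
biases = [rng.standard_normal(m) for m in sizes[1:]]
x0 = rng.standard_normal(sizes[0])

W_all, v_all = region_h_representation(x0, weights, biases)
assert np.all(W_all @ x0 + v_all >= 0)    # x0 satisfies the inequalities of its own region
print(W_all.shape)                        # (n_1 + ... + n_L, n_0) = (12, 2)
```

The number of stacked rows equals \(N = n_{1} + \ldots + n_{L}\), as stated above; redundant half-spaces can be removed afterwards, e.g., by solving one small linear program per inequality.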
Appendix B: Unbounded Linear Region Problem
As mentioned in Appendix A, a ReLU neural network F splits the input space \(\mathbb {R}^{n_{0}}\) into a set of linear regions \(\mathbb {R}^{n_{0}}=\bigcup _{i=1}^{J}\mathbf {H}_{i}\). On each such linear region the network realizes an affine function
$$F_{\mathbf{H}_{i}}(\mathbf{x}_{i}) = \mathbf{A}_{i}\mathbf{x}_{i} + \mathbf{b}_{i},$$
where \(\mathbf {A}_{i} \in \mathbb {R}^{n_{L}\times n_{0}}\), \(\mathbf {b}_{i}\in \mathbb {R}^{n_{L}}\), \(\mathbf {x}_{i} \in \mathbf {H}_{i} \) and \(i \in \{1, \ldots , J\}\). So the given network F is represented by a set of affine functions \(F_{\mathbf {H}_{i}}\), each of which corresponds to the linear region \(\mathbf {H}_{i}\), for all \(i\in \{1, \ldots , J\}\).
Assume that \(\mathbf {A}_{i}\) does not contain identical rows for any \(i\in \{1, \ldots , J\}\). Then for almost all \(\mathbf {x}\in \mathbb {R}^{n_{0}}\) and every \(\varepsilon > 0\), there exist an \(\alpha > 0\) and a class \(k\in \{1, \ldots , n_L\}\) such that for \(\mathbf {z} = \alpha \mathbf {x}\) the following holds:
$$\frac{e^{F^{k}(\mathbf{z})}}{\sum_{l=1}^{n_{L}} e^{F^{l}(\mathbf{z})}} \;\ge\; 1 - \varepsilon, \qquad (6)$$
where \(F^{k}(\mathbf{z})\) denotes the k-th component of \(F(\mathbf{z})\).
Inequality (6) shows that almost any point of the input space \(\mathbb {R}^{n_{0}}\) can be scaled such that the resulting input receives an overconfident output for some class \(k\in \{1, \ldots , n_{L}\}\). See [6] for a proof of this statement.
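As a quick numerical sanity check of this statement (our own toy illustration, not an experiment from the paper; the layer sizes, the random seed, and all names are arbitrary choices), one can scale a fixed input of a randomly initialized ReLU network and observe how the largest softmax confidence behaves as the scaling factor \(\alpha\) grows:

```python
import numpy as np

def relu_net(x, weights, biases):
    """Forward pass of F = f_L ∘ σ ∘ ... ∘ σ ∘ f_1 with σ = ReLU."""
    for A, b in zip(weights[:-1], biases[:-1]):
        x = np.maximum(A @ x + b, 0.0)
    return weights[-1] @ x + biases[-1]

def softmax(z):
    z = z - z.max()               # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

# randomly initialized toy network 2 -> 8 -> 8 -> 3 (illustrative sizes)
rng = np.random.default_rng(1)
sizes = [2, 8, 8, 3]
weights = [rng.standard_normal((m, n)) for n, m in zip(sizes[:-1], sizes[1:])]
biases = [rng.standard_normal(m) for m in sizes[1:]]
x = rng.standard_normal(sizes[0])

for alpha in [1, 10, 100, 1000]:
    p = softmax(relu_net(alpha * x, weights, biases))
    print(f"alpha = {alpha:5d}   max softmax confidence = {p.max():.6f}")
```

For large \(\alpha\) the scaled input \(\alpha \mathbf{x}\) eventually stays in an unbounded region on which the network is affine, so the logits grow linearly in \(\alpha\) and the softmax output converges to a one-hot vector whenever the entries of \(\mathbf {A}_{i}\mathbf {x}\) have a unique maximum.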
Appendix C: Stiemke’s Lemma
Stiemke’s Lemma states the following:
Lemma 2
Let \(\mathbf {W}\in \mathbb {R}^{m\times n}\) and consider the homogeneous system of linear equations \(\mathbf {W}\mathbf {x}= 0\). Then exactly one of the following holds:
1. There exists a vector \(\mathbf {v}\in \mathbb {R}^{m}\) such that \(\mathbf {W}^{T}\mathbf {v}\ge 0\) and \(\mathbf {W}^{T}\mathbf {v}\) has at least one non-zero element.
2. There exists a vector \(\mathbf {x}\in \mathbb {R}^{n}\) such that \(\mathbf {x}> 0\) and \(\mathbf {W}\mathbf {x}= 0\).
This lemma has variants with different sign constraints that can be found in [17].
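One way to turn the lemma into a computational boundedness test is sketched below. This is our reading of how such a check can be set up, not necessarily the exact formulation used in the paper; the symbols \(\mathbf{M}\), \(\mathbf{c}\) and the function name region_is_bounded are our own. A nonempty region \(\{\mathbf{x} : \mathbf{M}\mathbf{x} + \mathbf{c} \ge 0\}\) is bounded exactly when its recession cone \(\{\mathbf{d} : \mathbf{M}\mathbf{d} \ge 0\}\) is \(\{0\}\), and by Stiemke's lemma (applied to \(\mathbf{M}^{T}\)) this is the case if and only if \(\mathbf{M}\) has full column rank and the system \(\mathbf{M}^{T}\mathbf{y} = 0\), \(\mathbf{y} > 0\) is feasible. Since that system is homogeneous in \(\mathbf{y}\), feasibility can be checked with a single linear program, here via SciPy's interface to the HiGHS solver (cf. [9]):

```python
import numpy as np
from scipy.optimize import linprog

def region_is_bounded(M):
    """Boundedness test for a nonempty region { x : M x + c >= 0 }.

    The region is bounded iff its recession cone { d : M d >= 0 } is {0}.
    By Stiemke's lemma this holds iff M has full column rank and
    M^T y = 0, y > 0 is feasible.  (Sketch; the offset c plays no role.)"""
    n_rows, n_cols = M.shape
    if np.linalg.matrix_rank(M) < n_cols:      # nontrivial kernel => unbounded directions exist
        return False
    # Feasibility LP: find y >= 1 with M^T y = 0; since the system is homogeneous,
    # a solution with y > 0 exists iff one with y >= 1 exists.
    res = linprog(c=np.zeros(n_rows),
                  A_eq=M.T, b_eq=np.zeros(n_cols),
                  bounds=[(1, None)] * n_rows,
                  method="highs")              # HiGHS solver, cf. [9]
    return res.status == 0                     # feasible <=> region bounded

# example: the positive quadrant { x >= 0 } in R^2 is unbounded,
# the triangle { x >= 0, 1 - x_1 - x_2 >= 0 } is bounded
quadrant = np.eye(2)
triangle = np.vstack([np.eye(2), -np.ones((1, 2))])
print(region_is_bounded(quadrant))   # False
print(region_is_bounded(triangle))   # True
```

Applied to a linear region of a ReLU network, \(\mathbf{M}\) is the stacked constraint matrix obtained from the H-representation in Appendix A, so this sketch requires only one rank computation and one linear program per region.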