Unboundedness of Linear Regions of Deep ReLU Neural Networks

Part of the Communications in Computer and Information Science book series (CCIS, volume 1633)

Abstract

Recent work concerning adversarial attacks on ReLU neural networks has shown that unbounded regions and regions with a sufficiently large volume can be prone to containing adversarial samples. Finding the representation of linear regions and identifying their properties are challenging tasks. In practice, one works with deep neural networks and high-dimensional input data, which leads to polytopes represented by an extensive number of inequalities and hence demands high computational resources. The approach should therefore be scalable, feasible, and numerically stable. We discuss an algorithm that finds the H-representation of each linear region of a neural network and identifies whether the region is bounded or not.

Keywords

  • Neural network
  • Unbounded polytope
  • Linear programming
  • ReLU activation function

The research reported in this paper has been partly funded by BMK, BMDW, and the Province of Upper Austria in the frame of the COMET Programme managed by FFG in the COMET Module S3AI.

References

  1. Büeler, B., Enge, A., Fukuda, K.: Exact volume computation for polytopes: a practical study. In: Polytopes-Combinatorics and Computation, pp. 131–154. Springer, Heidelberg (2000). https://doi.org/10.1007/978-3-0348-8438-9_6

  2. Caswell, T.A., et al.: matplotlib/matplotlib: Rel: v3.5.1 (2021)

  3. Emiris, I.Z., Fisikopoulos, V.: Efficient random-walk methods for approximating polytope volume. In: Proceedings of the Thirtieth Annual Symposium on Computational Geometry, pp. 318–327 (2014)

  4. Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330. PMLR (2017)

  5. Harris, C.R., et al.: Array programming with NumPy. Nature 585(7825), 357–362 (2020)

  6. Hein, M., Andriushchenko, M., Bitterwolf, J.: Why ReLU networks yield high-confidence predictions far away from the training data and how to mitigate the problem. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 41–50 (2019)

  7. Hendrycks, D., Gimpel, K.: A baseline for detecting misclassified and out-of-distribution examples in neural networks. arXiv preprint arXiv:1610.02136 (2016)

  8. Hsu, Y.C., Shen, Y., Jin, H., Kira, Z.: Generalized ODIN: detecting out-of-distribution image without learning from out-of-distribution data. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2020)

  9. Huangfu, Q., Hall, J.A.J.: Parallelizing the dual revised simplex method. Math. Program. Comput. 10(1), 119–142 (2017). https://doi.org/10.1007/s12532-017-0130-5

  10. Leibig, C., Allken, V., Ayhan, M.S., Berens, P., Wahl, S.: Leveraging uncertainty information from deep neural networks for disease detection. Sci. Rep. 7(1), 1–14 (2017)

  11. Mangoubi, O., Vishnoi, N.K.: Faster polytope rounding, sampling, and volume computation via a sub-linear ball walk. In: 2019 IEEE 60th Annual Symposium on Foundations of Computer Science (FOCS), pp. 1338–1357. IEEE (2019)

  12. Minderer, M., et al.: Revisiting the calibration of modern neural networks. Adv. Neural Inf. Process. Syst. 34, 1–13 (2021)

  13. Nguyen, A., Yosinski, J., Clune, J.: Deep neural networks are easily fooled: high confidence predictions for unrecognizable images. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2015)

  14. Paszke, A., et al.: PyTorch: an imperative style, high-performance deep learning library. Adv. Neural Inf. Process. Syst. 32, 8024–8035 (2019)

  15. Potra, F.A., Wright, S.J.: Interior-point methods. J. Comput. Appl. Math. 124(1–2), 281–302 (2000)

  16. Shepeleva, N., Zellinger, W., Lewandowski, M., Moser, B.: ReLU code space: a basis for rating network quality besides accuracy. In: ICLR 2020 Workshop on Neural Architecture Search (NAS 2020) (2020)

  17. Stiemke, E.: Über positive Lösungen homogener linearer Gleichungen. Mathematische Annalen 76(2), 340–342 (1915)

  18. Virtanen, P., et al.: SciPy 1.0: fundamental algorithms for scientific computing in Python. Nat. Methods 17(3), 261–272 (2020)

Author information

Corresponding author

Correspondence to Anton Ponomarchuk.

Appendices

Appendix A: Polytope Calculation for an Input Point \(\mathbf {x}\)

Recall that the ReLU neural network \( F(\mathbf {x}) = f_{L} \circ \sigma \circ f_{L-1} \circ \ldots \circ \sigma \circ f_{1}(\mathbf {x})\) is a composition of L affine functions \(f_{i}(\mathbf {x}) = \mathbf {A}_{i}\mathbf {x}+ \mathbf {b}_{i}\), where \(\mathbf {A}_{i} \in \mathbb {R}^{n_{i} \times n_{i-1}}\) and \(\mathbf {b}_{i} \in \mathbb {R}^{n_{i}}\) for all \(i \in \{1, \ldots , L\}\), interleaved with the point-wise non-linear function \(\sigma (x) = \max (x, 0)\). We denote the i-th hidden layer of the network \(F(\mathbf {x})\) by \(\mathbf {a}_{i}(\mathbf {x}) = \sigma \circ f_{i}(\mathbf {x})\).

A binary activation state for an input vector \(\mathbf {x}\in \mathbb {R}^{n_{0}}\) is the function

$$\begin{aligned} \beta ^{i_{k}}_{k}(\mathbf {x}) {:}{=}{\left\{ \begin{array}{ll} 1, &{} a_{k}^{i_{k}}(\mathbf {x}) > 0, \\ 0, &{} \text {otherwise}, \end{array}\right. } \end{aligned}$$

where \(a_{k}^{i_{k}}(\mathbf {x})\) is the \(i_{k}\)-th output of the k-th hidden layer \(\mathbf {a}_{k}\), for all \(k \in \{1, \ldots , L\}\) and \(i_{k} \in \{1, \ldots , n_{k} \}\).
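
As an illustration, the binary activation states can be obtained from a single forward pass. The following minimal sketch assumes the network is given as plain NumPy layer matrices \(\mathbf {A}_{k}\) and bias vectors \(\mathbf {b}_{k}\); the helper name binary_states is ours, not from the paper.

```python
import numpy as np

def binary_states(weights, biases, x):
    """Binary activation states beta_k(x) of a ReLU network given as lists of
    layer matrices A_1..A_L and bias vectors b_1..b_L (illustrative sketch)."""
    states, a = [], x
    for A, b in zip(weights, biases):
        z = A @ a + b                        # pre-activation of layer k at x
        states.append((z > 0).astype(int))   # beta_k^{i_k}(x) in {0, 1}
        a = np.maximum(z, 0.0)               # ReLU output fed to the next layer
    return states
```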

A polar activation state for an input vector \(\mathbf {x}\in \mathbb {R}^{n_{0}}\) is the function

$$\begin{aligned} \pi _{k}^{i_{k}}(\mathbf {x}) {:}{=}2 \beta ^{i_{k}}_{k}(\mathbf {x}) - 1, \end{aligned}$$

for all \(k \in \{1, \ldots , L\}\) and \(i_{k} \in \{1, \ldots , n_{k} \}\). Note that we defined two binary functions which have the sets \(\{0, 1\}\) and \(\{-1, 1\}\) as codomains, respectively. By using \(\beta ^{i_{k}}_{k}(\mathbf {x})\) and \(\pi _{k}^{i_{k}}(\mathbf {x})\), we now collect all states of a layer into a diagonal matrix form:

$$\begin{aligned}&\mathbf {Q}_{k}^{\pi }(\mathbf {x}) {:}{=}{\text {diag}}\bigl (\pi _{k}^{1}(\mathbf {x}), \ldots , \pi _{k}^{n_{k}}(\mathbf {x})\bigr ), \\&\mathbf {Q}_{k}^{\beta }(\mathbf {x}) {:}{=}{\text {diag}}\bigl (\beta _{k}^{1}(\mathbf {x}), \ldots ,\beta _{k}^{n_{k}}(\mathbf {x})\bigr ), \end{aligned}$$

where \(k\in \{1, \ldots , L\}\). We will use the matrix \(\mathbf {Q}_{k}^{\beta }(\mathbf {x})\) to model the behavior of the activation function in the k-th layer. For each input vector \(\mathbf {x}\in \mathbb {R}^{n_{0}}\), the matrices \(\mathbf {Q}_{k}^{\pi }(\mathbf {x})\) and \(\mathbf {Q}_{k}^{\beta }(\mathbf {x})\) allow us to derive an H-representation of the corresponding polytope \(\mathbf {H}(\mathbf {x})\in \{ \mathbf {H}_{i}\}_{i=1}^{J}\) in explicit form. More precisely, the H-representation is given as a set of inequalities in the following way:

$$\begin{aligned} \mathbf {H}(\mathbf {x}) {:}{=}\bigl \{\mathbf {x}' \in \mathbb {R}^{n_{0}} \mid \mathbf {W}_{k}(\mathbf {x})\cdot \mathbf {x}' + \mathbf {v}_{k}(\mathbf {x}) \ge \mathbf {0}, \ k \in \{ 1, \ldots , L\} \bigr \}, \end{aligned}$$
(3)

where

$$\begin{aligned} \mathbf {W}_{k}(\mathbf {x})&{:}{=}\mathbf {Q}^{\pi }_{k}(\mathbf {x})\mathbf {A}_{k}\prod _{j=1}^{k-1}\mathbf {Q}_{k-j}^{\beta }(\mathbf {x})\mathbf {A}_{k-j}, \end{aligned}$$
(4)
$$\begin{aligned} \mathbf {v}_{k}(\mathbf {x})&{:}{=}\mathbf {Q}^{\pi }_{k}(\mathbf {x}) \sum _{i=1}^k \left( \prod _{j=1}^{k-i} \mathbf {A}_{k-j+1} \mathbf {Q}_{k-j}^{\beta }(\mathbf {x})\right) \mathbf {b}_{i}, \end{aligned}$$
(5)

such that \(\mathbf {W}_{k}(\mathbf {x}) \in \mathbb {R}^{n_{k} \times n_{0}}\) and \(\mathbf {v}_{k}(\mathbf {x}) \in \mathbb {R}^{n_{k}}\). According to (3), the polytope \(\mathbf {H}(\mathbf {x})\) is defined by exactly \(N = n_{1} + \ldots + n_{L}\) inequalities. However, in practice, the number of half-spaces whose intersection yields the polytope \(\mathbf {H}(\mathbf {x})\) is typically smaller than N, so that the above representation is not minimal in general.
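
The products in (4) and (5) can be accumulated layer by layer: the affine pre-activation map of layer k, with the activation pattern of \(\mathbf {x}\) frozen, is obtained from that of layer k-1 by multiplying with \(\mathbf {Q}_{k-1}^{\beta }(\mathbf {x})\) and the next weight matrix. The sketch below (NumPy, with hypothetical helper names; not the authors' implementation) returns the stacked matrix and vector of all N inequalities in (3).

```python
import numpy as np

def h_representation(weights, biases, x):
    """H-representation {x' : W x' + v >= 0} of the linear region containing x,
    following Eqs. (3)-(5).  `weights`/`biases` are the layer matrices A_k and
    bias vectors b_k (sketch under these assumptions, not the paper's code)."""
    W_blocks, v_blocks = [], []
    W, v = None, None                       # affine map x' -> z_k(x'), x's pattern frozen
    for A, b in zip(weights, biases):
        if W is None:                       # first layer: z_1(x') = A_1 x' + b_1
            W, v = A.copy(), b.copy()
        else:                               # z_k(x') = A_k Q^beta_{k-1} z_{k-1}(x') + b_k
            W, v = A @ W, A @ v + b
        z = W @ x + v                       # pre-activation of layer k at x itself
        beta = (z > 0).astype(float)        # binary states, diagonal of Q^beta_k
        pi = 2.0 * beta - 1.0               # polar states, diagonal of Q^pi_k
        W_blocks.append(pi[:, None] * W)    # rows of W_k(x), see Eq. (4)
        v_blocks.append(pi * v)             # entries of v_k(x), see Eq. (5)
        W, v = beta[:, None] * W, beta * v  # apply Q^beta_k before the next layer
    return np.vstack(W_blocks), np.concatenate(v_blocks)
```

The stacked blocks give all \(N = n_{1} + \ldots + n_{L}\) inequalities of (3) for the region containing \(\mathbf {x}\).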

Appendix B: Unbounded Linear Region Problem

As mentioned in Appendix A, a ReLU neural network F splits the input space \(\mathbb {R}^{n_{0}}\) into a set of linear regions \(\mathbb {R}^{n_{0}}=\bigcup _{i=1}^{J}\mathbf {H}_{i}\). On each such linear region the network realizes some affine function

$$\begin{aligned} F_{\mathbf {H}_{i}}(\mathbf {x}) {:}{=}\mathbf {A}_{i}\mathbf {x}+ \mathbf {b}_{i}, \end{aligned}$$

where \(\mathbf {A}_{i} \in \mathbb {R}^{n_{L}\times n_{0}}\), \(\mathbf {b}_{i}\in \mathbb {R}^{n_{L}}\), and \(\mathbf {x}\in \mathbf {H}_{i} \), for all \(i \in \{1, \ldots , J\}\). So the given network F is represented by a set of affine functions \(F_{\mathbf {H}_{i}}\), each of which corresponds to the linear region \(\mathbf {H}_{i}\), for all \(i\in \{1, \ldots , J\}\).
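
Away from region boundaries, the local matrix \(\mathbf {A}_{i}\) is simply the Jacobian of F at \(\mathbf {x}\). A minimal sketch, assuming the network is available as a PyTorch module model mapping a 1-D input vector to the \(n_{L}\) logits (the name model and the helper are ours):

```python
import torch

def local_affine_map(model, x):
    """Recover the affine map F(x') = A x' + b realized on the linear region of x.
    Sketch: valid as long as x does not lie on a region boundary."""
    x = x.detach().requires_grad_(True)
    A = torch.autograd.functional.jacobian(model, x)   # local Jacobian equals A_i
    b = model(x) - A @ x                                # offset b_i on this region
    return A.detach(), b.detach()
```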

Assume that, for all \(i\in \{1, \ldots , J\}\), the matrix \(\mathbf {A}_{i}\) does not contain identical rows. Then for almost all \(\mathbf {x}\in \mathbb {R}^{n_{0}}\) and every \(\varepsilon > 0\), there exist an \(\alpha > 0\) and a class \(k\in \{1, \ldots , n_L\}\) such that for \(\mathbf {z} = \alpha \mathbf {x}\) the following holds:

$$\begin{aligned} \dfrac{\exp (F_k(\mathbf {z}))}{\sum _{j=1}^{n_L}\exp (F_{j}(\mathbf {z}))} \ge 1 - \varepsilon . \end{aligned}$$
(6)

Inequality (6) shows that almost any point of the input space \(\mathbb {R}^{n_{0}}\) can be scaled such that the network assigns an overconfident output to some class \(k\in \{1, \ldots , n_{L}\}\) for the scaled input. See reference [6] for a proof of the above statement.
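
The effect is easy to reproduce numerically. The following sketch uses a small, randomly initialized ReLU network (the layer sizes and the seed are arbitrary choices of ours, not taken from the paper) and scales a random input by increasing factors \(\alpha \); the maximal softmax confidence approaches 1:

```python
import numpy as np

rng = np.random.default_rng(0)
sizes = [4, 16, 16, 3]                            # n_0, two hidden layers, n_L classes
weights = [rng.standard_normal((m, n)) for n, m in zip(sizes[:-1], sizes[1:])]
biases = [rng.standard_normal(m) for m in sizes[1:]]

def forward(x):
    a = x
    for A, b in zip(weights[:-1], biases[:-1]):
        a = np.maximum(A @ a + b, 0.0)            # hidden ReLU layers
    return weights[-1] @ a + biases[-1]           # affine output layer F(x)

def softmax(z):
    e = np.exp(z - z.max())                       # numerically stable softmax
    return e / e.sum()

x = rng.standard_normal(sizes[0])
for alpha in (1.0, 10.0, 100.0, 1000.0):
    conf = softmax(forward(alpha * x)).max()
    print(f"alpha = {alpha:7.1f}  max softmax confidence = {conf:.6f}")
```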

Appendix C: Stiemke’s Lemma

Stiemke’s Lemma states the following:

Lemma 2

Let \(\mathbf {W}\in \mathbb {R}^{m\times n}\) and consider the homogeneous system of linear equations \(\mathbf {W}\mathbf {x}= 0\). Then exactly one of the following holds:

  1. There exists a vector \(\mathbf {v}\in \mathbb {R}^{m}\) such that \(\mathbf {W}^{T}\mathbf {v}\ge 0\) and \(\mathbf {W}^{T}\mathbf {v}\) has at least one non-zero element.

  2. There exists a vector \(\mathbf {x}\in \mathbb {R}^{n}\) such that \(\mathbf {x}> 0\) and \(\mathbf {W}\mathbf {x}= 0\).

This lemma has variants with different sign constraints that can be found in [17].
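
In this spirit, a non-empty region \(\mathbf {H}(\mathbf {x}) = \{\mathbf {x}' : \mathbf {W}\mathbf {x}' + \mathbf {v}\ge 0\}\) is unbounded exactly when there is a non-zero recession direction \(\mathbf {d}\) with \(\mathbf {W}\mathbf {d}\ge 0\). One way to test this with linear programming, sketched below using SciPy's HiGHS backend [9, 18], is to maximize each coordinate of \(\mathbf {d}\) over the cone intersected with the box \([-1,1]^{n_{0}}\); this illustrates the alternative-based idea and is not necessarily the exact procedure of the paper.

```python
import numpy as np
from scipy.optimize import linprog

def region_is_unbounded(W, tol=1e-9):
    """Return True if the cone {d : W d >= 0} contains a non-zero direction,
    i.e. if the non-empty region {x' : W x' + v >= 0} is unbounded (sketch)."""
    m, n = W.shape
    A_ub, b_ub = -W, np.zeros(m)                  # W d >= 0  <=>  -W d <= 0
    for i in range(n):
        for sign in (1.0, -1.0):
            c = np.zeros(n)
            c[i] = -sign                          # linprog minimizes, so maximize sign*d_i
            res = linprog(c, A_ub=A_ub, b_ub=b_ub,
                          bounds=[(-1.0, 1.0)] * n, method="highs")
            if res.success and -res.fun > tol:    # found d with W d >= 0 and d_i != 0
                return True
    return False
```

Note that only the stacked matrix \(\mathbf {W}(\mathbf {x})\) from Appendix A enters the test; the offsets \(\mathbf {v}(\mathbf {x})\) play no role for the recession cone.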

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

Cite this paper

Ponomarchuk, A., Koutschan, C., Moser, B. (2022). Unboundedness of Linear Regions of Deep ReLU Neural Networks. In: Kotsis, G., et al. Database and Expert Systems Applications - DEXA 2022 Workshops. DEXA 2022. Communications in Computer and Information Science, vol 1633. Springer, Cham. https://doi.org/10.1007/978-3-031-14343-4_1

  • DOI: https://doi.org/10.1007/978-3-031-14343-4_1

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-14342-7

  • Online ISBN: 978-3-031-14343-4