Abstract
In this article we present new results on neural networks with linear threshold activation functions \(x \mapsto \mathbb{1}_{\{ x > 0\}}\). We precisely characterize the class of functions representable by such neural networks and show that two hidden layers are necessary and sufficient to represent any function in this class. This is a surprising result in light of recent exact representability investigations for neural networks with other popular activation functions such as rectified linear units (ReLU). We also give upper and lower bounds on the sizes of the neural networks required to represent any function in the class. Finally, we design an algorithm that solves the empirical risk minimization (ERM) problem to global optimality for these neural networks with a fixed architecture. The algorithm's running time is polynomial in the size of the data sample, provided the input dimension and the size of the network architecture are treated as fixed constants. The algorithm is unique in that it works for any architecture with any number of layers, whereas previous polynomial-time globally optimal algorithms work only for restricted classes of architectures. Using these insights, we propose a new class of neural networks that we call shortcut linear threshold neural networks. To the best of our knowledge, this way of designing neural networks has not been explored before in the literature. We show that these neural networks have several desirable theoretical properties.
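To make the setting concrete, the following minimal NumPy sketch (our illustration, not code from the paper) evaluates a feedforward network whose hidden units use the linear threshold activation \(x \mapsto \mathbb{1}_{\{ x > 0\}}\). The two-hidden-layer shape echoes the depth result stated above, while the widths, the random weights, and the affine output layer are arbitrary choices made only for this example.

```python
import numpy as np

def linear_threshold(z):
    # The activation x -> 1_{x > 0}: 1 where the pre-activation is
    # strictly positive, 0 otherwise.
    return (z > 0).astype(float)

def forward(x, weights, biases):
    # Forward pass of a fully connected network whose hidden layers use
    # the linear threshold activation; the final layer is kept affine.
    h = x
    for W, b in zip(weights[:-1], biases[:-1]):
        h = linear_threshold(W @ h + b)
    return weights[-1] @ h + biases[-1]

# Illustrative instance: a map from R^3 to R with two hidden layers of
# widths 5 and 4 (widths and weights chosen arbitrarily for the example).
rng = np.random.default_rng(0)
dims = [3, 5, 4, 1]
weights = [rng.standard_normal((dims[i + 1], dims[i])) for i in range(len(dims) - 1)]
biases = [rng.standard_normal(dims[i + 1]) for i in range(len(dims) - 1)]
print(forward(rng.standard_normal(3), weights, biases))
```

Because each hidden unit outputs only 0 or 1, the network computes a piecewise constant function of its input, which is the structural feature underlying the representability results described in the abstract.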
Additional information
A preliminary version of this work (with the same title) was published in the IPCO 2022 proceedings. The first and third authors gratefully acknowledge support from Air Force Office of Scientific Research (AFOSR) grant FA95502010341 and National Science Foundation (NSF) grant CCF2006587.
Cite this article
Khalife, S., Cheng, H. & Basu, A. Neural networks with linear threshold activations: structure and algorithms. Math. Program. (2023). https://doi.org/10.1007/s10107-023-02016-5