Neural networks with linear threshold activations: structure and algorithms

Mathematical Programming

Abstract

In this article we present new results on neural networks with linear threshold activation functions \(x \mapsto \mathbb {1}_{\{ x > 0\}}\). We precisely characterize the class of functions that are representable by such neural networks and show that two hidden layers are necessary and sufficient to represent any function in this class. This is a surprising result in light of recent exact representability investigations for neural networks using other popular activation functions such as rectified linear units (ReLU). We also give upper and lower bounds on the sizes of the neural networks required to represent any function in the class. Finally, we design an algorithm to solve the empirical risk minimization (ERM) problem to global optimality for these neural networks with a fixed architecture. The algorithm's running time is polynomial in the size of the data sample, provided the input dimension and the size of the network architecture are treated as fixed constants. The algorithm is unique in the sense that it works for any architecture with any number of layers, whereas previous polynomial-time globally optimal algorithms work only for restricted classes of architectures. Using these insights, we propose a new class of neural networks that we call shortcut linear threshold neural networks. To the best of our knowledge, this way of designing neural networks has not been explored before in the literature. We show that these neural networks have several desirable theoretical properties.
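
To make the setting concrete, here is a minimal illustrative sketch (not drawn from the paper) of a feedforward network whose hidden units apply the linear threshold activation \(x \mapsto \mathbb {1}_{\{ x > 0\}}\). The two-hidden-layer shape matches the depth the paper shows to be sufficient; the widths, weights, and names below are arbitrary choices for demonstration.

    import numpy as np

    def linear_threshold(x):
        # Activation studied in the paper: 1 where the pre-activation is strictly positive, else 0.
        return (x > 0).astype(float)

    def forward(x, weights, biases):
        # Hidden layers apply the linear threshold activation; the output layer is affine.
        h = x
        for W, b in zip(weights[:-1], biases[:-1]):
            h = linear_threshold(h @ W + b)
        return h @ weights[-1] + biases[-1]

    # Illustrative architecture: input dimension 3, two hidden layers of widths 5 and 4,
    # scalar output. All dimensions and weights are arbitrary.
    rng = np.random.default_rng(0)
    dims = [3, 5, 4, 1]
    weights = [rng.standard_normal((m, n)) for m, n in zip(dims[:-1], dims[1:])]
    biases = [rng.standard_normal(n) for n in dims[1:]]
    print(forward(rng.standard_normal(3), weights, biases))

Because each hidden unit outputs only 0 or 1, any such network computes a piecewise constant function of its input; this basic observation underlies the representability questions studied in the paper.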

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Sammy Khalife.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

A preliminary version of this work was published in the IPCO 2022 proceedings under the same title. The first and third authors gratefully acknowledge support from the Air Force Office of Scientific Research (AFOSR) grant FA95502010341 and the National Science Foundation (NSF) grant CCF2006587.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Khalife, S., Cheng, H. & Basu, A. Neural networks with linear threshold activations: structure and algorithms. Math. Program. (2023). https://doi.org/10.1007/s10107-023-02016-5
