Abstract
This work provides an additional step in the theoretical understanding of neural networks. We consider neural networks with one hidden layer and show that, when learning symmetric functions, one can choose initial conditions under which standard SGD training efficiently produces a hypothesis with provable generalization guarantees. We verify this empirically, and we also show that the guarantee fails when the initial conditions are chosen at random. The proof of convergence investigates the interaction between the two layers of the network. Our results highlight the importance of using symmetry in the design of neural networks.
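As a concrete illustration of the setting, the sketch below trains a one-hidden-layer ReLU network by plain SGD on a symmetric target. Everything specific in it is an assumption for illustration, not the paper's construction: the majority target, the squared loss, the learning rate, and the particular deterministic initialization (each hidden unit sees only the sum of the input bits, with staggered thresholds) are simplified choices that merely respect the symmetry.

```python
import numpy as np

# Minimal sketch (illustrative assumptions throughout): a one-hidden-layer
# ReLU network trained by plain SGD on a symmetric target function.
# A symmetric f on {0,1}^n depends only on sum(x), so a natural non-random,
# symmetry-respecting initialization gives every hidden unit the all-ones
# input weights and a distinct threshold. This is NOT claimed to be the
# paper's exact construction or its proven hyperparameters.

rng = np.random.default_rng(0)
n = 10            # input dimension
m = n + 1         # one hidden unit per possible value of sum(x)
lr = 0.01         # SGD step size (illustrative)
steps = 50_000

def target(x):
    # Majority: a symmetric function of the input bits, valued in {-1, +1}.
    return 1.0 if x.sum() >= n / 2 else -1.0

# Deterministic, symmetric initialization:
# unit j computes relu((sum(x) - (j - 0.5)) / n).
W = np.ones((m, n)) / n
b = -(np.arange(m) - 0.5) / n
v = np.zeros(m)   # output layer starts at zero

def forward(x):
    h = np.maximum(W @ x + b, 0.0)    # hidden ReLU activations
    return h, v @ h                   # output is a linear readout

for _ in range(steps):
    x = rng.integers(0, 2, size=n).astype(float)
    y = target(x)
    h, out = forward(x)
    g = out - y                        # d/d(out) of the squared loss
    active = v * (h > 0)               # backprop through the ReLU
    v -= lr * g * h                    # both layers are trained; their
    W -= lr * g * np.outer(active, x)  # interaction is what the paper's
    b -= lr * g * active               # convergence proof analyzes

test = rng.integers(0, 2, size=(1000, n)).astype(float)
acc = np.mean([(forward(x)[1] >= 0) == (target(x) > 0) for x in test])
print(f"test accuracy on fresh samples: {acc:.3f}")
```

Replacing the deterministic W and b above with random Gaussian values gives the contrasting regime the abstract refers to: the same architecture and training procedure, distinguished only by structured versus random initialization.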
Notes
1. A standard “lifting,” which adds a coordinate fixed to 1 to every input vector, allows one to translate the affine case to the linear case.
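For concreteness, the identity behind this lifting is the standard fact that, for all \(w, x \in \mathbb{R}^n\) and \(b \in \mathbb{R}\),

\[
\langle w, x \rangle + b \;=\; \big\langle (w, b),\, (x, 1) \big\rangle ,
\]

so every affine threshold function over \(\mathbb{R}^n\) becomes a homogeneous linear threshold function over \(\mathbb{R}^{n+1}\).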
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Nachum, I., Yehudayoff, A. (2020). On Symmetry and Initialization for Neural Networks. In: Kohayakawa, Y., Miyazawa, F.K. (eds) LATIN 2020: Theoretical Informatics. LATIN 2021. Lecture Notes in Computer Science, vol. 12118. Springer, Cham. https://doi.org/10.1007/978-3-030-61792-9_32
DOI: https://doi.org/10.1007/978-3-030-61792-9_32
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-61791-2
Online ISBN: 978-3-030-61792-9