Training a 3-node neural network is NP-complete
We show that training is NP-complete for many simple two-layer networks whose nodes compute linear threshold functions of their inputs. Consequently, unless P = NP, any training algorithm for one of these networks will perform poorly on some sets of training data, either by running for more than an amount of time polynomial in the input length or by producing sub-optimal weights. Thus, these networks differ fundamentally from the perceptron in a worst-case computational sense.
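To make the training problem concrete, here is a minimal Python sketch, assuming a network of two hidden linear threshold nodes feeding one linear threshold output node. The function names (`threshold`, `three_node_net`, `is_consistent`), the XOR data set, and the particular weights are illustrative assumptions, not taken from the paper. Checking a given set of weights against the training data is easy; the hardness result concerns finding such weights in the first place.

```python
def threshold(weights, theta, x):
    """Linear threshold node: output 1 if w . x > theta, else 0."""
    return 1 if sum(w * xi for w, xi in zip(weights, x)) > theta else 0

def three_node_net(h1, h2, out, x):
    """Two hidden threshold nodes feed one threshold output node.

    Each node is a (weights, theta) pair (an assumed representation).
    """
    a = threshold(h1[0], h1[1], x)
    b = threshold(h2[0], h2[1], x)
    return threshold(out[0], out[1], (a, b))

def is_consistent(h1, h2, out, examples):
    """True if the network labels every (input, label) example correctly."""
    return all(three_node_net(h1, h2, out, x) == y for x, y in examples)

# Illustrative data: XOR, which no single threshold node can represent.
xor_examples = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

h1 = ((1.0, 1.0), 0.5)      # fires when x1 + x2 > 0.5  (OR)
h2 = ((-1.0, -1.0), -1.5)   # fires when x1 + x2 < 1.5  (NAND)
out = ((1.0, 1.0), 1.5)     # fires only when both hidden nodes fire (AND)

print(is_consistent(h1, h2, out, xor_examples))
```

The check runs in time linear in the size of the training set and the network, which is why the decision problem lies in NP.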
The theorems and proofs are in a sense fragile; they do not imply that training is necessarily hard for networks other than those specifically mentioned. They do, however, suggest that one cannot escape computational difficulties simply by considering only very simple or very regular networks.
On a somewhat more positive note, we present two networks such that the second is both more powerful than the first and can be trained in polynomial time, even though the first is NP-complete to train. This shows that computational intractability does not depend directly on network power and provides theoretical support for the idea that finding an appropriate network and input encoding for one's training problem is an important part of the training process.
An open problem is whether the NP-completeness results can be extended to neural networks that use the differentiable logistic linear functions. We conjecture that training remains NP-complete when these functions are used since their use does not seem to alter significantly the expressive power of a neural network. However, our proof techniques break down. Note that Judd, for the networks he considers, shows NP-completeness for a wide variety of node functions including logistic linear functions.
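As an illustration of the two node functions contrasted here, the sketch below assumes a logistic node of the form 1 / (1 + exp(-g (w . x - theta))). The `gain` parameter g and the function names are assumptions made for this example and do not come from the paper. With a large gain, the logistic output approaches the threshold output on inputs that are not on the decision boundary, which is one informal way to see why the expressive power seems largely unchanged.

```python
import math

def weighted_sum(weights, x):
    return sum(w * xi for w, xi in zip(weights, x))

def threshold_node(weights, theta, x):
    """Linear threshold node: 1 if w . x > theta, else 0."""
    return 1 if weighted_sum(weights, x) > theta else 0

def logistic_node(weights, theta, x, gain=1.0):
    """Differentiable logistic node; `gain` is an assumed steepness parameter."""
    z = gain * (weighted_sum(weights, x) - theta)
    return 1.0 / (1.0 + math.exp(-z))

weights, theta = (1.0, 1.0), 0.5
for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    hard = threshold_node(weights, theta, x)
    soft = logistic_node(weights, theta, x, gain=1.0)
    steep = logistic_node(weights, theta, x, gain=50.0)
    print(x, hard, round(soft, 3), round(steep, 3))
```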
Keywords: Polynomial Time, Hide Node, Output Node, Training Algorithm, Training Problem
- 1. Baum, E. B. and Haussler, D. What size net gives valid generalization? In Advances in Neural Information Processing Systems I (1989) 81–90.
- 2. Blum, A. On the computational complexity of training simple neural networks. Master's thesis, MIT Department of Electrical Engineering and Computer Science. (Published as Laboratory for Computer Science Technical Report MIT/LCS/TR-445) (1989).
- 3. Blum, A. An Õ(n^{0.4})-approximation algorithm for 3-coloring (and improved approximation algorithms for k-coloring). In Proceedings of the Twenty-First Annual ACM Symposium on Theory of Computing (1989) 535–542.
- 4. Blum, A. and Rivest, R. L. Training a 3-node neural network is NP-Complete. In Proceedings of the 1988 Workshop on Computational Learning Theory (1988) 9–18.
- 5. Blum, A. and Rivest, R. L. Training a 3-node neural net is NP-Complete. In David S. Touretzky, editor, Advances in Neural Information Processing Systems I (1989) 494–501.
- 6. Garey, M. and Johnson, D. Computers and Intractability: A Guide to the Theory of NP-Completeness. W. H. Freeman, San Francisco (1979).
- 7. Haussler, D. Generalizing the PAC model for neural net and other learning applications. Technical Report UCSC-CRL-89-30, University of California Santa Cruz (1989).
- 8. Judd, J. S. Neural Network Design and the Complexity of Learning. PhD thesis, University of Massachusetts at Amherst, Department of Computer and Information Science (1988).
- 9. Judd, J. S. Neural Network Design and the Complexity of Learning. MIT Press (1990).
- 10. Kearns, M., Li, M., Pitt, L., and Valiant, L. On the learnability of Boolean formulae. In Proceedings of the Nineteenth Annual ACM Symposium on Theory of Computing (1987) 285–295.
- 11. Kearns, M. and Valiant, L. Cryptographic limitations on learning boolean formulae and finite automata. In Proceedings of the Twenty-First Annual ACM Symposium on Theory of Computing (1989) 433–444.
- 12. Megiddo, N. On the complexity of polyhedral separability. Technical Report RJ 5252, IBM Almaden Research Center (1986).
- 13. Raghavan, P. Learning in threshold networks. In First Workshop on Computational Learning Theory (1988) 19–27.
- 14. Rumelhart, D. E., Hinton, G. E., and Williams, R. J. Learning internal representations by error propagation. In Parallel Distributed Processing: Explorations in the Microstructure of Cognition, David E. Rumelhart and James L. McClelland, editors, Chapter 8 (1986) 318–362.
- 15. Sejnowski, T. J. and Rosenberg, C. R. Parallel networks that learn to pronounce English text. Journal of Complex Systems, 1(1) (1987) 145–168.
- 16. Tesauro, G. and Janssens, B. Scaling relationships in back-propagation learning. Complex Systems, 2 (1988) 39–44.
- 17. Valiant, L. and Warmuth, M. K. Predicting symmetric differences of two halfspaces reduces to predicting half spaces. Unpublished manuscript (1989).
- 18. Wigderson, A. Improving the performance guarantee for approximate graph coloring. JACM, 30(4) (1983) 729–735.