Tighter Guarantees for the Compressive Multi-layer Perceptron
We are interested in theoretical guarantees for classic 2-layer feed-forward neural networks with sigmoidal activation functions, whose inputs are linearly compressed by random projection. Given the rapid growth in the dimensionality of modern data sets, and the development of novel data acquisition devices in compressed sensing, a proper understanding of the obtainable guarantees is of much practical importance. We start by analysing previous work that attempted to derive a lower bound on the target dimension needed to ensure low distortion of the outputs under random projection, and we find a disagreement with empirically observed behaviour. We then give a new lower bound on the target dimension that, in contrast with previous work, does not depend on the number of hidden neurons but only on the Frobenius norm of the first-layer weights, and that holds for a much larger class of random projections. Numerical experiments agree with our findings. Furthermore, we bound the generalisation error of the compressive network in terms of the error and the expected distortion of the optimal network in the original, uncompressed class. These results mean that one can provably learn networks with an arbitrarily large number of hidden units from randomly compressed data, as long as there is sufficient regularity in the original learning problem, which our analysis rigorously quantifies.
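The setting described above can be illustrated with a minimal numerical sketch: a Gaussian random projection compresses the inputs before the first layer, and the output distortion of the resulting compressive network is measured against the uncompressed one. All dimensions, weight scalings, and the choice of a Gaussian projection here are illustrative assumptions of ours, not details taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dimensions (our own choice, not from the paper):
# input dim d, target (compressed) dim k, hidden units h, samples n
d, k, h, n = 500, 100, 50, 200

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# A 2-layer sigmoidal network f(x) = v^T sigmoid(W x)
W = rng.normal(size=(h, d)) / np.sqrt(d)   # first-layer weights
v = rng.normal(size=h) / np.sqrt(h)        # output-layer weights

# Gaussian random projection R: R^d -> R^k, scaled so that
# E[R^T R] = I, hence (W R^T)(R x) approximates W x in expectation
R = rng.normal(size=(k, d)) / np.sqrt(k)

# Unit-norm inputs (an illustrative assumption)
X = rng.normal(size=(n, d))
X /= np.linalg.norm(X, axis=1, keepdims=True)

out_orig = sigmoid(X @ W.T) @ v            # uncompressed network
Wc = W @ R.T                               # projected first layer, shape (h, k)
out_comp = sigmoid((X @ R.T) @ Wc.T) @ v   # compressive network on R x

distortion = np.mean((out_orig - out_comp) ** 2)
print(f"mean squared output distortion: {distortion:.6f}")
```

Varying `k` in this sketch gives an empirical view of how the output distortion shrinks as the target dimension grows, which is the quantity the paper's lower bound on the target dimension controls.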
Keywords: Error analysis · Random projection · Multi-layer perceptron
The work of AK is funded by the EPSRC Fellowship EP/P004245/1.