Tighter Guarantees for the Compressive Multi-layer Perceptron

  • Ata Kabán
  • Yamonporn Thummanusarn
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11324)


Abstract

We are interested in theoretical guarantees for classic 2-layer feed-forward neural networks with sigmoidal activation functions whose inputs are linearly compressed by random projection. Given the rapid growth in the dimensionality of modern data sets, and the development of novel data-acquisition devices in compressed sensing, a proper understanding of the obtainable guarantees is of much practical importance. We start by analysing previous work that attempted to derive a lower bound on the target dimension that ensures low distortion of the outputs under random projection, and we find a disagreement with empirically observed behaviour. We then give a new lower bound on the target dimension that, in contrast with previous work, does not depend on the number of hidden neurons but only on the Frobenius norm of the first-layer weights; in addition, it holds for a much larger class of random projections. Numerical experiments agree with our finding. Furthermore, we bound the generalisation error of the compressive network in terms of the error and the expected distortion of the optimal network in the original uncompressed class. These results mean that one can provably learn networks with an arbitrarily large number of hidden units from randomly compressed data, as long as there is sufficient regularity in the original learning problem, which our analysis rigorously quantifies.
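The setting described in the abstract can be illustrated with a small numerical sketch. The code below (hypothetical, not from the paper; weights are random rather than trained) builds a 2-layer sigmoidal network, applies a Gaussian random projection to both the inputs and the first-layer weights, and measures the resulting output distortion for two target dimensions. Consistent with the abstract's claim, the distortion shrinks as the target dimension grows, regardless of the number of hidden units.

```python
import numpy as np

rng = np.random.default_rng(0)
d, H, n = 1000, 50, 200          # input dim, hidden units, test points

sigmoid = lambda t: 1.0 / (1.0 + np.exp(-t))

W = rng.normal(size=(H, d))                # first-layer weights (untrained, for illustration)
v = rng.normal(size=H) / H                 # output-layer weights
X = rng.normal(size=(n, d)) / np.sqrt(d)   # roughly unit-norm inputs

def net(A, Z):
    """2-layer sigmoidal network with first-layer weights A on inputs Z."""
    return sigmoid(Z @ A.T) @ v

out_full = net(W, X)                       # outputs of the uncompressed network

def distortion(k):
    """Mean output distortion when inputs and first-layer weights are
    both compressed to k dimensions by a Gaussian random projection."""
    R = rng.normal(size=(k, d)) / np.sqrt(k)
    return np.mean(np.abs(out_full - net(W @ R.T, X @ R.T)))

# Distortion decreases as the target dimension k grows; per the paper's
# bound, the required k is governed by ||W||_F rather than by H.
print(distortion(50), distortion(800))
```

Here the compressive network is formed by replacing each pre-activation ⟨w, x⟩ with ⟨Rw, Rx⟩; since random projections approximately preserve dot products and the sigmoid is Lipschitz, the output distortion is controlled by the projection's dot-product distortion.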


Keywords: Error analysis · Random projection · Multi-layer perceptron



The work of AK is funded by the EPSRC Fellowship EP/P004245/1.


References

  1. Achlioptas, D.: Database-friendly random projections: Johnson-Lindenstrauss with binary coins. J. Comput. Syst. Sci. 66(4), 671–687 (2003)
  2. Alon, N., Ben-David, S., Cesa-Bianchi, N., Haussler, D.: Scale-sensitive dimensions, uniform convergence, and learnability. J. ACM 44(4), 615–631 (1997)
  3. Bartlett, P.L.: For valid generalization, the size of the weights is more important than the size of the network. In: Neural Information Processing Systems (NIPS), vol. 9, pp. 134–140 (1997)
  4. Bartlett, P.L., Mendelson, S.: Rademacher and Gaussian complexities: risk bounds and structural results. J. Mach. Learn. Res. 3, 463–482 (2002)
  5. Cybenko, G.: Approximation by superpositions of a sigmoidal function. Math. Control Signals Syst. 2(4), 303–314 (1989)
  6. Dudley, R.M.: Uniform Central Limit Theorems. Cambridge University Press, Cambridge (1999)
  7. Johnson, W.B., Lindenstrauss, J.: Extensions of Lipschitz maps into a Hilbert space. Contemp. Math. 26, 189–206 (1984)
  8. Kabán, A.: New bounds on compressed linear least squares regression. In: International Conference on Artificial Intelligence and Statistics (AISTATS), vol. 33, pp. 448–456. JMLR W&CP (2014)
  9. Kabán, A.: Improved bounds on the dot product under random projection and random sign projection. In: ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD), pp. 487–496 (2015)
  10. Kakade, S.M., Sridharan, K., Tewari, A.: On the complexity of linear prediction: risk bounds, margin bounds, and regularization. In: Neural Information Processing Systems (NIPS), pp. 793–800 (2008)
  11. Kane, D.M., Nelson, J.: Sparser Johnson-Lindenstrauss transforms. J. ACM 61(1), Article 4 (2014)
  12. Koltchinskii, V., Panchenko, D.: Empirical margin distributions and bounding the generalization error of combined classifiers. Ann. Stat. 30(1), 1–50 (2002)
  13. Reboredo, H., Renna, F., Calderbank, R., Rodrigues, M.R.D.: Bounds on the number of measurements for reliable compressive classification. IEEE Trans. Signal Process. 64(22), 5778–5793 (2016)
  14. Skubalska-Rafajlowicz, E.: Neural networks with Lipschitz continuous activation functions: dimension reduction using normal random projection. Nonlinear Anal. 71(12) (2009)

Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  1. School of Computer Science, University of Birmingham, Birmingham, UK
