Training of Sparsely Connected MLPs

  • Markus Thom
  • Roland Schweiger
  • Günther Palm
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6835)


Sparsely connected Multi-Layer Perceptrons (MLPs) differ from conventional MLPs in that only a small fraction of the entries in their weight matrices are nonzero. Using sparse matrix-vector multiplication algorithms reduces the computational complexity of classification. Sparsely connected MLPs are trained in two consecutive stages. In the first stage, initial values for the network's parameters are given by the solution to an unsupervised matrix factorization problem that minimizes the reconstruction error. In the second stage, a modified version of the supervised backpropagation algorithm optimizes the MLP's parameters with respect to the classification error. Experiments on the MNIST database of handwritten digits show that the proposed approach matches the classification performance of a densely connected MLP while speeding up classification by a factor of seven.
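The saving from sparse matrix-vector multiplication can be illustrated with a minimal sketch. It assumes a compressed sparse row (CSR) layout and a tanh activation; neither choice is stated in the abstract, and the helper names `dense_to_csr`, `csr_matvec`, and `sparse_layer` are hypothetical. The point is only that the work per layer scales with the number of nonzero weights rather than with the full matrix size.

```python
import math


def dense_to_csr(matrix):
    """Convert a list-of-rows dense matrix to CSR arrays (values, col_idx, row_ptr)."""
    values, col_idx, row_ptr = [], [], [0]
    for row in matrix:
        for j, w in enumerate(row):
            if w != 0.0:
                values.append(w)
                col_idx.append(j)
        # row_ptr[i + 1] marks where row i ends in the values array
        row_ptr.append(len(values))
    return values, col_idx, row_ptr


def csr_matvec(csr, x):
    """Compute y = W x, touching only the stored nonzero weights."""
    values, col_idx, row_ptr = csr
    y = []
    for i in range(len(row_ptr) - 1):
        acc = 0.0
        for k in range(row_ptr[i], row_ptr[i + 1]):
            acc += values[k] * x[col_idx[k]]
        y.append(acc)
    return y


def sparse_layer(csr, bias, x):
    """One MLP layer: sparse product, bias, then tanh (activation is an assumption)."""
    pre = csr_matvec(csr, x)
    return [math.tanh(p + b) for p, b in zip(pre, bias)]


# Toy 3x4 weight matrix with 3 nonzero entries out of 12.
W = [
    [0.0, 2.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, -1.0],
    [0.5, 0.0, 0.0, 0.0],
]
x = [1.0, 2.0, 3.0, 4.0]

csr = dense_to_csr(W)
y = csr_matvec(csr, x)                        # 3 multiply-adds instead of 12
h = sparse_layer(csr, [0.0, 0.0, 0.0], x)
```

In this toy case the sparse product needs 3 multiply-adds where the dense product needs 12. The actual speed-up of a real network depends on the sparsity level and on the implementation.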


Keywords: Weight Matrix, Hidden Unit, Convolutional Neural Network, Code Word, Handwritten Digit
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.



Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  • Markus Thom (1)
  • Roland Schweiger (1)
  • Günther Palm (2)
  1. Department Environment Perception (GR/PAP), Daimler AG, Ulm, Germany
  2. Institute of Neural Information Processing, University of Ulm, Germany
