Learning Input Features Representations in Deep Learning

Conference paper
Part of the Advances in Intelligent Systems and Computing book series (AISC, volume 513)


Traditionally, when training supervised classifiers with Backpropagation, the training dataset is a static representation of the learning environment. The error on this training set is propagated backwards through all the layers, and the gradient of the error with respect to the classifier's parameters is used to update them. However, this process stops once the parameters between the input layer and the next layer have been updated. We note that there is a residual error that could be propagated further backwards to the feature vector(s) in order to adapt the representation of the input features, and that using this residual error can improve the speed of convergence towards a generalised solution. We present a methodology for applying this new technique to Deep Learning methods, such as Deep Neural Networks and Convolutional Neural Networks.
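The abstract does not give the exact update rule. The following is therefore only a minimal sketch of the idea under our own assumptions, not the authors' implementation. It uses a one-hidden-layer sigmoid network trained with squared error and plain per-example gradient descent. After the usual weight updates, backpropagation is continued one step further to obtain the residual error dE/dx, which is then used to take a gradient step on a stored copy of each training input. The step size for the inputs is a separate hypothetical parameter. The names `TinyNet`, `backward`, `train_step`, `lr_w` and `lr_x` are illustrative and do not come from the paper.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


class TinyNet:
    """Hypothetical 1-hidden-layer net: sigmoid hidden units, one sigmoid output."""

    def __init__(self, n_in, n_hidden, seed=0):
        rng = random.Random(seed)
        self.W1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
        self.b2 = 0.0

    def forward(self, x):
        h = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
             for row, b in zip(self.W1, self.b1)]
        y = sigmoid(sum(w * hj for w, hj in zip(self.w2, h)) + self.b2)
        return h, y


def loss(net, x, t):
    """Squared error E = 0.5 * (y - t)^2 for a single example."""
    _, y = net.forward(x)
    return 0.5 * (y - t) ** 2


def backward(net, x, t):
    """Backpropagate E; also return the residual error dE/dx at the input layer."""
    h, y = net.forward(x)
    d_out = (y - t) * y * (1.0 - y)  # dE / d(output pre-activation)
    # dE / d(hidden pre-activation), using the current (pre-update) output weights
    d_hid = [d_out * w * hj * (1.0 - hj) for w, hj in zip(net.w2, h)]
    # Standard backprop stops after using d_hid for W1; one more step gives dE/dx_i.
    d_x = [sum(net.W1[j][i] * d_hid[j] for j in range(len(d_hid))) for i in range(len(x))]
    return h, y, d_out, d_hid, d_x


def train_step(net, X, T, lr_w=0.5, lr_x=0.1):
    """One epoch of per-example gradient descent on the weights AND on the stored
    input vectors in X (modified in place). Returns the summed loss seen this epoch."""
    total = 0.0
    for x, t in zip(X, T):
        h, y, d_out, d_hid, d_x = backward(net, x, t)
        total += 0.5 * (y - t) ** 2
        # Usual weight and bias updates (W1 uses the input before it is adapted).
        for j, hj in enumerate(h):
            net.w2[j] -= lr_w * d_out * hj
        net.b2 -= lr_w * d_out
        for j, dj in enumerate(d_hid):
            for i, xi in enumerate(x):
                net.W1[j][i] -= lr_w * dj * xi
            net.b1[j] -= lr_w * dj
        # Assumed extra step: adapt this example's input features with the residual error.
        for i, g in enumerate(d_x):
            x[i] -= lr_x * g
    return total


# Usage on XOR. X is a working copy of the dataset because train_step rewrites it.
X = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
T = [0.0, 1.0, 1.0, 0.0]
net = TinyNet(n_in=2, n_hidden=4)
errors = [train_step(net, X, T) for _ in range(2000)]
print(errors[0], errors[-1])
```

In this sketch the training set is no longer static. Each input vector drifts in the direction that reduces its own error. A held-out test input would still be fed in unchanged, so any real use would need a rule for mapping new inputs into the adapted representation. That rule is not specified in the abstract.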


Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  1. Department of Computer Science and Information Systems, Birkbeck, University of London, London, UK
