Deep Big Multilayer Perceptrons for Digit Recognition

  • Dan Claudiu Cireşan
  • Ueli Meier
  • Luca Maria Gambardella
  • Jürgen Schmidhuber
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7700)

Abstract

The competitive MNIST handwritten digit recognition benchmark has a long history of broken records since 1998. The most recent advancement by others dates back 8 years (error rate 0.4%). Good old on-line back-propagation for plain multi-layer perceptrons yields a very low 0.35% error rate on the MNIST handwritten digits benchmark with a single MLP and 0.31% with a committee of seven MLPs. All we need to achieve this result, the best until 2011, are many hidden layers, many neurons per layer, numerous deformed training images to avoid overfitting, and graphics cards to greatly speed up learning.
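As a rough illustration of how a committee and an error rate fit together, the sketch below averages the per-class scores of several classifiers and picks the highest-scoring class. The abstract does not say how the seven MLPs are combined, so averaging is an assumption here, and the names `committee_predict` and `error_rate` are hypothetical. The 10,000-image size of the MNIST test set is a known property of the benchmark and is not stated in the abstract.

```python
from typing import List, Sequence


def committee_predict(member_outputs: Sequence[Sequence[float]]) -> int:
    """Average per-class scores over all members and return the best class.

    Assumption: the committee combines members by plain averaging.
    """
    n_members = len(member_outputs)
    n_classes = len(member_outputs[0])
    averaged: List[float] = [
        sum(member[c] for member in member_outputs) / n_members
        for c in range(n_classes)
    ]
    # Index of the largest averaged score is the predicted class.
    return max(range(n_classes), key=averaged.__getitem__)


def error_rate(predictions: Sequence[int], labels: Sequence[int]) -> float:
    """Percentage of predictions that differ from the true labels."""
    wrong = sum(p != y for p, y in zip(predictions, labels))
    return 100.0 * wrong / len(labels)


# Toy example: three members scoring three classes for one image.
outputs = [
    [0.6, 0.3, 0.1],  # member 1 prefers class 0
    [0.2, 0.5, 0.3],  # member 2 prefers class 1
    [0.1, 0.5, 0.4],  # member 3 prefers class 1
]
predicted_class = committee_predict(outputs)

# 35 mistakes on a 10,000-image test set correspond to a 0.35% error rate.
labels = [0] * 10_000
predictions = [1] * 35 + [0] * 9_965
single_mlp_error = error_rate(predictions, labels)
```

On the 10,000-image test set, the reported 0.35% and 0.31% error rates correspond to 35 and 31 misclassified digits respectively.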

Keywords

NN (Neural Network); MLP (Multilayer Perceptron); GPU (Graphics Processing Unit); training set deformations; MNIST; committee; BP (back-propagation)

Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Dan Claudiu Cireşan (1, 2)
  • Ueli Meier (1, 2)
  • Luca Maria Gambardella (1, 2)
  • Jürgen Schmidhuber (1, 2)

  1. IDSIA, Manno-Lugano, Switzerland
  2. University of Lugano & SUPSI, Switzerland