Deep Learning

Living reference work entry in: Encyclopedia of Machine Learning and Data Mining

Abstract

Deep learning artificial neural networks have won numerous contests in pattern recognition and machine learning, and they are now widely used by the world's most valuable public companies. I review the most popular algorithms for feedforward and recurrent networks, together with their history.
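
The two network families named in the abstract can be made concrete with a short sketch. The NumPy snippet below is a minimal illustration added for this preview, not code from the entry itself; the function names and sizes are arbitrary. It contrasts a feedforward pass, where information flows once from input to output, with a recurrent pass, where a hidden state is carried across the steps of an input sequence.

```python
import numpy as np

def feedforward(x, weights, biases):
    """One pass through a stack of fully connected tanh layers."""
    h = x
    for W, b in zip(weights, biases):
        h = np.tanh(W @ h + b)  # each layer sees only the previous layer's output
    return h

def recurrent(xs, W_in, W_rec, b):
    """A plain RNN: the hidden state is fed back at every time step."""
    h = np.zeros(W_rec.shape[0])
    for x in xs:                               # iterate over the sequence
        h = np.tanh(W_in @ x + W_rec @ h + b)  # h depends on all earlier inputs
    return h

rng = np.random.default_rng(0)
# Feedforward: 4 inputs -> 8 hidden units -> 3 outputs.
Ws = [rng.standard_normal((8, 4)), rng.standard_normal((3, 8))]
bs = [np.zeros(8), np.zeros(3)]
print(feedforward(rng.standard_normal(4), Ws, bs))
# Recurrent: a length-5 sequence of 4-dimensional inputs, hidden size 8.
xs = rng.standard_normal((5, 4))
print(recurrent(xs, rng.standard_normal((8, 4)),
                0.1 * rng.standard_normal((8, 8)), np.zeros(8)))
```

Training either family by gradient descent requires backpropagation (through time, in the recurrent case), whose history the reading list below traces in detail.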

Recommended Reading

  • Aizenberg I, Aizenberg NN, Vandewalle JPL (2000) Multi-valued and universal binary neurons: theory, learning and applications. Springer, Boston. First work to introduce the term “Deep Learning” to Neural Networks

  • AMAmemory (2015) Answer at reddit AMA (Ask Me Anything) on “memory networks” etc. (with references). http://www.reddit.com/r/MachineLearning/comments/2xcyrl/i_am_j%C3%BCrgen_schmidhuber_ama/cp0q12t

  • Amari S-I (1998) Natural gradient works efficiently in learning. Neural Comput 10(2):251–276

  • Baird H (1990) Document image defect models. In: Proceedings of IAPR workshop on syntactic and structural pattern recognition, Murray Hill

  • Baldi P, Pollastri G (2003) The principled design of large-scale recursive neural network architectures – DAG-RNNs and the protein structure prediction problem. J Mach Learn Res 4:575–602

  • Ballard DH (1987) Modular learning in neural networks. In: Proceedings of AAAI, Seattle, pp 279–284

  • Barlow HB, Kaushal TP, Mitchison GJ (1989) Finding minimum entropy codes. Neural Comput 1(3):412–423

  • Bayer J, Wierstra D, Togelius J, Schmidhuber J (2009) Evolving memory cell structures for sequence learning. In: Proceedings of ICANN, vol 2. Springer, Berlin/New York, pp 755–764

  • Behnke S (1999) Hebbian learning and competition in the neural abstraction pyramid. In: Proceedings of IJCNN, vol 2. Washington, pp 1356–1361

  • Behnke S (2003) Hierarchical neural networks for image interpretation. Lecture notes in computer science, vol LNCS 2766. Springer, Berlin/New York

  • Bengio Y, Lamblin P, Popovici D, Larochelle H (2007) Greedy layer-wise training of deep networks. In: Cowan JD, Tesauro G, Alspector J (eds) Proceedings of NIPS 19, MIT Press, Cambridge, pp 153–160

  • Breiman L (1996) Bagging predictors. Mach Learn 24(2):123–140

  • Bryson AE (1961) A gradient method for optimizing multi-stage allocation processes. In: Proceedings of Harvard university symposium on digital computers and their applications, Harvard University Press, Cambridge

  • Bryson A, Ho Y (1969) Applied optimal control: optimization, estimation, and control. Blaisdell Publishing Company, Washington

  • Cho K, Ilin A, Raiko T (2012) Tikhonov-type regularization for restricted Boltzmann machines. In: Proceedings of ICANN 2012, Springer, Berlin/New York, pp 81–88

  • Ciresan DC, Meier U, Gambardella LM, Schmidhuber J (2010) Deep big simple neural nets for handwritten digit recognition. Neural Comput 22(12):3207–3220

  • Ciresan DC, Meier U, Masci J, Gambardella LM, Schmidhuber J (2011) Flexible, high performance convolutional neural networks for image classification. In: Proceedings of IJCAI, pp 1237–1242

  • Ciresan DC, Giusti A, Gambardella LM, Schmidhuber J (2012a) Deep neural networks segment neuronal membranes in electron microscopy images. In: Proceedings of NIPS, Quebec City, pp 2852–2860

  • Ciresan DC, Meier U, Masci J, Schmidhuber J (2012b) Multi-column deep neural network for traffic sign classification. Neural Netw 32:333–338

  • Ciresan DC, Meier U, Schmidhuber J (2012c) Multi-column deep neural networks for image classification. In: Proceedings of CVPR 2012. Long preprint: arXiv:1202.2745v1 [cs.CV]

  • Ciresan DC, Giusti A, Gambardella LM, Schmidhuber J (2013) Mitosis detection in breast cancer histology images with deep neural networks. In: Proceedings of MICCAI, vol 2. Nagoya, pp 411–418

  • Coates A, Huval B, Wang T, Wu DJ, Ng AY, Catanzaro B (2013) Deep learning with COTS HPC systems. In: Proceedings of ICML’13

  • Dechter R (1986) Learning while searching in constraint-satisfaction problems. University of California, Computer Science Department, Cognitive Systems Laboratory. First paper to introduce the term “Deep Learning” to Machine Learning; compare a popular G+ post on this. https://plus.google.com/100849856540000067209/posts/7N6z251w2Wd?pid=6127540521703625346&oid=100849856540000067209

  • Dreyfus SE (1962) The numerical solution of variational problems. J Math Anal Appl 5(1):30–45

  • Dreyfus SE (1973) The computational solution of optimal control problems with time lag. IEEE Trans Autom Control 18(4):383–385

  • Fan B, Wang L, Soong FK, Xie L (2015) Photo-real talking head with deep bidirectional LSTM. In: Proceedings of ICASSP 2015, Brisbane

  • Farabet C, Couprie C, Najman L, LeCun Y (2013) Learning hierarchical features for scene labeling. IEEE Trans Pattern Anal Mach Intell 35(8):1915–1929

  • Fernandez S, Graves A, Schmidhuber J (2007a) An application of recurrent neural networks to discriminative keyword spotting. In: Proceedings of ICANN, vol 2. pp 220–229

  • Fernandez S, Graves A, Schmidhuber J (2007b) Sequence labelling in structured domains with hierarchical recurrent neural networks. In: Proceedings of IJCAI

  • Fu KS (1977) Syntactic pattern recognition and applications. Springer, Berlin

  • Fukushima K (1979) Neural network model for a mechanism of pattern recognition unaffected by shift in position – neocognitron. Trans. IECE J62-A(10):658–665

  • Gers FA, Schmidhuber J (2001) LSTM recurrent networks learn simple context free and context sensitive languages. IEEE Trans Neural Netw 12(6):1333–1340

  • Gerstner W, Kistler WK (2002) Spiking neuron models. Cambridge University Press, Cambridge

  • Glorot X, Bordes A, Bengio Y (2011) Deep sparse rectifier neural networks. In: Proceedings of AISTATS, vol 15. Fort Lauderdale, pp 315–323

  • Goodfellow IJ, Warde-Farley D, Mirza M, Courville A, Bengio Y (2013) Maxout networks. In: Proceedings of ICML, Atlanta

  • Goodfellow IJ, Bulatov Y, Ibarz J, Arnoud S, Shet V (2014b) Multi-digit number recognition from street view imagery using deep convolutional neural networks. arXiv preprint arXiv:1312.6082v4

  • Goller C, Küchler A (1996) Learning task-dependent distributed representations by backpropagation through structure. In: IEEE international conference on neural networks 1996, vol 1, pp 347–352

  • Graves A, Fernandez S, Gomez FJ, Schmidhuber J (2006) Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural nets. In: Proceedings of ICML’06, Pittsburgh, pp 369–376

  • Graves A, Liwicki M, Fernandez S, Bertolami R, Bunke H, Schmidhuber J (2009) A novel connectionist system for improved unconstrained handwriting recognition. IEEE Trans Pattern Anal Mach Intell 31(5):855–868

  • Graves A, Mohamed A-R, Hinton GE (2013) Speech recognition with deep recurrent neural networks. In: Proceedings of ICASSP, Vancouver, pp 6645–6649

  • Hannun A, Case C, Casper J, Catanzaro B, Diamos G, Elsen E, Prenger R, Satheesh S, Sengupta S, Coates A, Ng AY (2014) Deep speech: scaling up end-to-end speech recognition. arXiv preprint http://arxiv.org/abs/1412.5567

  • Hanson SJ, Pratt LY (1989) Comparing biases for minimal network construction with back-propagation. In: Touretzky DS (ed) Proceedings of NIPS, vol 1. Morgan Kaufmann, San Mateo, pp 177–185

  • Hanson SJ (1990) A stochastic version of the delta rule. Phys D: Nonlinear Phenom 42(1):265–272

  • Hastie TJ, Tibshirani RJ (1990) Generalized additive models, vol 43. CRC Press

  • Hebb DO (1949) The organization of behavior. Wiley, New York

  • Herrero J, Valencia A, Dopazo J (2001) A hierarchical unsupervised growing neural network for clustering gene expression patterns. Bioinformatics 17(2):126–136

  • Hinton G, Salakhutdinov R (2006) Reducing the dimensionality of data with neural networks. Science 313(5786):504–507

  • Hinton GE (2002) Training products of experts by minimizing contrastive divergence. Neural Comput 14(8):1771–1800

  • Hinton GE, Osindero S, Teh Y-W (2006) A fast learning algorithm for deep belief nets. Neural Comput 18(7):1527–1554

  • Hinton GE, Srivastava N, Krizhevsky A, Sutskever I, Salakhutdinov RR (2012b) Improving neural networks by preventing co-adaptation of feature detectors. Technical report. arXiv:1207.0580

  • Hochreiter S (1991) Untersuchungen zu dynamischen neuronalen Netzen. Diploma thesis, Institut fuer Informatik, Lehrstuhl Prof. Brauer, Tech. Univ. Munich. Advisor: J. Schmidhuber

  • Hochreiter S, Schmidhuber J (1997a) Flat minima. Neural Comput 9(1):1–42

  • Hochreiter S, Schmidhuber J (1997b) Long short-term memory. Neural Comput 9(8):1735–1780. Based on TR FKI-207-95, TUM (1995)

  • Hochreiter S, Schmidhuber J (1999) Feature extraction through LOCOCODE. Neural Comput 11(3):679–714

  • Hodgkin AL, Huxley AF (1952) A quantitative description of membrane current and its application to conduction and excitation in nerve. J Physiol 117(4):500

  • Hutter M (2005) Universal artificial intelligence: sequential decisions based on algorithmic probability. Springer, Berlin

  • Ivakhnenko AG, Lapa VG (1965) Cybernetic predicting devices. CCM Information Corporation, New York

  • Ivakhnenko AG (1971) Polynomial theory of complex systems. IEEE Trans Syst Man Cybern (4):364–378

  • Jaeger H (2004) Harnessing nonlinearity: predicting chaotic systems and saving energy in wireless communication. Science 304:78–80

  • Ji S, Xu W, Yang M, Yu K (2013) 3D convolutional neural networks for human action recognition. IEEE Trans Pattern Anal Mach Intell 35(1):221–231

  • Kaelbling LP, Littman ML, Moore AW (1996) Reinforcement learning: a survey. J AI Res 4:237–285

  • Karpathy A, Toderici G, Shetty S, Leung T, Sukthankar R, Fei-Fei L (2014) Large-scale video classification with convolutional neural networks. In: Proceedings of CVPR, Columbus

  • Kelley HJ (1960) Gradient theory of optimal flight paths. ARS J 30(10):947–954

  • Khan SH, Bennamoun M, Sohel F, Togneri R (2014) Automatic feature learning for robust shadow detection. In: Proceedings of CVPR, Columbus

  • Koikkalainen P, Oja E (1990) Self-organizing hierarchical feature maps. In: Proceedings of IJCNN, pp 279–284

  • Koutnik J, Greff K, Gomez F, Schmidhuber J (2014) A Clockwork RNN. In: Proceedings of ICML, vol 32. pp 1845–1853. arXiv:1402.3511 [cs.NE]

  • Kramer M (1991) Nonlinear principal component analysis using autoassociative neural networks. AIChE J 37:233–243

  • Krizhevsky A, Sutskever I, Hinton GE (2012) ImageNet classification with deep convolutional neural networks. In: Proceedings of NIPS, Nevada, p 4

  • LeCun Y, Boser B, Denker JS, Henderson D, Howard RE, Hubbard W, Jackel LD (1989) Back-propagation applied to handwritten zip code recognition. Neural Comput 1(4):541–551

  • LeCun Y, Denker JS, Solla SA (1990b) Optimal brain damage. In: Touretzky DS (ed) Proceedings of NIPS 2, Morgan Kaufmann, San Mateo, pp 598–605

  • LeCun Y, Bengio Y, Hinton G (2015) Deep learning. Nature 521:436–444. See critique by J. Schmidhuber (2015) http://people.idsia.ch/~juergen/deep-learning-conspiracy.html

  • Lee S, Kil RM (1991) A Gaussian potential function network with hierarchically self-organizing learning. Neural Netw 4(2):207–224

  • Li X, Wu X (2015) Constructing long short-term memory based deep recurrent neural networks for large vocabulary speech recognition. In: Proceedings of ICASSP 2015. http://arxiv.org/abs/1410.4281

  • Linnainmaa S (1970) The representation of the cumulative rounding error of an algorithm as a Taylor expansion of the local rounding errors. Master’s thesis, University of Helsinki

  • Linnainmaa S (1976) Taylor expansion of the accumulated rounding error. BIT Numer Math 16(2):146–160

  • Maas AL, Hannun AY, Ng AY (2013) Rectifier nonlinearities improve neural network acoustic models. In: Proceedings of ICML, Atlanta

  • Maass W (2000) On the computational power of winner-take-all. Neural Comput 12:2519–2535

  • MacKay DJC (1992) A practical Bayesian framework for backprop networks. Neural Comput 4:448–472

  • Maclin R, Shavlik JW (1995) Combining the predictions of multiple classifiers: using competitive learning to initialize neural networks. In: Proceedings of IJCAI, pp 524–531

  • Martens J, Sutskever I (2011) Learning recurrent neural networks with Hessian-free optimization. In: Proceedings of ICML, pp 1033–1040

  • Masci J, Giusti A, Ciresan DC, Fricout G, Schmidhuber J (2013) A fast learning algorithm for image segmentation with max-pooling convolutional networks. In: Proceedings of ICIP13, pp 2713–2717

  • McCulloch W, Pitts W (1943) A logical calculus of the ideas immanent in nervous activity. Bull Math Biophys 7:115–133

  • Mohamed A, Hinton GE (2010) Phone recognition using restricted Boltzmann machines. In: Proceedings of ICASSP, Dallas, pp 4354–4357

  • Moller MF (1993) Exact calculation of the product of the Hessian matrix of feed-forward network error functions and a vector in O(N) time. Technical report PB-432, Computer Science Department, Aarhus University

  • Montavon G, Orr G, Mueller K (2012) Neural networks: tricks of the trade. Lecture notes in computer science, vol LNCS 7700. Springer, Berlin/Heidelberg

  • Moody JE (1992) The effective number of parameters: an analysis of generalization and regularization in nonlinear learning systems. In: Proceedings of NIPS’4, Morgan Kaufmann, San Mateo, pp 847–854

  • Mozer MC, Smolensky P (1989) Skeletonization: a technique for trimming the fat from a network via relevance assessment. In: Proceedings of NIPS 1, Morgan Kaufmann, San Mateo, pp 107–115

  • Nair V, Hinton GE (2010) Rectified linear units improve restricted Boltzmann machines. In: Proceedings of ICML, Haifa

  • Oh K-S, Jung K (2004) GPU implementation of neural networks. Pattern Recognit 37(6):1311–1314

  • Pascanu R, Mikolov T, Bengio Y (2013b) On the difficulty of training recurrent neural networks. In: ICML’13: JMLR: W&CP, vol 28

  • Pearlmutter BA (1994) Fast exact multiplication by the Hessian. Neural Comput 6(1):147–160

  • Raina R, Madhavan A, Ng A (2009) Large-scale deep unsupervised learning using graphics processors. In: Proceedings of ICML, Montreal, pp 873–880

  • Ranzato MA, Huang F, Boureau Y, LeCun Y (2007) Unsupervised learning of invariant feature hierarchies with applications to object recognition. In: Proceedings of CVPR, Minneapolis, pp 1–8

  • Robinson AJ, Fallside F (1987) The utility driven dynamic error propagation network. Technical report CUED/F-INFENG/TR.1, Cambridge University Engineering Department

  • Rosenblatt F (1958) The perceptron: a probabilistic model for information storage and organization in the brain. Psychol Rev 65(6):386

  • Rumelhart DE, Hinton GE, Williams RJ (1986) Learning internal representations by error propagation. In: Rumelhart DE, McClelland JL (eds) Parallel distributed processing, vol 1, MIT Press, Cambridge, pp 318–362

  • Sak H, Senior AW, Beaufays F (2014) Long short-term memory recurrent neural network architectures for large scale acoustic modeling. In: Proceedings of INTERSPEECH

  • Sak H, Senior A, Rao K, Beaufays F, Schalkwyk J (2015) Google research blog. http://googleresearch.blogspot.ch/2015/09/google-voice-search-faster-and-more.html

  • Schapire RE (1990) The strength of weak learnability. Mach Learn 5(2):197–227

  • Scherer D, Mueller A, Behnke S (2010) Evaluation of pooling operations in convolutional architectures for object recognition. In: Proceedings of ICANN, Thessaloniki, pp 92–101

  • Schmidhuber J (1989b) A local learning algorithm for dynamic feedforward and recurrent networks. Connect Sci 1(4):403–412

  • Schmidhuber J (1992b) Learning complex, extended sequences using the principle of history compression. Neural Comput 4(2):234–242. Based on TR FKI-148-91, TUM, 1991

  • Schmidhuber J (1992c) Learning factorial codes by predictability minimization. Neural Comput 4(6):863–879

  • Schmidhuber J (1997) Discovering neural nets with low Kolmogorov complexity and high generalization capability. Neural Netw 10(5):857–873

  • Schmidhuber J, Wierstra D, Gagliolo M, Gomez FJ (2007) Training recurrent networks by Evolino. Neural Comput 19(3):757–779

  • Schmidhuber J (2015) Deep learning in neural networks: an overview. Neural Netw 61:85–117. arXiv preprint arXiv:1404.7828

  • Schmidhuber J (2015) Deep learning. Scholarpedia 10(11):32832

  • Schuster M, Paliwal KK (1997) Bidirectional recurrent neural networks. IEEE Trans Signal Process 45:2673–2681

  • Sima J (1994) Loading deep networks is hard. Neural Comput 6(5):842–850

  • Simonyan K, Zisserman A (2015) Very deep convolutional networks for large-scale image recognition. arXiv preprint http://arxiv.org/abs/1409.1556

  • Smolensky P (1986) Information processing in dynamical systems: foundations of harmony theory. In: Parallel distributed processing: explorations in the microstructure of cognition, vol 1. MIT Press, Cambridge, pp 194–281

  • Speelpenning B (1980) Compiling fast partial derivatives of functions given by algorithms. Ph.D. thesis, Department of Computer Science, University of Illinois, Urbana-Champaign

  • Srivastava RK, Masci J, Kazerounian S, Gomez F, Schmidhuber J (2013) Compete to compute. In: Proceedings of NIPS, Nevada, pp 2310–2318

  • Sutskever I, Vinyals O, Le QV (2014) Sequence to sequence learning with neural networks. In: Proceedings of NIPS’2014. arXiv preprint arXiv:1409.3215 [cs.CL]

  • Sutton R, Barto A (1998) Reinforcement learning: an introduction. MIT Press, Cambridge

  • Szegedy C, Liu W, Jia Y, Sermanet P, Reed S, Anguelov D, Erhan D, Vanhoucke V, Rabinovich A (2014) Going deeper with convolutions. arXiv preprint arXiv:1409.4842 [cs.CV]

  • Tikhonov AN, Arsenin VI, John F (1977) Solutions of ill-posed problems. Winston, New York

  • Vaillant R, Monrocq C, LeCun Y (1994) Original approach for the localisation of objects in images. IEE Proc Vision Image Signal Process 141(4):245–250

  • Vieira A, Barradas N (2003) A training algorithm for classification of high-dimensional data. Neurocomputing 50:461–472

  • Vinyals O, Toshev A, Bengio S, Erhan D (2014a) Show and tell: a neural image caption generator. arXiv preprint http://arxiv.org/pdf/1411.4555v1.pdf

  • Vinyals O, Kaiser L, Koo T, Petrov S, Sutskever I, Hinton G (2014b) Grammar as a foreign language. arXiv preprint http://arxiv.org/abs/1412.7449

  • Wan EA (1994) Time series prediction by using a connectionist network with internal delay lines. In: Weigend AS, Gershenfeld NA (eds) Time series prediction: forecasting the future and understanding the past. Addison-Wesley, Reading, pp 265–295

  • Weng JJ, Ahuja N, Huang TS (1993) Learning recognition and segmentation of 3-d objects from 2-d images. In: Proceedings of the fourth international conference on computer vision. IEEE

  • Williams RJ (1989) Complexity of exact gradient computation algorithms for recurrent neural networks. Technical Report NU-CCS-89-27, Northeastern University, College of Computer Science, Boston

  • Wiering M, van Otterlo M (2012) Reinforcement learning. Springer, Berlin/Heidelberg

  • Werbos PJ (1974) Beyond regression: new tools for prediction and analysis in the behavioral sciences. Ph.D. thesis, Harvard University

  • Werbos PJ (1982) Applications of advances in nonlinear sensitivity analysis. In: Proceedings of the 10th IFIP conference (31 Aug–4 Sep), NYC, pp 762–770

  • Werbos PJ (1988) Generalization of backpropagation with application to a recurrent gas market model. Neural Netw 1(4):339–356

  • Yamins D, Hong H, Cadieu C, DiCarlo JJ (2013) Hierarchical modular optimization of convolutional networks achieves representations similar to macaque IT and human ventral stream. In: Proceedings of NIPS, Nevada, pp 1–9

  • Zeiler MD, Fergus R (2013) Visualizing and understanding convolutional networks. Technical report arXiv:1311.2901 [cs.CV], NYU

  • Zen H, Sak H (2015) Unidirectional long short-term memory recurrent neural network with recurrent output layer for low-latency speech synthesis. In: Proceedings of ICASSP, Brisbane, pp 4470–4474

  • Zimmermann H-G, Tietz C, Grothmann R (2012) Forecasting with recurrent neural networks: 12 tricks. In: Montavon G, Orr GB, Mueller K-R (eds) Neural networks: tricks of the trade, 2nd edn. Lecture Notes in Computer Science, vol 7700. Springer, Berlin/New York, pp 687–707

Author information

Correspondence to Jürgen Schmidhuber.

Copyright information

© 2016 Springer Science+Business Media New York

About this entry

Cite this entry

Schmidhuber, J. (2016). Deep Learning. In: Sammut, C., Webb, G. (eds) Encyclopedia of Machine Learning and Data Mining. Springer, Boston, MA. https://doi.org/10.1007/978-1-4899-7502-7_909-1

  • DOI: https://doi.org/10.1007/978-1-4899-7502-7_909-1

  • Publisher Name: Springer, Boston, MA

  • Online ISBN: 978-1-4899-7502-7
