Neural Networks: Tricks of the Trade

Volume 1524 of the series Lecture Notes in Computer Science pp 9-50


Efficient BackProp

  • Yann LeCun (Image Processing Research Department, AT&T Labs - Research)
  • Leon Bottou (Image Processing Research Department, AT&T Labs - Research)
  • Genevieve B. Orr (Willamette University)
  • Klaus-Robert Müller (GMD FIRST)



The convergence of back-propagation learning is analyzed so as to explain common phenomena observed by practitioners. Many undesirable behaviors of backprop can be avoided with tricks that are rarely exposed in serious technical publications. This paper gives some of those tricks, and offers explanations of why they work. Many authors have suggested that second-order optimization methods are advantageous for neural net training. It is shown that most “classical” second-order methods are impractical for large neural networks. A few methods are proposed that do not have these limitations.