Error Entropy Minimization for LSTM Training

  • Luís A. Alexandre
  • J. P. Marques de Sá
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4131)


In this paper we present a new training algorithm for the Long Short-Term Memory (LSTM) recurrent neural network. This algorithm uses entropy instead of the usual mean squared error as the cost function for the weight update. More precisely, we use the Error Entropy Minimization (EEM) approach, where the entropy of the error is minimized after each symbol is presented to the network. Our experiments show that this approach makes the LSTM converge more frequently than the traditional learning algorithm does. This in turn relaxes the burden of parameter tuning, since learning is achieved for a wider range of parameter values. The use of EEM also reduces, in some cases, the number of epochs needed for convergence.
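The EEM cost described above is typically estimated nonparametrically: Renyi's quadratic entropy of the errors, H2(e) = -log V(e), is computed from a Parzen-window "information potential" V over pairwise error differences, and minimizing the entropy is equivalent to maximizing V. The following is a minimal NumPy sketch of that estimator, not the paper's implementation; the kernel size `sigma` is a free parameter, and the gradient here is taken with respect to the errors themselves (in LSTM training it would be chained through the network weights).

```python
import numpy as np

def gaussian_kernel(x, sigma2):
    # Gaussian Parzen window with variance sigma2
    return np.exp(-x**2 / (2 * sigma2)) / np.sqrt(2 * np.pi * sigma2)

def information_potential(errors, sigma=1.0):
    # V(e) = (1/N^2) * sum_ij G(e_i - e_j; 2*sigma^2).
    # Renyi's quadratic entropy is H2 = -log V, so minimizing H2
    # is the same as maximizing V.
    sigma2 = 2 * sigma**2  # kernel variances add under convolution
    diffs = errors[:, None] - errors[None, :]
    return gaussian_kernel(diffs, sigma2).mean()

def potential_gradient(errors, sigma=1.0):
    # dV/de_k = (2/N^2) * sum_j G'(e_k - e_j); following this gradient
    # pulls the errors toward each other, shrinking their entropy.
    sigma2 = 2 * sigma**2
    n = len(errors)
    diffs = errors[:, None] - errors[None, :]
    dG = gaussian_kernel(diffs, sigma2) * (-diffs / sigma2)
    return (2.0 / n**2) * dG.sum(axis=1)
```

A small ascent step on V concentrates the error distribution: starting from spread-out errors, `errors + eta * potential_gradient(errors)` yields a strictly larger information potential, with the maximum reached when all errors coincide.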







Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Luís A. Alexandre (1)
  • J. P. Marques de Sá (2)
  1. Department of Informatics and IT - Networks and Multimedia Group, University of Beira Interior, Covilhã, Portugal
  2. Faculty of Engineering and INEB, University of Porto, Portugal
