On the Initialization of Long Short-Term Memory Networks

  • Mostafa Mehdipour Ghazi (corresponding author)
  • Mads Nielsen
  • Akshay Pai
  • Marc Modat
  • M. Jorge Cardoso
  • Sébastien Ourselin
  • Lauge Sørensen
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11953)

Abstract

Weight initialization is important for faster convergence and for the stability of deep neural network training. In this paper, a robust initialization method is developed to address training instability in long short-term memory (LSTM) networks. It is based on a normalized random initialization of the network weights that aims to keep the variance of the network input and output within the same range. The method is applied to standard LSTMs for univariate time series regression and to LSTMs robust to missing values for multivariate disease progression modeling. The results show that, in all cases, the proposed initialization method outperforms state-of-the-art initialization techniques in terms of training convergence and the generalization performance of the obtained solution.
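The exact normalization derived in the paper is not reproduced in this excerpt, but the general idea of variance-preserving random initialization for LSTM weights can be sketched as follows. The snippet uses a Glorot-style uniform bound of sqrt(6 / (fan_in + fan_out)) as an illustrative assumption for scaling each gate's input and recurrent weight matrices; the function name init_lstm_weights and the gate labels are hypothetical and are not taken from the paper.

    import numpy as np

    def init_lstm_weights(input_size, hidden_size, rng=None):
        # Illustrative variance-preserving initialization for the four LSTM
        # gates (input, forget, cell candidate, output). Each gate has an
        # input-to-hidden matrix W and a hidden-to-hidden matrix R; the
        # uniform bound keeps the variance of the pre-activations close to
        # that of the inputs (Glorot-style scaling, used here as a stand-in
        # for the paper's normalization).
        rng = np.random.default_rng() if rng is None else rng

        def uniform(fan_in, fan_out):
            bound = np.sqrt(6.0 / (fan_in + fan_out))
            return rng.uniform(-bound, bound, size=(fan_out, fan_in))

        weights = {}
        for gate in ("i", "f", "g", "o"):
            weights["W_" + gate] = uniform(input_size, hidden_size)   # input weights
            weights["R_" + gate] = uniform(hidden_size, hidden_size)  # recurrent weights
            weights["b_" + gate] = np.zeros(hidden_size)              # biases start at zero
        return weights

    # Example: initialize a single LSTM layer with 8 inputs and 32 hidden units.
    params = init_lstm_weights(input_size=8, hidden_size=32)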

Keywords

Deep neural networks · Long short-term memory · Time series regression · Initialization · Disease progression modeling

Acknowledgments

This project has received funding from the European Union’s Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grant agreement No 721820.


Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  • Mostafa Mehdipour Ghazi (1, 2, 3, 4), corresponding author
  • Mads Nielsen (1, 2, 3)
  • Akshay Pai (1, 2, 3)
  • Marc Modat (4, 5)
  • M. Jorge Cardoso (4, 5)
  • Sébastien Ourselin (4, 5)
  • Lauge Sørensen (1, 2, 3)

  1. Biomediq A/S, Copenhagen, Denmark
  2. Cerebriu A/S, Copenhagen, Denmark
  3. Department of Computer Science, University of Copenhagen, Copenhagen, Denmark
  4. Department of Medical Physics and Biomedical Engineering, University College London, London, UK
  5. School of Biomedical Engineering and Imaging Sciences, King's College London, London, UK