Character-level recurrent neural networks in practice: comparing training and sampling schemes

Original Article · Neural Computing and Applications

Abstract

Recurrent neural networks are nowadays successfully used in an abundance of applications, ranging from text, speech and image processing to recommender systems. Backpropagation through time is the algorithm commonly used to train these networks on specific tasks. Many deep learning frameworks provide their own implementation of training and sampling procedures for recurrent neural networks, while there are in fact multiple other schemes to choose from and additional parameters to tune. In the existing literature, this choice is very often overlooked or ignored. In this paper, we therefore give an overview of possible training and sampling schemes for character-level recurrent neural networks to solve the task of predicting the next token in a given sequence. We test these schemes on a variety of datasets, neural network architectures and parameter settings, and formulate a number of take-home recommendations. The choice of training and sampling scheme turns out to be subject to a number of trade-offs, such as training stability, sampling time, model performance and implementation effort, but is largely independent of the data. Perhaps the most surprising result is that transferring hidden states to correctly initialize the model on subsequences often leads to unstable training behavior, depending on the dataset.
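
To make the compared schemes concrete, the sketch below shows, in PyTorch (an assumption; the paper does not prescribe a framework), a character-level LSTM trained with truncated backpropagation through time over consecutive subsequences. The flag carry_hidden switches between re-initializing the hidden state on every subsequence and transferring the previous, detached state, which is the scheme variant whose training stability the abstract highlights. All names, layer sizes and the batch iterator are illustrative, not taken from the paper.

    import torch
    import torch.nn as nn

    class CharRNN(nn.Module):
        """Minimal character-level LSTM: embedding, recurrent layer, output logits."""
        def __init__(self, vocab_size, embed_dim=64, hidden_dim=256):
            super().__init__()
            self.embed = nn.Embedding(vocab_size, embed_dim)
            self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
            self.out = nn.Linear(hidden_dim, vocab_size)

        def forward(self, x, state=None):
            h, state = self.lstm(self.embed(x), state)
            return self.out(h), state

    def train_epoch(model, batches, optimizer, carry_hidden=True):
        # `batches` yields (inputs, targets) of shape (batch, subseq_len), where
        # targets are the inputs shifted by one character and consecutive batches
        # hold consecutive subsequences of the corpus.
        criterion = nn.CrossEntropyLoss()
        state = None
        for inputs, targets in batches:
            if carry_hidden and state is not None:
                # Transfer the hidden state to initialize the next subsequence,
                # but detach it so gradients stop at the boundary (truncated BPTT).
                state = tuple(s.detach() for s in state)
            else:
                state = None  # cold start: zero state for every subsequence
            optimizer.zero_grad()
            logits, state = model(inputs, state)
            loss = criterion(logits.reshape(-1, logits.size(-1)), targets.reshape(-1))
            loss.backward()
            optimizer.step()

With carry_hidden=True the model sees a correctly initialized state at each subsequence boundary, at the cost of extra bookkeeping; the abstract's observation is that, depending on the dataset, this transfer can nevertheless make training unstable compared with simply resetting the state.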


Notes

  1. The datasets are available for download at https://github.com/cedricdeboom/character-level-rnn-datasets.

  2. www.gutenberg.org.

  3. github.com/torvalds/linux/tree/master/kernel.

  4. www.classicalarchives.com.


Funding

The hardware used to perform the experiments in this paper was funded by Nvidia.

Author information


Corresponding author

Correspondence to Cedric De Boom.

Ethics declarations

Conflict of interest

Cedric De Boom is funded by a Ph.D. grant of the Research Foundation—Flanders (FWO). The other authors declare that they have no conflict of interest.


About this article

Cite this article

De Boom, C., Demeester, T. & Dhoedt, B. Character-level recurrent neural networks in practice: comparing training and sampling schemes. Neural Comput & Applic 31, 4001–4017 (2019). https://doi.org/10.1007/s00521-017-3322-z
