Character-level recurrent neural networks in practice: comparing training and sampling schemes

Original Article · Neural Computing and Applications

Abstract

Recurrent neural networks are nowadays successfully used in an abundance of applications, ranging from text, speech and image processing to recommender systems. Backpropagation through time is the algorithm commonly used to train these networks on specific tasks. Many deep learning frameworks provide their own implementation of training and sampling procedures for recurrent neural networks, while there are in fact multiple other schemes to choose from and additional parameters to tune. In the existing literature, this choice is very often overlooked or ignored. In this paper, we therefore give an overview of possible training and sampling schemes for character-level recurrent neural networks to solve the task of predicting the next token in a given sequence. We test these schemes on a variety of datasets, neural network architectures and parameter settings, and formulate a number of take-home recommendations. The choice of training and sampling scheme turns out to be subject to a number of trade-offs, such as training stability, sampling time, model performance and implementation effort, but is largely independent of the data. Perhaps the most surprising result is that transferring hidden states to correctly initialize the model on subsequences often leads to unstable training behavior, depending on the dataset.
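
To make the compared schemes concrete, the sketch below shows, in PyTorch (an assumption; the paper does not prescribe a framework), a character-level LSTM trained with truncated backpropagation through time over consecutive subsequences. The flag carry_hidden switches between re-initializing the hidden state on every subsequence and transferring the previous, detached state, which is the scheme variant whose training stability the abstract highlights. All names, layer sizes and the batch iterator are illustrative, not taken from the paper.

    import torch
    import torch.nn as nn

    class CharRNN(nn.Module):
        """Minimal character-level LSTM: embedding, recurrent layer, output logits."""
        def __init__(self, vocab_size, embed_dim=64, hidden_dim=256):
            super().__init__()
            self.embed = nn.Embedding(vocab_size, embed_dim)
            self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
            self.out = nn.Linear(hidden_dim, vocab_size)

        def forward(self, x, state=None):
            h, state = self.lstm(self.embed(x), state)
            return self.out(h), state

    def train_epoch(model, batches, optimizer, carry_hidden=True):
        # `batches` yields (inputs, targets) of shape (batch, subseq_len), where
        # targets are the inputs shifted by one character and consecutive batches
        # hold consecutive subsequences of the corpus.
        criterion = nn.CrossEntropyLoss()
        state = None
        for inputs, targets in batches:
            if carry_hidden and state is not None:
                # Transfer the hidden state to initialize the next subsequence,
                # but detach it so gradients stop at the boundary (truncated BPTT).
                state = tuple(s.detach() for s in state)
            else:
                state = None  # cold start: zero state for every subsequence
            optimizer.zero_grad()
            logits, state = model(inputs, state)
            loss = criterion(logits.reshape(-1, logits.size(-1)), targets.reshape(-1))
            loss.backward()
            optimizer.step()

With carry_hidden=True the model sees a correctly initialized state at each subsequence boundary, at the cost of extra bookkeeping; the abstract's observation is that, depending on the dataset, this transfer can nevertheless make training unstable compared with simply resetting the state.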


Notes

  1. The datasets are available for download at https://github.com/cedricdeboom/character-level-rnn-datasets.

  2. www.gutenberg.org.

  3. github.com/torvalds/linux/tree/master/kernel.

  4. www.classicalarchives.com.


Funding

The hardware used to perform the experiments in this paper was funded by Nvidia.

Author information


Corresponding author

Correspondence to Cedric De Boom.

Ethics declarations

Conflict of interest

Cedric De Boom is funded by a Ph.D. grant of the Research Foundation—Flanders (FWO). The other authors declare that they have no conflict of interest.


About this article

Cite this article

De Boom, C., Demeester, T. & Dhoedt, B. Character-level recurrent neural networks in practice: comparing training and sampling schemes. Neural Comput & Applic 31, 4001–4017 (2019). https://doi.org/10.1007/s00521-017-3322-z
