There is a recent trend in handwritten text recognition with deep neural networks to replace 2D recurrent layers with 1D ones, and in some cases to remove the recurrent layers entirely, relying on simple feed-forward, convolutional-only architectures. The most commonly used type of recurrent layer is the long short-term memory (LSTM). The motivations for doing so are many: there are few open-source implementations of 2D-LSTM, and even fewer with GPU support (currently, cuDNN only implements 1D-LSTM); 2D recurrences reduce the amount of computation that can be parallelized and thus potentially increase training and inference time; and recurrences create global dependencies with respect to the input, which may not always be desirable. Yet many recent competitions were won by systems employing networks with 2D-LSTM layers. Most previous works comparing 1D or purely feed-forward architectures to 2D recurrent models did so on simple datasets, or did not fully optimize the “baseline” 2D model, whereas the challenger model was duly optimized. In this work, we aim at a fair comparison between 2D models and their competitors, and we extensively evaluate them on more complex datasets that are more representative of challenging “real-world” data than “academic” datasets of restricted complexity. We aim to determine when and why 1D and 2D recurrent models yield different results. We also compare the results with a language model to assess whether linguistic constraints level out the performance of the different networks. Our results show that on challenging datasets, 2D-LSTM networks still seem to provide the highest performance, and we propose a visualization strategy to explain it.
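The parallelization argument above can be made concrete with a short sketch (illustrative only, not taken from the paper): a 1D-LSTM scanning an H × W feature map can process all H rows in parallel, so only W sequential steps are needed, whereas in a 2D-LSTM each position depends on its left and top neighbors, so only positions on a common anti-diagonal are mutually independent.

```python
def antidiagonals(H, W):
    """Group the cells of an H x W grid into anti-diagonals i + j = d.

    In a 2D-LSTM scan, cell (i, j) depends on (i - 1, j) and (i, j - 1),
    so only cells on the same anti-diagonal can be computed in parallel:
    the scan needs H + W - 1 sequential steps, and the amount of work
    available at each step varies with the diagonal's length.
    """
    for d in range(H + W - 1):
        yield [(i, d - i) for i in range(max(0, d - W + 1), min(H, d + 1))]
```

Beyond the step count, the varying diagonal length complicates efficient GPU batching, which is one reason parallel reformulations of multidimensional LSTM (e.g., Stollenga et al.'s) were proposed.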
Here, LSTM denotes bidirectional (forward and backward) recurrent layers; 2D-LSTM introduces top and bottom scan directions and a second forget gate.
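As an illustration of this footnote, the following is a minimal NumPy sketch of one scan direction of a 2D-LSTM cell (a full layer combines four such scans); the weight names, shapes, and gate ordering are illustrative assumptions, not the implementation evaluated in the paper.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def mdlstm_scan(x, W, U_left, U_up, b, hidden=8):
    """One (top-left to bottom-right) scan of a 2D-LSTM over an
    H x Wd x D feature map.

    At each position (i, j) the cell mixes two predecessor states,
    left and top, each gated by its own forget gate -- the "second
    forget gate" that distinguishes a 2D-LSTM from a 1D-LSTM.
    Gate layout (an assumption): input, forget-left, forget-up,
    output, candidate, concatenated along the last axis.
    """
    H, Wd, D = x.shape
    h = np.zeros((H, Wd, hidden))
    c = np.zeros((H, Wd, hidden))
    for i in range(H):
        for j in range(Wd):
            h_left = h[i, j - 1] if j > 0 else np.zeros(hidden)
            h_up = h[i - 1, j] if i > 0 else np.zeros(hidden)
            c_left = c[i, j - 1] if j > 0 else np.zeros(hidden)
            c_up = c[i - 1, j] if i > 0 else np.zeros(hidden)
            # one affine map producing all five gate pre-activations
            z = x[i, j] @ W + h_left @ U_left + h_up @ U_up + b
            ig, fl, fu, og, g = np.split(z, 5)
            c[i, j] = (sigmoid(fl) * c_left + sigmoid(fu) * c_up
                       + sigmoid(ig) * np.tanh(g))
            h[i, j] = sigmoid(og) * np.tanh(c[i, j])
    return h
```

A bidirectional 1D-LSTM is recovered by dropping the `h_up`/`c_up` terms and the second forget gate, which is exactly the simplification whose cost the paper measures.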
Moysset, B., Messina, R.: Are 2D-LSTM really dead for offline text recognition? IJDAR 22, 193–208 (2019). https://doi.org/10.1007/s10032-019-00325-0
Keywords: Text line recognition · Neural network