Are 2D-LSTM really dead for offline text recognition?

Abstract

There is a recent trend in handwritten text recognition with deep neural networks to replace 2D recurrent layers with 1D recurrent layers, and in some cases to remove the recurrent layers entirely, relying on purely feed-forward convolutional architectures. The most widely used type of recurrent layer is the long short-term memory (LSTM). The motivations for this shift are many: there are few open-source implementations of 2D-LSTM, and even fewer with GPU support (currently, cuDNN implements only 1D-LSTM); 2D recurrences reduce the amount of computation that can be parallelized and thus may increase training and inference time; and recurrences create global dependencies with respect to the input, which is not always desirable. Yet many recent competitions were won by systems employing networks with 2D-LSTM layers. Most previous works that compared 1D or purely feed-forward architectures to 2D recurrent models did so on simple datasets, or did not fully optimize the “baseline” 2D model while the challenger model was duly optimized. In this work, we aim at a fair comparison between 2D and competing models, and we extensively evaluate them on more complex datasets that are more representative of challenging “real-world” data than “academic” datasets of restricted complexity. We seek to determine when and why the 1D and 2D recurrent models yield different results. We also compare the results with a language model to assess whether linguistic constraints level the performance of the different networks. Our results show that on challenging datasets, 2D-LSTM networks still seem to provide the highest performance, and we propose a visualization strategy to explain it.



Notes

  1. Here, LSTM denotes bidirectional (forward and backward) recurrent layers [26]; 2D-LSTM adds top and bottom directions, and a second forget gate.
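The recurrence described in this footnote can be sketched in code. Below is a minimal, illustrative 2D-LSTM cell forward pass in numpy (not the authors' implementation); all weight names are assumptions. Each position (i, j) receives the hidden and cell states of its left and top neighbours, and a separate forget gate controls each of the two predecessor cell states.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def mdlstm_forward(x, Wx, Wh_left, Wh_up, b):
    """Single-direction 2D-LSTM pass over an H x W grid of feature vectors.

    x: (H, W, D) input; hidden size N is inferred from b.
    Each position computes 5*N pre-activations: input gate, two forget
    gates (one per spatial predecessor), output gate, and cell candidate.
    """
    H, W, D = x.shape
    N = b.shape[0] // 5
    h = np.zeros((H + 1, W + 1, N))  # zero-padded border states
    c = np.zeros((H + 1, W + 1, N))
    for i in range(1, H + 1):
        for j in range(1, W + 1):
            z = (x[i - 1, j - 1] @ Wx
                 + h[i, j - 1] @ Wh_left   # left neighbour
                 + h[i - 1, j] @ Wh_up     # top neighbour
                 + b)
            ig, fl, fu, og = [sigmoid(z[k * N:(k + 1) * N]) for k in range(4)]
            g = np.tanh(z[4 * N:5 * N])
            # two forget gates: one for the left cell state, one for the top
            c[i, j] = ig * g + fl * c[i, j - 1] + fu * c[i - 1, j]
            h[i, j] = og * np.tanh(c[i, j])
    return h[1:, 1:]
```

A full 2D-LSTM layer as used in recognition networks typically runs four such scans (one per corner of the image) and combines their outputs; the double loop above also makes visible why the computation is harder to parallelize than a 1D recurrence, since position (i, j) depends on both (i, j-1) and (i-1, j).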

  2. Right Single Quotation Mark.

References

  1. Bluche, T., Louradour, J., Knibbe, M., Moysset, B., Benzeghiba, M.F., Kermorvant, C.: The A2iA Arabic handwritten text recognition system at the Open HaRT2013 evaluation. In: 11th IAPR International Workshop on Document Analysis Systems (DAS), pp. 161–165. IEEE (2014)

  2. Bluche, T., Messina, R.: Faster segmentation-free handwritten Chinese text recognition with character decompositions. In: International Conference on Frontiers in Handwriting Recognition (ICFHR) (2016)

  3. Bluche, T., Messina, R.: Gated convolutional recurrent neural networks for multilingual handwriting recognition. In: 14th IAPR International Conference on Document Analysis and Recognition (ICDAR), vol. 1, pp. 646–651. IEEE (2017)

  4. Bluche, T., Moysset, B., Kermorvant, C.: Automatic line segmentation and ground-truth alignment of handwritten documents. In: International Conference on Frontiers in Handwriting Recognition (2014)

  5. Borisyuk, F., Gordo, A., Sivakumar, V.: Rosetta: large scale system for text detection and recognition in images. In: Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pp. 71–79. ACM (2018)

  6. Breuel, T.M.: High performance text recognition using a hybrid convolutional-LSTM implementation. In: 14th IAPR International Conference on Document Analysis and Recognition (ICDAR), vol. 1, pp. 11–16. IEEE (2017)

  7. Brunessaux, S., Giroux, P., Grilhères, B., Manta, M., Bodin, M., Choukri, K., Galibert, O., Kahn, J.: The Maurdor project: improving automatic processing of digital documents. In: 11th IAPR International Workshop on Document Analysis Systems (DAS), pp. 349–354. IEEE (2014)

  8. Bunke, H., Roth, M., Schukat-Talamazzini, E.G.: Off-line cursive handwriting recognition using hidden Markov models. Pattern Recognit. 28(9), 1399–1413 (1995)

  9. Espana-Boquera, S., Castro-Bleda, M.J., Gorbe-Moya, J., Zamora-Martinez, F.: Improving offline handwritten text recognition with hybrid HMM/ANN models. IEEE Trans. Pattern Anal. Mach. Intell. 33(4), 767–779 (2011)

  10. Glorot, X., Bengio, Y.: Understanding the difficulty of training deep feedforward neural networks. In: Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, pp. 249–256 (2010)

  11. Graves, A., Fernández, S., Gomez, F., Schmidhuber, J.: Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. In: 23rd International Conference on Machine Learning, pp. 369–376. ACM (2006)

  12. Graves, A., Schmidhuber, J.: Offline handwriting recognition with multidimensional recurrent neural networks. In: Advances in Neural Information Processing Systems, pp. 545–552 (2009)

  13. Grosicki, E., El-Abed, H.: ICDAR 2011: French handwriting recognition competition. In: Proceedings of the International Conference on Document Analysis and Recognition, pp. 1459–1463 (2011)

  14. Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural Comput. 9(8), 1735–1780 (1997)

  15. Levenshtein, V.I.: Binary codes capable of correcting deletions, insertions, and reversals. In: Cybernetics and Control Theory (1966) (Russian edition: Doklady Akademii Nauk SSSR, vol. 163, no. 4 (1965))

  16. Liu, C.L., Yin, F., Wang, D.H., Wang, Q.F.: CASIA online and offline Chinese handwriting databases. In: ICDAR, pp. 37–41. IEEE (2011)

  17. Marti, U.V., Bunke, H.: The IAM-database: an English sentence database for offline handwriting recognition. IJDAR 5(1), 39–46 (2002)

  18. Mohri, M.: Finite-state transducers in language and speech processing. Comput. Linguist. 23, 269–311 (1997)

  19. Moysset, B., Bluche, T., Knibbe, M., Benzeghiba, M.F., Messina, R., Louradour, J., Kermorvant, C.: The A2iA multi-lingual text recognition system at the Maurdor evaluation. In: International Conference on Frontiers in Handwriting Recognition (2014)

  20. Oparin, I., Kahn, J., Galibert, O.: First Maurdor 2013 evaluation campaign in scanned document image processing. In: International Conference on Acoustics, Speech, and Signal Processing (2014)

  21. Pham, V., Bluche, T., Kermorvant, C., Louradour, J.: Dropout improves recurrent neural networks for handwriting recognition. In: 14th International Conference on Frontiers in Handwriting Recognition (ICFHR), pp. 285–290. IEEE (2014)

  22. Povey, D., Ghoshal, A., Boulianne, G., Burget, L., Glembek, O., Goel, N., Hannemann, M., Motlicek, P., Qian, Y., Schwarz, P., Silovsky, J., Stemmer, G., Vesely, K.: The Kaldi speech recognition toolkit. In: Workshop on Automatic Speech Recognition and Understanding (2011)

  23. Puigcerver, J.: Are multidimensional recurrent layers really necessary for handwritten text recognition? In: 14th IAPR International Conference on Document Analysis and Recognition (ICDAR), vol. 1, pp. 67–72. IEEE (2017)

  24. Sabir, E., Rawls, S., Natarajan, P.: Implicit language model in LSTM for OCR. In: 14th IAPR International Conference on Document Analysis and Recognition (ICDAR), vol. 7, pp. 27–31. IEEE (2017)

  25. Sanchez, J., Romero, V., Toselli, A., Vidal, E.: ICFHR2014 competition on handwritten text recognition on transcriptorium datasets (HTRtS). In: International Conference on Frontiers in Handwriting Recognition (ICFHR), pp. 181–186 (2014)

  26. Schuster, M., Paliwal, K.K.: Bidirectional recurrent neural networks. IEEE Trans. Signal Process. 45(11), 2673–2681 (1997)

  27. Sánchez, J.A., Romero, V., Toselli, A.H., Vidal, E.: ICFHR2016 competition on handwritten text recognition on the READ dataset. In: ICFHR, pp. 630–635. IEEE Computer Society (2016)

  28. Sánchez, J.A., Romero, V., Toselli, A.H., Villegas, M., Vidal, E.: ICDAR2017 competition on handwritten text recognition on the READ dataset. In: ICDAR, pp. 1383–1388. IEEE (2017)

  29. Sánchez, J.A., Toselli, A.H., Romero, V., Vidal, E.: ICDAR 2015 competition HTRtS: handwritten text recognition on the tranScriptorium dataset. In: ICDAR, pp. 1166–1170. IEEE Computer Society (2015)

  30. Srivastava, N., Hinton, G.E., Krizhevsky, A., Sutskever, I., Salakhutdinov, R.: Dropout: a simple way to prevent neural networks from overfitting. J. Mach. Learn. Res. 15(1), 1929–1958 (2014)

  31. Stolcke, A.: SRILM—an extensible language modeling toolkit. In: Proceedings of the International Conference on Spoken Language Processing, pp. 257–286 (2002)

  32. Stollenga, M.F., Byeon, W., Liwicki, M., Schmidhuber, J.: Parallel multi-dimensional LSTM, with application to fast biomedical volumetric image segmentation. In: NIPS, pp. 2998–3006 (2015)

  33. Tieleman, T., Hinton, G.: Lecture 6.5—RMSProp: divide the gradient by a running average of its recent magnitude. COURSERA Neural Netw. Mach. Learn. 4, 26–31 (2012)

  34. Yin, F., Wang, Q.F., Zhang, X.Y., Liu, C.L.: ICDAR 2013 Chinese handwriting recognition competition. In: 12th International Conference on Document Analysis and Recognition, ICDAR '13, pp. 1464–1470. IEEE Computer Society, Washington (2013)


Author information

Correspondence to Bastien Moysset.



About this article

Cite this article

Moysset, B., Messina, R.: Are 2D-LSTM really dead for offline text recognition? IJDAR 22, 193–208 (2019). https://doi.org/10.1007/s10032-019-00325-0


Keywords

  • Text line recognition
  • Neural network
  • Recurrent
  • 2D-LSTM
  • 1D-LSTM
  • Convolutional