Boosting modern and historical handwritten text recognition with deformable convolutions

Abstract

Handwritten Text Recognition (HTR) in free-layout pages is a challenging image understanding task that can provide a relevant boost to the digitization of handwritten documents and reuse of their content. The task becomes even more challenging when dealing with historical documents due to the variability of the writing style and degradation of the page quality. State-of-the-art HTR approaches typically couple recurrent structures for sequence modeling with Convolutional Neural Networks for visual feature extraction. Since convolutional kernels are defined on fixed grids and focus on all input pixels independently while moving over the input image, this strategy disregards the fact that handwritten characters can vary in shape, scale, and orientation even within the same document and that the ink pixels are more relevant than the background ones. To cope with these specific HTR difficulties, we propose to adopt deformable convolutions, which can deform depending on the input at hand and better adapt to the geometric variations of the text. We design two deformable architectures and conduct extensive experiments on both modern and historical datasets. Experimental results confirm the suitability of deformable convolutions for the HTR task.

Notes

  1. For the sake of simplicity, we do not include biases in this analysis.

  2. www.tbluche.com/resources.html.

  3. https://doi.org/10.5281/zenodo.44519.

  4. https://doi.org/10.5281/zenodo.1164045.

Acknowledgements

This work was supported by the “AI for Digital Humanities” project (Pratica Sime n.2018.0390), funded by “Fondazione di Modena”, and by the “DHMoRe Lab” project (CUP E94I19001060003), funded by “Regione Emilia Romagna”.

Author information

Corresponding author

Correspondence to Silvia Cascianelli.

Appendix

Model architectures

We provide the detailed architectures of the proposed DefConv-based HTR models in Table 4 (CRNN) and Table 5 (1D-LSTM). The offsets of each DefConv layer are produced by a standard convolutional layer placed before the DefConv, which is in charge of learning two parameters for each kernel cell of the DefConv. Note that the output size of the final Linear layer, c, depends on the charset size of each dataset (including the blank character). In particular, \(c = 96\) for the IAM dataset, \(c = 80\) for RIMES, \(c = 89\) for ICFHR14, \(c = 94\) for ICFHR16, and \(c = 77\) for Leopardi. From a practical standpoint, when the whole dataset is available, c can be calculated directly as the number of distinct characters appearing in the dataset (i.e., the charset) plus one for the blank character. For new or unknown datasets, the charset, and thus c, can be estimated, e.g., from large corpora in the same language as the dataset of interest, and can include as many characters as the designer wants. In this latter case, logits corresponding to characters included in the charset but not appearing in the dataset of interest will be assigned zero probability. In the StandardConv-based baselines used in the experiments, each pair of offset convolution layer and DefConv layer is replaced by a single standard convolution layer with the same characteristics as the DefConv layer.
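As a rough illustration of the two quantities discussed above, the following Python sketch computes c from a set of transcriptions and the number of output channels an offset convolution needs for a given kernel size. The function names, the toy transcriptions, and the 3×3 kernel are hypothetical and are not taken from the paper; the sketch only assumes that c is the number of distinct characters plus one blank, and that the offset layer outputs two values per kernel cell.

```python
def build_charset(transcriptions):
    """Return the sorted list of distinct characters found in the transcriptions."""
    return sorted(set("".join(transcriptions)))


def output_size(transcriptions):
    """c = number of distinct characters in the data + 1 for the blank character."""
    return len(build_charset(transcriptions)) + 1


def offset_channels(kernel_height, kernel_width):
    """Output channels of the offset convolution: two values per kernel cell."""
    return 2 * kernel_height * kernel_width


# Toy transcriptions (hypothetical); a real dataset would supply its ground-truth lines.
lines = ["to be", "or not"]
c = output_size(lines)

# Hypothetical 3x3 kernel; the actual kernel sizes are those listed in Tables 4 and 5.
n_offsets = offset_channels(3, 3)

print(c, n_offsets)
```

When the charset is instead estimated from an external corpus, the same `output_size` call could be applied to that corpus, at the cost of some output units that never occur in the target dataset.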

Cite this article

Cascianelli, S., Cornia, M., Baraldi, L. et al. Boosting modern and historical handwritten text recognition with deformable convolutions. IJDAR 25, 207–217 (2022). https://doi.org/10.1007/s10032-022-00401-y

  • DOI: https://doi.org/10.1007/s10032-022-00401-y

Keywords

  • Handwritten text recognition
  • Deformable convolutions
  • Historical manuscripts