Character-based handwritten text transcription with attention networks

Abstract

This paper approaches the task of handwritten text recognition (HTR) with attentional encoder–decoder networks trained on sequences of characters rather than words. We experiment on lines of text from popular handwriting datasets and compare different activation functions for the attention mechanism that aligns image pixels with target characters. We find that softmax attention focuses heavily on individual characters, while sigmoid attention focuses on multiple characters at each decoding step. When the sequence alignment is one-to-one, softmax attention learns a more precise alignment at each decoding step, whereas the alignment generated by sigmoid attention is much less precise. When a linear function is used to obtain attention weights, the model predicts a character by looking at the entire sequence of characters and performs poorly because it lacks a precise alignment between the source and target. Future research may explore HTR in natural scene images, since the model can transcribe handwritten text without producing segmentations or bounding boxes of text in images.
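The contrast between the three attention activations can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the raw alignment scores, and the toy encoder features below are all hypothetical, chosen only to show how each activation turns the same scores into attention weights.

```python
import math

def attention_weights(scores, activation="softmax"):
    """Map raw alignment scores (one per encoder position) to attention weights.

    The scores are hypothetical; a real model would compute them from the
    decoder state and the encoder features.
    """
    if activation == "softmax":
        # Subtract the max for numerical stability; the weights sum to 1,
        # so positions compete and the largest score dominates.
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        return [e / total for e in exps]
    if activation == "sigmoid":
        # Each position is squashed independently into (0, 1), so several
        # positions can receive a high weight at the same decoding step.
        return [1.0 / (1.0 + math.exp(-s)) for s in scores]
    if activation == "linear":
        # Identity: the raw scores are used directly, with no normalization.
        return list(scores)
    raise ValueError(f"unknown activation: {activation}")

def context_vector(weights, features):
    """Weighted sum of encoder feature vectors (one vector per position)."""
    dim = len(features[0])
    return [sum(w * f[d] for w, f in zip(weights, features)) for d in range(dim)]

# Toy example: four encoder positions with two-dimensional features.
scores = [0.1, 4.0, 0.3, 2.5]
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]]

for name in ("softmax", "sigmoid", "linear"):
    w = attention_weights(scores, name)
    print(name, [round(x, 3) for x in w], context_vector(w, features))
```

With these toy scores, the softmax weights concentrate on the second position, while the sigmoid weights stay high for several positions at once, which mirrors the qualitative behavior described in the abstract.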

Data Availability Statement

The IAM, Saint Gall, and Parzival datasets can be downloaded from: https://fki.tic.heia-fr.ch/databases. The RIMES dataset can be downloaded from: http://www.a2ialab.com/doku.php?id=rimes_database:start.

Notes

  1. Similarly, Kim et al. [71] find that softmax attention performs better than sigmoid attention on word-to-word machine translation tasks.

References

  1. Bluche T, Louradour J, Messina R (2016) Scan, attend and read: end-to-end handwritten paragraph recognition with MDLSTM attention. ArXiv e-prints 1604.03286
  2. Louradour J, Kermorvant C (2013) Curriculum learning for handwritten text line recognition. ArXiv e-prints 1312.1737
  3. Graves A, Fernández S, Gomez F, Schmidhuber J (2006) Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. In: Proceedings of the 23rd international conference on machine learning, pp 369–376
  4. Graves A, Liwicki M, Fernández S, Bertolami R, Bunke H, Schmidhuber J (2009) A novel connectionist system for unconstrained handwriting recognition. IEEE Trans Pattern Anal Mach Intell 31(5):855–868
  5. Liwicki M, Graves A, Bunke H, Schmidhuber J (2007) A novel approach to on-line handwriting recognition based on bidirectional long short-term memory networks. In: Proceedings of the 9th international conference on document analysis and recognition, vol 1, pp 367–371
  6. Liwicki M, Graves A, Bunke H (2012) Neural networks for handwriting recognition. Computational intelligence paradigms in advanced pattern classification. Springer, Berlin, pp 5–24
  7. Wigington C, Stewart S, Davis B, Barrett B, Price B, Cohen S (2017) Data augmentation for recognition of handwritten words and lines using a CNN-LSTM network. In: 2017 14th IAPR international conference on document analysis and recognition (ICDAR), IEEE, vol 1, pp 639–645
  8. Stuner B, Chatelain C, Paquet T (2020) Handwriting recognition using cohort of LSTM and lexicon verification with extremely large lexicon. Multim Tools Appl 79(45):34407–34427
  9. Deng Y, Kanervisto A, Ling J, Rush AM (2016) Image-to-markup generation with coarse-to-fine attention. ArXiv e-prints 1609.04938
  10. Vinyals O, Kaiser L, Koo T, Petrov S, Sutskever I, Hinton G (2014) Grammar as a foreign language. ArXiv e-prints 1412.7449
  11. Bahdanau D, Cho K, Bengio Y (2014) Neural machine translation by jointly learning to align and translate. ArXiv e-prints 1409.0473
  12. Chorowski JK, Bahdanau D, Serdyuk D, Cho K, Bengio Y (2015) Attention-based models for speech recognition. In: Advances in neural information processing systems, pp 577–585
  13. Xu K, Ba J, Kiros R, Cho K, Courville AC, Salakhutdinov R, Zemel RS, Bengio Y (2015) Show, attend and tell: neural image caption generation with visual attention. ICML 14:77–81
  14. Cho K, van Merrienboer B, Gulcehre C, Bahdanau D, Bougares F, Schwenk H, Bengio Y (2014) Learning phrase representations using RNN encoder-decoder for statistical machine translation. ArXiv e-prints 1406.1078
  15. Cho K, Courville A, Bengio Y (2015) Describing multimedia content using attention-based encoder-decoder networks. ArXiv e-prints 1507.01053
  16. Lee CY, Osindero S (2016) Recursive recurrent nets with attention modeling for OCR in the wild. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 2231–2239
  17. Shi B, Wang X, Lyu P, Yao C, Bai X (2016) Robust scene text recognition with automatic rectification. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 4168–4176
  18. Bluche T, Messina R (2017) Gated convolutional recurrent neural networks for multilingual handwriting recognition. In: Proceedings of the 13th international conference on document analysis and recognition (ICDAR), Kyoto, Japan, pp 13–15
  19. Puigcerver J (2017) Are multidimensional recurrent layers really necessary for handwritten text recognition? In: 2017 14th IAPR international conference on document analysis and recognition (ICDAR), IEEE, vol 1, pp 67–72
  20. Chowdhury A, Vig L (2018) An efficient end-to-end neural model for handwritten text recognition. ArXiv e-prints 1807.07965
  21. Zhang Y, Nie S, Liu W, Xu X, Zhang D, Shen HT (2019) Sequence-to-sequence domain adaptation network for robust text image recognition. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (CVPR)
  22. Kang L, Riba P, Villegas M, Fornés A, Rusiñol M (2019) Candidate fusion: integrating language modelling into a sequence-to-sequence handwritten word recognition architecture. ArXiv e-prints 1912.10308
  23. Kang L, Rusiñol M, Fornés A, Riba P, Villegas M (2020) Unsupervised writer adaptation for synthetic-to-real handwritten word recognition. In: The IEEE winter conference on applications of computer vision, pp 3502–3511
  24. Xiao S, Peng L, Yan R, Wang S (2020) Deep network with pixel-level rectification and robust training for handwriting recognition. SN Comput Sci 1(3):1–13
  25. Retsinas G, Sfikas G, Maragos P (2020) WSRNet: joint spotting and recognition of handwritten words. ArXiv e-prints
  26. Belay B, Habtegebrial T, Belay G, Mesheshsa M, Liwicki M, Stricker D (2020) Learning by injection: attention embedded recurrent neural network for Amharic text-image recognition
  27. Sueiras J, Ruiz V, Sanchez A, Velez JF (2018) Offline continuous handwriting recognition using sequence to sequence neural networks. Neurocomputing
  28. Kang L, Toledo JI, Riba P, Villegas M, Fornés A, Rusiñol M (2018) Convolve, attend and spell: an attention-based sequence-to-sequence model for handwritten word recognition. In: German conference on pattern recognition, Springer, Berlin, pp 459–472
  29. Gui L, Liang X, Chang X, Hauptmann AG (2018) Adaptive context-aware reinforced agent for handwritten text recognition. In: Proceedings of the British machine vision conference (BMVC)
  30. Gehring J, Auli M, Grangier D, Yarats D, Dauphin YN (2017) Convolutional sequence to sequence learning. ArXiv e-prints
  31. Fogel S, Averbuch-Elor H, Cohen S, Mazor S, Litman R (2020) ScrabbleGAN: semi-supervised varying length handwritten text generation. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (CVPR)
  32. Davis B, Tensmeyer C, Price B, Wigington C, Morse B, Jain R (2020) Text and style conditioned GAN for generation of offline handwriting lines. ArXiv e-prints
  33. Poznanski A, Wolf L (2016) CNN-n-gram for handwriting word recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 2305–2314
  34. Such FP, Peri D, Brockler F, Paul H, Ptucha R (2018) Fully convolutional networks for handwriting recognition. In: 2018 16th international conference on frontiers in handwriting recognition (ICFHR), IEEE, pp 86–91
  35. Coquenet D, Soullard Y, Chatelain C, Paquet T (2019) Have convolutions already made recurrence obsolete for unconstrained handwritten text recognition? In: 2019 international conference on document analysis and recognition workshops (ICDARW), IEEE, vol 5, pp 65–70
  36. Ptucha R, Such FP, Pillai S, Brockler F, Singh V, Hutkowski P (2019) Intelligent character recognition using fully convolutional neural networks. Pattern Recogn 88:604–613
  37. Yousef M, Hussain KF, Mohammed US (2020) Accurate, data-efficient, unconstrained text recognition with convolutional neural networks. Pattern Recogn 108:107482
  38. Yousef M, Bishop TE (2020) OrigamiNet: weakly-supervised, segmentation-free, one-step, full page text recognition by learning to unfold. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (CVPR)
  39. Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN, Kaiser Ł, Polosukhin I (2017) Attention is all you need. Adv Neural Inf Process Syst 30:5998–6008
  40. Kang L, Riba P, Rusiñol M, Fornés A, Villegas M (2020) Pay attention to what you read: non-recurrent handwritten text-line recognition. ArXiv e-prints
  41. Ling W, Trancoso I, Dyer C, Black AW (2015) Character-based neural machine translation. ArXiv e-prints
  42. Marti UV, Bunke H (2002) The IAM-database: an English sentence database for offline handwriting recognition. Int J Doc Anal Recogn 5(1):39–46
  43. Grosicki E, El-Abed H (2011) ICDAR 2011: French handwriting recognition competition. In: Proceedings of the international conference on document analysis and recognition, pp 1459–1463
  44. Fischer A, Frinken V, Fornés A, Bunke H (2011) Transcription alignment of Latin manuscripts using hidden Markov models. In: Proceedings of the 2011 workshop on historical document imaging and processing, ACM, pp 29–36
  45. Fischer A, Wuthrich M, Liwicki M, Frinken V, Bunke H, Viehhauser G, Stolz M (2009) Automatic transcription of handwritten medieval documents. In: 2009 15th international conference on virtual systems and multimedia, IEEE, pp 137–142
  46. Puigcerver J, Martin-Albo D, Villegas M (2016) Laia: a deep learning toolkit for HTR. GitHub repository
  47. Villegas M, Romero V, Sánchez JA (2015) On the modification of binarization algorithms to retain grayscale information for handwritten text recognition. In: Iberian conference on pattern recognition and image analysis, Springer, Berlin, pp 208–215
  48. Wang P, Sun R, Zhao H, Yu K (2013) A new word language model evaluation metric for character based languages. In: Chinese computational linguistics and natural language processing based on naturally annotated big data, Springer, Berlin, pp 315–324
  49. Bluche T (2015) Deep neural networks for large vocabulary handwritten text recognition. PhD thesis, Université Paris Sud-Paris XI
  50. Jean S, Cho K, Memisevic R, Bengio Y (2014) On using very large target vocabulary for neural machine translation. ArXiv e-prints
  51. Graves A, Schmidhuber J (2009) Offline handwriting recognition with multidimensional recurrent neural networks. In: Advances in neural information processing systems, pp 545–552
  52. Michael J, Labahn R, Grüning T, Zöllner J (2019) Evaluating sequence-to-sequence models for handwritten text recognition. In: 2019 international conference on document analysis and recognition (ICDAR), IEEE, pp 1286–1293
  53. Voigtlaender P, Doetsch P, Ney H (2016) Handwriting recognition with large multidimensional long short-term memory recurrent neural networks. In: 2016 15th international conference on frontiers in handwriting recognition (ICFHR), IEEE, pp 228–233
  54. Castro D, Bezerra BL, Valença M (2018) Boosting the deep multidimensional long-short-term memory network for handwritten recognition systems. In: 2018 16th international conference on frontiers in handwriting recognition (ICFHR), IEEE, pp 127–132
  55. Bluche T (2016) Joint line segmentation and transcription for end-to-end handwritten paragraph recognition. In: Advances in neural information processing systems, pp 838–846
  56. Doetsch P, Kozielski M, Ney H (2014) Fast and robust training of recurrent neural networks for offline handwriting recognition. In: 2014 14th international conference on frontiers in handwriting recognition (ICFHR), IEEE, pp 279–284
  57. Voigtlaender P, Doetsch P, Wiesler S, Schlüter R, Ney H (2015) Sequence-discriminative training of recurrent neural networks. In: 2015 IEEE international conference on acoustics, speech and signal processing (ICASSP), IEEE, pp 2100–2104
  58. Coquenet D, Chatelain C, Paquet T (2020) End-to-end handwritten paragraph text recognition using a vertical attention network. ArXiv e-prints
  59. Kozielski M, Doetsch P, Ney H (2013) Improvements in RWTH’s system for off-line handwriting recognition. In: 2013 12th international conference on document analysis and recognition, IEEE, pp 935–939
  60. Pham V, Bluche T, Kermorvant C, Louradour J (2014) Dropout improves recurrent neural networks for handwriting recognition. In: 2014 14th international conference on frontiers in handwriting recognition (ICFHR), IEEE, pp 285–290
  61. Dutta K, Krishnan P, Mathew M, Jawahar C (2018) Improving CNN-RNN hybrid networks for handwriting recognition. In: 2018 16th international conference on frontiers in handwriting recognition (ICFHR), IEEE, pp 80–85
  62. Huang X, Qiao L, Yu W, Li J, Ma Y (2020) End-to-end sequence labeling via convolutional recurrent neural network with a connectionist temporal classification layer. Int J Comput Intell Syst 13:341–351
  63. Kozielski M, Rybach D, Hahn S, Schlüter R, Ney H (2013) Open vocabulary handwriting recognition using combined word-level and character-level language models. In: 2013 IEEE international conference on acoustics, speech and signal processing (ICASSP), IEEE, pp 8257–8261
  64. Krishnan P, Dutta K, Jawahar C (2018) Word spotting and recognition using deep embedding. In: 2018 13th IAPR international workshop on document analysis systems (DAS), IEEE, pp 1–6
  65. España-Boquera S, Castro-Bleda MJ, Gorbe-Moya J, Zamora-Martinez F (2011) Improving offline handwritten text recognition with hybrid HMM/ANN models. IEEE Trans Pattern Anal Mach Intell 33(4):767–779
  66. Chen Z, Wu Y, Yin F, Liu CL (2017) Simultaneous script identification and handwriting recognition via multi-task learning of recurrent neural networks. In: 2017 14th IAPR international conference on document analysis and recognition (ICDAR), IEEE, vol 1, pp 525–530
  67. Dreuw P, Doetsch P, Plahl C, Ney H (2011) Hierarchical hybrid MLP/HMM or rather MLP features for a discriminatively trained Gaussian HMM: a comparison for offline handwriting recognition. In: 2011 18th IEEE international conference on image processing, IEEE, pp 3541–3544
  68. Doetsch P, Zeyer A, Ney H (2016) Bidirectional decoder networks for attention-based end-to-end offline handwriting recognition. In: 2016 15th international conference on frontiers in handwriting recognition (ICFHR), IEEE, pp 361–366
  69. Menasri F, Louradour J, Bianne-Bernard AL, Kermorvant C (2012) The A2iA French handwriting recognition system at the RIMES-ICDAR2011 competition. In: Document recognition and retrieval XIX, International Society for Optics and Photonics, vol 8297, p 82970Y
  70. Soullard Y, Ruffino C, Paquet T (2019) CTCModel: a Keras model for connectionist temporal classification. ArXiv e-prints
  71. Kim Y, Denton C, Hoang L, Rush AM (2017) Structured attention networks. ArXiv e-prints
  72. Veit A, Matera T, Neumann L, Matas J, Belongie S (2016) COCO-Text: dataset and benchmark for text detection and recognition in natural images. ArXiv e-prints
  73. Jaderberg M, Simonyan K, Vedaldi A, Zisserman A (2016) Reading text in the wild with convolutional neural networks. Int J Comput Vision 116(1):1–20

Acknowledgements

A shorter version of this work appeared as an arXiv preprint at https://arxiv.org/abs/1712.04046.

Author information

Corresponding author

Correspondence to Jason Poulos.

Ethics declarations

Funding

Poulos acknowledges support of the National Science Foundation Graduate Research Fellowship under Grant DGE-1106400, and the National Science Foundation under Grant DMS-1638521 to the Statistical and Applied Mathematical Sciences Institute. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the National Science Foundation.

Code availability

Implementation code is available at the repository: https://github.com/jvpoulos/Attention-OCR/.

Conflict of interest

The authors declare no conflicts of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Poulos, J., Valle, R. Character-based handwritten text transcription with attention networks. Neural Comput & Applic (2021). https://doi.org/10.1007/s00521-021-05813-1

Keywords

  • Attention
  • Convolutional neural networks
  • Handwritten text recognition
  • Recurrent neural networks