This paper approaches the task of handwritten text recognition (HTR) with attentional encoder–decoder networks trained on sequences of characters rather than words. We experiment on lines of text from popular handwriting datasets and compare different activation functions for the attention mechanism that aligns image pixels with target characters. We find that softmax attention focuses heavily on individual characters, while sigmoid attention focuses on multiple characters at each decoding step. When the sequence alignment is one-to-one, softmax attention learns a more precise alignment at each decoding step, whereas the alignment generated by sigmoid attention is much less precise. When a linear function is used to obtain attention weights, the model predicts each character by looking at the entire sequence of characters and performs poorly because it lacks a precise alignment between source and target. Future research may explore HTR in natural scene images, since the model can transcribe handwritten text without producing segmentations or bounding boxes of text in images.
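The contrast between the three attention activations can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names `attention_weights` and `context_vector`, the alignment scores, and the one-hot encoder states are all hypothetical, chosen only to show how the same raw scores yield a peaked distribution under softmax, several simultaneously active positions under sigmoid, and unnormalized weights under a linear function.

```python
import numpy as np


def attention_weights(scores, activation="softmax"):
    """Map raw alignment scores (one per encoder position) to attention weights.

    Hypothetical helper for illustration; not taken from the paper's code.
    """
    scores = np.asarray(scores, dtype=float)
    if activation == "softmax":
        # Subtract the maximum for numerical stability; weights sum to 1.
        exp = np.exp(scores - scores.max())
        return exp / exp.sum()
    if activation == "sigmoid":
        # Each position is squashed independently into (0, 1); no normalization.
        return 1.0 / (1.0 + np.exp(-scores))
    if activation == "linear":
        # Identity: the raw scores are used directly as weights.
        return scores.copy()
    raise ValueError(f"unknown activation: {activation}")


def context_vector(encoder_states, weights):
    """Weighted sum of encoder states, as consumed by the decoder at one step."""
    return weights @ encoder_states


# Made-up scores for five encoder positions, peaked at the middle position.
scores = np.array([-2.0, 0.5, 4.0, 0.5, -2.0])
# One-hot encoder states, so the context vector simply mirrors the weights.
encoder_states = np.eye(5)

for name in ("softmax", "sigmoid", "linear"):
    weights = attention_weights(scores, name)
    context = context_vector(encoder_states, weights)
    print(name, np.round(weights, 3), "sum =", round(float(weights.sum()), 3))
```

With these assumed scores, softmax places most of its mass on a single position, whereas sigmoid leaves the neighbouring positions strongly active as well, which is consistent with the qualitative behaviour described in the abstract.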
Data Availability Statement
The IAM, Saint Gall, and Parzival datasets can be downloaded from: https://fki.tic.heia-fr.ch/databases. The RIMES dataset can be downloaded from: http://www.a2ialab.com/doku.php?id=rimes_database:start.
Similarly, Kim et al. find that softmax attention performs better than sigmoid attention on word-to-word machine translation tasks.
Bluche T, Louradour J, Messina R (2016) Scan, attend and read: end-to-end handwritten paragraph recognition with MDLSTM attention. ArXiv e-prints 1604:03286
Louradour J, Kermorvant C (2013) Curriculum learning for handwritten text line recognition. ArXiv e-prints 1312:1737
Graves A, Fernández S, Gomez F, Schmidhuber J (2006) Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. In: Proceedings of the 23rd international conference on Machine learning, pp 369–376
Graves A, Liwicki M, Fernández S, Bertolami R, Bunke H, Schmidhuber J (2009) A novel connectionist system for unconstrained handwriting recognition. IEEE Trans Pattern Anal Mach Intell 31(5):855–868
Liwicki M, Graves A, Bunke H, Schmidhuber J (2007) A novel approach to on-line handwriting recognition based on bidirectional long short-term memory networks. In: Proceedings of the 9th International conference on document analysis and recognition, vol 1, pp 367–371
Liwicki M, Graves A, Bunke H (2012) Neural networks for handwriting recognition. Computational intelligence paradigms in advanced pattern classification. Springer, Berlin, pp 5–24
Wigington C, Stewart S, Davis B, Barrett B, Price B, Cohen S (2017) Data augmentation for recognition of handwritten words and lines using a CNN-LSTM network. In: 2017 14th IAPR International conference on document analysis and recognition (ICDAR), IEEE, vol 1, pp 639–645
Stuner B, Chatelain C, Paquet T (2020) Handwriting recognition using cohort of LSTM and lexicon verification with extremely large lexicon. Multim Tools Appl 79(45):34407–34427
Deng Y, Kanervisto A, Ling J, Rush AM (2016) Image-to-Markup Generation with Coarse-to-Fine Attention. ArXiv e-prints 1609:04938
Vinyals O, Kaiser L, Koo T, Petrov S, Sutskever I, Hinton G (2014) Grammar as a Foreign Language. ArXiv e-prints 1412:7449
Bahdanau D, Cho K, Bengio Y (2014) Neural machine translation by jointly learning to align and translate. ArXiv e-prints 1409:0473
Chorowski JK, Bahdanau D, Serdyuk D, Cho K, Bengio Y (2015) Attention-based models for speech recognition. In: Advances in neural information processing systems, pp 577–585
Xu K, Ba J, Kiros R, Cho K, Courville AC, Salakhutdinov R, Zemel RS, Bengio Y (2015) Show, attend and tell: neural image caption generation with visual attention. ICML 14:77–81
Cho K, van Merrienboer B, Gulcehre C, Bahdanau D, Bougares F, Schwenk H, Bengio Y (2014) Learning phrase representations using RNN encoder-decoder for statistical machine translation. ArXiv e-prints 1406:1078
Cho K, Courville A, Bengio Y (2015) Describing multimedia content using attention-based encoder-decoder networks. ArXiv e-prints 1507:01053
Lee CY, Osindero S (2016) Recursive recurrent nets with attention modeling for OCR in the wild. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 2231–2239
Shi B, Wang X, Lyu P, Yao C, Bai X (2016) Robust scene text recognition with automatic rectification. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 4168–4176
Bluche T, Messina R (2017) Gated convolutional recurrent neural networks for multilingual handwriting recognition. In: Proceedings of the 13th International conference on document analysis and recognition (ICDAR), Kyoto, Japan, pp 13–15
Puigcerver J (2017) Are multidimensional recurrent layers really necessary for handwritten text recognition? In: Document analysis and recognition (ICDAR), 2017 14th IAPR international conference on, IEEE, vol 1, pp 67–72
Chowdhury A, Vig L (2018) An efficient end-to-end neural model for handwritten text recognition. ArXiv e-prints 1807.07965
Zhang Y, Nie S, Liu W, Xu X, Zhang D, Shen HT (2019) Sequence-to-sequence domain adaptation network for robust text image recognition. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (CVPR)
Kang L, Riba P, Villegas M, Fornés A, Rusiñol M (2019) Candidate fusion: Integrating language modelling into a sequence-to-sequence handwritten word recognition architecture. ArXiv e-prints 1912.10308
Kang L, Rusiñol M, Fornés A, Riba P, Villegas M (2020) Unsupervised writer adaptation for synthetic-to-real handwritten word recognition. In: The IEEE winter conference on applications of computer vision, pp 3502–3511
Xiao S, Peng L, Yan R, Wang S (2020) Deep network with pixel-level rectification and robust training for handwriting recognition. SN Comput Sci 1(3):1–13
Retsinas G, Sfikas G, Maragos P (2020) WSRNet: Joint spotting and recognition of handwritten words. ArXiv e-prints
Belay B, Habtegebrial T, Belay G, Mesheshsa M, Liwicki M, Stricker D (2020) Learning by injection: Attention embedded recurrent neural network for Amharic text-image recognition.
Sueiras J, Ruiz V, Sanchez A, Velez JF (2018) Offline continuous handwriting recognition using sequence to sequence neural networks. Neurocomputing
Kang L, Toledo JI, Riba P, Villegas M, Fornés A, Rusiñol M (2018) Convolve, attend and spell: An attention-based sequence-to-sequence model for handwritten word recognition. In: German Conference on Pattern Recognition. pp 459–472. Springer, Berlin
Gui L, Liang X, Chang X, Hauptmann AG (2018) Adaptive context-aware reinforced agent for handwritten text recognition. In: Proceedings of the British machine vision conference (BMVC)
Gehring J, Auli M, Grangier D, Yarats D, Dauphin YN (2017) Convolutional sequence to sequence learning. ArXiv e-prints
Fogel S, Averbuch-Elor H, Cohen S, Mazor S, Litman R (2020) Scrabblegan: Semi-supervised varying length handwritten text generation. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (CVPR)
Davis B, Tensmeyer C, Price B, Wigington C, Morse B, Jain R (2020) Text and style conditioned GAN for generation of offline handwriting lines. ArXiv e-prints
Poznanski A, Wolf L (2016) CNN-n-gram for handwriting word recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 2305–2314
Such FP, Peri D, Brockler F, Paul H, Ptucha R (2018) Fully convolutional networks for handwriting recognition. In: 2018 16th International conference on frontiers in handwriting recognition (ICFHR), IEEE, pp 86–91
Coquenet D, Soullard Y, Chatelain C, Paquet T (2019) Have convolutions already made recurrence obsolete for unconstrained handwritten text recognition? In: 2019 International conference on document analysis and recognition workshops (ICDARW), IEEE, vol 5, pp 65–70
Ptucha R, Such FP, Pillai S, Brockler F, Singh V, Hutkowski P (2019) Intelligent character recognition using fully convolutional neural networks. Pattern Recogn 88:604–613
Yousef M, Hussain KF, Mohammed US (2020) Accurate, data-efficient, unconstrained text recognition with convolutional neural networks. Pattern Recogn 108:107482
Yousef M, Bishop TE (2020) Origaminet: Weakly-supervised, segmentation-free, one-step, full page text recognition by learning to unfold. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (CVPR)
Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN, Kaiser Ł, Polosukhin I (2017) Attention is all you need. Adv Neural Inf Process Syst 30:5998–6008
Kang L, Riba P, Rusiñol M, Fornés A, Villegas M (2020) Pay attention to what you read: Non-recurrent handwritten text-line recognition. ArXiv e-prints
Ling W, Trancoso I, Dyer C, Black AW (2015) Character-based neural machine translation. ArXiv e-prints
Marti UV, Bunke H (2002) The IAM-database: an English sentence database for offline handwriting recognition. Int J Doc Anal Recogn 5(1):39–46
Grosicki E, El-Abed H (2011) ICDAR 2011: French handwriting recognition competition. In: Proceedings of the international conference on document analysis and recognition, pp 1459–1463
Fischer A, Frinken V, Fornés A, Bunke H (2011) Transcription alignment of Latin manuscripts using Hidden Markov Models. In: Proceedings of the 2011 workshop on historical document imaging and processing, ACM, pp 29–36
Fischer A, Wuthrich M, Liwicki M, Frinken V, Bunke H, Viehhauser G, Stolz M (2009) Automatic transcription of handwritten medieval documents. In: 2009 15th international conference on virtual systems and multimedia, IEEE, pp 137–142
Puigcerver J, Martin-Albo D, Villegas M (2016) Laia: A deep learning toolkit for HTR. GitHub repository
Villegas M, Romero V, Sánchez JA (2015) On the modification of binarization algorithms to retain grayscale information for handwritten text recognition. In: Iberian conference on pattern recognition and image analysis. pp 208–215, Springer, Berlin
Wang P, Sun R, Zhao H, Yu K (2013) A new word language model evaluation metric for character based languages. In: Chinese computational linguistics and natural language processing based on naturally annotated big data. pp 315–324. Springer, Berlin
Bluche T (2015) Deep neural networks for large vocabulary handwritten text recognition. PhD thesis, Université Paris Sud-Paris XI
Jean S, Cho K, Memisevic R, Bengio Y (2014) On Using Very Large Target Vocabulary for Neural Machine Translation. ArXiv e-prints
Graves A, Schmidhuber J (2009) Offline handwriting recognition with multidimensional recurrent neural networks. In: Advances in neural information processing systems, pp 545–552
Michael J, Labahn R, Grüning T, Zöllner J (2019) Evaluating sequence-to-sequence models for handwritten text recognition. In: 2019 International conference on document analysis and recognition (ICDAR), IEEE, pp 1286–1293
Voigtlaender P, Doetsch P, Ney H (2016) Handwriting recognition with large multidimensional long short-term memory recurrent neural networks. In: Frontiers in handwriting recognition (ICFHR), 2016 15th international conference on, IEEE, pp 228–233
Castro D, Bezerra BL, Valença M (2018) Boosting the deep multidimensional long-short-term memory network for handwritten recognition systems. In: 2018 16th International conference on frontiers in handwriting recognition (ICFHR), IEEE, pp 127–132
Bluche T (2016) Joint line segmentation and transcription for end-to-end handwritten paragraph recognition. In: Advances in neural information processing systems, pp 838–846
Doetsch P, Kozielski M, Ney H (2014) Fast and robust training of recurrent neural networks for offline handwriting recognition. In: Frontiers in handwriting recognition (ICFHR), 2014 14th international conference on, IEEE, pp 279–284
Voigtlaender P, Doetsch P, Wiesler S, Schlüter R, Ney H (2015) Sequence-discriminative training of recurrent neural networks. In: Acoustics, speech and signal processing (ICASSP), 2015 IEEE International Conference on, IEEE, pp 2100–2104
Coquenet D, Chatelain C, Paquet T (2020) End-to-end handwritten paragraph text recognition using a vertical attention network. ArXiv e-prints
Kozielski M, Doetsch P, Ney H (2013) Improvements in RWTH’s system for off-line handwriting recognition. In: 2013 12th International conference on document analysis and recognition, IEEE, pp 935–939
Pham V, Bluche T, Kermorvant C, Louradour J (2014) Dropout improves recurrent neural networks for handwriting recognition. In: Frontiers in handwriting recognition (ICFHR), 2014 14th international conference on, IEEE, pp 285–290
Dutta K, Krishnan P, Mathew M, Jawahar C (2018) Improving CNN-RNN hybrid networks for handwriting recognition. In: 2018 16th International conference on frontiers in handwriting recognition (ICFHR), IEEE, pp 80–85
Huang X, Qiao L, Yu W, Li J, Ma Y (2020) End-to-end sequence labeling via convolutional recurrent neural network with a connectionist temporal classification layer. Int J Comput Intell Syst 13:341–351
Kozielski M, Rybach D, Hahn S, Schlüter R, Ney H (2013) Open vocabulary handwriting recognition using combined word-level and character-level language models. In: Acoustics, speech and signal processing (ICASSP), 2013 IEEE international conference on, IEEE, pp 8257–8261
Krishnan P, Dutta K, Jawahar C (2018) Word spotting and recognition using deep embedding. In: 2018 13th IAPR international workshop on document analysis systems (DAS), IEEE, pp 1–6
España-Boquera S, Castro-Bleda MJ, Gorbe-Moya J, Zamora-Martinez F (2011) Improving offline handwritten text recognition with hybrid HMM/ANN models. IEEE Trans Pattern Anal Mach Intell 33(4):767–779
Chen Z, Wu Y, Yin F, Liu CL (2017) Simultaneous script identification and handwriting recognition via multi-task learning of recurrent neural networks. In: 2017 14th IAPR international conference on document analysis and recognition (ICDAR), IEEE, vol 1, pp 525–530
Dreuw P, Doetsch P, Plahl C, Ney H (2011) Hierarchical hybrid MLP/HMM or rather MLP features for a discriminatively trained Gaussian HMM: a comparison for offline handwriting recognition. In: 2011 18th IEEE international conference on image processing, IEEE, pp 3541–3544
Doetsch P, Zeyer A, Ney H (2016) Bidirectional decoder networks for attention-based end-to-end offline handwriting recognition. In: 2016 15th international conference on frontiers in handwriting recognition (ICFHR), IEEE, pp 361–366
Menasri F, Louradour J, Bianne-Bernard AL, Kermorvant C (2012) The A2iA French handwriting recognition system at the RIMES-ICDAR2011 competition. In: Document recognition and retrieval XIX, International Society for Optics and Photonics, vol 8297, p 82970Y
Soullard Y, Ruffino C, Paquet T (2019) CTCModel: a Keras model for connectionist temporal classification. ArXiv e-prints
Kim Y, Denton C, Hoang L, Rush AM (2017) Structured attention networks. ArXiv e-prints
Veit A, Matera T, Neumann L, Matas J, Belongie S (2016) COCO-Text: dataset and benchmark for text detection and recognition in natural images. ArXiv e-prints
Jaderberg M, Simonyan K, Vedaldi A, Zisserman A (2016) Reading text in the wild with convolutional neural networks. Int J Comput Vision 116(1):1–20
A shorter version of this work appeared as an arXiv preprint at https://arxiv.org/abs/1712.04046.
Poulos acknowledges support of the National Science Foundation Graduate Research Fellowship under Grant DGE-1106400, and the National Science Foundation under Grant DMS-1638521 to the Statistical and Applied Mathematical Sciences Institute. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the National Science Foundation.
Implementation code is available at the repository: https://github.com/jvpoulos/Attention-OCR/.
Conflict of interest
The authors declare no conflicts of interest.
Cite this article
Poulos, J., Valle, R. Character-based handwritten text transcription with attention networks. Neural Comput & Applic (2021). https://doi.org/10.1007/s00521-021-05813-1
- Convolutional neural networks
- Handwritten text recognition
- Recurrent neural networks