gMLP guided deep networks model for character-based handwritten text transcription

Bensouilah, Mouad; Taffar, Mokhtar; Zennir, Mohamed Nadjib

doi:10.1007/s11042-023-15293-1

Published: 07 July 2023

Volume 83, pages 13557–13575, (2024)

Multimedia Tools and Applications

Abstract

In this work, we present an efficient approach to handwritten text recognition (HTR). The proposed model combines convolutional layers, recurrent layers, and gMLP networks, and is trained on sequences of characters rather than words. We evaluate the model on text lines from popular handwriting benchmark datasets in different languages, using gMLP networks of different sizes. The gMLP networks can detect the spatial interactions between the target characters and therefore learn a more precise alignment at each decoding step. Our model achieves a character error rate (CER) of 9.0% on the IAM dataset without the help of any lexicon or explicit language model.
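
For readers unfamiliar with the metric, CER is commonly defined as the character-level edit (Levenshtein) distance between the predicted and reference transcriptions, divided by the number of reference characters. The sketch below illustrates that common definition only. The function names `edit_distance` and `cer` are hypothetical, and the abstract does not state the authors' exact evaluation protocol (text normalization, or corpus-level versus per-line averaging), so the corpus-level aggregation used here is an assumption.

```python
from typing import Sequence


def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two strings (insert, delete, substitute)."""
    # prev[j] holds the distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to the empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # delete a reference character
                            curr[j - 1] + 1,      # insert a hypothesis character
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def cer(references: Sequence[str], hypotheses: Sequence[str]) -> float:
    """Corpus-level CER: total edits divided by total reference characters.

    Assumption: aggregation over the whole corpus, no text normalization.
    """
    if len(references) != len(hypotheses):
        raise ValueError("references and hypotheses must have the same length")
    total_chars = sum(len(r) for r in references)
    if total_chars == 0:
        raise ValueError("references contain no characters")
    total_edits = sum(edit_distance(r, h) for r, h in zip(references, hypotheses))
    return total_edits / total_chars


if __name__ == "__main__":
    refs = ["hello world", "handwriting"]
    hyps = ["hallo world", "handwritng"]
    print(f"CER = {cer(refs, hyps):.3f}")
```

Under this definition, a CER of 9.0% corresponds to roughly nine character edits per hundred reference characters.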

Data Availability

The IAM, Saint Gall, and Parzival datasets can be downloaded from: https://fki.tic.heia-fr.ch/databases. The KHATT dataset can be downloaded from: http://khatt.ideas2serve.net/.

Code Availability

Implementation code is available at the repository: https://github.com/mouadb0101/Line_HTR.

Funding

The authors did not receive support from any organization for the submitted work.

Author information

Authors and Affiliations

LaRIA Lab., Computer Sc. Dpt, University of Jijel, BP 98, Ouled Aissa, Jijel, 18000, Algeria
Mouad Bensouilah, Mokhtar Taffar & Mohamed Nadjib Zennir

Corresponding author

Correspondence to Mouad Bensouilah.

Ethics declarations

Conflict of Interests

The authors declare no conflicts of interest.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Mokhtar Taffar and Mohamed Nadjib Zennir contributed equally to this work.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Bensouilah, M., Taffar, M. & Zennir, M.N. gMLP guided deep networks model for character-based handwritten text transcription. Multimed Tools Appl 83, 13557–13575 (2024). https://doi.org/10.1007/s11042-023-15293-1

Received: 08 August 2022
Revised: 27 September 2022
Accepted: 06 April 2023
Published: 07 July 2023
Issue Date: February 2024
DOI: https://doi.org/10.1007/s11042-023-15293-1
