Handwritten English word recognition using a deep learning based object detection architecture

Mondal, Riktim; Malakar, Samir; Barney Smith, Elisa H.; Sarkar, Ram

doi:10.1007/s11042-021-11425-7

Published: 20 September 2021

Volume 81, pages 975–1000, (2022)

Multimedia Tools and Applications

Riktim Mondal¹,
Samir Malakar² (ORCID: orcid.org/0000-0003-4217-2372),
Elisa H. Barney Smith³ &
Ram Sarkar¹

Abstract

Handwriting is widely used to share information among people. To make this information available for further analysis, the handwritten page must be optically scanned and converted to a machine-readable form. Because of unconstrained writing styles and connected or overlapping characters, handwriting recognition remains a challenging task. Most methods in the literature use lexicon-based approaches and train their models on large datasets of nearly 50K word samples to achieve good results, which leads to high computational requirements. Moreover, although these models use dictionaries of around 50K words when recognizing handwritten English text, the actual English vocabulary is much larger. To this end, we propose a lexicon-free technique for recognizing handwritten English text based on the YOLOv3 object detection model, which performs sequential character detection and identification using a small number of training samples (only 1200 word images). The model works well regardless of the writer's handwriting style. Tested on the IAM dataset, it achieves a Word Error Rate of 29.21% and a Character Error Rate of 9.53% without a predefined vocabulary, which is on par with state-of-the-art lexicon-based word recognition models.
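
The abstract reports a Word Error Rate and a Character Error Rate but does not define them. The sketch below illustrates one common convention for isolated word images, which is an assumption here and not necessarily the exact protocol used in the article: the Character Error Rate is the total Levenshtein edit distance divided by the total number of reference characters, and the Word Error Rate is the fraction of word images whose prediction differs from the reference. The function names and the sample words are hypothetical.

```python
from typing import Sequence


def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance: minimum insertions, deletions, substitutions."""
    prev = list(range(len(hyp) + 1))  # distances from the empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # turning ref[:i] into an empty hypothesis takes i deletions
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # delete r
                            curr[j - 1] + 1,      # insert h
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def character_error_rate(refs: Sequence[str], hyps: Sequence[str]) -> float:
    """Assumed definition: total character edits / total reference characters."""
    if len(refs) != len(hyps):
        raise ValueError("refs and hyps must have the same length")
    total_chars = sum(len(r) for r in refs)
    if total_chars == 0:
        raise ValueError("reference set contains no characters")
    total_edits = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    return total_edits / total_chars


def word_error_rate(refs: Sequence[str], hyps: Sequence[str]) -> float:
    """Assumed definition: fraction of word images recognised incorrectly."""
    if len(refs) != len(hyps):
        raise ValueError("refs and hyps must have the same length")
    if not refs:
        raise ValueError("reference set is empty")
    wrong = sum(r != h for r, h in zip(refs, hyps))
    return wrong / len(refs)


if __name__ == "__main__":
    # Hypothetical ground truth and predictions, for illustration only.
    references = ["hello", "world", "text"]
    predictions = ["hello", "wor1d", "test"]
    print(f"CER = {character_error_rate(references, predictions):.4f}")
    print(f"WER = {word_error_rate(references, predictions):.4f}")
```

Under these definitions a single wrong character makes the whole word count as an error, which is one reason a Word Error Rate is typically much higher than the corresponding Character Error Rate.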

Acknowledgements

We would like to thank the Centre for Microprocessor Applications for Training, Education and Research (CMATER) research laboratory of the Computer Science and Engineering Department, Jadavpur University, Kolkata, India, for providing infrastructural support.

Author information

Authors and Affiliations

Department of Computer Science and Engineering, Jadavpur University, Kolkata, India
Riktim Mondal & Ram Sarkar
Department of Computer Science, Asutosh College, Kolkata, India
Samir Malakar
Department of Electrical and Computer Engineering, Boise State University, Boise, Idaho, USA
Elisa H. Barney Smith

Corresponding author

Correspondence to Samir Malakar.

Ethics declarations

Conflict of interest

We declare that we have no conflict of interest.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Mondal, R., Malakar, S., Barney Smith, E.H. et al. Handwritten English word recognition using a deep learning based object detection architecture. Multimed Tools Appl 81, 975–1000 (2022). https://doi.org/10.1007/s11042-021-11425-7

Received: 27 September 2020
Revised: 28 July 2021
Accepted: 10 August 2021
Published: 20 September 2021
Issue Date: January 2022
DOI: https://doi.org/10.1007/s11042-021-11425-7
