Handwritten English word recognition using a deep learning based object detection architecture

Multimedia Tools and Applications

Abstract

Handwriting is a common means of sharing information between people. To make this information available for further analysis, the page must be optically scanned and converted to a machine-readable form. Because of unconstrained writing styles and connected or overlapping characters, handwriting recognition remains a challenging task. Most methods in the literature are lexicon-based and train their models on large datasets of nearly 50 K word samples to achieve good results, which leads to high computational requirements. Moreover, while these models use around 50 K words in their dictionary when recognizing handwritten English text, the number of words in an actual English dictionary is much higher. To this end, we propose a lexicon-free technique for recognizing handwritten English text based on the YOLOv3 object detection model, which performs sequential character detection and identification with a small number of training samples (only 1200 word images). The model works well regardless of the writer's style of writing. Tested on the IAM dataset, it achieves a 29.21% Word Error Rate and a 9.53% Character Error Rate without a predefined vocabulary, which is on par with state-of-the-art lexicon-based word recognition models.
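The abstract describes reading a word as a sequence of detected characters and reports Word and Character Error Rates. The following is a minimal sketch of how such a pipeline's output could be turned into a word and scored. It is not the authors' implementation: the `CharDetection` structure, the confidence threshold, and the left-to-right ordering rule are assumptions made for illustration, and the paper may define its error rates differently (here CER is total edit distance over total reference characters, and WER is the fraction of isolated words not recognized exactly).

```python
from dataclasses import dataclass


@dataclass
class CharDetection:
    """One detected character box (hypothetical structure, not from the paper)."""
    label: str         # predicted character class
    x_center: float    # horizontal centre of the bounding box, in pixels
    confidence: float  # detector confidence in [0, 1]


def assemble_word(detections, min_confidence=0.5):
    """Read off a word by ordering confident detections left to right.

    The 0.5 threshold is an illustrative assumption.
    """
    kept = [d for d in detections if d.confidence >= min_confidence]
    kept.sort(key=lambda d: d.x_center)
    return "".join(d.label for d in kept)


def edit_distance(ref, hyp):
    """Levenshtein distance, computed row by row with dynamic programming."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def character_error_rate(references, hypotheses):
    """Total character edits divided by total reference characters."""
    total_edits = sum(edit_distance(r, h) for r, h in zip(references, hypotheses))
    total_chars = sum(len(r) for r in references)
    return total_edits / total_chars


def word_error_rate(references, hypotheses):
    """Fraction of isolated word images whose transcription is not exact."""
    wrong = sum(r != h for r, h in zip(references, hypotheses))
    return wrong / len(references)


# Made-up detections for a single word image; the low-confidence "l" is dropped.
detections = [
    CharDetection("t", 40.0, 0.90),
    CharDetection("c", 12.0, 0.95),
    CharDetection("a", 25.0, 0.80),
    CharDetection("l", 33.0, 0.20),
]
word = assemble_word(detections)
cer = character_error_rate(["cat", "dog"], [word, "dig"])
wer = word_error_rate(["cat", "dog"], [word, "dig"])
```

Because no lexicon is consulted, the assembled string is accepted as-is; a lexicon-based system would instead snap it to the nearest dictionary entry, which is the dependency the proposed approach avoids.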

Acknowledgements

We would like to thank the Centre for Microprocessor Applications for Training, Education and Research (CMATER) research laboratory of the Computer Science and Engineering Department, Jadavpur University, Kolkata, India, for providing infrastructural support.

Author information

Corresponding author

Correspondence to Samir Malakar.

Ethics declarations

Conflict of interest

We declare that we have no conflict of interest.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Mondal, R., Malakar, S., Barney Smith, E.H. et al. Handwritten English word recognition using a deep learning based object detection architecture. Multimed Tools Appl 81, 975–1000 (2022). https://doi.org/10.1007/s11042-021-11425-7

  • DOI: https://doi.org/10.1007/s11042-021-11425-7
