Abstract
Handwriting is widely used to share information among people. To make this information available for further analysis, a handwritten page must be optically scanned and converted into machine-readable form. Because of unconstrained writing styles and connected or overlapping characters, handwriting recognition remains a challenging task. Most methods in the literature are lexicon-based and train their models on large datasets of nearly 50 K word samples to achieve good results, which leads to high computational requirements. Moreover, although these models use dictionaries of around 50 K words when recognizing handwritten English text, the actual number of English words is much higher. To address these limitations, we propose a lexicon-free technique for recognizing handwritten English text based on the YOLOv3 object detection model. It performs sequential character detection and identification and needs only a small number of training samples (1200 word images). The model works well regardless of the writer’s style. Tested on the IAM dataset, it achieves a Word Error Rate of 29.21% and a Character Error Rate of 9.53% without a predefined vocabulary, which is on par with state-of-the-art lexicon-based word recognition models.
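As a rough illustration of the ideas named in the abstract, the sketch below shows how character-level detections could be read into a word and how the two reported metrics are commonly computed. It is not the authors' implementation. The detection record `(label, x_centre, confidence)`, the `min_conf` threshold, and the left-to-right ordering rule are assumptions made for illustration, as is the choice to count the Word Error Rate as the fraction of isolated word images that are not transcribed exactly. The Character Error Rate uses the standard Levenshtein edit distance.

```python
from typing import List, Sequence, Tuple

# Hypothetical detection record: (character label, x-centre of box, confidence).
Detection = Tuple[str, float, float]


def decode_word(detections: Sequence[Detection], min_conf: float = 0.5) -> str:
    """Read a word by ordering confident character boxes from left to right.

    Assumption: one box per character and a fixed confidence threshold.
    """
    kept = [d for d in detections if d[2] >= min_conf]
    kept.sort(key=lambda d: d[1])
    return "".join(label for label, _, _ in kept)


def edit_distance(ref: Sequence, hyp: Sequence) -> int:
    """Levenshtein distance (insertions, deletions, substitutions)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def character_error_rate(refs: List[str], hyps: List[str]) -> float:
    """Total character edits divided by total reference characters."""
    edits = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    total = sum(len(r) for r in refs)
    return edits / total


def word_error_rate(refs: List[str], hyps: List[str]) -> float:
    """Fraction of word images whose transcription is not exact (assumed definition)."""
    wrong = sum(1 for r, h in zip(refs, hyps) if r != h)
    return wrong / len(refs)


# Usage with made-up detections and transcriptions.
boxes = [("t", 52.0, 0.91), ("c", 10.0, 0.88), ("a", 30.5, 0.95), ("l", 31.0, 0.20)]
print(decode_word(boxes))                                        # cat
print(character_error_rate(["cat", "house"], ["cat", "horse"]))  # 0.125
print(word_error_rate(["cat", "house"], ["cat", "horse"]))       # 0.5
```

Because no dictionary is consulted at any point, a decoder of this kind can output character strings that are not in any vocabulary, which is what "lexicon-free" refers to in the abstract.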
Acknowledgements
We would like to thank the Centre for Microprocessor Applications for Training, Education and Research (CMATER) research laboratory of the Computer Science and Engineering Department, Jadavpur University, Kolkata, India, for providing infrastructural support.
Ethics declarations
Conflict of interest
We declare that we have no conflict of interest.
About this article
Cite this article
Mondal, R., Malakar, S., Barney Smith, E.H. et al. Handwritten English word recognition using a deep learning based object detection architecture. Multimed Tools Appl 81, 975–1000 (2022). https://doi.org/10.1007/s11042-021-11425-7
DOI: https://doi.org/10.1007/s11042-021-11425-7