Abstract
This paper demonstrates a framework for offline handwriting recognition using character spotting and autonomous tagging that works for any alphabetic script. Character spotting builds on the idea of object detection to find character elements in unsegmented word images. An autonomous tagging approach is introduced that automates the production of a character image training set by estimating character locations in a word based on typical character size. Although scripts can vary widely from one another, our proposed approach provides a simple and powerful workflow for unconstrained offline recognition that should work for any alphabetic script with few adjustments. Here we demonstrate this approach with handwritten Bangla, obtaining a character recognition accuracy (CRA) of 94.8% and 91.12% with precision and autonomous tagging, respectively. Furthermore, we explain how character spotting and autonomous tagging can be implemented for other alphabetic scripts. We demonstrate this with handwritten Hangul (Korean), obtaining a Jamo recognition accuracy (JRA) of 93.16% using a tiny fraction of the PE92 training set. The combination of character spotting and autonomous tagging removes one of the biggest frustrations, data annotation by hand, and we therefore believe it has the potential to revolutionize the development of offline recognition.
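As a rough illustration of the autonomous tagging idea, the following Python sketch estimates one bounding box per character by dividing a word image horizontally in proportion to a typical width for each character. The function name `estimate_character_boxes`, the `relative_widths` input, and the purely proportional split are assumptions made for illustration; the abstract does not specify the actual estimation procedure, which may differ.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max) in pixels


def estimate_character_boxes(
    word_width: int,
    word_height: int,
    relative_widths: List[float],
) -> List[Box]:
    """Split a word image into one box per character, left to right.

    relative_widths holds a typical width for each character of the
    transcription, in arbitrary units; only the ratios matter.
    (Hypothetical helper, not taken from the paper.)
    """
    if word_width <= 0 or word_height <= 0:
        raise ValueError("image dimensions must be positive")
    if not relative_widths or any(w <= 0 for w in relative_widths):
        raise ValueError("need at least one positive relative width")

    total = sum(relative_widths)
    boxes: List[Box] = []
    cumulative = 0.0
    x_start = 0
    for w in relative_widths:
        cumulative += w
        # Round the running boundary so the boxes tile the word exactly.
        x_end = round(word_width * cumulative / total)
        boxes.append((x_start, 0, x_end, word_height))
        x_start = x_end
    return boxes


# A 200 x 64 pixel word whose middle character is typically twice as wide.
boxes = estimate_character_boxes(200, 64, [1.0, 2.0, 1.0])
```

Boxes estimated this way could serve as approximate training labels for an object detector in place of hand-drawn annotations, which is the role the abstract assigns to autonomous tagging.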
Acknowledgements
The authors are extremely grateful to all the volunteers who contributed to the Boise State Handwriting Dataset project. We would also like to thank the Center for Microprocessor Application for Training Education and Research (CMATER) group and Dr. Pradeep Kumar, Department of Computer Science & Engineering, Indian Institute of Technology Roorkee, India. Thanks also to Mr. Steven Kim, Department of Computer Science, Boise State University, Boise, Idaho, USA, who helped us with Korean. Last, we would like to acknowledge the high-performance computing support of the R2 Compute Cluster provided by Boise State University’s Research Computing Department.
About this article
Cite this article
Majid, N., Smith, E.H.B. Character spotting and autonomous tagging: offline handwriting recognition for Bangla, Korean and other alphabetic scripts. IJDAR 25, 245–263 (2022). https://doi.org/10.1007/s10032-022-00410-x
DOI: https://doi.org/10.1007/s10032-022-00410-x