Character spotting and autonomous tagging: offline handwriting recognition for Bangla, Korean and other alphabetic scripts

  • Special Issue Paper
International Journal on Document Analysis and Recognition (IJDAR)

Abstract

This paper demonstrates a framework for offline handwriting recognition using character spotting and autonomous tagging that works for any alphabetic script. Character spotting builds on the idea of object detection to find character elements in unsegmented word images. An autonomous tagging approach is introduced that automates the production of a character image training set by estimating character locations in a word based on typical character size. Although scripts can vary widely from one another, our proposed approach provides a simple and powerful workflow for unconstrained offline recognition that should work for any alphabetic script with few adjustments. Here we demonstrate this approach with handwritten Bangla, obtaining a character recognition accuracy (CRA) of 94.8% with precision tagging and 91.12% with autonomous tagging. Furthermore, we explain how character spotting and autonomous tagging can be implemented for other alphabetic scripts. We demonstrate this with handwritten Hangul (Korean), obtaining a Jamo recognition accuracy (JRA) of 93.16% using a tiny fraction of the PE92 training set. The combination of character spotting and autonomous tagging removes one of the biggest frustrations, data annotation by hand, and we therefore believe it has the potential to revolutionize the development of offline recognition.
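
As a rough illustration of the autonomous tagging idea described in the abstract (estimating character locations in a word from typical character size), the sketch below divides a word image horizontally in proportion to per-character width weights. It is only a minimal sketch under stated assumptions, not the authors' algorithm: the function name `estimate_character_boxes`, the `typical_width` table, and the `default_width` fallback are all hypothetical, and the abstract does not say how diacritics, overlapping strokes, or vertical extent are handled.

```python
from typing import Dict, List, Tuple

# (label, x_min, y_min, x_max, y_max) in pixel coordinates
Box = Tuple[str, int, int, int, int]


def estimate_character_boxes(
    labels: List[str],
    image_width: int,
    image_height: int,
    typical_width: Dict[str, float],
    default_width: float = 1.0,
) -> List[Box]:
    """Estimate one bounding box per character of a word image.

    Assumption (not from the paper): each character occupies a horizontal
    slice whose width is proportional to its typical relative width, and
    every box spans the full image height.
    """
    if not labels:
        return []

    # Look up a relative width for each character; unknown ones get a default.
    weights = [typical_width.get(ch, default_width) for ch in labels]
    total = sum(weights)

    boxes: List[Box] = []
    cumulative = 0.0
    for ch, weight in zip(labels, weights):
        x_min = round(image_width * cumulative / total)
        cumulative += weight
        x_max = round(image_width * cumulative / total)
        boxes.append((ch, x_min, 0, x_max, image_height))
    return boxes


# Hypothetical usage: a 120x40 word image whose middle character is
# assumed to be twice as wide as its neighbours.
boxes = estimate_character_boxes(
    list("abc"), 120, 40, {"a": 1.0, "b": 2.0, "c": 1.0}
)
```

Boxes produced this way could serve as approximate training labels for an object detector, which is the role the abstract assigns to autonomous tagging; the accuracy gap it reports between precision and autonomous tagging (94.8% versus 91.12% CRA) suggests that such estimated locations are noisier than hand-placed ones.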

Acknowledgements

The authors are extremely grateful to all the volunteers who contributed to the Boise State Handwriting Dataset project. We would also like to thank the Center for Microprocessor Application for Training Education and Research (CMATER) group and Dr. Pradeep Kumar, Department of Computer Science & Engineering, Indian Institute of Technology Roorkee, India. Thanks also to Mr. Steven Kim, Department of Computer Science, Boise State University, Boise, Idaho, USA, who helped us with Korean. Finally, we would like to acknowledge the high-performance computing support of the R2 Compute Cluster provided by Boise State University’s Research Computing Department.

Author information

Corresponding author

Correspondence to Nishatul Majid.

About this article

Cite this article

Majid, N., Smith, E.H.B. Character spotting and autonomous tagging: offline handwriting recognition for Bangla, Korean and other alphabetic scripts. IJDAR 25, 245–263 (2022). https://doi.org/10.1007/s10032-022-00410-x

  • DOI: https://doi.org/10.1007/s10032-022-00410-x
