
Automatic Text-Line Level Handwritten Indic Script Recognition: A Two-Stage Framework

  • Pawan Kumar Singh
  • Anirban Mukhopadhyay
  • Ram Sarkar
  • Mita Nasipuri
Conference paper
Part of the Advances in Intelligent Systems and Computing book series (AISC, volume 701)

Abstract

The script dependency of Optical Character Recognition (OCR) systems is a major obstacle to digitizing document images in a multi-script environment. Researchers around the world have developed various feature extraction and classification methodologies to date, but most are limited to bi-script and tri-script scenarios. The present work proposes an automatic two-stage framework for text-line based script recognition from document images written in 12 Indic scripts. A text-line misclassified at the first stage is examined further by segmenting it into its constituent words and repeating the script recognition module on those words. The combined outcome of this two-stage framework helps to improve the overall accuracy of text-line level script classification.
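The two-stage flow described in the abstract can be sketched as follows. This is only an illustration under stated assumptions, not the authors' implementation: the abstract does not say how a first-stage misclassification is detected or how word-level results are combined, so the sketch assumes a confidence threshold for the former and a majority vote for the latter. The names `recognize_line_script`, `classify`, `segment_words`, and `threshold` are hypothetical.

```python
from collections import Counter
from typing import Callable, Sequence, Tuple

# Hypothetical interfaces: a classifier maps an image to
# (script_label, confidence in [0, 1]); a segmenter splits a
# text-line image into its word images.
Classifier = Callable[[object], Tuple[str, float]]
Segmenter = Callable[[object], Sequence[object]]


def recognize_line_script(line: object,
                          classify: Classifier,
                          segment_words: Segmenter,
                          threshold: float = 0.8) -> str:
    """Two-stage script recognition for one text-line (illustrative only)."""
    # Stage 1: classify the whole text-line.
    label, confidence = classify(line)
    if confidence >= threshold:
        return label

    # Stage 2 (assumption): low confidence stands in for "misclassified".
    # Segment the line into words and run the same classifier on each word.
    words = segment_words(line)
    if len(words) == 0:
        return label  # nothing to pool, keep the stage-1 answer

    # Assumed pooling rule: majority vote over word-level labels.
    votes = Counter(classify(word)[0] for word in words)
    return votes.most_common(1)[0][0]
```

In practice `classify` would wrap a trained model (the keywords mention a Multi Layer Perceptron over modified log-Gabor features) and `segment_words` a word extraction routine; both are passed in as callables here so the control flow stays independent of any particular feature set.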

Keywords

Text-line level script identification; Handwritten documents; Two-stage framework; Indic scripts; Modified log-Gabor filter transform; Multi Layer Perceptron

Copyright information

© Springer Nature Singapore Pte Ltd. 2018

Authors and Affiliations

  • Pawan Kumar Singh (1)
  • Anirban Mukhopadhyay (1)
  • Ram Sarkar (1)
  • Mita Nasipuri (1)

  1. Department of Computer Science and Engineering, Jadavpur University, Kolkata, India
