Off-line Handwritten Script Identification from Eastern Indian Document Images Using Logistic Model Tree

  • Sk Md Obaidullah
  • Nibaran Das
  • Kaushik Roy
Conference paper
Part of the Advances in Intelligent Systems and Computing book series (AISC, volume 308)

Abstract

Script identification from document images is a complex real-life problem for a multi-script country like India, where 13 official scripts are present. Before an optical character recognizer can be developed for a specific language, the script in which the document is written must first be identified. This paper identifies the script of off-line handwritten document images written in any one of four popular scripts of eastern India, namely Bangla, Roman, Devanagari, and Oriya, following a document-level approach. A multi-dimensional feature set is constructed from mathematical, structural, and script-dependent features. Finally, a logistic model tree (LMT) is applied for classification, and an average accuracy of 95.5% is obtained with fivefold cross-validation.
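
As an illustration of the evaluation protocol only, the following minimal sketch shows how an average accuracy over fivefold cross-validation can be computed. It makes several assumptions that are not from the paper: the feature vectors are synthetic two-dimensional toy data, the classifier is a simple nearest-centroid rule standing in for LMT, and all function names (`make_toy_data`, `stratified_folds`, `cross_validate`) are hypothetical.

```python
import random
from collections import defaultdict

SCRIPTS = ["Bangla", "Roman", "Devanagari", "Oriya"]

def make_toy_data(per_class=10, seed=1):
    """Synthetic 2-D features around one well-separated centre per script (assumption)."""
    rng = random.Random(seed)
    centres = {"Bangla": (0, 0), "Roman": (10, 0), "Devanagari": (0, 10), "Oriya": (10, 10)}
    features, labels = [], []
    for script in SCRIPTS:
        cx, cy = centres[script]
        for _ in range(per_class):
            features.append([cx + rng.uniform(-1, 1), cy + rng.uniform(-1, 1)])
            labels.append(script)
    return features, labels

def stratified_folds(labels, k=5, seed=0):
    """Split sample indices into k folds with roughly equal class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(k)]
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        for position, idx in enumerate(indices):
            folds[position % k].append(idx)
    return folds

def fit_centroids(features, labels):
    """Stand-in classifier (not LMT): store the mean feature vector of each class."""
    grouped = defaultdict(list)
    for x, y in zip(features, labels):
        grouped[y].append(x)
    return {y: [sum(col) / len(col) for col in zip(*rows)] for y, rows in grouped.items()}

def predict(centroids, x):
    """Return the class whose centroid is closest in squared Euclidean distance."""
    return min(centroids, key=lambda y: sum((a - b) ** 2 for a, b in zip(x, centroids[y])))

def cross_validate(features, labels, k=5, seed=0):
    """Train on k-1 folds, test on the held-out fold, and average the k accuracies."""
    accuracies = []
    for held_out in stratified_folds(labels, k, seed):
        held = set(held_out)
        train = [i for i in range(len(labels)) if i not in held]
        model = fit_centroids([features[i] for i in train], [labels[i] for i in train])
        correct = sum(predict(model, features[i]) == labels[i] for i in held_out)
        accuracies.append(correct / len(held_out))
    return sum(accuracies) / k, accuracies

if __name__ == "__main__":
    X, y = make_toy_data()
    mean_accuracy, per_fold = cross_validate(X, y)
    print("per-fold accuracy:", per_fold)
    print("average accuracy:", mean_accuracy)
```

Each document is used for testing exactly once and for training in the other four rounds, so the reported figure is the mean of five held-out accuracies rather than a single train/test split.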

Keywords

Document image analysis; Handwritten script identification; Off-line documents; Classification; Optical character recognizer

Copyright information

© Springer India 2015

Authors and Affiliations

  1. Department of Computer Science and Engineering, Aliah University, Kolkata, India
  2. Department of Computer Science and Engineering, Jadavpur University, Kolkata, India
  3. Department of Computer Science, West Bengal State University, Barasat, India