On OCR of Major Indian Scripts: Bangla and Devanagari

  • B. B. Chaudhuri
Chapter
Part of the Advances in Pattern Recognition book series (ACVPR)

Abstract

This chapter describes our work on OCR of Bangla and Devanagari, two of the most widely used scripts of the Indian subcontinent. Because of their strong structural similarities, the two scripts can be handled within a single framework. The proposed approach starts with character and symbol segmentation and employs three recognizers for symbols of different zones. For the middle zone, a two-stage approach with group and individual symbol recognizers is used. The main recognizer is a covariance-based quadratic classifier. The problems of error evaluation and ground-truth creation for Indic scripts are also addressed. A post-recognition error detection approach based on spell-checker principles is proposed, mainly to correct an error at a single position in a recognized word string. Encouraging results have been obtained on multi-font Bangla and Devanagari documents.
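
The single-position correction idea mentioned above can be illustrated with a minimal sketch. This is not the chapter's method. It assumes substitution errors only, a plain in-memory lexicon, and a toy romanized alphabet. The names `LEXICON`, `ALPHABET`, `single_substitution_candidates`, and `correct_word` are hypothetical and chosen for illustration.

```python
# Toy data (assumptions): a real system would use a Bangla or Devanagari
# lexicon and the recognizer's own symbol inventory.
LEXICON = {"kamal", "kamar", "namak"}
ALPHABET = "abcdefghijklmnopqrstuvwxyz"


def single_substitution_candidates(word, lexicon, alphabet):
    """Return lexicon words that differ from `word` in exactly one position."""
    candidates = set()
    for i, original in enumerate(word):
        for symbol in alphabet:
            if symbol == original:
                continue
            # Replace only the symbol at position i and test the result.
            trial = word[:i] + symbol + word[i + 1:]
            if trial in lexicon:
                candidates.add(trial)
    return sorted(candidates)


def correct_word(word, lexicon, alphabet):
    """Keep valid words; otherwise apply a fix only when it is unambiguous."""
    if word in lexicon:
        return word
    candidates = single_substitution_candidates(word, lexicon, alphabet)
    # With zero or several candidates the word is left unchanged.
    return candidates[0] if len(candidates) == 1 else word
```

For example, `correct_word("kemal", LEXICON, ALPHABET)` yields `"kamal"`, while `"kamat"` is left unchanged because both `"kamal"` and `"kamar"` are one substitution away. A practical system would likely rank such ties, for instance by the recognizer's confusion statistics, instead of giving up.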

Keywords

Bangla, Devanagari, OCR, Benchmarking, Error correction

Notes

Acknowledgment

Partial support from DIT, Govt of India, in the form of a sponsored project is acknowledged with thanks.

Copyright information

© Springer-Verlag London Limited 2009

Authors and Affiliations

  1. Indian Statistical Institute, Kolkata, India