On OCR of Major Indian Scripts: Bangla and Devanagari
This chapter describes our work in the OCR of Bangla and Devanagari, two of the most widely used scripts of the Indian subcontinent. Due to their strong structural similarities, these two scripts can be tackled under a single framework. The proposed approach starts with character and symbol segmentation and employs three recognizers for symbols of different zones. For the middle zone, a two-stage approach with group and individual symbol recognizers is used. The main recognizer is a covariance-based quadratic classifier. The problem of error evaluation and creating ground truth for Indic scripts has also been addressed. A post-recognition error detection approach based on spell-checker principles has been proposed mainly to correct an error in a single position in a recognized word string. Encouraging results have been obtained on multi-font Bangla and Devanagari documents.
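The single-position correction idea mentioned above can be illustrated with a minimal sketch. This is not the chapter's implementation: the function names, the toy Latin lexicon, and the rule of accepting a correction only when exactly one lexicon word matches are all assumptions made for illustration. It also treats each string position as one recognized symbol, which a real Bangla or Devanagari system would have to define in terms of its own symbol inventory.

```python
from typing import Iterable, List, Set


def single_position_candidates(word: str, lexicon: Set[str],
                               alphabet: Iterable[str]) -> List[str]:
    """Return lexicon words that differ from `word` in exactly one position."""
    symbols = sorted(set(alphabet))
    found = set()
    for i, original in enumerate(word):
        for symbol in symbols:
            if symbol == original:
                continue  # substituting the same symbol changes nothing
            candidate = word[:i] + symbol + word[i + 1:]
            if candidate in lexicon:
                found.add(candidate)
    return sorted(found)


def correct_word(word: str, lexicon: Set[str], alphabet: Iterable[str]) -> str:
    """Hypothetical policy: keep valid words, fix only unambiguous cases."""
    if word in lexicon:
        return word
    candidates = single_position_candidates(word, lexicon, alphabet)
    # Assumption: with zero or several candidates, leave the word unchanged.
    return candidates[0] if len(candidates) == 1 else word
```

For example, with the lexicon `{"script", "zone", "font", "fort"}` and the lowercase Latin alphabet, `correct_word("zune", ...)` returns `"zone"`, while `"foxt"` is left unchanged because both `"font"` and `"fort"` are one substitution away.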
Keywords: Bangla, Devanagari, OCR, Benchmarking, Error correction
Partial support from DIT, Govt of India, in the form of a sponsored project is acknowledged with thanks.