Abstract
Script identification is a prerequisite for a multilingual OCR system. In this paper, we present a novel and efficient technique for Bangla/English script identification, applied to the destination address blocks of Bangladesh envelope images. The proposed approach is based on the analysis of connected component profiles extracted from the destination address block images. It does not rely on information provided by individual characters, and it requires no character or line segmentation. Experimental results demonstrate that the proposed technique can identify Bangla and English scripts on real Bangladesh postal images.
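The abstract does not specify which connected component profile features or which classifier the authors use. The following is therefore only a minimal sketch of the general idea, under loudly stated assumptions. It labels 8-connected ink components in a binarized address-block image without any line or character segmentation, summarizes each component by its width-to-height ratio, and applies a toy decision rule. The aspect-ratio feature, the `threshold=2.0` value, and the function names `connected_components`, `aspect_profile` and `guess_script` are all hypothetical and are not taken from the paper. The rule rests on an assumed cue: the Bangla headline (matra) tends to join the letters of a word into one wide component, whereas printed English letters are mostly separate, narrower components.

```python
from collections import deque
from statistics import median


def connected_components(image):
    """Return bounding boxes (top, left, bottom, right) of 8-connected ink components.

    image: list of equal-length rows, 1 = ink (foreground), 0 = background.
    No line or character segmentation is performed; the whole block is scanned.
    """
    rows = len(image)
    cols = len(image[0]) if rows else 0
    seen = [[False] * cols for _ in range(rows)]
    boxes = []
    for r in range(rows):
        for c in range(cols):
            if image[r][c] != 1 or seen[r][c]:
                continue
            # Breadth-first flood fill starting from an unvisited ink pixel.
            seen[r][c] = True
            queue = deque([(r, c)])
            top, left, bottom, right = r, c, r, c
            while queue:
                y, x = queue.popleft()
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and image[ny][nx] == 1 and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            boxes.append((top, left, bottom, right))
    return boxes


def aspect_profile(boxes):
    """Hypothetical per-component profile: width / height of each bounding box."""
    return [(right - left + 1) / (bottom - top + 1)
            for top, left, bottom, right in boxes]


def guess_script(image, threshold=2.0):
    """Toy rule (assumption, not the paper's classifier): wide components suggest Bangla."""
    ratios = aspect_profile(connected_components(image))
    if not ratios:
        return "unknown"
    return "Bangla" if median(ratios) >= threshold else "English"
```

For example, a 3x9 patch whose top row is a continuous stroke with three hanging verticals forms a single component with ratio 3.0, so it is labeled "Bangla". Three isolated vertical bars form three components with ratio 1/3 each, so they are labeled "English". A real system would learn its features and decision boundary from labeled envelope images instead of using a fixed threshold.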
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Zhou, L., Lu, Y., Tan, C.L. (2006). Bangla/English Script Identification Based on Analysis of Connected Component Profiles. In: Bunke, H., Spitz, A.L. (eds) Document Analysis Systems VII. DAS 2006. Lecture Notes in Computer Science, vol 3872. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11669487_22
DOI: https://doi.org/10.1007/11669487_22
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-32140-8
Online ISBN: 978-3-540-32157-6
eBook Packages: Computer Science, Computer Science (R0)