Bangla/English Script Identification Based on Analysis of Connected Component Profiles

Zhou, Lijun; Lu, Yue; Tan, Chew Lim

doi:10.1007/11669487_22

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 3872)

Included in the following conference series:

International Workshop on Document Analysis Systems

Abstract

Script identification is required for a multilingual OCR system. In this paper, we present a novel and efficient technique for Bangla/English script identification, with application to the destination address blocks of Bangladesh envelope images. The proposed approach is based on the analysis of connected component profiles extracted from the destination address block images. It places no emphasis on the information provided by individual characters and requires no character or line segmentation. Experimental results demonstrate that the proposed technique is capable of identifying Bangla and English scripts in real Bangladesh postal images.
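
The abstract does not say which profile features are measured, so the following is only an illustrative sketch of what extracting a connected component profile could look like, not the authors' implementation. It assumes a binarized image given as a list of rows of 0/1 values, 8-connectivity, and a "top profile" defined as the per-column distance from the component's bounding-box top to its first foreground pixel. The function names `connected_components`, `top_profile`, and `flat_fraction` are hypothetical. The idea that a flat top profile hints at Bangla (because of the headline stroke that joins many Bangla characters) is likewise an assumption made for illustration.

```python
from collections import deque


def connected_components(img):
    """Return the 8-connected foreground components of a 0/1 image as pixel lists."""
    rows, cols = len(img), len(img[0])
    seen = [[False] * cols for _ in range(rows)]
    components = []
    for r in range(rows):
        for c in range(cols):
            if img[r][c] != 1 or seen[r][c]:
                continue
            # Breadth-first flood fill starting from an unvisited foreground pixel.
            queue = deque([(r, c)])
            seen[r][c] = True
            pixels = []
            while queue:
                y, x = queue.popleft()
                pixels.append((y, x))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and img[ny][nx] == 1 and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            components.append(pixels)
    return components


def top_profile(pixels):
    """Per column: distance from the bounding-box top to the first foreground pixel."""
    top = min(y for y, _ in pixels)
    left = min(x for _, x in pixels)
    right = max(x for _, x in pixels)
    profile = [None] * (right - left + 1)
    for y, x in pixels:
        depth, col = y - top, x - left
        if profile[col] is None or depth < profile[col]:
            profile[col] = depth
    return profile


def flat_fraction(profile):
    """Fraction of columns whose topmost pixel lies on the bounding-box top row."""
    return sum(1 for d in profile if d == 0) / len(profile)


# Toy image: the left component has a flat headline-like top stroke, the right one does not.
img = [
    [1, 1, 1, 1, 1, 0, 0, 0, 0],
    [0, 0, 1, 0, 0, 0, 0, 1, 0],
    [0, 1, 1, 0, 0, 0, 1, 0, 1],
    [0, 0, 1, 0, 0, 0, 1, 1, 1],
]
components = connected_components(img)
profiles = [top_profile(p) for p in components]
```

In this sketch no character or line segmentation is performed: each component is processed as found, which matches the segmentation-free spirit described in the abstract. Any decision rule built on `flat_fraction` (for example, a threshold) would have to be tuned on real data.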

Author information

Authors and Affiliations

Department of Computer Science and Technology, East China Normal University, Shanghai, 200062, China
Lijun Zhou & Yue Lu
Shanghai Research Institute of Postal Science, China State Post Bureau, Shanghai, 200062, China
Yue Lu
Department of Computer Science, School of Computing, National University of Singapore, Kent Ridge, 117543, Singapore
Chew Lim Tan

Editor information

Editors and Affiliations

Institute of Computer Science and Applied Mathematics, University of Bern, Neubrückstrasse 10, CH-3012, Bern, Switzerland
Horst Bunke
DocRec Ltd, 34 Strathaven Place, 7001, Atawhai, Nelson, New Zealand
A. Lawrence Spitz

About this paper

Cite this paper

Zhou, L., Lu, Y., Tan, C.L. (2006). Bangla/English Script Identification Based on Analysis of Connected Component Profiles. In: Bunke, H., Spitz, A.L. (eds) Document Analysis Systems VII. DAS 2006. Lecture Notes in Computer Science, vol 3872. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11669487_22

DOI: https://doi.org/10.1007/11669487_22
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-32140-8
Online ISBN: 978-3-540-32157-6
eBook Packages: Computer Science, Computer Science (R0)
