Language Identification of Kannada, Hindi and English Text Words Through Visual Discriminating Features

Padma, M. C.; Vijaya, P. A.

doi:10.2991/ijcis.2008.1.2.2

Research Article
Open access
Published: 01 November 2008

Volume 1, pages 116–126, (2008)

International Journal of Computational Intelligence Systems

Abstract

In a multilingual country like India, a document may contain text words in more than one language. In such an environment, a multilingual Optical Character Recognition (OCR) system is needed to read these documents. It is therefore necessary to identify the different language regions of a document before feeding it to the OCR of each individual language. The objective of this paper is to propose a procedure based on visual clues to identify the Kannada, Hindi and English text portions of an Indian multilingual document.

Author information

Authors and Affiliations

Dept. of Computer Science & Engineering, PES College of Engineering, 571401, Mandya, Karnataka, India
M. C. Padma (Assistant Professor)
Dept. of Electronics & Communication Engineering, Malnad College of Engineering, 573201, Hassan, Karnataka, India
P. A. Vijaya (Professor)

Corresponding author

Correspondence to M. C. Padma.

Rights and permissions

This is an open access article distributed under the CC BY-NC 4.0 license (http://creativecommons.org/licenses/by-nc/4.0/).

About this article

Cite this article

Padma, M.C., Vijaya, P.A. Language Identification of Kannada, Hindi and English Text Words Through Visual Discriminating Features. Int J Comput Intell Syst 1, 116–126 (2008). https://doi.org/10.2991/ijcis.2008.1.2.2

Received: 21 September 2007
Revised: 29 October 2008
Published: 01 November 2008
Issue Date: May 2008
DOI: https://doi.org/10.2991/ijcis.2008.1.2.2
