Abstract
Document segmentation is one of the most important phases in the machine recognition of any language, because correct segmentation of individual symbols largely decides the success of a character recognition technique. Segmentation decomposes an image of a sequence of characters into sub-images of individual symbols by segmenting lines and words. Devnagari is the most popular script in India; it is used for writing the Hindi, Marathi, Sanskrit and Nepali languages. Moreover, Hindi is the third most popular language in the world. Devnagari documents consist of vowels, consonants and various modifiers, so proper segmentation of Devnagari words is challenging. This paper proposes a simple approach based on bounding boxes to segment Devnagari documents, and also discusses various challenges in the segmentation of Devnagari script.
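The abstract does not spell out the bounding-box procedure, so the following is only a minimal sketch of one common way to obtain line and word bounding boxes, assuming a binarized page where 1 marks ink. It uses projection profiles: blank rows separate text lines, and within each line blank columns separate words. In printed Devnagari the header line (shirorekha) usually joins the characters of a word, so column gaps inside a line tend to fall between words. The function names `runs` and `segment_words` and the `min_word_gap` threshold are hypothetical illustrations, not the authors' method.

```python
from typing import List, Tuple

# (top, bottom, left, right), all inclusive pixel indices
Box = Tuple[int, int, int, int]


def runs(profile: List[int], min_gap: int = 1) -> List[Tuple[int, int]]:
    """Return (start, end) pairs of consecutive non-zero profile entries.

    Runs separated by fewer than `min_gap` zero entries are merged, so small
    gaps (e.g. between characters) do not split a word.
    """
    spans: List[Tuple[int, int]] = []
    start = None
    for i, value in enumerate(profile):
        if value and start is None:
            start = i
        elif not value and start is not None:
            spans.append((start, i - 1))
            start = None
    if start is not None:
        spans.append((start, len(profile) - 1))

    merged: List[Tuple[int, int]] = []
    for s, e in spans:
        # Number of blank entries between this run and the previous one.
        if merged and s - merged[-1][1] - 1 < min_gap:
            merged[-1] = (merged[-1][0], e)
        else:
            merged.append((s, e))
    return merged


def segment_words(img: List[List[int]], min_word_gap: int = 2) -> List[Box]:
    """Bounding boxes of words, found line by line with projection profiles."""
    boxes: List[Box] = []
    # Horizontal projection: ink count per row; blank rows separate lines.
    row_profile = [sum(row) for row in img]
    for top, bottom in runs(row_profile):
        band = img[top:bottom + 1]
        # Vertical projection of this line only: ink count per column.
        col_profile = [sum(r[c] for r in band) for c in range(len(band[0]))]
        for left, right in runs(col_profile, min_word_gap):
            boxes.append((top, bottom, left, right))
    return boxes


if __name__ == "__main__":
    page = [
        [0, 0, 0, 0, 0, 0, 0, 0, 0],
        [1, 1, 1, 0, 0, 0, 1, 1, 0],
        [1, 0, 1, 0, 0, 0, 0, 1, 0],
        [0, 0, 0, 0, 0, 0, 0, 0, 0],
        [0, 1, 1, 1, 1, 0, 0, 0, 0],
    ]
    print(segment_words(page))  # [(1, 2, 0, 2), (1, 2, 6, 7), (4, 4, 1, 4)]
```

A real document would additionally need skew correction, and character-level segmentation inside each word box typically requires handling the header line and the modifiers above and below it, which simple column gaps do not resolve.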
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Dongre, V.J., Mankar, V.H. (2011). Segmentation of Printed Devnagari Documents. In: Wyld, D.C., Wozniak, M., Chaki, N., Meghanathan, N., Nagamalai, D. (eds) Advances in Computing and Information Technology. ACITY 2011. Communications in Computer and Information Science, vol 198. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-22555-0_23
DOI: https://doi.org/10.1007/978-3-642-22555-0_23
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-22554-3
Online ISBN: 978-3-642-22555-0
eBook Packages: Computer Science, Computer Science (R0)