Multi-oriented English Text Line Identification
In many artistic documents, the text lines on a single page have different inclinations (orientations). To make a document analysis system more capable, text lines must be extracted in multiple orientations. In this paper, we propose a robust technique for detecting English text lines of arbitrary orientation in a single document page. The approach is bottom-up: the connected components are first labelled and then clustered into word groups. Text lines of arbitrary orientation are then identified by estimating lines from these word groups. In an experiment on 3700 text lines, the proposed method achieved an accuracy of 98.3%.
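The first two bottom-up steps can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state the clustering criterion, so the bounding-box gap threshold `max_gap`, the helper names `label_components`, `box_gap` and `group_components`, and the toy `page` image are all assumptions made for illustration. The final step, estimating a text line from each word group, is not shown.

```python
from collections import deque


def label_components(grid):
    """Return one bounding box (top, left, bottom, right) per
    8-connected component of 1-pixels, in raster-scan order."""
    rows, cols = len(grid), len(grid[0])
    seen = [[False] * cols for _ in range(rows)]
    boxes = []
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] != 1 or seen[r][c]:
                continue
            # Breadth-first flood fill from an unvisited foreground pixel.
            seen[r][c] = True
            queue = deque([(r, c)])
            top, left, bottom, right = r, c, r, c
            while queue:
                y, x = queue.popleft()
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and grid[ny][nx] == 1 and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            boxes.append((top, left, bottom, right))
    return boxes


def box_gap(a, b):
    """Coordinate distance between the nearest edges of two boxes
    (0 if they overlap on both axes)."""
    dy = max(a[0] - b[2], b[0] - a[2], 0)
    dx = max(a[1] - b[3], b[1] - a[3], 0)
    return max(dy, dx)


def group_components(boxes, max_gap):
    """Cluster boxes whose gap is at most max_gap (assumed criterion),
    using union-find. Returns sorted lists of box indices."""
    parent = list(range(len(boxes)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(boxes)):
        for j in range(i + 1, len(boxes)):
            if box_gap(boxes[i], boxes[j]) <= max_gap:
                parent[find(i)] = find(j)

    groups = {}
    for i in range(len(boxes)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Toy binary image: four vertical strokes, forming two "words".
page = [
    [1, 0, 1, 0, 0, 0, 0, 1, 0, 1],
    [1, 0, 1, 0, 0, 0, 0, 1, 0, 1],
]
boxes = label_components(page)
print(group_components(boxes, max_gap=2))  # [[0, 1], [2, 3]]
```

In this sketch a small `max_gap` keeps the two pairs of strokes apart, while a larger value (5 or more for this toy image) merges them into one group. A real system would more plausibly derive such a threshold from component size or stroke statistics than fix it by hand.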
Keywords: Core Area, Reference Line, Candidate Region, Document Image, Text Line