Text-Line Extraction as Selection of Paths in the Neighbor Graph
This paper presents a new method of text-line extraction that can be applied to tilted, non-rectangular pages. The method is characterized as follows. As a representation of the physical structure of a page, we propose the neighbor graph, which represents the neighbor relations among connected components. The use of the area Voronoi diagram enables us to extract neighbors without predetermined parameters. Based on the neighbor graph, the task of text-line extraction is regarded as the selection of those paths in the graph that are appropriate as text-lines. To reduce the computational cost, we apply a simple iterative selection of edges with local examination. From experimental results on 50 pages with rectangular and non-rectangular layouts, we discuss the advantages and limitations of our method.
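To make the idea of "selecting paths in the neighbor graph" concrete, here is a minimal sketch in Python. It is not the paper's algorithm: the neighbor graph is taken as given input (the area Voronoi construction is not implemented), each connected component is reduced to a centroid, and the acceptance rule is an assumption chosen for illustration. Edges are visited from shortest to longest and kept only if both endpoints still have fewer than two selected edges, no cycle is formed, and the edge is not much longer than the shortest neighbor edge at either endpoint. The function name `select_text_line_paths` and the threshold `max_ratio` are hypothetical.

```python
import math
from collections import defaultdict


def select_text_line_paths(centroids, neighbor_edges, max_ratio=2.0):
    """Greedy sketch: keep neighbor-graph edges so the result is a set of simple paths.

    centroids: dict mapping component id -> (x, y)
    neighbor_edges: list of (u, v) pairs, assumed to come from a neighbor graph
    max_ratio: hypothetical threshold for the local examination (an assumption)
    """
    def length(edge):
        (x1, y1), (x2, y2) = centroids[edge[0]], centroids[edge[1]]
        return math.hypot(x2 - x1, y2 - y1)

    # Shortest neighbor edge at each node, used for the local examination.
    shortest = defaultdict(lambda: math.inf)
    for u, v in neighbor_edges:
        d = length((u, v))
        shortest[u] = min(shortest[u], d)
        shortest[v] = min(shortest[v], d)

    # Union-find to reject edges that would close a cycle.
    parent = {node: node for node in centroids}

    def find(node):
        while parent[node] != node:
            parent[node] = parent[parent[node]]
            node = parent[node]
        return node

    degree = defaultdict(int)
    adjacency = defaultdict(list)
    for u, v in sorted(neighbor_edges, key=length):
        if degree[u] >= 2 or degree[v] >= 2:
            continue  # a node on a path has at most two selected edges
        if find(u) == find(v):
            continue  # would form a cycle
        if length((u, v)) > max_ratio * min(shortest[u], shortest[v]):
            continue  # locally too long, likely an inter-line edge
        parent[find(u)] = find(v)
        degree[u] += 1
        degree[v] += 1
        adjacency[u].append(v)
        adjacency[v].append(u)

    # Walk each path starting from an endpoint (degree 0 or 1).
    paths, visited = [], set()
    for start in sorted(centroids):
        if start in visited or degree[start] > 1:
            continue
        path, current = [start], start
        visited.add(start)
        while True:
            unvisited = [n for n in adjacency[current] if n not in visited]
            if not unvisited:
                break
            current = unvisited[0]
            path.append(current)
            visited.add(current)
        paths.append(path)
    return paths


# Two rows of three components each; vertical edges link the rows.
centroids = {0: (0, 0), 1: (10, 0), 2: (20, 0), 3: (0, 30), 4: (10, 30), 5: (20, 30)}
edges = [(0, 1), (1, 2), (3, 4), (4, 5), (0, 3), (1, 4), (2, 5)]
lines = select_text_line_paths(centroids, edges)
print(lines)
```

Because the sketch relies only on edge lengths, rotating all centroids by the same angle leaves the selected paths unchanged, which is one simple way to stay indifferent to page tilt. A real implementation would need a richer local test than a single length ratio.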
Keywords: Voronoi Diagram, Neighbor Graph, Neighbor Relation, Voronoi Region, Connected Component Analysis