Document layout analysis using pattern classification method
This paper presents a bottom-up approach for segmenting document images and labeling the segmented regions with logical names. The method uses image features derived from the characteristics of text lines, such as margin and character size, so it can analyze document images with unstable layouts that contain floating elements such as figures and tables. Experimental application of the method to images of technical journals written in Japanese yielded classification rates of 98.6% for front pages and 90.0% for final pages that have floating elements.
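As a rough illustration of labeling text lines from features such as margin and character size, the following minimal sketch classifies a line by its nearest class centroid. It is not the paper's classifier: the abstract does not specify one, so the nearest-centroid rule, the `TextLine` fields, the normalization by page width and body text height, and the label names are all assumptions made for this example.

```python
from dataclasses import dataclass
from math import dist


@dataclass
class TextLine:
    """Hypothetical text-line record: bounding-box edges and height in pixels."""
    left: float    # x coordinate of the left edge
    right: float   # x coordinate of the right edge
    height: float  # box height, used here as a proxy for character size


def features(line, page_width, body_height):
    """Return (left margin, right margin, relative character size).

    Margins are fractions of the page width; character size is relative to an
    assumed body-text height, so the values are comparable across pages.
    """
    return (
        line.left / page_width,
        (page_width - line.right) / page_width,
        line.height / body_height,
    )


def train_centroids(samples):
    """Average the feature vectors of each label.

    samples: list of (feature_tuple, label) pairs.
    Returns a dict mapping each label to its mean feature vector.
    """
    sums, counts = {}, {}
    for vec, label in samples:
        acc = sums.setdefault(label, [0.0] * len(vec))
        for i, value in enumerate(vec):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {
        label: tuple(total / counts[label] for total in acc)
        for label, acc in sums.items()
    }


def classify(vec, centroids):
    """Assign the label whose centroid is closest in Euclidean distance."""
    return min(centroids, key=lambda label: dist(vec, centroids[label]))


# Toy training data (made up for illustration): wide-margin, large-type lines
# are labeled "title"; narrow-margin, normal-size lines are labeled "body".
PAGE_WIDTH, BODY_HEIGHT = 1000.0, 10.0
training = [
    (features(TextLine(200, 800, 24), PAGE_WIDTH, BODY_HEIGHT), "title"),
    (features(TextLine(220, 780, 26), PAGE_WIDTH, BODY_HEIGHT), "title"),
    (features(TextLine(100, 900, 10), PAGE_WIDTH, BODY_HEIGHT), "body"),
    (features(TextLine(100, 880, 11), PAGE_WIDTH, BODY_HEIGHT), "body"),
]
centroids = train_centroids(training)
label = classify(features(TextLine(210, 790, 22), PAGE_WIDTH, BODY_HEIGHT), centroids)
```

Because the features are computed per text line rather than from fixed page coordinates, a rule of this kind does not depend on where a figure or table happens to push the surrounding text, which is the property the abstract attributes to line-based features.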