Document Layout and Reading Sequence Analysis by Extended Split Detection Method
This paper describes an Extended Split Detection Method that hierarchically segments a machine-printed page image with a complex layout into smaller layout elements. The method performs piecewise-linear segmentation using many kinds of separator elements, such as field separators, lines, edges of figures, and edges of white background areas. It represents the analyzed hierarchical layout as a tree data structure whose nodes are traversed according to simple rules to generate the reading sequence. We demonstrated that the new method increases the correct character line segmentation rate by 15.5%, to 95.5%, and achieves a correct reading sequence generation rate of 88.1%.
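The idea of reading a layout tree in traversal order can be illustrated with a minimal sketch. The abstract does not state the traversal rules themselves, so the rules below are assumptions made for illustration only: children of a horizontally split node are visited top to bottom, and children of a vertically split node are visited left to right (a horizontal-writing convention). The names `LayoutNode`, `split`, and `reading_sequence` are hypothetical and do not come from the paper.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

# Bounding box as (left, top, right, bottom) in pixels.
Box = Tuple[int, int, int, int]


@dataclass
class LayoutNode:
    """One node of a hypothetical layout tree (not the paper's data structure)."""
    box: Box
    split: Optional[str] = None  # "h": split by horizontal separators, "v": vertical, None: leaf
    label: str = ""
    children: List["LayoutNode"] = field(default_factory=list)


def reading_sequence(node: LayoutNode) -> List[str]:
    """Return leaf labels in reading order under the assumed rules."""
    if not node.children:
        return [node.label]
    if node.split == "h":
        # Assumed rule: blocks stacked vertically are read top to bottom.
        ordered = sorted(node.children, key=lambda c: c.box[1])
    else:
        # Assumed rule: blocks placed side by side are read left to right.
        ordered = sorted(node.children, key=lambda c: c.box[0])
    sequence: List[str] = []
    for child in ordered:
        sequence.extend(reading_sequence(child))
    return sequence


# A page with a title above two columns; children are deliberately stored out of order.
page = LayoutNode(box=(0, 0, 100, 100), split="h", children=[
    LayoutNode(box=(0, 20, 100, 100), split="v", children=[
        LayoutNode(box=(50, 20, 100, 100), label="right column"),
        LayoutNode(box=(0, 20, 50, 100), label="left column"),
    ]),
    LayoutNode(box=(0, 0, 100, 20), label="title"),
])

print(reading_sequence(page))  # ['title', 'left column', 'right column']
```

A page set in vertical writing would need the opposite ordering for side-by-side blocks, which is one reason the ordering rule is kept local to each split type in this sketch.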