Simple Layout Segmentation of Gray-Scale Document Images
This paper proposes a simple yet effective layout segmentation method for document images. First, n × n blocks are roughly labeled as background, line, text, image, graphics, or mixed. Blocks in the mixed class are split into four sub-blocks, and the process repeats until no mixed block remains. Using a Savitzky-Golay derivative filter in the classification keeps the computation of features to a minimum. Next, the boundaries of each object are refined. Experimental results show that the method performs satisfactorily as a pre-processing step prior to OCR.
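The split-until-no-mixed-block step described above can be illustrated with a minimal sketch. It is not the classifier from the paper: the six-class decision rules are not given here, so the sketch uses only three labels ("background", "content", "mixed"). The names `THRESHOLD`, `MIN_SIZE`, `edge_energy`, `classify`, and `segment` are hypothetical, as are the threshold values. The only fixed ingredient is the standard 5-point, quadratic Savitzky-Golay first-derivative kernel. The image is assumed to be a square list of lists whose side is a power of two.

```python
# 5-point Savitzky-Golay first-derivative kernel (quadratic fit).
SG_DERIV = (-0.2, -0.1, 0.0, 0.1, 0.2)

THRESHOLD = 2.0  # hypothetical: mean |derivative| separating background from content
MIN_SIZE = 8     # hypothetical: blocks this small are never split further


def sg_derivative(values):
    """Savitzky-Golay first derivative of a 1-D sequence; edge samples stay 0."""
    half = len(SG_DERIV) // 2
    out = [0.0] * len(values)
    for i in range(half, len(values) - half):
        out[i] = sum(c * values[i + k - half] for k, c in enumerate(SG_DERIV))
    return out


def edge_energy(img, top, left, size):
    """Mean absolute row-wise plus column-wise derivative inside a square block."""
    total = 0.0
    for r in range(top, top + size):
        total += sum(abs(d) for d in sg_derivative(img[r][left:left + size]))
    for c in range(left, left + size):
        column = [img[r][c] for r in range(top, top + size)]
        total += sum(abs(d) for d in sg_derivative(column))
    return total / (2 * size * size)


def quadrants(top, left, half):
    """Top-left corners of the four sub-blocks of a block."""
    return [(top, left), (top, left + half),
            (top + half, left), (top + half, left + half)]


def classify(img, top, left, size):
    """Toy rule (an assumption): a block is mixed when its quadrants disagree."""
    if size <= MIN_SIZE:
        busy = edge_energy(img, top, left, size) > THRESHOLD
        return "content" if busy else "background"
    half = size // 2
    flags = {edge_energy(img, r, c, half) > THRESHOLD
             for r, c in quadrants(top, left, half)}
    if len(flags) == 2:
        return "mixed"
    return "content" if flags.pop() else "background"


def segment(img, top, left, size, out):
    """Recursively split mixed blocks into four sub-blocks; collect the leaves."""
    label = classify(img, top, left, size)
    if label != "mixed":
        out.append((top, left, size, label))
        return
    half = size // 2
    for r, c in quadrants(top, left, half):
        segment(img, r, c, half, out)


# Demo page: blank left half, 2-pixel-wide vertical stripes on the right half.
page = [[255] * 8 + [0 if ((c // 2) % 2 == 0) else 255 for c in range(8)]
        for _ in range(16)]
blocks = []
segment(page, 0, 0, 16, blocks)
```

On the demo page the 16 × 16 block is mixed, so it is split once and each 8 × 8 sub-block receives a single label. A fuller implementation would replace `classify` with rules that separate line, text, image, and graphics blocks, and would follow the split with the boundary refinement step.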