Recursive Projection Profiling for Text-Image Separation
This paper presents a simple and efficient method for separating text characters from graphical images in a document image. The method is based on Recursive Projection Profiling (RPP) of the document image. The algorithm pushes projection profiling to its limit, extracting nearly everything the technique can deliver. The projection profile reveals the empty space along the horizontal and vertical axes, exposing the gaps between characters and images. The algorithm proved efficient and accurate while remaining low in complexity. Some exceptional cases arose from the known drawbacks of projection profiling, but simple heuristics handled them, yielding an effective method for text-image separation.
Keywords: Document Image, Graphical Image, Simple Heuristic, Character Segmentation, Pixel Density