Abstract.
The ability to extract and recognize characters printed in color documents would greatly widen the applications of OCR systems. This paper describes a new color segmentation method for extracting character areas from a color document. At first glance, the characters seem to be printed in a single color, but actual measurements reveal that the color image contains a distribution of color components. The method extracts representative colors by analyzing a histogram of the color space. Compared with clustering algorithms, it prevents oversegmentation and fusion with the background while remaining usable in real time. It also includes a selective local color averaging technique that suppresses mesh noise in high-resolution color images.
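The idea of picking representative colors from a color-space histogram can be illustrated with a minimal sketch. The abstract does not give the details of the histogram analysis, so everything specific here is an assumption made for illustration: the use of plain RGB, the bin size, the minimum pixel share, the local-maximum rule for peaks, the nearest-color labeling, and the helper names `representative_colors` and `segment`. It should not be read as the paper's algorithm.

```python
from collections import Counter


def representative_colors(pixels, bin_size=32, min_share=0.05):
    """Return the centres of peak cells in a coarse RGB histogram.

    Hypothetical helper: bin_size and min_share are illustrative
    assumptions, not values taken from the paper.
    """
    # Quantize every pixel into a coarse 3-D histogram cell.
    hist = Counter(
        (r // bin_size, g // bin_size, b // bin_size) for r, g, b in pixels
    )
    total = len(pixels)
    peaks = []
    for (cr, cg, cb), count in hist.items():
        if count < min_share * total:
            continue  # ignore sparsely populated cells
        # Counts of the 26 neighbouring cells (missing cells count as 0).
        neighbours = [
            hist.get((cr + dr, cg + dg, cb + db), 0)
            for dr in (-1, 0, 1)
            for dg in (-1, 0, 1)
            for db in (-1, 0, 1)
            if (dr, dg, db) != (0, 0, 0)
        ]
        # Keep the cell if no neighbouring cell has a larger count.
        if all(count >= n for n in neighbours):
            peaks.append((cr, cg, cb))
    half = bin_size // 2
    # Report each peak as the centre color of its cell, in sorted order.
    return sorted(
        (r * bin_size + half, g * bin_size + half, b * bin_size + half)
        for r, g, b in peaks
    )


def segment(pixels, colors):
    """Label each pixel with the index of its nearest representative color."""
    def dist2(p, c):
        return sum((a - b) ** 2 for a, b in zip(p, c))

    return [
        min(range(len(colors)), key=lambda i: dist2(p, colors[i]))
        for p in pixels
    ]


# Synthetic example: "red" text whose pixels vary slightly, on an off-white page.
text = [(198, 22, 18), (205, 25, 30), (200, 20, 20)] * 20
background = [(245, 245, 240), (250, 248, 243)] * 70
colors = representative_colors(text + background)
labels = segment(text + background, colors)
```

In this toy input the slightly different text pixels fall into one histogram cell, so they are treated as a single color rather than split into several segments, which is the kind of oversegmentation the abstract says the method avoids. Two adjacent cells with equal counts would both be reported as peaks in this sketch; a real implementation would need a tie-breaking or merging rule.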
Additional information
Received: 25 July 2003, Revised: 10 August 2003, Published online: 6 February 2004
Correspondence to: Hiroyuki Hase. Current address: 3-9-1 Bunkyo, Fukui-shi 910-8507, Japan
Cite this article
Hase, H., Yoneda, M., Tokai, S. et al. Color segmentation for text extraction. IJDAR 6, 271–284 (2003). https://doi.org/10.1007/s10032-003-0119-7
DOI: https://doi.org/10.1007/s10032-003-0119-7