Color segmentation for text extraction

Document Analysis and Recognition

Abstract.

The ability to extract and recognize characters printed in color documents would greatly widen the applications of OCR systems. This paper describes a new color segmentation method for extracting character areas from a color document. At first glance, the characters seem to be printed in a single color, but actual measurements reveal that the color image contains a distribution of components. Compared with clustering algorithms, our method prevents oversegmentation and fusion with the background while maintaining real-time usability. It extracts the representative colors through a histogram analysis of the color space. Our method also includes a selective local color averaging technique that removes the problem of mesh noise in high-resolution color images.
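The abstract does not spell out the histogram analysis, so the following is only a minimal sketch of the general idea, not the authors' procedure. It assumes RGB pixels, a coarse cubic binning of the color space, and a simple local-maximum test over the 26 adjacent bins; the function names and the parameters `bin_size` and `min_fraction` are hypothetical and chosen purely for illustration.

```python
from collections import defaultdict
from itertools import product


def representative_colors(pixels, bin_size=32, min_fraction=0.05):
    """Return mean colors of histogram peaks in a coarsely binned RGB space.

    pixels: iterable of (r, g, b) tuples with channels in 0..255.
    bin_size and min_fraction are illustrative assumptions, not paper values.
    """
    # Each bin accumulates [sum_r, sum_g, sum_b, count].
    sums = defaultdict(lambda: [0, 0, 0, 0])
    total = 0
    for r, g, b in pixels:
        cell = sums[(r // bin_size, g // bin_size, b // bin_size)]
        cell[0] += r
        cell[1] += g
        cell[2] += b
        cell[3] += 1
        total += 1

    counts = {key: cell[3] for key, cell in sums.items()}
    offsets = [d for d in product((-1, 0, 1), repeat=3) if d != (0, 0, 0)]
    peaks = []
    for key, count in counts.items():
        if count < min_fraction * total:
            continue  # too few pixels to be treated as a representative color
        neighbors = (tuple(k + d for k, d in zip(key, off)) for off in offsets)
        # Keep the bin only if no adjacent bin is more populated.
        if all(counts.get(n, 0) <= count for n in neighbors):
            r, g, b, n = sums[key]
            peaks.append((r // n, g // n, b // n))
    return sorted(peaks)


def nearest_color(pixel, palette):
    """Map a pixel to the closest palette color (squared Euclidean distance)."""
    return min(palette, key=lambda c: sum((p - q) ** 2 for p, q in zip(pixel, c)))
```

Calling `representative_colors` on the pixels of a scanned region and then `nearest_color` on each pixel yields one label per pixel, which is one plausible way to turn the spread of measured colors around each nominal ink color into a small set of segments.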



Author information

Corresponding author

Correspondence to Hiroyuki Hase.

Additional information

Received: 25 July 2003, Revised: 10 August 2003, Published online: 6 February 2004

Correspondence to: Hiroyuki Hase. Current address: 3-9-1 Bunkyo, Fukui-shi 910-8507, Japan

About this article

Cite this article

Hase, H., Yoneda, M., Tokai, S. et al. Color segmentation for text extraction. IJDAR 6, 271–284 (2003). https://doi.org/10.1007/s10032-003-0119-7

  • DOI: https://doi.org/10.1007/s10032-003-0119-7
