Restoration of Decorative Headline Images for Document Retrieval

  • Tomio Amano
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 1655)

Abstract

This paper describes a method for restoring decorative character images in the headlines of newspapers and magazines. Although headlines contain useful keywords for document retrieval, conventional OCR systems cannot always recognize them, because the characters are often printed in reverse (white on black) and over various background textures. We built filters that generate multiple candidate images by varying a small number of simple parameters (namely, the threshold for stroke-width filtering and whether black and white are reversed), so that one of the candidates contains a “normal” image whose characters are printed in black on a white background. If all the candidate images are recognized and indexed, the keywords in headlines can be expected to be retrievable without manual keyword entry or verification. In our experiment, about 90% of the characters in headline images segmented from newspapers were restored, in the sense that one of the restored candidate images contained the correct character images.
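The candidate-generation idea in the abstract can be illustrated with a small sketch. It is not the paper's implementation: the abstract does not specify how the stroke-width filter works, so the sketch assumes a morphological opening with a k×k square as a stand-in, which removes black structures narrower than k pixels. The names `invert`, `open_square`, and `generate_candidates`, and the toy image, are hypothetical.

```python
from typing import List, Sequence

Image = List[List[int]]  # binary image: 1 = black pixel, 0 = white pixel


def invert(img: Image) -> Image:
    """Reverse black and white."""
    return [[1 - p for p in row] for row in img]


def open_square(img: Image, k: int) -> Image:
    """Assumed stand-in for stroke-width filtering: keep only black pixels
    that lie inside some fully black k x k square (morphological opening)."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for r in range(h - k + 1):
        for c in range(w - k + 1):
            if all(img[r + dr][c + dc] for dr in range(k) for dc in range(k)):
                for dr in range(k):
                    for dc in range(k):
                        out[r + dr][c + dc] = 1
    return out


def generate_candidates(img: Image, thresholds: Sequence[int]) -> List[Image]:
    """One candidate per (polarity, threshold) pair; the hope is that at
    least one shows clean black characters on a white background."""
    candidates = []
    for polarity in (img, invert(img)):
        for k in thresholds:
            candidates.append(open_square(polarity, k))
    return candidates


# Toy headline: a 3-pixel-wide stroke plus a 1-pixel-wide texture line.
headline = [
    [0, 0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0, 0, 1, 0],
    [0, 1, 1, 1, 0, 0, 1, 0],
    [0, 1, 1, 1, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
]
# The same image with the thin texture line removed.
stroke_only = [row[:6] + [0, 0] for row in headline]

candidates = generate_candidates(headline, thresholds=(1, 2))
```

With two polarities and two thresholds the sketch yields four candidates, and the one produced with the original polarity and k = 2 keeps the thick stroke while dropping the thin line. In a retrieval setting, every candidate would be passed to OCR and all recognized strings indexed, so no step has to decide in advance which candidate is the "normal" one.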

Copyright information

© Springer-Verlag Berlin Heidelberg 1999

Authors and Affiliations

  • Tomio Amano
    IBM Research, Tokyo Research Laboratory, Yamato-shi, Kanagawa-ken, Japan
