Automatic Indexing of Newspaper Microfilm Images

  • Qing Hong Liu
  • Chew Lim Tan
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 2423)


This paper describes a proposed document analysis system for automatically indexing digitized images of old newspaper microfilm. The system extracts news headlines from the microfilm images and converts them to machine-readable text by OCR, so that the headlines serve as indices to their respective news articles. A major challenge is the poor image quality of the microfilm: most images are inadequately illuminated and considerably dirty. Because conventional threshold selection techniques are inadequate for images of this kind, we propose a new method for separating characters from the noisy background. A Run Length Smearing Algorithm (RLSA) is then applied to extract the headlines. Experimental results confirm the validity of the approach.
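To make the RLSA step concrete, here is a minimal sketch of run length smearing on a binary image (1 = ink, 0 = background). It is not the authors' implementation: the function names, the threshold parameters, the choice to fill only gaps bounded by ink on both sides, and the AND combination of the horizontal and vertical passes are all assumptions made for illustration. The abstract does not give the thresholds or the binarization method, so neither appears here.

```python
def smear_row(row, threshold):
    """Fill runs of 0s of length <= threshold that lie between two 1s.

    Assumption: runs touching the start or end of the row are left alone.
    """
    out = list(row)
    last_one = None  # index of the most recent foreground pixel seen
    for i, pixel in enumerate(row):
        if pixel == 1:
            gap = i - last_one - 1 if last_one is not None else 0
            if 0 < gap <= threshold:
                for j in range(last_one + 1, i):
                    out[j] = 1
            last_one = i
    return out


def rlsa(image, h_threshold, v_threshold):
    """Smear rows and columns separately, then AND the two results."""
    horizontal = [smear_row(r, h_threshold) for r in image]
    # Transpose, smear each column as if it were a row, transpose back.
    columns = [list(c) for c in zip(*image)]
    vertical = [list(r) for r in zip(*(smear_row(c, v_threshold) for c in columns))]
    return [[h & v for h, v in zip(hr, vr)] for hr, vr in zip(horizontal, vertical)]


if __name__ == "__main__":
    line = [1, 0, 0, 1, 0, 0, 0, 0, 1]
    print(smear_row(line, 2))  # the gap of two is filled, the gap of four is kept
```

Smearing merges the characters of a text line into a single connected block. Large, wide blocks are natural headline candidates, although the selection criteria used in the paper are not stated in the abstract.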


Keywords: News Article, Poor Image Quality, Text Block, Noisy Background, Automatic Indexing
These keywords were added by machine and not by the authors.



Copyright information

© Springer-Verlag Berlin Heidelberg 2002

Authors and Affiliations

  • Qing Hong Liu (1)
  • Chew Lim Tan (1)
  1. School of Computing, National University of Singapore, Kent Ridge, Singapore
