Automatic Indexing of Newspaper Microfilm Images

Liu, Qing Hong; Tan, Chew Lim

doi:10.1007/3-540-45869-7_41

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2423)

Included in the following conference series:

International Workshop on Document Analysis Systems

Abstract

This paper describes a proposed document analysis system for automatically indexing digitized images of old newspaper microfilms. The system extracts news headlines from the microfilm images and converts them to machine-readable text by OCR, so that the headlines serve as indices to their respective news articles. A major challenge is the poor image quality of the microfilm: most images are inadequately illuminated and considerably dirty. Because conventional threshold selection techniques are inadequate for such images, we propose a new method for separating characters from the noisy background. A Run Length Smearing Algorithm (RLSA) is then applied to extract the headlines. Experimental results confirm the validity of the approach.
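
As a rough illustration of the run length smearing step named in the abstract, the sketch below follows the general RLSA idea: short background gaps between foreground pixels are filled so that nearby characters merge into blocks. The abstract does not give the authors' exact variant or parameters, so the function names (`smear_row`, `rlsa`), the gap limits, the choice to fill only gaps bounded by foreground on both sides, and the AND combination of the two passes are assumptions made for illustration. Images are plain nested lists with 1 for foreground (ink) and 0 for background.

```python
def smear_row(row, max_gap):
    """Fill runs of 0s no longer than max_gap that lie between two 1s.

    Assumption: leading and trailing background runs are left untouched.
    """
    out = list(row)
    last_one = None  # index of the most recent foreground pixel seen
    for i, pixel in enumerate(row):
        if pixel == 1:
            gap = i - last_one - 1 if last_one is not None else 0
            if 0 < gap <= max_gap:
                for j in range(last_one + 1, i):
                    out[j] = 1
            last_one = i
    return out


def rlsa(image, h_gap, v_gap):
    """Smear rows and columns separately, then combine with a logical AND."""
    horizontal = [smear_row(r, h_gap) for r in image]
    # Smear columns by transposing, smearing each as a row, and transposing back.
    smeared_columns = [smear_row(list(c), v_gap) for c in zip(*image)]
    vertical = [list(r) for r in zip(*smeared_columns)]
    return [[a & b for a, b in zip(hr, vr)]
            for hr, vr in zip(horizontal, vertical)]


if __name__ == "__main__":
    line = [1, 0, 0, 1, 0, 0, 0, 0, 1]
    print(smear_row(line, 2))
```

In a headline extraction setting, the gap limits would presumably be tuned so that large headline characters merge into one block while body text and noise do not. Those values depend on scan resolution and are not stated in the abstract.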



Author information

Authors and Affiliations

School of Computing, National University of Singapore, 117543, Kent Ridge, Singapore
Qing Hong Liu & Chew Lim Tan

Editor information

Editors and Affiliations

Bell Labs, Lucent Technologies, 600 Mountain Avenue, 07974, Murray Hill, NJ, USA
Daniel Lopresti
Avaya Labs Research, 233 Mount Airy Road, 07920, Basking Ridge, NJ, USA
Jianying Hu & Ramanujan Kashi

Cite this paper

Liu, Q.H., Tan, C.L. (2002). Automatic Indexing of Newspaper Microfilm Images. In: Lopresti, D., Hu, J., Kashi, R. (eds) Document Analysis Systems V. DAS 2002. Lecture Notes in Computer Science, vol 2423. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45869-7_41

DOI: https://doi.org/10.1007/3-540-45869-7_41
Published: 09 August 2002
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-44068-0
Online ISBN: 978-3-540-45869-2
eBook Packages: Springer Book Archive
