Extraction of newspaper headlines from microfilm for automatic indexing

Document Analysis and Recognition

Abstract.

This paper proposes a document image analysis system that extracts newspaper headlines from microfilm images, with a view to providing automatic indexing for news articles on microfilm. A major challenge is the poor image quality of microfilm: most images are inadequately illuminated and considerably dirty. Because conventional threshold selection techniques are inadequate for this kind of image, we propose a new method for separating characters from the noisy background. A run-length smoothing algorithm is then applied to extract the headlines. Experimental results confirm the validity of the approach.
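The abstract does not give the details of the run-length smoothing step, so the following is only a minimal sketch of the generic technique, not the authors' implementation. It assumes a binarized image stored as lists of 0 (background) and 1 (foreground), and it fills only background runs that are bounded by foreground pixels on both sides. The names `rlsa_row`, `rlsa_horizontal`, and `max_gap` are hypothetical and chosen for illustration.

```python
from typing import List


def rlsa_row(row: List[int], max_gap: int) -> List[int]:
    """Smooth one binary row: fill short background runs between foreground pixels.

    A run of 0s is set to 1 when it lies between two 1s and its length
    is at most max_gap (an assumed, user-chosen threshold).
    """
    out = list(row)
    last_fg = None  # index of the most recent foreground pixel seen so far
    for i, px in enumerate(row):
        if px == 1:
            if last_fg is not None:
                gap = i - last_fg - 1  # number of background pixels in between
                if 0 < gap <= max_gap:
                    for j in range(last_fg + 1, i):
                        out[j] = 1
            last_fg = i
    return out


def rlsa_horizontal(image: List[List[int]], max_gap: int) -> List[List[int]]:
    """Apply the row smoothing to every row of a binary image."""
    return [rlsa_row(row, max_gap) for row in image]


if __name__ == "__main__":
    row = [0, 1, 0, 0, 1, 0, 0, 0, 0, 1, 0]
    # The 2-pixel gap is filled; the 4-pixel gap exceeds max_gap and is kept.
    print(rlsa_row(row, max_gap=2))
```

With a suitably large `max_gap`, neighbouring characters on the same text line merge into a single block, and the height of such a block could then serve as one cue for telling large headline text from body text. Whether the paper uses this cue, or combines horizontal and vertical passes, is not stated in the abstract.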


Author information

Corresponding author

Correspondence to Chew Lim Tan.

Additional information

Received: 15 November 2002, Accepted: 19 May 2003, Published online: 30 January 2004

About this article

Cite this article

Tan, C.L., Liu, Q.H. Extraction of newspaper headlines from microfilm for automatic indexing. IJDAR 6, 201–210 (2003). https://doi.org/10.1007/s10032-003-0111-2

  • DOI: https://doi.org/10.1007/s10032-003-0111-2