Document Image De-warping Based on Detection of Distorted Text Lines

  • Lothar Mischke
  • Wolfram Luther
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3617)

Abstract

Image warping caused by scanning, photocopying or photographing a document is a common problem in the field of document processing and understanding. Distortion within the text documents impairs OCRability and thus strongly decreases the usability of the results. This is one of the major obstacles for automating the process of digitizing printed documents.

In this paper we present a novel algorithm which is able to correct document image warping based on the detection of distorted text lines. The proposed solution is used in a recent project on digitizing old, poor-quality manuscripts. The algorithm is compared to other published approaches. Experiments with various document samples and the resulting improvements of the text recognition rate achieved by a commercial OCR engine are also presented.

Keywords

Document Image, Text Line, Core Zone, Text Block, Line Part
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag Berlin Heidelberg 2005

Authors and Affiliations

  • Lothar Mischke (1)
  • Wolfram Luther (2)
  1. Eduard Spranger Vocational School, Hamm, Germany
  2. Institute of Computer Science and Interactive Systems, University of Duisburg–Essen, Duisburg, Germany
