A Unified Preprocessing Technique for Enhancement of Degraded Document Images

  • N. Shobha Rani
  • A. Sajan Jain
  • H. R. Kiran
Conference paper
Part of the Lecture Notes in Computational Vision and Biomechanics book series (LNCVB, volume 30)


Document image processing has seen rapid development and increasingly widespread application in recent years. Advances in computing technology have kept pace with the rapid growth in the volume of image data across applications. One such application of document image processing is optical character recognition (OCR). Preprocessing is a prerequisite stage in document image processing that transforms the document into a form suitable for the subsequent stages. In this paper, various preprocessing techniques are proposed for the enhancement of degraded document images. The implemented algorithms handle a variety of degradations, including the foxing effect, nonuniform illumination, the show-through effect, stain marks, and pen and other scratch marks. The techniques are based on noise degradation models generated from the attributes of noisy pixels commonly found in degraded or ancient document images. These noise models are then used to detect the noisy regions of the image that are to undergo enhancement. The enhancement procedures include local normalization, convolution using statistical measures such as the mean and standard deviation, and Sauvola's adaptive binarization technique. The outcomes of the preprocessing procedure are very promising and are adaptable to various degraded document scenarios.
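
Two of the enhancement procedures named above, local normalization and Sauvola's adaptive binarization, both rely on the mean and standard deviation in a window around each pixel. The following is a minimal sketch of these two steps only, not the authors' implementation. The function names, the window size of 15, and the parameters k = 0.2 and R = 128 are assumptions chosen for illustration, and the local normalization shown is the common form (pixel minus local mean, divided by local standard deviation), which may differ from the variant used in the paper. Sauvola's threshold at each pixel is T = m · (1 + k · (s / R − 1)), where m and s are the local mean and standard deviation.

```python
import numpy as np


def local_mean_std(img, window=15):
    """Local mean and standard deviation over a square window (odd size)."""
    assert window % 2 == 1, "window size is assumed to be odd"
    img = np.asarray(img, dtype=np.float64)
    h, w = img.shape
    pad = window // 2
    padded = np.pad(img, pad, mode="reflect")

    def box_sum(values):
        # Integral image with a leading row/column of zeros, so that every
        # window sum is obtained from four corner lookups.
        s = np.pad(values, ((1, 0), (1, 0))).cumsum(axis=0).cumsum(axis=1)
        return (s[window:window + h, window:window + w]
                - s[:h, window:window + w]
                - s[window:window + h, :w]
                + s[:h, :w])

    n = window * window
    mean = box_sum(padded) / n
    # Clamp at zero to absorb small negative values caused by rounding.
    var = np.maximum(box_sum(padded ** 2) / n - mean ** 2, 0.0)
    return mean, np.sqrt(var)


def local_normalize(img, window=15, eps=1e-6):
    """Subtract the local mean and divide by the local standard deviation."""
    mean, std = local_mean_std(img, window)
    return (np.asarray(img, dtype=np.float64) - mean) / (std + eps)


def sauvola_binarize(img, window=15, k=0.2, R=128.0):
    """Return a boolean image: True = background, False = ink."""
    mean, std = local_mean_std(img, window)
    threshold = mean * (1.0 + k * (std / R - 1.0))
    return np.asarray(img, dtype=np.float64) > threshold


# Synthetic page (hypothetical test data): a left-to-right illumination
# gradient with a two-pixel-high dark stroke drawn across it.
background = np.tile(np.linspace(100.0, 220.0, 64), (64, 1))
stroke = np.zeros((64, 64), dtype=bool)
stroke[30:32, :] = True
page = background.copy()
page[stroke] -= 80.0

binary = sauvola_binarize(page)
ink_pixels = int((~binary).sum())
```

Because the threshold follows the local mean, the stroke is separated from the background at both the dark and the bright end of the gradient, which a single global threshold cannot do on this synthetic page.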


Optical character recognition, Preprocessing, Foxing effect, Stain marks, Pen and scratch marks, Nonuniform illumination, Local adaptive binarization



Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. Department of Computer Science, Amrita School of Arts and Sciences, Amrita Vishwa Vidyapeetham, Mysuru, India
