An adaptive water flow model for binarization of degraded document images

  • Original Paper
International Journal on Document Analysis and Recognition (IJDAR)

Abstract

In this paper, we present an adaptive water flow model for the binarization of degraded document images. We regard the image surface as a three-dimensional terrain and pour water onto it. The water flows into the valleys and fills them. Our algorithm controls the rainfall process so that the water fills each valley up to half of its depth. Once the rainfall stops, each wet region represents either one character or a noisy component. To segment the characters, we label the wet regions and treat them as blobs; since some blobs are noisy components, we use a multilayer perceptron to classify each blob as text or non-text. Because our algorithm classifies blobs rather than pixels, it preserves stroke connectivity. In experiments on three sets of degraded document images, the proposed binarization algorithm outperformed six well-known algorithms. Its advantage is most pronounced on document images with uneven illumination.
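
The pipeline outlined in the abstract (wet regions, then blobs, then blob-level classification) can be illustrated with a minimal sketch. This is not the authors' implementation, and two substitutions are made for brevity. The rainfall simulation is replaced by a Bernsen-style local mid-range threshold, which marks a pixel as wet when it lies in the lower half of its local valley. The multilayer perceptron is replaced by a placeholder area rule. All function names (`wet_mask`, `label_blobs`, `is_text`, `binarize`) and parameters (`radius`, `min_depth`, `min_area`) are hypothetical.

```python
from collections import deque

def wet_mask(img, radius=2, min_depth=40):
    """Stand-in for the rainfall step (assumption, not the paper's model):
    a pixel is 'wet' when it lies below the midpoint of the local minimum
    and maximum, and the local valley is at least min_depth deep."""
    h, w = len(img), len(img[0])
    mask = [[False] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            window = [img[j][i]
                      for j in range(max(0, y - radius), min(h, y + radius + 1))
                      for i in range(max(0, x - radius), min(w, x + radius + 1))]
            lo, hi = min(window), max(window)
            mask[y][x] = (hi - lo) >= min_depth and img[y][x] < (lo + hi) / 2
    return mask

def label_blobs(mask):
    """Group wet pixels into 8-connected blobs; returns a list of pixel lists."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    blobs = []
    for y in range(h):
        for x in range(w):
            if not mask[y][x] or seen[y][x]:
                continue
            seen[y][x] = True
            queue, blob = deque([(y, x)]), []
            while queue:
                cy, cx = queue.popleft()
                blob.append((cy, cx))
                # Visit the 3x3 neighbourhood, clipped to the image bounds.
                for ny in range(max(0, cy - 1), min(h, cy + 2)):
                    for nx in range(max(0, cx - 1), min(w, cx + 2)):
                        if mask[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            blobs.append(blob)
    return blobs

def is_text(blob, min_area=3):
    """Hypothetical placeholder for the multilayer perceptron:
    accepts any blob with at least min_area pixels."""
    return len(blob) >= min_area

def binarize(img):
    """Return a 0/1 image in which 1 marks pixels of blobs accepted as text."""
    out = [[0] * len(img[0]) for _ in img]
    for blob in label_blobs(wet_mask(img)):
        if is_text(blob):  # the decision is made per blob, not per pixel
            for y, x in blob:
                out[y][x] = 1
    return out
```

Because a blob is accepted or rejected as a whole, every pixel of an accepted stroke survives together, which is the connectivity property the abstract refers to. In a fuller version, `is_text` would be a trained classifier operating on blob features rather than a fixed area rule.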


Author information

Corresponding author

Correspondence to Morteza Valizadeh.

About this article

Cite this article

Valizadeh, M., Kabir, E. An adaptive water flow model for binarization of degraded document images. IJDAR 16, 165–176 (2013). https://doi.org/10.1007/s10032-012-0182-z

  • DOI: https://doi.org/10.1007/s10032-012-0182-z
