An adaptive water flow model for binarization of degraded document images

  • Morteza Valizadeh
  • Ehsanollah Kabir
Original Paper

Abstract

In this paper, we present an adaptive water flow model for the binarization of degraded document images. We regard the image surface as a three-dimensional terrain and pour water on it. The water finds the valleys and fills them. Our algorithm controls the rainfall, that is, the pouring of water, so that each valley is filled to half of its depth. Once the rainfall stops, each wet region represents either a character or a noisy component. To segment the characters, we label the wet regions and treat them as blobs; because some blobs are noisy components, we use a multilayer perceptron to classify each blob as text or non-text. Because the algorithm classifies blobs rather than pixels, it preserves stroke connectivity. In experiments on three sets of degraded document images, the proposed algorithm outperformed six well-known binarization algorithms. Its main advantage is on document images with uneven illumination.
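The pipeline described in the abstract (fill valleys to half depth, label the wet regions, classify each blob) can be illustrated with a minimal sketch. It is not the authors' rainfall procedure. It assumes that a valley's floor and rim can be approximated by the minimum and maximum gray level in a small window, and it uses a simple size rule as a stand-in for the trained multilayer perceptron. The names `wet_mask`, `label_blobs`, `binarize`, `size_rule`, `radius`, `min_depth` and `min_pixels` are hypothetical and chosen only for this example.

```python
from collections import deque


def wet_mask(img, radius=2, min_depth=40):
    """Mark pixels lying below half the depth of their local valley.

    Assumption: the valley floor and rim are approximated by the min and
    max gray level in a (2*radius+1)^2 window, not by simulated rainfall.
    """
    h, w = len(img), len(img[0])
    mask = [[False] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            window = [img[j][i]
                      for j in range(max(0, y - radius), min(h, y + radius + 1))
                      for i in range(max(0, x - radius), min(w, x + radius + 1))]
            floor, rim = min(window), max(window)
            depth = rim - floor
            # The water level sits halfway between the floor and the rim.
            if depth >= min_depth and img[y][x] < floor + depth / 2:
                mask[y][x] = True
    return mask


def label_blobs(mask):
    """Group wet pixels into 8-connected blobs (lists of (y, x) pairs)."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    blobs = []
    for y in range(h):
        for x in range(w):
            if not mask[y][x] or seen[y][x]:
                continue
            queue, blob = deque([(y, x)]), []
            seen[y][x] = True
            while queue:
                cy, cx = queue.popleft()
                blob.append((cy, cx))
                for ny in range(cy - 1, cy + 2):
                    for nx in range(cx - 1, cx + 2):
                        if (0 <= ny < h and 0 <= nx < w
                                and mask[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            blobs.append(blob)
    return blobs


def size_rule(blob, img, min_pixels=3):
    """Placeholder for the trained classifier: keep blobs that are large enough."""
    return len(blob) >= min_pixels


def binarize(img, is_text, radius=2, min_depth=40):
    """Return a 0/1 image in which whole blobs are kept or discarded."""
    out = [[0] * len(img[0]) for _ in img]
    for blob in label_blobs(wet_mask(img, radius, min_depth)):
        if is_text(blob, img):
            for y, x in blob:
                out[y][x] = 1
    return out


# Toy page: background brightens left to right, one stroke, one speck.
img = [[180 + 5 * x for x in range(10)] for _ in range(7)]
for y in range(1, 6):
    img[y][3] = 40      # vertical stroke
img[3][8] = 60          # isolated noise pixel
for row in binarize(img, size_rule):
    print("".join("#" if v else "." for v in row))
```

Because the water level is set relative to each local valley rather than by one global threshold, the gradual brightening of the background in the toy page does not produce wet pixels. Because blobs are accepted or rejected as a whole, a kept stroke is never broken into fragments.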

Keywords

Adaptive water flow, Document binarization, Degraded image, Blob extraction, Multilayer Perceptron


Copyright information

© Springer-Verlag 2012

Authors and Affiliations

  1. Department of Electrical and Computer Engineering, Tarbiat Modarres University, Tehran, Iran
