Structural feature-based evaluation method of binarization techniques for word retrieval in the degraded Arabic document images

  • Original Paper
  • International Journal on Document Analysis and Recognition (IJDAR)

Abstract

Binarization, which separates the foreground from the background, is one of the most important and necessary steps in document analysis and recognition. Several binarization techniques have been proposed in the literature, but none of them is reliable for all image types, which makes it very difficult to select one method for a given application. Performance evaluation of binarization algorithms is therefore vital. In this paper, we evaluate binarization techniques for the purpose of retrieving words from images of degraded Arabic documents. We propose a new evaluation methodology that compares the visual features extracted from the binarized document images with ground-truth features, instead of comparing the images themselves. The most appropriate thresholding method for each image is the one for which the visual features of the identified words in the image are “closest” to the features of the reference words. The proposed technique is used here to assess the performance of eleven algorithms, based on different approaches, on a collection of real and synthetic images.
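
The selection rule described above can be illustrated with a minimal sketch. It assumes each word is already summarized as a numeric feature vector and that "closeness" is measured by the mean Euclidean distance to the ground-truth vectors. The distance measure, the function names, the method labels, and the toy numbers are all assumptions made for illustration; the abstract does not specify the actual features or the metric used in the paper.

```python
from math import sqrt
from statistics import mean


def feature_distance(a, b):
    """Euclidean distance between two equal-length feature vectors (assumed metric)."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def score_method(extracted, reference):
    """Mean distance between each word's extracted features and its reference features."""
    if len(extracted) != len(reference):
        raise ValueError("need one extracted vector per reference word")
    return mean(feature_distance(e, r) for e, r in zip(extracted, reference))


def select_best_method(features_by_method, reference):
    """Return the method whose word features are closest to the ground truth, plus all scores."""
    scores = {
        name: score_method(features, reference)
        for name, features in features_by_method.items()
    }
    best = min(scores, key=scores.get)
    return best, scores


# Hypothetical toy data: two words, three placeholder feature values per word.
reference = [[2, 1, 0], [1, 0, 1]]
features_by_method = {
    "method_a": [[2, 1, 0], [1, 1, 1]],
    "method_b": [[4, 3, 2], [0, 0, 3]],
    "method_c": [[2, 1, 0], [1, 0, 1]],
}

best, scores = select_best_method(features_by_method, reference)
print(best, scores)
```

In this toy setting the method with the lowest mean distance is selected for the image. A per-image selection like this is only one possible reading of the abstract, and a different distance or aggregation could change the ranking.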

Corresponding author

Correspondence to Abderrahmane Kefali.

Cite this article

Sari, T., Kefali, A. & Bahi, H. Structural feature-based evaluation method of binarization techniques for word retrieval in the degraded Arabic document images. IJDAR 19, 31–47 (2016). https://doi.org/10.1007/s10032-015-0254-y

  • DOI: https://doi.org/10.1007/s10032-015-0254-y
