Quality evaluation of degraded document images for binarization result prediction

  • V. Rabeux
  • N. Journet
  • A. Vialard
  • J. P. Domenger
Original Paper

Abstract

This article proposes an approach for predicting the result of binarization algorithms on a given document image from its state of degradation. Historical documents suffer from different types of degradation, which cause binarization errors. We characterize the degradation of a document image with features based on the intensity, quantity and location of the degradation. These features allow us to build prediction models of binarization algorithms that are very accurate according to \(R^2\) values and p values. The prediction models are then used to select the best binarization algorithm for each document image. This image-by-image strategy improves the binarization results over the entire dataset.
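The selection strategy described in the abstract can be illustrated with a minimal sketch. It rests on several assumptions that are not stated in the abstract: each image is summarized by a single degradation feature, each prediction model is an ordinary least squares line, and the algorithm names, feature values and scores are made-up placeholders, not results from the paper.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = a + b * x; returns (a, b, r2)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    # R^2 = 1 - (residual sum of squares / total sum of squares)
    ss_res = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return a, b, 1.0 - ss_res / ss_tot


def select_algorithm(models, feature):
    """Return the name of the algorithm whose model predicts the highest score."""
    return max(models, key=lambda name: models[name][0] + models[name][1] * feature)


# Hypothetical training data: one degradation feature per image (assumed 0..1 scale)
# and the measured binarization score of each algorithm on that image.
degradation = [0.1, 0.3, 0.5, 0.7, 0.9]
scores = {
    "global_threshold": [0.95, 0.85, 0.75, 0.65, 0.55],  # placeholder algorithm
    "local_threshold": [0.80, 0.78, 0.76, 0.74, 0.72],   # placeholder algorithm
}

# One prediction model per binarization algorithm.
models = {name: fit_line(degradation, ys) for name, ys in scores.items()}

# Image-by-image selection: pick the algorithm with the best predicted score.
lightly_degraded_choice = select_algorithm(models, 0.2)
heavily_degraded_choice = select_algorithm(models, 0.8)
```

With this toy data the two fitted lines cross, so the selected algorithm changes with the degradation level of the image, which is the situation in which a per-image choice can differ from a single choice for the whole dataset.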

Keywords

  • Document image analysis
  • Quality evaluation
  • Binarization prediction

Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • V. Rabeux (1)
  • N. Journet (1)
  • A. Vialard (1)
  • J. P. Domenger (1)
  1. Talence Cedex, France
