Binarization of Degraded Document Images with Generalized Gaussian Distribution

  • Robert Krupiński
  • Piotr Lech
  • Mateusz Tecław
  • Krzysztof Okarma
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11540)


Binarization is one of the most crucial preprocessing steps for document images subjected to further text recognition, as it significantly influences the obtained OCR results. Since classical global and local thresholding methods may be inappropriate for degraded images, particularly historical documents, their binarization remains a challenging and open task. This paper presents a novel approach to the use of the Generalized Gaussian Distribution for this purpose. Assuming that the distortions present in historical document images may be modelled using the Gaussian noise distribution, a significant similarity may be observed between their histograms and those obtained for binary images corrupted by Gaussian noise. Therefore, by extracting the parameters of the Generalized Gaussian Distribution, the distortions may be modelled and removed, enhancing the quality of the input data for further thresholding and text recognition. Because the processing time is relatively long, the Monte Carlo method is additionally proposed to shorten it. The presented algorithm has been verified using the well-known DIBCO datasets, leading to very promising binarization results.
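The two ingredients named in the abstract can be illustrated with a minimal sketch. The abstract gives no formulas, so everything below is an assumption for illustration rather than the authors' algorithm: `ggd_pdf` uses the standard textbook parameterization of the Generalized Gaussian density (location `mu`, scale `alpha`, shape `beta`), the synthetic image with two hypothetical grey levels stands in for a "binary image corrupted by Gaussian noise", and `monte_carlo_histogram` simply estimates the histogram from randomly drawn pixels instead of all of them.

```python
import math
import random


def ggd_pdf(x, mu, alpha, beta):
    """Standard GGD density: location mu, scale alpha > 0, shape beta > 0.

    beta = 2 gives a Gaussian (with alpha = sigma * sqrt(2)), beta = 1 a Laplacian.
    """
    coeff = beta / (2.0 * alpha * math.gamma(1.0 / beta))
    return coeff * math.exp(-((abs(x - mu) / alpha) ** beta))


def noisy_binary_image(width, height, ink_fraction, sigma, rng):
    """Flat list of 8-bit pixels: two grey levels plus additive Gaussian noise."""
    ink_level, paper_level = 60.0, 190.0  # hypothetical levels, chosen for illustration
    pixels = []
    for _ in range(width * height):
        level = ink_level if rng.random() < ink_fraction else paper_level
        value = int(round(level + rng.gauss(0.0, sigma)))
        pixels.append(min(255, max(0, value)))  # clip to the 8-bit range
    return pixels


def histogram(values, bins=256):
    """Normalized histogram of integer values in 0..bins-1."""
    counts = [0] * bins
    for v in values:
        counts[v] += 1
    return [c / len(values) for c in counts]


def monte_carlo_histogram(pixels, n_samples, rng):
    """Estimate the histogram from n_samples pixels drawn uniformly at random."""
    sample = [pixels[rng.randrange(len(pixels))] for _ in range(n_samples)]
    return histogram(sample)


def l1_distance(p, q):
    """Sum of absolute differences between two histograms."""
    return sum(abs(a - b) for a, b in zip(p, q))


if __name__ == "__main__":
    rng = random.Random(0)
    pixels = noisy_binary_image(200, 200, 0.2, 15.0, rng)
    full = histogram(pixels)
    approx = monte_carlo_histogram(pixels, 4000, rng)
    print("L1 distance, full vs. sampled histogram:", l1_distance(full, approx))
    print("GGD density at the mode (beta = 2):",
          ggd_pdf(190.0, 190.0, 15.0 * math.sqrt(2.0), 2.0))
```

In this sketch only a tenth of the pixels are visited to build the sampled histogram, which is the kind of saving the Monte Carlo step is meant to provide; how the paper actually draws samples and fits the distribution parameters is not stated in the abstract.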


Document images · Image binarization · Generalized Gaussian Distribution · Monte Carlo method · Thresholding



Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. Department of Signal Processing and Multimedia Engineering, Faculty of Electrical Engineering, West Pomeranian University of Technology, Szczecin, Szczecin, Poland
