Abstract
One of the most crucial preprocessing steps for document images subjected to further text recognition is their binarization, which significantly influences the obtained OCR results. Since for degraded images, particularly historical documents, classical global and local thresholding methods may be inappropriate, their binarization remains a challenging and up-to-date task. In this paper a novel approach employing the Generalized Gaussian Distribution for this purpose is presented. Assuming the presence of distortions in historical document images which may be modelled using the Gaussian noise distribution, a significant similarity of their histograms to those obtained for binary images corrupted by Gaussian noise may be observed. Therefore, by extracting the parameters of the Generalized Gaussian Distribution, the distortions may be modelled and removed, enhancing the quality of the input data for further thresholding and text recognition. Due to the relatively long processing time, its shortening using the Monte Carlo method is proposed as well. The presented algorithm has been verified using the well-known DIBCO datasets, leading to very promising binarization results.
1 Introduction
Document image binarization is an active area of research in computer vision due to high demands related to the robustness of the thresholding algorithms. As it is one of the most relevant steps in document recognition applications, considering both machine printed and handwritten text documents, many algorithms have been proposed for this purpose. Many of them were presented at Document Image Binarization COmpetitions (DIBCO) held during International Conferences on Document Analysis and Recognition (ICDAR) and H-DIBCO during International Conferences on Frontiers in Handwriting Recognition (ICFHR). Due to the presence of challenging image distortions, DIBCO datasets [20], used for performance evaluation of the submitted algorithms, became the most popular ones for the verification of newly proposed binarization methods.
The motivation of research related to document image binarization and recognition is not only the possibility of preserving the cultural heritage and discovering historical facts, e.g. by the recognition of ancient manuscripts, but also potential applications of the developed algorithms in other areas of industry. Considering the rapid development of Industry 4.0 solutions, similar algorithms may be useful in machine-vision based self-localization and navigation of mobile robots as well as in modern autonomous vehicles. When video data is captured by cameras, the presence of similar distortions may be expected both in natural images and in degraded document images. Nevertheless, document image datasets containing ground truth binary images are still the best tool for verification purposes and therefore the method proposed in this paper is evaluated using the images from the DIBCO datasets.
During the last several years various approaches to image thresholding have been proposed that outperform the classical Otsu method [18], including the adaptive methods proposed by Niblack [12], Sauvola [22], Feng [3], Wolf [27] and Bradley [1] and their modifications [23], which are the most useful for document image binarization purposes. Nonetheless, one of the main issues of such adaptive methods is the necessity of analysing the neighbourhood of each pixel, which increases the computational effort. Recently, applications of local features with the use of Gaussian mixtures [11], as well as deep neural networks [24], have been proposed as well. However, to obtain satisfactory results, most of such approaches require multiple processing stages with background removal, median filtering and morphological processing, or a time-consuming training process.
Nevertheless, the motivation of the paper is not a direct comparison of the proposed approach with state-of-the-art methods, especially those based on recent advances in deep learning, but an increase of the performance of some known methods due to the application of the proposed image preprocessing.
2 The Basics of the Proposed Approach
2.1 Identification and Definition of the Problem
Handwritten and machine printed documents are usually subject to slow destruction over time, reducing their readability. Characteristic examples of this process are ancient books and old prints; however, digital restoration methods allow for reading even heavily damaged documents. Assuming that the original text was distorted by summing it with a noisy image of normal distribution and analysing the histograms, it can be noticed that the original information becomes hidden (blurred) and the histogram of the resulting image is a distorted version of the histogram of a “purely” noisy image.
This similarity is preserved also for real scanned images of historical documents. Therefore, it can be assumed that removing the partial information related to the noise should improve the quality of the image used as the input for further processing. An illustration of this phenomenon is shown in Fig. 1.
2.2 The Basic Idea of Text Image Reconstruction
Assuming that the real text image is approximately the combination of ground truth (GT) binary image with Gaussian noise, being the most widely present one in nature, the readability of text may be initially improved by normalization of pixel intensity levels according to the classical formula

\( q(x, y) = \frac{p(x, y) - I_{min}}{I_{max} - I_{min}} \cdot 255 \)
where:

- p(x, y) is the input pixel intensity level at (x, y) coordinates,
- q(x, y) is the output pixel intensity level at (x, y) coordinates,
- \( I_{min} \) is the minimum pixel intensity level,
- \( I_{max} \) is the maximum pixel intensity level,
- \( 0 \le I_{min} < I_{max} \le 255 \).
Nevertheless, in most typical applications the values of \(I_{min}\) and \(I_{max}\) are the minimum and maximum intensity values among all image pixels, so this normalization may only increase the image contrast. Assuming the presence of dark text on a brighter background, one may safely remove the detailed data related to the brighter pixels without influencing the text information. Therefore, we propose to set \(I_{max} = \mu _{GGD}\) and \(I_{min} = 0\), where \(\mu _{GGD}\) is the location parameter of the Generalized Gaussian Distribution used for the approximation of the image histogram. This operation removes partial information related to the presence of distortions and is followed by thresholding, which may be conducted using one of the typical methods, e.g. classical global Otsu binarization. The illustration of the consecutive steps is presented in Fig. 2.
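The normalization with \(I_{min} = 0\) and \(I_{max} = \mu _{GGD}\) can be sketched as follows (a minimal illustration assuming \(\mu _{GGD}\) has already been estimated from the histogram; the function name is ours, not from the original implementation):

```python
import numpy as np

def normalize_with_ggd_location(image, mu_ggd):
    """Clip intensities above the GGD location parameter and stretch
    the remaining range <0; mu_ggd> back onto <0; 255>.

    `image` is a 2-D uint8 array; `mu_ggd` is the location parameter
    estimated from the (simplified) histogram.
    """
    clipped = np.minimum(image.astype(np.float64), mu_ggd)
    # I_min = 0 and I_max = mu_ggd in the classical normalization formula
    return np.round(clipped / mu_ggd * 255.0).astype(np.uint8)
```

All pixels brighter than \(\mu _{GGD}\) are mapped to 255, i.e. the detailed background information is discarded, while the darker text pixels keep their relative intensities.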
3 Generalized Gaussian Distribution
The Generalized Gaussian Distribution (GGD) is a very popular tool in many research areas related to signal and image processing. Its popularity comes from its coverage of other widely known distributions: the Gaussian distribution, the Laplacian distribution, the uniform distribution and an impulse function. Other special cases have also been considered in the literature [5, 6]. Many different methods have been designed to estimate the parameters of this distribution [28].
The distribution has also been extended to cover complex variables [13] and the multidimensional case [19]. The GGD has been used in many different models, for instance, to model the tangential wavelet coefficients for compressing three-dimensional triangular mesh data [8], in an image segmentation algorithm [25], to generate an augmented quaternion random variable with GGD [7], in the natural scene statistics (NSS) model describing certain regular statistical properties of natural images [29], and to approximate an atmospheric point spread function (APSF) kernel [26].
The probability density function of the GGD is defined by the equation [2]

\( f(x) = \frac{p \, \lambda (p,\sigma )}{2 \varGamma (\frac{1}{p})} \, e^{-\left[ \lambda (p,\sigma ) \left| x\right| \right] ^{p}} \)
where p is the shape parameter, \(\varGamma (z)=\int _{0}^{\infty }t^{z-1}e^{-t}dt, z>0\) [17] and \(\lambda \) is connected to the standard deviation \(\sigma \) of the distribution by the equation \(\lambda (p,\sigma )=\frac{1}{\sigma }\left[ \frac{\varGamma (\frac{3}{p})}{\varGamma (\frac{1}{p})}\right] ^{\frac{1}{2}}\). The parameter \(p=1\) corresponds to Laplacian distribution and \(p=2\) corresponds to Gaussian distribution. When \(p \rightarrow \infty \), the GGD density function becomes a uniform distribution and when \(p \rightarrow 0\), f(x) approaches an impulse function. Some examples are shown in Fig. 3.
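The density and its special cases can be checked numerically with a direct implementation of the formula above (our own sketch, using the \(\lambda (p,\sigma )\) parametrization defined in the text):

```python
import math

def ggd_pdf(x, p, sigma, mu=0.0):
    """Generalized Gaussian density in the parametrization above,
    with lambda(p, sigma) = (1/sigma) * sqrt(Gamma(3/p) / Gamma(1/p))."""
    lam = (1.0 / sigma) * math.sqrt(math.gamma(3.0 / p) / math.gamma(1.0 / p))
    return (p * lam) / (2.0 * math.gamma(1.0 / p)) * math.exp(-(lam * abs(x - mu)) ** p)
```

For \(p=2\) the function reduces to the Gaussian density with standard deviation \(\sigma \), and for \(p=1\) to the Laplacian density with the same standard deviation, which is a convenient sanity check of an implementation.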
4 Application of the Monte Carlo Method
4.1 Idea of the Monte Carlo Method
Since the calculation of the GGD parameters for the histogram obtained for the whole image is relatively slow, a significant reduction of the computational burden may be achieved using a simplified histogram calculated for a limited number of pixels. To preserve the statistical properties of the analysed image, the randomly chosen pixel locations should be evenly distributed on the image plane and therefore a random number generator with uniform distribution should be applied in the Monte Carlo procedure [15].
The general idea of the statistical Monte Carlo method is based on the random drawing procedure applied for the reshaped one-dimensional vector consisting of all \(M \times N\) pixels from the analysed image. Then, n independent numbers, equivalent to positions in the vector, are generated by a pseudo-random generator of uniform distribution with possibly good statistical properties. Next, the total number of randomly chosen pixels (k) for each luminance level is determined and used as an estimate of the simplified histogram, according to:

\( \hat{L}_{MC} = \frac{k}{n} \cdot M N \)
where k is the number of drawn pixels for the specified luminance level in randomly chosen samples, n denotes the total number of draws and \(M \times N\) stands for the total number of samples in the entire image. In general, the estimator \(\hat{L}_{MC}\) may refer to any defined image feature which may be described by binary values 0 and 1.
The estimation error can be determined as:

\( \varepsilon _{\alpha } = u_{\alpha } \cdot \frac{M N}{\sqrt{n}} \cdot \sqrt{\frac{K}{M N}\left( 1-\frac{K}{M N}\right) } \)
assuming that K represents the total number of samples with specified luminance level and \(u_\alpha \) denotes the two-sided critical range.
For such estimated histogram some classical binarization methods may be applied leading to results comparable with those obtained for the analysis of full images [9, 16], also in terms of recognition accuracy.
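The estimation of the simplified histogram described above can be sketched as follows (a self-contained illustration; the seeded generator stands in for the "pseudo-random generator with possibly good statistical properties"):

```python
import random

def monte_carlo_histogram(image, n, seed=0):
    """Estimate the grey-level histogram from n pixels drawn uniformly
    at random (with replacement) from the flattened M*N image.

    Returns the estimated pixel count per luminance level,
    L_hat = (k / n) * M * N, for each of the 256 levels.
    """
    flat = [v for row in image for v in row]
    total = len(flat)                      # M * N
    rng = random.Random(seed)
    counts = [0] * 256                     # k per luminance level
    for _ in range(n):
        counts[flat[rng.randrange(total)]] += 1
    return [k / n * total for k in counts]
```

Only n pixel accesses are needed instead of \(M \times N\), which is where the reduction of the computational burden comes from; the estimated counts sum to \(M \times N\) exactly, since the drawn fractions k/n sum to one.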
4.2 Experimental Verification of the Proposed Approach for the Estimation of the GGD Parameters
The influence of the number of randomly drawn pixels on the obtained GGD parameters was verified for the images from the DIBCO datasets, with each drawing repeated 30 times for each assumed n. The minimum, average and maximum values of the four GGD parameters: the shape parameter p, the location parameter \(\mu \), the scale parameter \(\lambda \) and the standard deviation \(\sigma \), were then determined according to the method described in the paper [4], without the necessity of using more sophisticated estimators based on maximum likelihood, moments, entropy matching or global convergence [21]. The convergence of the parameters for an exemplary representative image from the DIBCO datasets using different numbers of drawn samples (n) is illustrated in Fig. 4.
Nonetheless, it should be noted that for each independent run of the Monte Carlo method the values of the estimated parameters may differ, especially for a low number of randomly chosen samples (n). One possible solution to this issue is the use of predefined numbers obtained from a pseudorandom number generator with a uniform distribution. Therefore, an appropriate choice of n is necessary to obtain stable results. Some local histogram peaks may be related to the presence of larger smears of constant brightness on the image plane (considered as background information). Since the histogram of a natural image should in fact be approximated by a multi-Gaussian model, the analysed range of brightness should be limited to obtain a better fitting of the GGD model.
Determination of the limited brightness range is conducted as follows:
- determination of the simplified histogram using the Monte Carlo method for n samples (e.g. \(n=100\)),
- estimation of the GGD parameters from the simplified histogram,
- setting the lower boundary \(x_{min}\) such that \(P(x=x_{min}) = 1/n\),
- setting the upper boundary \(x_{max}\) such that \(P(x=x_{max}) = 1 - 1/n\).
Therefore, the brightness values with probabilities lower than the probability of the occurrence of a single pixel, \(P(x) = 1/n\), are removed on both sides of the distribution. An example based on the histogram determined for the full image (\(M\cdot N\) samples used instead of n) is shown in Fig. 5.
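As a rough sketch of this boundary selection, the probabilities from the fitted GGD can be replaced by empirical quantiles of the n drawn samples (an assumption of ours, not the authors' exact procedure): cutting off probability mass below 1/n on each side then amounts to taking the second-smallest and second-largest drawn values.

```python
def limit_brightness_range(samples):
    """Approximate <x_min; x_max> as the empirical 1/n and 1 - 1/n
    quantiles of the n drawn samples, i.e. drop a single-pixel
    probability mass (1/n) on both sides of the distribution."""
    ordered = sorted(samples)
    return ordered[1], ordered[-2]
```

For example, drawn samples [5, 0, 255, 10, 20] yield the limited range (5, 20), discarding the isolated extreme values 0 and 255.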
Additionally, the Root Mean Squared Error (RMSE) was calculated to verify the influence of the number of samples n on the approximation error. Since the histograms of natural images are usually “rough”, additional median filtering of the histograms with a 5-element mask was examined. Nevertheless, the obtained results were not always satisfactory, as shown in Fig. 6, and therefore this filtering was not used, to prevent an additional increase of computation time.
5 Proposed Two-Step Algorithm and Its Experimental Verification
On the basis of the above considerations, the following procedure is proposed:
- determination of the lower boundary \(x_{min}\) and the upper boundary \(x_{max}\) with the use of the Monte Carlo histogram estimation and GGD approximation,
- limiting the brightness range to \( \langle x_{min} ; x_{max} \rangle \),
- repeated GGD approximation of the histogram for the limited range with the use of the Monte Carlo method,
- estimation of the location parameter \(\mu _{GGD}\) for the limited-range histogram,
- limiting the brightness range to \( \langle 0 ; \mu _{GGD} \rangle \) and normalization,
- binarization using one of the classical thresholding methods.
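The whole two-step procedure can be sketched end-to-end as follows. This is a deliberately simplified stand-in, not the authors' implementation: the GGD location parameter is approximated by the sample median, the boundaries by empirical quantiles of the draws, out-of-range values are clamped rather than removed, and the final step uses the fixed threshold at 0.5 of the range.

```python
import random

def binarize_ggd_mc(image, n=100, seed=0):
    """Two-step sketch: Monte Carlo boundary estimation, repeated
    draw on the limited range, normalization to <0; mu>, thresholding."""
    flat = [v for row in image for v in row]
    rng = random.Random(seed)

    # Step 1: Monte Carlo draw and boundary estimation (~1/n quantiles).
    draw1 = sorted(rng.choice(flat) for _ in range(n))
    x_min, x_max = draw1[1], draw1[-2]

    # Step 2: repeated draw on the limited range; location parameter
    # approximated here by the sample median (an assumption).
    limited = [min(max(v, x_min), x_max) for v in flat]
    draw2 = sorted(rng.choice(limited) for _ in range(n))
    mu = draw2[n // 2]

    # Limit to <0; mu>, normalize, and apply a fixed threshold at 0.5:
    # 1 = background (bright), 0 = text (dark).
    return [[1 if min(v, mu) / max(mu, 1) > 0.5 else 0 for v in row]
            for row in image]
```

The sketch keeps the key property of the method: only 2n pixel draws are used for the histogram-based estimates, while the per-pixel work is limited to clipping, normalization and thresholding.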
The illustration of histograms and GGD parameters obtained for an exemplary representative image after two major steps of the proposed algorithm for \(n=100\) pixels randomly chosen according to the Monte Carlo method is shown in Fig. 7. The noticeably different shapes of the left (a) and right (b) histograms result from independent random draws in each of two steps.
In the last step three different image binarization methods are considered: a fixed threshold at 0.5 of the brightness range, global Otsu thresholding [18] and the locally adaptive thresholding proposed by Bradley [1].
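For reference, the global Otsu method [18] used in the last step can be sketched as a minimal re-implementation (not the authors' code), operating directly on a 256-bin histogram such as the Monte Carlo estimated one:

```python
def otsu_threshold(hist):
    """Classical Otsu method: return the grey level maximizing the
    between-class variance of the 256-bin histogram `hist`."""
    total = sum(hist)
    sum_all = sum(i * h for i, h in enumerate(hist))
    w0 = sum0 = 0.0
    best_t, best_var = 0, -1.0
    for t in range(256):
        w0 += hist[t]                     # weight of the dark class
        if w0 == 0:
            continue
        w1 = total - w0                   # weight of the bright class
        if w1 == 0:
            break
        sum0 += t * hist[t]
        mu0 = sum0 / w0
        mu1 = (sum_all - sum0) / w1
        var_between = w0 * w1 * (mu0 - mu1) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t
```

Since the method only needs the histogram, it combines naturally with the Monte Carlo estimation: the simplified histogram can be passed in directly, with no additional per-pixel work before the final thresholding pass.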
To verify the validity and performance of the proposed method, experiments were made using the 8 available DIBCO datasets (2009, 2010, 2011, 2012, 2013, 2014, 2016 and 2017). For all of these databases some typical metrics used for the evaluation of binarization algorithms [14] were calculated for five different numbers of samples (n) used in the Monte Carlo method. The executions of the Monte Carlo method were repeated 30 times and the obtained results were compared with the application of the three classical thresholding methods mentioned above without the proposed image preprocessing based on the GGD histogram approximation. Detailed results obtained for the fixed threshold (0.5), Otsu and Bradley thresholding are presented in Tables 1, 2 and 3, respectively. Better results are indicated by higher accuracy, F-Measure, specificity and PSNR values, whereas lower Distance-Reciprocal Distortion (DRD) values denote better quality [10]. All the metrics marked with (GGD) were calculated for the proposed GGD based approach with the Monte Carlo method, setting the n value at 5% of the total number of pixels (about 1000–5000 depending on image resolution).
As can be observed from the results presented in Tables 1, 2 and 3, the proposed approach, utilising the GGD histogram approximation with the use of the Monte Carlo method for image preprocessing, leads to an enhancement of the binarization results for the Otsu and Bradley thresholding methods, whereas its application for binarization with a fixed threshold is inappropriate. Particularly significant improvements can be observed for the DIBCO2012 dataset with the use of Otsu binarization; however, the advantages of the proposed approach can also be observed in the aggregated results for all datasets (weighted by the number of images they contain). A visual illustration of the obtained improvement is shown in Fig. 8 for an exemplary H10 image from the DIBCO2012 dataset, where it is especially well visible for Otsu thresholding.
It is worth noting that the results shown in Fig. 8 were obtained using the proposed method applied with random drawing of only \(n=120\) samples using the Monte Carlo method. The obtained improvement of the accuracy value due to the proposed preprocessing is from 0.7765 to 0.9748 for Otsu method and from 0.9847 to 0.9851 for Bradley thresholding. The respective F-Measure values increased from 0.4618 to 0.8608 for Otsu and from 0.9220 to 0.9222 for Bradley method. Nevertheless, depending on the number of randomly drawn pixels the values achieved for the proposed method may slightly differ.
The proposed application of the GGD based preprocessing combined with the Monte Carlo method leads to an improvement of the binarization results, which are comparable with, or for some images better than, the application of adaptive thresholding. In most cases its application together with adaptive thresholding allows for a further slight increase of binarization accuracy.
6 Summary and Future Work
Although the obtained results may be outperformed by some more complex state-of-the-art methods, especially those based on deep CNNs [24], they can be considered as promising and confirm the usefulness of the GGD histogram approximation with the use of the Monte Carlo method for the preprocessing of degraded document images before binarization and further analysis. Since the proposed approach uses only one of the GGD parameters (the location parameter \(\mu \)), a natural direction of our future research is the utilisation of the other parameters for the removal of additional information related to contaminations.
Our future research will concentrate on further improvement of binarization accuracy, although an important limitation might be the computational burden. However, due to an efficient use of the Monte Carlo method, the overall processing time may be shortened and therefore our proposed approach may be further combined with some other binarization algorithms proposed by various researchers.
References
Bradley, D., Roth, G.: Adaptive thresholding using the integral image. J. Graph. Tools 12(2), 13–21 (2007). https://doi.org/10.1080/2151237X.2007.10129236
Clarke, R.J.: Transform Coding of Images. Academic Press, New York (1985)
Feng, M.L., Tan, Y.P.: Adaptive binarization method for document image analysis. In: Proceedings of the 2004 IEEE International Conference on Multimedia and Expo (ICME), vol. 1, pp. 339–342 (2004). https://doi.org/10.1109/ICME.2004.1394198
Krupiński, R.: Approximated fast estimator for the shape parameter of generalized Gaussian distribution for a small sample size. Bull. Polish Acad. Sci. Tech. Sci. 63(2), 405–411 (2015). https://doi.org/10.1515/bpasts-2015-0046
Krupiński, R.: Reconstructed quantized coefficients modeled with generalized Gaussian distribution with exponent 1/3. Image Process. Commun. 21(4), 5–12 (2016)
Krupiński, R.: Modeling quantized coefficients with generalized Gaussian distribution with exponent 1/m, \(m=2,3,\ldots \). In: Gruca, A., Czachórski, T., Harezlak, K., Kozielski, S., Piotrowska, A. (eds.) ICMMI 2017. AISC, vol. 659, pp. 228–237. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-67792-7_23
Krupiński, R.: Generating augmented quaternion random variable with Generalized Gaussian Distribution. IEEE Access 6, 34608–34615 (2018). https://doi.org/10.1109/ACCESS.2018.2848202
Lavu, S., Choi, H., Baraniuk, R.: Estimation-quantization geometry coding using normal meshes. In: Proceedings of the Data Compression Conference (DCC 2003), p. 362, March 2003. https://doi.org/10.1109/DCC.2003.1194027
Lech, P., Okarma, K.: Optimization of the fast image binarization method based on the Monte Carlo approach. Elektronika Ir Elektrotechnika 20(4), 63–66 (2014). https://doi.org/10.5755/j01.eee.20.4.6887
Lu, H., Kot, A.C., Shi, Y.Q.: Distance-reciprocal distortion measure for binary document images. IEEE Signal Process. Lett. 11(2), 228–231 (2004). https://doi.org/10.1109/LSP.2003.821748
Mitianoudis, N., Papamarkos, N.: Document image binarization using local features and Gaussian mixture modeling. Image Vis. Comput. 38, 33–51 (2015). https://doi.org/10.1016/j.imavis.2015.04.003
Niblack, W.: An Introduction to Digital Image Processing. Prentice Hall, Englewood Cliffs (1986)
Novey, M., Adali, T., Roy, A.: A complex Generalized Gaussian Distribution - characterization, generation, and estimation. IEEE Trans. Signal Process. 58(3), 1427–1433 (2010). https://doi.org/10.1109/TSP.2009.2036049
Ntirogiannis, K., Gatos, B., Pratikakis, I.: Performance evaluation methodology for historical document image binarization. IEEE Trans. Image Process. 22(2), 595–609 (2013). https://doi.org/10.1109/TIP.2012.2219550
Okarma, K., Lech, P.: Monte Carlo based algorithm for fast preliminary video analysis. In: Bubak, M., van Albada, G.D., Dongarra, J., Sloot, P.M.A. (eds.) ICCS 2008. LNCS, vol. 5101, pp. 790–799. Springer, Heidelberg (2008). https://doi.org/10.1007/978-3-540-69384-0_84
Okarma, K., Lech, P.: Fast statistical image binarization of colour images for the recognition of the QR codes. Elektronika Ir Elektrotechnika 21(3), 58–61 (2015). https://doi.org/10.5755/j01.eee.21.3.10397
Olver, F.W.J.: Asymptotics and Special Functions. Academic Press, New York (1974)
Otsu, N.: A threshold selection method from gray-level histograms. IEEE Trans. Syst. Man Cybern. 9(1), 62–66 (1979). https://doi.org/10.1109/TSMC.1979.4310076
Pascal, F., Bombrun, L., Tourneret, J.Y., Berthoumieu, Y.: Parameter estimation for multivariate Generalized Gaussian Distributions. IEEE Trans. Signal Process. 61(23), 5960–5971 (2013). https://doi.org/10.1109/TSP.2013.2282909
Pratikakis, I., Zagori, K., Kaddas, P., Gatos, B.: ICFHR 2018 competition on handwritten document image binarization (H-DIBCO 2018). In: 16th International Conference on Frontiers in Handwriting Recognition (ICFHR), pp. 489–493, August 2018. https://doi.org/10.1109/ICFHR-2018.2018.00091
Roenko, A.A., Lukin, V.V., Djurović, I., Simeunović, M.: Estimation of parameters for generalized Gaussian distribution. In: 6th International Symposium on Communications, Control and Signal Processing (ISCCSP), pp. 376–379, May 2014. https://doi.org/10.1109/ISCCSP.2014.6877892
Sauvola, J., Pietikäinen, M.: Adaptive document image binarization. Pattern Recogn. 33(2), 225–236 (2000). https://doi.org/10.1016/S0031-3203(99)00055-2
Saxena, L.P.: Niblack’s binarization method and its modifications to real-time applications: a review. Artif. Intell. Rev. 1–33 (2017). https://doi.org/10.1007/s10462-017-9574-2
Tensmeyer, C., Martinez, T.: Document image binarization with fully convolutional neural networks. In: 14th IAPR International Conference on Document Analysis and Recognition, ICDAR 2017, Kyoto, Japan, 9–15 November 2017, pp. 99–104. IEEE (2017). https://doi.org/10.1109/ICDAR.2017.25
Wang, C.: Research of image segmentation algorithm based on wavelet transform. In: IEEE International Conference on Computer and Communications (ICCC), pp. 156–160, October 2015. https://doi.org/10.1109/CompComm.2015.7387559
Wang, R., Li, R., Sun, H.: Haze removal based on multiple scattering model with superpixel algorithm. Signal Process. 127, 24–36 (2016). https://doi.org/10.1016/j.sigpro.2016.02.003
Wolf, C., Jolion, J.M.: Extraction and recognition of artificial text in multimedia documents. Formal Pattern Anal. Appl. 6(4), 309–326 (2004). https://doi.org/10.1007/s10044-003-0197-7
Yu, S., Zhang, A., Li, H.: A review of estimating the shape parameter of generalized Gaussian distribution. J. Comput. Inf. Syst. 21(8), 9055–9064 (2012)
Zhang, Y., Wu, J., Xie, X., Li, L., Shi, G.: Blind image quality assessment with improved natural scene statistics model. Digit. Signal Process. 57, 56–65 (2016). https://doi.org/10.1016/j.dsp.2016.05.012
© 2019 Springer Nature Switzerland AG

Krupiński, R., Lech, P., Tecław, M., Okarma, K. (2019). Binarization of Degraded Document Images with Generalized Gaussian Distribution. In: Rodrigues, J., et al. (eds.) Computational Science – ICCS 2019. Lecture Notes in Computer Science, vol. 11540. Springer, Cham. https://doi.org/10.1007/978-3-030-22750-0_14