In all the experiments, we used the following settings for the two ADMM algorithms that perform image and blur estimation: (i) the image estimate is computed with 20 iterations, initialized with the estimate from the previous outer iteration, \(\mathbf{d}_0 = 0\), and \(\lambda\) hand-tuned for the best visual results or the best ISNR on synthetic data, where ISNR is the improvement in SNR [5], \(\text{ISNR} = 10\log_{10}\left(\|\mathbf{x}-\mathbf{y}\|^2/\|\mathbf{x}-\hat{\mathbf{x}}\|^2\right)\); (ii) the blur estimate is obtained with 4 iterations when using the weak prior (explained in Sect. 3.2.1) and 10 iterations when using the sparsity prior (explained in Sect. 3.2.2), initialized with the blur estimate from the previous outer iteration, \(\mathbf{d}_0 = 0\), and \(\gamma\) (with the \(\ell_1\) regularizer) hand-tuned for the best results. The default values of the regularization parameters are \(\lambda = 0.08\) (with \(\rho = \lambda\) for the image-estimate ADMM) and \(\gamma = 0.05\) (with \(\rho = 0.01\) for the blur-estimate ADMM).
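For concreteness, the ISNR defined above can be computed directly from the clean image \(\mathbf{x}\), the degraded observation \(\mathbf{y}\), and the estimate \(\hat{\mathbf{x}}\); a minimal sketch in Python (NumPy), with array names chosen for illustration:

```python
import numpy as np

def isnr(x, y, x_hat):
    """ISNR = 10 log10(||x - y||^2 / ||x - x_hat||^2) in dB, where x is the
    clean image, y the blurred/noisy observation, and x_hat the estimate
    (all arrays of the same shape)."""
    return 10 * np.log10(np.sum((x - y) ** 2) / np.sum((x - x_hat) ** 2))
```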
Furthermore, we use three different datasets to perform the experiments and to train the GMMs and/or dictionaries:
- a dataset with 10 text images, available from Luo et al. [29] (one for testing and nine for training),
- a dataset with 100 face images from the same source as the text dataset (10 for testing and 90 for training),
- a dataset with 128 fingerprints from the publicly available UPEK database.
The GMM-based prior is obtained using patches of size \(6 \times 6\) pixels and a 20-component mixture. The dictionary-based prior uses patches of the same size, with the number of dictionary atoms set to 1000 and 15 iterations of the K-SVD algorithm. In all the experiments, the number of outer iterations is set to 100.
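As a rough illustration of how such a patch prior can be trained, the following sketch fits a 20-component GMM to \(6 \times 6\) patches using scikit-learn; the library choice, the patch stride, and the DC removal are our assumptions, not details specified above:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def extract_patches(img, size=6, stride=2):
    """Collect (size x size) patches of a grayscale image as flat vectors."""
    H, W = img.shape
    return np.array([img[i:i + size, j:j + size].ravel()
                     for i in range(0, H - size + 1, stride)
                     for j in range(0, W - size + 1, stride)], dtype=float)

def train_gmm_prior(train_images, n_components=20):
    """Fit a GMM to patches pooled from the training images."""
    patches = np.vstack([extract_patches(im) for im in train_images])
    patches -= patches.mean(axis=1, keepdims=True)  # remove patch mean (DC); an assumption
    gmm = GaussianMixture(n_components=n_components, covariance_type='full')
    return gmm.fit(patches)
```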
Table 1 ISNR values (in dB) obtained by four algorithms: Almeida et al. [5], Krishnan et al. [23], PlugBM3D, and PlugGMM

We compare our results with several state-of-the-art methods for natural images: Almeida et al. [5]; Krishnan et al. [23]; Xu et al. [48]; Xu et al. [49]; Pan et al. [37]. Almeida et al. tackle the realistic case of blind deblurring with unknown boundary conditions by using edge detectors to preserve important edges in the image. Krishnan et al. [23] use an image regularizer (the ratio of the \(\ell_1\) norm to the \(\ell_2\) norm of the high frequencies of an image) that favors sharp images over blurry ones. Xu et al. [48, 49] propose an \(\ell_0\)-based sparse representation of an intermediate image used to estimate the blurring kernel. Pan et al. [37] exploit the fact that the dark channel (the smallest values in a local neighborhood) of a blurred image is less sparse than that of a sharp one. Additionally, we compare our results with the following methods tailored for text images: Cho et al. [12] and Pan et al. [36]. Cho et al. [12] rely on specific properties of text images, while Pan et al. [36] use an \(\ell_0\)-based prior on the intensity and gradient for text image deblurring. Note that these text deblurring methods are not designed for images corrupted with strong or unknown noise.
We use several instances of the proposed framework: PlugBM3D refers to the proposed algorithm with the generic denoiser explained in Sect. 2.5; PlugGMM uses the class-specific GMM-based denoiser (Sect. 2.3); PlugDictionary uses a class-specific dictionary-based denoiser (Sect. 2.4) suitable for images that contain one or two classes. We will mention when PlugDictionary is used with the classification step.
Results: one class
For images that contain one class (e.g., text, faces, fingerprints), we performed several experiments with different types of blur and different noise levels. To show that the proposed method handles various types of blurring filters, we created 10 test images of text and faces (five of each), using one clean text image and one clean face image, \(11 \times 11\) synthetic kernels representing Gaussian, linear motion, out-of-focus, uniform, and nonlinear motion blur, as shown in Fig. 1, and noise levels corresponding to blurred signal-to-noise ratios (BSNR) of 30 and 40 dB (Table 1). We compared our results with two generic methods, by Almeida et al. [5] and Krishnan et al. [23], and tested two versions of the proposed algorithm: PlugBM3D and PlugGMM. The results in Table 1 show that our method outperforms state-of-the-art methods for generic images when tested on images that belong to a specific class (text and faces). Moreover, slightly better results are achieved with the class-specific denoiser plugged into the algorithm (PlugGMM) than with the generic one (PlugBM3D). Note that the algorithm of Almeida et al. [5] is designed for a wide variety of blur filters, whereas that of Krishnan et al. [23] is designed mostly for motion blur.
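To make the degradation model explicit, such synthetic test images can be generated as sketched below; this assumes the common definition \(\text{BSNR} = 10\log_{10}(\text{var}(\mathbf{H}\mathbf{x})/\sigma^2)\), which the text does not spell out:

```python
import numpy as np
from scipy.signal import fftconvolve

def degrade(x, kernel, bsnr_db):
    """Blur a clean image with the given kernel and add white Gaussian
    noise whose variance is set to reach the target BSNR (in dB)."""
    blurred = fftconvolve(x, kernel, mode='same')
    sigma2 = np.var(blurred) / 10 ** (bsnr_db / 10)
    return blurred + np.sqrt(sigma2) * np.random.randn(*blurred.shape)
```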
Furthermore, to show that the proposed method can handle text images corrupted with strong noise, we created three test images of text corrupted with motion blur number 2 from [25] and three noise levels (\(\hbox {BSNR} = 40, 20\), and 10 dB). Our results, presented in Fig. 7, are compared with the state-of-the-art method for text images of Pan et al. [36]; again, we use two versions of the proposed method (PlugBM3D and PlugGMM). In these experiments, we use the \(\ell_1\) prior on the blurring filter to promote sparsity (as explained in Sect. 3.2.2). The results show that both versions of the proposed method can handle text images corrupted with different levels of noise, with slightly better ISNR achieved by the class-adapted PlugGMM. The method of Pan et al., originally designed for noise-free images, does not perform well on these test images even with weak noise (\(\hbox {BSNR} = 40~\hbox {dB}\)).
The robustness of the proposed method to noise is shown in Fig. 8 (upper row). We created several experiments with two types of text images (typed and handwritten), corrupted with four different kernels (kernels 1 to 4 from [25]) and six noise levels corresponding to \(\hbox {BSNR} = 50, 40, 30, 20, 15\), and 10 dB. We measure the ISNR after 50 iterations of the PlugGMM method. The proposed method performs stably for noise levels above 20 dB, reasonably well for \(\hbox {BSNR} = 15~\hbox {dB}\), and fails at the highest noise level, \(\hbox {BSNR} = 10~\hbox {dB}\).
Additionally, the proposed method is tested in the challenging case of a blurred fingerprint image. We chose fingerprint images for two reasons: (i) although rare, fingerprints can be found in old documents as a means of identification; (ii) an image containing fingerprints has statistics markedly different from those of natural images, making it a highly interesting testing ground. The experiment uses the simplest case of motion blur, linear motion blur, and weak noise (\(\hbox {BSNR} = 40~\hbox {dB}\)). The results of two versions of our algorithm (PlugBM3D and PlugGMM) are compared with the methods of Almeida et al. [5] and Krishnan et al. [23], designed for generic images (Fig. 9). The results show that, due to the specific structure of fingerprint images, algorithms designed for generic images perform poorly. PlugBM3D manages to estimate the blurring filter closely; however, the resulting image still contains blurred regions (upper left part), while the proposed algorithm with a class-adapted image prior (PlugGMM) produces the best result, both visually and in terms of ISNR.
Next, we tested the proposed algorithm with the dictionary-based prior (PlugDictionary) on a synthetic blurred text image from [12] (Fig. 10). The results show that the method of Almeida et al. [5] for generic images performs well, introducing small artifacts, similarly to the method of Cho et al. [12], which is specifically designed for text images. The good performance of the former method is most likely due to the fact that it uses (Sobel-type) edge detectors to preserve important edges in the image. The generic method of Krishnan et al. [23] introduces strong artifacts in the reconstructed image, and the method of Pan et al. [36], designed for text images only, performs as well as the proposed method, PlugDictionary, which is constructed for different image classes.
Figure 11 shows a challenging case of blurred text with narrow spaces between letters (the test image was introduced in [12]). The results are compared with four generic methods: Xu et al. [49], Almeida et al. [5], Krishnan et al. [23], and Pan et al. [37]. All of these methods perform poorly on this image. Pan et al. [36] show good visual results, although the letter edges remain slightly blurry. Our method, PlugGMM, gives the sharpest visual result.
To show that the method, when used on text images, is able to estimate a blurring kernel with a large support (\(69 \times 69\) pixels), we perform several experiments and compare our results with the state-of-the-art method of Pan et al. [36]. Here, we used PlugBM3D to show that good performance can be achieved even with this generic denoiser (Fig. 12).
Results: two or more classes
To test the proposed method on images that contain two classes, we created an image with a face and typed text, corrupted by motion blur and different noise levels. The main reasons to choose text and face images are, first, that these two classes are very common in some applications (e.g., identification documents in document analysis and forensics) and, second, that they have completely different structures (namely, sharp transitions in text versus smooth and textured areas in faces), making them a natural testing ground for the proposed technique. Figure 13 shows the results obtained with this synthetic document image, corrupted by motion blur number 1 from [25] and weak noise (\(\hbox {BSNR} = 40~\hbox {dB}\)). Methods tailored for natural images [23, 48, 49] produce strong ringing artifacts and lose details in the part of the image that contains the face (e.g., the necklace). The method of Pan et al. [36], constructed only for text images, performs reasonably well in the part of the image that contains the face, but gives a slightly spread kernel estimate and introduces artifacts in the part of the image that contains text. We used PlugDictionary with a sparsity-based kernel prior (Sect. 3.2.2) and the direct approach explained in Sect. 3.3.1. The result shows that the proposed method outperforms state-of-the-art BID methods for generic images [23, 37, 48, 49] or text images [36].
Furthermore, the proposed method is tested on images that contain two classes (e.g., text and face) with the inclusion of the classification step, instead of the direct approach. The results in terms of ISNR are presented in Table 2. We used two images that contain text and a face (gray scale and RGB), blurred with motion blur number 2 from [25] and corrupted with noise at three levels, corresponding to \(\hbox {BSNR} = 40\), 30, and 20 dB. We compare the proposed method with the direct approach against the proposed method with the classification step using kNN or SVM. The results show that the classification step improves the performance on images containing two classes. Additionally, for images with higher noise levels (e.g., \(\hbox {BSNR} = 20~\hbox {dB}\)), significant improvement can in some cases be achieved by proper selection of the classifier and by including noisy patches (images) in the training data.
Table 2 Results in terms of ISNR obtained on images that contain two classes: (i) \(\hbox {text} + \hbox {face}\) (gray scale image), (ii) \(\hbox {text} + \hbox {face}\) (RGB)

As mentioned above, the main focus of this work is the BID performance, not the accuracy of the patch classifier. Still, it is interesting to examine the behavior of the two classifiers (kNN and SVM), as this reveals potential for improving the proposed framework. Figure 14 shows the results of BID with a classification step on the color image with text and a face. Before the classification step, the color image is converted to gray scale. In this example, the segmented image clearly shows a significant classification error in the upper region (blue stripe): patches that should be classified as text are classified as face. There are at least two possible explanations: (1) both classifiers used in this work are trained only on text patches containing black text on a white background; (2) the patch size (\(6 \times 6\) pixels) is small compared with the image size (\(532 \times 345\) pixels). The gray areas in the segmented image arise because, in this experiment, instead of classifying patches into two classes (text and face), we use three classes (text, face, and other) to achieve a more realistic classification. Although the classification errors do not significantly influence the final deblurring result in this example, they show that there is room for improving the patch classifier. Finally, we show how the noise level influences the segmentation result. Figure 15 shows the results of the patch classification step performed with the two classifiers (kNN and SVM) on two images corrupted with the same blur kernel and different noise levels, \(\hbox {BSNR} = 40\) and 10 dB. These results show that the SVM classifier performs slightly better than the kNN classifier in both cases, although the same classification error is still present in the upper part of the image. We can also see that the classifiers do not perform much worse under strong noise (\(\hbox {BSNR} = 10~\hbox {dB}\)), probably because both classifiers are trained with patches corrupted with different noise levels.
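For illustration, the classification step described above could be implemented along the following lines with scikit-learn; the classifier hyperparameters are our assumptions, and extract_patches is the patch extractor sketched earlier:

```python
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

def train_patch_classifier(train_patches, train_labels, kind='svm'):
    """Train a patch classifier for the three classes (text, face, other);
    the hyperparameters here are illustrative, not those used in the paper."""
    clf = SVC(kernel='rbf') if kind == 'svm' else KNeighborsClassifier(n_neighbors=5)
    clf.fit(train_patches, train_labels)
    return clf

# During deblurring, each patch of the current image estimate is labeled
# and then denoised with the prior of its predicted class, e.g.:
#   labels = clf.predict(extract_patches(x_hat))
```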
Results: real blurred images
To experiment with real data, we first use a \(248 \times 521\) image, acquired with a handheld mobile phone, corrupted by motion blur and unknown noise (Fig. 16). Note that we use the term "unknown noise" when we know neither whether the noise present in an image is Gaussian nor its level. We chose an image acquired with a mobile phone in order to test the robustness of the method to an unknown type of noise. Before deblurring, to obtain high contrast, we preprocessed the image by computing its luminance component and setting all pixels with values above 0.8 to 1 (pixel intensities range from 0 to 1). The preprocessed image is used as input for all the methods except that of Pan et al. [37], because that method uses properties of the image dark channel, which can be disturbed by the preprocessing. We compared our results with four state-of-the-art methods: three designed for natural images [5, 37, 48] and one for text images [36]. The results show that the method of Almeida et al. [5] is able to estimate a reasonable blurring kernel, but the estimated image contains strong artifacts and noise. The method of Xu et al. [48] gives reasonably good results, although the estimated image still contains noise. The method of Pan et al. [37], when applied to the image without preprocessing, gives a good result, although some regions remain blurred. The method of Pan et al. [36] is not able to deal with unknown noise. Our method, PlugBM3D, provides the best visual result.
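The preprocessing step just described amounts to the following; the exact luminance weights are not given in the text, so the standard ITU-R BT.601 combination is assumed here:

```python
import numpy as np

def preprocess(rgb):
    """Compute the luminance of an RGB image with intensities in [0, 1]
    and clip bright pixels: values above 0.8 are set to 1."""
    lum = 0.299 * rgb[..., 0] + 0.587 * rgb[..., 1] + 0.114 * rgb[..., 2]
    return np.where(lum > 0.8, 1.0, lum)
```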
To emphasize the influence of unknown noise on the deblurring process, Fig. 17 shows the results obtained with a zoomed version of the real blurred image (the image is zoomed before deblurring, which introduces new artifacts). As before, the methods of Pan et al. [36] and Xu et al. [49] are not able to deal with unknown noise. The methods of Almeida et al. [5] and Pan et al. [37] estimate the blurring kernel realistically, but leave some parts of the image blurred. Again, our method gives the best visual results. In the experiments presented in Figs. 16 and 17, we used PlugBM3D, as the BM3D denoiser seems to be better suited to images corrupted with noise that is not necessarily Gaussian.
Figure 18 shows the performance of the proposed method on a real blurred text image from [12]. Here, we use PlugGMM with a sparsity-based prior on the blurring kernel. The results are compared with three generic methods, by Xu et al. [48], Xu et al. [49], and Pan et al. [37], and with the text deblurring method of Pan et al. [36]. All methods perform reasonably well, although some of them introduce ringing artifacts.
The performance on real blurred document images that contain text and a face is shown in Figs. 19 and 20. We use real blurred images corrupted with two types of blur, motion and out-of-focus, and compare our results with a method tailored to natural images [23] and a method tailored to text images [36]. In addition to motion blur, the image in Fig. 19 contains some saturated pixels, which influences the performance of all the evaluated methods: Krishnan et al. [23], Pan et al. [36], and PlugDictionary. Here, we use PlugDictionary with the classification step performed by an SVM, as explained in Sect. 3.3.2, and a weak prior on the blurring filter. In the case of motion blur, the method of Krishnan et al., tailored to natural images, introduces strong blocking artifacts, especially visible in the face region, while our method better recovers the part of the image with the numbers. For out-of-focus blur (Fig. 20), PlugDictionary yields slightly sharper details in the face region. All three methods fail to deblur the small letters. Note that the images used in these experiments are very challenging for several reasons: a rich background, parts with very different statistics, and saturated pixels. We therefore compare our results with only one alternative method tailored to natural images and one focused on text images.
Finally, Fig. 21 shows experiments performed on a real blurred document (magazine) image that contains text and a face, acquired with a handheld mobile phone camera. As before, we used PlugDictionary with an SVM-based classification step and a weak prior on the blurring kernel. The results are compared with four methods tailored to natural images, Krishnan et al. [23], Xu et al. [48, 49], and Pan et al. [37], and with the BID method for text images of Pan et al. [36]. All methods perform reasonably well in the part of the image containing the face, except that the methods of Pan et al. [36, 37] over-smooth it. In the part of the image containing text, all methods introduce ringing artifacts to some degree, with the proposed method arguably affecting readability the least. We recommend zooming into the figure to clearly appreciate the differences.
Note that in Figs. 19, 20, and 21, we do not present the estimated blurring kernels, because the images are only mildly blurred and the estimated kernels differ only slightly.
OCR results
One of the main reasons to perform text deblurring is to improve OCR (optical character recognition) accuracy. As OCR software typically uses a language model to improve recognition, it needs continuous text as input. To evaluate OCR accuracy, we use five blurred text images from [22] (images 1–5 in Table 3), two images that contain text and a face corrupted with different noise levels (images 6–7), and a real blurred text image (image 8). We assess OCR quality with three measures: average word confidence (AWC), word error rate (WER), and character error rate (CER); all three measures range from 0 to 1, with 1 being the best value for AWC and 0 the best value for WER and CER. Table 3 shows the results of the OCR tests on the clean and blurred images and on the images estimated by the method of Pan et al. [36] and by the proposed method. OCR is performed on the clean image (when available) as a reference.
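WER and CER are standard edit-distance measures; a minimal sketch of how they can be computed follows (the AWC is reported by the OCR engine itself, and the normalization by reference length is an assumption, as the text does not define it explicitly):

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (dynamic programming)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[-1] + 1,              # insertion
                           prev[j - 1] + (r != h)))  # substitution
        prev = cur
    return prev[-1]

def wer(ref_text, ocr_text):
    """Word error rate: word-level edit distance over reference word count."""
    ref = ref_text.split()
    return edit_distance(ref, ocr_text.split()) / len(ref)

def cer(ref_text, ocr_text):
    """Character error rate: character-level edit distance over reference length."""
    return edit_distance(ref_text, ocr_text) / len(ref_text)
```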
Table 3 OCR results obtained on clean, blurred, and estimated images from the method of Pan et al. [36] and the proposed method, in terms of three error measures: AWC (average word confidence), WER (word error rate), CER (character error rate)

Furthermore, to give more insight into the performance of OCR, Figs. 22 and 23 show the results obtained on two images blurred with different intensities. Figure 22, which corresponds to image 1 in Table 3, shows the results obtained on a slightly blurred text image. The three error measures for the method of Pan et al. [36] and the proposed method are comparable; since this image is corrupted by very mild blur, the results show that it is best to perform OCR on the blurred image itself. Figure 23, which corresponds to image 2 in Table 3, tells a different story. Here, OCR on the blurred image yields no "result" at all; it gives a very bad result if the input is the image estimated by the method of Pan et al. [36], and a reasonable result if the input is the image estimated by the proposed method.
The results show that, in most of the experiments, the image estimated by the proposed method is able to improve OCR accuracy. When OCR is not possible on the blurred image (images 2, 4, and 5), the proposed method slightly improves the results, and in some cases (images 6, 7, and 8), the improvement is significant.
Regularization parameters
One of the main challenges of the proposed framework is setting the regularization parameter associated with the image prior. This parameter is influenced by the type of image, the blurring kernel, and the noise level. The bottom row of Fig. 8 shows the chosen regularization parameter \(\lambda\) as a function of the noise level for text images. We can see that for a very high noise level (\(\hbox {BSNR} = 10~\hbox {dB}\)), higher values of \(\lambda\) should be used, but also that the best value depends on the blurring kernel (shape and size).
Figure 24 shows the behavior of the final ISNR (after 50 iterations of the PlugGMM algorithm) as a function of the parameter \(\lambda\) for a text image corrupted with four different blurring kernels (the first four kernels from [25]). We can see how the choice of the regularization parameter influences the result. For some kernels (kernels 1 and 2), it is relatively safe to choose a sufficiently large parameter, but this is not the case for, e.g., kernel 4, where only one value of \(\lambda\) yields the maximum ISNR.
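A curve such as the one in Fig. 24 can be reproduced by a simple sweep over \(\lambda\); in this sketch, deblur is a hypothetical wrapper around one full run of the deblurring algorithm (e.g., 50 iterations of PlugGMM), and isnr is the helper defined earlier:

```python
import numpy as np

def sweep_lambda(x, y, deblur, lambdas=np.logspace(-2, 0, 15), n_iter=50):
    """Run the deblurring routine for each candidate lambda and return
    the value that gives the best final ISNR, together with all scores.
    `deblur(y, lam, n_iter)` is a hypothetical interface, not the paper's."""
    scores = [isnr(x, y, deblur(y, lam, n_iter)) for lam in lambdas]
    return lambdas[int(np.argmax(scores))], scores
```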