Introduction

In recent years, many vision-related problems have been addressed by integrating image processing and artificial intelligence. Blur is an undesirable but widespread visual phenomenon that heavily degrades the quality of images captured by a sensing system. Many factors, such as motion and defocus, can cause partial blur in images, as shown in Fig. 1. Blur detection aims to differentiate the blurry and sharp regions of a given image, which is a key task in many computer vision applications including, but not limited to, image restoration [1], image quality assessment [2], target detection [3], depth estimation [4], and image retargeting [5]. However, image blur usually has an uneven distribution and indistinct boundaries [6], and is easily mingled with noise in realistic sensing and transmission processes [7]. These factors make accurate blur detection difficult.

Fig. 1

Some partially blurred images. a, b are motion-blurred cases; c, d are defocus-blurred cases

Over the past decade, many blur detection techniques have been proposed; they can be roughly classified into unsupervised methods [6, 10, 11, 15,16,17,18,19,20,21,22] and supervised methods [9, 12, 13, 24,25,26,27,28,29,30, 38,39,40]. Both classes have made remarkable contributions to the field. Unsupervised methods generally have an explicit mechanism, good interpretability, and low complexity, but struggle to balance accuracy and efficiency [8]. Supervised methods are usually data driven and achieve high performance, but require a considerable number of manually annotated samples, which are often unavailable in realistic detection tasks. More importantly, most existing blur detection methods are susceptible to image noise, so their performance is greatly limited in noise-polluted cases.

To address these problems, we propose a noise-immune blur detection method based on sequency spectrum truncation. The contributions of our method are summarized as follows. (i) A fast pixel-wise blur metric is constructed in the sequency domain based on the Walsh transform. Because the Walsh basis function contains only +1 and −1 and can therefore be implemented with simple binary operations, our method has a very low computing cost. (ii) A noise-immune blur detection framework is proposed by adaptively truncating the sequency spectrum, based on the observation that noise preferentially occupies the high-sequency zones while blurry pixels are associated with low sequencies. (iii) Overall, the proposed method achieves superior performance together with good time efficiency. Especially in noise-polluted cases, our method remarkably outperforms state-of-the-art approaches.

Related works

Currently, research on blur detection can be mainly classified into two categories: unsupervised methods based on frequency/gradient analysis and supervised methods based on learning.

Unsupervised methods

Unsupervised blur detection methods are usually based on a clear mechanism. The most frequently used one is that a sharp region has larger gradient magnitudes or more high-frequency components than a blurry region. Lee et al. [11] developed a technique to divide the input image into non-blur, defocus-blur, and motion-blur regions based on gradient amplitude and directional coherence. Tang et al. [15] utilized the log-averaged spectrum residual to obtain a coarse blur map and refined the result from coarse to fine based on the gradient and color similarity of neighboring pixels. Zhang et al. [16] estimated the blur map by exploiting edge information and K-nearest-neighbors matting interpolation. Yi et al. [17] proposed a fast blur detection framework integrating local binary patterns, image matting, and multi-scale inference. Xu et al. [19] used the maximum rank of the local image patch in the gradient domain to estimate spatially varying defocus blur. In [22], Liu et al. united region-based frequency information and edge-based linear information to estimate the defocus blur map via regression tree fields.

Some blur detection methods based on other mechanisms are also effective. Su et al. [10] distinguished the blur region by combining pixel-wise singular values and alpha-channel information. Golestaneh et al. [6] identified the blur region by multi-scale fusion, sorting, and regularization of high-order DCT coefficients in the gradient domain. In a previous work [20], we employed a re-blur scheme and DCT coefficients to detect image blur. Javaran et al. [18] developed another DCT-based blur metric using a workflow similar to ours. Xiao et al. [21] detected defocus blur based on the decomposition and fusion of multi-scale singular values.

Supervised methods

Supervised blur detection methods are usually data driven and depend on training a classifier. Liu et al. [9] trained a Bayesian bi-classifier based on gradient, spectrum, and color information to identify the blur region. Shi et al. combined gradient distributions, Fourier frequencies, and local filters to distinguish the blur region using a naive Bayesian classifier in [12], and detected just-noticeable defocus blur by sparse edge representation and blurriness estimation in [13].

Recently, methods based on deep learning have attracted much attention. Park et al. [24] designed a convolutional neural network-based blur feature extractor and a fully connected neural network-based classifier to detect defocus blur. Huang et al. [25] built a six-layer network to predict pixel-wise blur probability and fused multi-scale blur probability maps into a fine result. Zhao et al. [26, 30] proposed a defocus blur detection method based on a fully convolutional network with multi-stream input and a bottom–top–bottom framework. In [27], Zeng et al. developed a local blur metric composed of deep feature learning and principal component analysis at the super-pixel level. In another previous work [28], we built a deep encoder–decoder network equipped with multiple inputs and multiple losses to realize end-to-end blur detection. Tang et al. [29] proposed a fully convolutional neural network named DeFusionNet, which recurrently fuses and refines multi-scale deep features for defocus blur detection. DeFusionNet was later improved by embedding a feature adaptation module and a channel attention module to better exploit discriminative multi-scale features [38]. Tang et al. [39] also developed a deep network for efficient and accurate defocus blur detection via recurrently refining multi-layer residual features. Li et al. [40] designed a dual-branch attention-based network capable of exploiting the complementary information between in-focus and out-of-focus pixels and combining high-level and low-level features.

Summary

The above-mentioned studies have made remarkable contributions to the field of blur detection over the past decade. Unsupervised methods are interpretable, mechanistically grounded, and of low complexity. Deep learning-based methods can automatically extract suitable features to distinguish blurry pixels in an image, and usually achieve high performance when driven by a considerable amount of data. However, some limitations remain: (i) most methods find it difficult to achieve a cost–benefit balance; in other words, improving detection performance often comes at the expense of time efficiency; (ii) deep learning-based methods need numerous annotated samples and substantial computational power for training and inference, which is laborious and expensive; (iii) most existing blur detection algorithms are oversensitive to image noise, which greatly limits their performance on noisy cases. Some techniques, such as pre-denoising and anti-noise training, can help mitigate this noise sensitivity, but these palliative measures introduce extra costs. Therefore, to address these issues, we propose a fast yet accurate blur detection method based on sequency spectrum truncation. It performs well in both noise-free and noise-polluted cases.

Proposed algorithm

The framework of the proposed method is shown in Fig. 2. First, the input image is artificially re-blurred using a kernel function. After the re-blur process, the sharp region loses numerous details, whereas the loss of detail in the blurry region is far smaller. The re-blur scheme has proven effective for inspecting unknown blur in our previous work [20] and other literature [18]. Second, the pixel-wise blurriness (i.e., the blur map) is measured by quantifying the difference between the input image and its re-blurred version in the sequency domain, which is derived by the Walsh transform. Here, an adaptive sequency spectrum truncation strategy is used to eliminate noise interference so that noise-polluted cases can be handled. Finally, a multi-scale fusion segmentation framework is designed to extract the accurate blur region based on clustering-guided region growth.

Fig. 2

The framework of the proposed blur detection method

Blur metric based on Walsh transform

According to Harmuth’s sequency theory [14], the sequency is defined as half of the average number of zero crossings per second of a function. Compared with frequency, sequency is a more general concept capable of representing both periodic and nonperiodic functions. In this paper, we convert the input image into the sequency domain using the Walsh transform (an orthogonal transform) and thereby analyze the loss of image detail during the re-blur process. Unlike the Fourier and cosine transforms, the basis of the Walsh transform is not sinusoidal but the Walsh function, which takes only the two values +1 and −1 in a certain sequence [35] and can hence be implemented directly in binary arithmetic; in other words, the Walsh transform has a very low computing cost. For a patch w ∈ ℝN×N in the input image I ∈ ℝM×M, the discrete Walsh transform is:

$$ {\text{W}}_{h} ({\text{w}}) = {\text{W}}_{h} (u,v) = \frac{1}{{N^{2} }}\sum\limits_{x = 0}^{N - 1} {\sum\limits_{y = 0}^{N - 1} {{\text{w}}(x,y)} } \left[ {\left( { - 1} \right)^{{\phi \left( {x,y,u,v} \right)}} } \right], $$
(1)

where Wh ∈ ℝN×N is the sequency spectrum, (x, y) indexes a pixel in w, and (u, v) indexes an element of Wh. (−1)ϕ(x, y, u, v) is the Walsh basis function, whose sequence of +1 and −1 values is determined by ϕ(x, y, u, v). Here, we use the Hadamard ordering:

$$ \phi (x,y,u,v) = \sum\limits_{i = 0}^{n - 1} {[b_{i} (x)b_{i} (u)]} + \sum\limits_{j = 0}^{n - 1} {[b_{j} (y)b_{j} (v)]} , $$
(2)

where bi(x) is the ith value (either 0 or 1) of the reverse natural binary code of x, and N = 2^n. Accordingly, the Hadamard-ordered transform matrix can be built recursively from the order-2 matrix up to order N = 2^n:

$$ [H_{2} ] = \frac{1}{\sqrt 2 }\left[ {\begin{array}{*{20}c} 1 & 1 \\ 1 & { - 1} \\ \end{array} } \right], \quad [H_{h} (n)] = [H_{2} ] \otimes [H_{h} (n - 1)], $$
(3)

where ⊗ denotes the Kronecker product. Consequently, Eq. (1) can be expressed as:

$$ {\text{W}}_{h} ({\text{w}}) = [H_{h} (n)] \times {\text{w}} \times [H_{h} (n)]^{T} . $$
(4)
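To make Eqs. (3) and (4) concrete, the following minimal Python sketch (assuming NumPy) builds the Hadamard-ordered matrix recursively and applies it to a patch. The function names are ours, not from the original MATLAB implementation.

```python
import numpy as np

def hadamard_matrix(n):
    """H_h(n): the 2^n x 2^n Hadamard-ordered Walsh matrix, built
    recursively via Kronecker products as in Eq. (3)."""
    H2 = np.array([[1.0, 1.0], [1.0, -1.0]]) / np.sqrt(2.0)
    H = H2
    for _ in range(n - 1):
        H = np.kron(H2, H)  # H_h(n) = H_2 (Kronecker) H_h(n - 1)
    return H

def walsh_transform(patch):
    """2-D Walsh transform of an N x N patch (N = 2^n), per Eq. (4):
    W_h(w) = H_h(n) * w * H_h(n)^T."""
    N = patch.shape[0]
    H = hadamard_matrix(int(np.log2(N)))
    return H @ patch @ H.T
```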

Consistent with other orthogonal transforms, the non-zero Wh coefficients in the sequency spectrum represent image information. Since a clear region loses far more non-zero coefficients during the re-blur process than a blurry region does, this loss can be used to estimate the blur level. We therefore design the norm descriptor ||Wh||L2 × ||Wh||L∞, in which the L2 norm and the L∞ norm characterize the global cumulant and the local maximum of the non-zero coefficients in Wh, respectively. This integrated descriptor, which accounts for both global and local effects, measures the amount of non-zero energy in the sequency spectrum. We thus define the blurriness of a pixel in the input image I as follows:

$$ \lambda ({\text{w}}) = \frac{{\left\| {{\text{W}}_{h} ({\text{w}}^{b} )} \right\|_{{L_{2} }} \times \left\| {{\text{W}}_{h} ({\text{w}}^{b} )} \right\|_{{L_{\infty } }} }}{{\left\| {{\text{W}}_{h} ({\text{w}})} \right\|_{{L_{2} }} \times \left\| {{\text{W}}_{h} ({\text{w}})} \right\|_{{L_{\infty } }} }}, $$
(5)

where w ∈ ℝN×N and wb ∈ ℝN×N denote the pixel-centered patches before and after the re-blur process, respectively. Evidently, the blurriness λ estimated by Eq. (5) lies within [0, 1]: the closer the value is to 1, the blurrier the pixel. After estimating the blurriness of all pixels, we obtain a blur map Λ for the entire input image. As shown in Fig. 3, with a 9 × 9 mean filter as the re-blur kernel, the blur region yields λ = 0.63–0.91 and the sharp region yields λ = 0.12–0.18. The estimated blur map Λ thus properly indicates the blur level of each pixel.
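Reusing walsh_transform from the sketch above, the metric of Eq. (5) can be computed per pixel as follows; SciPy's uniform_filter stands in for the 9 × 9 mean-filter re-blur kernel, and patch extraction is shown for a single location only.

```python
from scipy.ndimage import uniform_filter

def norm_descriptor(Wh):
    """||Wh||_L2 x ||Wh||_Linf: global cumulant times local maximum."""
    return np.linalg.norm(Wh) * np.abs(Wh).max()

def blurriness(patch, patch_reblurred):
    """lambda(w) of Eq. (5); values close to 1 indicate a blurry pixel."""
    return (norm_descriptor(walsh_transform(patch_reblurred)) /
            norm_descriptor(walsh_transform(patch)))

# Example for one pixel (y, x) with an N = 8 patch:
# img_b = uniform_filter(img, size=9)      # re-blurred image
# lam = blurriness(img[y-4:y+4, x-4:x+4], img_b[y-4:y+4, x-4:x+4])
```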

Fig. 3

The sketch of pixel-wise blur metric in sequency domain and the consequent blur map. The re-blur kernel we used here is the mean filter with a window size of 9 × 9

Noise-robust blur metric via sequency spectrum truncation

It is important to note that, if the input image is polluted by noise, the blur metric in Eq. (5) produces a much lower value, which may result in a notable blur estimation error. Figure 4 shows such a case: the estimated λ = 0.183 for the noise-polluted blurry image is much smaller than λ = 0.749 for the corresponding noise-free blurry image, so the noise-polluted blurry image would be misjudged as clear. The reason is that the noise components in the blurry image introduce many high-sequency Wh coefficients and accordingly increase the norm descriptor ||Wh||L2 × ||Wh||L∞. Once the noise-polluted blurry image is re-blurred, its high-sequency Wh coefficients, and with them the norm descriptor, are reduced significantly. The value calculated by Eq. (5) is therefore near zero, incorrectly indicating that the image is clear.

Fig. 4

Noise-induced blur estimation error. a A clear image with average blurriness λ = 0.172 calculated by Eq. (5); b the re-blurred version of a using a 9 × 9 mean filter, with average blurriness λ = 0.749; c the noise-polluted version of b obtained by adding Gaussian noise of σ = 5e−2, whose average blurriness is nevertheless low, λ = 0.183. The noise-polluted blurry image is thus misjudged as clear by Eq. (5)

To address the estimation error in noisy conditions, we propose a sequency spectrum truncation strategy. We assume that image noise acquired by optical sensors is additive Gaussian white noise. Since blur information is compressed into the low-sequency spectrum zone while noise preferentially concentrates in the high-sequency zones, the noise effect can be reduced by truncating and discarding the high-sequency Wh coefficients. Hence, the modified blur metric is defined as:

$$ \lambda^{l} ({\text{w}}) = \frac{{\left\| {{\text{W}}_{h} ({\text{w}}^{b} )} \right\|_{{L_{2} }}^{l} \times \left\| {{\text{W}}_{h} ({\text{w}}^{b} )} \right\|_{{L_{\infty } }}^{l} }}{{\left\| {{\text{W}}_{h} ({\text{w}})} \right\|_{{L_{2} }}^{l} \times \left\| {{\text{W}}_{h} ({\text{w}})} \right\|_{{L_{\infty } }}^{l} }}, $$
(6)

where l = αN denotes the truncation length (2 ≤ l ≤ N) in the sequency spectrum and α is the truncation proportion (0 < α ≤ 1). The truncation strategy is illustrated in Fig. 5: the Wh coefficients located in the low-sequency triangular zone are retained, while those in the high-sequency zones are truncated and discarded (replaced by 0). Evidently, the truncation proportion α should be adaptive to the noise intensity (denoted by the standard deviation σ). To find the relation between α and σ, we performed a series of experiments. First, five image samples from the CSIQ dataset [36] are chosen as original clear images. Then, we simulate their re-blurred versions with different blur kernels to obtain blurry images of different levels. Meanwhile, noise-polluted versions of these blurry images are created by adding Gaussian white noise of different intensities. For each noise-polluted blurry image, an appropriate α can then be found via:

$$ \alpha_{\text{opt}} = \mathop{\arg \min}\limits_{2/N \leq \alpha \leq 1} \left\{ \left| \lambda ({I}^{b}) - \lambda^{l} (I^{b +n})\right| \right\} $$
(7)

where Ib denotes the noise-free blurry image and Ib+n denotes the noise-polluted blurry image.
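A sketch of the truncation in Eq. (6) and the grid search in Eq. (7) is given below, reusing the functions defined earlier. The triangular retention zone is implemented as the indices with u + v < l, which is our reading of the "right-angled isosceles triangular zone" in Fig. 5 and assumes a sequency-ordered spectrum.

```python
def truncate_spectrum(Wh, alpha):
    """Keep the low-sequency triangular zone (u + v < l, l = alpha * N)
    and zero the rest, per Fig. 5 (the zone shape is our assumption)."""
    N = Wh.shape[0]
    l = max(2, int(round(alpha * N)))
    u, v = np.meshgrid(np.arange(N), np.arange(N), indexing="ij")
    return np.where(u + v < l, Wh, 0.0)

def blurriness_truncated(patch, patch_reblurred, alpha):
    """Noise-robust metric lambda^l(w) of Eq. (6)."""
    Wt  = truncate_spectrum(walsh_transform(patch), alpha)
    Wtb = truncate_spectrum(walsh_transform(patch_reblurred), alpha)
    return norm_descriptor(Wtb) / norm_descriptor(Wt)

def alpha_opt(lam_ref, noisy_blurriness, N=8):
    """Eq. (7): choose the alpha whose truncated average blurriness of the
    noisy image, noisy_blurriness(alpha), best matches lam_ref, the
    untruncated blurriness of the noise-free blurry image."""
    alphas = np.arange(2.0 / N, 1.0 + 1e-9, 1.0 / N)
    errors = [abs(lam_ref - noisy_blurriness(a)) for a in alphas]
    return float(alphas[int(np.argmin(errors))])
```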

Fig. 5

The schematic of the sequency spectrum truncation strategy. Low-sequency coefficients within a right-angled isosceles triangular zone are retained, while the middle- and high-sequency coefficients are truncated and discarded (replaced by 0) in the sequency spectrum. l = αN is the truncation length and α is an adaptive truncation proportion

Figure 6 shows some results of the above experiments. As observed, there is a fairly stable relation between the optimal truncation proportion αopt and the noise intensity σ under a given noise-to-signal ratio (NSR). It can therefore be conceived that αopt depends on both σ and NSR. To determine the appropriate value of αopt for an input image or patch, we must model the relation among α, σ, and NSR. This relation can be fitted by a radial basis function (RBF) network trained on multiple sets of {αopt, σ, NSR} acquired from the experiments. The trained RBF network (NRBF) is then used to estimate the αopt of a noise-polluted blurry image.
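The paper trains an RBF network on the {αopt, σ, NSR} triples; as a hedged stand-in, SciPy's RBFInterpolator can serve the same purpose, with the training arrays left as placeholders to be filled from the experiments.

```python
from scipy.interpolate import RBFInterpolator

def fit_rbf(sigma_nsr, alphas):
    """Fit the mapping (sigma, NSR) -> alpha_opt.
    sigma_nsr: (n_samples, 2) array of (sigma, NSR) pairs;
    alphas:    (n_samples,) array of alpha_opt values from Eq. (7).
    A Gaussian-kernel interpolator stands in for the paper's N_RBF."""
    return RBFInterpolator(sigma_nsr, alphas, kernel="gaussian", epsilon=1.0)
```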

Fig. 6

The relation between the optimal truncation proportion (αopt) and noise intensity (σ) for five image examples (from CSIQ dataset) at three noise levels (different NSR)

For a given image with unknown noise, we first employ the SOVNE algorithm [37] to estimate its noise intensity, i.e., the standard deviation σ. Subsequently, the noise-to-signal ratio is obtained as the standard-deviation ratio between the noise and the image: NSR = σ/σimage. Then, the optimal truncation proportion αopt is inferred via NRBF(σ, NSR).
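At inference time the pipeline is: estimate σ, compute NSR = σ/σimage, and query the fitted network. The SOVNE estimator [37] is not reproduced here; the median-absolute-deviation estimator below is a common stand-in and purely an assumption.

```python
def estimate_sigma(img):
    """Robust Gaussian-noise sigma estimate from the median absolute
    deviation of horizontal first differences (stand-in for SOVNE [37])."""
    d = np.diff(img, axis=1).ravel()
    return np.median(np.abs(d - np.median(d))) / (0.6745 * np.sqrt(2.0))

def infer_alpha(img, n_rbf):
    """alpha_opt = N_RBF(sigma, NSR), with NSR = sigma / sigma_image."""
    sigma = estimate_sigma(img)
    nsr = sigma / img.std()
    return float(n_rbf(np.array([[sigma, nsr]]))[0])
```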

We then recalculate the average blurriness of Fig. 4c (noise-polluted) using Eq. (6) and obtain λ = 0.744, which is quite close to that of the noise-free blurry image (λ = 0.749). This demonstrates that the truncation strategy can effectively eliminate the noise interference. On the other hand, we also use Eq. (6) to recalculate the average blurriness of Fig. 4a, b (noise-free). The new values of λ are 0.168 and 0.751, respectively, very close to the results (0.172 and 0.749) calculated by Eq. (5). This indicates that Eq. (6) with the adaptive truncation strategy is appropriate for both noise-polluted and noise-free cases. Therefore, in this paper, we use Eq. (6) to estimate the blurriness of each pixel on its local patch and obtain the blur map of the entire input image.

Blur segmentation

Besides pixel-wise blur estimation, another critical step in partial blur detection is blur segmentation, which divides the entire blur map Λ into blur and non-blur regions for further processing such as deblurring. Here, we propose a multi-scale fusion segmentation algorithm to accurately extract the blur region based on clustering-guided region growth. The flowchart is shown in Fig. 7.

Fig. 7

The proposed segmentation framework integrated by clustering-guided region growth and multi-scale fusion

Fast clustering

K-means clustering [34] is a classical unsupervised classification algorithm that gathers similar elements based on a distance criterion. Since the segmentation result is binary (blur/non-blur), the number of clusters k is set exactly to 2. The maximum blurriness λmax and minimum blurriness λmin in the blur map Λ serve as the two initial cluster centers. Hence, we can implement a very fast k-means clustering (k = 2), as presented in Algorithm 1, to obtain a clustered map C.

Algorithm 1 Fast k-means clustering (k = 2)
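Since the listing of Algorithm 1 appears only as an image in the original, the following is a minimal sketch of the fast two-center k-means it describes, with λmin and λmax as the initial centers.

```python
def fast_kmeans2(blur_map, tol=1e-4, max_iter=100):
    """Two-cluster k-means on the blur map Lambda (sketch of Algorithm 1).
    Returns the clustered map C: 1 = blur cluster, 0 = sharp cluster."""
    c0, c1 = float(blur_map.min()), float(blur_map.max())
    for _ in range(max_iter):
        blur_side = np.abs(blur_map - c1) <= np.abs(blur_map - c0)
        n0 = float(blur_map[~blur_side].mean()) if (~blur_side).any() else c0
        n1 = float(blur_map[blur_side].mean()) if blur_side.any() else c1
        if abs(n0 - c0) < tol and abs(n1 - c1) < tol:
            break
        c0, c1 = n0, n1
    return blur_side.astype(np.uint8)
```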

Multi-scale region growth

For the input image, we use Eq. (6) to estimate the blurriness of each pixel based on three local patches of scales N1 × N1, N2 × N2, and N3 × N3 (N1 < N2 < N3). Hence, we obtain three blur maps {Λ1, Λ2, Λ3} and three corresponding clustered maps {C1, C2, C3}. Then, as in Algorithm 2, we apply the region growth algorithm, guided by the clustered maps {C1, C2, C3}, to obtain three grown maps {G1, G2, G3}.

Algorithm 2 Clustering-guided region growth
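The listing of Algorithm 2 is likewise an image in the original; the sketch below is one plausible reading of clustering-guided region growth, in which the blur region is grown from the blurriest seed and admits 4-connected neighbors assigned to the blur cluster. The exact growth criterion in the paper is not reproduced, so this one is an assumption.

```python
from collections import deque

def region_grow(cluster_map, blur_map):
    """Grow the blur region G from the blurriest pixel, guided by the
    clustered map C (sketch of Algorithm 2)."""
    H, W = blur_map.shape
    grown = np.zeros((H, W), dtype=np.uint8)
    seed = np.unravel_index(int(np.argmax(blur_map)), blur_map.shape)
    grown[seed] = 1
    queue = deque([seed])
    while queue:
        y, x = queue.popleft()
        for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            ny, nx = y + dy, x + dx
            if 0 <= ny < H and 0 <= nx < W and not grown[ny, nx] \
                    and cluster_map[ny, nx] == 1:
                grown[ny, nx] = 1
                queue.append((ny, nx))
    return grown
```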

Multi-scale fusion

It is well known that the blur degree is sensitive to scale [26, 30], which is a genuinely tough problem in blur detection. To mitigate this scale sensitivity and improve the segmentation accuracy, a multi-scale fusion strategy is used here to obtain the final segmentation map S. Scale1 and scale2 are fused first. We compare the difference between G1 and G2 by d = G1 ⊕ G2, where ⊕ denotes the exclusive OR (XOR) operation.

For a pixel px,y, the value of d(px,y) is 1 if and only if G1(px,y) ≠ G2(px,y), which indicates that the class (blur or non-blur) of px,y differs between scale1 and scale2. In that case, we redetermine the class of px,y according to the majority class in the eight-neighborhood of this pixel:

$$ {\text{S}}_{1,2} (p_{x,y} ) = \mathop {\arg \max }\limits_{f = 0,1} \left[ {\sum\nolimits_{q \in {\text{n}}_{8 \leftarrow {\text{p}}} } {{\text{Ind}}({\text{G}}_{2} (q) = f)} } \right], $$
(8)

where n8←p denotes the eight-neighborhood of px,y and q denotes a neighbor therein. f = 0 and f = 1 denote the non-blur and blur classes, respectively. Ind(⋅) is an indicator function that takes 1 when G2(q) = f is true and 0 otherwise.

If G1(px,y) = G2(px,y), then d(px,y) = 0 and S1,2(px,y) = G2(px,y).

Therefore, the fusion result of G1 and G2 is:

$$ S_{1,2} (p_{x,y} ) = \left\{ {\begin{array}{l@{\quad}l} \mathop {\arg \max }\limits_{f = 0,1} \left[ {\sum\nolimits_{q \in n_{8 \leftarrow p} } {{\text{Ind}}(G_{2} (q) = f)} } \right], & d(p_{x,y} ) = 1 \\ G_{2} (p_{x,y} ), & d(p_{x,y} ) = 0 \\ \end{array} } \right.. $$
(9)

Accordingly, we fuse G3 with S1,2 in the same way to merge scale3 and obtain the final segmentation map S1,2,3.
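A direct transcription of Eqs. (8)–(9), applied first to (G1, G2) and then to the intermediate result and G3, might look as follows. Ties in the eight-neighborhood vote, which Eq. (8) leaves unspecified, are broken here toward the blur class.

```python
def fuse_two(G_a, G_b):
    """Fuse two grown maps per Eqs. (8)-(9): where they disagree
    (d = G_a XOR G_b = 1), take the majority label among the eight
    neighbors in G_b; where they agree, keep G_b."""
    H, W = G_b.shape
    S = G_b.copy()
    ys, xs = np.nonzero(np.logical_xor(G_a, G_b))
    for y, x in zip(ys, xs):
        nb = G_b[max(0, y - 1):y + 2, max(0, x - 1):x + 2].astype(int)
        blur_votes = int(nb.sum()) - int(G_b[y, x])   # neighbors labeled blur
        sharp_votes = nb.size - 1 - blur_votes        # neighbors labeled non-blur
        S[y, x] = 1 if blur_votes >= sharp_votes else 0
    return S

# Final segmentation map S_{1,2,3}:
# S123 = fuse_two(fuse_two(G1, G2), G3)
```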

Experimental results

In this section, we present experimental results of the proposed sequency-based blur detection method, along with comparisons against previous state-of-the-art methods. To evaluate the performance, two publicly available datasets are considered: CUHK [12] and DUT [26].

CUHK consists of 1000 partially blurred images captured in the real world, of which 704 are out-of-focus and 296 are motion-blurred. DUT contains 1100 challenging images, all of which are partially out-of-focus natural images. All samples in both datasets are noise-free and annotated with corresponding ground-truth maps.

On this basis, we add a certain amount of Gaussian noise, speckle noise, and impulsive noise to CUHK and DUT to simulate real noise pollution. The noise intensity (standard deviation) ranges from 0.01 to 0.5 in steps of 0.05, which generates the noise-polluted versions of CUHK and DUT (referred to as NCD below).
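The noise-polluted samples can be reproduced along the following lines; since the exact generator settings for NCD are not specified in the text, the snippet is purely illustrative.

```python
rng = np.random.default_rng(0)

def add_noise(img, kind, sigma):
    """Simulate the three pollution types on an image scaled to [0, 1];
    for impulse noise, sigma is interpreted as the corruption density."""
    if kind == "gaussian":
        out = img + rng.normal(0.0, sigma, img.shape)
    elif kind == "speckle":
        out = img * (1.0 + rng.normal(0.0, sigma, img.shape))
    elif kind == "impulse":
        out = img.copy()
        mask = rng.random(img.shape) < sigma
        out[mask] = rng.integers(0, 2, int(mask.sum())).astype(float)
    return np.clip(out, 0.0, 1.0)
```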

To objectively validate the proposed method, we choose four widely used indicators: p (precision), r (recall), Fβ (Fβ-measure), and MAE (mean absolute error). Precision measures the ability of a method to identify only blur regions, whereas recall measures its ability to identify all blur regions. The Fβ-measure is a comprehensive weighted combination of precision and recall; here we use β = 1, which gives precision and recall equal importance. MAE is an error indicator computed by comparing the final segmentation map S with the ground-truth map GT. For p, r, and F1, higher values indicate better performance; for MAE, the opposite holds. All indicators are defined in Table 1.

Table 1 Definitions of evaluation indicators
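Since Table 1 itself is not reproduced here, the standard definitions we assume for the four indicators are sketched below for binary maps.

```python
def evaluate(S, GT):
    """p, r, F1 (beta = 1), and MAE for a binary prediction S and ground
    truth GT; definitions assumed consistent with Table 1."""
    S, GT = S.astype(bool), GT.astype(bool)
    tp = float(np.logical_and(S, GT).sum())
    p = tp / max(float(S.sum()), 1.0)
    r = tp / max(float(GT.sum()), 1.0)
    f1 = 2.0 * p * r / max(p + r, 1e-12)
    mae = float(np.abs(S.astype(float) - GT.astype(float)).mean())
    return p, r, f1, mae
```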

Implementation

In our experiments, we use a 9 × 9 mean filter as the re-blur kernel for the sake of efficiency, and select N = 4, 8, and 16 as the three patch scales in pixel-wise blur estimation; among the various patch scales we experimented with, this combination achieved the best performance. For an arbitrary input image with unknown noise, the SOVNE algorithm [37] is employed to estimate the noise intensity, and the pre-trained RBF network NRBF determines the optimal truncation proportion αopt. We then estimate the blur map by Eq. (6) and produce the final segmentation map by Eq. (9). Our method is implemented in MATLAB with C-Mex compilation, and our experiments are performed on a workstation with a Core i7 3.50 GHz CPU and 16 GB RAM. Some exemplar blur maps and segmentation maps are shown in Fig. 8.

Fig. 8

Some exemplar blur maps and segmentation maps produced by the proposed method in noise-free and noise-polluted conditions, including defocus and motion blur

Comparisons

The first series of experiments compares the proposed method with state-of-the-art methods from the past five years, including JNDB [13], SSA [15], LBP [17], DCT [20], HiFST [6], DHDE [24], BDNet [25], CFLLM [27], BTBCRL [30], and DeFusionNET [38], of which the latter five are based on deep learning. Since some methods [13, 17, 24, 27, 30, 38] are designed only for defocus blur, the comparisons are limited to defocus blur images (and their noise-polluted versions). In this section, we use the implementations provided by the authors, or the results reported in the literature, without any modification.

Visual comparison

Figure 9 shows a visual comparison of all the blur detection methods on common and challenging cases, where challenging case 1 contains two images (CUHK) with cluttered backgrounds, challenging case 2 contains two images (DUT) with homogeneous regions, and challenging case 3 contains two images (DUT) with similar foreground and background. It can be observed that our method and DeFusionNET yield the most accurate segmentation maps in both common and challenging cases, benchmarked against the ground truth. The third-place BTBCRL performs well in challenging cases with cluttered backgrounds, but performs less well in challenging cases with homogeneous regions or similar foreground/background. It is also worth noting that the blur/sharp regions segmented by our method have high internal connectivity.

Fig. 9

Visual comparison of blur segmentation results from different methods including JNDB [13], SSA [15], LBP [17], HiFST [6], DHDE [24], BTBCRL [30], DeFusionNET [38], and ours. Challenging case1 has cluttered backgrounds, case2 has homogeneous regions, and case3 has similar foreground and background

Quantitative comparison

Table 2 presents the F1 and MAE results of the different methods on different types of samples. Here, the noise-free samples come from CUHK1 (the out-of-focus part of CUHK), CUHK2 (the motion-blurred part of CUHK), and DUT; the noise-polluted samples come from NCD. Our method achieves the top performance on CUHK2 and NCD, and ranks second on CUHK1 and DUT.

Table 2 Quantitative comparison of F1 and MAE among different methods. The first and second places in each sample type are highlighted in bold and italics, respectively

On noise-free CUHK1, our method achieves an F1-measure of 0.899 and an MAE of 0.094, second only to the deep learning-based BTBCRL (F1 = 0.904 and MAE = 0.088). On noise-free CUHK2, our method ranks first in both F1-measure and MAE, with values of 0.884 and 0.102, respectively; compared with the second-best DCT, it improves the F1-measure by 2.8% and reduces the MAE by 30.6%. On noise-free DUT, our method achieves the second-highest F1-measure (0.881) and the second-lowest MAE (0.105), behind the deep learning-based DeFusionNET (F1 = 0.889 and MAE = 0.099).

For the noised samples from NCD, the proposed method performs distinctly better than the other competitive methods, achieving the highest F1-measure (0.873) and the lowest MAE (0.118). Compared with the second-place BTBCRL, the F1-measure increases by 15.3% and the MAE decreases by 44.3%.

Moreover, we test the pre-denoising technique in our experiments. The BM3D algorithm [41, 42], one of the state-of-the-art denoising algorithms, is used to denoise the noise-polluted samples. The F1 and MAE results on the denoised samples are presented at the bottom of Table 2. We observe a performance increase for all methods when a pre-denoising processor is applied, and the result of our method likewise improves after denoising. This demonstrates that pre-denoising is an effective means of mitigating noise sensitivity, although it introduces extra computing costs. However, even with pre-denoising, the performance on noise-polluted images remains lower than that on noise-free images, probably because the pre-denoising step removes useful sharp information along with the harmful noise.

Runtime comparison

We also report the average runtime of the blur detection methods on a single input image in Table 3. For the methods [25, 27, 30, 38, 39] whose implementations are unavailable, we use the runtimes reported in their papers; for the rest, we run their implementations on our workstation (i7 CPU and 16 GB RAM). Our method takes only about 0.7 s per image using the CPU alone, behind DeFusionNET [38] and R2MRF [39] but faster than the other competitive methods. Besides, our method needs only about 0.5 s to pre-train the RBF network, far less than the deep learning-based BTBCRL [30] (about 5 days), DeFusionNET (about 11.7 h), and R2MRF (about 0.8 h). Note that our method has a lightweight architecture that requires neither parallel computing resources nor massive model parameters, so it has the potential to be deployed on portable devices or embedded platforms.

Table 3 Quantitative comparison of average runtime among different methods on one single image. The pre-training time is also shown in round brackets for some supervised methods and ours

In summary, the proposed sequency-based blur detection method is effective and efficient in different blur scenarios (including out-of-focus and motion blur) and in various challenging cases (cluttered backgrounds, homogeneous regions, and similar foreground/background). The above comparisons show that our method delivers good detection performance and strong noise robustness, indicating its superiority over the other competitive methods.

Discussion

Necessity of Walsh transform

In this paper, the Walsh transform is introduced into our blur metric, mainly for the two following reasons. (1) The Walsh basis takes only the values ±1. In other words, the Walsh transform involves only additions and subtractions, with no multiplications or divisions, and is therefore trivially implemented in binary arithmetic. Coupled with the simple recursive structure of the Hadamard ordering, the Walsh transform has a great efficiency advantage: its computing cost and required storage are far lower than those of other transforms such as the Fourier transform. (2) The Walsh transform is a complete orthogonal transform with the energy-compaction property, which compresses image information into the low-sequency spectrum zone. Therefore, a few low-sequency Wh coefficients can effectively represent rich image information while protecting against high-sequency noise interference [23]. Owing to its computational rapidity and energy compaction, the Walsh transform has been widely used in fields such as compressed sensing [31] and coding and decoding [32, 33]. Our experimental results demonstrate that it can also be applied to noise-immune blur detection.

Ablation study

This section evaluates the effect of two strategies: (i) the sequency spectrum truncation strategy for noise-immune blur estimation, and (ii) the multi-scale fusion strategy for fine blur segmentation. To this end, two ablation experiments are conducted on the DUT, CUHK, and NCD datasets.

Sequency spectrum truncation strategy

To validate the anti-noise effectiveness of this strategy, we run an ablation experiment on NCD with two algorithms: the complete version of our algorithm and an ablated version with only the truncation step removed. Their performances are reported in Table 4. The algorithm equipped with the truncation strategy achieves a far better performance than the one without it, demonstrating that the sequency spectrum truncation strategy plays a vital role in the noise immunity of our method.

Table 4 Results of the ablation experiment on truncation strategy

Multi-scale fusion strategy

The proposed method fuses three patch scales: 4 × 4, 8 × 8, and 16 × 16. To quantify the contribution of the multi-scale fusion strategy, we run an ablation experiment with three algorithms: the complete algorithm using all three patch scales, an ablated algorithm using the two patch scales 8 × 8 and 16 × 16, and an ablated algorithm using the single patch scale 16 × 16. The corresponding F1 and MAE values on DUT, CUHK, and NCD are reported in the upper part of Table 5. The complete algorithm evidently outperforms the two ablated algorithms, demonstrating the effectiveness of the proposed multi-scale fusion strategy. Moreover, we conduct experiments with different fusion orders of the three scales. The results are reported in the lower part of Table 5, where scale1, scale2, and scale3 denote 4 × 4, 8 × 8, and 16 × 16, respectively. There is little difference in detection performance among the fusion orders, although (scale1 + scale2) + scale3 is numerically slightly better in F1 and MAE than the other two. Therefore, in this paper, we merge scale1 and scale2 first and then fuse scale3.

Table 5 Results of the ablation experiment on multi-scale fusion strategy

Noise immunity

Table 6 reports the F1-measure and MAE of our method at different noise levels. It shows that our method has a good anti-noise capacity and maintains a relatively high detection performance, although this capacity moderately degrades as the noise level increases. At the bottom of Table 6, we also give the updated F1-measure and MAE after adding a BM3D pre-denoising processor. The proposed method combined with BM3D provides higher detection performance, especially under strong noise. This demonstrates that applying a pre-denoising technique can enhance the anti-noise capacity of our method, although it introduces extra computing costs.

Table 6 F1-measure and MAE of the proposed method at different noise levels

Failure cases

Some difficult situations can cause the proposed method to fail. The first is the ambiguity between a blur region and a textureless sharp region. As shown in Fig. 10, our method has difficulty telling the difference between the smooth blur region (marked by the blue rectangle) and the smooth sharp region (marked by the red rectangle), because both are similarly smooth with few textures; this problem is challenging for any blur detection algorithm. Another difficult situation is the interference of strong image noise. As shown in Fig. 11, when the noise variance/density reaches the level of 5e−1, the blur map becomes indistinguishable, and the proposed method fails to segment the correct blur/sharp regions. This noise-induced failure can be mitigated by applying an effective pre-denoising technique, as mentioned in “Quantitative comparison”.

Fig. 10

Failure case due to the ambiguity between blur region (marked by blue rectangle) and textureless sharp region (marked by red rectangle)

Fig. 11

Failure case due to strong noise with a variance/density of 5e-1

Conclusions

In this work, we propose a novel noise-immune blur detection method. First, a simple re-blur scheme is used to inspect the unknown blur in an input image. Then, we design a pixel-wise blurriness metric based on the sequency spectrum derived by the Walsh transform. Meanwhile, to solve the misdetection problem caused by image noise, we develop an adaptive sequency spectrum truncation strategy to eliminate the noise interference. By combining the re-blur scheme, the sequency transform, and adaptive truncation, we achieve effective and noise-robust blur detection that produces a fine blur map. Finally, the blur region is accurately segmented using the clustering-guided region growth framework and the multi-scale fusion strategy. Experimental results on two benchmark datasets (DUT and CUHK) and their noise-polluted versions (NCD) demonstrate the superiority of our method over competitive blur detection approaches. The proposed method achieves the top performance on the motion-blurred part of CUHK and on the noise-polluted samples, and ranks second on DUT and the defocus part of CUHK. Especially for noise-polluted blurry images, the proposed method substantially outperforms the other competitors and presents prominent noise immunity, giving it great application prospects in realistic noisy conditions. In terms of runtime, our method is slower only than two recent deep learning-based approaches, yet offers higher accuracy and a significantly shorter pre-training time. Summarizing these results, we believe that our method achieves the best cost–benefit balance among all the competitive methods.

As future work, we plan to explore a more adaptive truncation strategy, develop a more effective multi-scale fusion algorithm, and apply the proposed method to the blur detection of UAV-based and SAR-based images. In addition, we plan to build a larger and richer dataset to expand the blur detection field.