1 Introduction

Often, when outdoor images are acquired under poor weather conditions, such as haze and fog, the visibility of the captured scene degrades significantly (see Fig. 1a). Narasimhan [1] described how the interactions of light with particles suspended in the atmosphere (scattering, absorption, and emission) result in reduced contrast, faded colors, and low saturation. Many computer vision applications rely on the assumption that the input image is haze free; consequently, degraded images may cause catastrophic errors. Hence, research on image dehazing is of practical significance, and the search for effective haze removal methods has attracted increasing attention in recent years.

Early studies adopted image enhancement techniques to increase the visibility of hazy images; the Retinex algorithm [2] and the method of Choi [3] are typical examples. However, because these techniques do not take the spatial distribution of haze into account and ignore the fact that haze thickness depends on scene depth, their dehazing effect is not visually compelling.

Therefore, subsequent research focused mainly on haze removal based on the atmospheric scattering model, which has proved more attractive than traditional image enhancement techniques. When using an atmospheric scattering model, it is critical to estimate the scene depth accurately. Several studies [4–6] proposed using multiple images or external information to derive the scene depth map; however, this requirement is difficult to fulfill in many real-world applications.

More recently, single-image haze removal methods have attracted the most research attention, and remarkable progress has been made. Generally, these methods take advantage of strong prior knowledge or assumptions to produce the depth map. For example, by assuming that clear images possess higher local contrast than hazy ones, Tan [7] proposed deriving the transmission map from a Markov random field (MRF) model and removing haze by maximizing the local contrast. However, Tan's results tend to be oversaturated because the approach is, in spirit, similar to contrast stretching. Nishino [8] exploited the statistical properties hidden in images by adopting a Bayesian posterior probability model to remove haze. The results show its superiority for heavily hazed images, but for misty images, the restored colors tend to be overenhanced. Fattal [9] assumed that transmission and surface shading are locally uncorrelated and removed haze on the basis of color statistics; however, this algorithm does not work for heavily hazed images. Tarel [10] used the median filter to estimate the dissipation function, but because the median filter preserves edges poorly, the method leaves small amounts of mist around depth discontinuities in the dehazed image. To solve this problem, Xiao [11] proposed a guided joint bilateral filter for haze removal. Meng [12] estimated a rough transmission map using a boundary constraint and proposed a regularization method to smooth the map. Although this method is fast, it tends to distort the color fidelity when dealing with white objects. He [13] obtained a rough estimate of the transmission via the dark channel prior and adopted soft matting for transmission refinement. Although the dehazing results are almost perfect in their visual effect, He's algorithm is not applicable to real-time systems, because the soft matting operation incurs expensive computation and memory consumption overhead. To address this, He [14] replaced the soft matting with a guided filter, which proved to be more efficient, but at the cost of a degraded visual effect. Gibson [15] presented the median dark channel prior method based on [13], which accelerates the haze removal process to some extent because it requires no refinement of the transmission map. Nevertheless, this method fails to achieve good visual results; in particular, it is prone to leaving dark spots in the dehazed image. Li [16] exploited the detail-change prior to estimate the airlight; however, because the result contains excessive texture details, this form of haze removal is unsatisfactory. Zhu [17] created a linear model of scene depth under the color attenuation prior and learned the parameters of the model with a supervised learning method. However, because the scattering coefficient in the atmospheric scattering model cannot actually be regarded as a constant, Zhu's method proved to be unstable in its haze removal performance.

As mentioned above, the quality of dehazing methods still has room for improvement, especially for images with uneven illumination. Although Li [18] adopted post-enhancement processing to improve the visual quality, that work did not analyze the underlying key problem and, consequently, failed to make an essential improvement to dehazing. In this study, we first analyze the inherent weaknesses of the atmospheric scattering model and propose an improvement. Then, we present a fast image haze removal algorithm based on the modified model. Our method does not estimate the global atmospheric light and the transmission map in the traditional way; instead, we perform scene segmentation based on haze thickness and estimate the scene luminance and scene transmission for each scene region. To eliminate the block effect and the negative effects caused by scene segmentation errors, we propose a guided total variation (GTV) model to perform guided smoothing, a capability the original total variation (TV) model lacks [19, 20]. Compared with traditional enhancement techniques, our method produces a better visual effect and improved color fidelity, as shown in Fig. 1.

Fig. 1 Dehazing results comparison: a hazy image; b result of homomorphic filter; c result of histogram equalization; d result of Retinex; e result of Laplace; and f result from the proposed method

2 Analysis of and improvement on the atmospheric scattering model

In computer vision and computer graphics, the atmospheric scattering model has been widely used to describe the formation of a hazy image [1, 7] and is defined as follows:

$$\begin{aligned} I\left( {x,y} \right) =A\cdot \rho \left( {x,y} \right) \cdot t\left( {x,y} \right) +A\cdot \left( {1-t\left( {x,y} \right) } \right) , \end{aligned}$$
(1)

where I is the observed image, \(\rho \) is the scene reflectance, A is the global atmospheric light (regarded as constant over the input image), and t denotes the transmission. If we assume that the haze is homogeneous, t can be expressed as follows:

$$\begin{aligned} t\left( {x,y} \right) =e^{-\beta _0 \cdot d\left( {x,y} \right) }, \end{aligned}$$
(2)

where \(\beta _{0}\) is the scattering coefficient and d represents the scene depth. Evidently, it is an ill-posed problem to estimate A and t from a single input image. In recent years, many studies have exploited stronger priors or used assumptions as constraints to solve this challenging problem. Although significant progress has been made, the visual effect after restoration is still less than satisfactory.
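To make the model concrete, the following minimal NumPy sketch synthesizes a hazy image from a clear scene and a depth map via Eqs. 1 and 2. The sample values \(A=0.9\) and \(\beta _0 =1.2\), and all array names, are illustrative assumptions rather than values from the paper:

```python
import numpy as np

def synthesize_haze(reflectance, depth, A=0.9, beta0=1.2):
    """Eqs. 1-2: I = A*rho*t + A*(1 - t), with t = exp(-beta0 * d)."""
    t = np.exp(-beta0 * depth)[..., None]       # Eq. 2, broadcast over RGB
    return A * reflectance * t + A * (1.0 - t)  # Eq. 1

# Toy example: depth (and hence haze thickness) grows from left to right.
rho = np.random.rand(240, 320, 3)
d = np.tile(np.linspace(0.0, 3.0, 320), (240, 1))
hazy = synthesize_haze(rho, d)
```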

Fig. 2 Scene reflectance in He's algorithm under different global atmospheric light levels: a hazy image; b–k the dehazing results when \(A=0.1{:}0.1{:}1\); and l the dehazing result when A is estimated by He [13]

Fig. 3 Hazy image segmented manually into a limited number of scenes based on scene depth and luminance: left hazy image; right scene segmentation map (identical colors indicate the same scene)

Figure 2 shows various restoration results under different global atmospheric light levels. Clearly, for smaller values of A, the local contrast in the dark tree shadow is enhanced, but a large number of the detail structures are lost in the bright region. As Retinex theory states [2], scene reflectance is an intrinsic feature of objects and is independent of the incident light. The problem is that we cannot recover the ideal scene reflectance regardless of the value of A, a consequence of the assumption in Eq. 1 that the atmospheric light is constant. However, that assumption is not always true in the real world: the intensity of atmospheric light may vary among different regions. As shown in Fig. 2a, the light intensity tends toward zero in the shadowed area but approaches one at the distant horizon. Thus, assuming that A is constant has obvious limitations. In addition, the estimation of the transmission map involves considerable redundant computation, because the transmission map is estimated pixelwise, while in reality the depth changes relatively smoothly within the same scene.

To overcome the weaknesses described above, we first discard the assumption that the atmospheric light level is constant. Then, we perform scene partition and adaptively estimate the incident light in each separate scene. Because pixels in the same scene are likely to have similar depth, we can increase the efficiency of this scheme by calculating the transmission in a scene-wise manner rather than pixel by pixel. According to the analysis shown above, the transmission map, t, and the atmospheric light, A, can be redefined as the scene transmission map, T, and the scene luminance map, L, respectively. Therefore, the redefined model can be expressed as follows:

$$\begin{aligned} I\left( {x,y} \right) =L\left( i \right) \cdot \rho \left( {x,y} \right) \cdot T\left( i \right) +L\left( i \right) \cdot \left( {1-T\left( i \right) } \right) ,\quad \left( {x,y} \right) \in \Omega _i , \end{aligned}$$
(3)

where \(\Omega _{i}\) stands for the \(i{\mathrm{th}}\) scene, and L(i) and T(i) denote the scene luminance and scene transmission, respectively, both constant within the \(i{\mathrm{th}}\) scene. The redefined model significantly simplifies the estimation of transmission, because the scene luminance and scene transmission need to be estimated for only a limited number of scenes (see Fig. 3).

Fig. 4 The flowchart of the proposed dehazing method

3 A single image dehazing method

From the redefined model in Sect. 2, it can be inferred that all the scenes in the input image should be recognized first. Then, the scene luminance and scene transmission need to be estimated for each scene separately based on the results of scene segmentation. To eliminate negative effects caused by this scene-wise operation, it is also necessary to refine the scene transmission map and scene luminance map with the goal of preserving the essential depth structure while achieving local smoothness. Figure 4 shows a flowchart of our method.

3.1 Scene segmentation

It is worth noting that the brightness and texture features in a hazy image vary sharply with changes in haze concentration. In other words, in regions with heavy haze, the pixel brightness tends to be very high, while the texture detail is prone to be seriously blurred. Hence, we first partition the input image into several nonoverlapping patches \(B_{i}\) and then define a quantitative measure of the haze density in each patch as follows:

$$\begin{aligned} V_i =\varphi \left( B_i\right) -\phi \left( B_i\right) , \end{aligned}$$
(4)

where i denotes the patch index, and \(\varphi \) and \(\phi \) are the mean and standard deviation functions, respectively. The haze distribution map V is constructed after all the patches have been traversed in the hazy image.
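As an illustration, the sketch below computes the haze distribution map V of Eq. 4 on a grayscale image in [0, 1]; the 16-pixel patch size is our assumption for demonstration, since the text does not fix it at this point:

```python
import numpy as np

def haze_distribution(gray, patch=16):
    """Eq. 4: V_i = mean(B_i) - std(B_i), evaluated on nonoverlapping
    patches and broadcast back so every pixel carries its patch score."""
    H, W = gray.shape
    V = np.zeros_like(gray)
    for y in range(0, H, patch):
        for x in range(0, W, patch):
            B = gray[y:y + patch, x:x + patch]
            V[y:y + patch, x:x + patch] = B.mean() - B.std()
    return V
```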

Based on the haze distribution, we perform scene partition using the method from [21], which is attractive due to its low complexity. Assuming that the map V is divided into k scenes, the pixel (x, y) belongs to the following scene:

$$\begin{aligned} C\left( {x,y} \right) =\left\{ i \;\Big |\; V_{\mathrm{sort}} \left( {\max \left( {\left\lfloor {\frac{i-1}{k}\cdot l} \right\rfloor ,1} \right) } \right) \le V\left( {x,y} \right) \le V_{\mathrm{sort}} \left( {\left\lfloor {\frac{i}{k}\cdot l} \right\rfloor } \right) \right\} , \end{aligned}$$
(5)

where C is the scene segmentation map, \(V_{\mathrm{sort}}\) is the vector of per-pixel haze thickness coefficients sorted in ascending order, and l denotes the number of pixels in the image.

Figure 5 shows several groups of segmentation results from using the above method (note that identical colors indicate the same scene). As Fig. 5 shows, larger k values result in more elaborate scene segmentation results; however, they may also cause the subsequent estimation procedure to be more complicated. We set k to 15 throughout our experiments by taking both computational complexity and partition accuracy into account.
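A minimal sketch of the quantile binning behind Eq. 5 follows; it uses zero-based scene labels, whereas the paper indexes scenes from 1, but the boundaries otherwise follow the equation:

```python
import numpy as np

def segment_scenes(V, k=15):
    """Eq. 5: bin pixels into k scenes by quantiles of the haze
    distribution map V (labels 0 .. k-1, thinnest haze first)."""
    l = V.size
    V_sort = np.sort(V.ravel())                     # ascending thickness
    # Upper boundary of scene i: V_sort(floor(i/k * l)), 1-indexed in Eq. 5.
    idx = np.maximum((np.arange(1, k + 1) * l) // k, 1) - 1
    bounds = V_sort[idx]
    C = np.searchsorted(bounds, V.ravel(), side='left')
    return C.reshape(V.shape).clip(0, k - 1)
```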

Fig. 5 The results of scene partition using different k values: a hazy images; b scene segmentation maps using \(k=7\); c scene segmentation maps using \(k=15\); and d scene segmentation maps using \(k=22\)

3.2 The rough estimate of scene luminance

As defined above, scene luminance evaluates the intensity of the incident light in a scene. If we simply chose the intensity of the brightest pixel in a scene as the scene luminance, the estimate would be susceptible to interference from white objects. Moreover, we must take possible scene partition mistakes into account, because they can lead to incorrect scene luminance estimates. Therefore, inspired by He [13], we adopt an erosion operation to reduce the negative impact of white objects and apply an averaging operation to weaken the interference of scene segmentation errors. In particular, for color hazy images, we first perform the erosion operation on the three RGB color channels separately, as follows:

$$\begin{aligned} I_E^c =I^{c}\Theta \Lambda \quad c\in \left\{ {R,G,B} \right\} , \end{aligned}$$
(6)

where \(I^{c}\) is a color channel of image I, \(\Theta \) is the erosion operator, and \(\Lambda \) denotes the template used in erosion. For each scene, the brightest 0.1% of pixels in each eroded color channel \(I_E^c\) are averaged to obtain the corresponding scene luminance. Figure 6 shows the three separate components of the rough scene luminance map in RGB color space. Clearly, this scene luminance map conforms more closely to the realistic distribution of ambient light than a fixed global atmospheric light level does.
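The sketch below assembles Eq. 6 and the averaging step, using the scene labels C from the segmentation sketch above. The \(15\times 15\) erosion template and the reading of the fraction as the brightest 0.1% of pixels are our assumptions, since the text does not state them exactly:

```python
import numpy as np
from scipy.ndimage import grey_erosion

def scene_luminance(img, C, k, template=15, top=0.001):
    """Eq. 6 plus averaging: per-channel erosion, then the mean of the
    brightest fraction `top` of pixels inside each scene."""
    L = np.zeros((k, 3))
    for c in range(3):
        eroded = grey_erosion(img[..., c], size=(template, template))
        for i in range(k):
            vals = np.sort(eroded[C == i].ravel())
            if vals.size == 0:
                continue                     # scene absent from this image
            n_top = max(1, int(top * vals.size))
            L[i, c] = vals[-n_top:].mean()   # average the brightest pixels
    return L
```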

Fig. 6 Three components of the rough scene luminance map. From left to right the original image, and the red, green, and blue components

3.3 The rough estimate of scene transmission

From Eq. 3, we obtain the scene reflectance and its gradient:

$$\begin{aligned} \rho \left( {x,y} \right)= & {} 1+\frac{I\left( {x,y} \right) -L\left( i \right) }{L\left( i \right) \cdot T\left( i \right) }\nonumber \\\Rightarrow & {} \nabla \rho \left( {x,y} \right) =\frac{\nabla I\left( {x,y} \right) }{L\left( i \right) \cdot T\left( i \right) }. \end{aligned}$$
(7)

Because \(L\left( i \right) \cdot T\left( i \right) \le 1\), we can obtain the following:

$$\begin{aligned} \nabla \rho \left( x,y \right) \ge \nabla I\left( {x,y} \right) . \end{aligned}$$
(8)

As can be inferred from Eq. 8, the goal of haze removal is to enhance the local contrast in hazy images. Inspired by this prior, we can derive the scene transmission by maximizing the contrast of each scene, as:

$$\begin{aligned}&\hat{{T}}\left( i \right) =\arg \min \nonumber \\&\left( {-\sum _{\left( {x,y} \right) \in \Omega _i } {\left( {\sum _{c\in \left\{ {R,G,B} \right\} } {\nabla \left( {1+\frac{I^{c}\left( {x,y} \right) -L\left( i \right) }{L\left( i \right) \cdot T\left( i \right) }} \right) }} \right) }} \right) \nonumber \\&s.t.\quad 0.1\le T\left( i \right) \le 1. \end{aligned}$$
(9)

Equation 9 is a typical minimum-searching problem, and the classical Fibonacci method works well to obtain optimal solutions quickly. Unfortunately, simply enhancing the contrast leads to poor visual effects, such as oversaturation in textured areas and overenhancement in the sky region. Therefore, we propose an adaptive way to adjust the scene transmission. The basic idea is to design an effective metric to distinguish scenes with various features; this metric then determines the magnitude of the scene transmission adjustment required. As defined in Eq. 4, the quantitative coefficient V reflects the haze thickness. In the same manner, we can measure the haze thickness of a scene by averaging V(x, y) over all the pixels in the scene, as follows:

$$\begin{aligned} \chi _i =\frac{1}{\left| {\Omega _i } \right| }\sum _{\left( {x,y} \right) \in \Omega _i } {V\left( {x,y} \right) }, \end{aligned}$$
(10)

where \({\vert }\Omega _{i}{\vert }\) denotes the number of pixels contained in the \(i{\mathrm{th}}\) scene. We randomly selected 200 hazy images from the Internet as test samples and performed scene segmentation on them using Eq. 5. By inspecting the scene segmentation output for all these images, we can broadly classify all scenes into four types: texture, mist, dense haze, and sky, and calculate the corresponding value of \({\chi }\) for each. The statistical result is shown in Fig. 7, from which we can obtain the following approximate relationship:

$$\begin{aligned} \chi =\left\{ \begin{array}{ll} [0,0.3] &{}\quad \mathrm{Texture} \\ (0.3,0.5] &{}\quad \mathrm{Mist} \\ (0.5,1] &{}\quad \mathrm{Dense\;haze\;or\;Sky }\\ \end{array} \right. . \end{aligned}$$
(11)

Obviously, it is difficult to distinguish regions of dense haze from sky regions; however, the likelihood that a scene contains sky tends to increase as the value of \({\chi }\) grows. To prevent overenhancement, the adjustment magnitude should increase accordingly (\(0.5<\chi \le 1\)). Moreover, the adjustment magnitude should decrease from the texture region to the mist region (\(0\le \chi \le 0.5\)). According to this principle, we define the adjustment of scene transmission as follows:

$$\begin{aligned} \tilde{T}\left( i \right) =M_i \cdot \hat{{T}}\left( i \right) , \end{aligned}$$
(12)

where \(M_{i}\) denotes the adjustment magnitude and is explicitly expressed as follows:

$$\begin{aligned} M_i =2-\exp \left( {-\frac{\left( {\chi _i -0.5} \right) ^{2}}{2\cdot \omega ^{2}}} \right) , \end{aligned}$$
(13)

where \(\omega \) controls the slope of the function. After repeated testing, we found that the adjusted transmission behaves well and preserves the consistency of the original scene depth when \(\omega =0.15\) (see Fig. 9); the corresponding adjustment magnitude function is shown in Fig. 8. It can be clearly observed that the magnitude of the adjustment becomes smaller from texture to mist regions, while it tends to become larger from heavy haze to sky areas. In this way, we eliminate the problems of overenhancement in sky regions and oversaturation in texture regions while still removing as much haze as possible.
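Gathering Eqs. 9 through 13, the sketch below pairs a golden-section search (a close cousin of the Fibonacci method named above; the contrast objective of Eq. 9 is left abstract as a callback) with the \(\chi \)-based adjustment. Only \(\omega =0.15\) follows the text; the tolerance and all names are illustrative:

```python
import numpy as np

GOLD = (np.sqrt(5.0) - 1.0) / 2.0   # golden-ratio step factor

def golden_section(f, lo=0.1, hi=1.0, tol=1e-3):
    """Bounded 1-D minimization of f on [lo, hi] (the Eq. 9 constraint);
    a close relative of the Fibonacci search."""
    a, b = lo, hi
    while b - a > tol:
        c = b - GOLD * (b - a)
        d = a + GOLD * (b - a)
        if f(c) < f(d):
            b = d          # minimum lies in [a, d]
        else:
            a = c          # minimum lies in [c, b]
    return 0.5 * (a + b)

def scene_chi(V, C, i):
    """Eq. 10: mean haze-thickness coefficient chi_i of scene i."""
    return V[C == i].mean()

def adjusted_transmission(T_hat, chi, omega=0.15):
    """Eqs. 12-13: scale the raw transmission T_hat by the magnitude M."""
    M = 2.0 - np.exp(-((chi - 0.5) ** 2) / (2.0 * omega ** 2))
    return M * T_hat

# Usage, given a callable `neg_contrast_i` implementing Eq. 9 for scene i:
#   T_hat_i   = golden_section(neg_contrast_i)
#   T_tilde_i = adjusted_transmission(T_hat_i, scene_chi(V, C, i))
```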

Fig. 7 Scene feature probability distribution of \({\chi }\)

Fig. 8 The function used to adjust the magnitude of scene transmission (\(\omega =0.15\))

Fig. 9 Scene transmission before and after adjustment: a hazy images; b scene transmission map before adjustment; and c scene transmission map after adjustment

3.4 Edge optimization based on a guided total variation model

As described in Sect. 3.1, scene partition is inherently a patchwise process that blurs the edges in the scene transmission map \((\tilde{T})\) as well as in the three scene luminance maps \((L^{R},\, L^{G},\, L^{B})\), and it will thus produce halo artifacts in the dehazing result. At the same time, the accuracy of the estimates of scene transmission and scene luminance may suffer from erroneous scene segmentation. Moreover, both maps should possess the characteristic of local spatial smoothness, because excessive texture detail can have a negative impact on the dehazing effect [11]. Intuitively, adopting a filter with a guiding function, such as the joint bilateral filter or the guided filter, is a natural choice for solving this problem [14, 22]. However, these methods are extremely sensitive to parameter values, and different parameter selections can greatly affect the filtering results.

Instead, to achieve the goal of local smoothing, we can apply the TV model described in [20, 23]:

$$\begin{aligned} E\left( {T_{\mathrm{refine}}} \right) =\frac{\alpha }{2}\cdot \left\| {T_{\mathrm{refine}} -\tilde{T}} \right\| _2^2 +\frac{1}{2}\cdot \left\| {\,\nabla T_{\mathrm{refine}} \,} \right\| _{2}^{2}, \end{aligned}$$
(14)

where \(\alpha \) is the regularization factor. In this model, the first term ensures the correlation between \(T_{\mathrm{refine}}\) and \(\tilde{T}\), while the second term guarantees the local smoothness of \(T_{\mathrm{refine}}\) itself. Note that the texture details of \(\tilde{T}\) are reliably blurred through this total variation optimization; however, edge inconsistencies still exist in \(T_{\mathrm{refine}}\) where the original depth changes. Inspired by the advantages of the joint bilateral filter and the guided filter, we propose a GTV model with a guiding function, described as follows:

$$\begin{aligned} E\left( {T_{\mathrm{refine}}} \right)= & {} \frac{\alpha }{2}\cdot \left\| {T_{\mathrm{refine}} -\tilde{T}} \right\| _2^2 \nonumber \\&+\frac{\beta }{\hbox {2}}\cdot \left( {1-W} \right) \cdot \left\| {\nabla T_{\mathrm{refine}}} \right\| _{2}^{2} \nonumber \\&+\frac{\gamma }{2}\cdot W \cdot \left\| {\nabla T_{\mathrm{refine}} -\nabla G} \right\| _{2}^{2}. \end{aligned}$$
(15)

Here, \(\beta \) and \(\gamma \) are regularization parameters. The last term is introduced to keep the edge features of \(T_{\mathrm{refine}}\) consistent with the guiding image, G. The weight W is defined as:

$$\begin{aligned} W\left( {x,y} \right) =1-e^{-\left| {\nabla G\left( {x,y} \right) } \right| }. \end{aligned}$$
(16)

Obviously, the weight W increases as the gradient of G increases, reducing the importance of the second term and increasing that of the third; this achieves the dual goals of blurring the texture details and preserving the edges around areas with sudden depth changes.

To speed up the calculation, we do not solve the GTV model from the perspective of the energy function; instead, we use the gradient approximation method [24] in the \(r\times r\) neighborhood, as follows:

$$\begin{aligned}&E\left( {T_{refine}} \right) =\frac{\alpha }{2}\cdot \left( {T_{\mathrm{refine}} -\tilde{T}} \right) ^{2} \nonumber \\&\quad +\;\frac{\beta }{2}\cdot \left( {1-W} \right) \cdot \left( {\sum _{i=1}^{r^{2}-1} {\left( {T_{\mathrm{refine}} -T_{\mathrm{refine}-i} } \right) } } \right) ^{2} \nonumber \\&\quad +\;\frac{\gamma }{2}\cdot W\cdot \left( {\sum _{i=1}^{r^{2}-1} {\left( {T_{\mathrm{refine}} -T_{\mathrm{refine}-i} -G+G_i} \right) } } \right) ^{2}, \end{aligned}$$
(17)

where \(T_{\mathrm{refine}-i}\) and \(G_{i}\) are the neighboring pixels of \(T_{\mathrm{refine}}\) and G, respectively. According to the Euler–Lagrange equation, Eq. 17 satisfies:

$$\begin{aligned}&T_{\mathrm{refine}} \nonumber \\&=\frac{\alpha \cdot \tilde{T}{+}\gamma \cdot W\cdot \sum \nolimits _{i=1}^{r^{2}{-}1} {\left( {G{-}G_i } \right) } {+}\left( {W\cdot \left( {\gamma {-}\beta } \right) {+}\beta } \right) \cdot \sum \nolimits _{i=1}^{r^{2}{-}1} {T_{\mathrm{refine}{-}i} } }{\left( {\alpha {+}\left( {r^{2}{-}1} \right) \cdot \left( {W\cdot \left( {\gamma {-}\beta } \right) {+}\beta } \right) } \right) }.\nonumber \\ \end{aligned}$$
(18)

It can also be expressed in an iterative form [24]:

$$\begin{aligned}&T_{\mathrm{refine}}^{\mathrm{Iter}}\nonumber \\&=\frac{\alpha \cdot \tilde{T}{+}\gamma \cdot W\cdot \sum \nolimits _{i=1}^{r^{2}{-}1} {\left( {G{-}G_i} \right) } {+}\left( {W\cdot \left( {\gamma {-}\beta } \right) {+}\beta } \right) \cdot \sum \nolimits _{i=1}^{r^{2}{-}1} {T_{_{\mathrm{refine}{-}i}}^{\mathrm{Iter}{-}1} } }{\left( \alpha {+}\left( {r^{2}{-}1} \right) \cdot \left( W\cdot \left( {\gamma -\beta } \right) {+}\beta \right) \right) },\nonumber \\ \end{aligned}$$
(19)

where Iter denotes the number of iterations. When \(\xi ^{\mathrm{Iter}}={\left\| {T_{\mathrm{refine}}^{\mathrm{Iter}} -T_{\mathrm{refine}}^{\mathrm{Iter}-1} } \right\| }_2^2\big /l\le 10^{-4}\) is satisfied, the iteration process terminates, and the outcome of the last iteration is the refined scene transmission \(T_{\mathrm{refine}}\). In the numerator, the first two terms depend only on the input information and need to be calculated just once, while the third involves a sum over the \(r\times r\) neighborhood and must be updated at every iteration. In effect, the computational complexity of this last term can be reduced to O(1) per pixel per iteration if the box filter [14] is adopted. We initialize \(T_{\mathrm{refine}}^0 =\tilde{T}\), and the guiding image, G, is the gray component of the hazy input image. After repeated testing, we found that this approach achieves good dehazing results when \(\alpha =3\), \(\beta =3\cdot \left( {\mathrm{Iter}-1} \right) \), \(\gamma =4\), and \(r=\max \left( {l_h ,l_w } \right) /15\), where \(l_h\) and \(l_w\) are the height and width of the image, respectively. As Fig. 10 shows, the GTV model converges quickly. Moreover, after only a few iterations, the output scene transmission map highlights the depth structure while blurring a large amount of the texture detail (in fact, the outcome of the first iteration already preserves depth details consistent with the original hazy image; as the iterations proceed, the texture details become increasingly blurred). The three scene luminance maps \(L^{R}\), \(L^{G}\), and \(L^{B}\) can be refined in the same way.
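A compact sketch of the iteration in Eq. 19 follows, with the neighborhood sums realized through a box filter as suggested; the gradient-magnitude form of \(|\nabla G|\) in Eq. 16, the odd window size, and the iteration cap are our implementation assumptions:

```python
import numpy as np
from scipy.ndimage import uniform_filter

def gtv_refine(T_tilde, G, alpha=3.0, gamma=4.0, tol=1e-4, max_iter=20):
    """Iterate Eq. 19 until the normalized change drops below tol.

    T_tilde: rough scene transmission map; G: gray guiding image,
    both H x W arrays scaled to [0, 1].
    """
    H, W = T_tilde.shape
    r = max(3, int(max(H, W) / 15)) | 1            # odd r = max(l_h, l_w)/15
    n = r * r
    gy, gx = np.gradient(G)
    Wmap = 1.0 - np.exp(-np.hypot(gy, gx))         # Eq. 16 weight
    # Sum over the n-1 neighbors of G, computed once via a box filter:
    sum_G = n * (G - uniform_filter(G, size=r))    # = sum_i (G - G_i)
    T = T_tilde.copy()
    l = T.size
    for it in range(1, max_iter + 1):
        beta = 3.0 * (it - 1)                      # beta = 3*(Iter - 1)
        coeff = Wmap * (gamma - beta) + beta
        sum_T = n * uniform_filter(T, size=r) - T  # sum of T's neighbors
        T_new = (alpha * T_tilde + gamma * Wmap * sum_G + coeff * sum_T) / \
                (alpha + (n - 1) * coeff)
        converged = np.sum((T_new - T) ** 2) / l <= tol
        T = T_new
        if converged:
            break
    return T
```

The box filter keeps each sweep linear in the number of pixels regardless of r, which is what makes the scene-wise refinement practical at full image resolution.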

Fig. 10 The outcome of scene transmission as the iterations proceed: a, g hazy images; b, h rough scene transmission maps; and c–f and i–l results after the \({\mathrm{Iter}}^{\mathrm{th}}\) (Iter = 1:1:4) iteration, respectively

3.5 Image restoration

Once the refined scene transmission map \(T_{\mathrm{refine}}\) and the refined scene luminance maps \(L_{\mathrm{refine}}^c\) are known, we can derive the scene reflectance \(\rho \) from Eq. 3. For convenience, we rewrite Eq. 3 as follows:

$$\begin{aligned} \rho ^{c}=1+\frac{I^{c}-L_{\mathrm{refine}}^c }{L_{\mathrm{refine}}^c \cdot T_{\mathrm{refine}}}. \end{aligned}$$
(20)

Finally, the restoration result is obtained by restricting \(\rho \) to the range [0, 1] with a min–max operation:

$$\begin{aligned} R^{c}=\min \left( {\max \left( {\rho ^{c},\;0} \right) ,\;1} \right) . \end{aligned}$$
(21)
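For completeness, a direct transcription of Eqs. 20 and 21; the small floor on the denominator is our own safeguard against division by zero, not part of the paper:

```python
import numpy as np

def restore(I, L_refined, T_refined, eps=1e-6):
    """Eqs. 20-21: per-channel reflectance, clamped to [0, 1].

    I, L_refined: H x W x 3 arrays in [0, 1]; T_refined: H x W map.
    """
    T = T_refined[..., None]                                      # to RGB
    rho = 1.0 + (I - L_refined) / np.maximum(L_refined * T, eps)  # Eq. 20
    return np.clip(rho, 0.0, 1.0)                                 # Eq. 21
```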

4 Experiments

In this section, we compare the quality of image haze removal using our proposed method with that of other typical dehazing algorithms. In the following experiments, our algorithms are implemented in MATLAB on a computer with an Intel(R) Core(TM) i5-4210U CPU and 8.00 GB of RAM. All the parameters of our proposed method are set as described in Sect. 3.

4.1 The visual effect

Without loss of generality, we selected six hazy images of different types from the Internet and processed them with our algorithm. The dehazing results in Fig. 11 show that our method is capable of estimating the luminance of various regions accurately, thus overcoming the limitation of using a fixed value for the global atmospheric light; consequently, the visual effect of the restored images is significantly improved.

Fig. 11 Estimation results using the proposed approach and the corresponding dehazed images: a hazy images; b scene transmission maps; c the red component of the scene luminance map; d the green component of the scene luminance map; e the blue component of the scene luminance map; and f the resulting dehazed images

4.2 Comprehensive comparison

Next, we show the haze removal results of both our method and several other representative algorithms. (The test images in Fig. 14 were downloaded from the Internet, and the test images in Figs. 12, 13, and 15 originate from Fattal's website: http://www.cs.huji.ac.il/~raananf/.) Figure 12 shows the results obtained by Tan [7], Kratz [26], and our method. As Fig. 12 shows, the algorithms proposed by Tan and Kratz both work well for contrast enhancement but produce oversaturated colors and halo effects near depth discontinuities. In comparison, our algorithm performs better in terms of color fidelity, and its results look more natural.

From left to right, the panels in Fig. 13 show the input image and the results obtained by Choi [3], Kopf [5], Fattal [9], and our method. Clearly, all those algorithms except ours lose some information in the sky region, causing an unsatisfactory visual effect. Our method preserves the information in the sky region well and exhibits clearer visibility after restoration.

In Fig. 14, we compare our method with the algorithms presented by Tarel [10], Meng [12], He [13], Gibson [15], Zhu [17], and Qi [27]. Obviously, the sky color is overenhanced in the results of Tarel, Meng, He, Gibson, and Qi, whereas it is not in ours or Zhu's; however, our method outperforms Zhu's algorithm in the visual quality of the dehazing.

Finally, we compare our method with the algorithms of Nishino [8] and Fattal [25]. As shown in Fig. 15, the algorithms presented by Nishino and Fattal generally achieve a good dehazing effect except when the illumination is insufficient: when the incident light is not strong enough, the global contrast of their dehazed results tends to be low. In comparison, our method not only provides comparable haze removal results but also performs well in low-luminance conditions.

Fig. 12 From left to right hazy image, Tan's result, Kratz's result, and our result

Fig. 13 From left to right hazy image, Choi's result, Kopf's result, Fattal's result, and our result

Fig. 14 From left to right hazy image, Tarel's result, Meng's result, He's result, Gibson's result, Zhu's result, Qi's result, and our result

Fig. 15 From left to right hazy images, Nishino's results, Fattal's results, and our results

4.3 The objective assessment

We employ the rate of new visible edges, e, recommended by [28] and the structural similarity, f, proposed by Wang [29] to assess our approach quantitatively. The measures e and f are defined as follows:

$$\begin{aligned} e= & {} \frac{n_r -n_0 }{n_0 },\nonumber \\ f= & {} \frac{1}{l}\cdot \sum _{1\le i\le l} {\frac{\left( {2\cdot \mu _{\hat{B}_i } \cdot \mu _{\tilde{B}_i } +c_1 } \right) \cdot \left( {\sigma _{\hat{B}_i \tilde{B}_i } +c_2 } \right) }{\left( {\mu _{\hat{B}_i }^2 +\mu _{\tilde{B}_i }^2 +c_1 } \right) \cdot \left( {\sigma _{\hat{B}_i }^2 +\sigma _{\tilde{B}_i }^2 +c_2 } \right) }} ,\nonumber \\ \end{aligned}$$
(22)

where \(n_{0}\) and \(n_{r}\) represent the number of visible edges in the hazy image and the corresponding dehazed image, respectively; \(\hat{{B}}_i\) and \(\tilde{B}_i\) are the \(i{\mathrm{th}}\) nonoverlapping patches in the original image I and the restored image R, respectively; \(\mu _{\hat{{B}}_i}\) and \(\sigma _{\hat{{B}}_i}^2\) denote the mean and variance of \(\hat{{B}}_i\), and \(\mu _{\tilde{B}_i}\) and \(\sigma _{\tilde{B}_i}^2\) those of \(\tilde{B}_i\); and \(\sigma _{\hat{{B}}_i \tilde{B}_i}\) is the covariance between \(\hat{{B}}_i\) and \(\tilde{B}_i \). The constants \(c_{1}\) and \(c_{2}\) are included to avoid instability.
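A sketch of both measures is given below. Note that the visible-edge detector of [28] is replaced here by a simple Sobel-threshold stand-in, and the constants \(c_1 =10^{-4}\) and \(c_2 =9\times 10^{-4}\) are conventional SSIM-style choices for images in [0, 1], not values from the paper:

```python
import numpy as np
from scipy.ndimage import sobel

def rate_new_visible_edges(I, R, thresh=0.1):
    """e of Eq. 22. Edge counts come from a Sobel-magnitude threshold,
    a stand-in for the visible-edge detector of [28]."""
    def n_edges(img):
        mag = np.hypot(sobel(img, axis=0), sobel(img, axis=1))
        return max(1, int((mag > thresh).sum()))   # guard against n_0 = 0
    n0 = n_edges(I)
    return (n_edges(R) - n0) / n0

def structure_similarity(I, R, patch=8, c1=1e-4, c2=9e-4):
    """f of Eq. 22: mean patchwise similarity between I and R."""
    H, W = I.shape
    scores = []
    for y in range(0, H - patch + 1, patch):
        for x in range(0, W - patch + 1, patch):
            a = I[y:y + patch, x:x + patch]
            b = R[y:y + patch, x:x + patch]
            cov = np.mean((a - a.mean()) * (b - b.mean()))
            scores.append(((2 * a.mean() * b.mean() + c1) * (cov + c2)) /
                          ((a.mean() ** 2 + b.mean() ** 2 + c1) *
                           (a.var() + b.var() + c2)))
    return float(np.mean(scores))
```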

For the sake of fairness, we test the most up-to-date dehazing algorithms on two benchmark images from Fattal’s website. The visual results for all the algorithms are shown in Fig. 16a, c; the corresponding quantitative comparison results are listed in Table 1. As Table 1 shows, Tarel [10] achieves the maximum e value, followed by Meng [12], Gibson [15], and He [13]. However, this does not mean these algorithms are superior to our method, because the number of visible edges can increase when excessive dehazing leads to noise amplification in the image. This problem can be solved by the approaches described in [30, 31]. The f values listed in Table 1 demonstrate that our method achieves the maximum similarity in structure, which indicates that the depth structure of our result conforms better to the original image.

In addition, we adopt Aydin's method [32] to detect the loss of visible contrast, the amplification of invisible contrast, and the reversal of visible contrast. As can be seen in Fig. 16b, d, all the algorithms except those of Kopf [5] and Fattal [25] and ours tend to cause some degree of distortion and overenhancement (e.g., in the rock region in Fig. 16b).

Fig. 16 Comparison of up-to-date dehazing techniques: a, c dehazed images; b, d visualized distortion maps (loss of visible contrast (green), amplification of invisible contrast (blue), reversal of visible contrast (red))

Table 1 The objective assessment
Fig. 17 Comparison of computation speed

Processing speed is also important when evaluating algorithmic performance. Algorithms such as those of Tan [7], Nishino [8], and He [13] involve complex operations (e.g., MRF optimization or soft matting) that greatly reduce the speed of haze removal. Therefore, we compare our method only with algorithms of lower complexity. Figure 17 shows the computation time consumed in processing images at different resolutions. As the results show, our method is only slightly slower than Gibson's [15] and is faster than the others. Considering both the dehazing effect and computational efficiency, our proposed algorithm is well suited for dehazing applications.

4.4 Situations not suited to our method

Our method may not work well in some specific types of scenes. Figure 18 shows the haze removal result for one such unsuitable image. The hazy imaging formulation is modeled under the assumption that the scattering particles consist of the same ingredients and are uniformly distributed in the atmosphere [1]. Consequently, when this assumption is violated, it is difficult for haze removal techniques to satisfy the requirements of practical applications. To address haze removal in inhomogeneous atmospheric conditions, Shi [33] presented a more robust scattering model. However, when processing the original hazy image shown in Fig. 18a, Shi's model fails to achieve a satisfactory dehazing result, because it takes only the impact of the earth's gravity on the scattering particles into account (see Fig. 18c).

Fig. 18
figure 18

An unsuitable case: a hazy image; b our result; and c Shi’s result

5 Conclusion and future work

In this study, we propose a single image haze removal approach based on an improved atmospheric scattering model. We first analyze the weaknesses of the atmospheric scattering model and propose an improvement to it. Then, taking the improved model as the starting point, we develop methods to automatically partition scenes and to estimate the scene luminance and scene transmission maps in a scene-wise manner. Finally, we present a GTV model to achieve edge optimization. The experimental results demonstrate that our approach outperforms most up-to-date algorithms in terms of both visual effect and processing speed.

It is possible to further accelerate the procedure of haze removal; for example, we can reduce the number of scenes segmented for images with smooth changes in depth. Therefore, our future work will focus on the following two aspects: (1) adaptively setting the number of scenes based on the features of the image, and (2) investigating improvements to the atmospheric scattering model, which we expect to be more applicable in inhomogeneous atmospheric conditions.