1 Introduction

Outdoor scene images are often degraded by bad weather conditions such as airborne particles, fog, haze, and smoke, which reduce image visibility and quality. The light received by the camera from a distant object is attenuated along the line of sight. In addition, the received light is blended with the atmospheric light, the environmental illumination reflected into the line of sight by air particles. The resulting images therefore show low contrast and quality.

Haze removal (defogging) is in high demand in computer vision applications and commercial/computational photography. Removing fog is challenging, however, because the fog depends on the unknown depth information. The problem is ill-posed: a single input image provides only three equations per pixel, while there are four unknowns. Many methods therefore rely on additional data or multiple images [1,2,3]. Tan and Oakley [1] removed the fog effect by taking multiple images of the same scene under the condition that the scene depth is known. Schechner et al. [2] removed haze by taking two or more images with different degrees of polarization. Kopf et al. [3] proposed a depth-based method that requires the user to supply depth data or a 3D model of the scene. Removing haze from a single image is an even more difficult case, but significant progress has been made in recent years. These methods rely on stronger assumptions or priors and can be divided into two categories: contrast-based methods and statistical approaches. Fattal [4] proposed to remove haze from a single image through a model that accounts for surface shading and scene transmission. By assuming that the surface shading and the medium transmission are locally statistically uncorrelated, he resolved the constant-albedo and airlight-albedo ambiguities and recovered a haze-free image. This method is physically sound and can generate high quality results, but it does not handle heavily hazy images well and may fail when the assumption does not hold. He et al. [5] proposed a simple but effective method based on a statistical observation called the dark channel prior, which yields a rough transmission map; a time-consuming matting step is then used to refine the transmission in the unknown regions. He et al.'s results are of high quality, but the running time is long, and the method may fail for particular images: first, when the scene objects are similar to the airlight and the image contains no shadow, and second, when the image is physically invalid. Kratz and Nishino [6] observed a natural ambiguity between scene albedo and depth; treating them as independent latent layers, they resolved this ambiguity with a probabilistic model (a Factorial Markov Random Field) and obtained a haze-free image by using natural image and depth statistics as priors. Nishino et al. [7] introduced a Bayesian probabilistic method that jointly estimates scene albedo and depth from a single foggy image by fully leveraging their latent statistical structures. Gibson and Nguyen [8] proposed a new dark channel prior: unlike the original prior, which assumes a zero minimal value, the new prior searches for the darkest pixel average inside each ellipsoid. Fattal [9] removed haze by using color-lines in natural images, where the pixels of small image patches typically exhibit a one-dimensional distribution in RGB color space; he derived a local formation model that explains the color-lines of hazy scenes and used it to solve for the scene transmission.
In contrast, by comparing haze-free and hazy images, Tan [10] found that haze-free images have higher contrast, and removed haze by maximizing the local contrast of the output image while keeping it smooth. The results are impressive but may not be physically sound. Tarel and Hautière [11] also proposed a contrast-based method; it is computationally efficient, but it assumes that the transmission is smooth except along edges with gradient jumps.

More recently, several learning-based methods [12,13,14,15] have been proposed to remove haze from a single image effectively. In [13], Tang et al. investigated different haze-relevant features of hazy images and used the most suitable feature combination to estimate a transmission map. In [12], Zhu et al. proposed a learning-based method that models the transmission as a linear combination of the saturation and brightness of the pixels in a hazy image, with the model parameters obtained by a learning strategy. The work most related to ours is that of Cai et al. [14] and Ren et al. [15], which are also deep learning-based methods for estimating the transmission map. In [15], Ren et al. proposed a multi-scale deep convolutional neural network to remove haze from a single image. In contrast, our method estimates a haze-free image directly from the hazy image. Compared with other learning-based methods, our network is much simpler and generates high quality results. Regarding the training data, Cai et al.'s method uniformly samples 10 random transmissions \(t \in (0,1)\) to generate 10 hazy patches per clear image patch [14]. In contrast, our method simplifies the collection of training data: by exploiting the common image atom shared by a hazy image patch and its clear counterpart, it generates a single pair of image patch atom and clear image patch per clear image patch.

The contribution of this paper is three-fold. First, we propose a deep convolutional neural network that learns effective features from the image patch atoms shared by hazy and haze-free image patches in order to estimate an approximate clear image patch; estimating a clear patch for each image patch atom provides a way to estimate a clear image directly from a hazy one. Second, our work is the first to explore the relation between a hazy image patch and its clear counterpart. As shown in Sect. 2, a clear image patch shares the same image patch atom with the corresponding hazy image patch, which allows us to train a network to remove haze from a single image without any haze-specific information. Third, because the image patch atom is shared by the hazy and haze-free patches, preparing the training data only requires the clear images and \( A \), which also reduces the amount of training data.

2 Image Patch Atom Generation

In this section, our goal is to find an image patch atom that can be used to generate both the hazy image patch and the haze-free image patch.

2.1 Modeling of Hazy Images

The model used in this paper is essentially the one in [16], which is widely used in computer graphics and computer vision and explains the formation of haze as follows:

$$\begin{aligned} \mathbf {I}(i,j)= J (i,j)\times t(i,j)+(1-t(i,j))\times A , \end{aligned}$$
(1)

where \(\mathbf {I}\) represents the hazy image, \( J \) represents the haze-free image, \( A \) stands for the global atmospheric light, and t denotes the transmission, i.e., the portion of light that is neither scattered nor absorbed by air particles or mist and reaches the camera. Removing haze from a hazy image amounts to solving for \( A \) and t given \(\mathbf {I}\). The first term is called the direct attenuation and the second term is the airlight contribution.
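For concreteness, the following sketch, assuming NumPy and pixel values in [0, 1], synthesizes a hazy image from a clear image according to Eq. (1); the image size and the constant transmission are arbitrary choices for illustration.

```python
import numpy as np

def synthesize_haze(J, t, A):
    """Apply the haze model of Eq. (1): I = J * t + (1 - t) * A.

    J : (H, W, 3) clear image, values in [0, 1]
    t : (H, W)    transmission map, values in (0, 1]
    A : (3,)      global atmospheric light
    """
    t = t[..., np.newaxis]                  # broadcast over the color channels
    return J * t + (1.0 - t) * np.asarray(A)

# Usage: a toy example with a constant transmission of 0.6.
J = np.random.rand(480, 640, 3)
t = np.full((480, 640), 0.6)
I = synthesize_haze(J, t, (0.78, 0.78, 0.78))
```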

2.2 Proofing

Based on Eq. (1), we can derive the following equation:

$$\begin{aligned} \mathbf {I}(i,j)- A =( J (i,j)- A )\times t(i,j). \end{aligned}$$
(2)

Generally, we can assume that the pixels in a small image patch share the same transmission, an assumption widely used in single image dehazing [5, 9, 17, 18]. Based on this assumption, we treat the transmission in a local patch as constant and denote it by \(\tilde{t}(\mathbf {x})\), where \(\mathbf {x}\) stands for (i, j). We then use \(({IA}_1, \ldots , {IA}_n)\) and \(({JA}_1, \ldots , {JA}_n)\) to represent the components of \(\mathbf {I}(\mathbf {x})- A \) and \( J (\mathbf {x})- A \) over the patch, respectively.

We can normalize the haze model in Eq. (1) as:

$$\begin{aligned} \begin{aligned} N(\mathbf {I}(\mathbf {x})- A )=N(( J (\mathbf {x})- A )\times \tilde{t}(\mathbf {x}))=\frac{( J (\mathbf {x})- A )\times \tilde{t}(\mathbf {x})}{\sqrt{\sum _{k=1}^n ({JA}_k \times \tilde{t}(\mathbf {x}))^2}} \\=\frac{( J (\mathbf {x})- A )\times \tilde{t}(\mathbf {x})}{\sqrt{\sum _{k=1}^n {JA}_k^2}\times \tilde{t}(\mathbf {x})}=\left( \frac{{JA}_1}{\sqrt{\sum _{k=1}^n {JA}_k^2}}, \ldots , \frac{{JA}_n}{\sqrt{\sum _{k=1}^n {JA}_k^2}}\right) =N( J (\mathbf {x})- A ). \end{aligned} \end{aligned}$$
(3)

According to the above equation, a simple operation transforms a hazy image patch into an image patch atom from which the haze-free image patch can be generated. To the best of our knowledge, our work is the first to explore the relation between the hazy image patch and the haze-free image patch in this way. Because the hazy image patch and the haze-free image patch share the same image patch atom, we can use the atom to identify the haze-free image patch, and the relation between the image patch atom and the clear image patch can be learned via deep learning.
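The identity in Eq. (3) can be checked numerically. The minimal NumPy sketch below (the patch contents, the transmission value, and \( A \) are arbitrary assumptions) normalizes a hazy patch and its clear counterpart and confirms that both produce the same image patch atom.

```python
import numpy as np

def atom(patch, A):
    """Normalize (patch - A) to unit length, i.e. N(patch - A) as in Eq. (3)."""
    v = (patch - A).ravel()
    return v / np.linalg.norm(v)

rng = np.random.default_rng(0)
A = np.array([0.78, 0.78, 0.78])
J_patch = rng.random((16, 16, 3))          # haze-free patch
t = 0.35                                   # constant transmission inside the patch
I_patch = J_patch * t + (1.0 - t) * A      # hazy patch via Eq. (1)

print(np.allclose(atom(I_patch, A), atom(J_patch, A)))   # True
```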

2.3 Validation

In this section, we conducted experiments to validate our theory. In order to show the results visually, we applied an absolute value and a scaling operation to the common features, and we used a \(1000\times 1000\) patch size for visualization only, which would not be appropriate for dehazing. In our dehazing program we used a patch size of \(16\times 16\), based on the assumption that pixels in a local image patch share the same transmission value. In Fig. 1 we can see that our feature is determined by \( A \) and the haze-free image patch, and that one haze-free image patch yields the same image patch atom for all hazy image patches with the same \( A \).

Fig. 1. The hazy image patches and their corresponding image patch atoms under different transmissions and global atmospheric lights.

Fig. 2. Comparison on the influence of \( A \): (a) \( A =(160, 160, 160)\); (b) \( A =(128, 153, 255)\); (c) \( A =(200, 200, 200)\).

Fig. 3. The relation between the hazy image patch, the haze-free image patch, and the corresponding image patch atom.

We also study the influence of \( A \). As Fig. 2 shows, different values of \( A \) lead to different image patch atoms, although the atoms for \( A =(160, 160, 160)\) and \( A =(200, 200, 200)\) show some similarity.

2.4 Motivation

Our image patch atom is inspired by sparse coding, which represents an image with a dictionary and reconstructs it as a linear combination of atoms. Different from traditional sparse coding, we use only one atom to reconstruct \( J (\mathbf {x})- A \) and thus a haze-free image. Our method learns the relation between the atom and the haze-free image patch and uses this relation to reconstruct the haze-free image. Figure 3 shows the relation between the hazy image patch, the haze-free image patch, and the image patch atom, from which the haze-free image patch can be reconstructed.

3 Haze Removal

In this section, we describe how our method uses the image patch atom to remove haze from a single image. It consists of four essential steps: normalizing the hazy image, extracting patches from the hazy image, estimating approximate clear image patches with a deep convolutional neural network, and removing color distortion and block artifacts (see Algorithm 1).

Fig. 4. Intermediate and final results of our method: (a) an input hazy image; (b) the output image; (c) the distance \(r(\mathbf {x})\) of every pixel of the hazy image to the airlight; (d) the estimated distance \(\tilde{r}(\mathbf {x})\); (e) the initial \(\tilde{t}(\mathbf {x})\); (f) the final \(t(\mathbf {x})\); (g) the guided filter output; (h) the dehazed result using transmission (g); (i) the contextual regularization output without the guided filter; (j) the dehazed result using transmission (i).

(1) Patch Extraction and Normalization: We estimate \( A \) using one of the previous methods [5, 17] and define \(\mathbf {I}_A\) as:

$$\begin{aligned} \mathbf {I}_A(\mathbf {x})=\mathbf {I}(\mathbf {x})-\mathbf {A}. \end{aligned}$$
(4)
$$\begin{aligned} \mathbf {I}_A(\mathbf {x})=||\mathbf {I}_A(\mathbf {x})||. \end{aligned}$$
(5)

In order to use the image patch atom described in Sect. 2, we extract patches from \(\mathbf {I}_A\) and then normalize them. In our method, the patch size is \(16\times 16\) and the patches are non-overlapping. The normalization converts each hazy image patch into the image patch atom shared by the haze-free image patch and the hazy image patch.
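The following sketch illustrates step (1) using the notation of Eqs. (4) and (5); the function and variable names are ours for illustration, not part of the original implementation.

```python
import numpy as np

def extract_atoms(I, A, patch=16):
    """Step (1): convert a hazy image into normalized 16x16 patch atoms.

    I : (H, W, 3) hazy image; A : (3,) estimated atmospheric light.
    Returns the atoms and the norm of each patch (kept for the later steps).
    """
    IA = I - np.asarray(A)                          # Eq. (4)
    H, W, _ = IA.shape
    H, W = H - H % patch, W - W % patch             # crop to a multiple of the patch size
    atoms, norms = [], []
    for y in range(0, H, patch):
        for x in range(0, W, patch):
            p = IA[y:y + patch, x:x + patch, :]
            n = np.linalg.norm(p)                   # Eq. (5) applied patch-wise
            atoms.append(p / (n + 1e-12))           # the image patch atom
            norms.append(n)
    return np.stack(atoms), np.array(norms)
```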

(2) Estimating Initial Clear Patches: Because weight sharing allows a relatively larger interaction range than fully connected structures, we choose a convolutional neural network (CNN) architecture. Our CNN architecture is very simple and can be implemented easily. It can be expressed as:

$$\begin{aligned} F^0(Y)=Y, \end{aligned}$$
(6)
$$\begin{aligned} F^n(Y)=\max (W^n* F^{(n-1)}(Y)+B^n,0), n=1,2, \end{aligned}$$
(7)
$$\begin{aligned} F_W(Y)=W^n* F^{(n-1)}(Y)+B^n, n=3, \end{aligned}$$
(8)

where n indexes the layers and ranges from 1 to 3. The network consists of an input layer and three convolutional layers: two hidden layers with rectified-linear activations and a linear output layer. The bottom layer, with index 0, is the input layer expressed by Eq. (6). Each intermediate layer, expressed by Eq. (7), performs a convolution over the nodes of the previous layer and their neighbors. By convention, \(*\) denotes the convolution operation, \(W^n\) the convolution kernel, and \(B^n\) the bias. The top layer \(F_W(Y)\) in Eq. (8) generates the initial clear patches from the network, from which we obtain the initial clear image.
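As an illustration, the PyTorch sketch below mirrors Eqs. (6)-(8) with two ReLU layers and a linear output layer; the kernel sizes and channel widths are our own assumptions, since they are not specified in the text above.

```python
import torch
import torch.nn as nn

class AtomToClearPatch(nn.Module):
    """Three convolutional layers as in Eqs. (6)-(8): two ReLU layers and a linear output."""
    def __init__(self):
        super().__init__()
        self.f1 = nn.Conv2d(3, 64, kernel_size=5, padding=2)   # Eq. (7), n = 1
        self.f2 = nn.Conv2d(64, 32, kernel_size=3, padding=1)  # Eq. (7), n = 2
        self.f3 = nn.Conv2d(32, 3, kernel_size=3, padding=1)   # Eq. (8), n = 3 (no activation)

    def forward(self, y):                  # y: (B, 3, 16, 16) image patch atoms, Eq. (6)
        h = torch.relu(self.f1(y))
        h = torch.relu(self.f2(h))
        return self.f3(h)                  # approximate clear patch, i.e. J(x) - A

net = AtomToClearPatch()
out = net(torch.randn(8, 3, 16, 16))       # output shape: (8, 3, 16, 16)
```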

(3) Removing Color Distortion and Block Artifacts: Our estimated result may drift from the line formed by \(\mathbf {I}(\mathbf {x})\) and \(\mathbf {A}\). In order to recover a high quality result, we apply a regularization operation. Because we recover \(\mathbf {J}(\mathbf {x})-\mathbf {A}\) from the image patch atom, we can combine this information with \(\mathbf {I}_A(\mathbf {x})\) to recover the initial transmission of each pixel. Geometrically, the haze model in Eq. (1) implies that, in RGB color space, the vectors \(\mathbf {I}(\mathbf {x})\), \(\mathbf {J}(\mathbf {x})\), and \(\mathbf {A}\) are coplanar and their end points are collinear. The transmission is the ratio of two line segments [5]:

$$\begin{aligned} \tilde{t}=r(\mathbf {x})/\tilde{r}(\mathbf {x}), \end{aligned}$$
(9)

where \(r(\mathbf {x})\) represents the distance in the RGB space of every pixel in the hazy image to the airlight, and \(\tilde{r}(\mathbf {x})\) represents the distance in the RGB space of our estimated clear pixel to the airlight. We define \(r_J(\mathbf {x})\) as follows:

$$\begin{aligned} r_J(\mathbf {x})=||\mathbf {\tilde{J}}(\mathbf {x})-\mathbf {A}||. \end{aligned}$$
(10)

We can replace \(\tilde{r}(\mathbf {x})\) with \(r_J(\mathbf {x})\) to obtain the initial transmission map \(\tilde{t}=r(\mathbf {x})/r_J(\mathbf {x})\), and then apply the guided filter to get a smooth transmission map. Because our method may leave some areas unpredicted, the result of the guided filter alone is not smooth enough, so we additionally apply the contextual regularization to the guided filter output to obtain the final transmission map.
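A hedged sketch of this step is given below: it computes the initial transmission of Eq. (9) from the network output and then smooths it with a minimal gray-guide guided filter. The clipping range, filter radius, and regularization constant are assumed values, and the contextual regularization step is omitted.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def initial_transmission(I, J_minus_A_hat, A, eps=1e-6):
    """Eq. (9): t~(x) = r(x) / r_J(x), with r = ||I - A|| and r_J = ||J^ - A|| (Eq. (10))."""
    r = np.linalg.norm(I - np.asarray(A), axis=2)
    r_J = np.linalg.norm(J_minus_A_hat, axis=2)
    return np.clip(r / (r_J + eps), 0.05, 1.0)      # lower bound 0.05 is an assumed safeguard

def guided_filter(guide, src, radius=30, eps=1e-3):
    """Minimal gray-guide guided filter (He et al.) used to smooth the transmission map."""
    mean = lambda x: uniform_filter(x, size=2 * radius + 1)
    mg, ms = mean(guide), mean(src)
    a = (mean(guide * src) - mg * ms) / (mean(guide * guide) - mg * mg + eps)
    b = ms - a * mg
    return mean(a) * guide + mean(b)
```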

(4) Dehazing: Once the transmission map is estimated, we can recover the haze-free image using Eq. (1) as:

$$\begin{aligned} \mathbf {J}(\mathbf {x})=\frac{\mathbf {I}(\mathbf {x})-\mathbf {A}}{t(\mathbf {x})}+\mathbf {A}. \end{aligned}$$
(11)

Figure 4 shows an example of our method, which is summarized in Algorithm 1. We find that for images whose sizes are not divisible by 16, the contextual regularization needs to be applied to obtain a smooth transmission map.
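For completeness, a minimal sketch of the recovery step in Eq. (11); the lower bound on the transmission is a common safeguard that we assume here, not a detail stated above.

```python
import numpy as np

def dehaze(I, t, A, t_min=0.1):
    """Recover the haze-free image via Eq. (11): J = (I - A) / t + A."""
    t = np.maximum(t, t_min)[..., np.newaxis]
    return np.clip((I - np.asarray(A)) / t + np.asarray(A), 0.0, 1.0)
```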

Algorithm 1.
Fig. 5. Comparison on indoor hazy images. The left number is the SSIM value and the right number is the L1ERR.

In general, deep models [19] need a vast amount of labelled data to fit the parameters of the network. In this paper, we seek a way to reduce the amount of training data. By studying the hazy image patch and the haze-free image patch, we find that a single image patch atom can be used to generate both, and we exploit this to reduce the amount of training data. Training a network to remove haze from a single input image is particularly hard because pairs of haze-free and hazy images are difficult to obtain. We adopt the same assumptions as [13]: first, the image content and the medium transmission are independent of each other; second, the pixels in a local patch share the same transmission. Based on these two assumptions, Cai et al. [14] assigned an arbitrary transmission to each individual haze-free image patch: for a haze-free patch \(\mathbf {J}^p\), they assumed \(t \in (0,1]\) and generated a hazy patch \(\mathbf {I}^p\) according to the haze model \(\mathbf {I}^p=t \times \mathbf {J}^p+(1-t) \times \mathbf {A}\). In contrast, based on the relation among the haze-free image patch, the hazy image patch, and the image patch atom, we generate one pair of image patch atom and haze-free image patch; because the normalization eliminates the influence of the transmission, a single pair per haze-free patch is sufficient, which differs from Cai et al.'s method. According to [14], they collected 10000 haze-free image patches from the Internet and, for each patch, uniformly sampled 10 random transmissions \(t \in (0,1]\) to generate 10 hazy patches, giving a training dataset of 100000 image patches. In contrast, our training data needs only one image patch atom per haze-free patch, so for a training dataset with the same number of image patches, ours includes more diversity. Therefore, our method can achieve better results than Cai et al.'s.
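The difference in training data preparation can be summarized in a short sketch (function names are ours): because normalization removes the transmission, one atom/clear-patch pair per haze-free patch is enough, whereas Cai et al. [14] synthesize ten hazy patches per clear patch via Eq. (1).

```python
import numpy as np

def training_pair(J_patch, A):
    """One (image patch atom, clear patch) pair per haze-free patch.

    Because normalization removes the transmission (Eq. (3)), the atom of every
    hazy version of J_patch equals N(J_patch - A), so no hazy patch is synthesized.
    """
    v = J_patch - np.asarray(A)
    atom = v / (np.linalg.norm(v) + 1e-12)
    return atom, J_patch

# For comparison, Cai et al. [14] sample ~10 transmissions per clear patch and
# synthesize 10 hazy patches via Eq. (1); here a single pair per patch suffices.
```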

4 Experimental Results

In this section, we evaluated our method on a large dataset containing both synthetic and natural images and compared its performance with state-of-the-art methods [5, 9, 14, 15, 18]. First, we present a comprehensive comparison with other state-of-the-art methods on indoor synthetic hazy images; second, on outdoor synthetic hazy images; and third, on natural images. We used \(L1ERR=\frac{1}{N}\sum _{c \in \{R,G,B\}} |\mathbf {J}^c-\mathbf {G}^c|\) as a metric, where \(\mathbf {J}\) denotes the dehazed image and \(\mathbf {G}\) the ground truth image. In order to evaluate the dehazing methods, we generated an indoor hazy image dataset based on the indoor RGBD dataset [20], using \(\mathbf {A}=[0.78,0.78,0.78]\) and three values of \(\beta \): 0.06, 0.3, and 0.5. The outdoor image dataset is obtained from [9].
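For reproducibility, a minimal implementation of the two metrics is sketched below; it assumes scikit-image for SSIM and reads N as the total number of pixel-channel samples, which is our interpretation of the formula.

```python
import numpy as np
from skimage.metrics import structural_similarity

def l1err(J, G):
    """Mean absolute error over all pixels and R, G, B channels (our reading of N)."""
    return np.abs(J.astype(float) - G.astype(float)).mean()

def ssim(J, G):
    """SSIM [21] computed per channel and averaged, for images with values in [0, 1]."""
    return structural_similarity(J, G, channel_axis=-1, data_range=1.0)
```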

4.1 Evaluation on Guided Filter

In this subsection, we show that the output of our network has high quality. First, we apply a projection operation that projects each recovered pixel onto the line formed by \(\mathbf {I}\) and \(\mathbf {A}\); we denote this result by \(\mathbf {NR}\). Then we apply the guided filter to \(\tilde{t}\) to obtain a smooth transmission map and use it to recover a haze-free image, denoted by \(\mathbf {GR}\).
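The projection operator is not spelled out above; the sketch below gives our reading of it, projecting each estimated color onto the line through \(\mathbf {A}\) and \(\mathbf {I}(\mathbf {x})\) so that the result stays consistent with the haze model.

```python
import numpy as np

def project_to_haze_line(J_hat, I, A, eps=1e-12):
    """Project the estimated color onto the line through A and I(x), per pixel.

    J_hat, I : (H, W, 3) estimated clear image and hazy image; A : (3,).
    """
    d = I - np.asarray(A)                                   # direction of the haze line
    s = np.sum((J_hat - A) * d, axis=2, keepdims=True) / (np.sum(d * d, axis=2, keepdims=True) + eps)
    return A + s * d
```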

As shown in Table 1, the guided filter degrades quantitative accuracy but improves visual quality. The output of the network is more similar to the original image for both indoor and outdoor images, so the network output already contains enough information to recover a complete haze-free image. Because the guided filter reduces the quantitative performance of our method, a new way to reduce halos and artifacts is still needed.

Table 1. Quantitative comparison on indoor hazy images and outdoor hazy images. Red color indicates the best result, and blue color indicates the second best.
Fig. 6. Comparison on outdoor hazy images.

Fig. 7. Comparison on natural images: (left) input images; (right) our results. The middle columns display results obtained by several methods, since each paper reports results on a different set of images.

4.2 Tests on Synthetic Hazy Images

In this subsection, we compare our method with state-of-the-art methods on both indoor and outdoor synthetic hazy images. First, we list the overall comparison results; second, we show results on selected images from Fattal's dataset and from our dataset.

An outdoor synthetic hazy image dataset was introduced by [9] and is available online. To evaluate dehazing methods on indoor hazy images, we generated an indoor hazy image dataset based on the indoor RGBD dataset [20], using \(\mathbf {A}=[0.78,0.78,0.78]\) and three values of \(\beta \), 0.06, 0.3, and 0.54, to generate hazy images [15].
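A sketch of the synthesis procedure we assume here: the transmission is derived from the depth map as \(t(\mathbf {x})=e^{-\beta d(\mathbf {x})}\), the standard scattering model, and the hazy image then follows Eq. (1).

```python
import numpy as np

def make_hazy(J, depth, beta, A=(0.78, 0.78, 0.78)):
    """Synthesize a hazy image from an RGB-D pair, assuming t(x) = exp(-beta * d(x))."""
    t = np.exp(-beta * depth)[..., np.newaxis]
    return J * t + (1.0 - t) * np.asarray(A)

# One hazy image per scattering coefficient, as in the dataset described above:
# hazy_images = [make_hazy(J, depth, b) for b in betas]
```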

Indoor Hazy Images: In this part, we compared our method with Ren et al.'s [15] and Berman et al.'s [18]. The structural similarity (SSIM) index [21] was used to evaluate the performance of the methods; a higher SSIM value indicates a better dehazing result. First, we show some results using SSIM and L1ERR; second, we compare all images in our dataset using both metrics. For quantitative evaluation, we selected 5 images from our dataset; the results are shown in Fig. 5. We find that Berman et al.'s method [18] may overestimate the haze thickness in some lightly hazy regions of Input3321, while Ren et al.'s [15] may underestimate the haze thickness in some heavily hazy regions. In contrast, our method estimates the haze thickness more reasonably than both. We also compared the overall results on our dataset, shown in Table 2, where our method achieves the best performance. Out of the 4347 images, our method obtained the best SSIM on 1698 images and the best L1ERR on 2106 images.

Table 2. Quantitative comparison on our dataset. Red color indicates the best result, and blue color indicates the second best.

Outdoor Hazy Images: In this part, we also compared our method with state-of-the-art methods [5, 9, 18] on images from this dataset. The quantitative results are shown in Table 3. Our method produces results that are very similar to the ground truths in general and achieves the highest quality on particular images. In Fig. 6 we show results on four hazy images; our network output is very similar to the haze-free images.

Table 3. Quantitative comparison on road1. Red color indicates the best results and blue indicates the second.

4.3 Qualitative Evaluation on Natural Images

In this subsection, we compared our method with state-of-the-art methods. As noted in [5], the image after dehazing may look dim, since the scene radiance is usually not as bright as the airlight. For display, we performed a global linear contrast stretch on the output, clipping \(0.5\%\) of the pixel values in both the shadows and the highlights.
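A minimal sketch of this display step; the percentile-based clipping is our assumed implementation of the \(0.5\%\) clip.

```python
import numpy as np

def contrast_stretch(img, clip=0.5):
    """Global linear stretch, clipping `clip` percent of pixels in shadows and highlights."""
    lo, hi = np.percentile(img, (clip, 100.0 - clip))
    return np.clip((img - lo) / (hi - lo + 1e-12), 0.0, 1.0)
```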

Figure 7 compares our method with state-of-the-art methods [5, 12, 14, 15, 17, 22]. Some of the results are provided online by Fattal [9], Berman et al. [18], and Cai et al. [14]; we obtained additional results with the program provided by Ren et al. [15]. As shown in Fig. 7, Ancuti et al.'s method cannot remove the haze completely. He et al.'s method yields excellent results in general but lacks some micro-contrast details compared with [9] and ours; this is obvious in the zoomed-in buildings of the Cityscape results, where the windows in our result and in [9] are clearer than in [5]. We also find that Ren et al.'s result loses some details of the trees in Cityscape, whereas our method handles this area well and shows much better tree details. For the "train" image, the result of Zhang and Yao [22] cannot handle the boundaries between segments well, which produces many artifacts; Ancuti's result cannot remove the haze completely; and Fattal's and Berman's methods cannot handle the tree areas well. In contrast, our method deals with the tree areas well.

5 Conclusions

In this paper, we proposed a deep learning-based method for removing haze from a single input image. First, we studied the relation between the hazy image patch and the haze-free image patch, found that an image patch atom can be used to generate both, and used this relation to simplify the preparation of the training data. Second, we proposed a deep network to remove haze from a single input image and showed that it produces high quality qualitative and quantitative results. Third, we verified that the guided filter can reduce halos and artifacts but lowers the quantitative quality of the dehazing result. Finally, an extensive evaluation on different types of datasets demonstrated the high accuracy of the method. To improve our method, we will extend it by using the haze-line as a regularization: inspired by [18], an image can be represented by a few hundred distinct colors, which should reduce halos and artifacts.