See clearly on rainy days: Hybrid multiscale loss guided multi-feature fusion network for single image rain removal

The quality of photos is highly susceptible to severe weather such as heavy rain, which can also degrade the performance of visual tasks such as object detection. Rain removal is a challenging problem because rain streaks vary in appearance, even within a single image: regions where rain accumulates appear foggy or misty, while rain streaks are clearly visible in areas where rain is lighter. We propose a hybrid multiscale loss guided multiple feature fusion de-raining network (MSGMFFNet) to remove these various rain effects. Specifically, to deal with rain streaks, our method generates a rain streak attention map, while to address rain accumulation, preprocessing applies gamma correction and contrast enhancement to produce enhanced versions of the input. Using these tools, the model can restore results with abundant detail. Furthermore, a hybrid multiscale loss combining L1 loss and edge loss guides the training process to attend to both edge and content information. Comprehensive experiments on both synthetic and real-world datasets demonstrate the effectiveness of our method.


Introduction
Images captured by cameras are one of the most important information sources for intelligent transportation systems, video surveillance systems, self-driving systems, and similar applications. Bad weather such as heavy rain severely degrades image quality and the performance of downstream high-level vision tasks such as object detection, object tracking, and traffic flow monitoring. De-raining algorithms that remove the resulting artifacts as a preprocessing step help to maintain the performance of subsequent high-level vision tasks that rely on high-quality input.
However, rain removal is an ill-posed problem due to the variety of rain streaks, and because it is hard to determine where rain streaks occur. Different camera angles, wind direction, and light intensity result in different densities, directions, and shapes of rain streaks in captured pictures. In recent decades, rain removal from a single image has aroused the enthusiasm of many researchers due to its practical application value [1][2][3][4][5][6][7][8][9][10][11][12][13][14][15]. Traditional optimization based approaches [3,4,7,9,13] assume that a rainy image is made up of a rain streak layer and a clean background layer and treat it as a decomposition problem. These methods find it difficult to select effective features. Recently, deep learning methods have greatly boosted the performance of computer vision tasks. As a result, several rain removal methods based on convolutional neural networks have been proposed [1,2,5,6,8,10,12,15,16]. Compared to traditional optimization methods, the quality of images predicted by deep learning methods has greatly improved. However, the results still have several deficiencies, as can be observed in Figs. 1(b) and 1(c). Most approaches tend to mistake background texture details for rain streaks or retain some rain streaks in rainy regions, as most existing single image de-raining approaches are not designed to fully consider the complexity of rain streaks and lack the ability to handle different types of rain streaks. These issues affect the features extracted by high level vision algorithms such as object detection, and degrade the final performance.
In this work, we propose a hybrid multiscale loss guided multiple feature fusion de-raining network (MSGMFFNet) to tackle these issues. During the rain removal task, rain streak regions and rain accumulation regions should be given more attention. However, we cannot simply treat them uniformly, because the characteristics of the two areas differ. Hence, a rain streak map is generated by the attention mechanism to locate the regions with rain streaks, while image enhancement algorithms are used to coarsely eliminate the influence of rain accumulation. We use gamma correction and contrast enhancement to pre-process the original input: contrast enhancement can improve global visibility but may lose details in bright areas (see Fig. 2(b)), while gamma correction can preserve details in bright regions (see Fig. 2(c)). We then fuse the de-rained results of the two enhanced inputs and the original input to generate the restored image under the guidance of the rain streak map and a feature-fusion confidence score. We also introduce a hybrid multiscale loss made up of $L_1$ loss and edge loss to focus on texture detail during training. Specifically, we extract features from different upsampled layers to compute multiscale outputs, and a convolutional Scharr filter layer is built to flexibly extract edge information. We calculate $L_1$ loss and edge loss on the multiscale outputs and their corresponding edge maps during training, not only to boost rain removal performance, but also to better preserve image texture details.
MSGMFFNet consists of an attention map learning module, a multi-input feature extraction module, a multiple feature fusion module, and a reconstruction module. The attention map learning module generates a rain streak map to distinguish between rainy and non-rainy regions. Under the guidance of the attention map, the multi-input feature extraction module pays more attention to rain streaks and texture information around them. To fully blend multiple features, the fusion module computes weight maps of the characteristics of each input. The reconstruction module restores the original resolution and produces the final de-rained output. The input of MSGMFFNet is a single image containing rain streaks without any other knowledge.
In summary, the key contributions of this work are:
• MSGMFFNet, a novel single image de-raining network that extracts and fuses multiple features derived from an original rainy image by contrast enhancement and gamma correction, to better predict the de-rained output;
• a new hybrid multiscale loss function based on $L_1$ loss and edge loss that focuses on multiscale texture information, to improve rain removal and preserve details; and
• comprehensive experiments on four challenging datasets, both synthetic and real-world, showing the superiority of our method, and an ablation study demonstrating the contribution of each component.

Related work
The single image de-raining problem can be seen as removing rain streaks from a rainy image to recover the clean background layer. A rainy image $O$ is modeled as the combination of a clean background layer $B$ and a rain streak layer $R$. A widely used mathematical formulation is thus:
$$O = B + R.$$
Recovering $B$ from $O$ alone is a highly ill-posed problem. Hence, various approaches use prior information to constrain the solution space and focus on the desired characteristics of the restored image. Kang et al. [7] tackled single image rain removal as a decomposition problem. They used a bilateral filter to decompose the rainy image into low-frequency and high-frequency parts, then used dictionary learning and sparse coding to further decompose the high-frequency part and produce a de-rained result. Chen and Hsu [4] designed a low-rank model to find spatio-temporally correlated rain streaks, which is less time-consuming. A Gaussian mixture model (GMM) is applied in Ref. [9] to provide patch-based priors that can accommodate diverse rain streaks. These methods can remove some rain streaks from captured images, but generally fail to remove all of the rain.
In recent years, a number of deep learning based single image de-raining methods have been proposed, achieving better performance than prior-based de-raining methods. Ref. [5] proposed a deep detail network that uses a guided filter to obtain a high-frequency detail layer as input and learns the negative residual, which narrows the range of the mapping between input and output. Yang et al. [12] designed a multi-task recurrent architecture to handle rain streak accumulation. Zhang and Patel [1] used estimated rain-density information to guide rain removal at different rain densities and scales. Ren et al. [2] proposed a simpler and more efficient rain removal network composed of residual blocks and a convolutional long short-term memory (LSTM) [17] unit; its recurrent architecture reduces the number of parameters while maintaining good performance. Yasarla and Patel [10] designed an uncertainty guided multiscale residual learning (UMRL) network that computes confidence maps to guide de-raining. Tan et al. [15] proposed a multiscale attentive residual network (MSAResNet) that is both location-aware and density-aware for detecting and removing rain streaks. Recently, Zhang et al. [18] designed a paired rain removal network for binocular cameras that can effectively extract and exploit semantic and multi-view information. In contrast to these methods, our proposed method extracts and aggregates features from multiple inputs derived from a single image to obtain more texture detail.
We now briefly review the multiscale learning methods that inspired our use of multiple input features and our design of the hybrid multiscale loss. Previous de-raining methods [1,12] use parallel dilated convolutions with different dilation factors to provide multiscale features and enlarge the receptive field; these methods exploit multiscale information at the feature level. In other image restoration tasks such as de-hazing [19,20], coarse-to-fine networks exploit multiscale knowledge. Ren et al. [19] proposed a multiscale fusion network that derives three inputs from the original hazy image using white balance, contrast enhancement, and gamma correction. In Ref. [20], a multiscale network and a multiscale loss function were designed for image de-hazing. Motivated by these works, we designed a hybrid multiscale loss guided multiple feature fusion de-raining network. We not only leverage multiscale information at the feature level, but also optimize multiscale outputs through the hybrid loss function. By taking advantage of multiple inputs, our network can recover images with rich details.

Overview
Considering the variety of rain streaks, we propose a hybrid multiscale loss guided multiple feature fusion de-raining network (MSGMFFNet) that uses features from multiple inputs to facilitate rain streak removal. The architecture of MSGMFFNet is illustrated in Fig. 3(a); it consists of four modules: (1) an attention map learning module that generates an attention map focusing on rain streaks and the texture detail around them, (2) a multi-input feature extraction module that extracts features from three inputs separately under the guidance of the attention map, (3) a multiple feature fusion module that computes weight maps for blending the features of the multiple inputs, and (4) a reconstruction module that restores the original resolution and predicts the final de-rained image. The attention map produced by the attention map learning module indicates the significance of different regions. We then use contrast enhancement and gamma correction to provide two further inputs with better global visibility. To exploit the distribution of rain streaks, we concatenate each of the three inputs with the predicted attention map and feed them into the multi-input feature extraction module to obtain features from the multiple inputs. The multiple feature fusion module automatically estimates weights for blending the features of the three inputs. The reconstruction module removes rain streaks and restores the clean image. A hybrid multiscale loss composed of $L_1$ loss and edge loss further enhances de-raining performance and ensures good detail preservation.

Attention map learning module
As explained in Section 1, the quality of images taken on rainy days is degraded by rain streaks and rain accumulation. To deal with rain streaks, we should pay different attention to rainy and non-rainy regions, to avoid over de-raining and loss of background texture. Conversely, treating rainy areas the same as non-rainy areas would leave rain streaks in the recovered images. Learning the distribution of rain streaks is therefore important. Visual attention mechanisms [21] can automatically learn areas of interest; under their guidance, local features are extracted more effectively for downstream tasks. We therefore design an attention map learning module that automatically locates regions covered by rain (see Fig. 3(a)). Following [2,22,23], a recursive computation that reuses the same module is introduced to reduce the number of parameters while maintaining good performance; the attention map learning module is thus a recursive network formed by unfolding the same architecture. A convolution-ReLU layer extracts shallow features. Because different images contain different rain streaks, and even a single image can contain various kinds of rain streak (see Fig. 4), we build a multiscale aggregation block (MSABlock) to extract multiscale features that better capture different rain streaks. Figure 3(b) shows details of the MSABlock: it consists of three dilated convolution layers and two normal convolution layers. We set the dilation factors to 1, 2, and 3, respectively, to enlarge the receptive field. To further boost the flow of features at different scales and reduce information loss, we concatenate all outputs of the dilated convolution layers and feed them into the next convolution layer. A convolutional gated recurrent unit (GRU) [24] follows, linking features across stages. Finally, a convolution-sigmoid layer generates the attention map with values in [0, 1], larger values indicating greater importance.
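To make the role of the dilation factors concrete, the NumPy sketch below implements a single-channel dilated convolution and checks how far each dilation factor reaches. It is an illustration, not the authors' code: the 3×3 kernel size, zero padding, single-channel setting, and the parallel arrangement of the three dilated layers in the hypothetical helper `msablock_like` are all assumptions, since the text only gives the dilation factors and the concatenation step.

```python
import numpy as np


def effective_kernel_size(k: int, d: int) -> int:
    """Spatial extent covered by a k x k kernel with dilation factor d."""
    return d * (k - 1) + 1


def dilated_conv2d(x: np.ndarray, w: np.ndarray, d: int) -> np.ndarray:
    """Single-channel 2D cross-correlation with dilation d and zero padding.

    The padding keeps the output the same size as the input (odd k assumed).
    """
    k = w.shape[0]
    pad = d * (k - 1) // 2
    xp = np.pad(x, pad)  # zero padding on all sides
    h, wd = x.shape
    out = np.zeros((h, wd), dtype=float)
    for i in range(k):
        for j in range(k):
            # Tap (i, j) samples the input at offset (i*d - pad, j*d - pad).
            out += w[i, j] * xp[i * d:i * d + h, j * d:j * d + wd]
    return out


def msablock_like(x: np.ndarray, kernels, dilations=(1, 2, 3)) -> np.ndarray:
    """Hypothetical MSABlock-style step: dilated convolutions applied in parallel,
    with their outputs stacked along a new channel axis (the concatenation)."""
    return np.stack([dilated_conv2d(x, w, d) for w, d in zip(kernels, dilations)])


if __name__ == "__main__":
    impulse = np.zeros((15, 15))
    impulse[7, 7] = 1.0
    ones = np.ones((3, 3))
    for d in (1, 2, 3):
        y = dilated_conv2d(impulse, ones, d)
        rows = np.nonzero(y.any(axis=1))[0]
        print(d, rows.max() - rows.min() + 1, effective_kernel_size(3, d))
    print(msablock_like(impulse, [ones, ones, ones]).shape)
```

Applied to an impulse, each dilation factor spreads the response over 3, 5, and 7 pixels respectively, while every layer still has only nine weights; this is why dilation is a cheap way to capture rain streaks of different widths and lengths.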
Figure 5 shows some generated attention maps. As can be seen, they focus on the rain streaks. Under the guidance of the distribution of rain streaks, our method restores images with rich detail.

Acquisition of multiple inputs
Because of rain streaks and rain accumulation, images captured in the rain suffer severe degradation. Inspired by de-hazing methods [25,26], we find that contrast enhancement can improve the global clarity of rainy images and coarsely eliminate the effect of rain accumulation. We take a linear combination, controlled by a parameter $\alpha$, of the input image $I_{in}$ and the average luminance of the image; we set $\alpha = 1.5$ in practice. As shown in Fig. 2(b), this removes some rain streaks and makes the image globally clearer. However, some texture details are blurred in bright areas.
Because contrast enhancement loses texture detail in bright areas, we obtain another input by gamma correction, a non-linear transformation:
$$I_{gc} = \beta\, I_{in}^{\gamma}.$$
We set $\beta = 1$ and $\gamma = 1.5$; the gamma-corrected image contains rich details in bright areas (see Fig. 2(c)). In summary, we use the original rainy image to ensure that the restored image has correct colors, while the two derived inputs provide texture detail for further de-raining.
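The two preprocessing steps can be sketched in NumPy as follows. This is a minimal sketch under stated assumptions: images are RGB floats in [0, 1]; the contrast enhancement is assumed to take the form $I_{ce} = \bar{I} + \alpha (I_{in} - \bar{I})$, where $\bar{I}$ is the mean Rec. 601 luma of the whole image (the text does not spell out the exact combination or luminance definition); results are clipped to [0, 1]. The helper `make_inputs` is hypothetical.

```python
import numpy as np


def contrast_enhance(img: np.ndarray, alpha: float = 1.5) -> np.ndarray:
    """Assumed form: I_ce = mean_lum + alpha * (I_in - mean_lum), clipped to [0, 1].

    img is an RGB float array of shape (H, W, 3) with values in [0, 1].
    """
    # Average luminance over the whole image (Rec. 601 luma weights; an assumption).
    luma = 0.299 * img[..., 0] + 0.587 * img[..., 1] + 0.114 * img[..., 2]
    mean_lum = luma.mean()
    return np.clip(mean_lum + alpha * (img - mean_lum), 0.0, 1.0)


def gamma_correct(img: np.ndarray, beta: float = 1.0, gamma: float = 1.5) -> np.ndarray:
    """Power-law transform I_gc = beta * I_in ** gamma, clipped to [0, 1]."""
    return np.clip(beta * np.power(img, gamma), 0.0, 1.0)


def make_inputs(img: np.ndarray):
    """Hypothetical helper: original, contrast-enhanced, and gamma-corrected inputs."""
    return img, contrast_enhance(img), gamma_correct(img)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    rainy = rng.random((64, 64, 3))
    original, enhanced, gamma = make_inputs(rainy)
    print(original.std(), enhanced.std(), gamma.mean())
```

With $\alpha > 1$, pixels are pushed away from the mean luminance, so bright pixels saturate at 1 after clipping, matching the loss of detail in bright areas noted above. With $\gamma = 1.5$, the slope $1.5\sqrt{x}$ of the power law exceeds 1 for $x > 4/9$, so differences among bright pixels are stretched while dark pixels are compressed, which is consistent with the gamma-corrected input retaining detail in bright regions.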

Multi-input feature extraction module
The multi-input feature extraction module aims to fully capture the characteristics of the original rainy image and of the two inputs obtained by contrast enhancement and gamma correction. Each of the three inputs is concatenated with the attention map generated by the attention map learning module and fed into one of three branches that share the same weights. As shown in Fig. 3(a), each branch has three parts: (1) a single convolution-instance normalization-ReLU layer that extracts shallow features, (2) two strided convolution-instance normalization-ReLU layers that downsample the feature maps to reduce computation, and (3) four multiscale residual blocks (MSRBlocks) that enlarge the receptive field and capture multiscale characteristics to better extract rain streaks. Details of the MSRBlock are shown in Fig. 3(c). The dilation factors of its three dilated convolution layers are set to 1, 2, and 3, respectively. To further boost the flow of feature information, the input and output of each layer are added and fed into the next layer. A skip connection also links the input of the MSRBlock to the output of its last layer. The outputs of the three branches are then fed into the multiple feature fusion module.

Multi-feature fusion module
We denote the features extracted above from the rainy image, contrast-enhanced image, and gamma-corrected image by $F_r$, $F_c$, and $F_g$, respectively. We follow Ref. [27], in which a gate structure automatically computes confidence scores of features for subsequent fusion. Our multiple feature fusion module first computes weight maps $W_r$, $W_c$, and $W_g$ that encode the importance of $F_r$, $F_c$, and $F_g$, respectively. The weighted features are then combined linearly:
$$F_f = W_r \odot F_r + W_c \odot F_c + W_g \odot F_g,$$
where $\odot$ denotes element-wise multiplication and $F_f$ is the fused result, which is fed into the reconstruction module to predict the de-rained image.
In practice, each of the three input features is fed into a convolution-instance normalization-ReLU layer with a 1×1 filter. The outputs are concatenated and fed into a convolution layer with a 3×3 filter to compute the weight maps. Finally, we multiply the weight maps by the corresponding input features and combine them linearly to obtain the fused features $F_f$. Example weight maps for $F_r$, $F_c$, and $F_g$ are shown in Fig. 2. The weight map of $F_c$ (Fig. 2(f)) covers the whole image, which is consistent with our intention of using the contrast-enhanced image to improve global visibility. The weight map of $F_g$ (Fig. 2(g)) concentrates on bright areas, which is consistent with our intention of using the gamma-corrected image to provide more detail in those areas.
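The gated fusion step can be sketched in NumPy as below. The sketch simplifies the described module and is not the authors' implementation: the per-input 1×1 convolution-instance normalization-ReLU layers are omitted, a 1×1 convolution stands in for the 3×3 layer that produces the weight maps, and each weight map is assumed to have a single channel that is broadcast over all feature channels.

```python
import numpy as np


def conv1x1(x: np.ndarray, weight: np.ndarray, bias: np.ndarray) -> np.ndarray:
    """1x1 convolution: x is (C_in, H, W), weight is (C_out, C_in), bias is (C_out,)."""
    return np.einsum("oc,chw->ohw", weight, x) + bias[:, None, None]


def fuse_features(f_r, f_c, f_g, weight, bias):
    """Sketch of F_f = W_r * F_r + W_c * F_c + W_g * F_g (element-wise products).

    The weight maps come from one convolution over the concatenated features;
    a 1x1 convolution replaces the 3x3 layer here for brevity (simplification).
    """
    stacked = np.concatenate([f_r, f_c, f_g], axis=0)  # (3C, H, W)
    w = conv1x1(stacked, weight, bias)                 # (3, H, W)
    w_r, w_c, w_g = w[0:1], w[1:2], w[2:3]             # one map per input
    fused = w_r * f_r + w_c * f_c + w_g * f_g          # broadcast over channels
    return fused, w


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    c, h, wd = 8, 16, 16
    f_r, f_c, f_g = (rng.standard_normal((c, h, wd)) for _ in range(3))
    weight = 0.1 * rng.standard_normal((3, 3 * c))
    fused, maps = fuse_features(f_r, f_c, f_g, weight, np.zeros(3))
    print(fused.shape, maps.shape)  # (8, 16, 16) (3, 16, 16)
```

Because the weight maps are computed from all three feature sets jointly, each spatial location can favor a different input, which is what allows the $F_c$ map to cover the whole image while the $F_g$ map concentrates on bright regions.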

Reconstruction module
The multiple feature fusion module produces the fused features F f and feeds them into the reconstruction module to increase the spatial resolution and generate the final de-rained image.
The reconstruction module consists of two ResBlocks [28], two deconvolution-instance normalization-ReLU layers and a convolution-tanh layer: see Fig. 3(a). During training, a convolution-tanh layer is used after the first upsampling operation to obtain an image at 0.5 scale of the input size for further computing the hybrid multiscale loss.

Loss for the multiscale loss guided multifeature fusion network
In the attention map learning module, we use the mean squared error (MSE) between the attention map predicted at each stage and a binary mask $M$. As the attention map learning module is recurrent, we denote the attention map produced at stage $t$ by $M_A^t$. We compute $M$ based on a widely used mathematical representation of rainy images [9,13,29]: first, the ground truth $B$ is subtracted from the rainy image $O$; second, an empirical threshold (30 in our experiments) is applied, so that pixels whose difference exceeds the threshold are assigned to the rainy region and all others to the non-rainy region. The loss of the attention map learning module, $L_A$, sums the MSE between $M_A^t$ and $M$ over the $N_1$ recurrences, with stage weights controlled by $\theta$ that set the importance of the different stages. In practice, we set $N_1$ to 4 and $\theta$ to 0.2.
As Fig. 6 shows, rain streaks and texture details are clearly visible in the edge map, and rainy and non-rainy regions can be easily distinguished by edge information. Original images contain a wealth of information, whereas edge maps focus on texture detail. We therefore design a new hybrid multiscale loss consisting of $L_1$ loss and edge loss, to restore the content of the original image while preserving texture detail. To generate the corresponding edge maps, we build a convolutional Scharr filter layer, denoted $S(\cdot)$. The reconstruction module produces $O_1$ (at 0.5 scale of the input) and $O_2$ (at full scale). The hybrid multiscale loss $L_H$ sums, over the $N_2$ output scales, an $L_1$ term between each output and the ground truth $B$ at the matching scale and an edge term between their edge maps, where $\theta_i$ weights the importance of scale $i$, and $\nabla_x(\cdot)$ and $\nabla_y(\cdot)$ are the derivative operations along the horizontal ($x$) and vertical ($y$) directions used in the edge term. We set $N_2$ to 2 and $\theta_1$, $\theta_2$ to 0.2 and 0.8, respectively. Two further loss functions, perceptual loss [30] and SSIM loss [31], are used to further ensure visual quality.
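The mask generation, attention loss, and hybrid multiscale loss can be sketched in NumPy as follows. The sketch illustrates one plausible implementation and relies on assumptions not confirmed by the text: grayscale images for the losses; 8-bit intensities for the threshold of 30, with a pixel counted as rainy if any color channel exceeds it; stage weights $\theta^{N_1 - t}$ for the attention loss; Scharr $x$ and $y$ responses as the derivative terms; $L_1$ and edge terms weighted 1:1 (as in the $L_1 + L_{edge}$ ablation variant); and 2×2 average pooling to obtain the half-scale ground truth.

```python
import numpy as np

# 3x3 Scharr kernels for horizontal (x) and vertical (y) derivatives.
SCHARR_X = np.array([[3.0, 0.0, -3.0], [10.0, 0.0, -10.0], [3.0, 0.0, -3.0]])
SCHARR_Y = SCHARR_X.T


def filter3x3(x: np.ndarray, k: np.ndarray) -> np.ndarray:
    """3x3 cross-correlation with edge padding (a fixed, non-learned conv layer)."""
    xp = np.pad(x, 1, mode="edge")
    h, w = x.shape
    out = np.zeros((h, w))
    for i in range(3):
        for j in range(3):
            out += k[i, j] * xp[i:i + h, j:j + w]
    return out


def rain_mask(rainy: np.ndarray, clean: np.ndarray, threshold: float = 30.0) -> np.ndarray:
    """Binary mask M: 1 where |O - B| exceeds the threshold (8-bit intensities)."""
    diff = np.abs(rainy.astype(float) - clean.astype(float))
    if diff.ndim == 3:
        diff = diff.max(axis=-1)  # assumption: rainy if any channel exceeds it
    return (diff > threshold).astype(float)


def attention_loss(maps, mask, theta: float = 0.2) -> float:
    """Sum of per-stage MSE terms; the weight theta ** (N1 - t) is an assumption."""
    n1 = len(maps)
    return float(sum(theta ** (n1 - t) * np.mean((m - mask) ** 2)
                     for t, m in enumerate(maps, start=1)))


def avg_pool2(x: np.ndarray) -> np.ndarray:
    """2x2 average pooling, used here to obtain the half-scale ground truth."""
    h, w = x.shape[0] // 2 * 2, x.shape[1] // 2 * 2
    x = x[:h, :w]
    return 0.25 * (x[0::2, 0::2] + x[1::2, 0::2] + x[0::2, 1::2] + x[1::2, 1::2])


def hybrid_multiscale_loss(outputs, target, thetas=(0.2, 0.8)) -> float:
    """outputs = [O_1 (half scale), O_2 (full scale)]; target = full-scale B."""
    targets = [avg_pool2(target), target]
    total = 0.0
    for theta, o, b in zip(thetas, outputs, targets):
        content = np.mean(np.abs(o - b))  # L1 term
        edge = (np.mean(np.abs(filter3x3(o, SCHARR_X) - filter3x3(b, SCHARR_X)))
                + np.mean(np.abs(filter3x3(o, SCHARR_Y) - filter3x3(b, SCHARR_Y))))
        total += theta * (content + edge)  # L1 and edge terms weighted 1:1
    return float(total)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    clean = rng.random((32, 32))
    restored = clean + 0.05 * rng.standard_normal((32, 32))
    print(hybrid_multiscale_loss([avg_pool2(restored), restored], clean))
```

Because the Scharr kernels are fixed, the edge term adds no trainable parameters; in a deep learning framework the same kernels would typically be loaded into a frozen convolution layer so that gradients flow through the edge maps.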
A pre-trained VGG-19 model is used to extract features from the restored image and the corresponding ground truth. We denote the features extracted from the $i$-th ReLU layer by $P_i(\cdot)$, and we linearly combine the outputs of different ReLU layers. The perceptual loss is a weighted sum, over $m$ ReLU layers, of the differences between $P_i(O_2)$ and $P_i(B)$, where $\theta_i$ is the importance of the corresponding layer. Here, we set $m$ to 5 and $\theta_1, \ldots, \theta_5$ to 1/32, 1/16, 1/8, 1/4, and 1, respectively. The SSIM loss is used to ensure structural similarity between the ground truth and the de-rained image, and is computed from
$$\mathrm{SSIM}(O_2, B) = \frac{(2\mu_{O_2}\mu_B + C_1)(2\sigma_{O_2 B} + C_2)}{(\mu_{O_2}^2 + \mu_B^2 + C_1)(\sigma_{O_2}^2 + \sigma_B^2 + C_2)},$$
where $\mu_{O_2}$ and $\mu_B$ are the mean values of the restored image $O_2$ and the ground truth $B$, respectively, $\sigma_{O_2}^2$ and $\sigma_B^2$ are their variances, and $\sigma_{O_2 B}$ is their covariance. The two small constants $C_1$ and $C_2$ prevent division by zero.
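The SSIM term can be sketched in NumPy directly from the means, variances, and covariance described above. Two assumptions should be noted: the statistics are computed over the whole image, as the description reads (the standard SSIM of Wang et al. computes them in local, usually Gaussian, windows and averages the result), and the loss is assumed to be $1 - \mathrm{SSIM}$, a common choice that the text does not confirm. The constants use the usual $K_1 = 0.01$ and $K_2 = 0.03$ scaled by the data range.

```python
import numpy as np


def global_ssim(o: np.ndarray, b: np.ndarray, data_range: float = 1.0) -> float:
    """SSIM from whole-image means, variances, and covariance."""
    c1 = (0.01 * data_range) ** 2  # usual choice K1 = 0.01
    c2 = (0.03 * data_range) ** 2  # usual choice K2 = 0.03
    mu_o, mu_b = o.mean(), b.mean()
    var_o, var_b = o.var(), b.var()
    cov_ob = np.mean((o - mu_o) * (b - mu_b))
    num = (2 * mu_o * mu_b + c1) * (2 * cov_ob + c2)
    den = (mu_o ** 2 + mu_b ** 2 + c1) * (var_o + var_b + c2)
    return float(num / den)


def ssim_loss(o: np.ndarray, b: np.ndarray) -> float:
    """Assumed loss form: 1 - SSIM, so a perfect restoration gives 0."""
    return 1.0 - global_ssim(o, b)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    clean = rng.random((32, 32))
    noisy = np.clip(clean + 0.1 * rng.standard_normal((32, 32)), 0.0, 1.0)
    print(ssim_loss(clean, clean), ssim_loss(noisy, clean))
```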
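The overall objective function during training is
$$L = \lambda_A L_A + \lambda_H L_H + \lambda_P L_P + \lambda_{SSIM} L_{SSIM}, \tag{10}$$
where $L_A$, $L_H$, $L_P$, and $L_{SSIM}$ are the attention map loss, hybrid multiscale loss, perceptual loss, and SSIM loss, and each $\lambda$ sets the importance of the corresponding term. In practice, $\lambda_A$, $\lambda_H$, $\lambda_P$, and $\lambda_{SSIM}$ are set to 1, 8, 8, and 12, respectively.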

Experiments
In this section, we describe experiments evaluating our proposed MSGMFFNet on three synthetic rainy datasets, a real-world rainy dataset, and a task-driven evaluation dataset. On the synthetic datasets, we conduct comparative qualitative and quantitative evaluations. Because each rainy image in the synthetic datasets has a corresponding clean ground truth image, PSNR [32] and SSIM [33] are used as full-reference image quality metrics for quantitative comparison of rain removal performance. Because ground truth is difficult to obtain for real-world images, we only show de-rained images for qualitative evaluation. A task-driven evaluation dataset is used to compare object detection performance before and after rain removal.
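For reference, PSNR is $10 \log_{10}(\mathrm{MAX}^2 / \mathrm{MSE})$ in decibels. The short NumPy sketch below computes it for 8-bit images; it assumes $\mathrm{MAX} = 255$ and computes the metric over all channels, since the text does not say whether PSNR is evaluated on RGB or on a luminance channel, and that choice changes the reported values.

```python
import numpy as np


def psnr(restored: np.ndarray, reference: np.ndarray, max_val: float = 255.0) -> float:
    """PSNR in dB: 10 * log10(MAX^2 / MSE)."""
    # Cast to float first so that uint8 subtraction cannot wrap around.
    mse = np.mean((restored.astype(np.float64) - reference.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return float(10.0 * np.log10(max_val ** 2 / mse))


if __name__ == "__main__":
    reference = np.zeros((8, 8), dtype=np.uint8)
    restored = np.full((8, 8), 5, dtype=np.uint8)
    print(round(psnr(restored, reference), 2))  # MSE = 25 -> 34.15 dB
```

Because PSNR is logarithmic in the MSE, a gain of 1 dB corresponds to roughly a 21% reduction in mean squared error, which helps when reading the ablation improvements reported in decibels.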
Following Ref. [34], we use mean average precision (mAP) to compare de-raining methods on the detection task. We compare our proposed MSGMFFNet with four state-of-the-art methods.

Datasets
Previous de-raining work has published some synthetic datasets [1,12,14,34]. In this paper, we chose the DID-MDN dataset provided by Ref. [1] as our training dataset, Train. The Train dataset contains 12,000 synthetic rainy images with diverse shapes, scales, and densities of rain streaks. Our evaluation uses three synthetic test sets provided by Refs. [1,14]. The DID-MDN test dataset is composed of two subsets with 1000 and 1200 images, which we denote Test1 and Test2, respectively. The Rain800 test dataset from Ref. [14], denoted Test3 here, was chosen as our third test dataset; it has 100 synthetic rainy images with rain streaks of differing shape and density.
Real-world rainy images are provided by Refs. [12,14]; we use them as our real-world rainy dataset when evaluating the performance of our proposed method. Most of these images were collected from the Internet and show different scenes containing diverse rain streaks.
Many high-level computer vision tasks are severely affected by rain. The RIS dataset [34], captured by traffic surveillance cameras in rainy weather, was used to study the benefit of de-raining algorithms for an object detection task. The RIS set contains 2048 relatively low-resolution images taken by traffic cameras and is annotated with object bounding boxes. Since 24 images lack label information, we used the remaining 2024 images as our test set to study the effectiveness of rain removal algorithms in a real task.

Training
During training, we randomly cropped 256 × 256 patches from the original rainy images and the corresponding clean background images. Gamma correction and contrast enhancement were used to obtain the other two inputs. The model was trained with the Adam [35] optimizer, with momentum parameters $\beta_1 = 0.5$ and $\beta_2 = 0.999$ and weight decay $5 \times 10^{-5}$. The batch size was set to 4. We initialized the learning rate to $2 \times 10^{-4}$ and multiplied it by 0.5 and 0.2 at epochs 30 and 35, respectively. Training ended after 46 epochs.
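The learning-rate schedule can be written as a small function. This sketch assumes 0-indexed epochs and cumulative decay, so the rate goes from $2 \times 10^{-4}$ to $10^{-4}$ at epoch 30 and to $2 \times 10^{-5}$ at epoch 35; the text does not state the indexing convention.

```python
def learning_rate(epoch: int, base_lr: float = 2e-4) -> float:
    """Step schedule: x0.5 from epoch 30, then a further x0.2 from epoch 35.

    Assumes 0-indexed epochs and cumulative decay (2e-4 -> 1e-4 -> 2e-5).
    """
    lr = base_lr
    if epoch >= 30:
        lr *= 0.5
    if epoch >= 35:
        lr *= 0.2
    return lr


if __name__ == "__main__":
    for epoch in (0, 29, 30, 34, 35, 45):  # 46 epochs: 0..45
        print(epoch, learning_rate(epoch))
```

In PyTorch, the stated optimizer settings would correspond to `torch.optim.Adam(params, lr=2e-4, betas=(0.5, 0.999), weight_decay=5e-5)`. Because the two decay factors differ, `MultiStepLR` (which applies one `gamma` at every milestone) does not fit directly, whereas `LambdaLR` with `lr_lambda=lambda e: learning_rate(e) / 2e-4` reproduces this schedule.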

Comparison on synthetic datasets
Qualitative and quantitative evaluations on Test1, Test2, and Test3 were conducted to compare our proposed MSGMFFNet with the following single image rain removal algorithms: JORDER [12], DID-MDN [1], PReNet [2], and UMRL [10]. All methods were run at inference time in the same test environment, and for fairness we used the published pre-trained models of the compared methods. Figure 7 shows the results of different methods on the synthetic dataset Test1. Some methods, such as Refs. [2,12], do not remove rain streaks completely, while others, such as Refs. [1,10,12], tend to over-smooth texture details and reduce the clarity of restored images. In contrast, our method removes rain more thoroughly and retains texture detail better than the other methods; for example, it completely removes rain streaks in shadowed areas of vegetation. Figure 8 presents de-rained images for different methods on Test1, Test2, and Test3. Our method outperforms the other methods on diverse rain streaks of different shapes, directions, and densities, and this visual comparison shows that it removes rain streaks effectively while preserving contextual details well. Table 2 lists the average PSNR and SSIM of the different methods on Test1, Test2, and Test3; our approach outperforms its peers on both metrics. This demonstrates the effectiveness of our method, which uses multiple input features and multiscale information.

Real-world rainy dataset
To verify the effectiveness of rain removal in real-world rain environments, we conducted experiments on the real-world rainy dataset and present some results for visual comparison. Figure 9 shows restored results for images with diverse rain streaks. In particular, our method removes the rain streaks on the black dress marked with a box in the second row of Fig. 9, while the other methods leave some rain streaks there. The tattoo in the third row of Fig. 9 can be seen clearly after applying our method, whereas the other approaches produce artifacts or leave rain streaks in the restored images. The other state-of-the-art methods tend either to over de-rain, causing image blurring, or to under de-rain, leaving rain streaks behind, while our method both removes rain streaks and preserves texture details.

Task-driven evaluation dataset
To further study the performance of de-raining methods, we used our proposed algorithm and four other algorithms to preprocess the task-driven dataset captured on rainy days. Three state-of-the-art object detection models, Faster R-CNN [36], SSD-512 [37], and RetinaNet [38], were then used to detect objects. Our goal was to test the detection performance of general detection methods not designed for rainy weather after rain removal, rather than to retrain an object detector suited to rainy days. We used models trained on MS COCO provided by the MMLab Detection Toolbox [39], and used mean average precision (mAP) to compare the detection results. Table 1 presents the mAP detection results for single image rain removal methods using the three detection models. Figure 10 shows some restored images after rain removal and object detection. Because of over- or under-de-raining, the restored images produced by Refs. [1,2,10,12] contain artifacts that lead to an even lower mAP than that of the original rainy images. This shows that preserving texture detail during single image rain removal is of great importance. Because the images in this dataset were captured by relatively low-resolution surveillance cameras, the dataset itself is very challenging. Our method achieves better results than the other methods, but there is still considerable room for improvement in producing the features that detection algorithms rely on, as opposed to those that matter to human vision.

Ablation study
We now present an ablation study showing the utility of the different components of our proposed MSGMFFNet. We focus on the following components: (1) the attention map learning module, $f_{atten}$, (2) the multiple inputs, $f_{multi}$, and (3) the hybrid multiscale loss function, $L_H$. We started with an architecture composed of an attention map learning module, a multi-input feature extraction module, and a reconstruction module, denoted $f_{atten} + f_{single}$; it uses only one branch of the feature extraction module, since only the original rainy image together with the attention map is fed into the network. We then trained the network $f_{multi}$ without the guidance of attention maps; it consists of a multi-input feature extraction module, a multiple feature fusion module, and a reconstruction module. Next, we added an attention map learning module to $f_{multi}$, denoted $f_{atten} + f_{multi}$. Finally, we added the hybrid multiscale loss function $L_H$ to $f_{atten} + f_{multi}$ to obtain the final model $f_{atten} + f_{multi} + L_H$. These four architectures were evaluated on Test1.
Note that when the $L_H$ loss was not used, we used an $L_1$ loss with the same weight as the $L_H$ loss to compute the difference between the final restored image and the ground truth. Table 3 shows the results of the quantitative evaluation. Performance improves by 1.30 dB when using multiple input features. The guidance of the attention map improves performance by 1.20 dB, and the hybrid multiscale loss function improves the restored results by 0.62 dB. Figure 11 presents some de-rained images for the four networks; the final network clearly provides better rain removal. These quantitative and qualitative evaluations confirm that each component designed in this paper contributes to the performance of the final network. We further present an ablation study on the proposed loss function to validate the parameter choices. The hybrid multiscale loss has four parameters, as described in Eqs. (6) and (10): $N_2$, $\theta_1$, $\theta_2$, and $\lambda_H$. First, we examine the effectiveness of the hybrid loss. To do so, we change only the $\lambda_H$-weighted term in Eq. (10). To make the effects of different losses easier to observe, we compute the losses only at the final output scale. We replace $L_H$ with $L_1$ alone, with the edge loss alone (denoted $L_{edge}$), and with $L_1$ and edge losses combined in the ratio 1:1 (denoted $L_1 + L_{edge}$). We also trained a model with this term removed from the objective entirely.
As Table 4 shows, $L_1 + L_{edge}$ achieves the best performance, demonstrating the effectiveness of the hybrid loss. Next, we evaluated the de-raining results of MSGMFFNet when different scales are used to calculate the losses. During reconstruction, MSGMFFNet has three scales: 1/4, 1/2, and full scale of the input size, denoted Scale$_{1/4}$, Scale$_{1/2}$, and Scale$_1$. We evaluated models trained using Scale$_1$ alone, Scale$_{1/2}$ and Scale$_1$ mixed in the ratio 1:1, and Scale$_{1/4}$, Scale$_{1/2}$, and Scale$_1$ mixed in the ratio 1:1:1 to compute the hybrid loss. In each case, the weights sum to 1 so that this term has the same order of magnitude in all three experiments. Table 5 shows that the PSNR of results generated by MSGMFFNet using multiscale outputs to compute the hybrid loss is much higher than without multiscale outputs, while the SSIM values are similar. Because using all three scales does not improve performance, we use $N_2 = 2$ to compute the hybrid loss.
We also investigated the choice of $\theta_1$ and $\theta_2$ in Eq. (6), which trade off the effects of Scale$_{1/2}$ and Scale$_1$. We conducted three experiments in which the ratio $\theta_1 : \theta_2$ was 1:1, 1:4, and 4:1, respectively, with $\theta_1 + \theta_2 = 1$ so that the value of this loss term had the same order of magnitude in all three experiments. Table 6 shows the results: as $\theta_2$ increases, we obtain better results. However, when $\theta_2 = 1$ and $\theta_1 = 0$, the loss reduces to using only Scale$_1$, and performance decreases, as shown in Table 5. Hence, we fix $\theta_1$ to 0.2 and $\theta_2$ to 0.8 in this paper.
Finally, we considered the choice of $\lambda_H$ in Eq. (10), the weight determining the importance of the hybrid multiscale loss in the total loss function. During training, we want this loss to have roughly the same order of magnitude as the SSIM loss and the perceptual loss. We conducted three experiments with $\lambda_H$ set to 4, 8, and 12, respectively; the results are presented in Table 7. With increasing $\lambda_H$, MSGMFFNet achieves better results. However, when

Conclusions
In this paper, we proposed a hybrid multiscale loss guided multiple feature fusion de-raining network (MSGMFFNet) that detects rainy regions automatically and uses features from multiple inputs to produce a better de-rained image. The multiple inputs are obtained from the original rainy image by gamma correction and contrast enhancement, which enhance global visibility and provide more detailed information. Learned weight maps blend the features of the multiple inputs automatically to improve the final de-rained images. In addition, a hybrid multiscale loss combining $L_1$ loss and edge loss was designed to focus on content and texture information and further boost rain removal performance. In practice, we build a convolutional Scharr filter layer to obtain the edge maps flexibly.
Results of extensive experiments demonstrate that MSGMFFNet outperforms several leading methods both on synthetic and real-world rainy datasets.