1 Introduction

Power inspection is an essential component in ensuring the stable operation of the power grid. Currently, the mainstream inspection solution is to analyze images captured by monitoring equipment, either manually or with computer vision techniques [1,2,3]. However, images collected at night or in bad weather often suffer from insufficient lighting, resulting in reduced contrast and color distortion. These issues affect both the subjective perception of the observer and the detection accuracy of subsequent computer vision systems. Therefore, improving the quality of images captured in low-light scenes has become an important research topic.

The development of computer vision technology has led to numerous studies on improving low-light images with enhancement algorithms. Deploying low-light enhancement algorithms in practical engineering scenarios poses three main challenges: enhancement performance, scene adaptability, and inference efficiency.

Existing low-light enhancement algorithms struggle to address all three issues at the same time. The performance of traditional enhancement algorithms [4,5,6,7,8,9,10] depends heavily on the configuration of model parameters, which makes it difficult for them to handle varied scenarios. Although lightweight neural network-based algorithms have high adaptive capacity [11,12,13,14,15,16,17,18], they struggle to correct the color distortion and suppress the noise of low-light images due to the limitation of model size. Besides, the algorithms with outstanding enhancement performance and scene adaptability [19,20,21,22,23,24,25,26,27] have complex network structures, which makes it hard to rapidly enhance large images on edge devices with weak computational capability.

In this paper, we propose a two-stage end-to-end low-light enhancement model called Dual Fusion Enhancement Network (DFEN) for efficient enhancement of low-light images in grid inspection scenes. Referring to the imaging pipeline of digital cameras [28], which first performs signal amplification and then image signal processing, we decompose the low-light enhancement task into two stages of brightness enhancement and detail revision, implemented sequentially by two series-connected U-Nets. As the U-Net goes deeper, high-resolution details embedded in the low-level features tend to be partially lost after the scale transformation. Therefore, we introduce a dual branch feature fusion module that selectively reconstructs the same-scale features of the two-stage network through channel and spatial fusion branches. Furthermore, since lighting conditions differ significantly across regions of high dynamic range images, enhancing them according to a uniform standard may overexpose bright regions. A learnable regularized attention module is introduced to extract the illumination attention map of low-light images, which guides the model to adaptively enhance them.

The proposed method is validated on several datasets, including our self-built Dark Grids dataset with multiple scenarios to verify scene adaptability. The experimental results demonstrate that our algorithm meets the demand for rapid enhancement of high-resolution images while achieving superior results on a variety of evaluation metrics compared to state-of-the-art methods. Our contributions are as follows:

  1. We investigate a novel size-controllable low-light enhancement algorithm, DFEN. It decomposes the low-light enhancement task into two stages of brightness enhancement and detail revision, allowing it to focus on different goals in each stage and achieve better enhancement results.

  2. We adopt the dual branch feature fusion (DBFF) module to shorten the feature path of the algorithm and preserve high-resolution information, which selectively aggregates features of the same scale in the spatial and channel dimensions. This module effectively improves the texture detail preservation and color restoration ability of the DFEN model.

  3. We design the learnable regularized attention (LRA) module to balance the enhancement effect of different regions, which effectively suppresses overexposure in bright regions and further improves the scene adaptability of the algorithm.

  4. For nighttime grid inspection scenes, we construct a paired low-light enhancement dataset containing multiple scenarios, called the Dark Grids dataset, and the proposed DFEN outperforms state-of-the-art methods on several datasets including it.

2 Related work

According to their underlying principle, low-light image enhancement methods can be divided into two categories. One is based on the Retinex theory, which decomposes low-light images into illumination and reflection images for separate processing. The other enhances the low-light image directly without decomposition.

2.1 Retinex-based low-light image enhancement methods

Retinex theory treats the observed image as the product of the illumination component \(L\) and the reflection component \(R\), i.e., \(S = R \times L\), where \(R\) is not affected by the non-uniformity of light. According to the Retinex theory, we can decompose the low-light image into an illumination image and a reflection image, process them separately, and then fuse them to obtain the enhanced image.
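For illustration only, the following minimal numerical sketch shows this decompose-process-fuse idea; the Gaussian-blur estimate of the illumination and the function name `naive_retinex_decompose` are our own simplifications and do not correspond to any of the cited methods.

```python
# Illustrative sketch only: approximate the illumination L by a Gaussian-blurred
# version of the observed image S, recover the reflectance as R = S / L, and check
# that R x L reproduces S. Parameter values are arbitrary.
import numpy as np
from scipy.ndimage import gaussian_filter

def naive_retinex_decompose(s, sigma=15.0, eps=1e-6):
    """Split an image S (H x W, values in [0, 1]) into illumination and reflectance."""
    illumination = gaussian_filter(s, sigma=sigma)   # smooth estimate of L
    reflectance = s / (illumination + eps)           # R = S / L
    return illumination, reflectance

if __name__ == "__main__":
    s = np.random.rand(64, 64).astype(np.float32) * 0.2   # synthetic dark image
    L, R = naive_retinex_decompose(s)
    print(np.abs(s - R * L).max())                        # ~0, i.e. S = R x L up to eps
```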

Jobson et al. proposed Single-Scale Retinex (SSR) [4], Multi-Scale Retinex (MSR) [5], and Multi-Scale Retinex with Color Restoration (MSRCR) [6] in 1995, 1996, and 1997, respectively. SSR took the image filtered by a Gaussian surround function as the estimated illumination map. However, this method could not guarantee both the color fidelity of the image and the dynamic compression capability of the algorithm at the same time. To improve the robustness of SSR, MSR obtained the final illumination map by weighted averaging of multiple illumination maps computed with Gaussian kernels of different scales. MSRCR was proposed to solve the color-bias problem in SSR and MSR by introducing a color recovery factor C to adjust the ratio of the RGB channels. LIME [7], proposed by Guo in 2017, extracted the maximum value of the pixels in each color channel of the original image as the initial illumination map, and then optimized the illumination map with the Augmented Lagrangian Multiplier (ALM) method. In the same year, Ying et al. proposed a dual-exposure fusion algorithm, BIMEF [8], to avoid excessive contrast and lightness over-enhancement. BIMEF fused the input image with the best-exposure image generated by a camera response model according to an image fusion weight matrix to obtain the enhanced image. CRM [9], also proposed by Ying et al. in 2017, used the camera response model to adjust each pixel to the desired exposure based on an estimated exposure ratio map, which reduced color and brightness distortion.

Although the above methods have achieved decent results in some image enhancement tasks, their performance is heavily dependent on the selection of model parameters, which limits their application in varied scenarios. To improve generalizability, Retinex-based deep learning methods have been increasingly used in low-light image enhancement tasks. In 2017, Shen et al. proposed MSRNet [11], which transformed the MSR model into a feedforward convolutional neural network that could directly learn the end-to-end mapping from dark to bright images, but it was weak in noise suppression. RetinexNet [12], proposed by Wei et al. in 2018, first decomposes the input image with DecomNet, then adjusts the light of the illumination image with EnhanceNet, and finally synthesizes the processed images to obtain the enhancement result. However, these methods tend to denoise the full image indiscriminately with a single denoising module and struggle to handle regions with large differences in reflected illumination. In 2019, Zhang et al. proposed KinD [19], which eliminates the degradation of the reflection image with RestorationNet and adjusts the light intensity of the illumination image with AdjustmentNet. Later, KinD++ [20], released in 2020, presented a novel multi-scale illumination attention module (MSIA), which not only allowed targeted denoising according to the lighting conditions in different regions, but also effectively solved the color distortion problem. In 2021, Chen et al. proposed an up-sampling algorithm [21] for single low-light images. The algorithm enhances the illumination component and up-samples the reflectance component with two sub-networks, and then fuses the illumination and reflectance components based on the image gradient map, achieving better results in color reconstruction and texture feature preservation. In 2021, Wang et al. proposed a reversible normalizing flow model, LLFlow [24], which maps the illumination-invariant color distribution of the normally exposed image to a Gaussian distribution, aiming to extract local pixel correlations and global image features. LLFlow has a complex structure, and it can adaptively recover image illumination while suppressing noise and artifacts. In 2022, Ma et al. proposed SCI [18], composed of weight-sharing cascaded enhancement modules. SCI can greatly reduce the inference time while ensuring the vivid color of the enhanced output, but it is not capable of strong denoising. In 2023, Fu et al. concluded that the point-wise multiplication of the reflection and illumination components amplifies noise in low-light images, so a synthesis neural network module was used instead of the point-wise multiplication to obtain enhanced images [25]. They also used contrastive learning and self-knowledge distillation to constrain the network. Cai et al. proposed a transformer-based algorithm, Retinexformer [26]. It designs an illumination-guided transformer for the low-light enhancement task, which guides the modeling of long-range dependencies and interactions among regions with different lighting conditions according to the captured illumination information.

2.2 Direct low-light image enhancement methods

In addition to these Retinex-based methods, some methods do not need to decompose the input image, but rather enhance the original image directly.

In 2011, Dong et al. applied an image defogging algorithm to the inverted low-light image to achieve image enhancement [10]. However, the distributions of foggy images and inverted low-light images are not exactly the same, which limits the enhancement effect. In 2016, Lore et al. proposed LLNet [13] to attain adaptive low-light enhancement with a stacked sparse denoising autoencoder (SSDA). Due to its simple structure, it tends to blur image details. Chen et al. applied a data-driven approach [22] in 2018 to directly train a fully convolutional network on the SID dataset containing low-exposure images and corresponding high-exposure images. In the same year, Wang et al. proposed GLAD [14], which scales the original image and feeds it into an encoder-decoder network to estimate the global illumination, then fills in the detailed information lost during image scaling. GLAD is more effective in recovering the overall brightness, but it tends to cause color distortion.

In 2019, Jiang et al. proposed an unsupervised low-light image enhancement network, EnlightenGAN [17], which learns unpaired mappings between low-light and normal images, greatly reducing the reliance on paired datasets. However, EnlightenGAN has difficulty accurately recovering backlit regions, which easily leads to color bias and artifacts. Based on this, our previous work SRANet [27] further improved the supervised and adversarial training methods so that the algorithm could be trained with both paired and unpaired datasets, and we presented a noise reduction module based on Patch-GAN, which greatly suppresses the noise of unpaired images during the enhancement process. In 2020, MIRNet [23], proposed by Zamir et al., extracts a complementary set of features across multiple spatial scales, which not only ensures accurate spatial details but also provides strong contextualized representations. Zeng et al. proposed the Image-Adaptive-3DLUT algorithm [15] by combining 3DLUT with a CNN. It is lightweight and enhances high-resolution images in real time. Still, this method is less effective on highly noisy images due to the lack of a denoising module. Guo et al. proposed the Zero-Reference Deep Curve Estimation (Zero-DCE) [16] model. To obtain the best-fit light enhancement curve, they designed a set of loss functions for iterative curve parameter learning by implicitly evaluating the quality of each output image. Although Zero-DCE has high inference efficiency, it tends to cause edge flares and color distortion. In 2023, Yin et al. proposed a controllable light enhancement diffusion model, CLE Diffusion [29], which encodes the illumination information and utilizes a conditional diffusion model to achieve controlled light enhancement of the image. It also introduces the segment-anything model to allow the user to select regions of interest for enhancement.

In conclusion, the excellent performance of neural networks has made them the main approach to the low-light enhancement task, so our proposed DFEN also uses a convolutional neural network structure to enhance low-light images.

3 Proposed method

It is challenging to obtain satisfactory results when enhancing brightness and suppressing noise simultaneously. Therefore, we decompose the full enhancement task into two stages that accomplish brightness adjustment and detail revision in sequence. Figure 1 shows the architecture of our proposed end-to-end DFEN, and the main notations of this work are summarized in Table 1. The low-light input image \({I}_{\text{In}}\) is point-wise multiplied with the output of the first stage to obtain the brightened image \({I}_{\text{Mid}}\). This process is equivalent to multiplying each pixel by a luminance enhancement factor to adjust the brightness of \(I_{\mathrm{In}}\). Subsequently, \({I}_{\text{Mid}}\) is used as the input to the second stage, and we produce the enhancement result \({I}_{\text{Out}}\) by point-wise addition of \({I}_{\text{Mid}}\) with the output of the second stage. This stage adds a detail revision bias to each pixel, aiming at color correction and denoising of \({I}_{\text{Mid}}\).
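For concreteness, the following is a minimal PyTorch sketch of this two-stage composition. `PlaceholderUNet` and `TwoStageEnhancer` are hypothetical stand-ins for the two U-Nets of Fig. 1; only the multiply-then-add structure follows the description above.

```python
# Hedged sketch of the two-stage composition: stage 1 predicts a per-pixel luminance
# enhancement factor (applied multiplicatively), stage 2 predicts a per-pixel detail
# revision bias (applied additively). The tiny conv stacks stand in for the U-Nets.
import torch
import torch.nn as nn

class PlaceholderUNet(nn.Module):
    """Stand-in for a stage network mapping C x H x W to C x H x W."""
    def __init__(self, channels=3):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, channels, 3, padding=1),
        )

    def forward(self, x):
        return self.body(x)

class TwoStageEnhancer(nn.Module):
    def __init__(self):
        super().__init__()
        self.stage1 = PlaceholderUNet()  # brightness enhancement
        self.stage2 = PlaceholderUNet()  # detail revision

    def forward(self, i_in):
        i_mid = i_in * self.stage1(i_in)    # point-wise multiplication -> I_Mid
        i_out = i_mid + self.stage2(i_mid)  # point-wise addition -> I_Out
        return i_mid, i_out

if __name__ == "__main__":
    i_mid, i_out = TwoStageEnhancer()(torch.rand(1, 3, 256, 256))
    print(i_mid.shape, i_out.shape)
```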

Fig. 1

Overall architecture of DFEN. a shows the structure of the whole framework, where the InConv and OutConv layers are 1×1 convolutional layers for dimension transformation. The number of output channels of the InConv block is taken as the size-control hyperparameter, and the number of channels in each feature layer is obtained by multiplying the basic channel number by different coefficients. b gives the detail of the channel attention block (CAB). c displays the structure and samples of the learnable regularized attention (LRA) module

Table 1 Main notations and descriptions

Our proposed DFEN applies a dual U-Net structure with channel attention blocks (i.e., SE blocks [30]) to extract multi-scale features, which enables the model to learn richer contextual information from the input images [31, 32]. Subsequently, we adopt a dual branch feature fusion (DBFF) module that highlights the key feature information in the channel and spatial dimensions by weighted fusion, thus enhancing the color restoration and detail preservation ability of the model. In addition, we design a learnable regularized attention (LRA) module to fit the lighting conditions of the low-light images and guide the model to balance the enhancement effects for different regions. Finally, a cosine training strategy is introduced to gradually adjust the loss weights of the two-stage network, which leads to a smoother transition between the two-stage tasks and achieves better overall enhancement results.
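As a reference for the channel attention block mentioned above, a generic squeeze-and-excitation block [30] is sketched below; the exact layer configuration used inside DFEN's CAB is the one shown in Fig. 1b and may differ from this minimal form.

```python
# Generic SE block: global average pooling (squeeze), a two-layer 1x1 convolution
# bottleneck (excitation), and a sigmoid producing per-channel weights in [0, 1].
import torch
import torch.nn as nn

class SEBlock(nn.Module):
    def __init__(self, channels, reduction=4):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Sequential(
            nn.Conv2d(channels, channels // reduction, 1), nn.ReLU(),
            nn.Conv2d(channels // reduction, channels, 1), nn.Sigmoid(),
        )

    def forward(self, x):
        return x * self.fc(self.pool(x))   # reweight channels

if __name__ == "__main__":
    print(SEBlock(16)(torch.rand(1, 16, 32, 32)).shape)
```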

3.1 Dual branch feature fusion module

In order to compensate for the high-resolution information lost during the scale transformations of the U-Net and to shorten the feature path of the network, we adopt the dual branch feature fusion (DBFF) module shown in Fig. 2 in the encoder part of the second stage to perform weighted fusion of same-scale features from the encoder and decoder parts of the two-stage network.

Fig. 2

Structure of the dual branch feature fusion module

Inspired by the dual attention branch designed in SANet [33], DBFF splits the multiple input feature streams along the channel dimension and performs adaptive feature selection and aggregation in parallel in the channel fusion branch (CFB) and the spatial fusion branch (SFB). Finally, we concatenate the outputs of the two branches to obtain a better fused feature representation.

In the channel fusion branch, we first perform point-wise addition on the multiple input feature streams \(\left\{{Ci}_{1}, {Ci}_{2}, {Ci}_{3}\right\}\;\in\;{R}^{H\times W\times C}\), and then obtain \({C}_\text{Mid}\;\in\;{R}^{1\times 1\times C}\) by applying global average pooling, which embeds the spatial global information of input features. Subsequently, we obtain the inter-channel relationships through squeeze and excitation operations and generate the channel fusion weights Cw1, Cw2 and Cw3.

$$C_{\mathrm{Mid}}=\mathrm{GAP}(C_{\mathrm{add}})=\mathrm{GAP}(Ci_1+Ci_2+Ci_3),$$
(1)
$$Cw = F_{\mathrm{ex}}(Cs) = F_{\mathrm{ex}}(F_{\mathrm{sq}}(C_{\mathrm{Mid}})),$$
(2)

where \(F_{\mathrm{sq}}\) is a channel-downscaling convolution layer and \(F_{\mathrm{ex}}\) is a channel-upscaling convolution layer, with \(Cw\;\in\;{R}^{1\times 1\times 3C}\) and \(Cs\;\in\;{R}^{1\times 1\times \frac{C}{4}}\) as the typical setting in our method. The channel fusion weights \(\left\{{Cw}_{1}, {Cw}_{2}, {Cw}_{3}\right\}\;\in\;{R}^{1\times 1\times C}\) are split from \(Cw\).

Finally, we compute the fusion result of the channel dimension according to Eq. (3). Note that the SoftMax function is used to normalize the weights \({\alpha }_{c}\), \({\beta }_{c}\), \({\gamma }_{c}\) of the same channel in Cw1, Cw2, Cw3, which makes \({\alpha }_{c}+{\beta }_{c}+{\gamma }_{c}=1\).

$$Co=Cw_1\cdot Ci_1+Cw_2\cdot Ci_2+Cw_3\cdot Ci_3.$$
(3)

The spatial fusion branch is similar to the channel fusion branch. After point-wise adding \(\left\{{Si}_{1}, {Si}_{2}, {Si}_{3}\right\} \in {R}^{H\times W\times C}\), we perform global average pooling and global maximum pooling along the channel dimension and concatenate the results to obtain \({S}_{\text{Mid}}\in {R}^{H\times W\times 2}\). Then, we generate the spatial fusion weights \(\left\{{Sw}_{1}, {Sw}_{2}, {Sw}_{3}\right\} \in {R}^{H\times W\times 1}\) via a convolution layer with a kernel size of 3. Finally, we implement the spatial feature fusion based on the normalized Sw1, Sw2 and Sw3.

$$S_{\mathrm{add}}=Si_1+Si_2+Si_3,$$
(4)
$$S_{\mathrm{Mid}}=\mathrm{Concat}\left(\mathrm{GAP}\left(S_{\mathrm{add}}\right),\mathrm{GMP}\left(S_{\mathrm{add}}\right)\right),$$
(5)
$$So=Sw_1\cdot Si_1+Sw_2\cdot Si_2+Sw_3\cdot Si_3.$$
(6)

DBFF reconstructs the input feature streams in the channel and spatial dimensions, which effectively compensates for the high-resolution detail information lost as the network deepens, thus enhancing the color recovery and detail preservation capabilities of DFEN. Moreover, generating the fusion weights adaptively through global pooling and convolution operations can more accurately guide the model to emphasize the significant features of the input streams.
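The following PyTorch sketch follows Eqs. (1)–(6) under our reading of the text: the weights of the three streams are SoftMax-normalized per channel (per position), and the input streams are split into two equal channel groups, one per branch. Class names, layer widths, and the splitting ratio are assumptions.

```python
# Hedged sketch of the DBFF module: channel fusion branch (Eqs. 1-3),
# spatial fusion branch (Eqs. 4-6), and channel-split grouping of the inputs.
import torch
import torch.nn as nn

class ChannelFusionBranch(nn.Module):
    def __init__(self, channels, reduction=4):
        super().__init__()
        self.squeeze = nn.Sequential(nn.Conv2d(channels, channels // reduction, 1), nn.ReLU())
        self.excite = nn.Conv2d(channels // reduction, channels * 3, 1)

    def forward(self, ci1, ci2, ci3):
        c_mid = torch.mean(ci1 + ci2 + ci3, dim=(2, 3), keepdim=True)    # Eq. (1): GAP of the sum
        cw = self.excite(self.squeeze(c_mid))                            # Eq. (2): fusion weights
        cw = torch.softmax(cw.view(cw.size(0), 3, -1, 1, 1), dim=1)      # normalize over the 3 streams
        return cw[:, 0] * ci1 + cw[:, 1] * ci2 + cw[:, 2] * ci3          # Eq. (3)

class SpatialFusionBranch(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(2, 3, kernel_size=3, padding=1)

    def forward(self, si1, si2, si3):
        s_add = si1 + si2 + si3                                           # Eq. (4)
        s_mid = torch.cat([s_add.mean(dim=1, keepdim=True),
                           s_add.max(dim=1, keepdim=True).values], dim=1) # Eq. (5)
        sw = torch.softmax(self.conv(s_mid), dim=1)                       # three H x W weight maps
        return sw[:, 0:1] * si1 + sw[:, 1:2] * si2 + sw[:, 2:3] * si3     # Eq. (6)

class DBFF(nn.Module):
    def __init__(self, channels):
        super().__init__()
        half = channels // 2
        self.cfb = ChannelFusionBranch(half)
        self.sfb = SpatialFusionBranch()

    def forward(self, f1, f2, f3):
        # split each stream along the channel dimension; one group per branch
        (c1, s1), (c2, s2), (c3, s3) = (torch.chunk(f, 2, dim=1) for f in (f1, f2, f3))
        return torch.cat([self.cfb(c1, c2, c3), self.sfb(s1, s2, s3)], dim=1)

if __name__ == "__main__":
    f = [torch.rand(1, 32, 64, 64) for _ in range(3)]
    print(DBFF(32)(*f).shape)   # torch.Size([1, 32, 64, 64])
```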

3.2 Learnable regularized attention module

In our previous study, the quality of enhancement could be significantly improved by adding a simple self-regularized attention map, which allowed the model to appropriately enhance both light and dark areas of the image [27]. However, we find that when fixed coefficients are used to obtain the attention map, the contrast of the image may be lost, making it difficult to accurately distinguish regions of similar brightness but different colors, which results in color distortion in the enhanced images.

To solve this problem, we replace the fixed coefficients with an LRA module to acquire the single-channel attention map. We then concatenate it with the input image and feed them into the InConv layer together. At the same time, the attention map is progressively downsampled using the same downsampling method as in the U-Net structure in order to adapt to different feature scales. In addition, the attention maps are also multiplied with the features in the skip connections of the U-Net for better guidance. It is worth noting that the last activation of each attention block is a sigmoid function to limit the value range to [0, 1]. With the addition of the LRA module, DFEN can effectively suppress overexposure in bright regions and adaptively balance the enhancement of different regions, which will be verified in the ablation experiments.
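A minimal sketch of how such a learnable attention map could be generated and applied is given below; the exact block configuration is the one in Fig. 1c, so the layer widths, the `LRA` class name, and the average-pooling downsampling used here are only assumptions.

```python
# Hedged sketch: a small convolutional block ending in a sigmoid produces a
# single-channel illumination attention map in [0, 1]; the map is concatenated
# with the input image and downsampled to deeper scales for skip-feature guidance.
import torch
import torch.nn as nn
import torch.nn.functional as F

class LRA(nn.Module):
    def __init__(self, in_channels=3):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(in_channels, 8, 3, padding=1), nn.ReLU(),
            nn.Conv2d(8, 1, 3, padding=1), nn.Sigmoid(),   # single-channel map in [0, 1]
        )

    def forward(self, i_in):
        return self.block(i_in)

if __name__ == "__main__":
    lra = LRA()
    i_in = torch.rand(1, 3, 128, 128)
    att = lra(i_in)                               # illumination attention map
    x_in = torch.cat([i_in, att], dim=1)          # concatenated input for the InConv layer
    att_half = F.avg_pool2d(att, kernel_size=2)   # downsampled map for a deeper scale (assumed pooling)
    print(x_in.shape, att_half.shape)
```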

3.3 Cosine training strategy

As illustrated in Fig. 3, we establish distinct loss functions between the mid-output and output images and the reference image. This makes DFEN focus on different tasks during different epochs of training. Besides, we propose a cosine training strategy that dynamically adjusts the weights of the loss functions during training to make the transition between the two task stages smoother.

Fig. 3

Loss function configuration of DFEN

When training DFEN, we adopt the L1 loss and the SSIM loss [34] between the output image \(I_{{{\text{Out}}}}\) and the reference image \(I_{{{\text{Ref}}}}\) to accurately restore the details. Besides, the content loss is added as an extra term to constrain perceptual similarity [35]. The loss of the output image is as follows:

$$\mathcal{L}_{\mathrm{Out}}\left(I_{\mathrm{Out}}, I_{\mathrm{Ref}}\right) = w_{1}\mathcal{L}_{1} + w_{\mathrm{SSIM}}\mathcal{L}_{\mathrm{SSIM}} + w_{\mathrm{Cont}}\mathcal{L}_{\mathrm{Cont}},$$
(7)

where \(w_{1}\), \(w_{{{\text{SSIM}}}}\) and \(w_{{{\text{Cont}}}}\) are the weights of \({\mathcal{L}}_{1}\), \({\mathcal{L}}_{{{\text{SSIM}}}}\) and \({\mathcal{L}}_{{{\text{Cont}}}}\), respectively.

We also apply the loss function directly between the mid-output \(I_{{{\text{Mid}}}}\) and the reference image \(I_{{{\text{Ref}}}}\) as follows:

$$\mathcal{L}_{\mathrm{Mid}}\left(I_{\mathrm{Mid}}, I_{\mathrm{Ref}}\right) = w_1^{\prime}\mathcal{L}_1^{\prime} + w_{\mathrm{SSIM}}^{\prime}\mathcal{L}_{\mathrm{SSIM}}^{\prime}.$$
(8)

As shown in the pseudo-code of Algorithm 1, the training of DFEN is divided into two stages. During the first stage, we compute all the losses after one forward propagation so that every parameter receives sufficient gradient. In the second stage, we only calculate \({\mathcal{L}}_{{{\text{Out}}}}\). We add a cosine conversion factor to control the transition between the two stages, and the total loss function of the network is as follows:

$$\begin{cases} \mathcal{L}_{\mathrm{Total}} = c \times \mathcal{L}_{\mathrm{Mid}} + \left( 1 - c \right) \times \mathcal{L}_{\mathrm{Out}}, \\ c = \max \left( \cos \left( \pi \times \frac{\mathrm{Epoch}}{N} \right), 0 \right), \end{cases}$$
(9)

where c is a coefficient that follows a cosine descent to 0 over the course of training, Epoch represents the current training epoch, and \(N\) is the total number of training epochs.


Algorithm 1 Training of DFEN
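Since Algorithm 1 appears above only as an image, the following condensed sketch shows the cosine-weighted combination of the two loss terms (Eqs. (7)–(9)). The callables `l1`, `ssim_loss` and `content_loss` are assumed to be defined elsewhere, and the default weight values follow the setting reported in Sect. 4.2.

```python
# Hedged sketch of the cosine training strategy: the conversion factor c decays
# from 1 to 0 along a cosine (Eq. 9), shifting the emphasis from the mid-output
# loss (Eq. 8) to the final-output loss (Eq. 7) as training progresses.
import math

def cosine_factor(epoch, total_epochs):
    """c = max(cos(pi * Epoch / N), 0)."""
    return max(math.cos(math.pi * epoch / total_epochs), 0.0)

def total_loss(i_mid, i_out, i_ref, epoch, total_epochs,
               l1, ssim_loss, content_loss,
               w1=1.0, w_ssim=1.0, w_cont=0.1, w1_mid=0.5, w_ssim_mid=0.5):
    loss_out = (w1 * l1(i_out, i_ref) + w_ssim * ssim_loss(i_out, i_ref)
                + w_cont * content_loss(i_out, i_ref))                           # Eq. (7)
    loss_mid = w1_mid * l1(i_mid, i_ref) + w_ssim_mid * ssim_loss(i_mid, i_ref)  # Eq. (8)
    c = cosine_factor(epoch, total_epochs)
    return c * loss_mid + (1.0 - c) * loss_out                                   # Eq. (9)
```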

4 Experimental results and discussion

In this section, we conduct comparison and ablation experiments to demonstrate the advantages of the proposed DFEN.

4.1 Datasets description

Public datasets LOL [12], LOLv2 [36] and SICE [37] are selected for comparison experiments. The details of each dataset are shown in Table 2, and some reference samples are shown in Fig. 4.

Table 2 Details of each dataset
Fig. 4

Samples of each dataset. The images in the first row are low-light inputs, while the images in the second row are the references

For the low-light power inspection scenes of the project, there is currently no public dataset that can be used for enhancement model training and evaluation. Therefore, we constructed the Dark Grids dataset using inspection imaging equipment; it mainly consists of nighttime transmission tower scenes, high dynamic range scenes and daytime normal-exposure scenes.

We use a 12-bit industrial camera to capture images at different times and locations. For each scene, we first capture a long-exposure (512 ms) image N1 and then take an image sequence with exposures increasing step by step from 1 to 512 ms. Another long-exposure (512 ms) image N2 is taken at the end. We calculate the MSE-based similarity between N1 and N2 and keep only the sequences with high similarity (> 0.98), which ensures that the scene remained static during capture. Next, we apply the multi-frame HDR fusion algorithm in Adobe Photoshop CC to obtain the reference image of each scene. In total, we collected 530 pairs of training samples and 103 pairs of test samples with 1224 × 1024 pixels.
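As an illustration, the sequence-screening step could look like the sketch below; the paper states only the 0.98 similarity threshold, so the use of structural similarity as the concrete measure and the function name are our assumptions.

```python
# Hedged sketch: keep an exposure sequence only if its first and last 512 ms frames
# (N1, N2) are nearly identical, i.e. the scene stayed static during capture.
import numpy as np
from skimage.metrics import structural_similarity

def sequence_is_static(n1: np.ndarray, n2: np.ndarray, threshold: float = 0.98) -> bool:
    """n1, n2: grayscale frames scaled to [0, 1]."""
    return structural_similarity(n1, n2, data_range=1.0) > threshold
```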

4.2 Implementation details

The entire algorithm is built on the PyTorch framework. As shown in Fig. 1, the proposed DFEN can quickly change its size by setting a different number of output channels for the InConv layer. To evaluate the enhancement effect of models of different sizes, we set the basic channel number of DFEN to 8, 16 and 24, corresponding to DFEN-s, DFEN-m and DFEN-l, respectively.

When training DFEN, random flipping, affine transformation and random cropping are used to augment both the low-light and reference images into 512 × 512 image pairs. We tuned the hyperparameters manually and by grid search. The multiplication coefficients of the feature layers of the U-Net are set to 1, 4, 16 and 32, and the loss weights are set according to \(2w^{\prime}_{1} = 2w^{\prime}_{{{\text{SSIM}}}} = w_{1} = w_{{{\text{SSIM}}}} = 10w_{{{\text{Cont}}}} = 1\). Based on experience, AdamW [38] is used as the optimizer and the batch size is set to 12. A total of 600 epochs are trained to avoid overfitting.
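For clarity, the relation between the size-control hyperparameter and the channel widths of the feature layers can be written out as follows; the helper name is ours, while the coefficients and basic channel numbers are those stated above.

```python
# Channel width of each feature layer = basic channels x layer coefficient.
LAYER_COEFFICIENTS = (1, 4, 16, 32)

def layer_channels(basic_channels: int) -> tuple:
    return tuple(basic_channels * c for c in LAYER_COEFFICIENTS)

print(layer_channels(8))    # DFEN-s: (8, 32, 128, 256)
print(layer_channels(16))   # DFEN-m: (16, 64, 256, 512)
print(layer_channels(24))   # DFEN-l: (24, 96, 384, 768)
```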

4.3 Quantitative comparison experiments

We compare the proposed DFEN with thirteen mainstream low-light image enhancement methods, including three traditional methods, LIME [7], BIMEF [8] and CRM [9], and ten deep learning methods, RetinexNet [12], GLAD [14], MIRNet [23], EnlightenGAN [17], KinD++ [20], Zero-DCE++ [39], Adapt-3DLUT [15], SCI [18], LLFlow [24] and SRANet [27]. It is worth mentioning that for the unpaired training parts of SRANet and EnlightenGAN, the low-light images and the corresponding reference images are shuffled separately and randomly collected to form the unpaired training batches.

We compute SSIM [34] and PSNR between the enhanced image and the reference image as quantitative metrics of enhancement performance. Moreover, we use LOEref [19, 40] to evaluate the ability of each algorithm to preserve the naturalness of lightness. Table 3 reports the results, and we also plot a scatter diagram of the SSIM metric on the LOLv2 dataset against computational efficiency (Params and FLOPs) for part of the CNN-based algorithms, as shown in Fig. 5. Algorithms with more than 40 M parameters are not shown in the figure. Moreover, some enhanced samples of the top 10 comparison algorithms in terms of average SSIM metric are shown in Figs. 6, 7, 8 and 9.
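For reproducibility, a hedged sketch of the PSNR and SSIM computation with scikit-image is given below (the LOEref metric follows [19, 40] and is omitted here); the function name is ours.

```python
# Compute PSNR and SSIM between an enhanced image and its reference.
import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def quality_metrics(enhanced: np.ndarray, reference: np.ndarray) -> dict:
    """Both images: float arrays in [0, 1] with shape (H, W, 3)."""
    return {
        "PSNR": peak_signal_noise_ratio(reference, enhanced, data_range=1.0),
        "SSIM": structural_similarity(reference, enhanced, data_range=1.0, channel_axis=-1),
    }
```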

Table 3 Results of comparison experiment
Fig. 5

SSIM metric in the LOLv2 dataset and computational efficiency (Params and FLOPs) of each comparison algorithm

Fig. 6

Visual comparisons of DFEN and other methods on LOL dataset

Fig. 7

Visual comparisons of DFEN and other methods on LOLv2 dataset

Fig. 8

Visual comparisons of DFEN and other methods on SICE dataset

Fig. 9

Visual comparisons of DFEN and other methods on our Dark Grids dataset

For the LOL dataset, some networks use the weights provided by the original authors, and there are only 15 images in the LOL testing set. Our DFENs achieve better enhancement performance than models with a similar number of parameters and obtain the lowest LOE metrics. As shown in Fig. 6, CRM, GLAD and SCI fail to reach an acceptable brightness range and the contrast of their enhanced images is low. AdpLUT and EnlightenGAN have poor denoising ability and their enhanced images contain a lot of noise. It is evident that only the proposed DFEN, as well as MIRNet, KinD++, LLFlow and SRANet with huge numbers of parameters, are able to accurately restore the color of the clothes in the plastic box. Among them, KinD++'s result is heavily color biased, while the results of MIRNet and SRANet have blurred details.

For the LOLv2 and SICE datasets, all the networks are retrained, and the proposed DFENs achieve superior results on SSIM, PSNR and LOE. Observing the visualization results, CRM has acceptable noise suppression and color restoration capabilities, but its enhancement effect is poor for dark scenes, making it difficult to recognize the facial details of the people in the dark parts of the image in Fig. 8. GLAD struggles to recover color information from low-light images, and the green carpet next to the swimming pool in Fig. 7 degrades to brown. The structure of MIRNet is too deep to accurately restore the high-frequency features in the shallow layers, resulting in blurred details. Since EnlightenGAN adopts an unsupervised training strategy, it is difficult for it to balance the enhancement effect of different regions, which leads to overexposure and high saturation in the output image. As can be seen in the zoomed-in details, the output of AdpLUT is still noisy due to the lack of a denoising module. Although KinD++ can suppress the noise better, the details are partially lost, and artifacts are generated in the extremely dark regions in the bottom left corner of the first image in Fig. 7. The model of SCI is too simple, resulting in a poor enhancement effect and serious color deviation in the enhanced image. LLFlow can effectively suppress the generation of noise and artifacts at the same time, but the model is too complex, leading to a longer inference time. Compared with our previously proposed SRANet, DFEN redesigns the algorithm structure and introduces the LRA module and the novel DBFF module, so that it can restore the image more realistically. For the low-light images in Fig. 7, DFEN obtains enhancement results with more realistic colors and clearer details. For the high dynamic range sample in Fig. 8, DFEN avoids over-exposure of bright regions while correctly revealing details in dark regions.

On the Dark Grids dataset, the proposed method shows obvious advantages over other methods. From the samples displayed in Fig. 9, GLAD, KinD++ and SRANet have deficiencies in brightness recovery, making the enhancement results of the first image grayish and the details of the construction vehicles blurred. The enhanced images of CRM, EnlightenGAN and AdpLUT are bright enough, but their saturation is low. SCI improves color recovery, but its result for the second image shows over-exposure in the car interior. The results of MIRNet and LLFlow lose a great deal of high-resolution detail, leading to severe blurring of the distant towers in the second image. We can see that the DFEN model solves the problems of color deviation and uneven exposure more accurately, performing better in noise suppression and detail retention. This indicates that DFEN is more suitable for the project's low-light power inspection scenes than existing algorithms.

Overall, in contrast to the other comparison algorithms, the proposed DFEN is more effective in recovering the color and texture details of low-light images. At the same time, DFEN can balance the enhancement effect under different lighting conditions to avoid color distortion and over-exposure in the enhanced image. Moreover, DFEN achieves outstanding enhancement results on several datasets, which also demonstrates its excellent scene adaptability. Although LLFlow shows stronger color recovery and denoising ability on some datasets, DFEN has the advantage in computational efficiency, which makes it more suitable for the scenarios of our project.

4.4 Ablation study

To validate the necessity of each module in the proposed method, we design several ablation experiments; the configurations and results are shown in Tables 4 and 5. Note that the quantitative evaluation of all ablation experiments is performed on the LOLv2 dataset, and we still use SSIM, PSNR and LOEref as quantitative evaluation metrics. We consider DFEN-m as the baseline algorithm. Except for the modules under evaluation, all the experiments share identical experimental setups.

Table 4 Results of structure ablation experiment
Table 5 Results of training strategy experiment

4.4.1 Ablation experiments of feature fusion strategy

We first conduct ablation experiments on different feature fusion methods, and several enhanced samples are displayed in Fig. 10. A1 removes the feature fusion module between the two stages; as a result, the high-resolution features are partially lost after the scale transformation, leading to seriously blurred details in the outputs as well as the worst evaluation metrics. Compared with the plain concatenation fusion in A2, A3 and A4 highlight the key features of the same channel or spatial position in the feature group by weighted fusion along the spatial or channel dimension, thus obtaining better texture detail retention and color recovery capabilities, respectively. Observing the enlarged images, it can be seen that the characters, lawns and railings in the enhanced images are clearer when using the spatial fusion module, but there are some color deviations across the whole image. Although the color of the enhanced image obtained by A4 is more realistic, it suffers from a degree of detail blurring. The DBFF module adopted by DFEN makes the fusion weights of the spatial and channel dimensions independent of each other by grouping the feature maps, jointly improving the fusion effect of the model and leading to enhancement results with clear details and realistic colors.

Fig. 10

Visual comparisons of different feature fusion strategies

The introduction of the feature fusion module improves the performance of the algorithm, but also increases the computational complexity. The proposed DBFF uses channel splitting to divide the features into two groups to balance the computation and enhancement effect of the algorithm. In practical applications, we can trade off enhancement effect against computation by either duplicating the features or grouping them along the channel dimension.

4.4.2 Ablation experiments of illumination attention module

Subsequently, we compare the usefulness of illumination attention maps generated by different attention modules. B1 does not use any illumination attention module, B2 utilizes a self-regularized attention (SRA) module with fixed coefficients, and the proposed DFEN adopts the learnable regularized attention (LRA) module. As depicted in Fig. 11, B1 struggles to perceive the lighting conditions in various regions of the input image without the illumination attention module, which can result in overexposure of the bright areas of the input image during enhancement. Therefore, in order to balance the enhancement effect of different regions, we introduce the SRA and LRA modules to extract the illumination attention map of the input images. However, the SRA module struggles to distinguish regions with similar brightness but different colors, which leads to color distortion in the vegetation part of the enhanced image. The LRA module uses learnable convolutions to generate the illumination attention maps, making the boundaries of the vegetation and dry grassland areas clearer. In addition, Fig. 12 displays the illumination attention maps generated with the learnable weights and the fixed coefficients. It is obvious that the learnable approach can better distinguish different regions.

Fig. 11

Visual comparisons of different illumination attention modules

Fig. 12

Attention maps generated with fixed SRA module and proposed LRA module. Both maps are normalized to [0, 1] for display

It should be noted that having a sufficient number of images with varied lighting conditions in the dataset is crucial for the effectiveness of the LRA module. If the lighting conditions are too homogeneous, the LRA may fail and misrepresent the features of the enhanced images. For instance, excluding the images of normal illumination scenes from the dataset significantly reduces the ability of the LRA to suppress overexposure.

4.4.3 Ablation experiments of training strategy

Finally, we evaluate the impact of different training strategies, the results of which are recorded in Table 5. C1–C4 only apply the loss constraint to the output image \(I_{{{\text{Out}}}}\), C5–C7 add the constraint on the intermediate image \(I_{{{\text{Mid}}}}\), and C8 adopts the same loss function configuration as the DFEN model but does not adopt the proposed cosine training strategy. The results show that simultaneously adopting all five losses during training helps to obtain a better enhancement effect. When we remove the cosine training strategy, the enhancement effect gets worse. This may be caused by the inconsistent goals of the two training stages, which makes the training effect difficult to pass on and is equivalent to shortening the number of valid training epochs. By establishing different loss constraints for the outputs of the two stages and using the cosine training strategy to gradually adjust the loss weights during training, the two stages of the algorithm can each focus on a different task, thus achieving better enhancement results.

In summary, the DBFF module, the LRA module and the cosine training strategy with five losses adopted by DFEN can effectively improve the enhancement performance, making the colors more realistic, the details clearer, and the enhancement of different regions more adaptive.

4.5 Lighting conditions adaptability evaluation

In real-world applications, diverse lighting conditions pose a great challenge to low-light enhancement algorithms. In our Dark Grids dataset, we capture images of the same scene with various exposure durations, so that they have different lighting conditions. We use them to test the lighting-condition adaptability of DFEN and the state-of-the-art methods.

As shown in Figs. 13 and 14, our algorithm has a more satisfactory adaptability to lighting conditions. Specifically, for the dark night scene shown in Fig. 13, DFEN can recover the color and texture information of the low-light image well once the exposure duration reaches 32 ms. In contrast, the brightness of the results of CRM and SCI is low, the enhanced image of EnlightenGAN has serious artifacts, and LLFlow struggles to recover the color information of the image. For the high dynamic range scene shown in Fig. 14, at 8 ms exposure time only DFEN and LLFlow effectively enhance the detail of the leaves in the shadows. Observing the pavement under the street lamp, when the exposure duration reaches 32 ms or more, CRM, EnlightenGAN and SCI all show serious over-exposure and LLFlow produces white artifacts, while only DFEN avoids aggravating the over-exposure present in the input image.

Fig. 13

Visual comparisons of lighting conditions adaptation in the dark night scene

Fig. 14

Visual comparisons of lighting conditions adaptation in the high dynamic range scene

It can be seen that the exposure duration limits the ability of the camera to capture environmental information in low-light scenes. Therefore, we need to extend the exposure time of the camera to obtain visually better enhanced images, while also avoiding the motion blur caused by overly long exposures.

5 Conclusion

To achieve rapid and high-quality enhancement of low-light images of power grid inspection scenes, we propose a two-stage end-to-end low-light enhancement algorithm, DFEN. Compared with one-stage methods, DFEN decomposes the low-light enhancement task, making the learning target of each network simpler. By employing the proposed cosine training strategy, it dynamically adjusts the loss function of the model, allowing the algorithm to focus on the learning of the brightness adjustment and detail revision networks separately during different training epochs and thus achieve better enhancement results. In addition, we adopt DBFF and LRA to further enhance the feature extraction and recovery ability of the model. Finally, we introduce a size-control hyperparameter to adjust the number of channels in the U-Net, which allows our algorithm to flexibly balance model size and enhancement effect based on practical application needs.

We also produce the Dark Grids dataset with various scenarios and verify the effectiveness of the proposed method on several datasets including it. The results show that, compared to state-of-the-art methods, the proposed DFEN achieves better enhancement performance with a similar number of parameters and has excellent scene adaptability. Among the variants, the lightest DFEN model reaches 11 FPS for 1224×1024 images on an RTX 3090 GPU.

We will continue to work on two aspects in the future. First, since the construction of paired datasets is complicated, we are trying to introduce an unsupervised training strategy to remove the dependence on high-quality paired datasets. Second, the research and experiments on the algorithm are currently conducted on a server with an RTX 3090 GPU. We will complete the model deployment and inference acceleration on an edge computing platform, so that the model can be put into actual use on the grid inspection platform.