Abstract
Recent advances in deep learning have significantly impacted low-light video enhancement and sparked great interest in the field. However, while these techniques are effective for enhancing individual static images, they suffer from temporal instability when applied to videos, producing artifacts and flickering. The challenge is compounded by the difficulty of collecting dynamic low-light/normal-light video pairs in real-world scenarios. Our proposed solution tackles these issues by integrating a cross-attention mechanism with optical flow: the flow is used to infer motion from individual frames, which mitigates the temporal inconsistencies that commonly arise when training on static images. We further develop a Transformer model (DSFormer) that leverages both spatial and channel features to improve visual quality and temporal stability, along with a novel dual-path feed-forward network (DPFN) that strengthens the model's ability to capture and preserve local contextual information, which is crucial for low-light enhancement. Extensive comparative and ablation studies demonstrate that our approach delivers high luminance and strong temporal consistency in the enhanced sequences.
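To make the two architectural ideas named above concrete, the following is a minimal PyTorch sketch, not the authors' released DSFormer code: it assumes frame features of shape (B, C, H, W), a dense optical-flow field of shape (B, 2, H, W) used to warp the reference frame's features before they serve as keys/values in a cross-attention block, and a hypothetical "dual-path" feed-forward design that pairs a pointwise MLP with a depthwise-convolution branch for local context. All module and variable names here are illustrative assumptions.

```python
# Hedged sketch of flow-guided cross-attention and a dual-path FFN (not the paper's code).
import torch
import torch.nn as nn
import torch.nn.functional as F


def flow_warp(feat, flow):
    """Backward-warp `feat` (B, C, H, W) with `flow` (B, 2, H, W) via grid_sample."""
    b, _, h, w = feat.shape
    ys, xs = torch.meshgrid(
        torch.arange(h, device=feat.device, dtype=feat.dtype),
        torch.arange(w, device=feat.device, dtype=feat.dtype),
        indexing="ij",
    )
    grid = torch.stack((xs, ys), dim=0).unsqueeze(0) + flow           # absolute sampling coords
    grid_x = 2.0 * grid[:, 0] / max(w - 1, 1) - 1.0                   # normalize to [-1, 1]
    grid_y = 2.0 * grid[:, 1] / max(h - 1, 1) - 1.0
    grid = torch.stack((grid_x, grid_y), dim=-1)                      # (B, H, W, 2), (x, y) order
    return F.grid_sample(feat, grid, align_corners=True)


class FlowGuidedCrossAttention(nn.Module):
    """Queries from the current frame attend to flow-warped reference features."""

    def __init__(self, dim, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm_q = nn.LayerNorm(dim)
        self.norm_kv = nn.LayerNorm(dim)

    def forward(self, cur, ref, flow):
        b, c, h, w = cur.shape
        ref = flow_warp(ref, flow)                                    # align reference to current frame
        q = self.norm_q(cur.flatten(2).transpose(1, 2))               # (B, HW, C)
        kv = self.norm_kv(ref.flatten(2).transpose(1, 2))
        out, _ = self.attn(q, kv, kv)
        return cur + out.transpose(1, 2).reshape(b, c, h, w)          # residual connection


class DualPathFFN(nn.Module):
    """Dual-path feed-forward: pointwise channel mixing plus a depthwise branch for local context."""

    def __init__(self, dim, expansion=2):
        super().__init__()
        hidden = dim * expansion
        self.point = nn.Sequential(nn.Conv2d(dim, hidden, 1), nn.GELU(),
                                   nn.Conv2d(hidden, dim, 1))
        self.local = nn.Sequential(nn.Conv2d(dim, dim, 3, padding=1, groups=dim),
                                   nn.GELU(),
                                   nn.Conv2d(dim, dim, 1))
        self.norm = nn.GroupNorm(1, dim)

    def forward(self, x):
        x_n = self.norm(x)
        return x + self.point(x_n) + self.local(x_n)


if __name__ == "__main__":
    cur = torch.randn(1, 32, 64, 64)        # current-frame features
    ref = torch.randn(1, 32, 64, 64)        # previous/reference-frame features
    flow = torch.zeros(1, 2, 64, 64)        # optical flow (zeros = static scene)
    fused = FlowGuidedCrossAttention(32)(cur, ref, flow)
    print(DualPathFFN(32)(fused).shape)     # torch.Size([1, 32, 64, 64])
```

The sketch only illustrates the mechanism described in the abstract: warping the neighboring frame's features with optical flow supplies motion-consistent keys/values for cross-attention, and the depthwise branch in the feed-forward network retains local spatial context that a purely pointwise MLP would discard.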
Acknowledgments
This work was sponsored by the National Natural Science Foundation of China (NSFC) under Grants 62272342 and 62020106004, and by the Tianjin Natural Science Foundation under Grant 23JCJQJC00070.
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
Cite this paper
Xu, J., Mei, S., Chen, Z., Zhang, D., Shi, F., Zhao, M. (2024). DSFormer: Leveraging Transformer with Cross-Modal Attention for Temporal Consistency in Low-Light Video Enhancement. In: Huang, DS., Pan, Y., Zhang, Q. (eds) Advanced Intelligent Computing Technology and Applications. ICIC 2024. Lecture Notes in Computer Science, vol 14872. Springer, Singapore. https://doi.org/10.1007/978-981-97-5612-4_3
DOI: https://doi.org/10.1007/978-981-97-5612-4_3
Publisher Name: Springer, Singapore
Print ISBN: 978-981-97-5611-7
Online ISBN: 978-981-97-5612-4