Abstract
Deep learning based methods for high dynamic range (HDR) video reconstruction require large-scale HDR video datasets with ground truth, which are time-consuming to collect. Recent training strategies under the few-shot learning paradigm, which aim to build an effective model from only a few labeled samples, have shown success in image classification and image segmentation. In this paper, a semi-supervised framework for few-shot HDR video reconstruction is proposed. An attention based merging network with a hybrid dilated convolution module recovers missing content and removes artifacts: the hybrid dilated convolution module extracts additional features from ill-exposed regions, and the attention module corrects them to suppress harmful information. Within the semi-supervised framework, dedicated training losses for the supervised branch and the unsupervised branch constrain the network during few-shot training. Experimental results show that the proposed method, trained with only 5 labeled samples and 45 unlabeled samples, achieves a PSNR of 41.664 dB on the synthetic evaluation dataset, compared with 35.201 dB, the best score among supervised methods trained under the same few-shot condition.
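The abstract reports reconstruction quality as PSNR in dB. As a minimal illustration of the metric (not the paper's evaluation code, which presumably computes PSNR on tonemapped HDR frames), a standard PSNR helper over images with values in [0, peak] might look like:

```python
import numpy as np

def psnr(reference, estimate, peak=1.0):
    """Peak signal-to-noise ratio in dB between two images
    whose values lie in [0, peak]."""
    mse = np.mean((reference.astype(np.float64) - estimate.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(peak ** 2 / mse)

# Toy example: a uniform error of 0.01 on a [0, 1] image gives 40 dB.
ref = np.zeros((4, 4))
est = ref + 0.01
print(round(psnr(ref, est), 3))  # → 40.0
```

Higher is better; the roughly 6.5 dB gap reported in the abstract corresponds to a large reduction in mean squared error.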
Data Availability
The data that support the findings of this study are openly available as follows:
• HDR video repositories for making the training dataset and the synthetic evaluation dataset: “Cinematic Wide Gamut HDR video” at https://www.hdm-stuttgart.de/vmlab/hdm-hdr-2014/ [46]; HDR video repository “LiU HDRv Repository” at http://www.hdrv.org/Resources.php [25].
• Real-world dataset without ground truth at https://github.com/guanyingc/DeepHDRVideo-Dataset/ [3].
• Kalantari13 dataset at https://web.ece.ucsb.edu/~psen/PaperPages/HDRVideo/ [27].
References
Kang SB, Uyttendaele M, Winder S, Szeliski R (2003) High dynamic range video. ACM Transactions on Graphics (TOG) 22(3):319–325
Kalantari NK, Ramamoorthi R (2019) Deep hdr video from sequences with alternating exposures. In: Computer graphics forum, vol 38, pp 193–205. Wiley Online Library
Chen G, Chen C, Guo S, Liang Z, Wong K-YK, Zhang L (2021) Hdr video reconstruction: A coarse-to-fine network and a real-world benchmark dataset. In: Proceedings of the IEEE/CVF international conference on computer vision, pp 2502–2511
Anand M, Harilal N, Kumar C, Raman S (2021) Hdrvideo-gan: deep generative hdr video reconstruction. In: Proceedings of the twelfth indian conference on computer vision, graphics and image processing, pp 1–9
Li L, Dong Y, Ren W, Pan J, Gao C, Sang N, Yang M-H (2019) Semi-supervised image dehazing. IEEE Trans Image Process 29:2766–2779
Hasinoff SW, Durand F, Freeman WT (2010) Noise-optimal capture for high dynamic range photography. In: 2010 IEEE computer society conference on computer vision and pattern recognition, pp 553–560. IEEE
Seshadrinathan K, Park SH, Nestares O (2012) Noise and dynamic range optimal computational imaging. In: 2012 19th IEEE international conference on image processing, pp 2785–2788. IEEE
Pourreza-Shahri R, Kehtarnavaz N (2015) Exposure bracketing via automatic exposure selection. In: 2015 IEEE international conference on image processing (ICIP), pp 320–323. IEEE
Eilertsen G, Kronander J, Denes G, Mantiuk RK, Unger J (2017) Hdr image reconstruction from a single exposure using deep cnns. ACM Transactions on Graphics (TOG) 36(6):1–15
Bogoni L (2000) Extending dynamic range of monochrome and color images through fusion. In: Proceedings 15th international conference on pattern recognition. ICPR-2000, vol 3, pp 7–12. IEEE
Jacobs K, Loscos C, Ward G (2008) Automatic high-dynamic range image generation for dynamic scenes. IEEE Comput Graph Appl 28(2):84–93
Kalantari NK, Ramamoorthi R et al (2017) Deep high dynamic range imaging of dynamic scenes. ACM Trans Graph 36(4):144
Pece F, Kautz J (2010) Bitmap movement detection: Hdr for dynamic scenes. In: 2010 Conference on visual media production, pp 1–8. IEEE
Zhang W, Cham W-K (2012) Reference-guided exposure fusion in dynamic scenes. J Vis Commun Image Represent 23(3):467–475
Oh T-H, Lee J-Y, Tai Y-W, Kweon IS (2014) Robust high dynamic range imaging by rank minimization. IEEE Transactions on Pattern Analysis and Machine Intelligence 37(6):1219–1232
Wu S, Xu J, Tai Y-W, Tang C-K (2018) Deep high dynamic range imaging with large foreground motions. In: Proceedings of the european conference on computer vision (ECCV), pp 117–132
Yan Q, Gong D, Shi Q, van den Hengel A, Shen C, Reid I, Zhang Y (2019) Attention-guided network for ghost-free high dynamic range imaging. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 1751–1760
Yan Q, Zhang L, Liu Y, Zhu Y, Sun J, Shi Q, Zhang Y (2020) Deep hdr imaging via a non-local network. IEEE Trans Image Process 29:4308–4322
Niu Y, Wu J, Liu W, Guo W, Lau RW (2021) Hdr-gan: Hdr image reconstruction from multi-exposed ldr images with large motions. IEEE Trans Image Process 30:3885–3896
Nayar SK, Mitsunaga T (2000) High dynamic range imaging: Spatially varying pixel exposures. In: Proceedings IEEE conference on computer vision and pattern recognition. CVPR 2000 (Cat. No. PR00662), vol 1, pp 472–479. IEEE
Serrano A, Heide F, Gutierrez D, Wetzstein G, Masia B (2016) Convolutional sparse coding for high dynamic range imaging. In: Computer graphics forum, vol 35, pp 153–163. Wiley Online Library
Hajisharif S, Kronander J, Unger J (2015) Adaptive dualISO hdr reconstruction. EURASIP Journal on Image and Video Processing 2015(1):1–13
Choi I, Baek S-H, Kim MH (2017) Reconstructing interlaced high-dynamic-range video using joint learning. IEEE Trans Image Process 26(11):5353–5366
McGuire M, Matusik W, Pfister H, Chen B, Hughes JF, Nayar SK (2007) Optical splitting trees for high-precision monocular imaging. IEEE Comput Graph Appl 27(2):32–42
Kronander J, Gustavson S, Bonnet G, Ynnerman A, Unger J (2014) A unified framework for multi-sensor hdr video reconstruction. Signal Process Image Commun 29(2):203–215
Mangiat S, Gibson J (2010) High dynamic range video with ghost removal. In: Applications of digital image processing XXXIII, vol 7798, pp 307–314. SPIE
Kalantari NK, Shechtman E, Barnes C, Darabi S, Goldman DB, Sen P (2013) Patch-based high dynamic range video. ACM Trans Graph 32(6):202
Li Y, Lee C, Monga V (2016) A maximum a posteriori estimation framework for robust high dynamic range video synthesis. IEEE Trans Image Process 26(3):1143–1157
Cai Q, Pan Y, Yao T, Yan C, Mei T (2018) Memory matching networks for one-shot image recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 4080–4088
Munkhdalai T, Yuan X, Mehri S, Trischler A (2018) Rapid adaptation with conditionally shifted neurons. In: International conference on machine learning, pp 3664–3673. PMLR
Yoon SW, Seo J, Moon J (2019) Tapnet: Neural network augmented with task-adaptive projection for few-shot learning. In: International conference on machine learning, pp 7115–7123. PMLR
Li H, Eigen D, Dodge S, Zeiler M, Wang X (2019) Finding task-relevant features for few-shot learning by category traversal. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 1–10
Ravi S, Larochelle H (2016) Optimization as a model for few-shot learning. In: International conference on learning representations
Su J-C, Maji S, Hariharan B (2020) When does self-supervision improve few-shot learning? In: European conference on computer vision, pp 645–666. Springer
Vinyals O, Blundell C, Lillicrap T, Wierstra D, et al (2016) Matching networks for one shot learning. Advances in neural information processing systems 29
Prabhakar KR, Senthil G, Agrawal S, Babu RV, Gorthi RKSS (2021) Labeled from unlabeled: Exploiting unlabeled data for few-shot deep hdr deghosting. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 4875–4885
Sousa S, Milios E, Berton L (2020) Word sense disambiguation: an evaluation study of semi-supervised approaches with word embeddings. In: 2020 International joint conference on neural networks (IJCNN), pp 1–8. IEEE
Hu Z, Yang Z, Hu X, Nevatia R (2021) Simple: Similar pseudo label exploitation for semi-supervised classification. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 15099–15108
Ling S, Liu Y, Salazar J, Kirchhoff K (2020) Deep contextualized acoustic representations for semi-supervised speech recognition. In: ICASSP 2020-2020 IEEE international conference on acoustics, speech and signal processing (ICASSP), pp 6429–6433. IEEE
Lai W-S, Huang J-B, Yang M-H (2017) Semi-supervised learning for optical flow with generative adversarial networks. Advances in Neural Information Processing Systems 30
Yang W, Wang S, Fang Y, Wang Y, Liu J (2020) From fidelity to perceptual quality: A semi-supervised approach for low-light image enhancement. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 3063–3072
Ranjan A, Black MJ (2017) Optical flow estimation using a spatial pyramid network. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 4161–4170
Johnson J, Alahi A, Fei-Fei L (2016) Perceptual losses for real-time style transfer and super-resolution. In: European conference on computer vision, pp 694–711. Springer
Chen D, Yuan L, Liao J, Yu N, Hua G (2017) Stylebank: An explicit representation for neural image style transfer. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 1897–1906
Hore A, Ziou D (2010) Image quality metrics: Psnr vs. ssim. In: 2010 20th International conference on pattern recognition, pp 2366–2369. IEEE
Froehlich J, Grandinetti S, Eberhardt B, Walter S, Schilling A, Brendel H (2014) Creating cinematic wide gamut hdr-video for the evaluation of tone mapping operators and hdr-displays. In: Digital photography X, vol 9023, pp 279–288. SPIE
Mantiuk R, Kim KJ, Rempel AG, Heidrich W (2011) Hdr-vdp-2: A calibrated visual metric for visibility and quality predictions in all luminance conditions. ACM Transactions on graphics (TOG) 30(4):1–14
Narwaria M, Da Silva MP, Le Callet P (2015) Hdr-vqm: An objective quality measure for high dynamic range video. Signal Processing: Image Communication 35:46–60
Jais IKM, Ismail AR, Nisa SQ (2019) Adam optimization algorithm for wide and deep neural network. Knowledge Engineering and Data Science 2(1):41–46
Acknowledgements
This work was supported by KAKENHI (21K11816).
Ethics declarations
Conflicts of interest
The authors declare that they have no conflict of interest.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Zhao, F., Liu, Q. & Ikenaga, T. Semi-supervised attention based merging network with hybrid dilated convolution module for few-shot HDR video reconstruction. Multimed Tools Appl 83, 37409–37430 (2024). https://doi.org/10.1007/s11042-023-16885-7