Abstract
Deep learning based methods for high dynamic range (HDR) video reconstruction require large-scale HDR video datasets with ground truth, which are time-consuming to collect. Recent training strategies under the few-shot learning paradigm, which aim to build an effective model from only a few labeled samples, have shown success in image classification and image segmentation. In this paper, a semi-supervised framework for few-shot HDR video reconstruction is proposed. An attention based merging network with a hybrid dilated convolution module recovers missing content and removes artifacts: the hybrid dilated convolution module extracts additional features from ill-exposed regions, and the attention module corrects them to suppress harmful information. Within the semi-supervised framework, dedicated training losses for the supervised branch and the unsupervised branch constrain the network during few-shot training. Experimental results show that the proposed method, trained with only 5 labeled samples and 45 unlabeled samples, achieves a PSNR of 41.664 dB on the synthetic evaluation dataset, compared with 35.201 dB, the best score among supervised methods trained under the same few-shot condition.
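The abstract reports reconstruction quality as PSNR in dB. As a minimal illustration of the metric (not the paper's evaluation code, which presumably computes PSNR on tonemapped HDR frames), a standard PSNR helper over images with values in [0, peak] might look like:

```python
import numpy as np

def psnr(reference, estimate, peak=1.0):
    """Peak signal-to-noise ratio in dB between two images
    whose values lie in [0, peak]."""
    mse = np.mean((reference.astype(np.float64) - estimate.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(peak ** 2 / mse)

# Toy example: a uniform error of 0.01 on a [0, 1] image gives 40 dB.
ref = np.zeros((4, 4))
est = ref + 0.01
print(round(psnr(ref, est), 3))  # → 40.0
```

Higher is better; the roughly 6.5 dB gap reported in the abstract corresponds to a large reduction in mean squared error.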
Data Availability
The data that support the findings of this study are openly available as follows:
• HDR video repositories for making the training dataset and the synthetic evaluation dataset: “Cinematic Wide Gamut HDR video” at https://www.hdm-stuttgart.de/vmlab/hdm-hdr-2014/ [46]; HDR video repository “LiU HDRv Repository” at http://www.hdrv.org/Resources.php [25].
• Real-world dataset without ground truth at https://github.com/guanyingc/DeepHDRVideo-Dataset/ [3].
• Kalantari13 dataset at https://web.ece.ucsb.edu/~psen/PaperPages/HDRVideo/ [27].
References
Kang SB, Uyttendaele M, Winder S, Szeliski R (2003) High dynamic range video. ACM Transactions on Graphics (TOG) 22(3):319–325
Kalantari NK, Ramamoorthi R (2019) Deep hdr video from sequences with alternating exposures. In: Computer graphics forum, vol 38, pp 193–205. Wiley Online Library
Chen G, Chen C, Guo S, Liang Z, Wong K-YK, Zhang L (2021) Hdr video reconstruction: A coarse-to-fine network and a real-world benchmark dataset. In: Proceedings of the IEEE/CVF international conference on computer vision, pp 2502–2511
Anand M, Harilal N, Kumar C, Raman S (2021) Hdrvideo-gan: deep generative hdr video reconstruction. In: Proceedings of the twelfth indian conference on computer vision, graphics and image processing, pp 1–9
Li L, Dong Y, Ren W, Pan J, Gao C, Sang N, Yang M-H (2019) Semi-supervised image dehazing. IEEE Trans Image Process 29:2766–2779
Hasinoff SW, Durand F, Freeman WT (2010) Noise-optimal capture for high dynamic range photography. In: 2010 IEEE computer society conference on computer vision and pattern recognition, pp 553–560. IEEE
Seshadrinathan K, Park SH, Nestares O (2012) Noise and dynamic range optimal computational imaging. In: 2012 19th IEEE international conference on image processing, pp 2785–2788. IEEE
Pourreza-Shahri R, Kehtarnavaz N (2015) Exposure bracketing via automatic exposure selection. In: 2015 IEEE international conference on image processing (ICIP), pp 320–323. IEEE
Eilertsen G, Kronander J, Denes G, Mantiuk RK, Unger J (2017) Hdr image reconstruction from a single exposure using deep cnns. ACM Transactions on Graphics (TOG) 36(6):1–15
Bogoni L (2000) Extending dynamic range of monochrome and color images through fusion. In: Proceedings 15th international conference on pattern recognition. ICPR-2000, vol 3, pp 7–12. IEEE
Jacobs K, Loscos C, Ward G (2008) Automatic high-dynamic range image generation for dynamic scenes. IEEE Comput Graph Appl 28(2):84–93
Kalantari NK, Ramamoorthi R et al (2017) Deep high dynamic range imaging of dynamic scenes. ACM Trans Graph 36(4):144
Pece F, Kautz J (2010) Bitmap movement detection: Hdr for dynamic scenes. In: 2010 Conference on visual media production, pp 1–8. IEEE
Zhang W, Cham W-K (2012) Reference-guided exposure fusion in dynamic scenes. J Vis Commun Image Represent 23(3):467–475
Oh T-H, Lee J-Y, Tai Y-W, Kweon IS (2014) Robust high dynamic range imaging by rank minimization. IEEE Transactions on Pattern Analysis and Machine Intelligence 37(6):1219–1232
Wu S, Xu J, Tai Y-W, Tang C-K (2018) Deep high dynamic range imaging with large foreground motions. In: Proceedings of the european conference on computer vision (ECCV), pp 117–132
Yan Q, Gong D, Shi Q, van den Hengel A, Shen C, Reid I, Zhang Y (2019) Attention-guided network for ghost-free high dynamic range imaging. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 1751–1760
Yan Q, Zhang L, Liu Y, Zhu Y, Sun J, Shi Q, Zhang Y (2020) Deep hdr imaging via a non-local network. IEEE Trans Image Process 29:4308–4322
Niu Y, Wu J, Liu W, Guo W, Lau RW (2021) Hdr-gan: Hdr image reconstruction from multi-exposed ldr images with large motions. IEEE Trans Image Process 30:3885–3896
Nayar SK, Mitsunaga T (2000) High dynamic range imaging: Spatially varying pixel exposures. In: Proceedings IEEE conference on computer vision and pattern recognition. CVPR 2000 (Cat. No. PR00662), vol 1, pp 472–479. IEEE
Serrano A, Heide F, Gutierrez D, Wetzstein G, Masia B (2016) Convolutional sparse coding for high dynamic range imaging. In: Computer graphics forum, vol 35, pp 153–163. Wiley Online Library
Hajisharif S, Kronander J, Unger J (2015) Adaptive dualISO hdr reconstruction. EURASIP Journal on Image and Video Processing 2015(1):1–13
Choi I, Baek S-H, Kim MH (2017) Reconstructing interlaced high-dynamic-range video using joint learning. IEEE Trans Image Process 26(11):5353–5366
McGuire M, Matusik W, Pfister H, Chen B, Hughes JF, Nayar SK (2007) Optical splitting trees for high-precision monocular imaging. IEEE Comput Graph Appl 27(2):32–42
Kronander J, Gustavson S, Bonnet G, Ynnerman A, Unger J (2014) A unified framework for multi-sensor hdr video reconstruction. Signal Process Image Commun 29(2):203–215
Mangiat S, Gibson J (2010) High dynamic range video with ghost removal. In: Applications of digital image processing XXXIII, vol 7798, pp 307–314. SPIE
Kalantari NK, Shechtman E, Barnes C, Darabi S, Goldman DB, Sen P (2013) Patch-based high dynamic range video. ACM Trans Graph 32(6):202
Li Y, Lee C, Monga V (2016) A maximum a posteriori estimation framework for robust high dynamic range video synthesis. IEEE Trans Image Process 26(3):1143–1157
Cai Q, Pan Y, Yao T, Yan C, Mei T (2018) Memory matching networks for one-shot image recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 4080–4088
Munkhdalai T, Yuan X, Mehri S, Trischler A (2018) Rapid adaptation with conditionally shifted neurons. In: International conference on machine learning, pp 3664–3673. PMLR
Yoon SW, Seo J, Moon J (2019) Tapnet: Neural network augmented with task-adaptive projection for few-shot learning. In: International conference on machine learning, pp 7115–7123. PMLR
Li H, Eigen D, Dodge S, Zeiler M, Wang X (2019) Finding task-relevant features for few-shot learning by category traversal. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 1–10
Ravi S, Larochelle H (2016) Optimization as a model for few-shot learning. In: International conference on learning representations
Su J-C, Maji S, Hariharan B (2020) When does self-supervision improve few-shot learning? In: European conference on computer vision, pp 645–666. Springer
Vinyals O, Blundell C, Lillicrap T, Wierstra D, et al (2016) Matching networks for one shot learning. Advances in neural information processing systems 29
Prabhakar KR, Senthil G, Agrawal S, Babu RV, Gorthi RKSS (2021) Labeled from unlabeled: Exploiting unlabeled data for few-shot deep hdr deghosting. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 4875–4885
Sousa S, Milios E, Berton L (2020) Word sense disambiguation: an evaluation study of semi-supervised approaches with word embeddings. In: 2020 International joint conference on neural networks (IJCNN), pp 1–8. IEEE
Hu Z, Yang Z, Hu X, Nevatia R (2021) Simple: Similar pseudo label exploitation for semi-supervised classification. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 15099–15108
Ling S, Liu Y, Salazar J, Kirchhoff K (2020) Deep contextualized acoustic representations for semi-supervised speech recognition. In: ICASSP 2020-2020 IEEE international conference on acoustics, speech and signal processing (ICASSP), pp 6429–6433. IEEE
Lai W-S, Huang J-B, Yang M-H (2017) Semi-supervised learning for optical flow with generative adversarial networks. Advances in Neural Information Processing Systems 30
Yang W, Wang S, Fang Y, Wang Y, Liu J (2020) From fidelity to perceptual quality: A semi-supervised approach for low-light image enhancement. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 3063–3072
Ranjan A, Black MJ (2017) Optical flow estimation using a spatial pyramid network. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 4161–4170
Johnson J, Alahi A, Fei-Fei L (2016) Perceptual losses for real-time style transfer and super-resolution. In: European conference on computer vision, pp 694–711. Springer
Chen D, Yuan L, Liao J, Yu N, Hua G (2017) Stylebank: An explicit representation for neural image style transfer. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 1897–1906
Hore A, Ziou D (2010) Image quality metrics: Psnr vs. ssim. In: 2010 20th International conference on pattern recognition, pp 2366–2369. IEEE
Froehlich J, Grandinetti S, Eberhardt B, Walter S, Schilling A, Brendel H (2014) Creating cinematic wide gamut hdr-video for the evaluation of tone mapping operators and hdr-displays. In: Digital photography X, vol 9023, pp 279–288. SPIE
Mantiuk R, Kim KJ, Rempel AG, Heidrich W (2011) Hdr-vdp-2: A calibrated visual metric for visibility and quality predictions in all luminance conditions. ACM Transactions on graphics (TOG) 30(4):1–14
Narwaria M, Da Silva MP, Le Callet P (2015) Hdr-vqm: An objective quality measure for high dynamic range video. Signal Processing: Image Communication 35:46–60
Jais IKM, Ismail AR, Nisa SQ (2019) Adam optimization algorithm for wide and deep neural network. Knowledge Engineering and Data Science 2(1):41–46
Acknowledgements
This work was supported by KAKENHI (21K11816).
Ethics declarations
Conflicts of interest
The authors declare that they have no conflict of interest.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Zhao, F., Liu, Q. & Ikenaga, T. Semi-supervised attention based merging network with hybrid dilated convolution module for few-shot HDR video reconstruction. Multimed Tools Appl 83, 37409–37430 (2024). https://doi.org/10.1007/s11042-023-16885-7