Semi-supervised attention based merging network with hybrid dilated convolution module for few-shot HDR video reconstruction


Abstract

Deep learning based methods for high dynamic range (HDR) video reconstruction require large-scale HDR video datasets with ground truth, which are time-consuming to collect. Recent training strategies under the few-shot learning paradigm, which aim to build an effective model from only a few labeled samples, have shown success in image classification and image segmentation. In this paper, a semi-supervised learning based framework for few-shot HDR video reconstruction is proposed. An attention based merging network with a hybrid dilated convolution module recovers missing content and removes artifacts: the hybrid dilated convolution module extracts additional features from ill-exposed regions, and the attention module corrects them to suppress harmful information. In the semi-supervised framework, dedicated training losses for the supervised branch and the unsupervised branch constrain the network under the few-shot scenario. Experimental results show that the proposed method, trained with only 5 labeled samples and 45 unlabeled samples, achieves a PSNR of 41.664 dB on the synthetic evaluation dataset, compared with 35.201 dB for the best supervised method trained under the same few-shot condition.
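The architectural details are not reproduced on this page, so as an illustration only, the following PyTorch-style sketch shows one plausible form of a hybrid dilated convolution block of the kind named in the abstract: parallel 3×3 convolutions with increasing dilation rates gather features from a widening context around ill-exposed regions, and a sigmoid gate stands in for the attention-based correction. The channel count, dilation rates, and fusion scheme are assumptions, not the authors' implementation.

```python
# Hypothetical sketch of a hybrid dilated convolution (HDC) block with a
# gated residual output; NOT the authors' implementation. Channel count,
# dilation rates (1, 2, 5, chosen to avoid gridding artifacts), and the
# sigmoid gate standing in for the attention module are all assumptions.
import torch
import torch.nn as nn


class HybridDilatedConvBlock(nn.Module):
    def __init__(self, channels: int = 64, dilations=(1, 2, 5)):
        super().__init__()
        # Parallel 3x3 convolutions with different dilation rates; setting
        # padding = dilation keeps the spatial size unchanged while each
        # branch sees a progressively larger receptive field.
        self.branches = nn.ModuleList(
            nn.Conv2d(channels, channels, kernel_size=3, padding=d, dilation=d)
            for d in dilations
        )
        # 1x1 convolution fuses the concatenated multi-scale features.
        self.fuse = nn.Conv2d(channels * len(dilations), channels, kernel_size=1)
        # Sigmoid gate that down-weights unreliable features, e.g. those
        # extracted from ill-exposed regions.
        self.gate = nn.Sequential(nn.Conv2d(channels, channels, kernel_size=1),
                                  nn.Sigmoid())
        self.act = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        multi_scale = torch.cat([self.act(b(x)) for b in self.branches], dim=1)
        fused = self.fuse(multi_scale)
        return x + fused * self.gate(fused)  # gated residual connection


if __name__ == "__main__":
    block = HybridDilatedConvBlock(channels=64)
    out = block(torch.randn(1, 64, 128, 128))
    print(out.shape)  # torch.Size([1, 64, 128, 128])
```

Likewise, the two-branch objective is only described in words in the abstract. A minimal sketch of how a supervised reconstruction loss and an unsupervised loss might be combined follows; the consistency-based unsupervised term and the weighting factor are again assumptions rather than the paper's formulation.

```python
# Hypothetical two-branch training loss, not the paper's exact formulation:
# the supervised branch uses an L1 reconstruction loss against ground-truth
# HDR frames, and the unsupervised branch enforces consistency between
# predictions for two augmented views of the same unlabeled frames.
import torch
import torch.nn.functional as F


def semi_supervised_loss(model, labeled_ldr, hdr_gt,
                         unlabeled_view1, unlabeled_view2,
                         unsup_weight: float = 0.1) -> torch.Tensor:
    # Supervised branch: reconstruction error on the few labeled samples.
    sup = F.l1_loss(model(labeled_ldr), hdr_gt)
    # Unsupervised branch: predictions on two views of the same unlabeled
    # sequence should agree; detach one side so it acts as a fixed target.
    unsup = F.l1_loss(model(unlabeled_view1), model(unlabeled_view2).detach())
    return sup + unsup_weight * unsup
```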


Data Availability

The data that support the findings of this study are openly available as follows:

• HDR video repositories for making the training dataset and the synthetic evaluation dataset: "Cinematic Wide Gamut HDR video" at https://www.hdm-stuttgart.de/vmlab/hdm-hdr-2014/ [46]; "LiU HDRv Repository" at http://www.hdrv.org/Resources.php [25].

• Real-world dataset without ground truth at https://github.com/guanyingc/DeepHDRVideo-Dataset/ [3].

• Kalantari13 dataset at https://web.ece.ucsb.edu/~psen/PaperPages/HDRVideo/ [27].

References

1. Kang SB, Uyttendaele M, Winder S, Szeliski R (2003) High dynamic range video. ACM Trans Graph 22(3):319–325

2. Kalantari NK, Ramamoorthi R (2019) Deep HDR video from sequences with alternating exposures. In: Computer graphics forum, vol 38, pp 193–205. Wiley Online Library

3. Chen G, Chen C, Guo S, Liang Z, Wong K-YK, Zhang L (2021) HDR video reconstruction: A coarse-to-fine network and a real-world benchmark dataset. In: Proceedings of the IEEE/CVF international conference on computer vision, pp 2502–2511

4. Anand M, Harilal N, Kumar C, Raman S (2021) HDRVideo-GAN: Deep generative HDR video reconstruction. In: Proceedings of the twelfth Indian conference on computer vision, graphics and image processing, pp 1–9

5. Li L, Dong Y, Ren W, Pan J, Gao C, Sang N, Yang M-H (2019) Semi-supervised image dehazing. IEEE Trans Image Process 29:2766–2779

6. Hasinoff SW, Durand F, Freeman WT (2010) Noise-optimal capture for high dynamic range photography. In: 2010 IEEE computer society conference on computer vision and pattern recognition, pp 553–560. IEEE

7. Seshadrinathan K, Park SH, Nestares O (2012) Noise and dynamic range optimal computational imaging. In: 2012 19th IEEE international conference on image processing, pp 2785–2788. IEEE

8. Pourreza-Shahri R, Kehtarnavaz N (2015) Exposure bracketing via automatic exposure selection. In: 2015 IEEE international conference on image processing (ICIP), pp 320–323. IEEE

9. Eilertsen G, Kronander J, Denes G, Mantiuk RK, Unger J (2017) HDR image reconstruction from a single exposure using deep CNNs. ACM Trans Graph 36(6):1–15

10. Bogoni L (2000) Extending dynamic range of monochrome and color images through fusion. In: Proceedings 15th international conference on pattern recognition (ICPR-2000), vol 3, pp 7–12. IEEE

11. Jacobs K, Loscos C, Ward G (2008) Automatic high-dynamic range image generation for dynamic scenes. IEEE Comput Graph Appl 28(2):84–93

12. Kalantari NK, Ramamoorthi R (2017) Deep high dynamic range imaging of dynamic scenes. ACM Trans Graph 36(4):144

13. Pece F, Kautz J (2010) Bitmap movement detection: HDR for dynamic scenes. In: 2010 Conference on visual media production, pp 1–8. IEEE

14. Zhang W, Cham W-K (2012) Reference-guided exposure fusion in dynamic scenes. J Vis Commun Image Represent 23(3):467–475

15. Oh T-H, Lee J-Y, Tai Y-W, Kweon IS (2014) Robust high dynamic range imaging by rank minimization. IEEE Trans Pattern Anal Mach Intell 37(6):1219–1232

16. Wu S, Xu J, Tai Y-W, Tang C-K (2018) Deep high dynamic range imaging with large foreground motions. In: Proceedings of the European conference on computer vision (ECCV), pp 117–132

17. Yan Q, Gong D, Shi Q, van den Hengel A, Shen C, Reid I, Zhang Y (2019) Attention-guided network for ghost-free high dynamic range imaging. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 1751–1760

18. Yan Q, Zhang L, Liu Y, Zhu Y, Sun J, Shi Q, Zhang Y (2020) Deep HDR imaging via a non-local network. IEEE Trans Image Process 29:4308–4322

19. Niu Y, Wu J, Liu W, Guo W, Lau RW (2021) HDR-GAN: HDR image reconstruction from multi-exposed LDR images with large motions. IEEE Trans Image Process 30:3885–3896

20. Nayar SK, Mitsunaga T (2000) High dynamic range imaging: Spatially varying pixel exposures. In: Proceedings IEEE conference on computer vision and pattern recognition (CVPR 2000), vol 1, pp 472–479. IEEE

21. Serrano A, Heide F, Gutierrez D, Wetzstein G, Masia B (2016) Convolutional sparse coding for high dynamic range imaging. In: Computer graphics forum, vol 35, pp 153–163. Wiley Online Library

22. Hajisharif S, Kronander J, Unger J (2015) Adaptive dualISO HDR reconstruction. EURASIP Journal on Image and Video Processing 2015(1):1–13

23. Choi I, Baek S-H, Kim MH (2017) Reconstructing interlaced high-dynamic-range video using joint learning. IEEE Trans Image Process 26(11):5353–5366

24. McGuire M, Matusik W, Pfister H, Chen B, Hughes JF, Nayar SK (2007) Optical splitting trees for high-precision monocular imaging. IEEE Comput Graph Appl 27(2):32–42

25. Kronander J, Gustavson S, Bonnet G, Ynnerman A, Unger J (2014) A unified framework for multi-sensor HDR video reconstruction. Signal Process Image Commun 29(2):203–215

26. Mangiat S, Gibson J (2010) High dynamic range video with ghost removal. In: Applications of digital image processing XXXIII, vol 7798, pp 307–314. SPIE

27. Kalantari NK, Shechtman E, Barnes C, Darabi S, Goldman DB, Sen P (2013) Patch-based high dynamic range video. ACM Trans Graph 32(6):202

28. Li Y, Lee C, Monga V (2016) A maximum a posteriori estimation framework for robust high dynamic range video synthesis. IEEE Trans Image Process 26(3):1143–1157

29. Cai Q, Pan Y, Yao T, Yan C, Mei T (2018) Memory matching networks for one-shot image recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 4080–4088

30. Munkhdalai T, Yuan X, Mehri S, Trischler A (2018) Rapid adaptation with conditionally shifted neurons. In: International conference on machine learning, pp 3664–3673. PMLR

31. Yoon SW, Seo J, Moon J (2019) TapNet: Neural network augmented with task-adaptive projection for few-shot learning. In: International conference on machine learning, pp 7115–7123. PMLR

32. Li H, Eigen D, Dodge S, Zeiler M, Wang X (2019) Finding task-relevant features for few-shot learning by category traversal. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 1–10

33. Ravi S, Larochelle H (2016) Optimization as a model for few-shot learning. In: International conference on learning representations

34. Su J-C, Maji S, Hariharan B (2020) When does self-supervision improve few-shot learning? In: European conference on computer vision, pp 645–666. Springer

35. Vinyals O, Blundell C, Lillicrap T, Wierstra D et al (2016) Matching networks for one shot learning. Advances in Neural Information Processing Systems 29

36. Prabhakar KR, Senthil G, Agrawal S, Babu RV, Gorthi RKSS (2021) Labeled from unlabeled: Exploiting unlabeled data for few-shot deep HDR deghosting. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 4875–4885

37. Sousa S, Milios E, Berton L (2020) Word sense disambiguation: An evaluation study of semi-supervised approaches with word embeddings. In: 2020 International joint conference on neural networks (IJCNN), pp 1–8. IEEE

38. Hu Z, Yang Z, Hu X, Nevatia R (2021) SimPLE: Similar pseudo label exploitation for semi-supervised classification. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 15099–15108

39. Ling S, Liu Y, Salazar J, Kirchhoff K (2020) Deep contextualized acoustic representations for semi-supervised speech recognition. In: ICASSP 2020 - 2020 IEEE international conference on acoustics, speech and signal processing (ICASSP), pp 6429–6433. IEEE

40. Lai W-S, Huang J-B, Yang M-H (2017) Semi-supervised learning for optical flow with generative adversarial networks. Advances in Neural Information Processing Systems 30

41. Yang W, Wang S, Fang Y, Wang Y, Liu J (2020) From fidelity to perceptual quality: A semi-supervised approach for low-light image enhancement. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 3063–3072

42. Ranjan A, Black MJ (2017) Optical flow estimation using a spatial pyramid network. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 4161–4170

43. Johnson J, Alahi A, Fei-Fei L (2016) Perceptual losses for real-time style transfer and super-resolution. In: European conference on computer vision, pp 694–711. Springer

44. Chen D, Yuan L, Liao J, Yu N, Hua G (2017) StyleBank: An explicit representation for neural image style transfer. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 1897–1906

45. Horé A, Ziou D (2010) Image quality metrics: PSNR vs. SSIM. In: 2010 20th International conference on pattern recognition, pp 2366–2369. IEEE

46. Froehlich J, Grandinetti S, Eberhardt B, Walter S, Schilling A, Brendel H (2014) Creating cinematic wide gamut HDR-video for the evaluation of tone mapping operators and HDR-displays. In: Digital photography X, vol 9023, pp 279–288. SPIE

47. Mantiuk R, Kim KJ, Rempel AG, Heidrich W (2011) HDR-VDP-2: A calibrated visual metric for visibility and quality predictions in all luminance conditions. ACM Trans Graph 30(4):1–14

48. Narwaria M, Da Silva MP, Le Callet P (2015) HDR-VQM: An objective quality measure for high dynamic range video. Signal Process Image Commun 35:46–60

49. Jais IKM, Ismail AR, Nisa SQ (2019) Adam optimization algorithm for wide and deep neural network. Knowledge Engineering and Data Science 2(1):41–46


Acknowledgements

This work was supported by KAKENHI (21K11816).

Author information


Corresponding author

Correspondence to Fengshan Zhao.

Ethics declarations

Conflicts of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Zhao, F., Liu, Q. & Ikenaga, T. Semi-supervised attention based merging network with hybrid dilated convolution module for few-shot HDR video reconstruction. Multimed Tools Appl 83, 37409–37430 (2024). https://doi.org/10.1007/s11042-023-16885-7

