Human-oriented video retargeting via object detection and patch decision

Kim, Dong-Hwi; Lee, Sujin; Bae, Jaehyun; Cho, Sukee; Bae, Byungjun; Lee, Jieun; Park, Sang-hyo

doi:10.1007/s11042-024-18878-6

Published: 08 April 2024

Multimedia Tools and Applications (2024)

Dong-Hwi Kim¹, Sujin Lee¹, Jaehyun Bae¹, Sukee Cho², Byungjun Bae², Jieun Lee³ & Sang-hyo Park¹ (ORCID: orcid.org/0000-0002-7282-7686)

Abstract

In this study, we propose a novel video retargeting approach that considers two essential factors of a video: the main object and its movement. Both factors are taken into account through the region of interest (ROI) of the target object. In our experiments, we set the main object to be a human so that the interacting objects and the movement in each sequential frame are retained. Our method aims to preserve the ROI to the maximum extent possible under the retargeting constraints imposed by the target resolution. To preserve the original main object, we rely on an object detection model to identify human-oriented objects; we then conduct a decision-making process to determine the suitability of our scheme. When the proposed method is applied, video frames are split into many patches and then generated at the precise target resolution using a video super-resolution model. The retargeted frames are evaluated with the following quality assessment metrics: PSNR, SSIM, MS-SSIM, LPIPS, BMPRI, BRISQUE, PIQE, and NIQE. We perform comparative experiments to confirm that the proposed approach can maintain the original ratio of important objects and the content of the video. We experimentally demonstrate that the proposed approach can enhance video resolution while preserving visually pleasing quality and the original important objects.
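Two steps named in the abstract, splitting a frame into patches and scoring a result with PSNR, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `split_into_patches` and `psnr`, the fixed square patch size, and the representation of a grayscale frame as nested lists of 8-bit values are all assumptions made for illustration. The object detection, patch decision, and super-resolution stages are omitted.

```python
import math
from typing import List, Tuple

Frame = List[List[int]]  # assumed: grayscale frame as rows of 8-bit pixel values


def split_into_patches(width: int, height: int, patch: int) -> List[Tuple[int, int, int, int]]:
    """Return (x, y, w, h) tiles that cover a width x height frame.

    Tiles on the right and bottom edges are clipped to the frame.
    The fixed square patch size is an illustrative assumption.
    """
    if patch <= 0:
        raise ValueError("patch size must be positive")
    tiles = []
    for y in range(0, height, patch):
        for x in range(0, width, patch):
            tiles.append((x, y, min(patch, width - x), min(patch, height - y)))
    return tiles


def psnr(reference: Frame, test: Frame, max_value: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB between two equally sized frames."""
    if len(reference) != len(test) or any(
        len(r) != len(t) for r, t in zip(reference, test)
    ):
        raise ValueError("frames must have the same dimensions")
    count = sum(len(row) for row in reference)
    if count == 0:
        raise ValueError("frames must not be empty")
    squared_error = sum(
        (a - b) ** 2 for r, t in zip(reference, test) for a, b in zip(r, t)
    )
    mse = squared_error / count
    if mse == 0:
        return math.inf  # identical frames
    return 10.0 * math.log10(max_value ** 2 / mse)


if __name__ == "__main__":
    tiles = split_into_patches(1920, 1080, 256)
    print(len(tiles), "tiles; last tile:", tiles[-1])
    print("PSNR (dB):", psnr([[0, 255]], [[255, 255]]))
```

PSNR is a full-reference metric, so it needs a reference frame of the same size as the retargeted frame. BMPRI, BRISQUE, PIQE, and NIQE are no-reference metrics and score the output frame on its own.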

Data availability

Two datasets, DAVIS [19] and TVD [30], are used in this research. The DAVIS [19] dataset can be obtained at https://davischallenge.org/davis2017/code.html, and the TVD [30] dataset can be accessed at https://multimedia.tencent.com/resources/tvd. The retargeted video results can be accessed at http://gofile.me/7apVR/aNiuR7y1d.

References

Jocher G, Stoken A, Chaurasia A, Borovec J, Kwon Y, Michael K et al (2021) ultralytics/yolov5: v6.0 - YOLOv5n 'Nano' models, Roboflow integration, TensorFlow export, OpenCV DNN support. Zenodo
Avidan S, Shamir A (2007) Seam carving for content-aware image resizing. ACM Trans Graph 26(3):10
Bansal A, Ma S, Ramanan D, Sheikh Y (2018) Recycle-GAN: Unsupervised video retargeting. In: Proceedings of the European conference on computer vision (ECCV), pp 119–135
Cheng WH, Wang CW, Wu JL (2006) Video adaptation for small display based on content recomposition. IEEE Trans Circuits Syst Video Technol 17(1):43–58
Cho D, Park J, Oh TH, Tai YW, So Kweon I (2017) Weakly-and self-supervised learning for content-aware deep image retargeting. In: Proceedings of the IEEE international conference on computer vision, pp 4558–4567
Cho D, Jung Y, Rameau F, Kim D, Woo S, Kweon IS (2019) Video Retargeting: trade-off between content preservation and spatio-temporal consistency. In: Proceedings of the 27th ACM international conference on multimedia, pp 882–889
Chu M, Xie Y, Mayer J et al (2020) Learning temporal coherence via self-supervision for gan-based video generation. ACM Trans Graphics (TOG) 39(4):75–81
Dong C, Loy CC, He K et al (2015) Image super-resolution using deep convolutional networks. IEEE Trans Pattern Anal Mach Intell 38(2):295–307
Duchon CE (1979) Lanczos filtering in one and two dimensions. J Appl Meteorol Climatol 18(8):1016–1022
Imani H, Islam MB, Wong LK (2023) Saliency-aware stereoscopic video retargeting. In: Proceedings of the IEEE/cvf conference on computer vision and pattern recognition, pp 1230–1239
Jin JG, Bae J, Baek HG, Park SH (2023) Object-ratio-preserving video retargeting framework based on segmentation and inpainting. In: Proceedings of the IEEE/CVF winter conference on applications of computer vision, pp 497–503
Karras T, Laine S, Aila T (2019) A style-based generator architecture for generative adversarial networks. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 4401–4410
Keys R (1981) Cubic convolution interpolation for digital image processing. IEEE Trans Acoust Speech Signal Process 29(6):1153–1160
Kiess J, Kopf S, Guthier B et al (2018) A survey on content-aware image and video retargeting. ACM Trans Multimed Comput Commun Appl 14(3):28. https://doi.org/10.1145/3231598
Ledig C, Theis L, Huszár F, Caballero J, Cunningham A, Acosta A et al (2017) Photo-realistic single image super-resolution using a generative adversarial network. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 4681–4690
Min X, Zhai G, Gu K et al (2018) Blind image quality estimation via distortion aggravation. IEEE Trans Broadcast 64(2):508–517
Mittal A, Moorthy AK, Bovik AC (2011) Blind/referenceless image spatial quality evaluator. In: 2011 conference record of the forty fifth asilomar conference on signals, systems and computers (ASILOMAR). IEEE, pp 723–727
Ni H, Liu Y, Huang SX, Xue Y (2023) Cross-identity video motion retargeting with joint transformation and synthesis. In: Proceedings of the IEEE/CVF winter conference on applications of computer vision, pp 412–422
Pont-Tuset J, Perazzi F, Caelles S, Arbeláez P, Sorkine-Hornung A, Van Gool L (2017) The 2017 Davis challenge on video object segmentation. arXiv preprint arXiv:1704.00675
Redmon J, Divvala S, Girshick R, Farhadi A (2016) You only look once: Unified, real-time object detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 779–788
Rubinstein M, Shamir A, Avidan S (2008) Improved seam carving for video retargeting. ACM Transactions Graphics (TOG) 27(3):1–9
Shocher A, Bagon S, Isola P, Irani M (2019) Ingan: Capturing and retargeting the “DNA” of a natural image. In: Proceedings of the IEEE/CVF international conference on computer vision, pp 4492–4501
Thévenaz P, Blu T, Unser M (2000) Image interpolation and resampling. Handbook Med Imaging, Process Anal 1(1):393–420
Tomar S (2006) Converting video formats with ffmpeg. Linux Journal 2006(146):10
Venkatanath N, Praneeth D, Bh MC, Channappayya SS, Medasani SS (2015) Blind image quality evaluation using perception based features. In: 2015 twenty first national conference on communications (NCC). IEEE, pp 1–6
Wang YS, Lin HC, Sorkine O, Lee TY (2010) Motion-based video retargeting with optimized crop-and-warp. In: ACM SIGGRAPH 2010 papers, pp 1–9
Wang YS, Hsiao JH, Sorkine O, Lee TY (2011) Scalable and coherent video resizing with per-frame optimization. ACM Trans Graph (TOG) 30(4):1–8
Wang Z, Simoncelli EP, Bovik AC (2003) Multiscale structural similarity for image quality assessment. In: The thirty-seventh Asilomar conference on signals, systems & computers, vol 2. IEEE, pp 1398–1402
Wang Z, Bovik AC, Sheikh HR et al (2004) Image quality assessment: from error visibility to structural similarity. IEEE Trans Image Process 13(4):600–612
Xu X, Liu S, Li Z (2021) Tencent video dataset (TVD): A video dataset for learning-based visual data compression and analysis. https://doi.org/10.48550/ARXIV.2105.05961, https://arxiv.org/abs/2105.05961
Yang Z, Zhu W, Wu W, Qian C, Zhou Q, Zhou B, Loy CC (2020) Transmomo: Invariance-driven unsupervised video motion retargeting. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 5306–5315
Zhang R, Isola P, Efros AA, Shechtman E, Wang O (2018) The unreasonable effectiveness of deep features as a perceptual metric. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 586–595
Granot N, Feinstein B, Shocher A, Bagon S, Irani M (2022) Drop the GAN: In defense of patches nearest neighbors as single image generative models. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 13460–13469
Mittal A, Moorthy AK, Bovik AC (2012) No-reference image quality assessment in the spatial domain. IEEE Trans Image Process 21(12):4695–4708
Mittal A, Soundararajan R, Bovik AC (2012) Making a “completely blind” image quality analyzer. IEEE Signal Process Lett 20(3):209–212

Acknowledgements

This work was supported in part by the Institute of Information & communications Technology Planning & Evaluation (IITP) grant funded by the Korea government (MSIT) (No. 2021-0-00087, Development of high-quality conversion technology for SD/HD low-quality media) and in part by the BK21 FOUR project (AI-driven Convergence Software Education Research Program) funded by the Ministry of Education, School of Computer Science and Engineering, Kyungpook National University, Korea (4199990214394).

Author information

Authors and Affiliations

School of Computer Science and Engineering, Kyungpook National University, Daegu, 41566, Korea
Dong-Hwi Kim, Sujin Lee, Jaehyun Bae & Sang-hyo Park
Electronics and Telecommunications Research Institute (ETRI), Daejeon, 34129, Korea
Sukee Cho & Byungjun Bae
Korea Education & Research Information Service (KERIS), Daegu, 41061, Korea
Jieun Lee

Corresponding authors

Correspondence to Jieun Lee or Sang-hyo Park.

Ethics declarations

Conflict of interests

The authors declare that there are no conflicts of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Kim, DH., Lee, S., Bae, J. et al. Human-oriented video retargeting via object detection and patch decision. Multimed Tools Appl (2024). https://doi.org/10.1007/s11042-024-18878-6

Received: 04 September 2023
Revised: 06 February 2024
Accepted: 07 March 2024
Published: 08 April 2024
DOI: https://doi.org/10.1007/s11042-024-18878-6
