Human-oriented video retargeting via object detection and patch decision

Multimedia Tools and Applications

Abstract

In this study, we propose a novel video retargeting approach that considers two essential factors of a video: the main object and its movement. Both factors are captured through the region of interest (ROI) of the target object. In our experiments, we set the main object to be a human so that the interacting objects and movement in each sequential frame are retained. Our method aims to preserve the ROI to the maximum extent possible under the retargeting constraints of the target resolution. To preserve the original main object, we rely on an object detection model to identify human-oriented objects; subsequently, we conduct a decision-making process to determine the suitability of our scheme. When the proposed method is applied, video frames are split into many patches, and frames at the precise target resolution are then generated using a video super-resolution model. The retargeted frames are evaluated with the following quality assessment metrics: PSNR, SSIM, MS-SSIM, LPIPS, BMPRI, BRISQUE, PIQE and NIQE. We perform comparative experiments to confirm that the proposed approach can maintain the original ratio of important objects and the content of the video. We experimentally demonstrate that the proposed approach can enhance video resolution while preserving visually pleasing quality and the original important objects.
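Of the metrics listed in the abstract, PSNR is the simplest to state. The following is a minimal sketch of its standard definition for two equally sized grayscale frames, assuming 8-bit pixel values; the function name `psnr` and the list-of-rows frame representation are illustrative assumptions and are not taken from the authors' evaluation code.

```python
import math


def psnr(reference, test, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two grayscale frames.

    Frames are lists of rows of pixel intensities (hypothetical layout,
    chosen only to keep this sketch dependency-free).
    """
    same_shape = len(reference) == len(test) and all(
        len(r) == len(t) for r, t in zip(reference, test)
    )
    if not same_shape:
        raise ValueError("frames must have identical dimensions")
    n = sum(len(row) for row in reference)
    if n == 0:
        raise ValueError("frames must not be empty")
    # Mean squared error over all pixel pairs.
    mse = sum(
        (a - b) ** 2
        for r, t in zip(reference, test)
        for a, b in zip(r, t)
    ) / n
    if mse == 0:
        return math.inf  # identical frames have unbounded PSNR
    return 10.0 * math.log10(max_value ** 2 / mse)
```

Higher values indicate that the test frame is closer to the reference. PSNR requires a reference frame of the same size, unlike the no-reference metrics in the list (BMPRI, BRISQUE, PIQE and NIQE).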

Data availability

Two datasets, DAVIS [19] and TVD [30], are used in this research. The DAVIS dataset [19] can be obtained at https://davischallenge.org/davis2017/code.html, and the TVD dataset [30] can be accessed at https://multimedia.tencent.com/resources/tvd. The retargeted video results can be accessed at http://gofile.me/7apVR/aNiuR7y1d.

References

  1. Jocher G, Stoken A, Chaurasia A, Borovec J, Kwon Y, Michael K et al (2021) Ultralytics/yolov5: v6.0 - YOLOv5n 'Nano' models, Roboflow integration, TensorFlow export, OpenCV DNN support. Zenodo

  2. Avidan S, Shamir A (2007) Seam carving for content-aware image resizing. ACM Trans Graph 26(3):10

  3. Bansal A, Ma S, Ramanan D, Sheikh Y (2018) Recycle-GAN: Unsupervised video retargeting. In: Proceedings of the European conference on computer vision (ECCV), pp 119–135

  4. Cheng WH, Wang CW, Wu JL (2006) Video adaptation for small display based on content recomposition. IEEE Trans Circuits Syst Video Technol 17(1):43–58

  5. Cho D, Park J, Oh TH, Tai YW, So Kweon I (2017) Weakly-and self-supervised learning for content-aware deep image retargeting. In: Proceedings of the IEEE international conference on computer vision, pp 4558–4567

  6. Cho D, Jung Y, Rameau F, Kim D, Woo S, Kweon IS (2019) Video Retargeting: trade-off between content preservation and spatio-temporal consistency. In: Proceedings of the 27th ACM international conference on multimedia, pp 882–889

  7. Chu M, Xie Y, Mayer J et al (2020) Learning temporal coherence via self-supervision for gan-based video generation. ACM Trans Graphics (TOG) 39(4):75–81

  8. Dong C, Loy CC, He K et al (2015) Image super-resolution using deep convolutional networks. IEEE Trans Pattern Anal Mach Intell 38(2):295–307

  9. Duchon CE (1979) Lanczos filtering in one and two dimensions. J Appl Meteorol Climatol 18(8):1016–1022

  10. Imani H, Islam MB, Wong LK (2023) Saliency-aware stereoscopic video retargeting. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 1230–1239

  11. Jin JG, Bae J, Baek HG, Park SH (2023) Object-ratio-preserving video retargeting framework based on segmentation and inpainting. In: Proceedings of the IEEE/CVF winter conference on applications of computer vision, pp 497–503

  12. Karras T, Laine S, Aila T (2019) A style-based generator architecture for generative adversarial networks. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 4401–4410

  13. Keys R (1981) Cubic convolution interpolation for digital image processing. IEEE Trans Acoust Speech Signal Process 29(6):1153–1160

  14. Kiess J, Kopf S, Guthier B et al (2018) A survey on content-aware image and video retargeting. ACM Trans Multimed Comput Commun App 14(3):28. https://doi.org/10.1145/3231598

  15. Ledig C, Theis L, Huszár F, Caballero J, Cunningham A, Acosta A et al (2017) Photo-realistic single image super-resolution using a generative adversarial network. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 4681–4690

  16. Min X, Zhai G, Gu K et al (2018) Blind image quality estimation via distortion aggravation. IEEE Trans Broadcast 64(2):508–517

  17. Mittal A, Moorthy AK, Bovik AC (2011) Blind/referenceless image spatial quality evaluator. In: 2011 conference record of the forty fifth asilomar conference on signals, systems and computers (ASILOMAR). IEEE, pp 723–727

  18. Ni H, Liu Y, Huang SX, Xue Y (2023) Cross-identity video motion retargeting with joint transformation and synthesis. In: Proceedings of the IEEE/CVF winter conference on applications of computer vision, pp 412–422

  19. Pont-Tuset J, Perazzi F, Caelles S, Arbeláez P, Sorkine-Hornung A, Van Gool L (2017) The 2017 DAVIS challenge on video object segmentation. arXiv preprint arXiv:1704.00675

  20. Redmon J, Divvala S, Girshick R, Farhadi A (2016) You only look once: Unified, real-time object detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 779–788

  21. Rubinstein M, Shamir A, Avidan S (2008) Improved seam carving for video retargeting. ACM Trans Graph (TOG) 27(3):1–9

  22. Shocher A, Bagon S, Isola P, Irani M (2019) InGAN: Capturing and retargeting the “DNA” of a natural image. In: Proceedings of the IEEE/CVF international conference on computer vision, pp 4492–4501

  23. Thévenaz P, Blu T, Unser M (2000) Image interpolation and resampling. Handbook Med Imaging, Process Anal 1(1):393–420

  24. Tomar S (2006) Converting video formats with ffmpeg. Linux Journal 2006(146):10

  25. Venkatanath N, Praneeth D, Bh MC, Channappayya SS, Medasani SS (2015) Blind image quality evaluation using perception based features. In: 2015 twenty first national conference on communications (NCC). IEEE, pp 1–6

  26. Wang YS, Lin HC, Sorkine O, Lee TY (2010) Motion-based video retargeting with optimized crop-and-warp. In: ACM SIGGRAPH 2010 papers, pp 1–9

  27. Wang YS, Hsiao JH, Sorkine O, Lee TY (2011) Scalable and coherent video resizing with per-frame optimization. ACM Trans Graph (TOG) 30(4):1–8

  28. Wang Z, Simoncelli EP, Bovik AC (2003) Multiscale structural similarity for image quality assessment. In: The thirty-seventh Asilomar conference on signals, systems & computers, vol 2. IEEE, pp 1398–1402

  29. Wang Z, Bovik AC, Sheikh HR et al (2004) Image quality assessment: from error visibility to structural similarity. IEEE Trans Image Process 13(4):600–612

  30. Xu X, Liu S, Li Z (2021) Tencent video dataset (TVD): A video dataset for learning-based visual data compression and analysis. arXiv preprint arXiv:2105.05961. https://doi.org/10.48550/arXiv.2105.05961

  31. Yang Z, Zhu W, Wu W, Qian C, Zhou Q, Zhou B, Loy CC (2020) TransMoMo: Invariance-driven unsupervised video motion retargeting. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 5306–5315

  32. Zhang R, Isola P, Efros AA, Shechtman E, Wang O (2018) The unreasonable effectiveness of deep features as a perceptual metric. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 586–595

  33. Granot N, Feinstein B, Shocher A, Bagon S, Irani M (2022) Drop the GAN: In defense of patches nearest neighbors as single image generative models. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 13460–13469

  34. Mittal A, Moorthy AK, Bovik AC (2012) No-reference image quality assessment in the spatial domain. IEEE Trans Image Process 21(12):4695–4708

  35. Mittal A, Soundararajan R, Bovik AC (2012) Making a “completely blind” image quality analyzer. IEEE Signal Process Lett 20(3):209–212

Acknowledgements

This work was supported in part by Institute of Information & communications Technology Planning & Evaluation (IITP) grant funded by the Korea government (MSIT) (No.2021-0-00087, Development of high-quality conversion technology for SD/HD low-quality media) and in part by the BK21 FOUR project (AI-driven Convergence Software Education Research Program) funded by the Ministry of Education, School of Computer Science and Engineering, Kyungpook National University, Korea (4199990214394).

Author information

Corresponding authors

Correspondence to Jieun Lee or Sang-hyo Park.

Ethics declarations

Conflict of interests

The authors declare that there are no conflicts of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Kim, DH., Lee, S., Bae, J. et al. Human-oriented video retargeting via object detection and patch decision. Multimed Tools Appl (2024). https://doi.org/10.1007/s11042-024-18878-6

  • DOI: https://doi.org/10.1007/s11042-024-18878-6
