
Delivery of omnidirectional video using saliency prediction and optimal bitrate allocation

  • Original Paper
  • Published in: Signal, Image and Video Processing

Abstract

In this work, we propose and investigate a user-centric framework for the delivery of omnidirectional video (ODV) on VR systems that exploits visual attention (saliency) models in the bitrate allocation module. To this end, we formulate a new bitrate allocation algorithm that takes the saliency map and the nonlinear sphere-to-plane mapping of each ODV into account, and we solve the resulting problem with linear integer programming. For the visual attention models, we use both image- and video-based saliency prediction results; moreover, we explore two types of attention modeling approaches: (i) salient object detection with transfer learning using pre-trained networks, and (ii) saliency prediction with supervised networks trained on an eye-fixation dataset. Experimental evaluations of integrating these saliency models into the delivery framework are discussed, with interesting findings on the transfer learning and supervised saliency approaches.
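Although the paper's exact formulation is not reproduced here, the allocation step described above can be read as a multiple-choice knapsack problem: each spatial region of the ODV picks exactly one bitrate level so that a saliency-weighted utility, corrected for the nonlinear sphere-to-plane mapping, is maximized under a total bitrate budget. The following is a minimal sketch of such a linear integer program in Python using the PuLP modeller; the tile grid, the cosine-latitude weighting, the logarithmic rate-utility proxy, and all numeric values are illustrative assumptions, not the authors' exact model.

```python
import math
import pulp  # open-source LP/ILP modeller; any MILP solver would do

# Illustrative setup (not the paper's configuration): a 4x8 tile grid
# over an equirectangular frame, three candidate bitrate levels per
# tile, and a total bitrate budget, all in kbps.
ROWS, COLS = 4, 8
LEVELS = [500, 1500, 4000]
BUDGET = 40000

# Hypothetical per-tile mean saliency in [0, 1]; in the framework this
# would come from the image- or video-based saliency predictor.
saliency = [[0.5] * COLS for _ in range(ROWS)]

def lat_weight(row):
    """Sphere-to-plane correction: the equirectangular projection
    oversamples the poles, so weight each tile row by the cosine of
    its centre latitude (WS-PSNR-style weighting)."""
    lat = math.pi * ((row + 0.5) / ROWS - 0.5)
    return math.cos(lat)

prob = pulp.LpProblem("saliency_bitrate_allocation", pulp.LpMaximize)

# Binary decision variable: x[r][c][l] == 1 iff tile (r, c) is encoded
# at bitrate level l.
x = pulp.LpVariable.dicts(
    "x", (range(ROWS), range(COLS), range(len(LEVELS))), cat="Binary"
)

# Objective: saliency- and latitude-weighted utility. log(bitrate) is
# a common proxy for rate-quality behaviour; the paper's utility term
# may differ.
prob += pulp.lpSum(
    saliency[r][c] * lat_weight(r) * math.log(LEVELS[l]) * x[r][c][l]
    for r in range(ROWS) for c in range(COLS) for l in range(len(LEVELS))
)

# Each tile is encoded at exactly one bitrate level.
for r in range(ROWS):
    for c in range(COLS):
        prob += pulp.lpSum(x[r][c][l] for l in range(len(LEVELS))) == 1

# The summed allocation must respect the total bitrate budget.
prob += pulp.lpSum(
    LEVELS[l] * x[r][c][l]
    for r in range(ROWS) for c in range(COLS) for l in range(len(LEVELS))
) <= BUDGET

prob.solve(pulp.PULP_CBC_CMD(msg=False))
print("status:", pulp.LpStatus[prob.status])
for r in range(ROWS):
    row = [next(LEVELS[l] for l in range(len(LEVELS))
                if pulp.value(x[r][c][l]) > 0.5)
           for c in range(COLS)]
    print(row)
```

Solving this program concentrates the budget on high-saliency, low-latitude tiles, which is the qualitative behaviour a saliency-driven allocator should exhibit; the paper's actual objective and constraints may of course differ from this sketch.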



Author information


Correspondence to Cagri Ozcinar.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This publication has emanated from research conducted with the financial support of Science Foundation Ireland (SFI) under Grant Number 15/RP/27760 (V-SENSE), Trinity College Dublin, Ireland. This paper is also partly based on results obtained from a project commissioned by the Public/Private R&D Investment Strategic Expansion Program (PRISM), AIST, Japan. Cagri Ozcinar and Nevrez İmamoğlu contributed equally to this work.


About this article


Cite this article

Ozcinar, C., İmamoğlu, N., Wang, W. et al. Delivery of omnidirectional video using saliency prediction and optimal bitrate allocation. SIViP 15, 493–500 (2021). https://doi.org/10.1007/s11760-020-01769-2
