
Delivery of omnidirectional video using saliency prediction and optimal bitrate allocation


In this work, we propose and investigate a user-centric framework for the delivery of omnidirectional video (ODV) in VR systems that exploits visual attention (saliency) models in the bitrate allocation module. To this end, we formulate a new bitrate allocation algorithm that takes both the saliency map and the nonlinear sphere-to-plane mapping of each ODV into account, and we solve the resulting problem with integer linear programming. For the visual attention models, we use both image- and video-based saliency prediction results; moreover, we explore two types of attention models: (i) salient object detection via transfer learning with pre-trained networks, and (ii) saliency prediction with supervised networks trained on an eye-fixation dataset. Experimental evaluations of the saliency integration are discussed, with notable findings on the transfer-learning and supervised saliency approaches.
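The allocation step described above can be sketched in a few lines. This is a minimal, hypothetical illustration, not the paper's actual formulation: it assumes a per-tile utility of the form saliency × spherical weight × log(1 + bitrate), uses cosine-of-latitude weights for the equirectangular (sphere-to-plane) mapping as in WS-PSNR-style metrics, and replaces a real integer-programming solver with exhaustive search over a handful of tiles. All function and variable names are invented for this sketch.

```python
from itertools import product
import math


def sphere_weights(num_rows):
    """Per-row spherical weights for equirectangular tiles: cosine of the
    latitude at each tile-row centre (assumed WS-PSNR-style weighting)."""
    weights = []
    for row in range(num_rows):
        lat = math.pi * ((row + 0.5) / num_rows - 0.5)  # latitude in radians
        weights.append(math.cos(lat))
    return weights


def allocate_bitrates(saliency, sphere_w, levels, budget):
    """Toy stand-in for the integer linear program:
         maximise  sum_i  s_i * w_i * log(1 + r_i)
         subject to  sum_i r_i <= budget,   r_i drawn from `levels`.
    Brute force over all level assignments, feasible only for few tiles."""
    best_util, best_assign = -1.0, None
    for assign in product(levels, repeat=len(saliency)):
        if sum(assign) > budget:
            continue  # violates the total bitrate budget
        util = sum(s * w * math.log1p(r)
                   for s, w, r in zip(saliency, sphere_w, assign))
        if util > best_util:
            best_util, best_assign = util, assign
    return best_assign, best_util


if __name__ == "__main__":
    sal = [0.1, 0.9, 0.7, 0.2]          # hypothetical per-tile saliency scores
    w = sphere_weights(4)               # polar rows are down-weighted
    assign, util = allocate_bitrates(sal, w, levels=[1, 2, 4, 8], budget=10)
    print(assign, round(util, 3))
```

Because the utility is increasing and concave in the bitrate, the optimal assignment gives the most bits to the tile with the largest combined saliency-times-spherical weight, which is the behaviour a saliency-driven allocator is meant to exhibit.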






Author information


Corresponding author

Correspondence to Cagri Ozcinar.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This publication has emanated from research conducted with the financial support of Science Foundation Ireland (SFI) under the Grant Number 15/RP/27760, V-SENSE, Trinity College Dublin, Ireland. This paper is partly based on the results obtained from a Project commissioned by Public/Private R&D Investment Strategic Expansion Program (PRISM), AIST, Japan. Cagri Ozcinar and Nevrez İmamoğlu equally contributed to this work.


About this article


Cite this article

Ozcinar, C., İmamoğlu, N., Wang, W. et al. Delivery of omnidirectional video using saliency prediction and optimal bitrate allocation. SIViP 15, 493–500 (2021).



Keywords

  • \(360^\circ\) video streaming
  • Attention-based bitrate allocation
  • Saliency maps with transfer learning and supervision