Abstract
In omnidirectional images and videos, the viewer receives an interactive and immersive experience by changing the viewing angle of the viewport. With the wide application of omnidirectional videos, their visual quality assessment is becoming an urgent issue. Because an omnidirectional video has a large resolution, regions with object motion usually catch the viewer's attention, so motion regions strongly influence perceived visual quality. Since the number of potential viewports is huge and viewers spend varying amounts of time on different viewports, viewport selection is a critical yet unresolved problem for omnidirectional video quality assessment (VQA). In this paper, we propose a two-stream network with viewport selection for blind omnidirectional VQA that incorporates the influences of motion regions and viewport selection. First, we propose a two-stream multi-task convolutional neural network (TSMT) for VQA at any viewport, which takes video frame sequences and motion sequences as inputs; the motion sequences are represented as horizontal and vertical optical flows. Based on the observation that low-latitude regions, the front view, and moving objects have higher probabilities of appearing in the viewport, we then propose a viewport selection method based on a fusion-based saliency map that accounts for those regions. Experimental results on two datasets demonstrate that the proposed model outperforms state-of-the-art omnidirectional VQA methods.
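The viewport selection idea described above can be illustrated with a minimal sketch: fuse a low-latitude prior, a front-view prior, and a motion-magnitude map into one saliency map over the equirectangular frame, then pick the most salient point as a viewport center. The specific priors, weights, and additive fusion below are illustrative assumptions for exposition, not the paper's exact formulation.

```python
import numpy as np

def fused_saliency(motion_mag, w=(1.0, 1.0, 1.0)):
    """Fuse three priors into one saliency map on an equirectangular grid.

    motion_mag: (H, W) array of optical-flow magnitudes (motion prior).
    The priors and additive weighting here are illustrative only.
    """
    H, W = motion_mag.shape
    # Latitude prior: viewers favor low-latitude (equatorial) regions.
    lat = np.linspace(-np.pi / 2, np.pi / 2, H)           # row -> latitude
    lat_prior = np.cos(lat)[:, None] * np.ones((1, W))    # peaks at equator
    # Front-view prior: viewers tend to face the front (image center).
    lon = np.linspace(-np.pi, np.pi, W)                   # col -> longitude
    front_prior = np.ones((H, 1)) * np.cos(lon / 2)[None, :]
    # Motion prior: normalize flow magnitude to [0, 1].
    m = motion_mag / (motion_mag.max() + 1e-8)
    return w[0] * lat_prior + w[1] * front_prior + w[2] * m

def select_viewport(saliency):
    """Return (row, col) of the most salient point as the viewport center."""
    return np.unravel_index(np.argmax(saliency), saliency.shape)

# Toy example: a moving object near the equator, slightly right of center.
motion = np.zeros((90, 180))
motion[40:50, 100:110] = 5.0
sal = fused_saliency(motion)
row, col = select_viewport(sal)
```

In this toy case the moving object dominates the fused map because it sits near both the equator (high latitude prior) and the front view, so the selected viewport center falls inside the motion region.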
Data Availability
Data and code will be made available on reasonable request.
Acknowledgements
This work was supported by the National Natural Science Foundation of China (No. 61972097) and the Natural Science Foundation of Fujian Province (No. 2020J01494).
Ethics declarations
Conflicts of interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
About this article
Cite this article
Chen, J., Niu, Y. Two-stream network with viewport selection for blind omnidirectional video quality assessment. Multimed Tools Appl 83, 12139–12157 (2024). https://doi.org/10.1007/s11042-023-15739-6