Abstract
In omnidirectional images and videos, the viewer receives an interactive and immersive experience by changing the viewing angle of the viewport. With the wide application of omnidirectional videos, their visual quality assessment is becoming an urgent issue. Because an omnidirectional video has a large resolution, regions with object motion usually catch the viewer's attention, so motion regions strongly influence perceived visual quality. Since the number of potential viewports is huge and viewers spend varying amounts of time on different viewports, viewport selection is a critical yet unresolved problem for omnidirectional video quality assessment (VQA). In this paper, we propose a two-stream network with viewport selection for blind omnidirectional VQA that incorporates the influences of motion regions and viewport selection. First, we propose a two-stream multi-task convolutional neural network (TSMT) for VQA at any viewport, which takes video frame sequences and motion sequences as inputs; the motion sequences are represented as horizontal and vertical optical flows. Based on the observation that low-latitude regions, the front view, and moving objects have higher probabilities of appearing in the viewport, we then propose a viewport selection method based on a fusion-based saliency map that accounts for those regions. Experimental results on two datasets demonstrate that the proposed model outperforms state-of-the-art omnidirectional VQA methods.
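The viewport selection idea described above can be illustrated with a minimal sketch: fuse a low-latitude prior, a front-view prior, and a motion-magnitude map into one saliency map over the equirectangular frame, then pick the most salient point as a viewport center. The specific priors, weights, and additive fusion below are illustrative assumptions for exposition, not the paper's exact formulation.

```python
import numpy as np

def fused_saliency(motion_mag, w=(1.0, 1.0, 1.0)):
    """Fuse three priors into one saliency map on an equirectangular grid.

    motion_mag: (H, W) array of optical-flow magnitudes (motion prior).
    The priors and additive weighting here are illustrative only.
    """
    H, W = motion_mag.shape
    # Latitude prior: viewers favor low-latitude (equatorial) regions.
    lat = np.linspace(-np.pi / 2, np.pi / 2, H)           # row -> latitude
    lat_prior = np.cos(lat)[:, None] * np.ones((1, W))    # peaks at equator
    # Front-view prior: viewers tend to face the front (image center).
    lon = np.linspace(-np.pi, np.pi, W)                   # col -> longitude
    front_prior = np.ones((H, 1)) * np.cos(lon / 2)[None, :]
    # Motion prior: normalize flow magnitude to [0, 1].
    m = motion_mag / (motion_mag.max() + 1e-8)
    return w[0] * lat_prior + w[1] * front_prior + w[2] * m

def select_viewport(saliency):
    """Return (row, col) of the most salient point as the viewport center."""
    return np.unravel_index(np.argmax(saliency), saliency.shape)

# Toy example: a moving object near the equator, slightly right of center.
motion = np.zeros((90, 180))
motion[40:50, 100:110] = 5.0
sal = fused_saliency(motion)
row, col = select_viewport(sal)
```

In this toy case the moving object dominates the fused map because it sits near both the equator (high latitude prior) and the front view, so the selected viewport center falls inside the motion region.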
Data Availability
Data and code will be made available on reasonable request.
Acknowledgements
This work was supported by the National Natural Science Foundation of China (No. 61972097) and the Natural Science Foundation of Fujian Province (No. 2020J01494).
Ethics declarations
Conflicts of interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
About this article
Cite this article
Chen, J., Niu, Y. Two-stream network with viewport selection for blind omnidirectional video quality assessment. Multimed Tools Appl 83, 12139–12157 (2024). https://doi.org/10.1007/s11042-023-15739-6