Abstract
The spherical domain representation of 360\(^\circ \) video/image presents many challenges related to the storage, processing, transmission and rendering of omnidirectional videos (ODV). Models of human visual attention can be used so that only a single viewport is rendered at a time, which is important when developing systems that allow users to explore ODV with head mounted displays (HMD). Accordingly, researchers have proposed various saliency models for 360\(^\circ \) video/images. This paper proposes ATSal, a novel attention based (head-eye) saliency model for 360\(^\circ \) videos. The attention mechanism explicitly encodes global static visual attention allowing expert models to focus on learning the saliency on local patches throughout consecutive frames. We compare the proposed approach to other state-of-the-art saliency models on two datasets: Salient360! and VR-EyeTracking. Experimental results on over 80 ODV videos (75K+ frames) show that the proposed method outperforms the existing state-of-the-art.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
References
Xu, M., Li, C., Zhang, S., Le Callet, P.: State-of-the-art in 360 video/image processing: perception, assessment and compression. IEEE J. Sel. Top. Signal Process. 14(1), 5–26 (2020)
De Abreu, A., Ozcinar, C., Smolic, A.: Look around you: saliency maps for omnidirectional images in VR applications. In: 2017 Ninth International Conference on Quality of Multimedia Experience (QoMEX), pp. 1–6. IEEE, May 2017
Itti, L., Koch, C.: A saliency-based search mechanism for overt and covert shifts of visual attention. Vision Res. 40(10–12), 1489–1506 (2000)
Pan, J., ET AL.: SalGAN: visual saliency prediction with generative adversarial networks. arXiv preprint arXiv:1701.01081 (2017)
Borji, A.: Saliency prediction in the deep learning era: an empirical investigation. arXiv preprint arXiv:1810.03716. 10 (2018)
Xu, Y., et al.: Gaze prediction in dynamic 360 immersive videos. In: proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 5333–5342 (2018)
Droste, R., Jiao, J., Noble, J.A.: Unified image and video saliency modeling. arXiv preprint arXiv:2003.05477 (2020)
Min, K., Corso, J.J.: TASED-net: temporally-aggregating spatial encoder-decoder network for video saliency detection. In Proceedings of the IEEE International Conference on Computer Vision, pp. 2394–2403. ISO 690 (2019)
Lai, Q., Wang, W., Sun, H., Shen, J.: Video saliency prediction using spatiotemporal residual attentive networks. IEEE Trans. Image Process. 29, 1113–1126 (2019)
Linardos, P., Mohedano, E., Nieto, J.J., O’Connor, N.E., Giro-i-Nieto, X., McGuinness, K.: Simple vs complex temporal recurrences for video saliency prediction. In: British Machine Vision Conference (BMVC) (2019)
Djilali, Y.A.D., Sayah, M., McGuinness, K., O’Connor, N.E.: 3DSAL: an efficient 3D-CNN architecture for video saliency prediction (2020)
Wang, W., Shen, J., Guo, F., Cheng, M.M., Borji, A.: Revisiting video saliency: A large-scale benchmark and a new model. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4894–4903 (2018)
Bak, C., Kocak, A., Erdem, E., Erdem, A.: Spatio-temporal saliency networks for dynamic saliency prediction. IEEE Trans. Multimedia 20(7), 1688–1698 (2017)
Pan, J., et al.: SalGAN: visual saliency prediction with adversarial networks. In: CVPR Scene Understanding Workshop (SUNw), July 2017
Bogdanova, I., Bur, A., Hügli, H., Farine, P.A.: Dynamic visual attention on the sphere. Comput. Vis. Image Underst. 114(1), 100–110 (2010)
Bogdanova, I., Bur, A., Hugli, H.: Visual attention on the sphere. IEEE Trans. Image Process. 17(11), 2000–2014 (2008)
Rai, Y., Le Callet, P., Guillotel, P.: Which saliency weighting for omni directional image quality assessment?. In: 2017 Ninth International Conference on Quality of Multimedia Experience (QoMEX), pp. 1–6. IEEE, May 2017
Xu, M., Song, Y., Wang, J., Qiao, M., Huo, L., Wang, Z.: Predicting head movement in panoramic video: a deep reinforcement learning approach. IEEE Trans. Pattern Anal. Mach. Intell. 41(11), 2693–2708 (2018)
Sitzmann, V., et al.: Saliency in VR: how do people explore virtual environments? IEEE Trans. Visual Comput. Graphics 24(4), 1633–1642 (2018)
Huang, X., Shen, C., Boix, X., Zhao, Q.: SALICON: reducing the semantic gap in saliency prediction by adapting deep neural networks. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 262–270 (2015)
Lebreton, P., Raake, A.: GBVS360, BMS360, ProSal: extending existing saliency prediction models from 2D to omnidirectional images. Signal Process. Image Commun. 69, 69–78 (2018)
Zhang, J., Sclaroff, S.: Saliency detection: a Boolean map approach. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 153–160 (2013)
Harel, J., Koch, C., Perona, P.: Graph-based visual saliency. In: Advances In Neural Information Processing Systems, pp. 545–552 (2007)
Maugey, T., Le Meur, O., Liu, Z.: Saliency-based navigation in omnidirectional image. In: IEEE 19th International Workshop on Multimedia Signal Processing (MMSP). Luton 2017, pp. 1–6 (2017)
Battisti, F., Baldoni, S., Brizzi, M., Carli, M.: A feature-based approach for saliency estimation of omni-directional images. Signal Process. Image Commun. 69, 53–59 (2018)
Fang, Y., Zhang, X., Imamoglu, N.: A novel superpixel-based saliency detection model for 360-degree images. Signal Process. Image Commun. 69, 1–7 (2018)
David, EJ., Gutiérrez, J., Coutrot, A., Da Silva, M. P., Callet, P.L.: A dataset of head and eye movements for 360 videos. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 432–437. ISO 690, June 2018
Zhang, Z., Xu, Y., Yu, J., Gao, S.: Saliency detection in 360 videos. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 488–503 (2018)
Cheng, H.T., Chao, C.H., Dong, J.D., Wen, H.K., Liu, T.L., Sun, M.: Cube padding for weakly-supervised saliency prediction in 360 videos. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1420–1429 (2018)
Suzuki, T., Yamanaka, T.: Saliency map estimation for omni-directional image considering prior distributions. In: 2018 IEEE International Conference on Systems, Man, and Cybernetics (SMC), pp. 2079–2084. IEEE, October 2018
Lebreton, P., Fremerey, S., Raake, A.: V-BMS360: a video extention to the BMS360 image saliency model. In: 2018 IEEE International Conference on Multimedia & Expo Workshops (ICMEW), pp. 1–4. IEEE, July 2018
Nguyen, A., Yan, Z., Nahrstedt, K.: Your attention is unique: detecting 360-degree video saliency in head-mounted display for head movement prediction. In: Proceedings of the 26th ACM International Conference on Multimedia, pp. 1190–1198, October 2018
Zhang, K., Chen, Z.: Video saliency prediction based on spatial-temporal two-stream network. IEEE Trans. Circuits Syst. Video Technol. 29(12), 3544–3557 (2018)
Hu, H.N., Lin, Y.C., Liu, M.Y., Cheng, H.T., Chang, Y.J., Sun, M.: Deep 360 pilot: learning a deep agent for piloting through 360 sports videos. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1396–1405. IEEE, July 2017
Chao, F.Y., Zhang, L., Hamidouche, W., Deforges, O.: SalGAN360: visual saliency prediction on 360 degree images with generative adversarial networks. In: 2018 IEEE International Conference on Multimedia & Expo Workshops (ICMEW), pp. 01–04. IEEE, July 2018
Qiao, M., Xu, M., Wang, Z., Borji, A.: Viewport-dependent saliency prediction in 360\(^\circ \) video. IEEE Trans. Multimed. (2020)
Wang, F., et al.: Residual attention network for image classification. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3156–3164 (2017)
Yang, Z., He, X., Gao, J., Deng, L., Smola, A.: Stacked attention networks for image question answering. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 21–29 (2016)
Tao, A., Sapra, K., Catanzaro, B.: Hierarchical multi-scale attention for semantic segmentation. arXiv preprint arXiv:2005.10821 (2020)
Chen, L.C., Yang, Y., Wang, J., Xu, W., Yuille, A.L.: Attention to scale: Scale-aware semantic image segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3640–3649 (2016)
Rai, Y., Gutiérrez, J., Le Callet, P.: A dataset of head and eye movements for 360 degree images. In: Proceedings of the 8th ACM on Multimedia Systems Conference, pp. 205–210, June 2017
Sitzmann, V., et al.: How do people explore virtual environments?. arXiv preprint arXiv:1612.04335 (2016)
Oliva, A., Torralba, A.: The role of context in object recognition. Trends Cogn. Sci. 11(12), 520–527 (2007)
He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778 (2016)
Bao, Y., Zhang, T., Pande, A., Wu, H., Liu, X.: Motion-prediction-based multicast for 360-degree video transmissions. In: 2017 14th Annual IEEE International Conference on Sensing, Communication, and Networking (SECON), pp. 1–9. IEEE, June 2017
Bylinskii, Z., Judd, T., Oliva, A., Torralba, A., Durand, F.: What do different evaluation metrics tell us about saliency models? IEEE Trans. Pattern Anal. Mach. Intell. 41(3), 740–757 (2018)
Acknowledgement
This publication has emanated from research supported by Science Foundation Ireland (SFI) under Grant Number SFI/12/RC/2289_P2, co-funded by the European Regional Development Fund, through the SFI Centre for Research Training in Machine Learning (18/CRT/6183).
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Dahou, Y., Tliba, M., McGuinness, K., O’Connor, N. (2021). ATSal: An Attention Based Architecture for Saliency Prediction in 360\(^\circ \) Videos. In: Del Bimbo, A., et al. Pattern Recognition. ICPR International Workshops and Challenges. ICPR 2021. Lecture Notes in Computer Science(), vol 12663. Springer, Cham. https://doi.org/10.1007/978-3-030-68796-0_22
Download citation
DOI: https://doi.org/10.1007/978-3-030-68796-0_22
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-68795-3
Online ISBN: 978-3-030-68796-0
eBook Packages: Computer ScienceComputer Science (R0)