
ATSal: An Attention Based Architecture for Saliency Prediction in 360° Videos

  • Conference paper
Pattern Recognition. ICPR International Workshops and Challenges (ICPR 2021)

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 12663)


Abstract

The spherical domain representation of 360° video/images presents many challenges related to the storage, processing, transmission, and rendering of omnidirectional videos (ODV). Models of human visual attention can be used so that only a single viewport is rendered at a time, which is important when developing systems that allow users to explore ODV with head-mounted displays (HMD). Accordingly, researchers have proposed various saliency models for 360° videos/images. This paper proposes ATSal, a novel attention-based (head-eye) saliency model for 360° videos. The attention mechanism explicitly encodes global static visual attention, allowing expert models to focus on learning the saliency on local patches throughout consecutive frames. We compare the proposed approach to other state-of-the-art saliency models on two datasets: Salient360! and VR-EyeTracking. Experimental results on over 80 ODV videos (75K+ frames) show that the proposed method outperforms the existing state-of-the-art.
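To make the mechanism described in the abstract concrete, below is a minimal PyTorch sketch of a two-stream design in this spirit: a global stream predicts a static attention map over the full equirectangular frame, which then gates the output of an expert stream that scores saliency on stacks of consecutive frames. The module names (GlobalAttentionStream, PatchExpert), layer sizes, and multiplicative fusion are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch of an attention-gated two-stream saliency model.
# Names, shapes, and the fusion rule are assumptions for illustration only.
import torch
import torch.nn as nn

class GlobalAttentionStream(nn.Module):
    """Predicts a per-pixel attention map from a static equirectangular frame."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.head = nn.Conv2d(64, 1, 1)  # 1-channel attention logits

    def forward(self, frame):  # frame: (B, 3, H, W)
        att = torch.sigmoid(self.head(self.encoder(frame)))
        # Upsample back to the input resolution so it can gate the expert output.
        return nn.functional.interpolate(
            att, size=frame.shape[-2:], mode="bilinear", align_corners=False
        )

class PatchExpert(nn.Module):
    """Scores saliency over a stack of T consecutive frames with 3D convolutions."""
    def __init__(self, t=4):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv3d(3, 16, (t, 3, 3), padding=(0, 1, 1)), nn.ReLU(),
            nn.Conv3d(16, 1, (1, 3, 3), padding=(0, 1, 1)),
        )

    def forward(self, clip):  # clip: (B, 3, T, h, w)
        # Temporal kernel of size T collapses the time axis to 1.
        return torch.sigmoid(self.net(clip)).squeeze(2)  # (B, 1, h, w)

if __name__ == "__main__":
    frame = torch.randn(1, 3, 160, 320)        # static equirectangular frame
    clip = torch.randn(1, 3, 4, 160, 320)      # 4 consecutive frames
    attention = GlobalAttentionStream()(frame) # (1, 1, 160, 320)
    local_sal = PatchExpert(t=4)(clip)         # (1, 1, 160, 320)
    saliency = attention * local_sal           # attention-gated saliency map
    print(saliency.shape)
```

Multiplicative gating is only one plausible fusion choice; the point of the sketch is that the global stream is computed once per static view, while the expert stream learns temporal saliency locally.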



Acknowledgement

This publication has emanated from research supported by Science Foundation Ireland (SFI) under Grant Number SFI/12/RC/2289_P2, co-funded by the European Regional Development Fund, through the SFI Centre for Research Training in Machine Learning (18/CRT/6183).

Author information


Corresponding author

Correspondence to Yasser Dahou.



Copyright information

© 2021 Springer Nature Switzerland AG

About this paper


Cite this paper

Dahou, Y., Tliba, M., McGuinness, K., O’Connor, N. (2021). ATSal: An Attention Based Architecture for Saliency Prediction in 360° Videos. In: Del Bimbo, A., et al. (eds.) Pattern Recognition. ICPR International Workshops and Challenges. ICPR 2021. Lecture Notes in Computer Science, vol. 12663. Springer, Cham. https://doi.org/10.1007/978-3-030-68796-0_22

Download citation

  • DOI: https://doi.org/10.1007/978-3-030-68796-0_22


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-68795-3

  • Online ISBN: 978-3-030-68796-0

  • eBook Packages: Computer Science, Computer Science (R0)
