How Well Current Saliency Prediction Models Perform on UAVs Videos?

Part of the Lecture Notes in Computer Science book series (LNIP, volume 11678)

Abstract

It is exciting to witness the fast development of Unmanned Aerial Vehicle (UAV) imaging, which opens the door to many new applications. With a view to developing rich and efficient services, we ask which strategy should be adopted to predict salience in UAV videos. To that end, we introduce a benchmark of off-the-shelf state-of-the-art models for saliency prediction. This benchmark examines two challenging aspects related to salience, namely the peculiar characteristics of UAV content and the temporal dimension of videos. It allows us to identify the strengths and weaknesses of current static, dynamic, supervised, and unsupervised models on drone videos. Finally, we highlight several strategies for the development of visual attention models for UAV videos.

Keywords

  • Benchmark
  • Salience
  • Dynamic saliency models
  • Unmanned Aerial Vehicles (UAV)
  • Videos

The presented work is funded by the ongoing research project ANR ASTRID DISSOCIE (Automated Detection of SaliencieS from Operators’ Point of View and Intelligent Compression of DronE videos) referenced as ANR-17-ASTR-0009.

  • DOI: 10.1007/978-3-030-29888-3_25
  • Chapter length: 13 pages
Author information

Corresponding author

Correspondence to Anne-Flore Perrin.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Perrin, A.-F., Zhang, L., Le Meur, O. (2019). How Well Current Saliency Prediction Models Perform on UAVs Videos? In: Vento, M., Percannella, G. (eds) Computer Analysis of Images and Patterns. CAIP 2019. Lecture Notes in Computer Science, vol 11678. Springer, Cham. https://doi.org/10.1007/978-3-030-29888-3_25

  • DOI: https://doi.org/10.1007/978-3-030-29888-3_25

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-29887-6

  • Online ISBN: 978-3-030-29888-3

  • eBook Packages: Computer Science, Computer Science (R0)