Multi-level Net: A Visual Saliency Prediction Model

  • Marcella Cornia
  • Lorenzo Baraldi
  • Giuseppe Serra
  • Rita Cucchiara
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9914)


State-of-the-art approaches to saliency prediction are based on Fully Convolutional Networks, in which the saliency map is built from the activations of the last layer alone. In contrast, we present a novel model that predicts saliency maps by exploiting a non-linear combination of features coming from different layers of the network. We also introduce a new loss function that addresses the imbalance between salient and non-salient pixels in saliency masks. Extensive experiments on three public datasets demonstrate the robustness of our solution: our model outperforms the state of the art on SALICON, the largest unconstrained dataset available, and obtains competitive results on the MIT300 and CAT2000 benchmarks.
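The abstract describes merging features drawn from several depths of a convolutional network, rather than the last layer only, through a non-linear combination. The sketch below is a minimal, hypothetical illustration of that idea in NumPy, not the authors' implementation: feature maps of different resolutions are upsampled to a common size, mixed across channels by learned 1×1-convolution weights, and passed through a ReLU. All function and variable names here are illustrative assumptions.

```python
import numpy as np

def upsample(fmap, out_h, out_w):
    """Nearest-neighbour upsampling of a (C, H, W) feature map."""
    c, h, w = fmap.shape
    rows = np.arange(out_h) * h // out_h
    cols = np.arange(out_w) * w // out_w
    return fmap[:, rows][:, :, cols]

def multi_level_saliency(feature_maps, weights, bias=0.0):
    """Combine feature maps from different layers into one saliency map.

    feature_maps: list of (C_i, H_i, W_i) arrays taken at different depths.
    weights:      list of (C_i,) arrays acting as a 1x1 convolution that
                  mixes each map's channels into a single saliency channel.
    The ReLU makes the combination non-linear, echoing the paper's idea of
    a non-linear combination of multi-level features.
    """
    out_h = max(f.shape[1] for f in feature_maps)
    out_w = max(f.shape[2] for f in feature_maps)
    sal = np.full((out_h, out_w), float(bias))
    for fmap, w in zip(feature_maps, weights):
        up = upsample(fmap, out_h, out_w)            # (C, out_h, out_w)
        sal += np.tensordot(w, up, axes=([0], [0]))  # weighted channel sum
    sal = np.maximum(sal, 0.0)                       # ReLU non-linearity
    if sal.max() > 0:
        sal /= sal.max()                             # normalise to [0, 1]
    return sal

# Illustrative usage: a coarse deep map and a finer shallow map.
feats = [np.random.rand(4, 8, 8), np.random.rand(2, 16, 16)]
ws = [np.ones(4), np.ones(2)]
saliency = multi_level_saliency(feats, ws)
```

In a trained model the per-channel weights would be learned jointly with the backbone; here they are fixed to ones purely to show the data flow.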


Keywords: Visual saliency · Saliency prediction · Convolutional neural network · Deep learning



Copyright information

© Springer International Publishing Switzerland 2016

Authors and Affiliations

  • Marcella Cornia (1)
  • Lorenzo Baraldi (1)
  • Giuseppe Serra (1)
  • Rita Cucchiara (1)
  1. Department of Engineering “Enzo Ferrari”, University of Modena and Reggio Emilia, Modena, Italy
