
n-Reference Transfer Learning for Saliency Prediction

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 12353)

Abstract

Benefiting from deep learning research and large-scale datasets, saliency prediction has achieved significant success in the past decade. However, it remains challenging to predict saliency maps for images in new domains that lack sufficient data for data-hungry models. To address this problem, we propose a few-shot transfer learning paradigm for saliency prediction, which enables efficient transfer of knowledge learned from existing large-scale saliency datasets to a target domain with limited labeled samples. Specifically, a few target-domain samples are used as the reference while training a model on a source-domain dataset, so that the training process converges to a local minimum in favor of the target domain. The learned model is then further fine-tuned with the reference samples. The proposed framework is gradient-based and model-agnostic. We conduct comprehensive experiments and an ablation study on various source- and target-domain pairs. The results show that the proposed framework achieves a significant performance improvement. The code is publicly available at https://github.com/luoyan407/n-reference.
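To make the two-stage idea concrete, below is a minimal, hypothetical PyTorch sketch of a reference-guided training step; it is not the authors' exact algorithm (see the released code at the URL above for that). The projection rule, the loss function, and names such as src_batch and ref_batch are illustrative assumptions; the only properties taken from the abstract are that the method is gradient-based and model-agnostic, uses n reference samples from the target domain while training on source data, and is followed by fine-tuning on the same reference.

# Hedged sketch of one plausible instantiation of reference-guided training.
# Assumptions (not from the paper): the reference gradient steers the source-batch
# update via a simple conflict-removing projection; data come as (input, target) pairs.
import torch


def flat_grad(loss, params):
    """Gradient of `loss` w.r.t. `params`, flattened into one vector."""
    grads = torch.autograd.grad(loss, params)
    return torch.cat([g.reshape(-1) for g in grads])


def reference_guided_step(model, optimizer, src_batch, ref_batch, loss_fn):
    """One source-domain step whose direction is kept consistent with the n reference samples."""
    params = [p for p in model.parameters() if p.requires_grad]

    # Gradient on the n reference samples (target domain).
    ref_loss = loss_fn(model(ref_batch[0]), ref_batch[1])
    g_ref = flat_grad(ref_loss, params)

    # Gradient on a source-domain batch.
    src_loss = loss_fn(model(src_batch[0]), src_batch[1])
    g_src = flat_grad(src_loss, params)

    # If the source gradient conflicts with the reference gradient,
    # remove the conflicting component (illustrative projection heuristic).
    dot = torch.dot(g_src, g_ref)
    if dot < 0:
        g_src = g_src - dot / (g_ref.norm() ** 2 + 1e-12) * g_ref

    # Write the combined direction back into .grad and take an optimizer step.
    optimizer.zero_grad()
    offset = 0
    for p in params:
        n = p.numel()
        p.grad = g_src[offset:offset + n].view_as(p).clone()
        offset += n
    optimizer.step()
    return src_loss.item(), ref_loss.item()

After the source-domain training loop, the same n reference samples would be reused in an ordinary fine-tuning loop (forward pass, loss, backward pass, optimizer step) before evaluating on the target domain.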

Keywords

Deep learning · Saliency prediction · n-shot transfer learning

Acknowledgments

This research was funded in part by the NSF under Grants 1908711 and 1849107, in part by the University of Minnesota Department of Computer Science and Engineering Start-up Fund (QZ), and in part by the National Research Foundation, Singapore, under its Strategic Capability Research Centres Funding Initiative. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not reflect the views of the National Research Foundation, Singapore.

Supplementary material

Supplementary material 1: 504445_1_En_30_MOESM1_ESM.pdf (PDF, 218 KB)

Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  1. Department of Computer Science and Engineering, University of Minnesota, Minneapolis, USA
  2. School of Computing, National University of Singapore, Singapore
