
Visual saliency detection via invariant feature constrained stacked denoising autoencoder

Published in: Multimedia Tools and Applications

Abstract

Visual saliency detection is usually regarded as an image pre-processing step that predicts the position and shape of salient regions. However, many existing saliency detection methods recover only a partial, or even incorrect, position and shape of the salient regions, resulting in incomplete detection and segmentation of the salient target region. To address this problem, a visual saliency detection method based on scale-invariant features and a stacked denoising autoencoder (SDAE) is proposed. First, a deep belief network is pretrained to initialize the parameters of the SDAE. Second, unlike traditional features, scale-invariant features are not limited by the size, resolution, or content of the original images, and they help the network restore the important features of the original images more accurately in multi-scale space; they are therefore used to design the loss function with which the network completes self-training and updates its parameters. Finally, the difference between the reconstructed image produced by the SDAE and the original image is taken as the final saliency map. In the experiments, we test the proposed method on both saliency prediction and salient object segmentation. The results show that the proposed method performs well in saliency prediction and outperforms the compared saliency prediction and salient object detection methods in salient object segmentation.
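
The last step above reduces to a pixel-wise difference between the input image and its reconstruction. As a minimal sketch of that step only (the function name, channel averaging, and min-max rescaling are our assumptions, not the paper's code):

```python
import numpy as np

def saliency_map(original: np.ndarray, reconstructed: np.ndarray) -> np.ndarray:
    """Pixel-wise difference between an image and its SDAE reconstruction,
    rescaled to [0, 1] for display (the rescaling is an illustrative choice)."""
    diff = np.abs(original.astype(np.float64) - reconstructed.astype(np.float64))
    if diff.ndim == 3:            # average over color channels, if present
        diff = diff.mean(axis=2)
    lo, hi = diff.min(), diff.max()
    return (diff - lo) / (hi - lo + 1e-12)
```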


Abbreviations

DCNN:

Deep convolutional neural networks

CNN:

Convolutional neural networks

DNN:

Deep neural networks

FCN:

Fully convolutional networks

SIFT:

Scale-invariant feature transform

DAE:

Denoising autoencoder

SDAE:

Stacked denoising autoencoder

DBN:

Deep belief network

BP:

Backpropagation

RBM:

Restricted Boltzmann machine

DoG:

Difference of Gaussian

EMD:

Earth Mover’s Distance

GTFP:

Ground truth of fixation prediction

GTOS:

Ground truth of saliency object segmentation

CC:

Pearson’s Correlation Coefficient

SIM:

Similarity

MAE:

Mean Absolute Error

AUC:

Area Under Curve

ROC:

Receiver Operating Characteristic

TPR:

True positive rate

FPR:

False positive rate

X :

The input vector

\( \tilde{X} \) :

The corrupted input vector

Y :

The hidden layer vector

Z :

The output layer vector

S(⋅):

A non-linear activation function

f e :

Encoder function

f d :

Decoder function

L(W, p, X, Z):

Loss function (a sketch using this notation follows this list)

f g(⋅):

The difference-of-Gaussian (DoG) function

X v :

The input sample of RBM

Y h :

The hidden-layer sample of the RBM

v :

A collection of visible training samples

c j :

The bias of the hidden nodes of the RBM

P s :

The saliency maps

W :

The matrix of connection weights from the input layer to the hidden layer

N L :

The number of layers of the Gaussian pyramid

S(x, y):

The final saliency object segmentation result

TN :

The number of negative samples predicted as negative

p :

The bias vector of the hidden layer neurons

p′ :

The bias vector of the output layer neurons

η :

The learning rate

I saliency :

The final saliency map

I reconstructed :

The final reconstructed image

I original :

The original image

G(x, y):

The Gaussian convolution function

δ :

The scale parameter of the Gaussian pyramid

m :

The number of visible nodes

n :

The number of hidden nodes

b i :

The bias of visible nodes of RBM

h i :

The value (activation state) of a node

F 1 :

F1-score

Q D :

The fixation maps

W′ :

The matrix of connection weights from the hidden layer to the output layer

w ij :

The connection weight between visible node and hidden node

GT(x, y):

The ground truth of saliency object segmentation

FP :

The number of negative samples predicted as positive
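
Read together, the DAE entries above describe a single corrupt-encode-decode pass: the input X is corrupted to \( \tilde{X} \), encoded as Y = S(WX̃ + p), and decoded as Z = S(W′Y + p′). The sketch below wires up exactly that notation; it is an illustration only, with a sigmoid standing in for S(⋅) and masking noise for the corruption, and the plain squared-error loss does not reproduce the paper's SIFT-constrained L(W, p, X, Z).

```python
import numpy as np

rng = np.random.default_rng(0)

def S(a):
    """S(.): a non-linear activation; sigmoid is used here as an example."""
    return 1.0 / (1.0 + np.exp(-a))

def dae_forward(X, W, p, W_prime, p_prime, corruption=0.3):
    """One denoising-autoencoder pass: corrupt X to X~, encode with f_e,
    decode with f_d, and return all three intermediate vectors."""
    X_tilde = X * (rng.random(X.shape) > corruption)  # corrupted input X~
    Y = S(W @ X_tilde + p)                            # encoder f_e: hidden vector
    Z = S(W_prime @ Y + p_prime)                      # decoder f_d: reconstruction
    return X_tilde, Y, Z

def reconstruction_loss(X, Z):
    """Squared-error loss between clean input and reconstruction
    (the paper's SIFT-constrained loss is not reproduced here)."""
    return float(np.sum((X - Z) ** 2))

# Illustrative shapes only: m visible nodes, n hidden nodes.
m, n = 64, 32
X = rng.random(m)
W, p = rng.normal(0.0, 0.1, (n, m)), np.zeros(n)              # input -> hidden
W_prime, p_prime = rng.normal(0.0, 0.1, (m, n)), np.zeros(m)  # hidden -> output
_, _, Z = dae_forward(X, W, p, W_prime, p_prime)
print(reconstruction_loss(X, Z))
```

The f_g(⋅) and G(x, y) entries correspond to the standard difference-of-Gaussian construction from SIFT's scale space, f_g = G(⋅, kδ) − G(⋅, δ); a small sketch using SciPy's Gaussian filter (the scale step k = 1.6 is SIFT's conventional choice; the pyramid depth N_L is left to the caller):

```python
from scipy.ndimage import gaussian_filter
import numpy as np

def dog(image, delta, k=1.6):
    """Difference of two Gaussian-blurred copies at scales k*delta and delta."""
    image = np.asarray(image, dtype=float)  # avoid integer wrap-around
    return gaussian_filter(image, k * delta) - gaussian_filter(image, delta)
```

Finally, the TN and FP counts defined above fix the evaluation measures; assuming binarized arrays S(x, y) and GT(x, y) with entries in {0, 1}:

```python
def binary_metrics(S_map, GT):
    """TPR, FPR and F1 from a binarized segmentation and its ground truth."""
    tp = np.sum((S_map == 1) & (GT == 1))
    fp = np.sum((S_map == 1) & (GT == 0))  # FP: negatives predicted positive
    tn = np.sum((S_map == 0) & (GT == 0))  # TN: negatives predicted negative
    fn = np.sum((S_map == 0) & (GT == 1))
    tpr = tp / (tp + fn + 1e-12)           # true positive rate (recall)
    fpr = fp / (fp + tn + 1e-12)           # false positive rate
    precision = tp / (tp + fp + 1e-12)
    f1 = 2 * precision * tpr / (precision + tpr + 1e-12)
    return tpr, fpr, f1
```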


Funding

This work was supported in part by the National Natural Science Foundation of China under Grants 62001156 and 62201197, in part by the Fundamental Research Funds for the Central Universities under Grant B220201037, in part by the Key Research and Development Program of Jiangsu Province under Grants BE2021042 and BE2020649, and in part by the Jiangsu Excellent Postdoctoral Program.

Author information


Corresponding author

Correspondence to Ma Yunpeng.

Ethics declarations

Conflict of interest

The authors declare no conflicts of interest.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Ma, Y., Yu, Z., Zhou, Y. et al. Visual saliency detection via invariant feature constrained stacked denoising autoencoder. Multimed Tools Appl 82, 27451–27472 (2023). https://doi.org/10.1007/s11042-023-14525-8

