
Visual saliency detection via invariant feature constrained stacked denoising autoencoder

Published in: Multimedia Tools and Applications

Abstract

Visual saliency detection is usually regarded as an image pre-processing step that predicts the position and shape of salient regions. However, many existing saliency detection methods recover only a partial, or even incorrect, position and shape of the salient regions, resulting in incomplete detection and segmentation of the salient target region. To address this problem, a visual saliency detection method based on scale-invariant features and a stacked denoising autoencoder (SDAE) is proposed. First, a deep belief network is pretrained to initialize the parameters of the SDAE. Second, unlike traditional features, scale-invariant features are not limited by the size, resolution, or content of the original images, and they help the network restore the important features of the original images more accurately in multi-scale space; they are therefore used to design the loss function with which the network completes self-training and updates its parameters. Finally, the difference between the reconstructed image produced by the SDAE and the original image is taken as the final saliency map. In the experiments, we test the proposed method on both saliency prediction and salient object segmentation. The results show that the proposed method performs well in saliency prediction and outperforms the compared saliency prediction and salient object detection methods in salient object segmentation.
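
The last step above reduces to a pixel-wise difference between the input image and its reconstruction. As a minimal sketch of that step only (the function name, channel averaging, and min-max rescaling are our assumptions, not the paper's code):

```python
import numpy as np

def saliency_map(original: np.ndarray, reconstructed: np.ndarray) -> np.ndarray:
    """Pixel-wise difference between an image and its SDAE reconstruction,
    rescaled to [0, 1] for display (the rescaling is an illustrative choice)."""
    diff = np.abs(original.astype(np.float64) - reconstructed.astype(np.float64))
    if diff.ndim == 3:            # average over color channels, if present
        diff = diff.mean(axis=2)
    lo, hi = diff.min(), diff.max()
    return (diff - lo) / (hi - lo + 1e-12)
```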


Abbreviations

DCNN:

Deep convolutional neural networks

CNN:

Convolutional neural networks

DNN:

Deep neural networks

FCN:

Fully convolutional networks

SIFT:

Scale-invariant feature transform

DAE:

Denoising autoencoder

SDAE:

Stacked denoising autoencoder

DBN:

Deep belief network

BP:

Backpropagation

RBM:

Restricted Boltzmann machine

DoG:

Difference of Gaussian

EMD:

Earth Mover’s Distance

GTFP:

Ground truth of fixation prediction

GTOS:

Ground truth of saliency object segmentation

CC:

Pearson’s Correlation Coefficient

SIM:

Similarity

MAE:

Mean Absolute Error

AUC:

Area Under Curve

ROC:

Receiver Operating Characteristic

TPR:

True positive rate

FPR:

False positive rate

X :

The input vector

\( \tilde{X} \) :

The corrupted input vector

Y :

The hidden layer vector

Z :

The output layer vector

S(⋅):

A non-linear activation function

f e :

Encoder function

f d :

Decoder function

L(W, p, X, Z):

Loss function (a sketch using this notation follows this list)

f g(⋅):

The difference-of-Gaussian (DoG) function

X v :

The input sample of RBM

Y h :

The hidden-layer sample of the RBM

v :

A collection of visible training samples

c j :

The bias of the hidden nodes of the RBM

P s :

The saliency maps

W :

The matrix of connection weights from the input layer to the hidden layer

N L :

The number of layers of the Gaussian pyramid

S(x, y):

The final saliency object segmentation result

TN :

The number of negative samples predicted as negative

p :

The bias vector of the hidden layer neurons

p′ :

The bias vector of the output layer neurons

η :

The learning rate

I saliency :

The final saliency map

I reconstructed :

The final reconstructed image

I original :

The original image

G(x, y):

The Gaussian convolution function

δ :

The scale parameter of the Gaussian pyramid

m :

The number of visible nodes

n :

The number of hidden nodes

b i :

The bias of visible nodes of RBM

h i :

The value (activation state) of a node

F 1 :

F1-score

Q D :

The fixation maps

W′ :

The matrix of connection weights from the hidden layer to the output layer

w ij :

The connection weight between visible node and hidden node

GT(x, y):

The ground truth of saliency object segmentation

FP :

The number of negative samples predicted as positive
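
Read together, the DAE entries above describe a single corrupt-encode-decode pass: the input X is corrupted to \( \tilde{X} \), encoded as Y = S(WX̃ + p), and decoded as Z = S(W′Y + p′). The sketch below wires up exactly that notation; it is an illustration only, with a sigmoid standing in for S(⋅) and masking noise for the corruption, and the plain squared-error loss does not reproduce the paper's SIFT-constrained L(W, p, X, Z).

```python
import numpy as np

rng = np.random.default_rng(0)

def S(a):
    """S(.): a non-linear activation; sigmoid is used here as an example."""
    return 1.0 / (1.0 + np.exp(-a))

def dae_forward(X, W, p, W_prime, p_prime, corruption=0.3):
    """One denoising-autoencoder pass: corrupt X to X~, encode with f_e,
    decode with f_d, and return all three intermediate vectors."""
    X_tilde = X * (rng.random(X.shape) > corruption)  # corrupted input X~
    Y = S(W @ X_tilde + p)                            # encoder f_e: hidden vector
    Z = S(W_prime @ Y + p_prime)                      # decoder f_d: reconstruction
    return X_tilde, Y, Z

def reconstruction_loss(X, Z):
    """Squared-error loss between clean input and reconstruction
    (the paper's SIFT-constrained loss is not reproduced here)."""
    return float(np.sum((X - Z) ** 2))

# Illustrative shapes only: m visible nodes, n hidden nodes.
m, n = 64, 32
X = rng.random(m)
W, p = rng.normal(0.0, 0.1, (n, m)), np.zeros(n)              # input -> hidden
W_prime, p_prime = rng.normal(0.0, 0.1, (m, n)), np.zeros(m)  # hidden -> output
_, _, Z = dae_forward(X, W, p, W_prime, p_prime)
print(reconstruction_loss(X, Z))
```

The f_g(⋅) and G(x, y) entries correspond to the standard difference-of-Gaussian construction from SIFT's scale space, f_g = G(⋅, kδ) − G(⋅, δ); a small sketch using SciPy's Gaussian filter (the scale step k = 1.6 is SIFT's conventional choice; the pyramid depth N_L is left to the caller):

```python
from scipy.ndimage import gaussian_filter
import numpy as np

def dog(image, delta, k=1.6):
    """Difference of two Gaussian-blurred copies at scales k*delta and delta."""
    image = np.asarray(image, dtype=float)  # avoid integer wrap-around
    return gaussian_filter(image, k * delta) - gaussian_filter(image, delta)
```

Finally, the TN and FP counts defined above fix the evaluation measures; assuming binarized arrays S(x, y) and GT(x, y) with entries in {0, 1}:

```python
def binary_metrics(S_map, GT):
    """TPR, FPR and F1 from a binarized segmentation and its ground truth."""
    tp = np.sum((S_map == 1) & (GT == 1))
    fp = np.sum((S_map == 1) & (GT == 0))  # FP: negatives predicted positive
    tn = np.sum((S_map == 0) & (GT == 0))  # TN: negatives predicted negative
    fn = np.sum((S_map == 0) & (GT == 1))
    tpr = tp / (tp + fn + 1e-12)           # true positive rate (recall)
    fpr = fp / (fp + tn + 1e-12)           # false positive rate
    precision = tp / (tp + fp + 1e-12)
    f1 = 2 * precision * tpr / (precision + tpr + 1e-12)
    return tpr, fpr, f1
```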


Funding

This work was supported in part by the National Natural Science Foundation of China under Grants 62001156 and 62201197, in part by the Fundamental Research Funds for the Central Universities under Grant B220201037, in part by the Key Research and Development Program of Jiangsu Province under Grants BE2021042 and BE2020649, and in part by the Jiangsu Excellent Postdoctoral Program.

Author information


Corresponding author

Correspondence to Ma Yunpeng.

Ethics declarations

Conflict of interest

The authors declare no conflicts of interest.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Ma, Y., Yu, Z., Zhou, Y. et al. Visual saliency detection via invariant feature constrained stacked denoising autoencoder. Multimed Tools Appl 82, 27451–27472 (2023). https://doi.org/10.1007/s11042-023-14525-8

