
1 Introduction

Saliency detection aims to highlight the most visually attractive regions in a scene. It has been studied extensively in recent years, and numerous computational models have been presented. As a preprocessing step, saliency detection can benefit many other tasks, including image segmentation [9, 14], image compression [4], and object localization and recognition [3].

Saliency detection algorithms can be roughly divided into two categories from the perspective of information processing. The top-down approaches [13, 22], driven by specific tasks, need to learn the visual information of specific objects to form the saliency maps. In contrast, the bottom-up methods [5, 12, 19, 20] usually exploit low-level cues such as color, luminance and texture to highlight salient objects. Early studies address saliency detection via heuristic principles [12], including contrast prior, center prior and background prior. Most works based on these principles exploit low-level features directly extracted from images [5, 19]. They perform well in many cases, but still perform poorly in complex scenes. Due to the shortcomings of low-level features, many algorithms have been presented to incorporate high-level features in saliency detection. Xie et al. [20] propose a bottom-up approach which integrates both low- and mid-level cues using a Bayesian framework. Learning methods [6, 16] have also been presented to integrate both low- and high-level features to compute saliency based on parameters trained from sample images.

Fig. 1.

Examples of foreground regions. (a) Input image; foreground regions used in (b) XIE [20], (c) BFS [17], and (d) our method.

Recently, to achieve better performance, some object-level cues have been introduced as hints of the foreground. Some examples are shown in Fig. 1. Xie et al. [20] detect the salient points in the image and compute a convex hull to denote the approximate location of the salient object. Wang et al. [17] binarize the coarse saliency map using an adaptive threshold and select the super-pixels whose saliency values are larger than the threshold as foreground seeds. While the extracted foreground information can improve the performance of saliency detection, a falsely estimated foreground region may degrade the results.

Fig. 2.

Pipeline of our method, including input image pre-processing, background-based saliency, foreground-based saliency and post-processing.

In this paper, we propose an effective method to incorporate foreground information in saliency detection. First, we extract background seeds and their spatial information to construct a background-based saliency map. Then, several compact regions are generated using the contour information. We select the optimal one as the foreground region and calculate the foreground-based saliency map accordingly. To achieve better performance, two saliency maps are finally integrated and further refined.

2 Saliency Detection Algorithm

This section explains the details of the proposed saliency detection algorithm. In order to preserve the structural information, we over-segment the input image to generate N super-pixels [2] and use them as the basic processing units. After that, a background-based saliency map is first constructed using the background information (Subsect. 2.1). We then select the optimal contour closure as the foreground region according to the first-stage saliency map and compute the foreground-based saliency map (Subsect. 2.2). Finally, these two saliency maps are integrated and further refined to form a more accurate result (Subsect. 2.3). The pipeline of our saliency detection method is illustrated in Fig. 2.
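For concreteness, the over-segmentation and the per-super-pixel features used throughout this section can be computed as in the following sketch. This is only an illustration under stated assumptions: SLIC super-pixels are used as one common choice consistent with [2], and the normalization of the centroids is an implementation detail, not prescribed by the paper.

```python
import numpy as np
from skimage.color import rgb2lab
from skimage.segmentation import slic

def superpixel_features(img, n_segments=300):
    """Over-segment an RGB image and return, per super-pixel, the mean Lab
    color and the normalized centroid used by the later saliency steps."""
    labels = slic(img, n_segments=n_segments, compactness=10)
    lab = rgb2lab(img)
    h, w = labels.shape
    ys, xs = np.mgrid[:h, :w]
    ids = np.unique(labels)
    colors = np.array([lab[labels == i].mean(axis=0) for i in ids])
    centers = np.array([[xs[labels == i].mean() / w,
                         ys[labels == i].mean() / h] for i in ids])
    return labels, colors, centers
```

The background seeds of Subsect. 2.1 are then simply the super-pixels whose regions touch the first or last rows and columns of the label map.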

2.1 Saliency Detection via Background Information

Border regions of the image have been shown to be good visual cues for background priors in saliency detection [19]. Observing that background areas are usually connected to the image borders, we select the super-pixels along the image borders as background seeds and define the coarse saliency of each super-pixel as its color contrast to the background ones. Denoting the background seed set as BG, the coarse saliency value of super-pixel \(s_{i}\) is computed as

$$\begin{aligned} S_{i}^{c}=\sum _{s_{j} \in BG}d_{c}(s_{i},s_{j}) *w_{l}(s_{i},s_{j}) \end{aligned}$$
(1)

where \(d_{c}(s_{i},s_{j})\) is the Euclidean color distance between two super-pixels and \(w_{l}(s_{i},s_{j})\) denotes the spatial weight.
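A minimal NumPy sketch of Eq. (1) is given below. The exact form of \(w_{l}\) is not specified above, so the Gaussian spatial weight (and its bandwidth) is an assumption, as is the final normalization to [0, 1].

```python
import numpy as np

def coarse_saliency(colors, centers, bg_idx, sigma=0.25):
    """Eq. (1): contrast to the background seeds, weighted by spatial proximity.

    colors  : (N, 3) mean Lab color of each super-pixel
    centers : (N, 2) normalized centroid (x, y) of each super-pixel
    bg_idx  : indices of border super-pixels (the set BG)
    """
    s_c = np.zeros(len(colors))
    for i in range(len(colors)):
        d_c = np.linalg.norm(colors[i] - colors[bg_idx], axis=1)    # color distance
        d_l = np.linalg.norm(centers[i] - centers[bg_idx], axis=1)  # spatial distance
        w_l = np.exp(-d_l**2 / (2 * sigma**2))                      # assumed Gaussian weight
        s_c[i] = np.sum(d_c * w_l)
    # Normalize to [0, 1] for the later stages (assumption).
    return (s_c - s_c.min()) / (s_c.max() - s_c.min() + 1e-12)
```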

Fig. 3.

Definition of background weights. (a) Background seeds clustering; (b) Background weights of the selected background seeds; (c) Background weights of the other super-pixels.

As shown in Fig. 2(c), the coarse saliency map may include a large amount of background noise and is visually unsatisfactory. Therefore, we further consider the spatial information of the selected background seeds to define a background weight for each super-pixel, which is used to suppress this noise. The background weights are computed as follows. First, we cluster the super-pixels in BG into K clusters using the K-means algorithm; the number of clusters K is set to 3 in our experiments, as shown in Fig. 3(a). For each cluster k, we determine the shortest continuous super-pixel link \(SL_{k}\) that contains all the super-pixels belonging to cluster k. Denoting the length of this super-pixel link as \(L_{s}\), the background weight for cluster k is calculated as

$$\begin{aligned} P_{k}=1-\exp (-\alpha (L_{s}+L_{o})) \quad (k=1,2,\cdots ,K) \end{aligned}$$
(2)

where \(L_{o}\) is the number of super-pixels in \(SL_{k}\) belonging to the other clusters. As shown in Fig. 3(b), for each super-pixel \(s_{j}\) in cluster k, we assign the same value \(P_{k}\) to its background weight \(p_{s_{j}}\). The background weights of the remaining super-pixels are determined as

$$\begin{aligned} p_{s_{i}}=\frac{p_{s_{j}^{*}}}{d_{geo}^{*}} \quad (s_{i} \notin BG) \end{aligned}$$
(3)

where \(d_{geo}^{*}\) is the shortest geodesic distance from super-pixel \(s_{i}\) to the background seeds and \(s_{j}^{*}\) is the corresponding seed in BG.
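A possible implementation of Eqs. (2) and (3) is sketched below. Two details are assumptions on our part: the shortest continuous link \(SL_{k}\) is computed by treating the border super-pixels as an ordered cycle around the image, and the geodesic distance is taken as the shortest color-weighted path over the super-pixel adjacency graph; the clipping of the weights to [0, 1] is likewise an implementation choice.

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import dijkstra
from sklearn.cluster import KMeans

def background_weights(colors, border_order, adj_color_dist, K=3, alpha=0.05):
    """Eqs. (2)-(3): per-super-pixel background weights p.

    colors         : (N, 3) mean Lab color per super-pixel
    border_order   : indices of border super-pixels, ordered along the image boundary
    adj_color_dist : (N, N) color distances between adjacent super-pixels
                     (zero where super-pixels are not adjacent)
    """
    N = len(colors)
    n_border = len(border_order)
    labels = KMeans(n_clusters=K, n_init=10).fit_predict(colors[border_order])

    p = np.zeros(N)
    for k in range(K):
        pos = np.sort(np.where(labels == k)[0])
        # Shortest contiguous arc of the border cycle containing every member of
        # cluster k: the cycle length minus the largest gap between members.
        gaps = np.diff(pos)
        max_gap = max(gaps.max() if gaps.size else 0, n_border - (pos[-1] - pos[0]))
        L_s = n_border - max_gap + 1
        L_o = L_s - len(pos)                          # members of other clusters inside SL_k
        p[border_order[pos]] = 1 - np.exp(-alpha * (L_s + L_o))        # Eq. (2)

    # Eq. (3): propagate weights to the remaining super-pixels through the
    # shortest geodesic (color-weighted) path to any background seed.
    dist = dijkstra(csr_matrix(adj_color_dist), directed=False, indices=border_order)
    nearest = np.argmin(dist, axis=0)                 # nearest seed (index into border_order)
    d_geo = dist[nearest, np.arange(N)]
    inner = np.setdiff1d(np.arange(N), border_order)
    p[inner] = p[border_order[nearest[inner]]] / (d_geo[inner] + 1e-12)
    return np.clip(p, 0.0, 1.0)                       # keep weights in [0, 1] (assumption)
```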

The background-based saliency value of super-pixel \(s_{i}\) is finally calculated as

$$\begin{aligned} S_{i}^{b}=S_{i}^{c} *(1-p_{s_{i}}) \end{aligned}$$
(4)

As shown in Fig. 2(e), the background-based saliency map is substantially improved by considering the spatial information of the background seeds. However, some background regions with discriminative appearance are still incorrectly highlighted. Foreground information is therefore incorporated to suppress the remaining background noise.

2.2 Saliency Detection via Optimal Contour Closure

The background-based saliency map can highlight all the regions with high contrast to the background seeds, but it may fail to suppress background noise. Some recent works [17, 18, 20] incorporate foreground information to suppress such noise. However, inaccurate foreground information may adversely affect saliency detection. According to research in visual psychology [15], compact regions grouped by contour information provide important cues for selective attention. We adopt Levinshtein et al.'s mechanism [10] to generate foreground regions. Given the contour image and the assumption that the salient contours defining the object boundary align well with super-pixel boundaries, we obtain several contour closures by solving a parametric maxflow problem, as shown in Fig. 4(c). We select the optimal contour closure as

$$\begin{aligned} \mathbf {x}^{*}=\mathop {\arg \min }_{\mathbf {x}^{m}} \sum _{i=1}^{N}|\mathbf {x}_{i}^{m}-S_{i}^{b}|+V(\mathbf {x}^{m}) \quad (m \le M) \end{aligned}$$
(5)

where \(\mathbf {x}^{m}\) is a binary mask, which denotes the m-th foreground region (contour closure) and M is the number of previously obtained contour closures. \(V(\mathbf {x}^{m})\) denotes the spatial variance of a foreground region.
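Selecting the optimal closure by Eq. (5) is a direct search over the M candidate masks. In the sketch below, the spatial variance \(V(\mathbf{x}^{m})\) is taken as the variance of the member super-pixel centroids, which is one plausible reading of the definition above, not necessarily the authors' exact formulation.

```python
import numpy as np

def select_closure(masks, s_b, centers):
    """Eq. (5): pick the closure that best agrees with the background-based
    saliency map while remaining spatially compact.

    masks   : (M, N) binary masks over super-pixels (candidate closures)
    s_b     : (N,) background-based saliency values
    centers : (N, 2) normalized super-pixel centroids
    """
    costs = []
    for x in masks:
        fit = np.sum(np.abs(x - s_b))                     # agreement with S^b
        pts = centers[x.astype(bool)]
        var = pts.var(axis=0).sum() if len(pts) else 0.0  # spatial variance V(x)
        costs.append(fit + var)
    return masks[int(np.argmin(costs))]
```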

Fig. 4.

Foreground-based saliency detection. (a) Super-pixels of input image; (b) Salient contours; (c) Examples of obtained contour closures; (d) Optimal contour closure; (e) Foreground-based saliency map.

The selected optimal contour closure is shown in Fig. 4(d), and we collect all the super-pixels in this contour closure to compose the foreground seeds set FG. The foreground-based saliency value of each super-pixel is computed as

$$\begin{aligned} S_{i}^{f}=\sum _{s_{j} \in FG} \frac{1}{d_{c}(s_{i},s_{j})+\beta d_{l}(s_{i},s_{j})} \end{aligned}$$
(6)

where \(d_{l}(s_{i},s_{j})\) is the spatial distance between two super-pixels.
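Eq. (6) is then a direct accumulation over the foreground seeds. In the sketch below, the value of \(\beta\), the small constant added to the denominator (to avoid division by zero for the seeds themselves), and the final normalization are all assumptions.

```python
import numpy as np

def foreground_saliency(colors, centers, fg_idx, beta=3.0, eps=1e-6):
    """Eq. (6): similarity of each super-pixel to the foreground seed set FG."""
    s_f = np.zeros(len(colors))
    for i in range(len(colors)):
        d_c = np.linalg.norm(colors[i] - colors[fg_idx], axis=1)    # color distance
        d_l = np.linalg.norm(centers[i] - centers[fg_idx], axis=1)  # spatial distance
        s_f[i] = np.sum(1.0 / (d_c + beta * d_l + eps))
    return (s_f - s_f.min()) / (s_f.max() - s_f.min() + 1e-12)      # normalized (assumption)
```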

2.3 Integration and Refinement Operation

As observed in [17], the background-based saliency map can uniformly highlight the salient object, while the foreground-based one can effectively suppress background noise. In order to take advantage of both the background-based saliency and the foreground-based one, we integrate the two saliency maps as

$$\begin{aligned} S_{i}^{u}=S_{i}^{b} *(1-\exp (-\theta *S_{i}^{f})) \end{aligned}$$
(7)

where \(\theta \) is set to 4 in our experiments.
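With both maps normalized to [0, 1], the integration of Eq. (7) is a single element-wise operation; the sketch below simply restates the formula.

```python
import numpy as np

def integrate(s_b, s_f, theta=4.0):
    """Eq. (7): the foreground-based map gates the background-based map."""
    return s_b * (1.0 - np.exp(-theta * s_f))
```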

To obtain a better result, we further refine the unified saliency map using the energy function presented in [23]. This energy function not only assigns large saliency values to the foreground region but also promotes the smoothness of the refined saliency map. The energy function is given as

$$\begin{aligned} \begin{aligned} \mathbf {S}^{r}&=\mathop {\arg \min }_{\mathbf {S}}(\sum _{i,j=1}^{N} w_{c}(s_{i},s_{j})(S_{i}-S_{j})^{2} + \sum _{i=1}^{N} p_{s_{i}}S_{i}^{2}\\&+ \sum _{i=1}^{N} S_{i}^{u}(S_{i}-1)^{2}) \end{aligned} \end{aligned}$$
(8)

where \(w_{c}(s_{i},s_{j})\) denotes the color similarity between two adjacent super-pixels and \(p_{s_{i}}\) is the background weight of super-pixel \(s_{i}\) obtained in Subsect. 2.1. \(\mathbf {S}^{r}=[S_{1}^{r},S_{2}^{r},\cdots ,S_{N}^{r}]^{T}\) denotes the refined saliency value vector.
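Because the energy in Eq. (8) is quadratic in \(\mathbf{S}\), it has a closed-form minimizer obtained by solving a linear system. The sketch below assumes that \(w_{c}\) is a Gaussian of the Lab color distance between adjacent super-pixels and that the pairwise sum runs over ordered pairs (the resulting factor of 2 only rescales the smoothness term); both are assumptions rather than details given in the paper.

```python
import numpy as np

def refine(s_u, p_bg, colors, adjacency, sigma_c=10.0):
    """Eq. (8): closed-form minimization of the refinement energy.

    s_u       : (N,) unified saliency values
    p_bg      : (N,) background weights from Subsect. 2.1
    colors    : (N, 3) mean Lab color per super-pixel
    adjacency : (N, N) boolean matrix, True for adjacent super-pixels
    """
    # Color similarity w_c for adjacent super-pixels (assumed Gaussian form).
    d_c = np.linalg.norm(colors[:, None, :] - colors[None, :, :], axis=2)
    W = np.where(adjacency, np.exp(-d_c**2 / (2 * sigma_c**2)), 0.0)
    D = np.diag(W.sum(axis=1))

    # Setting the gradient of Eq. (8) to zero gives
    #   (2(D - W) + diag(p) + diag(S^u)) S = S^u.
    A = 2.0 * (D - W) + np.diag(p_bg) + np.diag(s_u)
    return np.linalg.solve(A, s_u)
```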

3 Experiments

In this section, we evaluate our algorithm on two public datasets: ASD [1] and ECSSD [21]. Both consist of 1000 images with pixel-wise labeled ground truth, while the ECSSD dataset is more challenging because many of its images contain complex scenes. We compare our algorithm with 7 state-of-the-art methods, including IT [5], FT [1], GB [7], SF [8], XIE [20], BFS [17], and LPS [11].

Fig. 5.

Precision-recall curves of compared methods on (a) MSRA dataset, (b) ECSSD dataset.

Fig. 6.

Average precision, recall and F-measure of compared methods on (a) MSRA dataset, (b) ECSSD dataset.

Fig. 7.

Visual comparisons of our algorithm and 5 state-of-the-art methods.

To make a fair comparison, the precision-recall curve and the F-measure are used for quantitative analysis. Given a saliency map, we binarize it with each threshold in the range 0 to 255 and compare each result with the ground truth to generate the precision-recall curve. The precision-recall curves of the compared methods are shown in Fig. 5, which demonstrates that our method performs better than the others. To compute the F-measure, we first over-segment the original image using the mean-shift algorithm and then obtain a binary map with an adaptive threshold, which is set to twice the mean saliency value. For each binary map, we compute the F-measure as

$$\begin{aligned} \text{F-measure}=\frac{(1+\gamma ^{2})\,Precision \times Recall}{\gamma ^{2}\,Precision + Recall} \end{aligned}$$
(9)

where \(\gamma ^{2}\) is set to 0.3 according to [1]. As shown in Fig. 6, our result achieves the highest recall and F-measure, although the precision is not always the best.
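The evaluation protocol can be reproduced with a few lines of NumPy, as sketched below: the precision-recall curve is obtained by sweeping fixed thresholds, and the F-measure of Eq. (9) is computed at the adaptive threshold (twice the mean saliency value). The mean-shift pre-segmentation mentioned above is omitted here for brevity.

```python
import numpy as np

def pr_curve(sal, gt):
    """Precision-recall pairs for thresholds 0..255 (sal in [0, 255], gt binary)."""
    precision, recall = [], []
    for t in range(256):
        pred = sal >= t
        tp = np.logical_and(pred, gt).sum()
        precision.append(tp / (pred.sum() + 1e-12))
        recall.append(tp / (gt.sum() + 1e-12))
    return np.array(precision), np.array(recall)

def f_measure(sal, gt, gamma2=0.3):
    """Eq. (9) at the adaptive threshold (twice the mean saliency value)."""
    pred = sal >= 2.0 * sal.mean()
    tp = np.logical_and(pred, gt).sum()
    p = tp / (pred.sum() + 1e-12)
    r = tp / (gt.sum() + 1e-12)
    return (1 + gamma2) * p * r / (gamma2 * p + r + 1e-12)
```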

Table 1. Average values of precision and recall for ASD and ECSSD
Fig. 8.

Failure case of foreground region selection. (a) All the generated contour closures; (b) Background-based saliency map; (c) Selected foreground region; (d) Foreground-based saliency map.

Figure 7 shows some visual comparison results. We note that our method not only highlights the salient object uniformly but also effectively suppresses background noise. The presented algorithm achieves good performance against other state-of-the-art methods, especially in complex scenes.

The effectiveness of the proposed algorithm is partially due to foreground information that is more accurate than in previous methods [17, 18, 20]. To evaluate the foreground information incorporated in the presented algorithm, we compute the precision \(p_{F}\) and recall \(r_{F}\) of our foreground regions and compare them to those of the Otsu segmentations used in BFS [17]. The precision \(p_{F}\) and recall \(r_{F}\) for each foreground region are calculated as

$$\begin{aligned} \left\{ \begin{array}{rcl} p_{F}=\frac{|R_{F}\bigcap R_{GT}|}{|R_{F}|}\\ r_{F}=\frac{|R_{F}\bigcap R_{GT}|}{|R_{GT}|} \end{array} \right. \end{aligned}$$
(10)

where \(R_{F}\) denotes the estimated foreground region and \(R_{GT}\) is the ground truth foreground region. The average values of precision and recall for each dataset are shown in Table 1, which indicates that the selected foreground regions are usually more favorable than the Otsu segmentations, since a high-level cue is incorporated.

Note that Levinshtein et al.'s mechanism [10] usually generates about a dozen contour closures, and we select an optimal one using Eq. (5), which may not always yield the best region. Figure 8 illustrates a failure case: Fig. 8(a) presents all the contour closures generated by [10], and Fig. 8(c) is the selected one. In this case, the presented method selects an acceptable foreground region rather than the best one.

4 Conclusions

In this paper, we propose an effective method to fuse both background and foreground information in saliency detection. To efficiently suppress background noise, we employ two techniques: (1) background weights defined by the spatial information of the background seeds, and (2) a foreground-based saliency map constructed from the optimal contour closure. The experimental results show that the presented algorithm achieves favorable performance compared to the state-of-the-art methods.