# Salient object detection using the phase information and object model


## Abstract

One of the most important roles of saliency detection algorithms is to reduce the amount of data passed to computationally heavier algorithms such as object detectors. For a saliency detection algorithm to serve as a front end to object detection, it must have a low computational cost and a broad application domain while maintaining acceptable precision. In this article we introduce a Salient Object Detection method using Task Simulation (SOD-TS). Thanks to task simulation (an object model), the method has a low computational cost and a wide functional domain, with applications including ship detection, word and letter detection in texts, and more. Relying on the task simulation (object model), SOD-TS detects the salient object that best responds to the current task, using information from the frequency domain.

## Keywords

Salient regions detection · Fourier transform (FT) phase · Frequency domain · Object detection · Ship detection · Optical character recognition (OCR) · Task simulation

## 1 Introduction

The main reason for using visual attention algorithms is to reduce the considerable amount of data passed to higher-level processes, which are usually computationally heavy. Approximately 10^8 bits of information enter the human visual system; online processing of such a large volume of information is impossible (or at least very costly) without the help of visual attention [8]. Visual attention allows a considerably smaller amount of data to be processed, such that the processed regions carry more significant information than the adjacent regions. Salient region detection has several applications, as explained by Borji et al. (2013), including object detection, object extraction, image segmentation, tracking, etc.

In terms of the underlying models, saliency detection methods fall into two groups: top-down (TD) attention and bottom-up (BU) attention. BU (i.e. stimulus-driven) methods use low-level features such as color, intensity and orientation to detect the salient regions, while TD (i.e. expectation-driven) methods use cognitive phenomena such as expectations, knowledge and responses to current goals as higher-level features. Accessibility of low-level features is one of the advantages of BU methods, in which the region of interest is one whose low-level features differ from those of the other regions. Such methods are particularly suitable for images with a single salient object; but if there are several salient objects in the image, BU methods detect only the most salient one, whereas people with different interests may perceive different objects as the most salient. For example, in an image containing a car and a bicycle, even if the car is more salient, a cyclist may perceive the region of the bicycle as more salient. TD methods do not suffer from this weakness, because the algorithm detects the region of interest based on the current task; that is, it detects the region that best responds to the current task. However, a main problem of TD methods lies in modelling cognitive phenomena such as the task. Algorithms using TD methods have a higher computational cost because they use neural networks [47] and/or deep learning [29] to model the cognitive phenomena, or they extract features to find the salient regions in a way that fulfills the task simulation process.

One of the broadest applications of visual attention is object detection. In this process, regions with a higher probability of containing an object are first identified by a region detection method; then, within those regions, the intended object is located by higher-level object detection algorithms with a higher computational cost. The candidate regions can be detected by several methods, each with its own strengths and weaknesses. A main problem of such methods is that they offer either high speed with low precision, or high precision with a high computational cost. The high computational cost is an important challenge, because the purpose of salient region detection in an object detection pipeline is to reduce the amount of data passed to the higher-level processes. Since object detection algorithms are themselves computationally heavy, if the data reduction step is heavier than the higher-level processes it serves, its usefulness is neutralized in practice. Thus, a salient region detection algorithm must have a lower computational cost than the object detection algorithm it feeds, while maintaining acceptable precision.

The method proposed in this article (hereafter, the SOD-TS method) uses frequency-domain information to simulate the task at a very low computational cost, and takes the simulated task as one of the algorithm's inputs; given this task, the algorithm detects the salient object that best responds to it. Since our proposed method uses frequency-domain information, the following sections first review two fast and well-known methods (SR and PFT) that are similar to ours and also use frequency-domain information. We then introduce the SOD-TS method in more detail.

## 2 Frequency domain information

The phase of the Fourier Transform (FT) of a signal carries very important information: where the signal contains an important part, the values of the signal phase are larger. Aiger et al. (2010) used this point to find defects in textures [4].

Moreover, other counterparts of the Fourier phase spectrum, such as the modulus values of the wavelet transform, can also reveal important information, such as information about object edges [6, 46].

### 2.1 Using the phase information with magnitude filtering

Hou and Zhang (2007) introduced the SR method [18], which is known for its simplicity and very high speed. This method does not use the color information of the image; it takes the difference between the logarithm of the image's FT magnitude spectrum and its smoothed version, and then, to compute the saliency map, uses the FT phase of the input image when taking the inverse FT. Later, the HFT method [25] was proposed; it uses the color feature with an algorithm similar to SR and is, in fact, a developed version of the SR method. In sum, the SR method computes the saliency map using the FT phase information of the signal together with a specific filtering of the FT magnitude.
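As a rough illustration, the SR pipeline described above can be sketched in a few lines of NumPy. This is a minimal sketch, not the authors' implementation: the function name and the box-filter size are our own choices, and the final Gaussian smoothing of the map is omitted for brevity.

```python
import numpy as np

def spectral_residual_saliency(img, avg_size=3):
    """Saliency map from the spectral residual of the image's log spectrum."""
    F = np.fft.fft2(img)
    log_amp = np.log1p(np.abs(F))   # log amplitude spectrum (log1p avoids log(0))
    phase = np.angle(F)             # phase spectrum, kept unchanged

    # Smooth the log spectrum with a simple avg_size x avg_size box filter.
    pad = avg_size // 2
    padded = np.pad(log_amp, pad, mode="edge")
    smooth = np.zeros_like(log_amp)
    for dy in range(avg_size):
        for dx in range(avg_size):
            smooth += padded[dy:dy + log_amp.shape[0], dx:dx + log_amp.shape[1]]
    smooth /= avg_size ** 2

    residual = log_amp - smooth     # the "spectral residual"
    sal = np.abs(np.fft.ifft2(np.exp(residual + 1j * phase))) ** 2
    return sal / sal.max()
```

A single bright pixel on a dark background, for instance, produces a saliency map that peaks at that pixel's location.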

### 2.2 Using the phase information with magnitude normalization

Guo et al. (2008) proposed a method known as PFT [17]. Like the SR method, it does not use the color feature, while retaining a high speed. The main difference between the PFT and SR methods is that PFT normalizes the FT magnitude (i.e. it sets the strength of all frequencies to unity) and then takes the inverse FT. In sum, the PFT method detects the salient region using only the FT phase information of the signal.
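The phase-only idea is even simpler to sketch: keep the phase, discard the magnitude. Again, this is our own minimal sketch rather than the reference implementation, and the function name is ours.

```python
import numpy as np

def pft_saliency(img):
    """Phase-only (PFT) saliency: set all FT magnitudes to 1, invert the phase."""
    F = np.fft.fft2(img)
    phase_only = np.exp(1j * np.angle(F))   # unit magnitude, original phase
    sal = np.abs(np.fft.ifft2(phase_only)) ** 2
    return sal / sal.max()
```

Because an isolated impulse has a flat magnitude spectrum, the phase-only reconstruction recovers its position exactly, which is a simple way to see why phase carries location information.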

## 3 SOD-TS method

The SOD-TS algorithm is a top-down (TD) method that uses the high-level features of the task to detect the salient object. That is, the algorithm detects the salient object with respect to a task that is specified as one of its inputs. But we must first model the task in order to feed it to the algorithm as an input.

### 3.1 Task simulation

The task is a cognitive, qualitative concept whose quantitative simulation is far from simple, as it involves considerable computational complexity; it can be simulated with tools such as deep learning or neural networks. To avoid this complexity, SOD-TS simulates the task using a geometrical model of the salient object that is to be detected in the image. In other words, the method represents the task as the geometrical shape of the intended salient object. The resulting geometrical model (which represents the features of the salient object to be revealed) can be a binary image; or, if we want to describe the intended object to the algorithm more precisely, it can be a grayscale image.
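A task model in the sense above can be as simple as a binary image of the object's silhouette. The following helper (a hypothetical illustration of our own, not part of the paper) builds such a model for a round object:

```python
import numpy as np

def disk_model(size=64, radius=10):
    """Binary geometrical model of a round object, centered on a size x size canvas."""
    yy, xx = np.mgrid[:size, :size]
    c = size // 2
    # Pixels inside the disk are 1, background is 0.
    return ((yy - c) ** 2 + (xx - c) ** 2 <= radius ** 2).astype(float)
```

A grayscale model could be built the same way, e.g. by blurring the binary silhouette or using a cropped template of the object.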

### 3.2 How does SOD-TS work?

The SOD-TS method builds on basic concepts of the Fourier Transform, i.e. on the information available in the frequency domain. Before describing the method, let us recall two important properties of the Fourier Transform.

### FT phase theorem

Shifting a signal in the time domain does not change the FT magnitude of that signal in the frequency domain; only its phase changes. This generalizes to two-dimensional signals (images): if a specific object changes its position in the image, the FT magnitude does not change, but the FT phase does.
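This shift property is easy to verify numerically. In the sketch below (our own illustration), a box signal and a shifted copy of it have identical FT magnitudes but different phases:

```python
import numpy as np

sig = np.zeros(64)
sig[5:15] = 1.0                 # a "box" object
shifted = np.roll(sig, 20)      # the same object at a different location

F1, F2 = np.fft.fft(sig), np.fft.fft(shifted)
# Magnitudes are identical; only the phases differ.
same_mag = np.allclose(np.abs(F1), np.abs(F2))
diff_phase = not np.allclose(np.angle(F1), np.angle(F2))
```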

#### 3.2.1 Conclusion 1: the FT phase of an image includes information about the locations of the objects in that image

### FT magnitude theorem

If a signal is expanded or contracted (rescaled) in the time domain, its FT behaves inversely: an expanded signal is contracted in the frequency domain, and vice versa. This generalizes to two-dimensional signals (images): if the size of a specific object in the image changes, its FT magnitude changes as well.
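The scaling property can be checked the same way (again our own illustration): resizing the object changes its magnitude spectrum, with the wider object concentrating more energy at low frequencies.

```python
import numpy as np

narrow = np.zeros(64); narrow[28:36] = 1.0   # width-8 box
wide = np.zeros(64);   wide[24:40] = 1.0     # the same box expanded to width 16

A_narrow = np.abs(np.fft.fft(narrow))
A_wide = np.abs(np.fft.fft(wide))
# Expanding the object in the spatial domain contracts its spectrum:
# the magnitude spectra differ, and the wide box has a larger DC term.
changed = not np.allclose(A_narrow, A_wide)
```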

#### 3.2.2 Conclusion 2: the FT magnitude of an image includes information about the geometrical shape of the objects in that image

In equation (1), the input image is first smoothed with a Gaussian filter:

\( i\left(m,n\right)={i}_0\left(m,n\right)\ast g\left(m,n\right) \)     (1)

where *i*_{0}(*m*, *n*) is the original image, *i*(*m*, *n*) is the filtered image, and *g*(*m*, *n*) is the Gaussian filter function.

In equation (2), the filtered image is transformed to the frequency domain:

\( I\left(u,v\right)=\mathcal{F}\left\{i\left(m,n\right)\right\}={A}_I\left(u,v\right)\,{e}^{j\,{p}_I\left(u,v\right)} \)     (2)

where *I*(*u*, *v*) is the FT of the Gaussian-filtered image, and *A*_{I}(*u*, *v*) and *p*_{I}(*u*, *v*) are its FT magnitude and FT phase, respectively.

In equation (3), the object model is transformed in the same way:

\( M\left(u,v\right)={A}_M\left(u,v\right)\,{e}^{j\,{p}_M\left(u,v\right)} \)     (3)

where *M*(*u*, *v*) is the FT of the object model, and *A*_{M}(*u*, *v*) and *p*_{M}(*u*, *v*) are its FT magnitude and FT phase, respectively.

In equation (4), the FT magnitude of the model is combined with the FT phase of the image and the result is transformed back to the spatial domain:

\( wmx\left(m,n\right)=\left|{\mathcal{F}}^{-1}\left\{{A}_M\left(u,v\right)\,{e}^{j\,{p}_I\left(u,v\right)}\right\}\right| \)     (4)

where *wmx*(*m*, *n*) is the final image, in which the locations of the strongest pixels indicate the location of the object.

In equation (5), (*m*_{i}, *n*_{i}) denote the locations of the i^{th} strongest pixels of *wmx*(*m*, *n*), for i = 1,…,N.

In all of the above equations, ∗ is the convolution operator, and \( \mathcal{F} \) and \( {\mathcal{F}}^{-1} \) are the FT operator and inverse FT operator, respectively.
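Putting equations (1)–(5) together, the core of the method can be sketched as follows. This is our own minimal sketch, assuming grayscale float arrays of equal size; for brevity it omits the Gaussian pre-filtering of equation (1), and the function name is ours.

```python
import numpy as np

def sod_ts(image, model, top_n=1):
    """Combine the FT magnitude of the object model with the FT phase of the
    image, invert, and return the map plus the strongest pixel locations."""
    I = np.fft.fft2(image)
    M = np.fft.fft2(model)
    combined = np.abs(M) * np.exp(1j * np.angle(I))   # A_M * e^{j p_I}, eq. (4)
    wmx = np.abs(np.fft.ifft2(combined))
    # Locations of the N strongest responses, eq. (5).
    strongest = np.argsort(wmx, axis=None)[::-1][:top_n]
    return wmx, [np.unravel_index(i, wmx.shape) for i in strongest]
```

For example, if the image contains a Gaussian blob identical to the model but displaced from the center, the strongest pixel of *wmx* lands at the blob's displaced position.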

### 3.3 Justification of the proposed method

Given the above description of the algorithm, the justification is as follows. Based on Conclusion 1, the information about an object's location is carried by the FT phase of the image; based on Conclusion 2, the information about an object's appearance is carried by the FT magnitude. Since we are looking for the location of an object whose appearance is similar to the object model, we combine the FT phase of the original image, which holds the location information, with the FT magnitude of the object model, which holds the appearance information.

### 3.4 A new justification for phase-only transform (PFT) method

### 3.5 Advantages of SOD-TS method

#### 3.5.1 Very low computational cost

In this method, both the task simulation operation and the salient object detection are done with simple calculations of low computational cost; hence, the most important feature of this method is its low computational cost and high speed.

#### 3.5.2 Object detection

Scale-independency and rotation-independency are among the most important characteristics of object detection algorithms. Thus, an object detection algorithm that meets these characteristics is considerably more efficient.

#### 3.5.3 Algorithm controllability in detecting salient object

## 4 Results

There are several ways to measure the agreement between model predictions and human annotations [9]. Some metrics evaluate the overlap between a tagged region and the model's predictions, while others assess how accurately the predicted shapes align with object boundaries; some metrics try to consider both boundary and shape [7, 33]. To evaluate our proposed method, we use two measures: the area under the ROC curve (AUC) and the algorithm's running time. We use data derived from PASCAL (http://host.robots.ox.ac.uk/pascal/VOC/), ECSSD (http://www.cse.cuhk.edu.hk/leojia/projects/hsaliency/dataset.html), and MSRA (https://mmcheng.net/msra10k/).

### Receiver operating characteristics (ROC) curve

In addition to *Precision*, *Recall* and *F*_{β}, we can also report the false positive rate (*FPR*) and true positive rate (*TPR*) obtained when binarizing the saliency map with a set of fixed thresholds:

\( TPR=\frac{\left|M\cap G\right|}{\left|G\right|},\qquad FPR=\frac{\left|M\cap \overline{G}\right|}{\left|\overline{G}\right|} \)

where *M* and *G* denote the binarized saliency map and the ground truth, respectively, and \( \overline{G} \) is the complement of the ground truth. The ROC curve is the plot of *TPR* versus *FPR* obtained by varying the threshold *T*_{f}.

### Area under ROC curve (AUC) score

While ROC is a two-dimensional representation of a model’s performance, the AUC distills this information into a single scalar. As the name implies, it is calculated as the area under the ROC curve. A perfect model will score an AUC of 1, while random guessing will score an AUC around 0.5.
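The AUC can be computed directly from the TPR/FPR definitions above by sweeping a set of thresholds and integrating the resulting curve. The sketch below is our own illustration, not the evaluation code used in the experiments; the function name and threshold count are ours.

```python
import numpy as np

def auc_score(sal_map, gt_mask, n_thresholds=100):
    """Area under the ROC curve for a saliency map against a binary mask."""
    rng = sal_map.max() - sal_map.min()
    sal = (sal_map - sal_map.min()) / (rng + 1e-12)   # normalize to [0, 1]
    gt = gt_mask.astype(bool)
    tprs, fprs = [], []
    for t in np.linspace(1.0, 0.0, n_thresholds):     # high to low threshold
        pred = sal >= t
        tp = np.logical_and(pred, gt).sum()
        fp = np.logical_and(pred, ~gt).sum()
        tprs.append(tp / max(gt.sum(), 1))
        fprs.append(fp / max((~gt).sum(), 1))
    fprs, tprs = np.array(fprs), np.array(tprs)
    # Trapezoidal integration of TPR over FPR.
    return float(np.sum((fprs[1:] - fprs[:-1]) * (tprs[1:] + tprs[:-1]) / 2.0))
```

A saliency map that equals the ground truth scores 1.0; an inverted map scores 0.0, matching the behavior described above.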

Comparison of the proposed method with other methods in terms of AUC and algorithm running time

| # | Model | AUC |
|---|-------|-----|
| 1 | AC [2] | .565 |
| 2 | FT [3] | .629 |
| 3 | CA [16] | .782 |
| 4 | MSS [1] | .766 |
| 5 | SEG [35] | .787 |
| 6 | RS [14] | .713 |
| 7 | HC [14] | .669 |
| 8 | SWD [15] | .835 |
| 9 | SVO [10] | .826 |
| 10 | CB [19] | .818 |
| 11 | FES [37] | .854 |
| 12 | SF [34] | .762 |
| 13 | LMLC [43] | .800 |
| 14 | HS [13] | .840 |
| 15 | GMR [44] | .829 |
| 16 | DRFI [42] | .897 |
| 17 | PCA [31] | .848 |
| 18 | LBI [36] | .828 |
| 19 | GC [12] | .728 |
| 20 | CHM [26] | .864 |
| 21 | DSR [27] | .873 |
| 22 | MC [20] | .873 |
| 23 | UFO [21] | .825 |
| 24 | MNP [32] | .807 |
| 25 | GR [45] | .794 |
| 26 | RBD [49] | .867 |
| 27 | HDCT [22] | .815 |
| 28 | ST [30] | .868 |
| 29 | QCUT [5] | .870 |
| 30 | SOD-TS | .517 |

### 4.1 Other examples of SOD-TS

We encounter another challenge of ship detection when the ship appears smoothed in the image, as in Fig. 12q and u. Even in this situation, with a suitable model, the proposed algorithm has successfully detected the ship (Fig. 12t and x). It should be mentioned that the suitable model for detecting a smoothed object is the model most similar to the intended object; choosing such a model increases the precision of the algorithm. As seen in Fig. 12r, a model very similar to a ship has been selected. The same model is also used for the image in Fig. 12u, although the shape of the ship there differs from its shape in Fig. 12q.

## 5 Conclusion

In this paper we proposed a salient object detection method based on task simulation. Thanks to new features based on FT concepts, the proposed model has a high speed and a very low computational cost, so it lends itself easily to both software and hardware implementation. The method is inspired by the PFT method and is, indeed, a very general extension of it. Moreover, since we simulate the task, the model can behave differently depending on the current task (i.e. the object model).

The proposed model does not use the color feature, because we believe that human beings manage to detect salient regions and intended objects without relying on color. Since we simulate the task, the model operates with respect to the current task, i.e. the object model, because we believe the concept of a task has geometrical components: when a person has a specific task, that task can emerge as geometrical shapes in his or her mind. This characteristic is just one of the many components of the qualitative concept of a task.


## References

- 1. Achanta R, Susstrunk S (2010) Saliency detection using maximum symmetric surround. IEEE ICIP: 2653–2656
- 2. Achanta R, Estrada F, Wils P, Susstrunk S (2008) Salient region detection and segmentation. Comp Vis Sys
- 3. Achanta R, Hemami S, Estrada F, Susstrunk S (2009) Frequency-tuned salient region detection. IEEE CVPR
- 4. Aiger D, Talbot H (2010) The phase-only transform for unsupervised surface defect detection. IEEE CVPR
- 5. Aytekin C, Kiranyaz S, Gabbouj M (2014) Automatic object segmentation by quantum cuts. IEEE ICPR: 112–117
- 6. Bhatnagar G, Wu QMJ, Raman B (2013) Discrete fractional wavelet transform and its application to multiple encryption. Inf Sci 223:297–316
- 7. Borji A et al (2015) Salient object detection: a benchmark. IEEE Trans Image Process 24(12)
- 8. Borji A, Itti L (2013) State-of-the-art in visual attention modeling. IEEE Trans Pattern Anal Mach Intell 35(1)
- 9. Borji A, Sihite DN, Itti L (2013) What stands out in a scene? A study of human explicit saliency judgment. Vis Res 91(0):62–77
- 10. Chang K-Y, Liu T-L, Chen H-T, Lai S-H (2011) Fusing generic objectness and visual saliency for salient object detection. IEEE ICCV: 914–921
- 11. Chen T, Lin L, Liu L, Luo X, Li X (2016) Deep image saliency computing via progressive representation learning. IEEE Trans Neural Netw Learn Syst 27(6):1135
- 12. Cheng M-M, Warrell J, Lin W-Y, Zheng S, Vineet V, Crook N (2013) Efficient salient region detection with soft image abstraction. IEEE ICCV: 1529–1536
- 13. Cheng M-M et al (2015) Global contrast based salient region detection. IEEE TPAMI
- 14. Cheng M-M, Mitra NJ, Huang X, Torr PHS, Hu S-M (2015) Global contrast based salient region detection. IEEE TPAMI
- 15. Duan L, Wu C, Miao J, Qing L, Fu Y (2011) Visual saliency detection by spatially weighted dissimilarity. IEEE CVPR: 473–480
- 16. Goferman S, Zelnik-Manor L, Tal A (2012) Context-aware saliency detection. IEEE TPAMI 34(10):1915–1926
- 17. Guo C, Ma Q, Zhang L (2008) Spatiotemporal saliency detection using phase spectrum of quaternion Fourier transform. IEEE CVPR
- 18. Hou X, Zhang L (2007) Saliency detection: a spectral residual approach. IEEE CVPR: 1–8
- 19. Jiang H, Wang J, Yuan Z, Liu T, Zheng N (2011) Automatic salient object segmentation based on context and shape prior. BMVC
- 20. Jiang B, Zhang L, Lu H, Yang C, Yang M-H (2013) Saliency detection via absorbing Markov chain. IEEE ICCV
- 21. Jiang P, Ling H, Yu J, Peng J (2013) Salient region detection by UFO: uniqueness, focusness and objectness. IEEE ICCV
- 22. Kim J, Han D, Tai Y-W, Kim J (2014) Salient region detection via high-dimensional color transform. IEEE CVPR
- 23. Lee G, Tai YW, Kim J (2016) Deep saliency with encoded low level distance map and high level features. IEEE CVPR: 660–668
- 24. Li G, Yu Y (2015) Visual saliency based on multiscale deep features. IEEE CVPR: 5455–5463
- 25. Li J, Levine MD, An X (2007) IEEE Transactions on Pattern Analysis and Machine Intelligence. Class files 6(1)
- 26. Li X, Li Y, Shen C, Dick AR, van den Hengel A (2013) Contextual hypergraph modeling for salient object detection. IEEE ICCV: 3328–3335
- 27. Li X, Lu H, Zhang L, Ruan X, Yang M-H (2013) Saliency detection via dense and sparse reconstruction. IEEE ICCV
- 28. Li J, Levine MD, An X, Xu X, He H (2013) Visual saliency based on scale-space analysis in the frequency domain. IEEE Trans Pattern Anal Mach Intell 35(4):996–1010
- 29. Liu N, Han J (2018) A deep spatial contextual long-term recurrent convolutional network for saliency detection. IEEE Trans Image Process
- 30. Liu Z, Zou W, Le Meur O (2013) Saliency tree: a novel saliency detection framework. IEEE TIP
- 31. Margolin R, Tal A, Zelnik-Manor L (2013) What makes a patch distinct? IEEE CVPR: 1139–1146
- 32. Margolin R, Zelnik-Manor L, Tal A (2013) Saliency for image manipulation. Vis Comput: 1–12
- 33. Movahedi V, Elder JH (2010) Design and perceptual validation of performance measures for salient object segmentation. IEEE CVPRW
- 34. Perazzi F, Krahenbuhl P, Pritch Y, Hornung A (2012) Saliency filters: contrast based filtering for salient region detection. IEEE CVPR: 733–740
- 35. Rahtu E, Kannala J, Salo M, Heikkila J (2010) Segmenting salient objects from images and videos. ECCV
- 36. Siva P, Russell C, Xiang T, Agapito L (2013) Looking beyond the image: unsupervised learning for object saliency and detection. IEEE CVPR: 3238–3245
- 37. Tavakoli HR, Rahtu E, Heikkila J (2011) Fast and efficient saliency detection using sparse sampling and kernel density estimation. SCIA: 666–675
- 38. Tong N, Lu H, Ruan X, Yang M-H (2015) Salient object detection via bootstrap learning. IEEE CVPR: 1884–1892
- 39. Tong N, Lu H, Zhang Y, Ruan X (2015) Salient object detection via global and local cues. Pattern Recogn 48(10):3258–3267
- 40. Wang J, Lu H, Li X, Tong N, Liu W (2015) Saliency detection via background and foreground seed selection. Neurocomputing 152:359–368
- 41. Wang L, Lu H, Xiang R, Yang MH (2015) Deep networks for saliency detection via local estimation and global search. IEEE CVPR: 3183–3192
- 42. Wang J, Jiang H, Yuan Z, Cheng M-M, Hu X, Zheng N (2017) Salient object detection: a discriminative regional feature integration approach. Int J Comput Vis 123(2):251–268
- 43. Xie Y, Lu H, Yang M-H (2013) Bayesian saliency via low and mid-level cues. IEEE TIP 22(5)
- 44. Yang C, Zhang L, Lu H, Ruan X, Yang M-H (2013) Saliency detection via graph-based manifold ranking. IEEE CVPR
- 45. Yang C, Zhang L, Lu H (2013) Graph-regularized saliency detection with convex-hull-based center prior. IEEE Signal Process Lett 20(7):637–640
- 46. You X, Du L, Cheung Y-m, Chen Q (2010) A blind watermarking scheme using new nontensor product wavelet filter banks. IEEE Trans Image Process 19(12):3271–3284
- 47. Zhang J et al (2018) Top-down neural attention by excitation backprop. Int J Comput Vis 126:1084–1102
- 48. Zhu D (2018) Salient object detection via a local and global method based on deep residual network. J Vis Commun Image Represent 54
- 49. Zhu W, Liang S, Wei Y, Sun J (2014) Saliency optimization from robust background detection. IEEE CVPR

## Copyright information

**Open Access** This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.