Contour detection refined by a sparse reconstruction-based discrimination method

Sparse representations have been widely used for many image processing tasks. In this paper, a sparse reconstruction-based discrimination (SRBD) method, which was previously proposed for the classification of image patches, is utilized to improve boundary detection in colour images. This method is applied to refining the results generated by three different algorithms: a biologically inspired method, and two state-of-the-art algorithms for contour detection. All of the contour detection results are evaluated by the BSDS300 and BSDS500 benchmarks using the quantitative measures F-score, ODS, OIS and AP. Evaluation results show that the performance of each algorithm is improved by the proposed method of refinement, with at least one of the quantitative measures increased by 0.01. In particular, even the two state-of-the-art algorithms are slightly improved by applying the SRBD method to refine their contour detection results.


Introduction
Object contours play an important role in human visual perception tasks such as scene understanding and object recognition [1]. Contour detection is also a fundamental technique for many applications in computer vision such as image segmentation [2], object recognition [3], robot vision [4] and medical image analysis [5]. It is widely accepted that the locations of object contours can typically be derived from discontinuities in the intensity, texture and colour of adjacent image regions. Therefore, there are numerous algorithms for contour detection which aim to locate these discontinuities. For instance, Canny [6] utilized first-order derivative of Gaussian filters to locate intensity discontinuities. Similarly, other linear filters such as Sobel, Prewitt, Beaudet, Roberts [7], Laplacian of Gaussian [8], higher-order derivative of Gaussian and Gabor have also been devised. Algorithms that employ a linear filter can be interpreted as using a measure of the match between pixels and a given edge template to locate intensity discontinuities [9]. In contrast to utilizing an edge template (linear filter), other methods locate intensity and colour discontinuities by comparing each pixel with its neighbouring pixels, such as Mean shift [10], Bilateral filtering [11], Vector order statistics [12], and the Morphological edge detector [13].
Because the algorithms mentioned above are not able to locate contours defined by texture discontinuities, several statistical approaches were devised for locating texture boundaries. For instance, Huang et al. [14] utilized a two-sample statistical test of independence to compute the dissimilarity between adjacent image regions. Ojala et al. [15] employed distributions of local binary patterns and pattern contrast to evaluate the similarity of neighbouring image regions, using G statistics to compare distributions. Many other types of algorithms have also been developed for contour detection, such as psychophysically inspired methods (e.g. phase congruency [16] and Multiresolution analysis [17]), interactive detection methods (e.g. active contours [18]), graph-based methods (e.g. normalized cuts [19]), and region-based methods (e.g. watershed transform [20], ultrametric contour map [21], plaster cast model [22], pixon-based method [23], and meta-heuristic method [24]).
Another set of algorithms attempted to simulate some operations performed by the human/animal visual system for contour detection. For instance, Spratling [25] applied a predictive coding/biased competition (PC/BC) model of the primary visual cortex (V1), which simulated neuron activity in V1 [26], to locate contours defined by intensity discontinuities. Afterwards, Wang et al. [27] extended the PC/BC model of V1 to locate colour discontinuities, an extension inspired by neurophysiological data from single neurons in Macaque V1 [28]. There are many other algorithms inspired by different kinds of operations in the human/animal visual system, including, but not limited to, the Eye movement model [29], the Retina and V1 model [30], and the Surround suppression model [31].
One of the state-of-the-art contour detection algorithms for natural images is gPb-owt-ucm [2], which was derived from the integration of some previously proposed methods. Specifically, Martin et al. [32] first developed the Probability of Boundary (Pb) algorithm, which applied three different gradient methods to locate local discontinuities in brightness, texture and colour within an image. Secondly, Maire et al. [33] and Ren [34] found that combining these local discontinuities across different scales can improve the performance of the Pb algorithm. Thirdly, Maire et al. [33] also integrated these multi-scale local discontinuities using the normalized cuts [19] technique and proposed the globalization of probability of boundary (gPb) algorithm. Finally, Arbelaez et al. [2] post-processed the output of the gPb algorithm using an improved watershed transform algorithm [20] and an ultrametric contour map [21], to generate the gPb-owt-ucm method. Song et al. [35] used a measure of Laplacian energy to remove redundant contours in gPb-owt-ucm [2].
Based on the hypothesis that natural images allow a sparse decomposition in some redundant basis (or dictionary), sparse representation techniques have been widely used for different image processing tasks, such as image denoising [36], image restoration [37], texture classification [38], and face recognition [39]. Sparse coding has also been applied to contour detection. Sparse Code Gradients (SCG) [40] uses a learned sparse representation method [41] to generate a sparse code of each input image. It then applies the gradient method [32] to compute the gradients of the sparse code across multiple scales and multiple orientations. Finally, a support vector machine (SVM) is applied to locate contours within an image. SCG generates results that equal the state-of-the-art performance of gPb-owt-ucm.
Mairal et al. [42] proposed a sparse reconstruction-based discrimination (SRBD) method, which uses two dictionaries to place image patches into two classes: edges and non-edges. The dictionaries were obtained by training with two different classes of image patches extracted from a set of training images. Afterwards, a test image patch was represented as a sparse code using each of these two dictionaries. Finally, the test image patch was classified by assigning it the category of the dictionary that produced the smallest reconstruction error. Mairal et al. [43] increased the discriminative power of the SRBD method by adding a discriminative training algorithm to the procedure of dictionary learning, and then applied the SRBD method to refining the result derived from the Canny edge detector [6], obtaining results significantly better than those of the unrefined Canny detector.
In this paper, a slightly modified version of the SRBD method is applied to refining the contour detection results produced by three other, previously proposed, algorithms. These three algorithms are the Colour PC/BC model of V1 [27] (c-PC/BC), gPb-owt-ucm [2] and SCG [40]. The SRBD method is described in more detail in Sect. 2. Section 3 describes the procedure used to apply the modified version of SRBD to refining the results derived from the three contour detectors, highlighting the modified steps compared to Mairal et al. [43]. The Results section evaluates the performance of each of the three refined contour detection methods using the BSDS300 [44] and BSDS500 [2] datasets.

The sparse reconstruction-based discrimination method
The sparse representation technique can be summarized as: an input signal x admits a sparse approximation over a dictionary D comprised of many elements, when a linear combination of only a few elements from D can be found that approximates the original signal x.
In the SRBD method, input signals are p × p pixel patches extracted from images. These patches are converted to column vectors x of length $n = p^2$ pixels. The K-SVD algorithm [41] is applied to learn the dictionary D with a fixed number of k elements and a fixed sparsity factor L, using M patches (column vectors) extracted from training images. The problem of dictionary learning is to find an optimal dictionary D that minimizes the reconstruction error under a fixed sparsity constraint, as described by the following equation:
$$\min_{D,\,\alpha} \sum_{l=1}^{M} \| x_l - D\alpha_l \|_2^2 \quad \text{s.t.} \quad \| \alpha_l \|_0 \le L \qquad (1)$$
where $\|\cdot\|_2$ represents the Frobenius $l_2$-norm; $x_l$ represents an input image patch converted to a column vector of size n pixels; D represents the dictionary to be learnt, of size n × k. The k-element column vector $\alpha_l$ is the sparse representation of the l-th patch using the dictionary D. $\|\cdot\|_0$ denotes the number of non-zero elements in a vector.
Afterwards, the dictionary D learnt from Eq. 1 and the patches (column vectors x) extracted from a test image become inputs to the OMP algorithm [45], which generates optimal sparse representations ($\alpha^*$) of the test image patches using the same sparsity factor L, as described by Eq. 2:
$$\alpha^*(x, D) = \arg\min_{\alpha} \| x - D\alpha \|_2^2 \quad \text{s.t.} \quad \| \alpha \|_0 \le L \qquad (2)$$
The sparse reconstruction error $R^*(x, D, \alpha^*)$ can then be obtained straightforwardly by encoding the patches x extracted from a test image using the dictionary D learnt from training patches by Eq. 1 and the sparse representations $\alpha^*$ computed from Eq. 2, as described by Eq. 3:
$$R^*(x, D, \alpha^*) = \| x - D\alpha^* \|_2^2 \qquad (3)$$
Suppose that there are N = 2 sets of image patches extracted from a set of training images, belonging to two different classes: edges and non-edges. The class of a training patch is determined by the class of its centre pixel. These two sets of patches can be used to define two dictionaries $D_i$ by applying the K-SVD algorithm [41] as described in Eq. 1. Afterwards, two sets of optimal sparse parameters $\alpha_i^*$ can be determined for each test image patch using the OMP algorithm [45] as described in Eq. 2. Similarly, two sparse reconstruction errors $R^*$ can be obtained, as described in Eq. 3, by encoding the test patch using the two different dictionaries. Finally, the centre pixel of the patch from the test image is assigned the class $i_0$ (edge or non-edge) whose dictionary $D_{i_0}$ yields the smaller sparse reconstruction error $R^*$, as shown in Eq. 4:
$$i_0 = \arg\min_{i = 1, \dots, N} R^*(x, D_i, \alpha_i^*) \qquad (4)$$
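The encoding and classification steps of Eqs. 2 to 4 can be illustrated with a minimal sketch. It assumes unit-norm dictionary atoms and uses a plain greedy OMP written with NumPy; the function names `omp`, `reconstruction_error` and `classify` are illustrative and are not taken from [42], [43] or [45].

```python
import numpy as np

def omp(D, x, L):
    """Greedy OMP: approximate x with at most L columns (atoms) of D.

    Assumes the atoms of D have unit norm, so that the correlation
    |D^T residual| is a fair selection criterion.
    """
    k = D.shape[1]
    alpha = np.zeros(k)
    residual = x.astype(float).copy()
    support = []
    coeffs = None
    for _ in range(L):
        correlations = np.abs(D.T @ residual)
        if support:
            correlations[support] = 0.0  # never reselect an atom
        j = int(np.argmax(correlations))
        if correlations[j] < 1e-12:
            break  # residual is orthogonal to every remaining atom
        support.append(j)
        # Least-squares fit on the selected atoms, then update the residual.
        coeffs, *_ = np.linalg.lstsq(D[:, support], x, rcond=None)
        residual = x - D[:, support] @ coeffs
    if support:
        alpha[support] = coeffs
    return alpha

def reconstruction_error(D, x, L):
    """Squared l2 reconstruction error of x under dictionary D (Eq. 3)."""
    alpha = omp(D, x, L)
    return float(np.sum((x - D @ alpha) ** 2))

def classify(x, dictionaries, L):
    """Return the index of the dictionary with the smallest error (Eq. 4)."""
    errors = [reconstruction_error(D, x, L) for D in dictionaries]
    return int(np.argmin(errors)), errors
```

In practice a library implementation of OMP would normally replace the hand-written loop; it is spelled out here only to make the link to Eq. 2 explicit.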

Procedure of refinement
Contour detection results derived from three previously proposed boundary-detection algorithms are refined by the application of a slightly modified version of the SRBD method. These three algorithms include a biologically inspired model (c-PC/BC [27]), and two state-of-the-art algorithms (gPb-owt-ucm [2] and SCG [40]). Each step of the refinement is described in the following paragraphs. The steps in SRBD which are modified compared with [43] are highlighted.
Step 1: Two classes of patches are extracted from the set of training images provided with the BSDS300 [44] or BSDS500 [2] dataset. These two classes are true contour and false contour. A patch is extracted only if its centre pixel is identified as an 'effective' contour by the previously proposed boundary-detection algorithm that is being refined. Afterwards, the extracted patch $x_T$ is classified as a true contour if the centre pixel of the patch is selected as a contour pixel by at least one human subject [44]. In contrast, the extracted patch $x_F$ is classified as a false contour if neither the centre pixel nor any of its surrounding r × r pixels is selected as a contour pixel by any human subject [44]. Because c-PC/BC, gPb-owt-ucm and SCG all produce a continuous value of probability of boundary (Pb) at each pixel, 'possible' contours are defined as those pixels whose Pb values are larger than a threshold $T_1 = 0.01$. The value of the parameter $T_1$ was determined by trial and error. 1000 patches that satisfied these conditions were selected, at random, from each training image. As there are 200 training images, this gave a total of 200,000 training patches for each dictionary.
As previously mentioned, each image patch $x_T$ or $x_F$ of size p × p pixels, belonging to the true or false contour class, is converted to a column vector of size $n = p^2$ pixels. If the training images are chromatic, the image patches extracted from the red, green, and blue channels of the RGB colour space are concatenated into a single column vector of size 3n pixels. It has been found that utilizing multiple sizes of patches and resolutions of images can improve the performance of the refinement [42]. Mairal et al. [43] used p = 5, 7, 9, 11, 15, 19, 23 for both full-scale and half-scale images to refine the Canny edge detector. However, in this modified version, only some of these patch sizes were used: p = 7, 9, 11, 15, 19 pixels with r = 5 pixels for full-scale images, and p = 7, 9, 11 pixels with r = 3 pixels for half-scale images, for all three algorithms. The selection of these particular patch sizes and the value of the parameter r were all determined by trial and error. As a result, there were overall N = 16 (8 sizes × 2 classes) sets of training patches.
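The labelling rule of step 1 can be sketched as follows. The sketch assumes that the human annotations have already been merged into one boolean map (`human_union`, true where at least one subject marked a contour) and that r is odd; these inputs and the function name `label_centres` are hypothetical and only serve the illustration.

```python
import numpy as np

def label_centres(pb, human_union, t1=0.01, r=5):
    """Return boolean maps of true-contour and false-contour patch centres.

    pb          : (H, W) float map from the detector being refined.
    human_union : (H, W) bool map, True where any human marked a contour.
    t1          : threshold on pb for a pixel to be a candidate centre.
    r           : side of the neighbourhood that must be free of human
                  contours for a false-contour centre (assumed odd).
    """
    candidate = pb > t1
    half = r // 2
    h, w = human_union.shape
    padded = np.pad(human_union, half, mode="constant", constant_values=False)
    # near_human[y, x] is True if any human-marked pixel lies in the
    # r x r window centred on (y, x).
    near_human = np.zeros((h, w), dtype=bool)
    for dy in range(r):
        for dx in range(r):
            near_human |= padded[dy:dy + h, dx:dx + w]
    true_centres = candidate & human_union
    false_centres = candidate & ~near_human
    return true_centres, false_centres
```

Candidate pixels that lie close to a human-marked contour without being on it fall into neither class under this rule and are simply not used for training.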
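As an illustration of the vectorization described above, the following sketch flattens a p × p patch into a vector of length n for a grey-level image or 3n for an RGB image. The patch sizes are those listed in the text; the function name `patch_vector` and the choice to reject patches that cross the image border are assumptions, since border handling is not specified here.

```python
import numpy as np

# Patch sizes reported in the text for the modified SRBD.
FULL_SCALE_SIZES = (7, 9, 11, 15, 19)
HALF_SCALE_SIZES = (7, 9, 11)

def patch_vector(image, cy, cx, p):
    """Flatten the p x p patch centred on (cy, cx) into a 1-D vector.

    image : (H, W) grey-level or (H, W, 3) RGB array.
    Returns a vector of length p*p (grey) or 3*p*p (RGB, channels
    concatenated one after another).
    """
    half = p // 2
    if cy - half < 0 or cx - half < 0:
        raise ValueError("patch falls outside the image")
    patch = image[cy - half:cy + half + 1, cx - half:cx + half + 1]
    if patch.shape[:2] != (p, p):
        raise ValueError("patch falls outside the image")
    if patch.ndim == 2:
        return patch.reshape(-1)
    # Concatenate the R, G and B patches into a single vector.
    return np.concatenate([patch[:, :, c].reshape(-1) for c in range(3)])
```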
Step 2: When SRBD was applied to refining the Canny edge detector by Mairal et al. [43], all dictionaries belonging to the true and false contour classes were learnt by using both the K-SVD algorithm [41] and the discriminative training method defined in [42]. The K-SVD algorithm, as described in Eq. 1, defines dictionaries that minimize the reconstruction error obtained when encoding true (or false) contour training patches using the true (or false) contour dictionary. At the same time, the discriminative training method was employed to increase the discriminative power of each pair of dictionaries by maximizing the reconstruction error obtained by encoding true (or false) contour training patches using the false (or true) contour dictionary. The dictionaries used in [43] had a fixed number of 256 elements. The number of iterations for the K-SVD algorithm was set to 25, and a series of sparsity factors ranging from 1 to 15 were used.
In this modified version of SRBD, the procedure of dictionary learning was simpler. The N/2 = 8 pairs of dictionaries $D_T$ and $D_F$ belonging to the true and false contour classes were learnt using the K-SVD algorithm [41], as described in Eq. 1. The dictionaries $D_T$ and $D_F$ also had a fixed number of k = 256 elements, the number of iterations for the K-SVD algorithm was set to 1000, and a fixed sparsity factor L = 12 was used. The problem of dictionary learning is to find an optimal dictionary $D_T$ or $D_F$ that minimizes the reconstruction error under a fixed sparsity constraint, as described by the following equations:
$$\min_{D_T,\,\alpha} \sum_{l=1}^{M} \| x_{T,l} - D_T\alpha_l \|_2^2 \quad \text{s.t.} \quad \| \alpha_l \|_0 \le L \qquad (5)$$
$$\min_{D_F,\,\alpha} \sum_{l=1}^{M} \| x_{F,l} - D_F\alpha_l \|_2^2 \quad \text{s.t.} \quad \| \alpha_l \|_0 \le L \qquad (6)$$
where $x_{T,l}$ and $x_{F,l}$ denote the l-th true and false contour training patches, and all operators are identical to Eq. 1.
Step 3: Similar to the method of extracting patches from a set of training images described in step 1, N/2 = 8 sizes of patches (column vectors) were extracted from each test image at all locations where the centre pixel had been identified as an 'effective' contour by the previously proposed boundary-detection algorithm that was being refined. Afterwards, each of the N/2 = 8 sets of column vectors was encoded using the OMP algorithm [45] and the corresponding pair of dictionaries $D_T$ and $D_F$ learnt in step 2 from Eqs. 5 and 6, to generate N/2 = 8 pairs of optimal sparse representations $\alpha_T^*$ and $\alpha_F^*$, as described by the following equations:
$$\alpha_T^*(x, D_T) = \arg\min_{\alpha} \| x - D_T\alpha \|_2^2 \quad \text{s.t.} \quad \| \alpha \|_0 \le L \qquad (7)$$
$$\alpha_F^*(x, D_F) = \arg\min_{\alpha} \| x - D_F\alpha \|_2^2 \quad \text{s.t.} \quad \| \alpha \|_0 \le L \qquad (8)$$
where all operators are identical to Eq. 2. Afterwards, eight pairs of sparse reconstruction errors $R_T^*(x, D_T, \alpha_T^*)$ and $R_F^*(x, D_F, \alpha_F^*)$ can be straightforwardly obtained by:
$$R_T^*(x, D_T, \alpha_T^*) = \| x - D_T\alpha_T^* \|_2^2 \qquad (9)$$
$$R_F^*(x, D_F, \alpha_F^*) = \| x - D_F\alpha_F^* \|_2^2 \qquad (10)$$
where $R_T^*(x, D_T, \alpha_T^*)$ and $R_F^*(x, D_F, \alpha_F^*)$ represent the sparse reconstruction errors obtained by computing the difference between the test image patch and the encoding of this patch using the dictionary $D_T$ or $D_F$ with its corresponding sparse representation $\alpha_T^*(x, D_T)$ or $\alpha_F^*(x, D_F)$, belonging to the true and false contour classes respectively.
Step 4: Each corresponding pair of sparse reconstruction errors $R_T^*$ and $R_F^*$ obtained in Eqs. 9 and 10 could be used to assign a test patch to a class as described in Eq. 4. However, the classifications produced for different patch sizes need to be combined to produce a single, final classification. To do this, Mairal et al. [43] used a series of sparsity factors in the procedure of dictionary learning. Each classifier can therefore output a pair of curves of reconstruction errors as a function of the sparsity constraint. Afterwards, all curves generated by the classifiers for the multiple patch sizes were combined to generate a feature vector for classification by a logistic regression classifier or a linear SVM.
As a fixed sparsity constraint was used in this modified version of SRBD, a new method was defined. Specifically, each corresponding pair of $R^*$ was used to calculate a confidence metric (CM):
$$\mathrm{CM} = \frac{R_T^* - R_F^*}{R_T^* + R_F^*} \qquad (11)$$
where $R_T^*$ and $R_F^*$ represent the sparse reconstruction errors obtained by encoding test patches using the true and false contour dictionaries. There are overall N/2 = 8 CM values obtained from the multiple patch sizes and image resolutions. All values of CM range continuously from −1 to 1. Specifically, positive values correspond to the belief that the centre pixel of the patch is a false contour. In contrast, negative values of CM indicate the confidence that the centre pixel of the test patch is a true contour. Only positive values in CM were kept, and negative values were set to 0, to generate a confidence of false contour metric (CFM) as defined in:
$$v(x, y) = \begin{cases} u(x, y), & u(x, y) > 0 \\ 0, & u(x, y) \le 0 \end{cases} \quad \forall v(x, y) \in \mathrm{CFM},\ \forall u(x, y) \in \mathrm{CM} \qquad (12)$$
where u(x, y) represents each value in CM and v(x, y) represents the corresponding value at the same location in CFM. Afterwards, the N/2 = 8 CFMs were summed and normalized by a function G, which clipped the values w(x, y) of the aggregated CFM at a threshold $T_2$ and then normalized these values by $T_2$, as defined in:
$$G(w(x, y), T_2) = \frac{\min(w(x, y),\, T_2)}{T_2}, \quad \forall G(w(x, y), T_2) \in \mathrm{NACFM} \qquad (13)$$
where w(x, y) represents each value in the aggregated CFM. The NACFM, whose values are the values of the aggregated CFM normalized by the function G, still indicates the confidence of a false contour and is used to suppress false contour pixels within an image. The value of $T_2$ was set to 0.15, 0.65, and 1.15 for c-PC/BC, gPb-owt-ucm, and SCG respectively. Finally, 1 − NACFM was multiplied by the corresponding output of the boundary-detection algorithm that was being refined. The proposed version of SRBD thus acts to suppress boundary pixels it determines to be false contours. When gPb-owt-ucm and SCG were being refined, the output that was suppressed was the final result generated by these two algorithms. However, for c-PC/BC, a slightly different approach was used.
The c-PC/BC algorithm simulates neuron activity in V1 and operates as a convolutional neural network that performs contour detection by using the receptive fields of V1 neurons (modelled as first-order derivatives of Gaussians) to locate boundaries defined by local intensity and colour discontinuities. The output of c-PC/BC that was suppressed was the set of prediction neuron responses across multiple orientations, which indicate discontinuities in colour and luminance. Afterwards, these suppressed responses were used to linearly reconstruct the final contour detection result [27].
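The aggregation and suppression of step 4 can be summarized in a short sketch. The CM is computed here as a normalized difference of the two reconstruction errors, which matches the stated range of −1 to 1 and the stated sign convention; that form, the small `eps` guard against division by zero, and the function names should be read as assumptions of this sketch. Reconstruction-error maps are assumed to be zero at pixels that were not candidate contours.

```python
import numpy as np

def confidence_metric(r_true, r_false, eps=1e-12):
    """Normalized difference in [-1, 1]; positive means 'false contour'.

    Assumed form; eps only avoids 0/0 where no patch was encoded.
    """
    return (r_true - r_false) / (r_true + r_false + eps)

def suppress_false_contours(pb, r_true_maps, r_false_maps, t2):
    """Suppress detector output pb using per-scale reconstruction errors.

    pb           : (H, W) output of the detector being refined.
    r_true_maps  : list of (H, W) error maps, one per patch size/scale,
                   obtained with the true-contour dictionaries.
    r_false_maps : matching list for the false-contour dictionaries.
    t2           : clipping threshold (0.15, 0.65 or 1.15 in the text).
    """
    aggregated = np.zeros_like(pb, dtype=float)
    for r_t, r_f in zip(r_true_maps, r_false_maps):
        cm = confidence_metric(r_t, r_f)
        aggregated += np.maximum(cm, 0.0)      # CFM: keep the positive part
    nacfm = np.minimum(aggregated, t2) / t2    # clip at T2, normalize by T2
    return pb * (1.0 - nacfm)
```

With a small threshold such as 0.15, a modest amount of aggregated false-contour confidence is enough to suppress a pixel completely, whereas a larger threshold makes the suppression more gradual.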
In summary, the main differences between the SRBD method [43] used to refine Canny [6] and the modified version of the SRBD method used to refine the three previously proposed methods lie in the following aspects. The first is the set of patch sizes extracted from the multiple resolutions of training and test images. The SRBD [43] used p = 5, 7, 9, 11, 15, 19, 23 for both full-scale and half-scale images to refine the Canny edge detector [6], whereas the proposed modified version of SRBD employed only some of these patch sizes: p = 7, 9, 11, 15, 19 pixels for full-scale images and p = 7, 9, 11 pixels for half-scale images. The second is the sparsity constraint used in the dictionary learning procedure. The SRBD [43] used to refine Canny defined a discriminative dictionary training method and used a series of sparsity constraints to learn a series of dictionaries for each size of training image patch. In contrast, the proposed modified version of SRBD is simpler, as it does not use the discriminative dictionary learning method and uses only one sparsity constraint to generate one dictionary for each size of training image patch. The third is the way the sparse reconstruction error is employed to separate false contours from true contours within test images. For each patch size, the SRBD [43] used to refine Canny outputs a pair of curves, obtained by plotting the reconstruction errors produced by the series of true and false contour dictionaries as a function of the corresponding series of sparsity constraints. Afterwards, the curves generated for all patch sizes were combined to generate a feature vector, which a logistic regression classifier or a linear SVM used to separate false contours from true contours. In contrast, the proposed modified version of SRBD defined a confidence metric calculated from the sparse reconstruction errors obtained at multiple patch sizes. The positive part of the confidence metric indicates the confidence of a false contour at each pixel within a test image and is employed to suppress contour pixels that it determines to be false contours.

Results
The BSDS300 dataset [44] is a standard benchmark for evaluating the performance of contour detection and image segmentation algorithms. The BSDS300 dataset contains 100 test images and 200 training images of size 481-by-321 pixels. They are all natural images of people, animals, plants, buildings, man-made objects and some natural scenes. To quantitatively evaluate the performance of a contour detection algorithm, the BSDS300 benchmark employs the F-score. The F-score is defined as 2PR/(P+R) where P is the precision and R is the recall obtained when comparing the algorithm's contours with contours produced by humans. In practice, the F-score varies from 0.41, which is the performance obtained by randomly defining pixels as contours, to 0.79, which is the performance obtained by a human observer compared with other human observers.
The BSDS500 dataset, which was first utilized in [2], is an extended version of the BSDS300. It adds 200 new natural images as test images, and the original 100 test images are reassigned as validation images. Three different quantities are employed for evaluation: the F-score with a fixed threshold for all test images (ODS), which is the same as the F-score used in the BSDS300 benchmark; the aggregated F-score with an optimal threshold per image (OIS); and the average precision (AP) over the full range of recall values, which is equivalent to the area under the precision-recall curve.
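The relation between the three measures can be made concrete with a small sketch. It assumes that, for every image and every threshold, the matching step has already produced four counts in the style of the BSDS benchmark: matched ground-truth pixels, total ground-truth pixels, matched detected pixels and total detected pixels. The data layout and function names are illustrative, and the AP here is a simple trapezoidal area over the observed recall range rather than the benchmark's exact interpolation.

```python
def f_score(precision, recall):
    """F = 2PR / (P + R), defined as 0 when both are 0."""
    total = precision + recall
    return 2.0 * precision * recall / total if total > 0 else 0.0

def precision_recall(cnt_r, sum_r, cnt_p, sum_p):
    """Precision and recall from BSDS-style match counts."""
    recall = cnt_r / sum_r if sum_r else 0.0
    precision = cnt_p / sum_p if sum_p else 0.0
    return precision, recall

def ods_ois_ap(counts):
    """counts[i][t] = (cnt_r, sum_r, cnt_p, sum_p) for image i, threshold t."""
    n_thresholds = len(counts[0])
    # ODS: pool the counts of all images at each threshold, then keep the
    # single threshold that gives the best F-score for the whole dataset.
    curve = []
    for t in range(n_thresholds):
        totals = [sum(img[t][j] for img in counts) for j in range(4)]
        curve.append(precision_recall(*totals))
    ods = max(f_score(p, r) for p, r in curve)
    # OIS: every image uses its own best threshold; counts are then pooled.
    pooled = [0, 0, 0, 0]
    for img in counts:
        best = max(img, key=lambda c: f_score(*precision_recall(*c)))
        pooled = [a + b for a, b in zip(pooled, best)]
    ois = f_score(*precision_recall(*pooled))
    # AP: trapezoidal area under precision as a function of recall.
    pts = sorted((r, p) for p, r in curve)
    ap = sum((r1 - r0) * (p0 + p1) / 2.0
             for (r0, p0), (r1, p1) in zip(pts, pts[1:]))
    return ods, ois, ap
```

Because each image is allowed its own threshold, the OIS computed this way is never lower than the ODS on the same data.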
Contour detection results derived from c-PC/BC [27], gPb-owt-ucm [2], and SCG [40] were all generated using the code provided by the authors of these algorithms. Afterwards, the result derived from each algorithm was refined by the modified version of the SRBD method proposed in Sect. 3. Figure 1 shows results of contour detection for two images from each of the two BSDS datasets. These results include contours detected by human observers, and by each of the three algorithms with and without refinement by SRBD. Figures 2 and 3 show the F-scores (ODS) with their precision-recall curves obtained by the different algorithms when their inputs are colour images from the BSDS300 and the BSDS500 respectively. Table 1 reports the ODS, OIS and AP obtained by these three algorithms and their refined versions. The ODS and OIS of c-PC/BC [27] are slightly increased using our proposed SRBD method, and the AP is significantly increased. The result derived from one of the state-of-the-art algorithms, gPb-owt-ucm, is slightly improved after the refinement, with the ODS increased from 0.71 to 0.72 and the AP increased from 0.73 to 0.76, when evaluated by the BSDS300 benchmark; however, only the AP is significantly increased, from 0.73 to 0.77, for the BSDS500 benchmark. The performance of the other state-of-the-art algorithm, SCG, is also slightly improved in its refined version, with the ODS increased from 0.71 to 0.72, the OIS increased from 0.73 to 0.74 and the AP increased from 0.75 to 0.76, when evaluated by the BSDS300 benchmark. Only the AP is slightly increased, from 0.77 to 0.78, for the BSDS500 benchmark.
Fig. 2 Performance of contour detection algorithms evaluated by the precision-recall curve of the F-score from the BSDS300 benchmark [44]
Fig. 3 Performance of contour detection algorithms evaluated by the precision-recall curve of the ODS (F-score) from the BSDS500 benchmark
The BSDS300/500 benchmark applies a set of thresholds ranging from 0 to 1 to convert the contour detection result into a binary map and then outputs a precision-recall curve over the different threshold values. The F-score/ODS or OIS of the method being tested is the highest F-score value in a fixed [...] In other words, some false contour pixels are no longer treated as contour pixels at the majority of threshold values, and most of the true contour pixels are unaffected by the proposed SRBD. As shown in Table 1, the quantitative evaluation results are consistent with this expectation. Specifically, the average precision (AP) values are all increased when each of the three previously proposed methods is compared with and without refinement. Also, the recall values show only a slight decrease, or none, at each threshold value for each of the three methods. However, the recall value at the highest F-score may decrease or increase in the different refined results because the optimal threshold values may have changed. Therefore, the F-score/ODS and OIS are all increased when each of the three previously proposed methods is compared with and without refinement, although some of these increments are negligible.
The value of each parameter in the proposed modified version of SRBD was determined by trial and error. Specifically, the modified SRBD was first run on the training images with a set of manually defined default parameter values. Afterwards, each parameter was adjusted in turn, according to the change in F-score on these training images, while all other parameter values were kept fixed. The parameters were adjusted in the following order: those related to the extraction of training image patches, then dictionary learning, then sparse reconstruction error computation, and finally the suppression of the pixels that the modified SRBD determines to be false contours within the results derived from the three previously proposed contour detection methods.
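The one-parameter-at-a-time procedure described above can be sketched as a simple coordinate-wise search. The callback `evaluate`, which is expected to return the F-score obtained on the training images for a given parameter setting, and the candidate value lists are hypothetical; the text does not state which values were tried.

```python
def tune_sequentially(evaluate, defaults, candidates, order):
    """Adjust one parameter at a time, keeping the others fixed.

    evaluate   : callable mapping a dict of parameters to a score
                 (higher is better), e.g. F-score on training images.
    defaults   : dict of initial parameter values.
    candidates : dict mapping each parameter name to values to try.
    order      : sequence of parameter names, in tuning order.
    """
    params = dict(defaults)
    best = evaluate(params)
    for name in order:
        for value in candidates[name]:
            trial = dict(params, **{name: value})
            score = evaluate(trial)
            if score > best:
                # Keep the improved value before moving to the next one.
                best, params = score, trial
    return params, best
```

Such a search is cheap but depends on the tuning order and on the default values, so it finds a good setting rather than a guaranteed optimum.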

Discussion
The sparse reconstruction-based discrimination (SRBD) method was previously proposed by Mairal et al. [42] and utilized two dictionaries to place image patches into two classes: edge and non-edge. Afterwards, Mairal et al. [43] applied SRBD to improve the performance of the Canny edge detector [6]. However, given the poor performance of the Canny method in locating salient boundaries, the refined method still produced results that fell far short of state-of-the-art contour detection methods.
In this paper, a modified version of the SRBD method is applied to refining the contour detection results generated by three different boundary-detection algorithms: the colour Predictive Coding/Biased Competition model of V1 [27] and the two state-of-the-art algorithms, gPb-owt-ucm [2] and the Sparse Code Gradients method [40]. Evaluation using the BSDS300 [44] and BSDS500 [2] datasets indicates that, for all three algorithms, boundary detection is at least slightly improved by the proposed refinement. In other words, refining the two state-of-the-art algorithms with the proposed version of SRBD improves their results beyond the previous state of the art.
The elements of the dictionaries learnt by the SRBD from training image patches belonging to the true or false contour classes can also be interpreted as the receptive fields of neurons, each representing an image feature of true or false contour pixels. Hence, these elements could also be added to a neurophysiologically inspired contour detection method implemented by a sparse coding model (such as the PC/BC model of V1 [26]) as the receptive fields of neurons representing true or false contour pixels, which may improve its contour detection performance.