Abstract
The image segmentation problem is to delineate, or segment, a salient feature in an image. As such, this is a bipartition problem with the goal of separating the foreground from the background. An NP-hard optimization problem, the normalized cut problem, is often used as a model for image segmentation. The common approach for solving the normalized cut problem is the spectral method, which generates heuristic solutions based upon finding the Fiedler eigenvector. Recently, Hochbaum (IEEE Trans Pattern Anal Mach Intell 32(5):889–898, 2010) presented a new relaxation of the normalized cut problem, called the normalized cut\(^\prime \) problem, which is solvable in polynomial time by a combinatorial algorithm. We compare this new algorithm with the spectral method and present experimental evidence that the combinatorial algorithm provides solutions which better approximate the optimal normalized cut solution. In addition, the subjective visual quality of the segmentations provided by the combinatorial algorithm greatly improves upon those provided by the spectral method. Our study establishes an interesting observation about the normalized cut criterion: the segmentation which provides the subjectively best visual bipartition rarely corresponds to the segmentation which minimizes the objective function value of the normalized cut problem. We conclude that modeling the image segmentation problem with the normalized cut criterion might not be appropriate. Instead, normalized cut\(^\prime \) not only provides better visual segmentations but is also solvable in polynomial time. Therefore, normalized cut\(^\prime \) should be the preferred segmentation criterion for reasons of both complexity and segmentation quality.
Introduction
Image segmentation is fundamental in computer vision (Shapiro and Stockman 2001). It is used in numerous applications, such as medical imaging (Pham et al. 2000; Dhawan 2003; Hosseini et al. 2010; Roobottom et al. 2010), and is also of independent interest in clustering (Coleman and Andrews 1979; Pappas 1992; Wu and Leahy 1993; Shi and Malik 2000; Xing and Jordan 2003; Tolliver and Miller 2006). The image segmentation problem is to delineate, or segment, a salient feature in an image. As such, this is a bipartition problem with the goal of separating the foreground from the background. It is not obvious how to construct a quantitative measure for optimizing the quality of a segmentation. The common belief is that the normalized cut (NC) criterion (Shi and Malik 2000) is a good model for achieving high-quality image segmentation, and it is often used.
The normalized cut criterion uses similarity weights that quantify the similarity between pairs of pixels. These weights are typically set to be a function of the difference between the color intensities of the pixels. Such functions are increasing with the perceived similarity between the pixels. Even though the use of normalized cut is common, it is an NP-hard problem (Shi and Malik 2000) and heuristics and approximation algorithms have been employed (Shi and Malik 2000; Xing and Jordan 2003; Dhillon et al. 2004; Tolliver and Miller 2006; Dhillon et al. 2007). The most frequently used method for obtaining an approximate solution for the normalized cut problem is the spectral method that finds the Fiedler eigenvector (Shi and Malik 2000).
Hochbaum (2010) presented a new relaxation of the normalized cut problem, called the normalized cut\(^\prime \) problem (NC\(^\prime \)). The normalized cut\(^\prime \) problem was shown in Hochbaum (2010) to be solved in polynomial time with a combinatorial (flow-based) algorithm. In addition, Hochbaum (2010, 2012) introduces a generalization of normalized cut, called the q-normalized cut problem (qNC). For the q-normalized cut problem, there are, in addition to the similarity weights, also pixel weights. The pixel weights could be a function of some pixel’s feature other than color intensity. The combinatorial algorithm that solves the normalized cut\(^\prime \) problem was shown to generalize, with the same complexity, to a respective relaxation problem q-normalized cut\(^\prime \) (qNC\(^\prime \)) (Hochbaum 2010, 2012). It is also shown in Hochbaum (2012) that the spectral method heuristic for the normalized cut problem extends to a respective heuristic for q-normalized cut.
Unlike the combinatorial algorithm’s solution, the spectral method’s solution is a real eigenvector, rather than a discrete bipartition. In order to generate a bipartition, a method called the threshold technique is commonly used. For a given threshold value, all pixels that correspond to entries of the eigenvector that exceed this threshold are placed on one side of the bipartition, and the remaining pixels constitute the complement set. For further improvement, the spectral sweep technique selects, among all possible thresholds, the one that gives the smallest value of the respective normalized cut objective. A different technique, utilized by Yu and Shi (2003) and Cour et al. (2011), generates a bipartition from the Fiedler eigenvector which is claimed to give a superior approximation to the objective value of the respective normalized cut problem. This different method will be referred to as Shi’s code in the remainder of the paper. Our experimental study implements both the spectral sweep technique and Shi’s code for the spectral method.
In this paper, we provide a detailed experimental study comparing the combinatorial algorithm to the spectral method, in terms of approximating the optimal value of both the normalized cut and the q-normalized cut criteria, quality of visual segmentation, and running times in practice.
To compare the approximation quality, we evaluate the objective functions of the normalized cut and q-normalized cut problems for the solutions resulting from solving the normalized cut\(^\prime \) problem and the spectral method. These solutions are bipartitions, and hence feasible solutions for the normalized cut and q-normalized cut problems.
To evaluate visual quality, we view the feature(s) that are delineated by the bipartition solutions. The evaluation is inevitably subjective. The manner in which we evaluate the visual quality is explained in detail in “Visual segmentation quality evaluation”.
For running time comparisons, we test the methods not only for the benchmark images given in \(160 \times 160\) resolution but also for higher image resolutions.
The main findings of the experimental study presented here are:

1. The combinatorial algorithm solution is a better approximation of the optimal objective value of the normalized cut problem than the solution provided by the spectral method. This dominance of the combinatorial algorithm holds for both the spectral sweep technique and Shi’s code. This is discussed in “Quantitative evaluation for objective function values”.

2. The discretizing technique used in Shi’s code to generate a bipartition from the eigenvector is shown here to give results inferior to those of the spectral sweep technique, in terms of approximating the objective value of the respective normalized cut problem. This is displayed in “Comparing approximation quality of SWEEP and COMB”.

3. The visual quality of the segmentation provided by the combinatorial algorithm is far superior to that of the spectral method solutions, as presented in “Visual segmentation quality evaluation”.

4. Shi’s code includes a variant that uses similarity weights derived with intervening contour (Leung and Malik 1998; Malik et al. 2001). The visual quality resulting from segmentation with the intervening contour code is much better than the other spectral segmentations. Yet, the combinatorial algorithm with standard similarity (exponential similarity) weights delivers better visual results than Shi’s code with intervening contour (“Visual segmentation quality evaluation”). The combinatorial algorithm does not work well with intervening contour similarity weights since these weights tend to be of uniform value. A detailed discussion of this phenomenon is provided in “Comparing instances with intervening contour similarity weights: comparing SHI-NC-IC with COMB-NC-IC and SHI-qNC-IC with COMB-qNC-IC”.

5. Our study compares the visual quality of segmentations resulting from the q-normalized cut\(^\prime \) criterion with those resulting from the normalized cut\(^\prime \) criterion in “Visual segmentation quality evaluation”. (We use entropy for pixel weights in the q-normalized cut\(^\prime \) instances.) The results show that q-normalized cut\(^\prime \) often provides better visual segmentation than normalized cut\(^\prime \). Therefore, for applications such as medical imaging, where each pixel is associated with multiple features, these features can be used to generate characteristic node weights, and q-normalized cut\(^\prime \) would be a better criterion than normalized cut\(^\prime \).

6. Over the benchmark images of size \(160\times 160\), the combinatorial algorithm runs faster than the spectral method, with an average speedup factor of 84 for the normalized cut objective. Furthermore, the combinatorial algorithm scales much better than the spectral method: the speedup ratio provided by the combinatorial algorithm compared to the spectral method grows substantially with the size of the image, increasing from a factor of 84 for images of size \(160\times 160\) to a factor of 5,628 for images of size \(660 \times 660\). The details are discussed in “Running time comparison between the spectral method and the combinatorial algorithm”.

7. For normalized cut\(^\prime \) we get a collection of nested bipartitions as a by-product of the combinatorial algorithm (Hochbaum 2010, 2012). The best visual bipartition and the best normalized cut objective value bipartition are chosen among these nested bipartitions. Our study results show that in most cases the best visual bipartition does not coincide with the bipartition that gives the best objective value of the normalized cut (or q-normalized cut) problem. (The details are discussed in “Visual segmentation quality evaluation”.) Therefore, normalized cut, in spite of its popularity, is not a good segmentation criterion. Normalized cut\(^\prime \) improves on normalized cut not only in complexity (from an NP-hard to a polynomial time solvable problem) but also in the segmentation quality delivered.
The paper is organized as follows: “Notations and problem definitions” presents the notations employed. The detailed settings of the experiment are discussed in “Experimental setting”. In “Assessing quality of seed selection methods of COMB”, we first evaluate the effect of different selections of seeds, an important component of the combinatorial algorithm, in approximating the optimal objective values of the normalized cut and q-normalized cut problems. Then in “Running time comparison between the spectral method and the combinatorial algorithm”, we evaluate and compare the running times of the spectral method and the combinatorial algorithm. In “Quantitative evaluation for objective function values”, we present the quantitative evaluation of the spectral method and the combinatorial algorithm, in terms of the quality of approximation to the optimal objective values of the normalized cut and q-normalized cut problems. “Visual segmentation quality evaluation” provides a comparison of the visual results for the spectral method versus the combinatorial algorithm, followed by concluding remarks in “Conclusions”.
Notations and problem definitions
In image segmentation an image is formalized as an undirected weighted graph \(G = (V,E)\). Each pixel in the image is represented as a node in the graph. A pair of pixels is said to be neighbors if they are adjacent to each other. The common neighborhoods used in image segmentation are the 4-neighbor and 8-neighbor relations. In the 4-neighbor relation, a pixel is a neighbor of the two vertically adjacent pixels and two horizontally adjacent pixels. The 8-neighbor relation also adds the four diagonally adjacent pixels. Every pair of neighbors \(i,j \in V\) is associated with an edge \([i,j] \in E\). Each edge \([i,j] \in E\) has a weight \(w_{ij} \ge 0\) representing the similarity between pixel nodes \(i\) and \(j\). We adopt the common notation that \(n = |V|\) and \(m = |E|\).
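The neighbor relations above can be illustrated with a minimal Python sketch. The function name `grid_edges` and the row-major node numbering are assumptions made for this illustration and are not taken from the implementations evaluated in the paper.

```python
def grid_edges(height, width, connectivity=4):
    """List the undirected edges [i, j] (with i < j) of a pixel grid.

    Pixels are numbered row-major: pixel (r, c) is node r * width + c.
    This numbering is an illustrative assumption.
    """
    if connectivity not in (4, 8):
        raise ValueError("connectivity must be 4 or 8")
    # Use only "forward" offsets so that every undirected edge appears once.
    offsets = [(0, 1), (1, 0)]          # right, down
    if connectivity == 8:
        offsets += [(1, 1), (1, -1)]    # the two forward diagonals
    edges = []
    for r in range(height):
        for c in range(width):
            for dr, dc in offsets:
                rr, cc = r + dr, c + dc
                if 0 <= rr < height and 0 <= cc < width:
                    edges.append((r * width + c, rr * width + cc))
    return edges
```

Under the 4-neighbor relation a \(160 \times 160\) image gives \(n = 25{,}600\) nodes and \(m = 2 \cdot 160 \cdot 159 = 50{,}880\) edges.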
For two subsets \(V_1,V_2\subseteq V\), we define \(C(V_1,V_2) = \sum _{[i,j] \in E, i \in V_1, j\in V_2}w_{ij}\). A bipartition of a graph is called a cut, \((S,\bar{S}) = \{[i,j] \in E \mid i\in S, j \in \bar{S}\}\), where \(\bar{S} = V{\setminus }S\) is the complement of set \(S\). The cut capacity is \(C(S,\bar{S})\). Each node has a weight \(d(i) = \sum _{[i,j] \in E}w_{ij}\) which is the sum of the weights of its incident edges. For a set of nodes \(S\), \(d(S) = \sum _{i \in S}d(i)\). A node may also have an arbitrary nonnegative weight associated with it, \(q(i)\). For a set of nodes \(S \subseteq V\), \(q(S) = \sum _{i\in S}q(i)\).
Let \(\mathbf{D}\) be a diagonal \(n \times n\) matrix with \(\mathbf{D}_{ii} = d(i) = \sum _{[i,j] \in E}w_{ij}\). Let \(\mathbf{W}\) be the weighted node–node adjacency matrix of the graph, where \(\mathbf{W}_{ij} = \mathbf{W}_{ji} = w_{ij}\). The matrix \(\mathcal L = \mathbf{D} - \mathbf{W}\) is called the Laplacian of the graph.
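The mathematical formulations of the normalized cut and q-normalized cut problems are:
Normalized cut (Shi and Malik 2000):
\[\mathrm{NC}_G = \min _{\emptyset \ne S \subset V} \frac{C(S,\bar{S})}{d(S)} + \frac{C(S,\bar{S})}{d(\bar{S})}. \qquad (1)\]
q-Normalized cut (Hochbaum 2012):
\[\mathrm{qNC}_G = \min _{\emptyset \ne S \subset V} \frac{C(S,\bar{S})}{q(S)} + \frac{C(S,\bar{S})}{q(\bar{S})}. \qquad (2)\]
A relaxation of these problems, introduced by Hochbaum (2010, 2012), omits the second term in the objective. We call these relaxations the normalized cut\(^\prime \) and q-normalized cut\(^\prime \) problems, respectively:
Normalized cut\(^\prime \) (Hochbaum 2010):
\[\min _{\emptyset \ne S \subset V} \frac{C(S,\bar{S})}{d(S)}. \qquad (3)\]
q-Normalized cut\(^\prime \) (Hochbaum 2010, 2012):
\[\min _{\emptyset \ne S \subset V} \frac{C(S,\bar{S})}{q(S)}. \qquad (4)\]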
It is shown in Hochbaum (2010, 2012) that the problem
\[\min _{\emptyset \ne S \subset V} \frac{C(S,\bar{S})}{C(S,S)} \qquad (5)\]
defined in Sharon et al. (2006) is equivalent to (3) in that both have the same optimal solution. Problem (5) is a criterion characterizing a good image bipartition by two goals. The first requires the salient region segmented to be dissimilar from the rest of the image, or formally to have a small value for \(C(S,\bar{S})\). The second goal is to have the pixels in the segmented region as similar to each other as possible, that is, to have a large value for \(C(S,S)\).
The normalized cut and q-normalized cut problems are NP-hard (Shi and Malik 2000; Hochbaum 2012). The combinatorial algorithm presented in Hochbaum (2010) solves the normalized cut and q-normalized cut problems approximately by solving their relaxations, the normalized cut\(^\prime \) and q-normalized cut\(^\prime \) problems, respectively. Both the normalized cut\(^\prime \) and the q-normalized cut\(^\prime \) problems are polynomial time solvable by the combinatorial algorithm.
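As an illustration of how a given bipartition is scored under the four criteria, the following Python sketch evaluates the objectives from a dense weight matrix. It is a sketch under stated assumptions: the function names, the dense NumPy representation, and the dictionary keys are hypothetical and chosen here for readability, and the formulas in the comments follow the definitions of \(C\), \(d\) and \(q\) given above, with the primed criteria keeping only the first term.

```python
import numpy as np

def cut_capacity(W, in_S):
    """C(S, S_bar): total weight of edges with exactly one endpoint in S."""
    in_S = np.asarray(in_S, dtype=bool)
    return float(W[np.ix_(in_S, ~in_S)].sum())

def objectives(W, in_S, q=None):
    """Score the bipartition (S, S_bar) given as a boolean mask in_S.

    W is a symmetric nonnegative weight matrix with zero diagonal;
    q is an optional vector of node weights (needed for the q-variants).
    """
    W = np.asarray(W, dtype=float)
    in_S = np.asarray(in_S, dtype=bool)
    d = W.sum(axis=1)                      # d(i): sum of incident edge weights
    cut = cut_capacity(W, in_S)
    d_S, d_Sbar = d[in_S].sum(), d[~in_S].sum()
    values = {
        "NC": cut / d_S + cut / d_Sbar,    # both terms
        "NC'": cut / d_S,                  # relaxation: first term only
    }
    if q is not None:
        q = np.asarray(q, dtype=float)
        q_S, q_Sbar = q[in_S].sum(), q[~in_S].sum()
        values["qNC"] = cut / q_S + cut / q_Sbar
        values["qNC'"] = cut / q_S
    return values
```

For a path on four nodes with edge weights 2, 1, 2 and \(S\) equal to the first two nodes, the cut capacity is 1 and \(d(S) = d(\bar{S}) = 5\), so the sketch returns 0.4 for NC and 0.2 for NC\(^\prime \).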
A bound on the relation between the spectral method solution and \(\mathrm{NC}_G\)
The bounds on the Fiedler eigenvalue were developed for a problem closely associated with normalized cut. This problem, devised by Cheeger (1970) and called the Cheeger constant problem (e.g., Chung 1997), is a “half-version” of normalized cut. With the balance constraint \(d(S) \le d(V)/2\) added, the formulation of the Cheeger constant problem is
\[h_G = \min _{S \subset V,\; d(S) \le d(V)/2} \frac{C(S,\bar{S})}{d(S)}, \qquad (6)\]
where \(h_G\) is called the Cheeger constant of the graph \(G\). Like the normalized cut problem, the Cheeger constant problem is also NP-hard (e.g., Chung 1997). For any (undirected) graph \(G\), the Cheeger inequality states that its Cheeger constant \(h_G\) is bounded by the second smallest eigenvalue of the (normalized) Laplacian of \(G\), the Fiedler eigenvalue \(\lambda _1\): \(\frac{\lambda _1}{2} \le h_G \le \sqrt{2\lambda _1}\) (Cheeger 1970; Chung 1997). In addition, the solution resulting from applying the spectral sweep technique to the Fiedler eigenvector, evaluated by the objective of the Cheeger constant problem, is of value at most \(2\sqrt{h_G}\) (Chung 2007). On the other hand, it is easily shown that \(h_G\) and \(\mathrm{NC}_G\) satisfy: \(\frac{1}{2}\mathrm{NC}_G \le h_G \le \mathrm{NC}_G\) (e.g., Hochbaum 2012). These (approximation) bounds for the Cheeger constant (with respect to \(\lambda _1\) and the sweep solution) can be easily transformed to corresponding bounds for the normalized cut objective. Therefore, the spectral method approximately solves the normalized cut (and the Cheeger constant and q-normalized cut) problem by finding the Fiedler eigenvector (Cheeger 1970; Donath and Hoffman 1973; Fiedler 1975; Alon and Milman 1985; Alon 1986; Shi and Malik 2000; Hochbaum 2012).
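The transformation mentioned above can be written out by chaining the inequalities already stated in this section. The following is only a sketch: the symbols \(\phi \) (the Cheeger objective of a set) and \(S^*\) (the sweep solution, taken to be the side satisfying the balance constraint) are notation introduced here for illustration.

```latex
% Notation assumed for this sketch:
%   \phi(S) = C(S,\bar{S}) / d(S),  and  S^* is the sweep solution with d(S^*) \le d(V)/2.
\begin{align*}
% Bounds on NC_G in terms of the Fiedler eigenvalue:
\frac{\lambda_1}{2} \;\le\; h_G \;\le\; \mathrm{NC}_G \;\le\; 2h_G \;\le\; 2\sqrt{2\lambda_1}, \\
% Normalized cut value of the sweep solution; the second term is at most the
% first because d(\bar{S}^*) \ge d(S^*):
\frac{C(S^*,\bar{S}^*)}{d(S^*)} + \frac{C(S^*,\bar{S}^*)}{d(\bar{S}^*)}
  \;\le\; 2\,\phi(S^*) \;\le\; 4\sqrt{h_G} \;\le\; 4\sqrt{\mathrm{NC}_G}.
\end{align*}
```

The first line uses the Cheeger inequality together with \(\frac{1}{2}\mathrm{NC}_G \le h_G \le \mathrm{NC}_G\); the second uses the bound \(\phi (S^*) \le 2\sqrt{h_G}\) on the sweep solution.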
Experimental setting
Edge and node weights
Similarity edge weights
The benchmark images used here are grayscale images. A color intensity value is associated with every pixel, represented as an integer in \([0,255]\) in MATLAB. This is normalized and mapped to \([0,1]\). The similarity weight between a pair of pixel nodes is a function of the difference of their color intensities. For \(p_i\) and \(p_j\) the color intensities of two neighboring pixel nodes \(i\) and \(j\), the exponential similarity weight is defined as
\[w_{ij} = e^{-\alpha (p_i - p_j)^2}, \qquad (7)\]
where \(\alpha \) can be viewed as amplifying the dissimilarity between two pixels based on the color intensity difference. If \(\alpha \) is too small, then the dissimilarity is not significant enough to reflect the color intensity difference. On the other hand, setting the value of \(\alpha \) too large makes all pairs of pixels appear very dissimilar, and therefore color intensity differences are not sufficiently informative. We tested several settings for \(\alpha \) and found that \(\alpha = 100\) works well. In all experiments reported here \(\alpha \) is set to \(100\).
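A minimal Python sketch of the exponential similarity weights on the 4-neighbor grid is given below. It assumes the squared-difference form \(e^{-\alpha (p_i - p_j)^2}\) with intensities scaled by 255; the function name and the choice to return separate arrays for horizontal and vertical edges are illustrative assumptions, not the layout used by the evaluated codes.

```python
import numpy as np

def exponential_weights(image_u8, alpha=100.0):
    """Exponential similarity weights between 4-neighbor pixel pairs.

    Assumes w_ij = exp(-alpha * (p_i - p_j)**2) with intensities mapped
    from [0, 255] to [0, 1]. Returns (horiz, vert), where horiz[r, c] is the
    weight between pixels (r, c) and (r, c + 1), and vert[r, c] the weight
    between pixels (r, c) and (r + 1, c).
    """
    # Convert to float first so that differences of uint8 values cannot wrap.
    p = np.asarray(image_u8, dtype=float) / 255.0
    horiz = np.exp(-alpha * (p[:, 1:] - p[:, :-1]) ** 2)
    vert = np.exp(-alpha * (p[1:, :] - p[:-1, :]) ** 2)
    return horiz, vert
```

With \(\alpha = 100\), two neighbors that differ by a single gray level receive a weight of about 0.9985, while neighbors at opposite ends of the intensity range receive a weight of \(e^{-100}\), which is numerically close to zero.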
Another similarity weight is the intervening contour weight, introduced in Leung and Malik (1998) and Malik et al. (2001). Intervening contour uses the contour information in an image to characterize the (local) similarity between two pixels that are not necessarily neighboring. If two pixels are on two different sides of a boundary, their similarity should be small, as they are more likely to belong to different segments. In the experiment, we use the intervening contour similarity weights generated by Shi’s code. Since Shi’s code with intervening contour is considered to generate good segmentations, we compare it to the combinatorial algorithm.
Node weights
For q-normalized cut, the entropy of a pixel is used as its weight. The entropy of an image is a measure of randomness in the image that can be used to characterize the texture of an image. In MATLAB, by default the local entropy value of a pixel is the entropy value of the 9-by-9 neighborhood around the pixel. In our experiment, the entropy of a pixel is computed directly via the MATLAB built-in function entropyfilt.
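For readers without MATLAB, the local entropy described above can be approximated by the following Python sketch. It is not the entropyfilt implementation: the 256-bin histogram, base-2 logarithm, and symmetric border padding are assumptions intended to mirror its documented default behavior, and the function name is hypothetical.

```python
import numpy as np

def local_entropy(image_u8, size=9):
    """Shannon entropy (in bits) of the size-by-size neighborhood of each pixel.

    Assumptions: 8-bit input, one histogram bin per gray level, and
    symmetric (mirror) padding at the image border.
    """
    img = np.asarray(image_u8, dtype=np.uint8)
    half = size // 2
    padded = np.pad(img, half, mode="symmetric")
    out = np.zeros(img.shape, dtype=float)
    for r in range(img.shape[0]):
        for c in range(img.shape[1]):
            window = padded[r:r + size, c:c + size]
            counts = np.bincount(window.ravel(), minlength=256)
            probs = counts[counts > 0] / window.size
            out[r, c] = -np.sum(probs * np.log2(probs))
    return out
```

A constant region has entropy 0, and a neighborhood split almost evenly between two gray levels has entropy close to 1 bit, so textured regions receive larger node weights. The double loop is written for clarity rather than speed.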
Image database
We select 20 benchmark images from the Berkeley Segmentation Dataset and Benchmark (http://www.eecs.berkeley.edu/Research/Projects/CS/vision/grouping/segbench/). See Fig. 6 in the Appendix. The 20 benchmark images are chosen to cover various segmentation difficulties and have been resized to \(160 \times 160\) for testing, since this is the default size in Shi’s code.
Implementation of the combinatorial algorithm
Seed selection
The combinatorial algorithm requires designating one node as a seed in one set and another node as a seed in the other set, to guarantee that both sets are nonempty (Hochbaum 2010). On the other hand, the delineation of foreground versus background depends on the interpretation of what the main feature is. This is not self-evident, and the purpose of the seeds is to have one seed indicating a pixel in the foreground and the other seed indicating a pixel in the background.
Theoretically, in order to obtain the optimal solutions to the normalized cut\(^\prime \) and q-normalized cut\(^\prime \) problems, all possible pairs of seed nodes should be considered. This increases the complexity of the combinatorial algorithm by a factor of \(O(n)\). To avoid this added complexity we devise a test for automatically choosing the seed nodes.
Two automatic seed selection rules are introduced here. The first rule is to select the pixel with the maximum entropy as a seed node in one set. The second rule is to select the pixel with the maximum group luminance value as a seed node in one set. The group luminance value is defined for pixels not on the boundary. For every such pixel \(i\), the group luminance value of pixel \(i\) is the average of the color intensities of the nine pixels in the \(3 \times 3\) region centered at \(i\). Intuitively, if a pixel has a greater group luminance value, that pixel and its surrounding pixels are more likely to be in the same segment. The other seed node is an arbitrarily selected node in the complement region to the one occupied by the first seed node. We compare the two automatic seed selection methods with a manual selection of both seed nodes (Table 1).
For each pair of seed nodes, the combinatorial algorithm is run twice where in the second run the two seed nodes are exchanged between the two sets. Therefore, for each image the combinatorial algorithm is executed six times, for the three different seed selection rules.
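The maximum group luminance rule can be sketched in Python as follows. The function name is hypothetical, and ties are broken by taking the first maximum in row-major order, which is an assumption since the text does not specify a tie-breaking rule. The maximum entropy rule is analogous, with the arg-max taken over a local entropy map instead of the \(3 \times 3\) mean.

```python
import numpy as np

def group_luminance_seed(intensities):
    """Return (row, col) of the non-boundary pixel with the largest 3x3 mean.

    `intensities` is a 2-D array of pixel intensities with at least 3 rows
    and 3 columns. Boundary pixels are excluded, as in the text.
    """
    p = np.asarray(intensities, dtype=float)
    h, w = p.shape
    # Sum the nine shifted copies of the image covering each 3x3 window.
    total = np.zeros((h - 2, w - 2))
    for dr in range(3):
        for dc in range(3):
            total += p[dr:dr + h - 2, dc:dc + w - 2]
    mean = total / 9.0
    r, c = np.unravel_index(np.argmax(mean), mean.shape)
    # Shift back from interior coordinates to image coordinates.
    return int(r) + 1, int(c) + 1
```

For an image that is dark except for one bright \(3 \times 3\) block, the rule returns the center of that block.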
Nested cuts
Each run of the combinatorial algorithm for a pair of seed nodes, for either the normalized cut or the q-normalized cut problem, produces a series of nested cuts. This is because the combinatorial algorithm uses a parametric minimum cut solver as a subroutine (Hochbaum 2010, 2012). The parametric minimum cut problem can be solved efficiently. Theoretically, it is shown in Gallo et al. (1989) and Hochbaum (2008) that the running time to solve the parametric minimum cut problem is only a small constant factor of the time to solve a single instance of the minimum cut problem. We implement Hochbaum’s pseudoflow algorithm in Hochbaum (2008) as the parametric minimum cut solver. The implementation is described in Chandran and Hochbaum (2009) and the code is available online at Chandran and Hochbaum (2012).
The number of nested cuts is typically 5–15. The combinatorial algorithm stores the visual segmentations given by all the nested cuts, which makes it possible to choose the one that is deemed (subjectively) best visually. The combinatorial algorithm also automatically selects the bipartition which gives the smallest objective value of the normalized cut or q-normalized cut problem among the nested cuts.
Implementation of the spectral method
The eigenvector solver subroutine of Shi’s code, named eigs2, is based on the MATLAB built-in eigenvector solver eigs with some modifications. Shi’s code applies the following two operations to the Laplacian matrix \(\mathcal L = \mathbf{D} - \mathbf{W}\):

1. Sparsifying operation: It rounds to 0 small values of \(w_{ij}\), where “small” is determined by some threshold value. The default threshold value in Shi’s code is \(10^{-6}\).

2. Offset operation: It adds a constant (1 by default) to \(\mathbf{D}_{ii} = d(i)\ (i = 1, \ldots , n)\). It also adds a value to each diagonal entry of the \(\mathbf{W}\) matrix. This value for entry \(\mathbf{W}_{ii}\) is \(0.5\) plus a quantity that estimates the round-off error for row \(i\), \(e(i) = d(i) - \sum _{j = 1}^n w_{ij}\).
In our experiment, we exclude the above two operations from Shi’s code in order to compare it with the combinatorial algorithm, which applies no sparsifying or offset operations.
The spectral sweep technique uses the Fiedler eigenvector from Shi’s code (without the two operations) and then chooses the best bipartition threshold as described in “Introduction”.
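The spectral sweep technique can be sketched in Python as below. This is an illustration for small dense graphs under stated assumptions, not Shi’s code or the implementation used in the experiments: the Fiedler vector is obtained here from the symmetric normalized Laplacian with `numpy.linalg.eigh`, every node is assumed to have positive degree, and the function names are hypothetical.

```python
import numpy as np

def fiedler_vector(W):
    """Eigenvector of the second smallest eigenvalue of (D - W) y = lambda D y."""
    W = np.asarray(W, dtype=float)
    d = W.sum(axis=1)
    inv_sqrt = 1.0 / np.sqrt(d)            # assumes d(i) > 0 for every node
    # Symmetric normalized Laplacian I - D^{-1/2} W D^{-1/2}.
    L_norm = np.eye(len(d)) - inv_sqrt[:, None] * W * inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(L_norm)       # eigenvalues in ascending order
    return inv_sqrt * vecs[:, 1]           # map back to the generalized problem

def normalized_cut_value(W, in_S):
    """C(S, S_bar)/d(S) + C(S, S_bar)/d(S_bar) for a boolean mask in_S."""
    d = W.sum(axis=1)
    cut = W[np.ix_(in_S, ~in_S)].sum()
    return cut / d[in_S].sum() + cut / d[~in_S].sum()

def spectral_sweep(W):
    """Try every threshold on the Fiedler vector; keep the best bipartition."""
    W = np.asarray(W, dtype=float)
    y = fiedler_vector(W)
    order = np.argsort(y)
    best_value, best_mask = np.inf, None
    for k in range(1, len(y)):             # both sides stay nonempty
        in_S = np.zeros(len(y), dtype=bool)
        in_S[order[k:]] = True             # entries above the k-th threshold
        value = normalized_cut_value(W, in_S)
        if value < best_value:
            best_value, best_mask = value, in_S
    return best_value, best_mask
```

The sketch recomputes the objective from scratch for each threshold, which is adequate for small examples; an implementation for full images would use sparse matrices and update the cut capacity incrementally as nodes move across the threshold.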
Algorithm, optimization criterion, and similarity classifications and nomenclatures
Each experimental set is characterized by the choice of algorithm, the choice of optimization objective, and the choice of similarity weight definition. For the algorithm, we choose among the combinatorial algorithm, COMB, Shi’s code, SHI, and the spectral sweep technique, SWEEP. For the optimization objective, we choose between normalized cut, NC, and q-normalized cut, qNC. For the similarity weight definition, we choose between the exponential similarity weights, EXP, and the intervening contour similarity weights, IC. The format Algorithm-Criterion-Similarity is used to represent an experimental set.
For each choice of optimization objective and similarity weight definition, the combinatorial algorithm outputs a series of nested cuts for a pair of seeds (see “Nested cuts”), among which the cut that gives the smallest objective value of NC or qNC is selected. The pairs of seeds are selected according to the automatic seed selection criterion, including both the maximum entropy criterion and the maximum group luminance criterion, described in “Seed selection”. The numerically best cut is selected among the four series of corresponding nested cuts. The segmentation of the selected cut is considered as the output of the combinatorial algorithm and the objective value of NC or qNC of the selected cut is considered as the objective value output by the combinatorial algorithm.
We test the following experimental sets:
COMB-NC-EXP
COMB-qNC-EXP
COMB-NC-IC
COMB-qNC-IC
SHI-NC-EXP
SHI-qNC-EXP
SHI-NC-IC
SHI-qNC-IC
SWEEP-NC-EXP
SWEEP-qNC-EXP
SWEEP-NC-IC
SWEEP-qNC-IC.
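The twelve experimental sets are the full cross product of the three algorithms, two criteria, and two similarity definitions. A short Python sketch that enumerates them in the order listed above follows; the hyphen-separated label format and the function name are assumptions made for this illustration.

```python
from itertools import product

ALGORITHMS = ["COMB", "SHI", "SWEEP"]
CRITERIA = ["NC", "qNC"]
SIMILARITIES = ["EXP", "IC"]

def experimental_sets():
    """Enumerate labels of the form Algorithm-Criterion-Similarity.

    The criterion varies fastest and the algorithm slowest, which matches
    the order of the list in the text.
    """
    return [
        f"{alg}-{crit}-{sim}"
        for alg, sim, crit in product(ALGORITHMS, SIMILARITIES, CRITERIA)
    ]
```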
Assessing quality of seed selection methods of COMB
In this section, we evaluate the three seed selection methods introduced in “Seed selection” for the combinatorial algorithm in terms of approximating the objective values of NC and qNC. We evaluate the objective values of NC and qNC for each of the solutions provided by the three seed selection methods and count the number of images on which each seed selection method gives the smallest objective values of NC and qNC. Table 2 gives the percentage of the number of images, out of the 20 images, in which each method gets the best values for NC and qNC, respectively. The exponential similarity weight is used in both cases.
The results given in Table 2 show that method 2 (max entropy) is best for NC and qNC. This indicates that the maximum entropy is a good seed selection method for image segmentation.
Since method 3 (max group luminance) is automatic and also works well, we derive an automatic seed selection method which combines methods 2 and 3. This is done by running the combinatorial algorithm for the pairs of seeds generated by methods 2 and 3, and the output is the one that corresponds to the best of these four values. The automatic seed selection method is best \(66.67\,\%\) of the time for NC and \(73.92\,\%\) of the time for qNC. This improves a great deal on method 1, where the two seeds are selected manually.
As a result of the comparison, in the following comparisons the cut that gives the smallest objective values of NC or qNC of COMB is selected from the four series of nested cuts corresponding to the four pairs of seeds selected according to the automatic seed selection method defined above.
Running time comparison between the spectral method and the combinatorial algorithm
In order to provide a fair comparison of the running times of the two algorithms, we disregard the input and output processing parts of each algorithm. We run both algorithms on a 2010 Apple MacBook Pro laptop (2.4 GHz Intel Core 2 Duo processor and 4 GB of 1067 MHz DDR3 memory). The running times of the two algorithms over all the 20 benchmark images with size \(160 \times 160\) are reported in Table 3. The exponential similarity weights are used and both NC and qNC objectives are applied.
Table 3 shows that the combinatorial algorithm runs much faster than the spectral method for the NC objective by an average speedup factor of 84. For the qNC objective, in most cases the combinatorial algorithm is still faster. The same comparison results also apply to the case of intervening contour similarity weights. It is not clear why the spectral method runs so much faster for qNC than NC. We note, however, that the results delivered by the spectral method for qNC are dramatically inferior to those provided by the combinatorial algorithm, both in terms of approximating the optimal objective value of qNC (Figs. 2, 4), and in terms of visual quality (“Visual segmentation quality evaluation”).
We further evaluate the scalability of the two algorithms by creating input sequences of images each based on one image at increasing resolutions: we employ six different image sizes: \(160 \times 160, 260 \times 260, 360 \times 360, 460 \times 460, 560 \times 560\) and \(660 \times 660\). For every image size we run the two algorithms on 5 of the 20 benchmark images (Images 4, 8, 12, 16 and 20) and average the running times over the five images. The running times of the spectral method and the combinatorial algorithm with these different image sizes are plotted in Fig. 1.
Figure 1 shows that as the input size increases, the running time of the spectral method grows significantly faster than that of the combinatorial algorithm, with the average speedup factor increasing from 84 for images of size \(160\times 160\) to 5,628 for images of size \(660 \times 660\). Interestingly, the running time of the combinatorial algorithm appears insensitive to the input size and does not increase with the size of the image. This is because for these images at higher resolutions the number of breakpoints is smaller, and therefore fewer updates are required between consecutive breakpoints (Hochbaum 2012).
Quantitative evaluation for objective function values
In this section, we compare the performance of the spectral method and the combinatorial algorithm in terms of how well they approximate the optimal objective values of the normalized cut and q-normalized cut problems. In “Comparing approximation quality of SHI and COMB”, we compare SHI with COMB and in “Comparing approximation quality of SWEEP and COMB”, we compare SWEEP with COMB. Both exponential similarity weights and intervening contour weights are used in the comparisons.
In order to compare the performance of the spectral method, either SHI or SWEEP, with COMB in approximating the optimal objective value of NC or qNC, we compute the ratio of the objective value of NC or qNC generated by the spectral method to the corresponding objective value generated by COMB. A ratio greater than 1 indicates that COMB performs better than the spectral method, while a ratio smaller than 1 indicates that the spectral method performs better. If the ratio is smaller than 1, its reciprocal characterizes the improvement of the spectral method over COMB in approximating the optimal objective value of NC or qNC.
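The ratio and its summary statistics can be computed as in the following Python sketch. The function names and the returned dictionary keys are hypothetical, and the numbers in the usage note are made up for illustration only; they are not results from the experiments.

```python
from statistics import mean, median

def approximation_ratios(spectral_values, comb_values):
    """Per-image ratio: spectral objective value divided by COMB objective value."""
    return [s / c for s, c in zip(spectral_values, comb_values)]

def summarize(ratios):
    """Mean and median ratio, plus how often each method is strictly better."""
    return {
        "mean": mean(ratios),
        "median": median(ratios),
        "comb_better": sum(r > 1 for r in ratios),      # ratio > 1
        "spectral_better": sum(r < 1 for r in ratios),  # ratio < 1
    }
```

For example, hypothetical objective values of 0.4, 0.1 and 0.3 for the spectral method against 0.2, 0.2 and 0.3 for COMB give ratios 2.0, 0.5 and 1.0: COMB is better on the first image, the spectral method is better on the second by a factor of \(1/0.5 = 2\), and the two tie on the third.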
Comparing approximation quality of SHI and COMB
Comparing instances with exponential similarity weights: comparing SHI-NC-EXP with COMB-NC-EXP and SHI-qNC-EXP with COMB-qNC-EXP
Tables 4 and 5 show the ratios for SHI versus COMB over the 20 benchmark images using exponential similarity weights, with respect to NC and qNC, respectively.
As seen in Table 4, in every case COMB yields smaller NC objective values than SHI, with the mean improvement factor exceeding \(2.3 \times 10^6\). For qNC the results in Table 5 show that COMB not only yields smaller objective values than SHI, but these improvements are dramatically larger than those for NC, with a mean improvement factor exceeding \(9.2 \times 10^{11}\). We conclude that the relative performance of SHI is non-competitive for NC and worse still for qNC. Figure 2 is a bar chart of the ratios in Tables 4 and 5. (Note that the bar chart ratios are given in \(\log \) scale, as are those in the remaining bar charts.) Table 6 displays the mean and median values of the ratios in Tables 4 and 5. They demonstrate the extent of the improvement of COMB over SHI.
Comparing instances with intervening contour similarity weights: comparing SHI-NC-IC with COMB-NC-IC and SHI-qNC-IC with COMB-qNC-IC
For the intervening contour similarity weights, we observe that the graph has nodes of (roughly) equal degrees. The edge weights are nearly all of the same value, close to 1. Hence, a cut separating a small set of nodes consists of fewer edges, and thus has smaller capacity, than cuts that separate sets of roughly equal sizes. Consequently, COMB favors unbalanced cuts with a small number of nodes on one side of the bipartition. This phenomenon is illustrated for Image 8 in Table 7.
The bipartition \((S_8, \bar{S}_8)\) obtained by COMBNCEXP in Table 7a has the background pixels all black and the foreground pixels unmodified. In this particular bipartition the background is the sky. Thus, the similarity weights of edges in the cut \((S_8, \bar{S}_8)\) should be small. We compute the capacity of the cut \((S_8, \bar{S}_8)\), denoted \(C(S_8, \bar{S}_8)\), with respect to exponential weights and intervening contour weights.
For both the exponential and the intervening contour similarity weights, the maximum similarity value is 1. Note, however, that the average intervening contour edge weight in the cut \((S_8, \bar{S}_8)\) is 0.67385345, which is quite close to 1. This is not the case for exponential similarity weights, where the average edge weight in the cut \((S_8, \bar{S}_8)\) is 0.0000018360302. This demonstrates that intervening contour similarity weights are almost uniform and close to 1 throughout the graph.
We now select a single pixel \(v\) (highlighted with the square) in the background (sky), which implies that it is highly similar to its neighbors, and consider the cut \((\{v\}, V{\setminus }\{v\})\). For exponential similarity weights the capacity of this cut, \(C(\{v\}, V{\setminus }\{v\})\), is 3.9985 and therefore substantially higher than the capacity of the cut \((S_8, \bar{S}_8)\). For intervening contour similarity weights, however, the capacity of the cut \((\{v\}, V{\setminus }\{v\})\) is 8, which is far smaller than the capacity of the cut \((S_8, \bar{S}_8)\).
Therefore, intervening contour similarity weights do not work well with algorithms that consider the cut capacity, such as COMB: because the weights are nearly uniform, such algorithms are driven toward unbalanced cuts.
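The effect can be reproduced on a toy instance. The sketch below is an illustration only, not the experimental setup: it assumes a hypothetical \(10 \times 10\) grid with 4-neighbor connectivity, a `uniform` weight function standing in for nearly uniform weights, and a `contrast` weight function standing in for weights that are close to 0 across an object boundary. It compares raw cut capacities only and does not evaluate the ratio objectives that COMB optimizes.

```python
from typing import Callable, Set, Tuple

Pixel = Tuple[int, int]


def cut_capacity(n: int, side: Set[Pixel],
                 weight: Callable[[Pixel, Pixel], float]) -> float:
    """Sum the weights of grid edges with exactly one endpoint in `side`."""
    total = 0.0
    for r in range(n):
        for c in range(n):
            # Looking only right and down visits each undirected edge once.
            for dr, dc in ((0, 1), (1, 0)):
                p, q = (r, c), (r + dr, c + dc)
                if q[0] < n and q[1] < n and ((p in side) != (q in side)):
                    total += weight(p, q)
    return total


n = 10
left_half = {(r, c) for r in range(n) for c in range(n // 2)}  # balanced cut
single = {(4, 2)}                                              # one interior pixel


def uniform(p: Pixel, q: Pixel) -> float:
    # Every edge has weight 1 (stand-in for nearly uniform weights).
    return 1.0


def contrast(p: Pixel, q: Pixel) -> float:
    # Near-zero weight across the boundary between columns 4 and 5, 1 elsewhere.
    return 1e-6 if {p[1], q[1]} == {n // 2 - 1, n // 2} else 1.0


for name, w in (("uniform", uniform), ("contrast", contrast)):
    print(name, cut_capacity(n, single, w), cut_capacity(n, left_half, w))
```

With uniform weights the single-pixel cut (4 edges) is cheaper than the balanced cut (10 edges); with contrast-sensitive weights the ordering is reversed.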
Tables 8 and 9 show the ratios for SHI versus COMB over the 20 benchmark images using intervening contour similarity weights, with respect to NC and qNC objective values, respectively.
For the NC results shown in Table 8, there are 5 images where COMB gives better approximations, while for the remaining 15 images SHI performs better. In most of these 15 cases, COMB simply favors an unbalanced cut.
For qNC, as can be seen from the ratios displayed in Table 9, COMB performs much better than SHI. There are 15 images where COMB gives better approximations to qNC than SHI. Furthermore, for qNC, the improvements of COMB over SHI are very significant. Table 10 shows the average improvements of COMB over SHI with respect to NC and qNC with intervening contour similarity weights. They are the mean and median values of the ratios that are greater than 1 in Table 8, for NC, and Table 9, for qNC, respectively. We also display the average improvements of SHI over COMB for NC and qNC with intervening contour similarity weights in Table 11. For all the ratios in Tables 8 and 9 that are smaller than 1, the corresponding reciprocals characterize the improvements of SHI. We take the mean and median values of these reciprocals from Table 8, for NC, and Table 9, for qNC, respectively, to produce Table 11. We display the ratios of Tables 8 and 9 as a bar chart in Fig. 3. The bars extending to the right (ratios greater than 1) indicate the improvements of COMB over SHI, while the bars extending to the left (ratios smaller than 1) indicate the improvements of SHI over COMB.
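The averaging procedure behind Tables 10 and 11 can be sketched as follows. The function name `summarize` and the list of ratios are hypothetical; the numbers are not taken from Tables 8 or 9.

```python
from statistics import mean, median
from typing import Dict, List, Optional, Tuple


def summarize(ratios: List[float]) -> Dict[str, Optional[Tuple[float, float]]]:
    """Return (mean, median) improvement factors for each method.

    Ratios greater than 1 are improvements of COMB; for ratios smaller
    than 1 the reciprocal is the improvement of the spectral method.
    """
    comb_wins = [r for r in ratios if r > 1]
    spectral_wins = [1 / r for r in ratios if r < 1]

    def stats(values: List[float]) -> Optional[Tuple[float, float]]:
        # None signals that the method never won on this set of images.
        return (mean(values), median(values)) if values else None

    return {
        "COMB over spectral": stats(comb_wins),
        "spectral over COMB": stats(spectral_wins),
    }


# Hypothetical ratios for five images.
print(summarize([4.0, 16.0, 0.5, 0.25, 2.0]))
```

Because the ratios span many orders of magnitude, the mean is dominated by the largest ratios, which is one reason to report the median alongside it.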
Comparing approximation quality of SWEEP and COMB
In general, SWEEP should perform better than just taking the eigenvector with some predetermined procedure for bipartition. Let \(\mathrm{NC}_\mathrm{SWEEP}\) be the normalized cut objective value generated by SWEEP. By the analysis in “A bound on the relation between the spectral method solution and \(\mathrm{NC}_G\)”, it can be shown that \(\mathrm{NC}_\mathrm{SWEEP} \le 4\sqrt{\mathrm{NC}_G}\). Therefore, one may expect that SWEEP will improve on SHI in approximating the optimal objective value of NC. We illustrate the potential improvement of SWEEP over SHI for Image 6, where the gap between the approximation of COMBNCEXP and the approximation of SHINCEXP is the largest among the 20 images, as shown in Table 4. From the original data, the approximate objective value of NC achieved by COMB is \(\mathrm{NC}_\mathrm{COMB} = 3.8129621\times 10^{-11}\). The optimal NC objective value of Image 6, \(\mathrm{NC}_G\), is less than or equal to \(\mathrm{NC}_\mathrm{COMB}\). If we use \(\mathrm{NC}_\mathrm{COMB}\) as an estimate of \(\mathrm{NC}_G\), then we obtain an upper bound for \(\mathrm{NC}_\mathrm{SWEEP}\):
\(\mathrm{NC}_\mathrm{SWEEP} \le 4\sqrt{\mathrm{NC}_G} \le 4\sqrt{\mathrm{NC}_\mathrm{COMB}} = 4\sqrt{3.8129621\times 10^{-11}} \approx 2.4699675 \times 10^{-5}.\)
Our original data show that the objective value of NC achieved by SHI for Image 6 is \(1.7250761\times 10^{-3}\). Since \(\mathrm{NC}_\mathrm{SWEEP}\) cannot exceed the upper bound \(2.4699675 \times 10^{-5}\), the objective value of NC achieved by SWEEP improves on the objective value generated by SHI by a factor of roughly 70 or more. Indeed, the experimental results match the theoretical prediction that SWEEP does better than SHI for NC. Still, COMB is better than SWEEP with exponential similarity weights.
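The arithmetic for Image 6 can be checked with a short script. It takes the reported values as \(3.8129621\times 10^{-11}\) for COMB and \(1.7250761\times 10^{-3}\) for SHI, that is, with negative exponents; this reading is an assumption, but it is the one under which \(4\sqrt{\mathrm{NC}_\mathrm{COMB}}\) reproduces the digits of the stated bound. The variable names are illustrative.

```python
import math

# Reported NC objective values for Image 6 (negative exponents assumed).
nc_comb = 3.8129621e-11
nc_shi = 1.7250761e-3

# NC_SWEEP <= 4 * sqrt(NC_G) <= 4 * sqrt(NC_COMB), because NC_G <= NC_COMB.
sweep_upper_bound = 4 * math.sqrt(nc_comb)

# Guaranteed improvement factor of SWEEP over SHI implied by the bound.
min_improvement = nc_shi / sweep_upper_bound

print(f"upper bound on NC_SWEEP: {sweep_upper_bound:.7e}")
print(f"SWEEP improves on SHI by at least a factor of {min_improvement:.1f}")
```

The guaranteed factor is a lower bound only; the improvement actually achieved by SWEEP may be larger.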
In addition to the improvement of SWEEP over SHI or fixed threshold bipartition of the Fiedler eigenvector, SWEEP can improve on COMB for intervening contour similarity weights. As discussed in “Comparing instances with intervening contour similarity weights: comparing SHINCIC with COMBNCIC and SHIqNCIC with COMBqNCIC”, COMB tends to provide unbalanced bipartitions for intervening contour similarity weight matrices. For SWEEP this is not an issue, because each threshold bipartition is considered, and the best threshold will obviously correspond to a balanced bipartition. Therefore, we expect SWEEP to do better than COMB for intervening contour similarity weights.
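The threshold sweep can be sketched as below. This is a minimal illustration under stated assumptions, not the implementation used in the experiments: the objective is taken to be the normalized cut of Shi and Malik (2000), \(C(S,\bar{S})/d(S) + C(S,\bar{S})/d(\bar{S})\), where \(d(\cdot)\) is the sum of node degrees; the graph is a hypothetical 6-node path; and the vector `scores` is a made-up stand-in for the Fiedler eigenvector rather than a computed one.

```python
from typing import Dict, List, Set, Tuple

Edge = Tuple[int, int]


def normalized_cut(weights: Dict[Edge, float], side: Set[int]) -> float:
    """NC value of the bipartition (side, complement) of an undirected graph.

    Assumes positive weights and that both sides contain a non-isolated node.
    """
    cut = deg_in = deg_out = 0.0
    for (i, j), w in weights.items():
        i_in, j_in = i in side, j in side
        if i_in != j_in:
            cut += w
        # Each edge adds w to the degree of each of its two endpoints.
        deg_in += w * (i_in + j_in)
        deg_out += w * ((not i_in) + (not j_in))
    return cut / deg_in + cut / deg_out


def sweep(weights: Dict[Edge, float], scores: List[float]) -> Tuple[float, Set[int]]:
    """Try every threshold of `scores` and keep the bipartition of smallest NC."""
    order = sorted(range(len(scores)), key=lambda v: scores[v])
    best_value, best_side = float("inf"), set()
    side: Set[int] = set()
    for v in order[:-1]:          # every non-empty proper prefix of the ordering
        side.add(v)
        value = normalized_cut(weights, side)
        if value < best_value:
            best_value, best_side = value, set(side)
    return best_value, best_side


# Hypothetical path graph 0-1-2-3-4-5 with a weak middle edge.
path = {(0, 1): 1.0, (1, 2): 1.0, (2, 3): 0.1, (3, 4): 1.0, (4, 5): 1.0}
scores = [-3.0, -2.0, -1.0, 1.0, 2.0, 3.0]   # stand-in for the Fiedler eigenvector
value, side = sweep(path, scores)
print(value, sorted(side))
```

Recomputing the objective from scratch at each threshold keeps the sketch short; an incremental update of the cut and degree sums would be preferable for image-sized graphs.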
In the following, we display the comparisons of approximation quality of SWEEPNCEXP with COMBNCEXP and SWEEPqNCEXP with COMBqNCEXP.
Tables 12 and 13 show the ratios for SWEEP versus COMB over the 20 benchmark images using exponential similarity weights, with respect to NC and qNC, respectively.
For the NC results shown in Table 12, there are 11 out of the 20 benchmark images where COMB gives a better approximation than SWEEP, and the improvements of COMB over SWEEP are smaller than those over SHI. For the qNC results displayed in Table 13, COMB dominates SWEEP and gives better approximations in every case. The results establish that while SWEEP delivers better results than SHI, COMB is still dominant and gives better results in most cases. We display the mean and median values of the improvements of each method over the other in Tables 14 and 15. Table 14 gives the average improvement of COMB over SWEEP, while Table 15 gives the average improvement of SWEEP over COMB. The mean and median values are obtained from the ratios in Tables 12 and 13 for NC and qNC, respectively, using the same method introduced in “Comparing instances with intervening contour similarity weights: comparing SHINCIC with COMBNCIC and SHIqNCIC with COMBqNCIC”. The ratios in Tables 12 and 13 are displayed as a bar chart in Fig. 4.
Visual segmentation quality evaluation
In this section, we first evaluate the visual segmentation quality of the three methods: COMB, SHI and SWEEP. Then we compare the criteria NC\(^\prime \) and qNC\(^\prime \) to NC and qNC, respectively, to see which criterion gives better visual segmentation results. Since visual quality is subjective, we provide a subjective assessment, which may not agree with the readers’ judgement.
In some of the comparisons made in this section, we select as the output of COMB the cut which gives the visually best segmentation among the four series of nested cuts corresponding to the four pairs of seeds selected according to the automatic seed selection criterion. This visually best cut is often not the numerically best cut, that is, the cut that gives the smallest value for the NC or qNC objective. When the visually best cut is chosen as the output of COMB, we use the experimental set notation COMB(NC\(^\prime \))Similarity or COMB(qNC\(^\prime \))Similarity. Here, (NC\(^\prime \)) or (qNC\(^\prime \)) denotes the optimization objective that COMB actually solves, and the Similarity choice can be either exponential or intervening contour similarity weights. Notice that COMB(NC\(^\prime \))Similarity represents a different experimental set from COMBNCSimilarity introduced in “Algorithm, optimization criterion, and similarity classifications and nomenclatures”, since the former experimental set uses the visually best cut while the latter uses the numerically best cut. The same distinction holds between COMB(qNC\(^\prime \))Similarity and COMBqNCSimilarity.
For SHI and SWEEP, since each of them outputs a unique cut as the solution, there is no distinction between the numerically and visually best cuts. We still use the experimental set notations defined in “Algorithm, optimization criterion, and similarity classifications and nomenclatures” for experimental sets of SHI and SWEEP.
SHI uses a discretization method to generate a bipartition from the Fiedler eigenvector (Yu and Shi 2003) which is considered to give good visual segmentations. Hence, when comparing SHI with COMB, we use the visually best cut as the output of COMB. We conduct the following four comparisons between SHI and COMB:
SHINCEXP and COMB(NC\(^\prime \))EXP
SHIqNCEXP and COMB(qNC\(^\prime \))EXP
SHINCIC and COMB(NC\(^\prime \))IC
SHIqNCIC and COMB(qNC\(^\prime \))IC.
When comparing SWEEP with COMB, we use the numerically best cut as the output of COMB. This is because SWEEP outputs the cut that gives the smallest objective value of NC or qNC among all potential threshold values. Hence, we conduct the following two comparisons between SWEEP and COMB:
SWEEPNCEXP and COMBNCEXP
SWEEPqNCEXP and COMBqNCEXP.
We assess the visual quality of segmentations generated by COMB to compare the performance of different optimization criteria in producing visually good segmentation results. We compare the visual segmentation quality of COMB(NC\(^\prime \))EXP with COMB(qNC\(^\prime \))EXP to determine which criterion, NC\(^\prime \) or qNC\(^\prime \), works better visually. We then compare NC with NC\(^\prime \) and qNC with qNC\(^\prime \) by comparing the visual segmentation quality of the following two pairs of experimental sets:
COMBNCEXP and COMB(NC\(^\prime \))EXP
COMBqNCEXP and COMB(qNC\(^\prime \))EXP.
For each of the above comparisons of two experimental sets, namely experimental set 1 and experimental set 2, we classify each of the 20 benchmark images into the following three categories:

1. Experimental set 1 gives a better visual segmentation result than experimental set 2. This is denoted as \(1 \succ _{\text{ v}} 2\), where the subscript “v” stands for “visual”, here and in the notation below.
2. Experimental set 2 gives a better visual segmentation result than experimental set 1. This is denoted as \(2 \succ _{\text{ v}} 1\).
3. Both experimental set 1 and experimental set 2 give segmentations of similar visual quality. This category includes the cases where the segmentations generated by the two experimental sets are both good and the cases where they are both bad. This is denoted as \(1 \simeq _{\text{ v}} 2\).
For each of the above visual comparisons, we count how many benchmark images belong to each category. The results are summarized in Table 16.
Based on the data in the first six rows of Table 16, we find that with exponential similarity weights, in general, the visual quality of segmentations generated by COMB is superior to both SHI and SWEEP. If the qNC (or qNC\(^\prime \)) optimization objective is applied, the visual superiority of COMB over SHI and SWEEP is dominant. Based on the data in the seventh row of Table 16, we find that qNC\(^\prime \) works better visually than NC\(^\prime \). According to the data in the last two rows of Table 16, we find that the criteria NC or qNC are not good segmentation criteria. Since the visually best segmentations are obtained through solving NC\(^\prime \) or qNC\(^\prime \), they should be preferred segmentation criteria, for good visual segmentation quality and their tractability.
We find from Table 16 that, in general, SHINCIC delivers the best visual segmentations among all the experimental sets using method SHI or SWEEP. That is, SHI works better with intervening contour similarity weights. We also find that COMB(NC\(^\prime \))EXP provides better visual results than COMB(NC\(^\prime \))IC, meaning that COMB works better with exponential similarity weights.
We provide here the images and their segmentations for the two leading methods, SHINCIC and COMB(NC\(^\prime \))EXP. The segmentation is displayed by setting all the pixels in the background part to be black. For each of the 20 benchmark images shown, we give the segmentations generated by SHINCIC and COMB(NC\(^\prime \))EXP in Fig. 5. Notice that for Images 1, 2, 10, 12, 19 and 20, the segmentations generated by SHINCIC are almost entirely black. This is because the segmentations of these images by SHINCIC have almost all pixels in the background parts.
Our judgment is that for Images 1, 2, 5, 10, 12, 13, 14, 15, 16, 17, 19 and 20, COMB(NC\(^\prime \))EXP gives visually better segmentations than SHINCIC; for Images 3 and 6, SHINCIC is visually better than COMB(NC\(^\prime \))EXP; for Images 4, 7, 8 and 18, both SHINCIC and COMB(NC\(^\prime \))EXP generate visually good segmentations of similar quality; for the remaining two images, Images 9 and 11, neither COMB(NC\(^\prime \))EXP nor SHINCIC gives visually good segmentations.
Conclusions
We report here on detailed experiments conducted on algorithms for the normalized cut and its generalization as quantity normalized cut applied to image segmentation problems. We find that, in general, the combinatorial flow algorithm of Hochbaum (2010, 2012) outperforms the spectral method both numerically and visually. In most cases, the combinatorial algorithm yields tighter objective function values of the two criteria we test. Furthermore, we find that the combinatorial algorithm almost always produces a visual segmentation which is at least as good as that of the spectral method, and often better.
Another important finding in our experiments is that, contrary to prevalent belief, the normalized cut criterion is not a good model for image segmentation, since it does not provide solutions of good visual quality. Moreover, the normalized cut problem is NP-hard. We conclude that instead of modeling the image segmentation problem as the normalized cut problem, it is more effective to model and solve the problem as the polynomial time solvable normalized cut\(^\prime \) problem.
For future research, we plan to investigate other methods for solving image segmentation and other clustering problems, such as the k-means clustering method discussed in Dhillon et al. (2004, 2007).
References
Alon N, Milman VD (1985) \(\lambda _1\), isoperimetric inequalities for graphs, and superconcentrators. J Combin Theory Ser B 38(1):73–88
Alon N (1986) Eigenvalues and expanders. Combinatorica 6(2):83–96
Chandran BG, Hochbaum DS (2012) Pseudoflow parametric maximum flow solver version 1.0. http://riot.ieor.berkeley.edu/Applications/Pseudoflow/parametric.html. Retrieved August 2012
Chandran BG, Hochbaum DS (2009) A computational study of the pseudoflow and push-relabel algorithms for the maximum flow problem. Oper Res 57(2):358–376
Cheeger J (1970) A lower bound for the smallest eigenvalue of the Laplacian. In: Gunning RC (ed) Problems in analysis. Princeton University Press, Princeton, pp 195–199
Chung FRK (2007) Four proofs for the Cheeger inequality and graph partition algorithms. In: Proceedings of the international congress of Chinese mathematicians, vol 2
Chung FRK (1997) Spectral graph theory. American Mathematical Society, Providence
Coleman GB, Andrews HC (1979) Image segmentation by clustering. Proc IEEE 67(5):773–785
Cour T, Yu S, Shi J (2011) MATLAB normalized cut image segmentation code. http://www.cis.upenn.edu/~jshi/software/. Retrieved July 2011
Dhawan PA (2003) Medical imaging analysis. Wiley, Hoboken
Dhillon IS, Guan YQ, Kulis B (2004) Kernel k-means: spectral clustering and normalized cuts. In: Proceedings of international conference on knowledge discovery and data mining
Dhillon IS, Guan YQ, Kulis B (2007) Weighted graph cuts without eigenvectors: a multilevel approach. IEEE Trans Pattern Anal Mach Intell 29(11):1944–1957
Donath WE, Hoffman AJ (1973) Lower bounds for the partitioning of graphs. IBM J Res Dev 17:420–425
Fiedler M (1975) A property of eigenvectors of nonnegative symmetric matrices and its applications to graph theory. Czech Math J 25(100):619–633
Gallo G, Grigoriadis MD, Tarjan RE (1989) A fast parametric maximum flow algorithm and applications. SIAM J Comput 18(1):30–55
Hochbaum DS (2010) Polynomial time algorithms for ratio regions and a variant of normalized cut. IEEE Trans Pattern Anal Mach Intell 32(5):889–898
Hochbaum DS (2012) A polynomial time algorithm for Rayleigh ratio on discrete variables: replacing spectral techniques for expander ratio, normalized cut and Cheeger constant. Oper Res (to appear, 2012). Early version: Hochbaum DS (2010) Replacing spectral techniques for expander ratio, normalized cut and conductance by combinatorial flow algorithms. arXiv:1010.4535v1 [math.OC]
Hochbaum DS (2008) The pseudoflow algorithm: a new algorithm for the maximum-flow problem. Oper Res 56(4):992–1009
Hosseini MS, Araabi BN, Soltanian-Zadeh H (2010) Pigment melanin: pattern for Iris recognition. IEEE Trans Instrum Meas 59(4):792–804
Leung T, Malik J (1998) Contour continuity in region based image segmentation. In: Burkhardt H, Neumann B (eds) Proceedings of the fifth European conference on computer vision, Freiburg, vol 1, pp 544–559
Malik J, Belongie S, Leung T, Shi J (2001) Contour and texture analysis for image segmentation. Int J Comput Vis 43(1):7–27
Pappas TN (1992) An adaptive clustering algorithm for image segmentation. IEEE Trans Signal Process 40(4):901–914
Pham DL, Xu CY, Prince JL (2000) Current methods in medical image segmentation. Annu Rev Biomed Eng 2:315–337
Roobottom CA, Mitchell G, Morgan-Hughes G (2010) Radiation-reduction strategies in cardiac computed tomographic angiography. Clin Radiol 65(11):859–867
Shapiro LG, Stockman GC (2001) Computer vision. Prentice-Hall, New Jersey
Sharon E, Galun M, Sharon D, Basri R, Brandt A (2006) Hierarchy and adaptivity in segmenting visual scenes. Nature 442:810–813
Shi J, Malik J (2000) Normalized cuts and image segmentation. IEEE Trans Pattern Anal Mach Intell 22(8):888–905
Tolliver DA, Miller GL (2006) Graph partitioning by spectral rounding: applications in image segmentation and clustering. IEEE conference on computer vision and pattern recognition, pp 1053–1060
Wu Z, Leahy R (1993) An optimal graph theoretic approach to data clustering: theory and its application to image segmentation. IEEE Trans Pattern Anal Mach Intell 15(11):1101–1113
Xing EP, Jordan MI (2003) On semidefinite relaxations for normalized k-cut and connections to spectral clustering. Tech. Report No. UCB/CSD31265, June
Yu SX, Shi J (2003) Multiclass spectral clustering. In: Proceedings of international conference on computer vision, pp 313–319
Acknowledgments
The authors wish to express their thanks to Arnaud CARUSO for his contribution to this project, and in particular, to the development of the automatic seed selection method.
Additional information
Research was supported in part by NSF awards no. DMI-0620677, CMMI-1200592 and CBET-0736232.
Appendix
Benchmark images
Figure 6 contains all 20 benchmark images used in the experiments. We name them sequentially from Image 1 to Image 20.
Cite this article
Hochbaum, D.S., Lyu, C. & Bertelli, E. Evaluating performance of image segmentation criteria and techniques. EURO J Comput Optim 1, 155–180 (2013). https://doi.org/10.1007/s13675-012-0002-8
Keywords
 Image segmentation
 Normalized cut
 Network flow
 Combinatorial algorithm
 Spectral method
Mathematics Subject Classification
 90-08 Computational methods
 90B10 Network models, deterministic
 90C27 Combinatorial optimization