Convex Variational Methods on Graphs for Multiclass Segmentation of High-Dimensional Data and Point Clouds
Abstract
Graph-based variational methods have recently been shown to be highly competitive for various classification problems of high-dimensional data, but are inherently difficult to handle from an optimization perspective. This paper proposes a convex relaxation for a certain set of graph-based multiclass data segmentation models involving a graph total variation term, region homogeneity terms, supervised information and certain constraints or penalty terms acting on the class sizes. Particular applications include semi-supervised classification of high-dimensional data and unsupervised segmentation of unstructured 3D point clouds. Theoretical analysis shows that the convex relaxation closely approximates the original NP-hard problems, and these observations are also confirmed experimentally. An efficient duality-based algorithm is developed that handles all constraints on the labeling function implicitly. Experiments on semi-supervised classification indicate consistently higher accuracies than related nonconvex approaches, and considerably so when the training data are not uniformly distributed among the data set. The accuracies are also highly competitive against a wide range of other established methods on three benchmark data sets. Experiments on 3D point clouds acquired by a LaDAR in outdoor scenes demonstrate that the scenes can accurately be segmented into object classes such as vegetation, the ground plane and human-made structures.
Keywords
Variational methods · Graphical models · Convex optimization · Semi-supervised classification · Point cloud segmentation
1 Introduction
The graphical framework has become a popular setting for classification [8, 25, 92, 100, 101, 102] and filtering [31, 34, 63, 85, 88, 89] of high-dimensional data. Some of the best performing classification algorithms are based on solving variational problems on graphs [3, 10, 14, 16, 17, 25, 39, 44, 45, 66, 68, 82, 86, 101]. In simple terms, these algorithms attempt to group the data points into classes in such a way that pairs of data points with different class memberships are as dissimilar as possible with respect to a certain feature. In order to avoid the computational complexity of working with fully connected graphs, approximations, such as those based on spectral graph theory [10, 38, 66] or nearest neighbors [17, 33, 68], are typically employed. For example, [10] and [66] employ spectral approaches along with the Nyström extension method [36] to efficiently calculate the eigendecomposition of a dense graph Laplacian. Works such as [17, 24, 33, 39, 68, 99] use the ‘nearest neighbor’ approach to sparsify the graph for computational efficiency. Variational problems on graphs have also become popular for the processing of 3D point clouds [30, 33, 45, 55, 60, 62].
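To make the nearest-neighbor sparsification concrete, the following is a minimal sketch of building a symmetric k-nearest-neighbor graph from feature vectors. The brute-force distance computation and the toy two-cluster data are illustrative assumptions, not part of the cited works, which use approximate nearest-neighbor search for large data sets.

```python
import numpy as np

def knn_graph(X, k):
    """Build a symmetric 0/1 k-nearest-neighbor adjacency matrix by brute
    force; practical implementations would use a kd-tree or approximate
    nearest-neighbor search instead."""
    n = X.shape[0]
    # pairwise squared Euclidean distances between all feature vectors
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    np.fill_diagonal(d2, np.inf)          # exclude self-loops
    W = np.zeros((n, n))
    for i in range(n):
        nbrs = np.argsort(d2[i])[:k]      # indices of the k closest points
        W[i, nbrs] = 1.0
    return np.maximum(W, W.T)             # keep an edge if either endpoint selects it

# toy usage: two well-separated clusters of five points each
X = np.vstack([np.zeros((5, 2)), 10 + np.zeros((5, 2))]) + \
    0.01 * np.arange(10)[:, None]
W = knn_graph(X, 3)
```

With this construction, no edges cross between the two clusters, so any graph cut term already separates them cheaply.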
When the classification task is cast as the minimization of the similarity of point pairs with different class memberships, extra information is necessary to avoid the trivial global minimizer of value zero where all points are assigned to the same class. In semi-supervised classification methods, a small set of the data points is given as training data in advance, and their class memberships are imposed as hard constraints in the optimization problem. In unsupervised classification methods, one typically enforces that the sizes of the classes do not deviate too far from each other, examples including the normalized cut [78] and Cheeger ratio cut problems [26].
Most of the computational methods for semi-supervised and unsupervised classification obtain the solution by computing a local minimizer of a nonconvex energy functional. Examples of such algorithms are those based on phase fields [10] and the MBO scheme [38, 65, 66, 67]. PDEs on graphs for semi-supervised classification also include the eikonal equation [30] and tug-of-war games related to the infinity Laplacian equation [34]. Unsupervised problems with class size constraints are inherently the most difficult to handle from an optimization viewpoint, as the convex envelope of the problem has a trivial constant function as a minimizer [16, 78, 82]. Various ways of simplifying the energy landscape have been proposed [17, 43, 89]. Our recent work [68] showed that semi-supervised classification problems with two classes could be formulated in a completely convex framework and also presented efficient algorithms that could obtain global minimizers.
Image segmentation is a special classification problem where the objective is to assign each pixel to a region. Algorithms based on energy minimization are among the most successful image segmentation methods, and they have historically been divided into ‘region-based’ and ‘contour-based.’
Region-based methods attempt to find a partition of the image so that the pixels within each region as a whole are as similar as possible. Additionally, some regularity is imposed on the region boundaries to favor spatial grouping of the pixels. The similarity is usually measured in the statistical sense. In the simplest case, the pixels within each region should be similar to the mean intensity of each region, as proposed in the Chan–Vese [23] and Mumford–Shah [71] models. Contour-based methods [50, 93] instead seek the best suited locations of the region boundaries, typically at locations of large jumps in the image intensities, indicating the interface between two objects.
More recently, it has been shown that the combination of region- and contour-based terms in the energy function can give qualitatively very good results [15, 39, 48], especially when nonlocal operators are used in the contour terms [30, 39, 48]. There now exist efficient algorithms for solving the resulting optimization problems that can avoid getting stuck in a local minimum, including both combinatorial optimization algorithms [12, 13, 53] and more recent convex continuous optimization algorithms [6, 7, 15, 20, 58, 75, 96, 97, 98]. The latter have been shown to be advantageous in several aspects, such as requiring less memory and having a greater potential for parallel implementation on graphics processing units (GPUs), but special care is needed in the case of nonlocal variants of the differential operators (e.g., [76]).
Most of the current data segmentation methods [10, 14, 16, 17, 44, 66, 68, 82] can be viewed as ‘contour-based,’ since they seek an optimal location of the boundaries of each region. Region-based variational image segmentation models with two classes were generalized to graphs for data segmentation in [59] and for 3D point cloud segmentation in [59, 60, 84] in a convex framework. The region terms could be constructed directly from the point geometry and/or be constructed from a color vector defined at the points. Concrete examples of the latter were used for experiments on point cloud segmentation. Region terms have also been proposed in the context of Markov Random Fields for 3D point cloud segmentation [2, 72, 87], where the weights were learned from training data using associative Markov networks. The independent preprint [94] proposed to use region terms for multiclass semi-supervised classification in a convex manner, where the region terms were inferred from the supervised points by diffusion.
Contributions This paper proposes a convex relaxation and an efficient algorithmic optimization framework for a general set of graph-based data classification problems that exhibit nontrivial global minimizers. It extends the convex approach for semi-supervised classification with two classes given in our previous work [68] to a much broader range of problems, including multiple classes, novel and more practically useful incorporation of class size information, and a novel unsupervised segmentation model for 3D point clouds acquired by a LaDAR.
The same basic relaxation for semi-supervised classification also appeared in the independent preprint [94]. The main distinctions of this work compared to the preprint [94] are: we also incorporate class size information in the convex framework; we give a mathematical and experimental analysis of the close relation between the convex relaxed and original problems; we propose a different duality-based ‘max-flow’-inspired algorithm; we incorporate information of the supervised points in a different way; and we consider unsupervised segmentation of 3D point clouds.

We specify a general set of classification problems that are suitable for being approximated in a convex manner. The general set of problems involves minimization of a multiclass graph cut term together with supervised constraints, region homogeneity terms and novel constraints or penalty terms acting on the class sizes. Special cases include semi-supervised classification of high-dimensional data and unsupervised segmentation of 3D point clouds.

A convex relaxation is proposed for the general set of problems, and its approximation properties are analyzed thoroughly in theory and experiments. This extends the work on multiregion image segmentation [6, 58, 98] to data clustering on graphs and to cases where there are constraints or penalty terms acting on the class sizes. Since the introduction of either multiple classes or size constraints causes the general problem to become NP-hard, the relaxation can (probably) not be proved to be exact. Instead, conditions are derived for when an exact global minimizer can be obtained from a dual solution of the relaxed problem. The strongest conditions are derived in the case of no constraints on the class sizes, but the theoretical results in both cases show that very close approximations are expected. These theoretical results also agree well with experimental observations.

The convex relaxed problems are formulated as equivalent dual problems that are structurally similar to the ‘max-flow’ problem over the graph. This extends our work [68] to multiple classes and the work on image segmentation proposed in [95] to data clustering on graphs. We use a conceptually different proof than [68, 95], which relates ‘max-flow’ to another more direct dual formulation of the problem. Furthermore, it is shown that the size constraints and penalty term can also be incorporated naturally in the max-flow problem by modifying the flow conservation condition, such that there should be a constant flow excess at each node.

As in our previous work [68, 95], an augmented Lagrangian algorithm is developed based on the new ‘max-flow’ dual formulations of the problems. A key advantage compared to related primal–dual algorithms [21] in imaging, such as the one considered in the preprint [94], is that all constraints on the labeling function are handled implicitly. This includes constraints on the class sizes, which are dealt with by separate dual variables indicating the flow excess at the nodes. Consequently, projections onto the constraint set of the labeling function, which tend to decrease the accuracy and put strict restrictions on the step sizes, are avoided.

We propose an unsupervised segmentation model for unstructured 3D point clouds acquired by a LaDAR within the general framework. It extends the models of [59, 60, 84] to multiple classes and gives concrete examples of region terms constructed purely based on geometrical information of the unlabeled points, in order to distinguish classes such as vegetation, the ground plane and human-made structures in an outdoor scene. We also propose a graph total variation term that favors alignment of the region boundaries along ‘edges’ indicated by discontinuities in the normal vectors or the depth coordinate. In contrast to [2, 41, 72, 87], our model does not rely on any training data.

Extensive experimental evaluations on semi-supervised classification indicate consistently higher accuracies than related local minimization approaches, and considerably so when the training data are not uniformly distributed among the data set. The accuracies are also highly competitive against a wide range of other established methods on three benchmark data sets. The accuracies can be improved further if an estimate of the approximate class sizes is given in advance. Experiments on 3D point clouds acquired by a LaDAR in outdoor scenes demonstrate that the scenes can accurately be segmented into object classes such as vegetation, the ground plane and regular structures. The experiments also demonstrate fast and highly accurate convergence of the algorithms, and show that the approximation difference between the convex and original problems vanishes or becomes extremely low in practice.
Organization This paper starts by formulating the general set of problems mathematically in Sect. 2. Section 3 formulates a convex relaxation of the general problem and analyzes the quality of the relaxation from a dual perspective. Section 4 reformulates the dual problem as a ‘max-flow’ type of problem and derives an efficient algorithm. Applications to semi-supervised classification of high-dimensional data are presented in Sect. 5.1, and applications to segmentation of unstructured 3D point clouds are described in Sect. 5.2, including specific constructions of each term in the general model. Section 5 also presents a detailed experimental evaluation for both applications (Fig. 1).
2 Data Segmentation as Energy Minimization Over a Graph
2.1 Size Constraints and Supervised Constraints
The energy functions (6) are highly nonconvex, but ways to simplify the energy landscape have been proposed [16, 17, 44, 82] in order to reduce the number of local minima.
2.2 New Flexible Constraint and Penalty Term on Class Sizes
2.3 Region Homogeneity Terms
The independent preprint [94] proposed to use region terms in the energy function for semi-supervised classification; there, the authors proposed a region term that was inferred from the supervised points by diffusion. In contrast, the region terms in this work do not rely on any supervised points, but are, as mentioned, only specified and demonstrated for the application of 3D point cloud segmentation.
3 Convex Relaxation of Minimization Problem and Analysis Based on Duality
In this section, the classification problems are formulated as optimization problems in terms of binary functions instead of sets. The binary representations are used to derive convex relaxations. First, some essential mathematical concepts are introduced, such as various differential operators on graphs. These concepts are used extensively to formulate the binary and convex problems and the algorithms.
3.1 Differential Operators on Graphs
Our definitions of operators on graphs are based on the theory in [33, 42, 91]. More information is found in these papers.
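As a concrete illustration of such operators, the following sketch implements a weighted graph gradient together with a divergence defined as its negative adjoint, and checks the adjointness numerically. The square-root weighting convention is one common choice from the cited literature and is an assumption here; the papers referenced above give the precise definitions used in this work.

```python
import numpy as np

def grad_w(u, W):
    """Weighted graph gradient: (grad_w u)(x, y) = sqrt(W[x, y]) * (u[y] - u[x]).
    The sqrt convention is one common choice; the definitions in the papers
    cited in the text are closely related."""
    return np.sqrt(W) * (u[None, :] - u[:, None])

def div_w(F, W):
    """Weighted graph divergence, defined as the negative adjoint of grad_w,
    so that <grad_w u, F> = -<u, div_w F> for all u and F."""
    return (np.sqrt(W) * (F - F.T)).sum(axis=1)

# numerical adjointness check on random data
rng = np.random.default_rng(0)
n = 6
W = rng.random((n, n)); W = (W + W.T) / 2; np.fill_diagonal(W, 0)
u = rng.standard_normal(n)
F = rng.standard_normal((n, n))
lhs = (grad_w(u, W) * F).sum()      # <grad_w u, F>
rhs = -(u * div_w(F, W)).sum()      # -<u, div_w F>
```

The adjoint relation is exactly what makes the primal and dual formulations in this section interchangeable.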
3.2 Binary Formulation of Energy Minimization Problem
If the number of supervised points is very low and there is no additional region term, the global minimizer of (20) may become the trivial solution where, for one of the classes, say k, \(u_k(x) = 1\) everywhere, and for each other class i, \(u_i(x) = 1\) at the supervised points of class i and 0 elsewhere. This tends to occur when less than around \(2.5 \%\) of the points are supervised. As in our previous work [68], this problem can be countered by increasing the number of edges incident to supervised points in comparison with other points. Doing so will increase the cost of the trivial solution without significantly influencing the desired global minimizer. An alternative, proposed in the preprint [94], is to create region terms in a preprocessing step by diffusing information of the supervised points into their neighbors.
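One simple way to realize this idea in code is sketched below. The text increases the number of edges incident to supervised points; here we sketch the closely related alternative of scaling the weights of the incident edges, with `factor` a hypothetical choice rather than a value from the paper. Either way, the cut cost of the trivial labeling rises at the supervised points.

```python
import numpy as np

def emphasize_supervised(W, sup_idx, factor=2.0):
    """Scale up all edge weights incident to supervised nodes. This is a
    sketch of one way to penalize the trivial labeling; 'factor' is a
    hypothetical parameter, not taken from the paper."""
    W = W.copy()
    W[sup_idx, :] *= factor
    W[:, sup_idx] *= factor
    # edges between two supervised nodes were scaled twice; undo one factor
    ix = np.ix_(sup_idx, sup_idx)
    W[ix] /= factor
    return W

# usage on a small complete graph with node 0 supervised
W0 = np.ones((4, 4)) - np.eye(4)
W1 = emphasize_supervised(W0, [0], factor=2.0)
```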
3.3 Convex Relaxation of Energy Minimization Problem
In this paper, we are interested in using the convex relaxation (25) to solve the original problem approximately. Under certain conditions, the convex relaxation gives an exact global minimizer of the original problem. For instance, it can be straightforwardly shown that
Proposition 1
Let \(u^*\) be a solution of the relaxed problem (25), with optional size constraints (7) or penalty term (8). If \(u^* \in \mathcal {B}\), then \(u^*\) is a global minimizer of the original nonconvex problem (20).
Proof
Let \(E^P(u)\) be the energy function defined in (25) with or without the size penalty term (8). Since \(\mathcal {B} \subset \mathcal {B}'\), it follows that \(\min _{u \in \mathcal {B}'} E^P(u) \le \min _{u \in \mathcal {B}} E^P(u)\). Therefore, if \(u^* = \hbox {arg min}_{u \in \mathcal {B}'} E^P(u)\) and \(u^* \in \mathcal {B}\), it follows that \(E^P(u^*) = \min _{u \in \mathcal {B}} E^P(u)\). The size constraints (7) can be regarded as a special case by choosing \(\gamma = \infty \). \(\square \)
3.4 Analysis of Convex Relaxation Through a Dual Formulation
We will now derive theoretical results which indicate that the multiclass problem (20) is closely approximated by the convex relaxation (25). The following results extend those given in [6] from image domains to graphs. In contrast to [6], we also incorporate size constraints or penalty terms in the analysis. In fact, the strongest results given near the end of the section are only valid for problems without such size constraints/terms. This observation agrees well with our experiments, although in both cases very close approximations are obtained.
We start by deriving an equivalent dual formulation of (25). Note that this dual problem is different from the ‘max-flow’ type dual problem on graphs proposed in our previous work [68] in the case of two classes. Its main purpose is theoretical analysis, not algorithmic development. In fact, its relation to flow maximization will be the subject of the next section. Dual formulations on graphs have also been proposed in [45] for variational multiscale decomposition of graph signals.
Theorem 1
Proof
Assuming a solution of the dual problem \(q^*,{\rho ^1}^*, {\rho ^2}^*\) has been obtained, the following theorem characterizes the corresponding primal variable \(u^*\):
Theorem 2
Proof
If the minimizer \(I_m(x)\) is unique, it follows directly from (39) that \(u^*_i(x)\) must be the indicator vector (40).
If the minimizer \(I_m(x)\) is unique at every point \(x \in V\), then the corresponding primal solution \(u^*\) given by (40) is contained in the binary set \(\mathcal {B}\). By Proposition 1, \(u^*\) is a global minimizer of (20). \(\square \)
Theorem 3
Assume that \(q^*\) is a maximizer of the dual problem (27) with \(\gamma = 0\), i.e., no class size constraints. If (38) has at most two minimal components for all \(x \in V\), then there exists a corresponding binary primal solution to the convex relaxed primal problem (25), which is a global minimizer of the original nonconvex problem (20).
A constructive proof of Theorem 3 is given in “Appendix.”
If the vector \((C(x) + {{\mathrm{div}}}_w q^{*}(x) + {\rho ^2}^* - {\rho ^1}^*)\) has three or more minimal components, it cannot in general be expected that a corresponding binary primal solution exists, reflecting that one can probably not obtain an exact solution to the NP-hard problem (20) in general by a convex relaxation. Experiments indicate that this very rarely, if ever, happens in practice for the classification problem (20).
As an alternative thresholding scheme, \(u^T\) can be selected based on the formula (40) after a dual solution to the convex relaxation has been obtained. If there are multiple minimal components of the vector \((C + {{\mathrm{div}}}_w q^*)(x)\), one can select \(u^T(x)\) to be one for an arbitrary one of those indices, just as for the ordinary thresholding scheme (26). Experiments will demonstrate and compare both schemes in Sect. 5.
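Both thresholding schemes are one-liners in practice. The sketch below assumes the ordinary scheme picks the largest relaxed label component per node and the dual-based scheme picks a minimal component of \(C + \mathrm{div}_w q^*\); the exact formulas (26) and (40) are in the paper, so treat this as an illustrative reading of them.

```python
import numpy as np

def threshold_primal(u):
    """Ordinary thresholding: assign each node to the class with the
    largest relaxed label value (ties broken arbitrarily by argmax)."""
    out = np.zeros_like(u)
    out[np.arange(u.shape[0]), u.argmax(axis=1)] = 1.0
    return out

def threshold_dual(C_plus_div):
    """Dual-based scheme: at each node, pick (one of) the minimal
    components of C + div_w q*, in the spirit of formula (40)."""
    out = np.zeros_like(C_plus_div)
    out[np.arange(C_plus_div.shape[0]), C_plus_div.argmin(axis=1)] = 1.0
    return out

# usage: rows are nodes, columns are classes
U = threshold_primal(np.array([[0.6, 0.4], [0.2, 0.8]]))
D = threshold_dual(np.array([[1.0, 3.0], [5.0, 2.0]]))
```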
4 ‘Max-Flow’ Formulation of Dual Problem and Algorithm
A drawback of the dual model (27) is the nonsmoothness of the objective function, which is also a drawback of the original primal formulation of the convex relaxation. This section reformulates the dual model in a structurally similar way to a max-flow problem, which is smooth and facilitates the development of a very efficient algorithm based on the augmented Lagrangian theory.
The resulting dual problem can be seen as a multiclass variant of the max-flow model proposed in our work [68] for two classes, and a graph analogue of the max-flow model given for image domains in [95]. Note that our derivations differ conceptually from [68, 95], because we directly utilize the dual problem derived in the last section. Furthermore, the new flexible size constraint (7) and penalty term (8) are incorporated naturally in the max-flow problem by a modified flow conservation condition, which indicates that there should be a constant flow excess at each node. The amount of flow excess is expressed with a few additional optimization variables in the algorithm, and they can be optimized over with very little additional computational cost.
4.1 ‘Max-Flow’ Reformulation of the Dual Problem
We now derive alternative dual and primal–dual formulations of the convex relaxed problem that are more beneficial for computations. The algorithm will be presented in the next section.
Proposition 2
Proof
Problem (41) with constraints (42)–(45) is structurally similar to a max-flow problem over n copies of the graph G, \((V_1,E_1) \times \cdots \times (V_n,E_n)\), where \((V_i,E_i) = G\) for \(i \in I\). The aim of the max-flow problem is to maximize the flow from a source vertex to a sink vertex subject to flow capacities on each edge and flow conservation at each node. The variable \(p_s(x)\) can be regarded as the flow on the edges from the source to the vertex x in each of the subgraphs \((V_1,E_1), \ldots , (V_n,E_n)\), which have unbounded capacities. The variables \(p_i(x)\) and \(C_i(x)\) can be regarded as the flow and capacity on the edge from vertex x in the subgraph \((V_i,E_i)\) to the sink. Constraint (47) is the flow conservation condition. Observe that in the case of size constraints/terms, instead of being conserved, there should be a constant excess flow \(\rho _i^1 - \rho _i^2\) for each node in the subgraph \((V_i,E_i)\). The objective function (41) is a measure of the total amount of flow in the graph.
Utilizing results from Sect. 3.4, we now show that the convex relaxation (25) is the equivalent dual problem to the max-flow problem (41).
Theorem 4
 (1)
 (2)The primal–dual problem:
$$\begin{aligned}&\min _{u} \sup _{p_s, p, q,\rho ^1,\rho ^2} \left\{ E(p_s, p, q, \rho ^1,\rho ^2; u) \right. \nonumber \\&\quad = \sum _{x \in V} p_s(x) + \sum _{i=1}^n \left( \rho _i^1 S^\ell _i - \rho _i^2 S^u_i \right) \nonumber \\&\quad \left. + \sum _{i=1}^n \sum _{x \in V} u_i \left( {{\mathrm{div}}}_w q_i - p_s + p_i + \rho _i^2 - \rho _i^1 \right) (x) \right\} \end{aligned}$$
(48)
subject to (42), (43) and (45), where u is the relaxed region indicator function.
 (3)
The convex relaxed problem (25) with size constraint (7) if \(\gamma = \infty \), size penalty term (8) if \(0< \gamma < \infty \) and no size constraints if \(\gamma = 0\).
Proof
The equivalence between the primal–dual problem (48) and the max-flow problem (41) follows directly, as \(u_i\) is an unconstrained Lagrange multiplier for the flow conservation constraint (47). Existence of the Lagrange multipliers follows since: (1) (41) is upper bounded, since it is equivalent to (27), which by Theorem 2 admits a solution, and (2) the constraints (44) are linear and hence differentiable.
The equivalence between the primal–dual problem (48), the max-flow problem (41) and the convex relaxed problem (25) now follows: By Proposition 2, the ‘max-flow’ problem (41) is equivalent to the dual problem (27). By Theorem 1, the dual problem (27) is equivalent to the convex relaxed problem (25) with size constraints (7) if \(\gamma = \infty \), size penalty term (8) if \(0< \gamma < \infty \) and no size constraints if \(\gamma = 0\). \(\square \)
4.2 Augmented Lagrangian Maxflow Algorithm
This section derives an efficient algorithm, which exploits the fact that all constraints on u are handled implicitly in the primal–dual problem (48). The algorithm is based on the augmented Lagrangian theory, where u is updated as a Lagrange multiplier by a gradient descent step in each iteration. Since no subsequent projection of u is necessary, the algorithm tolerates a wide range of step sizes and converges with high accuracy. The advantages of related ‘max-flow’ algorithms for ordinary 2D imaging problems over, e.g., Arrow–Hurwicz type primal–dual algorithms have been demonstrated in [5, 97].
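The multiplier update described above can be sketched as follows. This is a schematic reading of a single step, not the paper's exact algorithm: the flow variables \(p_s, p, q\) (and \(\rho^1, \rho^2\)) are assumed to have been updated earlier in the iteration, and u then descends along the residual of the flow conservation constraint (47).

```python
import numpy as np

def multiplier_update(u, div_q, p_s, p, rho1, rho2, c):
    """Gradient-descent update of the Lagrange multiplier u for the flow
    conservation constraint: u_i <- u_i - c * r_i, where r_i is the
    residual div_w q_i - p_s + p_i + rho2_i - rho1_i at each node.
    Schematic sketch; the full algorithm also updates p_s, p, q, rho."""
    r = div_q - p_s[:, None] + p + (rho2 - rho1)[None, :]
    return u - c * r, r

# usage: 2 nodes, 3 classes, flows already satisfying conservation
u = np.array([[0.5, 0.2, 0.3], [0.1, 0.6, 0.3]])
u_new, r = multiplier_update(
    u,
    div_q=np.zeros((2, 3)),
    p_s=np.ones(2),
    p=np.ones((2, 3)),
    rho1=np.zeros(3),
    rho2=np.zeros(3),
    c=0.1,
)
```

When the residual vanishes, u is stationary, which is the implicit handling of the labeling constraints referred to in the text: no projection of u is ever needed.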
5 Applications and Experiments
We now focus on specific applications of the convex framework. Experimental results on semi-supervised classification of high-dimensional data are presented in Sect. 5.1. Section 5.2 proposes specific terms in the general model (9) for segmentation of unstructured 3D point clouds and presents experimental results on LaDAR data acquired in outdoor scenes. In both cases, we give a thorough examination of the accuracy of the results, the tightness of the convex relaxations and the convergence properties of the algorithms.
5.1 Semi-Supervised Classification Results
In this section, we describe the semi-supervised classification results, using the algorithm with and without the size constraints (5), (7) or the penalty term (8).
All experiments were performed on a 2.4 GHz Intel Core 2 Quad CPU. We initialize \(C_i(x)\) to a constant (in our case, the constant is set to 500) if x is a supervised point of any class but class i, and to 0 otherwise, for all \(i \in I\). The variables u, \(q_i\), \(\rho _i^1\), \(\rho _i^2\) are initialized to zero for all \(i \in I\). The variable \(p_s\) is initialized to \(C_n\), where n is the number of classes. We set \(p_i=p_s\) for all \(i \in I\).
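The capacity initialization described above can be sketched directly. The encoding of supervised labels (`-1` for unsupervised points) is a hypothetical convention for this illustration; the constant 500 is the value stated in the text.

```python
import numpy as np

def initialize_capacities(sup_labels, n_classes, const=500.0):
    """C_i(x) = const if x is a supervised point of any class other than i,
    and 0 otherwise, as described in the text. sup_labels[x] is the class
    of supervised point x, or -1 if x is unsupervised (our own convention)."""
    N = len(sup_labels)
    C = np.zeros((N, n_classes))
    for x, lbl in enumerate(sup_labels):
        if lbl >= 0:              # x is a supervised point
            C[x, :] = const       # large cost for every wrong class...
            C[x, lbl] = 0.0       # ...and no cost for its own class
    return C

# usage: 3 nodes, node 0 supervised as class 0, node 2 as class 1
C = initialize_capacities([0, -1, 1], n_classes=2)
```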
In the following, we give details about the setup and results for each data set, before we draw some general conclusions in the end.
5.1.1 MNIST
We construct the graph as follows: each image is a node on the graph, described by the feature vector of 784 pixel intensity values in the image. These feature vectors are used to compute the weights for pairs of nodes. The weight matrix is computed using the Zelnik–Manor and Perona weight function (64) with local scaling using the 8th closest neighbor. We note that preprocessing of the data is not needed to obtain an accurate classification, and we do not perform any. The parameter c used was 0.05.
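For concreteness, a dense sketch of the self-tuning weight function referenced above is given below, assuming the standard Zelnik–Manor and Perona form \(w(x,y)=\exp(-d(x,y)^2/(\tau(x)\tau(y)))\) with \(\tau(x)\) the distance from x to its m-th closest neighbor; the paper's formula (64) should be consulted for the exact variant used.

```python
import numpy as np

def zelnik_manor_perona(X, m=8):
    """Self-tuning weights w(x, y) = exp(-d(x, y)^2 / (tau(x) tau(y))),
    with local scaling tau(x) = distance from x to its m-th closest
    neighbor (m = 8 for MNIST in the text). Dense version for illustration;
    in practice the matrix is sparsified via nearest neighbors."""
    d = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(-1))
    # column 0 of each sorted row is the point itself (distance 0), so
    # column m is the distance to the m-th closest other point
    tau = np.sort(d, axis=1)[:, m]
    W = np.exp(-d ** 2 / np.outer(tau, tau))
    np.fill_diagonal(W, 0.0)
    return W

# usage on a few random 2D points with m = 2
rng = np.random.default_rng(1)
W = zelnik_manor_perona(rng.standard_normal((6, 2)), m=2)
```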
The average accuracy results over 100 different runs with randomly chosen supervised points are shown in Table 1 in the case of no size constraints. We note that the new approaches reach consistently higher accuracies and lower energies than related local minimization approaches, and that incorporation of size information can improve the accuracies further. The computation times are highly efficient, but not quite as fast as MBO, which only uses 10 iterations to solve the problem in an approximate manner. The \(\text {Log}_{10}\) plots of the binary difference versus iteration, depicted in Fig. 7, show that the binary difference converges to an extremely small number.
The results of the data set are visualized in Fig. 4. For the visualization procedure, we use the first and the sixth eigenvector of the graph Laplacian. The dimension of each of the eigenvectors is \(N \times 1\), and each node of the data set is associated with a value of each of the vectors. One way to visualize a classification of a data set such as MNIST, which consists of a collection of images, is to plot the values of one eigenvector of the graph Laplacian versus another and use colors to differentiate classes in a given segmentation. In this case, the plots in Fig. 4 graph the values of the first versus the sixth eigenvector (of the graph Laplacian) relating to the nodes of classes 4 and 9 only. The blue and red regions represent nodes of classes 4 and 9, respectively. The green region represents misclassified points.
Moreover, we compare our results to those of other methods in Table 1, where our method’s name is written in bold. Note that algorithms such as linear and nonlinear classifiers, boosted stumps, support vector machines and both neural and convolutional nets are all supervised learning approaches, which use around 60,000 of the images as a training set (\(86\%\) of the data) and 10,000 images for testing. However, we use only \(3.57\%\) (or less) of our data as supervised training points, and obtain classification results that are either competitive with or better than those of some of the best methods. Moreover, note that no preprocessing was performed on the data, as was needed for some of the methods we compare with; we worked with the raw data directly.
5.1.2 Three Moons Data Set
We created a synthetic data set, called the three moons data set, to test our method. The set is constructed as follows. First, consider three half circles in \(\mathbb {R}^2\): the top halves of two unit circles with centers at (0, 0) and (3, 0), and the bottom half of a circle with radius 1.5 and center at (1.5, 0.4). A thousand points from each of the three half circles are sampled and embedded in \(\mathbb {R}^{100}\) by adding Gaussian noise with standard deviation 0.14 to each of the 100 components of each embedded point. The goal is to segment the circles, using a small number of supervised points from each class. Thus, this is a 3-class segmentation problem. The noise and the fact that the points are embedded in high-dimensional space make this difficult.
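The construction above can be sketched as a small generator. The uniform sampling of arc angles and the fixed seed are illustrative assumptions; the text only specifies the geometry, the sample count and the noise level.

```python
import numpy as np

def three_moons(n_per=1000, dim=100, sigma=0.14, seed=0):
    """Generate the three moons data described above: two top unit
    half-circles centered at (0, 0) and (3, 0), and one bottom half-circle
    of radius 1.5 centered at (1.5, 0.4), each sampled at n_per points,
    embedded in R^dim with N(0, sigma^2) noise added to every component."""
    rng = np.random.default_rng(seed)
    t = rng.uniform(0, np.pi, size=(3, n_per))   # arc angles (assumption)
    arcs = [
        np.stack([np.cos(t[0]), np.sin(t[0])], axis=1),                       # top, center (0, 0)
        np.stack([3 + np.cos(t[1]), np.sin(t[1])], axis=1),                   # top, center (3, 0)
        np.stack([1.5 + 1.5 * np.cos(t[2]), 0.4 - 1.5 * np.sin(t[2])], axis=1),  # bottom half
    ]
    X = np.zeros((3 * n_per, dim))
    X[:, :2] = np.vstack(arcs)                   # embed the 2D arcs in R^dim
    X += sigma * rng.standard_normal(X.shape)    # noise on all dim components
    y = np.repeat(np.arange(3), n_per)
    return X, y

# usage: a small instance for testing
X, y = three_moons(n_per=10, dim=20, sigma=0.1)
```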
5.1.3 COIL
We evaluated our performance on the benchmark COIL data set [25, 73] from the Columbia University Image Library. This is a set of color \(128 \times 128\) images of 100 objects, taken at different angles. The red channel of each image was downsampled to \(16 \times 16\) pixels by averaging over blocks of \(8 \times 8\) pixels. Then, 24 of the objects were randomly selected and then partitioned into six classes. Discarding 38 images from each class leaves 250 per class, giving a data set of 1500 data points and 6 classes.
We construct the graph as follows: each image is a node on the graph. We apply PCA to project each image onto 241 principal components; these components form the feature vectors. The vectors are used to calculate the distance component of the weight function. The weight matrix is computed using the Zelnik–Manor and Perona weight function (64) with local scaling using the 4th nearest neighbor. The parameter c used was 0.03.
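The COIL preprocessing described in these two paragraphs (8×8 block averaging of the red channel, then PCA) can be sketched as follows. The SVD-based PCA is a generic sketch under the usual mean-centering assumption, not the authors' exact pipeline.

```python
import numpy as np

def downsample_red(img_rgb):
    """Average the red channel of a 128x128 RGB image over 8x8 blocks,
    giving a 16x16 image, as described in the text."""
    red = img_rgb[:, :, 0].astype(float)
    return red.reshape(16, 8, 16, 8).mean(axis=(1, 3))

def pca_features(X, n_components=241):
    """Project mean-centered rows of X onto their leading principal
    components via SVD (a generic PCA sketch)."""
    Xc = X - X.mean(axis=0)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T

# usage: a constant red image downsamples to a constant 16x16 block
img = np.zeros((128, 128, 3))
img[:, :, 0] = 7.0
small = downsample_red(img)
rng = np.random.default_rng(2)
feats = pca_features(rng.standard_normal((20, 50)), n_components=5)
```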
Resulting accuracies are shown in Table 1, indicating that our method outperforms local minimization approaches and is comparable to or better than some of the other best existing methods. The results of the data set are visualized in Fig. 6; the procedure used is similar to that of the MNIST data set visualization procedure. The plots in the figure graph the values of the first versus the third eigenvector of the graph Laplacian. The results of the classification are labeled by different colors.
5.1.4 Landsat Satellite Data Set
We also evaluated our performance on the Landsat Satellite data set, obtained from the UCI Machine Learning Repository [4]. This is a hyperspectral data set composed of the multispectral values of pixels in \(3 \times 3\) neighborhoods of a satellite image; the portions of the electromagnetic spectrum covered include the near-infrared. The goal is to predict the classification of the central pixel in each element of the data set. The six classes are red soil, cotton crop, gray soil, damp gray soil, soil with vegetation stubble and very damp gray soil. There are 6435 nodes in the data set.
We construct the graph as follows. The UCI Web site provides a 36dimensional feature vector for each node. The feature vectors are used to calculate the distance component of the weight function. The weight matrix is computed using the Zelnik–Manor and Perona weight function (64) with local scaling using the 4th nearest neighbor. The parameter c used was 0.3.
Table 1 includes a comparison of our method to some of the best methods (most cited in [70]). One can see that our results are of higher accuracy. We note that, except for the GL and MBO algorithms, all other algorithms we compare the Landsat Satellite data to are supervised learning methods, which use 80% of the data for training and 20% for testing. Our method was able to outperform these algorithms while using a very small percentage of the data set (10%) as supervised points. Even with 5.6% supervised points, it outperforms all but one of the aforementioned methods.
5.1.5 Nonuniform Distribution of Supervised Points
In all previous experiments, the supervised points have been sampled randomly from all the data points. To test the algorithms in more challenging scenarios, we introduce some bias in the sampling of the supervised points, which is also a more realistic situation in practice. We used two different data sets for this test: the MNIST data set and the COIL data set.
In the case of the MNIST data set, we chose the supervised points nonrandomly for digits 4 and 9 only. To obtain the nonrandomness, we allowed a point to be chosen as supervised only if its value of the second eigenvector of the graph Laplacian fell within a particular range. This resulted in a biased distribution of the supervised points. The results for this experiment were the following: for the max-flow algorithm, the overall accuracy was \(97.734\%\), while for digits 4 and 9, it was \(96.85\%\). For comparison, the nonconvex MBO algorithm [38] gave an accuracy of \(95.60\%\) overall, but \(89.71\%\) for digits 4 and 9. The MBO method was also somewhat less stable in its accuracy with respect to different distributions of the supervised points. The max-flow algorithm was very stable, with a very small standard deviation over a set of accuracies for different supervised point distributions.
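The biased sampling described above can be sketched as follows. This is our illustrative reconstruction: the eigenvector range `[lo, hi]`, the per-class count and all names are hypothetical parameters, not values from the paper.

```python
import numpy as np

def biased_supervised_mask(eigvec2, labels, biased_classes, lo, hi,
                           n_per_class, rng):
    """Pick supervised points uniformly at random per class, except that for
    the classes in `biased_classes` a point is eligible only if its second
    graph-Laplacian eigenvector value lies in [lo, hi]."""
    mask = np.zeros(len(labels), dtype=bool)
    for c in np.unique(labels):
        idx = np.flatnonzero(labels == c)
        if c in biased_classes:
            # restrict eligibility to a band of eigenvector values
            idx = idx[(eigvec2[idx] >= lo) & (eigvec2[idx] <= hi)]
        chosen = rng.choice(idx, size=min(n_per_class, len(idx)), replace=False)
        mask[chosen] = True
    return mask
```

Restricting eligibility to one band of the spectral embedding concentrates the supervised points in one part of each biased class, which is what makes this test harder than uniform sampling.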
In the case of the COIL data set, we chose the supervised points nonrandomly for classes 2 and 6. The nonrandomness was achieved in the same way as for the MNIST data set. The results were the following: the overall accuracy of the max-flow algorithm was \(92.69\%\), while for classes 2 and 6, it was \(90.89\%\). The MBO algorithm [38] gave an accuracy of \(83.90\%\) overall, but \(77.24\%\) for classes 2 and 6.
These results are summarized in Table 2 and are visualized in Figs. 4 and 6 for MNIST and COIL data sets, respectively.
5.1.6 Experiments with Size Constraints and Penalty Term
The exact size constraints (5) could improve the accuracies if knowledge of the exact class sizes is available. However, exact knowledge of the class sizes is not realistic to obtain in practice, and this was the motivation behind developing the flexible constraints (7) and the penalty term (8). In order to simulate the case that only an estimate of the class sizes is known, we perturb the exact class sizes by a random number ranging between 1 and \(20\,\%\) of \(|V|/n\). The lower and upper bounds in (7) and (8) are centered around the perturbed class size, and the difference between them is chosen based on the uncertainty of the estimation, which we assume to be known. More specifically, denoting the exact class size \(c_i\), the perturbed class size \({\tilde{c}}_i\) is chosen as a random number in the interval \([c_i - p, c_i + p]\). In the experiments, we select p as 1, 10 and \(20\,\%\) of \(|V|/n\). The lower and upper bounds in the flexible size constraint (7) and the penalty term (8) are chosen as \(S^\ell _i = {\tilde{c}}_i - p\) and \(S^u_i = {\tilde{c}}_i + p\). The parameter \(\gamma \) in the penalty term is set to 10 for all data sets.
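A minimal sketch of this perturbation scheme, following the description above (the function and variable names are ours):

```python
import numpy as np

def perturbed_size_bounds(exact_sizes, n_nodes, pct, rng):
    """Simulate inexact knowledge of class sizes: perturb each exact size c_i
    by a random amount in [-p, p], with p = pct% of |V|/n, and return the
    lower/upper bounds S_l = c~_i - p and S_u = c~_i + p used in (7) and (8)."""
    n = len(exact_sizes)
    p = (pct / 100.0) * n_nodes / n          # perturbation magnitude
    c_tilde = exact_sizes + rng.uniform(-p, p, size=n)
    return c_tilde - p, c_tilde + p
```

Note that by construction the true size \(c_i\) always lies inside \([S^\ell_i, S^u_i]\), since \(|{\tilde{c}}_i - c_i| \le p\); the flexible constraints are therefore never inconsistent with the ground truth.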
Table 1 Accuracy compared to ground truth of the proposed algorithm versus other algorithms

| Method | Accuracy (%) |
| --- | --- |
| **MNIST (10 classes)** \(^\mathrm{a}\) | |
| p-Laplacian [19] | 87.1 |
| Multicut normalized 1-cut [43] | 87.64 |
| | 88 |
| Cheeger cuts [82] | 88.2 |
| | 92.3–98.74 |
| Transductive classification [83] | 92.6 |
| Tree GL [37] | 93.0 |
| | 95.0–97.17 |
| | 95.3–99.65 |
| | 96.4–96.7 |
| | 98.6–99.32 |
| GL [38] (3.57% supervised pts.) | 96.8 |
| MBO [38] (3.57% supervised pts.) | 96.91 |
| Proposed (3.57% supervised pts.) | 97.709 |
| **Three moons (5% supervised points)** | |
| GL [38] | 98.4 |
| MBO [38] | 99.12 |
| Proposed | 98.714 |
| **COIL (10% supervised points)** | |
| k-nearest neighbors [81] | 83.5 |
| | 87.8 |
| | 89.9 |
| SQ-Loss-I [81] | 90.9 |
| MP [81] | 91.1 |
| GL [38] | 91.2 |
| MBO [38] | 91.46 |
| Proposed | 93.302 |
| **Landsat satellite data set** \(^\mathrm{b}\) | |
| SC SVM\(^\mathrm{b}\) [70] | 65.15 |
| SH SVM\(^\mathrm{b}\) [70] | 75.43 |
| SLS\(^\mathrm{b}\) [70] | 65.88 |
| Simplex boosting\(^\mathrm{b}\) [70] | 86.65 |
| SLS rbf.\(^\mathrm{b}\) [70] | 90.15 |
| GL [38] (10% supervised pts.) | 87.62 |
| GL [38] (5.6% supervised pts.) | 87.05 |
| MBO [38] (10% supervised pts.) | 87.76 |
| MBO [38] (5.6% supervised pts.) | 87.25 |
| Proposed (10% supervised pts.) | 90.267 |
| Proposed (5.6% supervised pts.) | 88.621 |
Table 2 Accuracies in case of nonuniformly distributed supervised points

| Method | Overall (%) | Classes 4 and 9 (%) |
| --- | --- | --- |
| Proposed, MNIST | 97.734 | 96.85 |
| MBO, MNIST | 95.60 | 89.72 |

| Method | Overall (%) | Classes 2 and 6 (%) |
| --- | --- | --- |
| Proposed, COIL | 92.69 | 90.89 |
| MBO, COIL | 83.90 | 77.24 |
Table 3 Accuracies for experiments with class size incorporation

| Max size perturbation (p) | \(1\,\%\) | \(10\,\%\) | \(20\,\%\) |
| --- | --- | --- | --- |
| **MNIST, 3.57% supervised points** | | | |
| Flexible size constraints (7) | 97.761 | 97.725 | 97.716 |
| Penalty term (8) | 97.755 | 97.739 | 97.722 |
| Exact size constraints (5) | 96.139 | 70.820 | 63.660 |
| **Three moons, 5% supervised points** | | | |
| Flexible size constraints (7) | 99.374 | 98.829 | 98.750 |
| Penalty term (8) | 99.368 | 98.789 | 98.718 |
| Exact size constraints (5) | 99.108 | 72.685 | 66.627 |
| **Three moons, 0.6% supervised points** | | | |
| Flexible size constraints (7) | 97.833 | 97.738 | 97.160 |
| Penalty term (8) | 97.848 | 97.793 | 97.406 |
| Exact size constraints (5) | 97.706 | 68.956 | 66.872 |
| **COIL, 10% supervised points** | | | |
| Flexible size constraints (7) | 93.403 | 93.535 | 93.527 |
| Penalty term (8) | 93.360 | 93.418 | 93.325 |
| Exact size constraints (5) | 92.990 | 59.936 | 55.624 |
| **COIL, 5% supervised points** | | | |
| Flexible size constraints (7) | 90.428 | 90.892 | 90.730 |
| Penalty term (8) | 89.957 | 90.967 | 90.712 |
| Exact size constraints (5) | 89.931 | 55.152 | 54.674 |
| **Landsat satellite data set, 10% supervised points** | | | |
| Flexible size constraints (7) | 90.504 | 90.397 | 90.344 |
| Penalty term (8) | 90.479 | 90.371 | 90.347 |
| Exact size constraints (5) | 87.773 | 67.687 | 65.757 |
| **Landsat satellite data set, 5% supervised points** | | | |
| Flexible size constraints (7) | 89.024 | 89.022 | 88.848 |
| Penalty term (8) | 89.025 | 89.018 | 88.987 |
| Exact size constraints (5) | 86.327 | 60.904 | 51.276 |
5.1.7 Summary of Experimental Results
Experimental results on the benchmark data sets, shown in Table 1, indicate a consistently higher accuracy of the proposed convex algorithm than related local minimization approaches based on the MBO or Ginzburg–Landau scheme. The improvements are especially significant when the supervised points are not uniformly distributed among the data set, as shown in Table 2. On one synthetic data set, 'three moons,' the accuracy of the new algorithm was slightly worse, indicating that the global minimizer was not the best in terms of accuracy for this particular toy example. Table 5 shows that the new algorithm reaches the lowest energy in all of the experiments, further indicating that MBO and Ginzburg–Landau are not able to converge to the global minimum. Table 1 shows that the accuracies of the proposed algorithm are also highly competitive against a wide range of other established algorithms, even when it uses substantially less training data than those algorithms. Table 3 shows that the flexible size constraints (7) and penalty term (8) can improve the accuracy if a rough estimate of the approximate class sizes is given.
Table 5 Initial and final energy

| Data set | Initial energy | Final energy (MBO) [38] | Final energy (proposed) |
| --- | --- | --- | --- |
| MNIST | 225,654 | 15,196 | 12,324 |
| Three moons | 5982.79 | 433.19 | 420.24 |
| COIL | 1774.3 | 24.61 | 24.18 |
| Satellite | 5116.9 | 221.87 | 214.95 |
Note that many more iterations than necessary were used in the binary difference plots. In practice, the algorithm reaches sufficient stability in 100–300 iterations. The CPU times, summarized in Table 4, indicate a fast convergence of the new algorithm, much faster than GL, although not quite as fast as the MBO scheme. It must be noted that MBO is an extremely fast front propagation algorithm that uses only a few (e.g., 10) iterations, but its accuracy is limited due to the large step sizes. A deeper discussion of the number of iterations needed to reach the exact solution after thresholding is given at the end of the next section on point cloud segmentation.
5.2 Segmentation of 3D Point Clouds
The energy function (9), which combines region homogeneity terms and dissimilarity across region boundaries, will be demonstrated for segmentation of unstructured 3D point clouds, where each point is a vertex in V. Point clouds arise, for instance, through laser-based range imaging or multiple view scene reconstruction. The results of point cloud segmentation are easy to visualize, and the choice of each term in the energy function has a clear intuitive meaning that may be translated to other graph-based classification problems in the future. We focus especially on point clouds acquired through the concept of laser detection and ranging (LaDAR) in outdoor scenarios. A fundamental computer vision task is to segment such scenes into classes of similar objects. Roughly, some of the most common object classes in an outdoor scene are the ground plane, vegetation and human-made 'objects' with a certain regular structure.
5.2.1 Construction of the Energy Function
We construct the graph by connecting each node to its k-nearest neighbors (kNN) based on the Euclidean distance, as described at the beginning of Sect. 2. In the experiments, we set \(k=20\). We construct region terms that favor homogeneity of geometrical features based on a combination of point coordinates, normal vectors and variation of normal vectors. The construction is a concrete realization of the general region terms introduced in [59, 60, 84]. We also propose to use a contour term that favors alignment of the boundaries of the regions at 'edges,' indicated by sharp discontinuities of the normal vectors. Our model can be seen as a point cloud analogue of variational models for traditional image segmentation that combine region- and edge-based features in a single energy functional [15, 39, 48]. In contrast to the works [2, 72, 87], our model does not rely on training data.
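A sparse kNN graph of this kind can be built efficiently with a k-d tree. The sketch below is an illustration, not the authors' implementation; it stores Euclidean distances as edge values, whereas the actual edge weights would still be computed from a weight function such as (64).

```python
import numpy as np
from scipy.spatial import cKDTree
from scipy.sparse import coo_matrix

def knn_graph(points, k=20):
    """Sparse kNN adjacency for a 3D point cloud: connect each point to its
    k nearest Euclidean neighbors, then symmetrize to get an undirected graph."""
    n = len(points)
    tree = cKDTree(points)
    dists, nbrs = tree.query(points, k=k + 1)   # column 0 is the point itself
    rows = np.repeat(np.arange(n), k)
    cols = nbrs[:, 1:].ravel()
    vals = dists[:, 1:].ravel()
    A = coo_matrix((vals, (rows, cols)), shape=(n, n)).tocsr()
    return A.maximum(A.T)                       # symmetrize (undirected graph)
```

The k-d tree makes the construction scale as roughly O(n log n), which matters for LaDAR scenes with hundreds of thousands of points.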
The variable \(\mathbf{v }^1(x)\) is consequently a discrete estimate of the normal vector at x, and the first eigenvalue \(\lambda ^1(x)\) indicates to what extent the normal vectors vary locally around the point x. If all the points were lying on a plane, then \(\lambda ^1(x)\) would be zero and \(\mathbf{v }^1(x)\) would be the normal vector of the plane.
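This eigendecomposition can be sketched as follows, assuming \(\mathbf{v}^1(x)\) and \(\lambda^1(x)\) are the eigenvector and eigenvalue corresponding to the smallest eigenvalue of the covariance matrix of the k nearest neighbors of x (the neighborhood size and the function name are our choices):

```python
import numpy as np
from scipy.spatial import cKDTree

def estimate_normals(points, k=20):
    """Per-point normal v1 and planarity indicator lam1: v1 is the eigenvector
    of the local covariance matrix with the smallest eigenvalue lam1; lam1 is
    close to zero when the neighborhood is close to planar."""
    tree = cKDTree(points)
    _, nbrs = tree.query(points, k=k + 1)        # includes the point itself
    normals = np.empty_like(points)
    lam1 = np.empty(len(points))
    for i, idx in enumerate(nbrs):
        cov = np.cov(points[idx].T)              # 3x3 neighborhood covariance
        w, v = np.linalg.eigh(cov)               # eigenvalues in ascending order
        lam1[i] = w[0]
        normals[i] = v[:, 0]                     # eigenvector of smallest eigenvalue
    return normals, lam1
```

For points sampled exactly from a plane, the returned normals are (up to sign) the plane normal and the indicator is numerically zero, consistent with the description above.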
5.2.2 Experiments
It can be observed that the algorithm leads to consistent results, even though these scenes are particularly challenging: the tilt and height of the ground plane vary greatly across the scene due to the hilly landscape, and some of the trees and bushes are completely aligned with, and touch, the buildings. Note that buildings hidden behind vegetation get detected, since the laser pulses are able to partially penetrate through the leaves. A misassignment can be observed in the middle of Fig. 10, where only the roof of one of the buildings is visible due to occlusions. Since no points are observed from the wall of the building, the roof gets assigned to the ground plane region. Some large rocks in Fig. 10 also get assigned to the blue region due to their steep and smooth surfaces (Fig. 12).
6 Conclusions
Variational models on graphs have been shown to be highly competitive for various data classification problems, but are inherently difficult to handle from an optimization perspective, due to NP-hardness except in some restricted special cases. This work has developed an efficient convex algorithmic framework for a set of classification problems with multiple classes involving graph total variation, region homogeneity terms, supervised information and certain constraints or penalty terms acting on the class sizes. Particular problems that could be handled as special cases include semi-supervised classification of high-dimensional data and unsupervised segmentation of unstructured 3D point clouds. The latter involved minimization of a novel energy function enforcing homogeneity of point coordinate-based features within each region, together with a term aligning the region boundaries along edges. Theoretical and experimental analysis revealed that the convex algorithms were able to produce vanishingly close approximations to the global minimizers of the original problems in practice.
Experiments on benchmark data sets for semi-supervised classification resulted in higher accuracies of the new algorithm compared to related local minimization approaches. The accuracies were also highly competitive against a wide range of other established algorithms. The advantages of the proposed algorithm were particularly prominent in the case of sparse or nonuniformly distributed training data. The accuracies could be improved further when an estimate of the approximate class sizes was given in advance. Experiments also demonstrated that 3D point clouds acquired by a LaDAR in outdoor scenes could be segmented into object classes with a high degree of accuracy, purely based on the geometry of the points and without relying on training data. The computational efficiency was at least an order of magnitude higher than that of related work reported in the literature.
In the future, it would be interesting to investigate region homogeneity terms for general unsupervised classification problems. In addition to avoiding the problem of trivial global minimizers, the region terms may improve the accuracy compared to models based primarily on boundary terms. Region homogeneity may, for instance, be defined in terms of the eigendecomposition of the covariance matrix or graph Laplacian.
References
 1. Altman, N.S.: An introduction to kernel and nearest-neighbor nonparametric regression. Am. Stat. 46, 175–185 (1992)
 2. Anguelov, D., Taskar, B., Chatalbashev, V., Koller, D., Gupta, D., Heitz, G., Ng, A.Y.: Discriminative learning of Markov random fields for segmentation of 3D scan data. In: IEEE Computer Society Conference on Computer Vision and Pattern Recognition, San Diego, CA, USA, pp. 169–176 (2005)
 3. Aujol, J.F., Gilboa, G., Papadakis, N.: Fundamentals of non-local total variation spectral theory. In: Proceedings of Scale Space and Variational Methods in Computer Vision, pp. 66–77 (2015)
 4. Bache, K., Lichman, M.: UCI machine learning repository (2013)
 5. Bae, E., Tai, X.C., Yuan, J.: Maximizing flows with message-passing: computing spatially continuous min-cuts. In: Energy Minimization Methods in Computer Vision and Pattern Recognition, 10th International Conference, pp. 15–28 (2014)
 6. Bae, E., Yuan, J., Tai, X.C.: Global minimization for continuous multiphase partitioning problems using a dual approach. Int. J. Comput. Vis. 92(1), 112–129 (2011)
 7. Bae, E., Tai, X.C.: Efficient global minimization methods for image segmentation models with four regions. J. Math. Imaging Vis. 51(1), 71–97 (2015)
 8. Belkin, M., Niyogi, P., Sindhwani, V.: Manifold regularization: a geometric framework for learning from labeled and unlabeled examples. J. Mach. Learn. Res. 7, 2399–2434 (2006)
 9. Bentley, J.L.: Multidimensional binary search trees used for associative searching. Commun. ACM 18, 509–517 (1975)
 10. Bertozzi, A.L., Flenner, A.: Diffuse interface models on graphs for classification of high dimensional data. Multiscale Model. Simul. 10(3), 1090–1118 (2012)
 11. Boykov, Y., Veksler, O., Zabih, R.: Markov random fields with efficient approximations. In: 1998 Conference on Computer Vision and Pattern Recognition (CVPR '98), June 23–25, 1998, Santa Barbara, pp. 648–655 (1998)
 12. Boykov, Y., Veksler, O., Zabih, R.: Fast approximate energy minimization via graph cuts. IEEE Trans. Pattern Anal. Mach. Intell. 23(11), 1222–1239 (2001)
 13. Boykov, Y., Kolmogorov, V.: An experimental comparison of min-cut/max-flow algorithms for energy minimization in vision. IEEE Trans. Pattern Anal. Mach. Intell. 26, 359–374 (2001)
 14. Bresson, X., Laurent, T., Uminsky, D., von Brecht, J.: Multiclass total variation clustering. In: Advances in Neural Information Processing Systems, pp. 1421–1429 (2013)
 15. Bresson, X., Esedoglu, S., Vandergheynst, P., Thiran, J.P., Osher, S.: Fast global minimization of the active contour/snake model. J. Math. Imaging Vis. 28(2), 151–167 (2007)
 16. Bresson, X., Laurent, T., Uminsky, D., von Brecht, J.H.: Convergence and energy landscape for Cheeger cut clustering. Adv. Neural Inf. Process. Syst. 25, 1394–1402 (2012)
 17. Bresson, X., Tai, X.C., Chan, T.F., Szlam, A.: Multi-class transductive learning based on \(\ell _1\) relaxations of Cheeger cut and Mumford-Shah-Potts model. J. Math. Imaging Vis. 49(1), 191–201 (2014)
 18. Brown, E.S., Chan, T.F., Bresson, X.: Completely convex formulation of the Chan-Vese image segmentation model. Int. J. Comput. Vis. 98, 103–121 (2011). doi:10.1007/s11263-011-0499-y
 19. Bühler, T., Hein, M.: Spectral clustering based on the graph p-Laplacian. In: Proceedings of the 26th Annual International Conference on Machine Learning, pp. 81–88. ACM (2009)
 20. Chambolle, A., Cremers, D., Pock, T.: A convex approach to minimal partitions. SIAM J. Imaging Sci. 5(4), 1113–1158 (2012)
 21. Chambolle, A., Pock, T.: A first-order primal-dual algorithm for convex problems with applications to imaging. J. Math. Imaging Vis. 40(1), 120–145 (2011)
 22. Chan, T.F., Esedoglu, S., Nikolova, M.: Algorithms for finding global minimizers of image segmentation and denoising models. SIAM J. Appl. Math. 66(5), 1632–1648 (2006)
 23. Chan, T., Vese, L.A.: Active contours without edges. IEEE Trans. Image Process. 10, 266–277 (2001)
 24. Chan, T.F., Zhang, X.: Wavelet inpainting by nonlocal total variation. Inverse Probl. Imaging 4(1), 191–210 (2010)
 25. Chapelle, O., Schölkopf, B., Zien, A.: Semi-Supervised Learning, vol. 2. MIT Press, Cambridge (2006)
 26. Cheeger, J.: A Lower Bound for the Smallest Eigenvalue of the Laplacian. Princeton University Press, Princeton (1970)
 27. Cireşan, D.C., Meier, U., Masci, J., Gambardella, L.M., Schmidhuber, J.: Flexible, high performance convolutional neural networks for image classification. In: Proceedings of the 22nd International Joint Conference on Artificial Intelligence, pp. 1237–1242 (2011)
 28. Dahlhaus, E., Johnson, D.S., Papadimitriou, C.H., Seymour, P.D., Yannakakis, M.: The complexity of multiway cuts (extended abstract). In: STOC '92: Proceedings of the 24th Annual ACM Symposium on Theory of Computing, pp. 241–251. ACM, New York (1992)
 29. Decoste, D., Schölkopf, B.: Training invariant support vector machines. Mach. Learn. 46(1), 161–190 (2002)
 30. Desquesnes, X., Elmoataz, A., Lezoray, O.: Eikonal equation adaptation on weighted graphs: fast geometric diffusion process for local and non-local image and data processing. J. Math. Imaging Vis. 46, 238–257 (2013)
 31. Digne, J.: Similarity based filtering of point clouds. In: 2012 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops, Providence, RI, USA, June 16–21, 2012, pp. 73–79 (2012)
 32. Ekeland, I., Téman, R.: Convex Analysis and Variational Problems. Society for Industrial and Applied Mathematics, Philadelphia (1999)
 33. Elmoataz, A., Lezoray, O., Bougleux, S.: Nonlocal discrete regularization on weighted graphs: a framework for image and manifold processing. IEEE Trans. Image Process. 17, 1047–1060 (2008)
 34. Elmoataz, A., Toutain, M., Tenbrinck, D.: On the p-Laplacian and infinity-Laplacian on graphs with application in image and data processing. SIAM J. Imaging Sci. 8, 2412–2451 (2015)
 35. Esser, J.E.: Primal dual algorithms for convex models and applications to image restoration, registration and nonlocal inpainting. Ph.D. thesis, UCLA CAM report (2010)
 36. Fowlkes, C., Belongie, S., Chung, F., Malik, J.: Spectral grouping using the Nyström method. IEEE Trans. Pattern Anal. Mach. Intell. 26(2), 214–225 (2004)
 37. Garcia-Cardona, C., Flenner, A., Percus, A.G.: Multiclass diffuse interface models for semi-supervised learning on graphs. In: Proceedings of the 2nd International Conference on Pattern Recognition Applications and Methods. SciTePress (2013)
 38. Garcia-Cardona, C., Merkurjev, E., Bertozzi, A.L., Flenner, A., Percus, A.G.: Multiclass data segmentation using diffuse interface methods on graphs. IEEE Trans. Pattern Anal. Mach. Intell. 36(8), 1600–1613 (2014)
 39. Gilboa, G., Osher, S.: Nonlocal linear image regularization and supervised segmentation. SIAM Multiscale Model. Simul. 6(2), 595–630 (2007)
 40. Goldstein, T., Bresson, X., Osher, S.: Global minimization of Markov random fields with applications to optical flow. UCLA CAM report (2009)
 41. Golovinskiy, A., Kim, V.G., Funkhouser, T.: Shape-based recognition of 3D point clouds in urban environments. In: International Conference on Computer Vision (ICCV), pp. 2154–2161 (2009)
 42. Hein, M., Audibert, J., von Luxburg, U.: From graphs to manifolds: weak and strong pointwise consistency of graph Laplacians. In: Proceedings of the 18th Conference on Learning Theory (COLT), pp. 470–485. Springer, New York (2005)
 43. Hein, M., Setzer, S.: Beyond spectral clustering: tight relaxations of balanced graph cuts. In: Shawe-Taylor, J., Zemel, R.S., Bartlett, P., Pereira, F.C.N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 24, pp. 2366–2374 (2011)
 44. Hein, M., Bühler, T.: An inverse power method for nonlinear eigenproblems with applications in 1-spectral clustering and sparse PCA. Adv. Neural Inf. Process. Syst. 23, 847–855 (2010)
 45. Hidane, M., Lezoray, O., Elmoataz, A.: Nonlinear multilayered representation of graph-signals. J. Math. Imaging Vis. 45, 114–137 (2013)
 46. Indyk, P.: Nearest neighbours in high-dimensional spaces. In: Handbook of Discrete and Computational Geometry, 2nd edn., chapter 39, pp. 1–16. CRC Press, Boca Raton (2004)
 47. Joachims, T.: Transductive learning via spectral graph partitioning. In: International Conference on Machine Learning, vol. 20, p. 290 (2003)
 48. Jung, M., Peyré, G., Cohen, L.D.: Nonlocal active contours. SIAM J. Imaging Sci. 5(3), 1022–1054 (2012)
 49. Kégl, B., Busa-Fekete, R.: Boosting products of base classifiers. In: Proceedings of the 26th Annual International Conference on Machine Learning, pp. 497–504 (2009)
 50. Kimmel, R., Caselles, V., Sapiro, G.: Geodesic active contours. Int. J. Comput. Vis. 22, 61–79 (1997)
 51. Kleinberg, J.M., Tardos, É.: Approximation algorithms for classification problems with pairwise relationships: metric labeling and Markov random fields. J. ACM 49(5), 616–639 (2002)
 52. Klodt, M., Cremers, D.: A convex framework for image segmentation with moment constraints. In: IEEE International Conference on Computer Vision, ICCV 2011, Barcelona, Spain, November 6–13, 2011, pp. 2236–2243 (2011)
 53. Kolmogorov, V., Zabih, R.: What energy functions can be minimized via graph cuts? IEEE Trans. Pattern Anal. Mach. Intell. 26, 65–81 (2004)
 54. Komodakis, N., Tziritas, G.: Approximate labeling via graph cuts based on linear programming. IEEE Trans. Pattern Anal. Mach. Intell. 29(8), 1436–1453 (2007)
 55. Lai, R., Liang, J., Zhao, H.K.: A local mesh method for solving PDEs on point clouds. Inverse Probl. Imaging 7, 737–755 (2013)
 56. LeCun, Y., Cortes, C.: The MNIST database of handwritten digits. http://yann.lecun.com/exdb/mnist/
 57. LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proc. IEEE 86(11), 2278–2324 (1998)
 58. Lellmann, J., Kappes, J., Yuan, J., Becker, F., Schnörr, C.: Convex multi-class image labeling by simplex-constrained total variation. In: Tai, X.C., Mórken, K., Lysaker, M., Lie, K.A. (eds.) Scale Space and Variational Methods in Computer Vision (SSVM 2009), LNCS, vol. 5567, pp. 150–162. Springer, New York (2009)
 59. Lezoray, O., Elmoataz, A., Ta, V.T.: Nonlocal PDEs on graphs for active contours models with applications to image segmentation and data clustering. In: 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 873–876. IEEE (2012)
 60. Lezoray, O., Lozes, F., Elmoataz, A.: Partial difference operators on weighted graphs for image processing on surfaces and point clouds. IEEE Trans. Image Process. 23, 3896–3909 (2014)
 61. Li, F., Ng, M.K., Zeng, T., Shen, C.: A multiphase image segmentation method based on fuzzy region competition. SIAM J. Imaging Sci. 3(3), 277–299 (2010)
 62. Lozes, F., Elmoataz, A., Lezoray, O.: Nonlocal processing of 3D colored point clouds. In: Proceedings of the 21st International Conference on Pattern Recognition, ICPR 2012, Tsukuba, Japan, November 11–15, 2012, pp. 1968–1971 (2012)
 63. Macdonald, C.B., Merriman, B., Ruuth, S.J.: Simple computation of reaction-diffusion processes on point clouds. Proc. Natl. Acad. Sci. 110, 3009–3012 (2013)
 64. Martello, S., Toth, P.: Knapsack Problems: Algorithms and Computer Implementations. Wiley, New York (1990)
 65. Merkurjev, E., Sunu, J., Bertozzi, A.L.: Graph MBO method for multiclass segmentation of hyperspectral stand-off detection video. In: 2014 IEEE International Conference on Image Processing (ICIP), pp. 689–693. IEEE (2014)
 66. Merkurjev, E., Kostic, T., Bertozzi, A.L.: An MBO scheme on graphs for classification and image processing. SIAM J. Imaging Sci. 6(4), 1903–1930 (2013)
 67. Merkurjev, E., Garcia-Cardona, C., Bertozzi, A.L., Flenner, A., Percus, A.G.: Diffuse interface methods for multiclass segmentation of high-dimensional data. Appl. Math. Lett. 33, 29–34 (2014)
 68. Merkurjev, E., Bae, E., Bertozzi, A.L., Tai, X.C.: Global binary optimization on graphs for classification of high-dimensional data. J. Math. Imaging Vis. 52(3), 414–435 (2015)
 69. Miller, R.E., Thatcher, J.W. (eds.): Proceedings of a Symposium on the Complexity of Computer Computations, held March 20–22, 1972, at the IBM Thomas J. Watson Research Center, Yorktown Heights, New York. The IBM Research Symposia Series. Plenum Press, New York (1972)
 70. Mroueh, Y., Poggio, T., Rosasco, L., Slotine, J.J.: Multiclass learning with simplex coding. In: Advances in Neural Information Processing Systems, pp. 2789–2797 (2012)
 71. Mumford, D., Shah, J.: Optimal approximation by piecewise smooth functions and associated variational problems. Commun. Pure Appl. Math. 42, 577–685 (1989)
 72. Munoz, D., Vandapel, N., Hebert, M.: Directional associative Markov network for 3D point cloud classification. In: International Symposium on 3D Data Processing, Visualization and Transmission (3DPVT) (2008)
 73. Nene, S.A., Nayar, S.K., Murase, H.: Columbia Object Image Library (COIL-100). Technical Report CUCS-006-96 (1996)
 74. Perona, P., Zelnik-Manor, L.: Self-tuning spectral clustering. Adv. Neural Inf. Process. Syst. 17, 1601–1608 (2004)
 75. Pock, T., Cremers, D., Bischof, H., Chambolle, A.: An algorithm for minimizing the piecewise smooth Mumford-Shah functional. In: IEEE International Conference on Computer Vision (ICCV), Kyoto, Japan (2009)
 76. Ranftl, R., Bredies, K., Pock, T.: Non-local total generalized variation for optical flow estimation. In: Computer Vision, ECCV 2014: 13th European Conference, Zurich, Switzerland, September 6–12, 2014, Proceedings, Part I, pp. 439–454 (2014)
 77. Sawatzky, A., Tenbrinck, D., Jiang, X., Burger, M.: A variational framework for region-based segmentation incorporating physical noise models. J. Math. Imaging Vis. 47(3), 179–209 (2013)
 78. Shi, J., Malik, J.: Normalized cuts and image segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 22(8), 888–905 (2000)
 79. Sion, M.: On general minimax theorems. Pac. J. Math. 8, 171–176 (1958)
 80. Strang, G.: Maximum flows and minimum cuts in the plane. Adv. Mech. Math. III, 1–11 (2008)
 81. Subramanya, A., Bilmes, J.: Semi-supervised learning with measure propagation. J. Mach. Learn. Res. 12, 3311–3370 (2011)
 82. Szlam, A., Bresson, X.: A total variation-based graph clustering algorithm for Cheeger ratio cuts. In: Proceedings of the 27th International Conference on Machine Learning, pp. 1039–1046 (2010)
 83. Szlam, A.D., Maggioni, M., Coifman, R.R.: Regularization on graphs with function-adapted diffusion processes. J. Mach. Learn. Res. 9, 1711–1739 (2008)
 84. Tenbrinck, D., Lozes, F., Elmoataz, A.: Solving minimal surface problems on surfaces and point clouds. In: Scale Space and Variational Methods in Computer Vision: 5th International Conference, SSVM 2015, Lège-Cap Ferret, France, May 31–June 4, 2015, Proceedings, pp. 601–612 (2015)
 85. Tian, L., Macdonald, C.B., Ruuth, S.J.: Segmentation on surfaces with the closest point method. In: Proceedings of ICIP09, 16th IEEE International Conference on Image Processing, pp. 3009–3012 (2009)
 86. Toutain, M., Elmoataz, A., Lezoray, O.: Geometric PDEs on weighted graphs for semi-supervised classification. In: 13th International Conference on Machine Learning and Applications (ICMLA), pp. 231–236 (2014)
 87. Triebel, R., Kersting, K., Burgard, W.: Robust 3D scan point classification using associative Markov networks. In: Proceedings of the International Conference on Robotics and Automation (ICRA), pp. 2603–2608 (2006)
 88. Trillos, N.G., Slepcev, D.: Continuum limit of total variation on point clouds. Arch. Ration. Mech. Anal. 220, 193–241 (2016)
 89. Trillos, N.G., Slepcev, D., von Brecht, J., Laurent, T., Bresson, X.: Consistency of Cheeger and ratio graph cuts. Technical report, arXiv:1411.6590 (2014)
 90. van Gennip, Y., Guillen, N., Osting, B., Bertozzi, A.L.: Mean curvature, threshold dynamics, and phase field theory on finite graphs. Milan J. Math. 82(1), 3–65 (2014)
 91. van Gennip, Y., Bertozzi, A.L.: Gamma-convergence of graph Ginzburg–Landau functionals. Adv. Differ. Equ. 17(11–12), 1115–1180 (2012)
 92. Wang, J., Jebara, T., Chang, S.F.: Graph transduction via alternating minimization. In: Proceedings of the 25th International Conference on Machine Learning, pp. 1144–1151 (2008)
 93. Witkin, A., Kass, M., Terzopoulos, D.: Snakes: active contour models. Int. J. Comput. Vis. 1, 321–331 (1988)
 94. Yin, K., Tai, X.C., Osher, S.: An effective region force for some variational models for learning and clustering. UCLA CAM Report 16-18 (2016)
 95. Yuan, J., Bae, E., Tai, X.C., Boykov, Y.: A continuous max-flow approach to Potts model. In: European Conference on Computer Vision, LNCS, vol. 6316, pp. 379–392 (2010)
 96. Yuan, J., Bae, E., Tai, X.C.: A study on continuous max-flow and min-cut approaches. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 2217–2224 (2010)
 97. Yuan, J., Bae, E., Tai, X.C., Boykov, Y.: A spatially continuous max-flow and min-cut framework for binary labeling problems. Numer. Math. 126(3), 559–587 (2013)
 98. Zach, C., Gallup, D., Frahm, J.M., Niethammer, M.: Fast global labeling for real-time stereo using multiple plane sweeps. In: Vision, Modeling and Visualization Workshop (VMV) (2008)
 99. Zhang, X., Burger, M., Bresson, X., Osher, S.: Bregmanized nonlocal regularization for deconvolution and sparse reconstruction. SIAM J. Imaging Sci. 3(3), 253–276 (2010)
 100. Zhou, D., Schölkopf, B.: A regularization framework for learning from graph data. In: Workshop on Statistical Relational Learning, International Conference on Machine Learning (2004)
 101. Zhou, D., Bousquet, O., Lal, T.N., Weston, J., Schölkopf, B.: Learning with local and global consistency. Adv. Neural Inf. Process. Syst. 16, 321–328 (2004)
 102. Zhu, X.: Semi-supervised learning literature survey. Computer Sciences Technical Report 1530, University of Wisconsin-Madison (2005)
Copyright information
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.