1 Introduction

As one of the best known supervised learning algorithms, Support Vector Machines (SVM) are gaining more and more attention. SVM was proposed by Vapnik [1, 2] as a maximum-margin classifier; tutorials on SVM can be found in [3–6]. In recent years, SVM has been applied to many fields and has many algorithmic and modeling variations. In the biomedical field, SVM has been used to identify physical diseases [7–10] as well as psychological diseases [11]. Electroencephalography (EEG) signals can also be analyzed using SVM [12–14]. In addition, SVM has been applied to protein prediction [15–19] and medical images [20–22]. Computer vision includes many applications of SVM, such as person identification [23], hand gesture detection [24], face recognition [25] and background subtraction [26]. In geosciences, SVM has been applied to remote sensing analysis [27–29], land cover change [30–32], landslide susceptibility [33–36] and hydrology [37, 38]. In power systems, SVM has been used for transient status prediction [39], power load forecasting [40], electricity consumption prediction [41] and wind power forecasting [42]. Stock price forecasting [43–45] and business administration [46] can also use SVM. Other applications of SVM include plant disease detection in agriculture [47], condition monitoring [48], network security [49] and electronics [50, 51]. When basic SVM models cannot satisfy the application requirements, different modeling variations of SVM can be found in [52].

In this paper, a survey of SVM with uncertainties is presented. Basic SVM models deal with the situation in which the exact values of the data points are known. When the data points are uncertain, different models have been proposed to formulate SVM with uncertainties. Bi and Zhang [53] assumed that the data points are subject to an additive noise which is bounded in norm and proposed a very direct model. However, this model cannot guarantee generally good performance over the uncertainty set. To guarantee optimal performance while the worst-case constraints are still satisfied, robust optimization is utilized. Trafalis et al. [54–58] proposed a robust optimization model in which the perturbation of the uncertain data is bounded in norm. Ghaoui et al. [59] derived a robust model when the uncertainty is expressed as intervals. Fan et al. [60] studied the more general case of polyhedral uncertainty. Robust optimization is also used when the constraint is a chance constraint, which ensures a small probability of misclassification for the uncertain data. The chance constraints are transformed by different bounding inequalities, for example the multivariate Chebyshev inequality [61, 62] and Bernstein bounding schemes [63].

The organization of this paper is as follows: Sect. 2 gives an introduction to the basic SVM models. Section 3 presents SVM with uncertainties, covering both robust SVM with bounded uncertainty and chance constrained SVM through robust optimization. Section 4 presents concluding remarks and suggestions for further research.

2 Basic SVM Models

Support Vector Machines construct maximum-margin classifiers, such that small perturbations in the data are least likely to cause misclassification. SVM works well empirically and is one of the best known supervised learning algorithms, proposed by Vapnik [1, 2]. Suppose we have a two-class dataset of \(m\) data points \(\{\mathbf {x}_i,y_i\}_{i=1}^m\) with \(n\)-dimensional features \( \mathbf {x}_i \in \mathbb {R}^n \) and respective class labels \( y_i \in \{ +1,-1 \} \). For linearly separable datasets, there exists a hyperplane \( \mathbf {w}^\top \mathbf {x} + b = 0 \) separating the two classes, and the corresponding classification rule is based on the sign of \(\mathbf {w}^\top \mathbf {x} + b \). If this value is positive, \(\mathbf {x}\) is classified into the \(+1\) class; otherwise, into the \(-1\) class.

The data points that the margin pushes up against are called support vectors. A maximum-margin hyperplane is one that maximizes the distance between the hyperplane and the support vectors. For the separating hyperplane \( \mathbf {w}^\top \mathbf {x} + b = 0 \), \(\mathbf {w}\) and \(b\) can be normalized so that \( \mathbf {w}^\top \mathbf {x} + b = +1 \) goes through the support vectors of the \(+1\) class and \( \mathbf {w}^\top \mathbf {x} + b = -1 \) goes through the support vectors of the \(-1\) class. The distance between these two hyperplanes, i.e., the margin width, is \({2 \over \Vert \mathbf {w} \Vert _2}\); therefore, maximizing the margin amounts to minimizing \( {1 \over 2} \Vert \mathbf {w} \Vert _2^2 \) subject to the separation constraints. This can be expressed as the following quadratic optimization problem:

$$\begin{aligned} \min _{\mathbf {w},b}\,&\dfrac{1}{2} \Vert \mathbf {w} \Vert _2^2 \end{aligned}$$
(1a)
$$\begin{aligned} \hbox {s.t.} \, \,&y_i (\mathbf {w}^\top \mathbf {x}_i +b) \ge 1, \ \ i=1,\ldots ,m \end{aligned}$$
(1b)
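
As an illustration, the quadratic program (1a)–(1b) can be handed directly to an off-the-shelf convex solver. The following sketch assumes the cvxpy modeling package and a small, hypothetical, linearly separable dataset; it is meant to make the formulation concrete rather than to be an efficient SVM trainer.

```python
import cvxpy as cp
import numpy as np

# Toy, linearly separable data: m points in R^n with labels in {+1, -1}.
X = np.array([[2.0, 2.0], [3.0, 3.0], [-1.0, -1.0], [-2.0, -1.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])
m, n = X.shape

w = cp.Variable(n)
b = cp.Variable()

objective = cp.Minimize(0.5 * cp.sum_squares(w))                  # (1a)
constraints = [y[i] * (X[i] @ w + b) >= 1 for i in range(m)]      # (1b)
prob = cp.Problem(objective, constraints)
prob.solve()

print("w =", w.value, "b =", b.value)
```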

Introducing Lagrange multipliers \(\varvec{\alpha }=[\alpha _1, \ldots , \alpha _m]\), the above constrained problem can be expressed as:

$$\begin{aligned} \min _{\mathbf {w},b} \max _{\varvec{\alpha } \ge 0} \ \fancyscript{L} (\mathbf {w},b,\varvec{\alpha }) = \dfrac{1}{2} \Vert \mathbf {w} \Vert _2^2 - \sum _{i=1}^m \alpha _i \bigl [y_i (\mathbf {w}^\top \mathbf {x}_i +b)-1\bigr ] \end{aligned}$$
(2)

Take the derivatives with respect to \(\mathbf {w}\) and \(b\), and set to zero:

$$\begin{aligned} \frac{\partial \fancyscript{L} (\mathbf {w},b,\varvec{\alpha })}{\partial \mathbf {w}} = 0 \ \,&\Rightarrow \ \ \mathbf {w} = \sum _{i=1}^m \alpha _i y_i \mathbf {x}_i \end{aligned}$$
(3a)
$$\begin{aligned} \frac{\partial \fancyscript{L} (\mathbf {w},b,\varvec{\alpha })}{\partial b} = 0 \ \,&\Rightarrow \ \ \sum _{i=1}^m \alpha _i y_i = 0 \end{aligned}$$
(3b)

Substituting into \( \fancyscript{L} (\mathbf {w},b,\varvec{\alpha }) \):

$$\begin{aligned} \fancyscript{L} (\varvec{\alpha }) = \sum _{i=1}^m \alpha _i - \dfrac{1}{2} \sum _{i=1}^m \sum _{j=1}^m \alpha _i \alpha _j y_i y_j \mathbf {x}_i^\top \mathbf {x}_j \end{aligned}$$
(4)

Then the dual of the original SVM problem is also a convex quadratic problem:

$$\begin{aligned} \max _{\varvec{\alpha }} \ \,&\sum _{i=1}^m \alpha _i - \dfrac{1}{2} \sum _{i=1}^m \sum _{j=1}^m \alpha _i \alpha _j y_i y_j \mathbf {x}_i^\top \mathbf {x}_j \end{aligned}$$
(5a)
$$\begin{aligned} \hbox {s.t.} \ \,&\sum _{i=1}^m \alpha _i y_i = 0, \ \ \alpha _i \ge 0, \ \ i=1,\ldots ,m \end{aligned}$$
(5b)

Since only the \(\alpha _i\) corresponding to support vectors can be nonzero, this dramatically simplifies solving the dual problem.
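
For small problems the dual (5a)–(5b) can likewise be solved with a generic solver, after which \(\mathbf {w}\) is recovered from (3a) and \(b\) from any support vector via complementary slackness. The sketch below again assumes cvxpy and the same toy data; the quadratic form is rewritten as a squared norm so the model is convex by construction.

```python
import cvxpy as cp
import numpy as np

X = np.array([[2.0, 2.0], [3.0, 3.0], [-1.0, -1.0], [-2.0, -1.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])
m, n = X.shape

alpha = cp.Variable(m, nonneg=True)                        # alpha_i >= 0
# 0.5 * sum_ij alpha_i alpha_j y_i y_j x_i^T x_j = 0.5 * || sum_i alpha_i y_i x_i ||_2^2
objective = cp.Maximize(cp.sum(alpha)
                        - 0.5 * cp.sum_squares(X.T @ cp.multiply(alpha, y)))
constraints = [y @ alpha == 0]                             # (5b)
cp.Problem(objective, constraints).solve()

a = alpha.value
w = X.T @ (a * y)                                          # (3a)
sv = int(np.argmax(a))                                     # index of a support vector
b = y[sv] - X[sv] @ w                                      # complementary slackness
print("multipliers:", a)
print("w =", w, "b =", b)
```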

The above holds when the two classes are linearly separable. When they are not, misclassified samples need to be allowed, which gives rise to the soft margin SVM. Soft margin SVM introduces non-negative slack variables \(\xi _i\) to measure the distance of within-margin or misclassified data \(\mathbf {x}_i\) to the hyperplane with the correct label, \(\xi _i = \max \{0,1-y_i (\mathbf {w}^\top \mathbf {x}_i +b) \}\). When \(0<\xi _i<1\), the data point is within the margin but correctly classified; when \(\xi _i > 1 \), the data point is misclassified. The objective function then gains a term that penalizes these slack variables, and the optimization is a trade-off between a large margin and a small error penalty. The soft margin SVM formulation with \(L_1\) regularization [64] is:

$$\begin{aligned} \min _{\mathbf {w},b,\xi _i} \ \,&\dfrac{1}{2} \Vert \mathbf {w} \Vert _2^2 + C \sum _{i=1}^m \xi _i \end{aligned}$$
(6a)
$$\begin{aligned} \hbox {s.t.} \ \,&y_i (\mathbf {w}^\top \mathbf {x}_i +b) \ge 1-\xi _i, \ \ \ \xi _i \ge 0, \ \ i=1,\ldots ,m \end{aligned}$$
(6b)

where \(C\) is a trade-off parameter.
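
In practice the soft margin problem is rarely solved with a generic QP solver; dedicated implementations such as scikit-learn's SVC, which solves a dual of the form derived below, are used instead. A short sketch, with X_train and y_train as placeholder data:

```python
import numpy as np
from sklearn.svm import SVC

# Placeholder two-class data with labels in {+1, -1}.
rng = np.random.default_rng(0)
X_train = np.vstack([rng.normal(2.0, 1.0, (20, 2)),
                     rng.normal(-2.0, 1.0, (20, 2))])
y_train = np.hstack([np.ones(20), -np.ones(20)])

clf = SVC(kernel="linear", C=1.0)      # C is the trade-off parameter of (6a)
clf.fit(X_train, y_train)

print("w =", clf.coef_, "b =", clf.intercept_)
print("support vector indices:", clf.support_)
```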

Similarly, the Lagrangian of the soft margin SVM is:

$$\begin{aligned} \min _{\mathbf {w},b,\varvec{\xi }} \max _{\varvec{\alpha },\varvec{\beta } \ge 0} \ \fancyscript{L} (\mathbf {w},b,\varvec{\xi },\varvec{\alpha },\varvec{\beta })&= \dfrac{1}{2} \Vert \mathbf {w} \Vert _2^2 + C \sum _{i=1}^m \xi _i\nonumber \\&- \sum _{i=1}^m \alpha _i \bigl [y_i (\mathbf {w}^\top \mathbf {x}_i +b)-1+\xi _i \bigr ] - \sum _{i=1}^m \beta _i \xi _i\qquad \end{aligned}$$
(7)

Take the derivative with respect to \(\xi _i\) and set to zero:

$$\begin{aligned} \frac{\partial \fancyscript{L} (\mathbf {w},b,\varvec{\xi },\varvec{\alpha },\varvec{\beta })}{\partial \xi _i} = 0 \ \ \Rightarrow \ \ C-\alpha _i-\beta _i = 0 \end{aligned}$$
(8)

Then \(\alpha _i=C-\beta _i\), and since \(\beta _i \ge 0\), it follows that \(\alpha _i \le C\).

The derivatives with respect to \(\mathbf {w}\) and \(b\) are the same as before; substituting into \(\fancyscript{L} (\mathbf {w},b,\varvec{\xi },\varvec{\alpha },\varvec{\beta })\) gives the dual of the soft margin SVM:

$$\begin{aligned} \max _{\varvec{\alpha }} \ \,&\sum _{i=1}^m \alpha _i - \dfrac{1}{2} \sum _{i=1}^m \sum _{j=1}^m \alpha _i \alpha _j y_i y_j \mathbf {x}_i^\top \mathbf {x}_j \end{aligned}$$
(9a)
$$\begin{aligned} \hbox {s.t.} \ \,&\sum _{i=1}^m \alpha _i y_i = 0, \ \ 0 \le \alpha _i \le C, \ \ i=1,\ldots ,m \end{aligned}$$
(9b)

The only difference is that the dual variables \(\alpha _i\) now have upper bounds \(C\). The advantage of the \(L_1\) regularization (linear penalty function) is that in the dual problem, the slack variables \(\xi _i\) vanish and the constant \(C\) is just an additional constraint on the Lagrange multipliers \(\alpha _i\). Because of this nice property and its huge impact in practice, \(L_1\) is the most widely used regularization term.

Besides the linear kernel \(k(\mathbf {x}_i,\mathbf {x}_j)=\mathbf {x}_i^\top \mathbf {x}_j\), nonlinear kernels have also been introduced into SVM to create nonlinear classifiers. The maximum-margin hyperplane is constructed in a high-dimensional transformed feature space under a possibly nonlinear transformation; therefore, it can be nonlinear in the original feature space. A widely used nonlinear kernel is the Gaussian radial basis function \(k(\mathbf {x}_i,\mathbf {x}_j)=\exp \big (-\gamma \Vert \mathbf {x}_i - \mathbf {x}_j \Vert _2^2\big )\), which corresponds to a Hilbert space of infinite dimension.
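
As a brief illustration, switching to the RBF kernel in the same SVC interface only changes the kernel argument; \(\gamma \) and \(C\) are hyperparameters that would normally be tuned by cross-validation, and the values below are arbitrary.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X_train = np.vstack([rng.normal(2.0, 1.0, (20, 2)),
                     rng.normal(-2.0, 1.0, (20, 2))])
y_train = np.hstack([np.ones(20), -np.ones(20)])

# Only the kernel (and its parameter gamma) changes relative to the linear case.
clf_rbf = SVC(kernel="rbf", gamma=0.5, C=1.0)
clf_rbf.fit(X_train, y_train)
print("support vectors per class:", clf_rbf.n_support_)
```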

3 SVM with Uncertainties

Given \(m\) training data points in \(\mathbb {R}^n\), let \( X_i = [X_{i1},\ldots ,X_{in}]^\top \in \mathbb {R}^n, i=1,\ldots ,m \) denote the uncertain training data points and \( y_i \in \{ +1,-1 \}, i=1,\ldots ,m \) their respective class labels. The soft margin SVM with uncertainty is as follows:

$$\begin{aligned} \min _{\mathbf {w},b,\xi _i} \ \,&\dfrac{1}{2} \Vert \mathbf {w} \Vert _2^2 + C \sum _{i=1}^m \xi _i \end{aligned}$$
(10a)
$$\begin{aligned} \hbox {s.t.} \ \,&y_i (\mathbf {w}^\top X_i +b) \ge 1-\xi _i, \ \ \ \xi _i \ge 0, \ \ i=1,\ldots ,m \end{aligned}$$
(10b)

When the training data points \(X_i\) are random vectors, the model needs to be modified to account for the uncertainties. The simplest model just employs the means of the uncertain data points, \(\mu _i = \mathbf {E}[X_i]\). The formulation becomes:

$$\begin{aligned} \min _{\mathbf {w},b,\xi _i} \ \,&\dfrac{1}{2} \Vert \mathbf {w} \Vert _2^2 + C \sum _{i=1}^m \xi _i \end{aligned}$$
(11a)
$$\begin{aligned} \hbox {s.t.} \ \,&y_i (\mathbf {w}^\top \mu _i +b) \ge 1-\xi _i, \ \ \ \xi _i \ge 0, \ \ i=1,\ldots ,m \end{aligned}$$
(11b)

The above model is equivalent to a soft margin SVM on data points fixed at their means and therefore does not take the uncertainties of the data into account. Bi and Zhang [53] assumed the data points are subject to an additive noise, \( X_i = \bar{\mathbf {x}}_i + \Delta \mathbf {x}_i\), with the noise bounded by \(\Vert \Delta \mathbf {x}_i \Vert _2 \le \delta _i\). They proposed the following model:

$$\begin{aligned} \min _{\mathbf {w},b,\xi _i} \ \,&\dfrac{1}{2} \Vert \mathbf {w} \Vert _2^2 + C \sum _{i=1}^m \xi _i \end{aligned}$$
(12a)
$$\begin{aligned} \hbox {s.t.} \ \,&y_i (\mathbf {w}^\top (\bar{\mathbf {x}}_i+\Delta \mathbf {x}_i) +b) \ge 1-\xi _i, \ \ \ \xi _i \ge 0, \ \ i=1,\ldots ,m\end{aligned}$$
(12b)
$$\begin{aligned}&\Vert \Delta \mathbf {x}_i \Vert _2 \le \delta _i, \ \ i=1,\ldots ,m \end{aligned}$$
(12c)

In this model, the uncertain data point \( X_i \) is free to lie anywhere in the ball centered at \(\bar{\mathbf {x}}_i\) with radius \(\delta _i\), i.e., \( X_i \) can move in any direction within the uncertainty set. A drawback of this model is that it cannot guarantee generally good performance over the uncertainty set, since the direction in which the data points are perturbed is not constrained. It is quite possible that a data point whose perturbation moves it far away from the separating hyperplane is used as a support vector; considering the original uncertainty set of this data point, most of it would then lie within the margin and the constraint would no longer be satisfied. To guarantee better performance under most conditions, or with higher probability, robust optimization is introduced to solve SVM with uncertainty.

3.1 Robust SVM with Bounded Uncertainty

Robust optimization guarantees optimal performance under the worst case scenario. Given different information about the uncertain data, several models have been proposed. Trafalis et al. [54–58] proposed a model in which the perturbation of the uncertain data is bounded in norm. The uncertain data are expressed as \( X_i = \bar{\mathbf {x}}_i + \varvec{\sigma }_i\), the mean vector \(\bar{\mathbf {x}}_i\) plus an additional perturbation \(\varvec{\sigma }_i\) bounded in the \(L_p\) norm by \(\Vert \varvec{\sigma }_i \Vert _p \le \eta _i \), for all \(i=1,\ldots ,m\). Robust optimization deals with the worst case perturbation, so the constraint becomes:

$$\begin{aligned} \min _{\Vert \varvec{\sigma }_i \Vert _p \le \eta _i} y_i (\mathbf {w}^\top \bar{\mathbf {x}}_i +b) + y_i \mathbf {w}^\top \varvec{\sigma }_i \ge 1-\xi _i, \ \ i=1,\ldots ,m \end{aligned}$$
(13)

To solve the robust SVM, the following subproblem needs to be solved first:

$$\begin{aligned} \min _{\varvec{\sigma }_i} \ \,&y_i \mathbf {w}^\top \varvec{\sigma }_i\end{aligned}$$
(14a)
$$\begin{aligned} \hbox {s.t.} \ \,&\Vert \varvec{\sigma }_i \Vert _p \le \eta _i \end{aligned}$$
(14b)

Hölder’s inequality says that for a pair of dual norms \(L_p\) and \(L_q\) with \(p,q \in [1,\infty ]\) and \(1/p+1/q=1\), the following inequality holds:

$$\begin{aligned} \Vert fg \Vert _1 \le \Vert f \Vert _p \Vert g \Vert _q \end{aligned}$$
(15)

Therefore

$$\begin{aligned} | y_i \mathbf {w}^\top \varvec{\sigma }_i | \le \Vert \varvec{\sigma }_i \Vert _p \Vert \mathbf {w} \Vert _q \le \eta _i \Vert \mathbf {w} \Vert _q \end{aligned}$$
(16)

A lower bound of \( y_i \mathbf {w}^\top \varvec{\sigma }_i \) is \( -\eta _i \Vert \mathbf {w} \Vert _q \); substituting into the original problem gives the following formulation:

$$\begin{aligned} \min _{\mathbf {w},b,\xi _i} \ \,&\dfrac{1}{2} \Vert \mathbf {w} \Vert _2^2 + C \sum _{i=1}^m \xi _i \end{aligned}$$
(17a)
$$\begin{aligned} \hbox {s.t.} \ \,&y_i (\mathbf {w}^\top \bar{\mathbf {x}}_i +b) - \eta _i \Vert \mathbf {w} \Vert _q \ge 1-\xi _i, \ \ \ \xi _i \ge 0, \ \ i=1,\ldots ,m \end{aligned}$$
(17b)

The above formulation depends on the norm \(L_p\). When \(p=q=2\), the following second-order cone representable program is obtained:

$$\begin{aligned} \min _{\mathbf {w},b,\xi _i} \ \,&\dfrac{1}{2} \Vert \mathbf {w} \Vert _2^2 + C \sum _{i=1}^m \xi _i \end{aligned}$$
(18a)
$$\begin{aligned} \hbox {s.t.} \ \,&y_i (\mathbf {w}^\top \bar{\mathbf {x}}_i +b) - \eta _i \Vert \mathbf {w} \Vert _2 \ge 1-\xi _i, \ \ \ \xi _i \ge 0, \ \ i=1,\ldots ,m \end{aligned}$$
(18b)
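
A sketch of formulation (18a)–(18b) in cvxpy is given below; x_bar, eta and C are hypothetical nominal points, perturbation radii and trade-off parameter. The norm term turns each constraint into a second order cone constraint, which standard conic solvers handle directly.

```python
import cvxpy as cp
import numpy as np

# Hypothetical nominal points x_bar, L2 perturbation radii eta and trade-off C.
x_bar = np.array([[2.0, 2.0], [3.0, 3.0], [-1.0, -1.0], [-2.0, -1.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])
eta = np.full(len(y), 0.3)
C = 1.0
m, n = x_bar.shape

w, b = cp.Variable(n), cp.Variable()
xi = cp.Variable(m, nonneg=True)
constraints = [y[i] * (x_bar[i] @ w + b) - eta[i] * cp.norm(w, 2) >= 1 - xi[i]   # (18b)
               for i in range(m)]
prob = cp.Problem(cp.Minimize(0.5 * cp.sum_squares(w) + C * cp.sum(xi)), constraints)
prob.solve()
print("w =", w.value, "b =", b.value)
```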

An interesting property of the norm transformation is that for \(L_1\) and \(L_\infty \) norms, with the objective function \( \dfrac{1}{2} \Vert \mathbf {w} \Vert _q + C \sum _{i=1}^m \xi _i \), the problem can be transformed into a linear programming (LP) problem.

The dual of the \(L_1\) norm is the \(L_\infty \) norm. When \(p=1\), the formulation becomes:

$$\begin{aligned} \min _{\mathbf {w},b,\xi _i} \ \,&\dfrac{1}{2} \Vert \mathbf {w} \Vert _\infty + C \sum _{i=1}^m \xi _i \end{aligned}$$
(19a)
$$\begin{aligned} \hbox {s.t.} \ \,&y_i (\mathbf {w}^\top \bar{\mathbf {x}}_i +b) - \eta _i \Vert \mathbf {w} \Vert _\infty \ge 1-\xi _i, \ \ \ \xi _i \ge 0, \ \ i=1,\ldots ,m \end{aligned}$$
(19b)

Introducing an auxiliary variable \(\alpha = \Vert \mathbf {w} \Vert _\infty \), the above formulation can be written as an LP problem:

$$\begin{aligned} \min _{\mathbf {w},b,\xi _i} \ \,&\dfrac{1}{2} \alpha + C \sum _{i=1}^m \xi _i \end{aligned}$$
(20a)
$$\begin{aligned} \hbox {s.t.} \ \,&y_i (\mathbf {w}^\top \bar{\mathbf {x}}_i +b) - \eta _i \alpha \ge 1-\xi _i, \ \ \ \xi _i \ge 0, \ \ i=1,\ldots ,m \end{aligned}$$
(20b)
$$\begin{aligned}&\alpha \ge -w_j, \ \alpha \ge w_j, \ \ j=1,\ldots ,n \end{aligned}$$
(20c)

When the \(L_\infty \) norm is chosen to express the perturbation, the formulation becomes:

$$\begin{aligned} \min _{\mathbf {w},b,\xi _i} \ \,&\dfrac{1}{2} \Vert \mathbf {w} \Vert _1 + C \sum _{i=1}^m \xi _i \end{aligned}$$
(21a)
$$\begin{aligned} \hbox {s.t.} \ \,&y_i (\mathbf {w}^\top \bar{\mathbf {x}}_i +b) - \eta _i \Vert \mathbf {w} \Vert _1 \ge 1-\xi _i, \ \ \ \xi _i \ge 0, \ \ i=1,\ldots ,m \end{aligned}$$
(21b)

Introducing an auxiliary vector \(\varvec{\alpha }\) with \(\alpha _j = |w_j|\), the resulting optimization problem is also an LP:

$$\begin{aligned} \min _{\mathbf {w},b,\xi _i} \ \,&\dfrac{1}{2} \sum _{j=1}^n \alpha _j + C \sum _{i=1}^m \xi _i \end{aligned}$$
(22a)
$$\begin{aligned} \hbox {s.t.} \ \,&y_i (\mathbf {w}^\top \bar{\mathbf {x}}_i +b) - \eta _i \sum _{j=1}^n \alpha _j \ge 1-\xi _i, \ \ \ \xi _i \ge 0, \ \ i=1,\ldots ,m \end{aligned}$$
(22b)
$$\begin{aligned}&\alpha _j \ge -w_j, \ \alpha _j \ge w_j, \ \ j=1,\ldots ,n \end{aligned}$$
(22c)
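
The explicit LP reformulations (20) and (22) are what one would pass to an LP solver; with a modeling layer such as cvxpy the absolute values can also be left to the package, which performs an equivalent reformulation internally. A sketch of (21a)–(21b) with hypothetical data:

```python
import cvxpy as cp
import numpy as np

# Hypothetical data; the L_inf-bounded perturbation leads to the L1 norm of w in (21).
x_bar = np.array([[2.0, 2.0], [3.0, 3.0], [-1.0, -1.0], [-2.0, -1.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])
eta = np.full(len(y), 0.3)
C = 1.0
m, n = x_bar.shape

w, b = cp.Variable(n), cp.Variable()
xi = cp.Variable(m, nonneg=True)
constraints = [y[i] * (x_bar[i] @ w + b) - eta[i] * cp.norm(w, 1) >= 1 - xi[i]   # (21b)
               for i in range(m)]
cp.Problem(cp.Minimize(0.5 * cp.norm(w, 1) + C * cp.sum(xi)), constraints).solve()
print("w =", w.value, "b =", b.value)
```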

Ghaoui et al. [59] derived a robust model when the uncertainty is expressed as intervals (also known as support or extremum values). Suppose the extremum values of the uncertain data points are known, \( l_{ij} \le X_{ij} \le u_{ij} \); then each training data point \(X_i\) lies in a hyper-rectangle \(\fancyscript{R}_i = \{ \mathbf {x}_i=[x_{i1},\ldots ,x_{in}]^\top \in \mathbb {R}^n \ | \ l_{ij} \le x_{ij} \le u_{ij}, j=1,\ldots ,n \} \), and robust optimization requires that all points in the hyper-rectangle satisfy \( y_i (\mathbf {w}^\top \mathbf {x}_i +b) \ge 1-\xi _i, \forall \mathbf {x}_i \in \fancyscript{R}_i \). The geometric center of the hyper-rectangle \(\fancyscript{R}_i\) is \(\mathbf {c}_i = [c_{i1}, \ldots , c_{in}]^\top \in \mathbb {R}^n \) where \(c_{ij} = (l_{ij}+u_{ij}) / 2, j=1,\ldots ,n\), and the semi-lengths of the sides of \(\fancyscript{R}_i\) are \(s_{ij} = (u_{ij}-l_{ij}) / 2, j=1,\ldots ,n\). The worst case with this interval information is then:

$$\begin{aligned} y_i (\mathbf {w}^\top \mathbf {c}_i +b) \ge 1-\xi _i + \sum _{j=1}^n s_{ij} |w_j| \end{aligned}$$
(23)

Then the SVM model with support information can be written as:

$$\begin{aligned} \min _{\mathbf {w},b,\xi _i} \ \,&\dfrac{1}{2} \Vert \mathbf {w} \Vert _2^2 + C \sum _{i=1}^m \xi _i \end{aligned}$$
(24a)
$$\begin{aligned} \hbox {s.t.} \ \,&y_i (\mathbf {w}^\top \mathbf {c}_i +b) \ge 1-\xi _i+||\mathbf {S}_i \mathbf {w}||_1, \ \ \ \xi _i \ge 0, \ \ i=1,\ldots ,m \end{aligned}$$
(24b)

where \(\mathbf {S}_i\) is a diagonal matrix with entries \(s_{ij}\).
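
Formulation (24a)–(24b) is again directly expressible in a convex modeling language. The sketch below assumes cvxpy and hypothetical interval bounds l and u; the term \(||\mathbf {S}_i \mathbf {w}||_1\) is written as the \(L_1\) norm of the elementwise product of the semi-lengths with \(\mathbf {w}\).

```python
import cvxpy as cp
import numpy as np

# Hypothetical per-feature interval bounds l_ij <= X_ij <= u_ij.
l = np.array([[1.5, 1.5], [2.5, 2.5], [-1.5, -1.5], [-2.5, -1.5]])
u = l + 1.0
y = np.array([1.0, 1.0, -1.0, -1.0])
C = 1.0
m, n = l.shape
c = (l + u) / 2.0          # hyper-rectangle centers c_ij
s = (u - l) / 2.0          # semi-lengths s_ij (diagonal of S_i)

w, b = cp.Variable(n), cp.Variable()
xi = cp.Variable(m, nonneg=True)
constraints = [y[i] * (c[i] @ w + b) >= 1 - xi[i] + cp.norm(cp.multiply(s[i], w), 1)  # (24b)
               for i in range(m)]
cp.Problem(cp.Minimize(0.5 * cp.sum_squares(w) + C * cp.sum(xi)), constraints).solve()
print("w =", w.value, "b =", b.value)
```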

The interval uncertainty is a special case of polyhedral uncertainty [60]. Polyhedral uncertainty can be expressed as \(\mathbf {D}_i \mathbf {x}_i \le \mathbf {d}_i\), where the matrix \(\mathbf {D}_i \in \mathbb {R}^{q \times n}\) and the vector \(\mathbf {d}_i \in \mathbb {R}^q\). Since zero rows can be added to obtain the same number \(q\) of inequalities for all data points, \(q\) is the largest dimension of the uncertainties over all the points. The robust SVM with polyhedral uncertainty is:

$$\begin{aligned} \min _{\mathbf {w},b,\xi _i} \ \,&\dfrac{1}{2} \Vert \mathbf {w} \Vert _2^2 + C \sum _{i=1}^m \xi _i \end{aligned}$$
(25a)
$$\begin{aligned} \hbox {s.t.} \ \,&\min _{\{ \mathbf {x}_i: \mathbf {D}_i \mathbf {x}_i \le \mathbf {d}_i \}} y_i (\mathbf {w}^\top \mathbf {x}_i +b) \ge 1-\xi _i, \ \ \ \xi _i \ge 0, \ \ i=1,\ldots ,m \end{aligned}$$
(25b)

The constraint \(\min _{\{ \mathbf {x}_i: \mathbf {D}_i \mathbf {x}_i \le \mathbf {d}_i \}} y_i (\mathbf {w}^\top \mathbf {x}_i +b) \ge 1-\xi _i\) is equivalent to:

$$\begin{aligned} \max _{\{ \mathbf {x}_i: \mathbf {D}_i \mathbf {x}_i \le \mathbf {d}_i \}} (-y_i \mathbf {w}^\top \mathbf {x}_i) - y_i b \le -1+\xi _i \end{aligned}$$
(26)

To handle the inner maximization, consider the linear program:

$$\begin{aligned} \max \ \,&-y_i \mathbf {w}^\top \mathbf {x}_i \end{aligned}$$
(27a)
$$\begin{aligned} \hbox {s.t.} \ \,&\mathbf {D}_i \mathbf {x}_i \le \mathbf {d}_i \end{aligned}$$
(27b)

The dual is:

$$\begin{aligned} \min \ \,&\mathbf {d}_i^\top \mathbf {z}_i \end{aligned}$$
(28a)
$$\begin{aligned} \hbox {s.t.} \ \,&\mathbf {D}_i^\top \mathbf {z}_i = -y_i \mathbf {w} \end{aligned}$$
(28b)
$$\begin{aligned}&\mathbf {z}_i = (z_{i1}, \ldots , z_{iq})^\top \ge 0 \end{aligned}$$
(28c)

Strong duality guarantees that the objective values of the dual and primal are equal. Therefore, the robust SVM formulation with polyhedral uncertainty is equivalent to:

$$\begin{aligned} \min _{\mathbf {w},b,\xi _i,\mathbf {z}} \ \,&\dfrac{1}{2} \Vert \mathbf {w} \Vert _2^2 + C \sum _{i=1}^m \xi _i \end{aligned}$$
(29a)
$$\begin{aligned} \hbox {s.t.} \ \,&\mathbf {d}_i^\top \mathbf {z}_i - y_i b \le -1+\xi _i , \ \ \ \xi _i \ge 0 \end{aligned}$$
(29b)
$$\begin{aligned}&\mathbf {D}_i^\top \mathbf {z}_i + y_i \,\mathbf {w} = 0, \ \ \ \mathbf {z}_i = (z_{i1}, \ldots , z_{iq})^\top \end{aligned}$$
(29c)
$$\begin{aligned}&z_{ij} \ge 0, \ \ \ i=1,\ldots ,m, \ \ \ j = 1,\ldots ,q \end{aligned}$$
(29d)
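
The reformulation (29a)–(29d) can be written down almost verbatim with cvxpy. The sketch below encodes a simple box uncertainty set as a polyhedron (cf. (31) below) purely for illustration; D, d and the data are hypothetical placeholders.

```python
import cvxpy as cp
import numpy as np

# Hypothetical nominal points and box half-widths, encoded as D_i x_i <= d_i.
x0 = np.array([[2.0, 2.0], [3.0, 3.0], [-1.0, -1.0], [-2.0, -1.0]])
delta = 0.5 * np.ones_like(x0)
y = np.array([1.0, 1.0, -1.0, -1.0])
C = 1.0
m, n = x0.shape
q = 2 * n
I = np.eye(n)
D = [np.vstack([I, -I]) for _ in range(m)]
d = [np.hstack([x0[i] + delta[i], -x0[i] + delta[i]]) for i in range(m)]

w, b = cp.Variable(n), cp.Variable()
xi = cp.Variable(m, nonneg=True)
z = [cp.Variable(q, nonneg=True) for _ in range(m)]        # dual multipliers z_i >= 0

constraints = []
for i in range(m):
    constraints += [d[i] @ z[i] - y[i] * b <= -1 + xi[i],  # (29b)
                    D[i].T @ z[i] + y[i] * w == 0]         # (29c)
cp.Problem(cp.Minimize(0.5 * cp.sum_squares(w) + C * cp.sum(xi)), constraints).solve()
print("w =", w.value, "b =", b.value)
```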

The authors also proved that for the hard margin SVM (i.e., when there is no \(\xi _i\)), the dual of the above formulation is:

$$\begin{aligned} \min _{\lambda ,\mu } \ \,&\sum _{i=1}^m \lambda _i - {1 \over 2} \sum _{k=1}^n \left( \sum _{i=1}^m y_i \mu _{ik} \right) ^2 \end{aligned}$$
(30a)
$$\begin{aligned} \hbox {s.t.} \ \,&\lambda _i d_{ij} + \sum _{k=1}^n \mu _{ik} D_{ijk} = 0, \ i=1,\ldots ,m, \ j = 1,\ldots ,q\end{aligned}$$
(30b)
$$\begin{aligned}&\sum _{i=1}^m \lambda _i y_i = 0 \end{aligned}$$
(30c)
$$\begin{aligned}&\lambda _i \ge 0, \ i=1,\ldots ,m \end{aligned}$$
(30d)

The interval uncertainty \([\mathbf {x}_i^0-\varvec{\delta }_i, \mathbf {x}_i^0+\varvec{\delta }_i]\) is a special case of polyhedral uncertainty, since by defining

$$\begin{aligned} \mathbf {D}_i = \begin{pmatrix} I \\ -I \end{pmatrix}, \ \ \ \mathbf {d}_i = \begin{pmatrix} \mathbf {x}_i^0+\varvec{\delta }_i \\ -\mathbf {x}_i^0+\varvec{\delta }_i \end{pmatrix} \end{aligned}$$
(31)

\( \{ \mathbf {x}_i : \mathbf {x}_i \in [\mathbf {x}_i^0-\varvec{\delta }_i, \mathbf {x}_i^0+\varvec{\delta }_i] \} \) and \( \{ \mathbf {x}_i : \mathbf {D}_i \mathbf {x}_i \le \mathbf {d}_i \} \) are equivalent. The authors of [60] also proposed probabilistic bounds on constraint violation in this case.

3.2 Chance Constrained SVM through Robust Optimization

The chance-constrained program (CCP) is used to ensure a small probability of misclassification for the uncertain data. The chance-constrained SVM formulation is:

$$\begin{aligned} \min _{\mathbf {w},b,\xi _i} \ \,&\dfrac{1}{2} \Vert \mathbf {w} \Vert _2^2 + C \sum _{i=1}^m \xi _i \end{aligned}$$
(32a)
$$\begin{aligned} \hbox {s.t.} \ \,&\hbox {Prob} \Bigl \{ y_i (\mathbf {w}^\top X_i +b) \le 1-\xi _i \Bigr \} \le \varepsilon , \ \ \ \xi _i \ge 0, \ \ i=1,\ldots ,m \end{aligned}$$
(32b)

where \( 0 < \varepsilon \le 1 \) is a parameter given by the user and close to 0. This model ensures an upper bound on the misclassification probability, but the chance constraints are typically non-convex, so the problem is very hard to solve.

The approach taken so far to deal with the chance constraints is to transform them using different bounding inequalities. When the mean and covariance matrix are known, the multivariate Chebyshev bound via robust optimization can be used to reformulate the chance constraints above [61, 62].

Markov’s inequality states that if \(X\) is a nonnegative random variable and \(a>0\), then

$$\begin{aligned} \hbox {Prob} \{ X \ge a \} \le { \mathbf {E} [X] \over a } \end{aligned}$$
(33)

Consider the random variable \( \bigl ( X-\mathbf {E} [X] \bigr )^2 \). Applying Markov's inequality and using \( \mathbf {Var} (X) = \mathbf {E} \bigl [ (X-\mathbf {E} [X])^2 \bigr ] \) gives

$$\begin{aligned} \hbox {Prob} \{ \bigl ( X-\mathbf {E} [X] \bigr )^2 \ge a^2 \} \le { \mathbf {Var} (X) \over a^2 } \end{aligned}$$
(34)

which yields Chebyshev’s inequality

$$\begin{aligned} \hbox {Prob} \{ \bigl | X-\mathbf {E} [X] \bigr | \ge a \} \le { \mathbf {Var} (X) \over a^2 } \end{aligned}$$
(35)

Let \( \mathbf {x} \sim (\mathbf {\mu }, \Sigma ) \) denote a random vector \( \mathbf {x} \) with mean \(\mathbf {\mu }\) and covariance matrix \(\Sigma \). The multivariate Chebyshev inequality [65, 66] states that for an arbitrary closed convex set \(S\), the supremum of the probability that \( \mathbf {x} \) takes a value in \(S\) is

$$\begin{aligned} \sup _{\mathbf {x} \sim (\mathbf {\mu }, \mathbf {\Sigma })} \hbox {Prob} \{\mathbf {x} \in S\}&= { 1 \over 1+d^2 } \end{aligned}$$
(36a)
$$\begin{aligned} d^2&= \inf _{\mathbf {x} \in S} (\mathbf {x} - \mathbf {\mu })^\top \Sigma ^{-1} (\mathbf {x} - \mathbf {\mu }) \end{aligned}$$
(36b)

For the constraint \(\hbox {Prob} \{ \mathbf {w}^\top \mathbf {x} + b \le 0 \} \le \varepsilon \), it can be derived that:

$$\begin{aligned} \mathbf {w}^\top \mathbf {\mu } + b \ge \kappa _C ||\Sigma ^{1 \over 2} \mathbf {w}||_2 \end{aligned}$$
(37)

where \( \kappa _C=\sqrt{(1-\varepsilon )/\varepsilon } \).

Applying the above result to the chance constrained SVM, the Chebyshev based reformulation utilizing the means \(\mathbf {\mu }_i \) and covariance matrix \(\Sigma _i\) of each uncertain training point \(X_i\) can be obtained as the following robust model [61, 62]:

$$\begin{aligned} \min _{\mathbf {w},b,\xi _i} \ \,&\dfrac{1}{2} \Vert \mathbf {w} \Vert _2^2 + C \sum _{i=1}^m \xi _i \end{aligned}$$
(38a)
$$\begin{aligned} \hbox {s.t.} \ \,&y_i (\mathbf {w}^\top \mu _i +b) \ge 1-\xi _i+\kappa _C||\Sigma _i^{1 \over 2} \mathbf {w}||_2, \ \ \ \xi _i \ge 0, \ \ i=1,\ldots ,m \end{aligned}$$
(38b)
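
The Chebyshev based model (38a)–(38b) is a second order cone program. A cvxpy sketch follows, with hypothetical means mu, covariance square roots Sigma_sqrt and misclassification bound epsilon:

```python
import cvxpy as cp
import numpy as np

# Hypothetical means mu_i, covariance square roots Sigma_i^{1/2} and chance level epsilon.
mu = np.array([[2.0, 2.0], [3.0, 3.0], [-1.0, -1.0], [-2.0, -1.0]])
Sigma_sqrt = [0.3 * np.eye(2) for _ in range(4)]
y = np.array([1.0, 1.0, -1.0, -1.0])
C, eps = 1.0, 0.1
m, n = mu.shape
kappa_C = np.sqrt((1 - eps) / eps)

w, b = cp.Variable(n), cp.Variable()
xi = cp.Variable(m, nonneg=True)
constraints = [y[i] * (mu[i] @ w + b) >= 1 - xi[i]
               + kappa_C * cp.norm(Sigma_sqrt[i] @ w, 2)   # (38b)
               for i in range(m)]
cp.Problem(cp.Minimize(0.5 * cp.sum_squares(w) + C * cp.sum(xi)), constraints).solve()
print("w =", w.value, "b =", b.value)
```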

Another approach to SVM with chance constraints uses Bernstein approximation schemes [67–69]. Ben-Tal et al. [63] employed Bernstein bounding schemes to relax the CCP and transformed the problem into a convex second order cone program with robust set constraints, which guarantees satisfaction of the chance constraints and can be solved efficiently using interior point solvers.

The Bernstein based relaxation utilizes both the support (bounds, i.e. extremum values of the data points) and the moment information (mean and variance). For a random data point \( X_i = [X_{i1},\ldots ,X_{in}]^\top \) with label \( y_i \): the support information consists of bounds on the data points, \( l_{ij} \le X_{ij} \le u_{ij} \), i.e. \( X_i \in \fancyscript{R}_i = \{ \mathbf {x}_i=[x_{i1},\ldots ,x_{in}]^\top \in \mathbb {R}^n \ | \ l_{ij} \le x_{ij} \le u_{ij}, j=1,\ldots ,n \} \); the first-moment information consists of bounds on the means of the data points, \( \mu _i^- = [\mu _{i1}^-,\ldots ,\mu _{in}^-]^\top \le \mu _i = \mathbf {E}[X_i] = [\mathbf {E}[X_{i1}],\ldots ,\mathbf {E}[X_{in}]]^\top \le \mu _i^+ = [\mu _{i1}^+,\ldots ,\mu _{in}^+]^\top \); and the second-moment information consists of bounds on the second moments of the data points, \( 0 \le \mathbf {E}[X_{ij}^2] \le \sigma _{ij}^2 \).

The Bernstein based relaxation derives convex constraints whose satisfaction guarantees that the chance constraints are satisfied. The authors proved that, assuming the components \(X_{ij}\) are independent and given the support \( l_{ij} \le X_{ij} \le u_{ij} \), bounds on the first moment \( \mu _{ij}^- \le \mu _{ij} = \mathbf {E}[X_{ij}] \le \mu _{ij}^+ \), and bounds on the second moment \( 0 \le \mathbf {E}[X_{ij}^2] \le \sigma _{ij}^2 \), the chance constraint in SVM is satisfied if the following convex constraint holds:

$$\begin{aligned} 1 - \xi _i - y_i b + \sum _j \Bigl ( \max \bigl [ -y_i \mu _{ij}^- w_j, -y_i \mu _{ij}^+ w_j \bigr ] \Bigr ) + \kappa _B ||\Sigma _i \mathbf {w}||_2 \le 0 \end{aligned}$$
(39)

where \( \kappa _B = \sqrt{2\log (1/\varepsilon )} \), and the diagonal matrix

$$\begin{aligned} \Sigma _i = \hbox {diag} \Bigl ( s_{i1} \nu ( \mu _{i1}^-, \mu _{i1}^+, \sigma _{i1} ), \ldots , s_{in} \nu ( \mu _{in}^-, \mu _{in}^+, \sigma _{in} ) \Bigr ) \end{aligned}$$
(40)

where \( s_{ij} = { u_{ij} - l_{ij} \over 2 } \) and the function \( \nu ( \mu _{ij}^-, \mu _{ij}^+, \sigma _{ij} ) \) is defined via the normalized variable \( \hat{X}_{ij} = { X_{ij} - c_{ij} \over s_{ij} } \), where \( c_{ij} = {l_{ij}+u_{ij} \over 2} \). Using the information on \( X_{ij} \), one can easily compute the moment information of \( \hat{X}_{ij} \), denoted by \( \hat{\mu }_{ij}^- \le \hat{\mu }_{ij} = \mathbf {E}[\hat{X}_{ij}] \le \hat{\mu }_{ij}^+ \) and \( 0 \le \mathbf {E}[\hat{X}_{ij}^2] \le \hat{\sigma }_{ij}^2 \). They proved that

$$\begin{aligned} \mathbf {E} \Bigl [ \exp \{\tilde{t} \hat{X}_{ij} \} \Bigr ] \le g_{\hat{\mu }_{ij},\hat{\sigma }_{ij}} (\tilde{t}) = {\left\{ \begin{array}{ll} {(1-\hat{\mu }_{ij})^2 \exp \Bigl \{ \tilde{t} { \hat{\mu }_{ij}-\hat{\sigma }_{ij}^2 \over 1-\hat{\mu }_{ij}} \Bigr \} \,+\, \bigl ( \hat{\sigma }_{ij}^2 - \hat{\mu }_{ij}^2 \bigr ) \exp \{ \tilde{t} \} \over 1-2\hat{\mu }_{ij}\,+\,\hat{\sigma }_{ij}^2}, &{}\tilde{t} \ge 0 \\ {(1+\hat{\mu }_{ij})^2 \exp \Bigl \{ \tilde{t} { \hat{\mu }_{ij}+\hat{\sigma }_{ij}^2 \over 1+\hat{\mu }_{ij}} \Bigr \} \,+\,\bigl ( \hat{\sigma }_{ij}^2 - \hat{\mu }_{ij}^2 \bigr ) \exp \{ - \tilde{t} \} \over 1+2\hat{\mu }_{ij}\,+\,\hat{\sigma }_{ij}^2}, &{}\tilde{t} \le 0 \end{array}\right. }\nonumber \\ \end{aligned}$$
(41)

They defined \( h_{\hat{\mu }_{ij},\hat{\sigma }_{ij}} (\tilde{t}) = \log g_{\hat{\mu }_{ij},\hat{\sigma }_{ij}} (\tilde{t}) \), and the function \(\nu ( \mu ^-, \mu ^+, \sigma )\) is defined as:

$$\begin{aligned} \nu ( \mu ^-, \mu ^+, \sigma )&= \min \Bigl \{ k \ge 0 : h_{\hat{\mu },\hat{\sigma }} (\tilde{t}) \le \max [\hat{\mu }^- \tilde{t}, \hat{\mu }^+ \tilde{t}] + {k^2 \over 2} \tilde{t}^2, \nonumber \\&\quad \ \forall \hat{\mu } \in [\hat{\mu }^-,\hat{\mu }^+], \tilde{t} \Bigr \} \end{aligned}$$
(42)

This value can be calculated numerically. Under the condition that \( \mu _{ij}^- \le c_{ij} \le \mu _{ij}^+ \), this value can be computed analytically by \( \nu ( \mu ^-, \mu ^+, \sigma ) = \sqrt{1-(\hat{\mu }^{\min })^2} \), where \( \hat{\mu }^{\min } = \min (-\hat{\mu }^-, \hat{\mu }^+ ) \).
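
As a small numerical illustration (a sketch under the stated condition \( \mu _{ij}^- \le c_{ij} \le \mu _{ij}^+ \)), the following helper normalizes the first-moment bounds of a single component \(X_{ij}\) and evaluates the analytic expression for \(\nu \); the example values are hypothetical.

```python
import numpy as np

def nu_analytic(l_ij, u_ij, mu_minus, mu_plus):
    """Analytic nu(mu^-, mu^+, sigma), valid when mu^- <= c_ij <= mu^+."""
    c_ij = (l_ij + u_ij) / 2.0
    s_ij = (u_ij - l_ij) / 2.0
    mu_hat_minus = (mu_minus - c_ij) / s_ij     # normalized first-moment bounds
    mu_hat_plus = (mu_plus - c_ij) / s_ij
    mu_hat_min = min(-mu_hat_minus, mu_hat_plus)
    return np.sqrt(1.0 - mu_hat_min ** 2)

# Example: support [0, 2] and mean known to lie in [0.8, 1.2].
print(nu_analytic(0.0, 2.0, 0.8, 1.2))          # ~0.98
```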

Replacing the chance-constraints in SVM by the convex constraint derived above, the problem is transformed into a convex second order cone program:

$$\begin{aligned} \min _{\mathbf {w},b,\xi _i,z_{ij}} \ \,&\dfrac{1}{2} \Vert \mathbf {w} \Vert _2^2 + C \sum _{i=1}^m \xi _i \end{aligned}$$
(43a)
$$\begin{aligned} \hbox {s.t.} \ \,&1 - \xi _i - y_i b + \sum _j z_{ij} + \kappa _B ||\Sigma _i \mathbf {w}||_2 \le 0 \end{aligned}$$
(43b)
$$\begin{aligned}&z_{ij} \ge -y_i \mu _{ij}^- w_j, \ z_{ij} \ge -y_i \mu _{ij}^+ w_j\end{aligned}$$
(43c)
$$\begin{aligned}&\xi _i \ge 0, \ \ i=1,\ldots ,m \end{aligned}$$
(43d)

which can be solved efficiently using cone programming solvers.
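
A cvxpy sketch of (43a)–(43d) is given below; mu_minus, mu_plus and sv (the precomputed diagonal entries \(s_{ij}\,\nu _{ij}\) of \(\Sigma _i\), e.g. from nu_analytic above) are hypothetical placeholders, and the per-point constraints follow (43b)–(43c) directly.

```python
import cvxpy as cp
import numpy as np

# Hypothetical first-moment bounds and precomputed s_ij * nu_ij (diagonal of Sigma_i).
mu_minus = np.array([[1.8, 1.8], [2.8, 2.8], [-1.2, -1.2], [-2.2, -1.2]])
mu_plus = mu_minus + 0.4
sv = 0.5 * np.ones_like(mu_minus)
y = np.array([1.0, 1.0, -1.0, -1.0])
C, eps = 1.0, 0.1
m, n = mu_minus.shape
kappa_B = np.sqrt(2 * np.log(1 / eps))

w, b = cp.Variable(n), cp.Variable()
xi = cp.Variable(m, nonneg=True)
z = cp.Variable((m, n))

constraints = []
for i in range(m):
    constraints += [
        1 - xi[i] - y[i] * b + cp.sum(z[i])
        + kappa_B * cp.norm(cp.multiply(sv[i], w), 2) <= 0,    # (43b)
        z[i] >= -y[i] * cp.multiply(mu_minus[i], w),           # (43c)
        z[i] >= -y[i] * cp.multiply(mu_plus[i], w),
    ]
cp.Problem(cp.Minimize(0.5 * cp.sum_squares(w) + C * cp.sum(xi)), constraints).solve()
print("w =", w.value, "b =", b.value)
```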

The geometrical interpretation of this convex constraint is that \( y_i (\mathbf {w}^\top \mathbf {x} +b) \ge 1-\xi _i \) is satisfied for all \( \mathbf {x} \) belonging to the union of ellipsoids \( \fancyscript{E} \bigl ( \mu _i, \kappa _B \Sigma _i \bigr ) = \bigl \{ \mathbf {x} = \mu _i + \kappa _B \Sigma _i \mathbf {a} \ : \ || \mathbf {a} ||_2 \le 1 \bigr \} \) with center \( \mu _i \), shape and size \( \kappa _B \Sigma _i \), and the union is over \( \mu _i \in [\mu _i^-, \mu _i^+] \), i.e.,

$$\begin{aligned} y_i (\mathbf {w}^\top \mathbf {x} +b) \ge 1-\xi _i, \ \ \forall \mathbf {x} \in \cup _{\mu _i \in [\mu _i^-, \mu _i^+]} \fancyscript{E} \bigl ( \mu _i, \kappa _B \Sigma _i \bigr ) \end{aligned}$$
(44)

Therefore, this constraint is defining an uncertainty set \( \cup _{\mu _i \in [\mu _i^-, \mu _i^+]} \fancyscript{E} \bigl ( \mu _i, \kappa _B \Sigma _i \bigr ) \) for each uncertain training data point \( X_i \). If all the points in the uncertainty set satisfy \( y_i (\mathbf {w}^\top \mathbf {x} +b) \ge 1-\xi _i \), then the chance-constraint is guaranteed to be satisfied. This transforms the CCP into a robust optimization problem over the uncertainty set.

Since the size of the uncertainty set depends on \(\kappa _B\) and \( \kappa _B = \sqrt{2\log (1/\varepsilon )} \), when the upper bound \(\varepsilon \) on the misclassification probability decreases, the size of the uncertainty set increases. When \(\varepsilon \) is very small, the uncertainty set becomes huge and the constraint becomes too conservative. As the support information provides the bounding hyper-rectangle \(\fancyscript{R}_i\) in which the true training data point \(X_i\) always lies, a less conservative classifier can be obtained by taking the intersection of \( \cup _{\mu _i \in [\mu _i^-, \mu _i^+]} \fancyscript{E} \bigl ( \mu _i, \kappa _B \Sigma _i \bigr ) \) and \(\fancyscript{R}_i\) as the new uncertainty set.

The authors proved that when the uncertainty set is the intersection, i.e.,

$$\begin{aligned} y_i (\mathbf {w}^\top \mathbf {x} +b) \ge 1-\xi _i, \ \ \forall \mathbf {x} \in \Bigg ( \cup _{\mu _i \in [\mu _i^-, \mu _i^+]} \fancyscript{E} \bigl ( \mu _i, \kappa _B \Sigma _i \bigr ) \Bigg ) \cap \fancyscript{R}_i \end{aligned}$$
(45)

then constraint (45) is satisfied if and only if the following convex constraint holds:

$$\begin{aligned}&\sum _j \Bigl ( \max \bigl [ -l_{ij} (y_i w_j + a_{ij}), -u_{ij} (y_i w_j + a_{ij}) \bigr ] + \max \bigl [ \mu _{ij}^- a_{ij}, \mu _{ij}^+ a_{ij} \bigr ] \Bigr )\nonumber \\&\qquad + 1 - \xi _i - y_i b + \kappa _B ||\Sigma _i \mathbf {a}_i||_2 \le 0 \end{aligned}$$
(46)

Replacing the chance-constraints in SVM by the robust but less conservative convex constraint above, the problem is transformed into the following SOCP:

$$\begin{aligned} \min _{\mathbf {w},b,\xi _i,z_{ij},\tilde{z}_{ij},\mathbf {a}_i} \ \,&\dfrac{1}{2} \Vert \mathbf {w} \Vert _2^2 + C \sum _{i=1}^m \xi _i \end{aligned}$$
(47a)
$$\begin{aligned} \hbox {s.t.} \ \,&1 - \xi _i - y_i b + \sum _j \tilde{z}_{ij} + \sum _j z_{ij} + \kappa _B ||\Sigma _i \mathbf {a}_i||_2 \le 0 \end{aligned}$$
(47b)
$$\begin{aligned}&z_{ij} \ge \mu _{ij}^- a_{ij}, \ z_{ij} \ge \mu _{ij}^+ a_{ij} \end{aligned}$$
(47c)
$$\begin{aligned}&\tilde{z}_{ij} \ge -l_{ij} (y_i w_j + a_{ij}), \ \tilde{z}_{ij} \ge -u_{ij} (y_i w_j + a_{ij}) \end{aligned}$$
(47d)
$$\begin{aligned}&\xi _i \ge 0, \ \ i=1,\ldots ,m \end{aligned}$$
(47e)

The Bernstein based formulations (43) and (47) are robust to moment estimation errors in addition to the uncertainty in the data, since they use bounds on the mean \(\big (\mu _{ij}^-,\mu _{ij}^+\big ) \) and on the second moment \(\big (\sigma _{ij}^2\big )\) instead of the exact values of the moments, which are often unknown.

Comparing the two approaches to chance constrained SVM, both are robust to uncertainties in the data and make no assumptions about the underlying probability distribution. Chebyshev based schemes employ only the moment information of the uncertain training points, while Bernstein bounds employ both support and moment information and can therefore be less conservative than Chebyshev bounds. The classifier resulting from the Bernstein approach achieves larger classification margins and therefore better generalization ability according to the structural risk minimization principle of Vapnik [1]. A drawback of the Bernstein based formulation is that it assumes the components \(X_{ij}\) are mutually independent, while the Chebyshev based formulation can employ the full covariance matrix \(\Sigma _i\) of the uncertain training point \(X_i\).

4 Concluding Remarks

This paper presented a survey on SVM with uncertainties. When the direct model cannot guarantee generally good performance over the uncertainty set, robust optimization is utilized to obtain optimal performance under the worst case scenario. The perturbation of the uncertain data can be bounded in norm, or expressed as intervals or polyhedra. When the constraint is a chance constraint, different bounding schemes such as the multivariate Chebyshev inequality and Bernstein bounding schemes are used to ensure a small probability of misclassification for the uncertain data.

The models in the literature generally address the linear SVM. A large part of the power of SVM lies in nonlinear kernels, which generate nonlinear classification boundaries. Therefore, more study could be conducted on how to handle nonlinear kernels. In addition, more schemes could be explored to represent the robust regions of the uncertain data and to formulate the models as tractable convex problems.