1 Introduction

As one of the most important classification methods, SVMs have achieved great success in numerous real-world problems (Vapnik 1999; Burges 1998), such as image recognition (VenkateswarLal et al. 2019), disease diagnosis (Wang et al. 2018; Okwuashi and Ndehedehe 2020; Gautam et al. 2021), intrusion detection (Mukkamala et al. 2002; Priyadharsini and Chitra 2021) and so on. Traditionally, SVMs are designed to search for a tube with the maximum margin based on precise data. A tube is uniquely determined by a hyperplane and the minimum distance from the data to the hyperplane. Different SVMs are constructed according to the complexity of the classification problem. The hard margin method was first developed by Vapnik (1995) for linearly separable data sets, in which the data with positive labels and those with negative labels lie in different half-spaces determined by a hyperplane. The parameters of the hyperplane can be expressed as a linear combination of support vectors (Boser et al. 1992). Nevertheless, most data sets are linearly nonseparable, and the SVM based on the hard margin method fails on such problems. Thus, Cortes and Vapnik (1995) introduced slack variables for misclassified samples so that linearly nonseparable training sets can be classified while still pursuing the maximum margin, which is known as the soft margin method.

However, when the observations are imprecise or the samples are not large enough, the results obtained in the framework of probability theory are usually not satisfactory. In order to handle such cases, Liu (2007) founded uncertainty theory, and then Liu (2009, 2010) further improved and perfected the theory. Following this idea, uncertain statistics was developed to handle problems with imprecise observations. Yao and Liu (2018) proposed a least squares method for regression by characterizing imprecise observations as uncertain variables. Then several other regression models (Hu and Gao 2020; Fang and Hong 2020; Zhang et al. 2020) were further studied, and parameter estimation methods (Liu and Yang 2020; Chen 2020; Li et al. 2022) were discussed. In the meantime, the confidence interval (Lio and Liu 2018) and hypothesis test (Lio and Liu 2020) for uncertain statistics were also introduced. For time series analysis with imprecise observations, autoregressive models (Yang and Liu 2019) and autoregressive moving average models (Lu et al. 2020; Xin et al. 2021) were introduced. Applications in other fields were also explored, such as uncertain differential games (Zhang et al. 2021; Yang and Gao 2016), uncertain extensive games (Wang et al. 2017), COVID-19 spread (Lio 2021) and uncertain queueing models (Yao 2021).

For classification problems with imprecise observations, Qin and Li (2022) introduced a USVM based on the hard margin method in the framework of uncertainty theory, which extended the traditional hard margin SVM. However, the hard margin USVM is only suitable for linearly \(\alpha\)-separable data sets with imprecise observations. In this paper, we propose a USVM based on the soft margin method for classification problems with linearly \(\alpha\)-nonseparable data sets. Similarly, we model the imprecise observations as uncertain variables and formulate an optimization model for the soft margin USVM. Further, we conduct numerical experiments to illustrate the application of the proposed method and evaluate its performance.

The paper is organized as follows. Some definitions and theorems in uncertainty theory are given in Sect. 2. In Sect. 3, we formulate a USVM based on the soft margin method for linearly \(\alpha\)-nonseparable data sets in an uncertain environment. Then, Sect. 4 presents two examples to show the application of the soft margin USVM. Finally, Sect. 5 concludes the paper.

2 Preliminaries

In this section, we sketch some definitions and theorems used in this paper.

Let \(\mathcal {L}\) be a \(\sigma\)-algebra on a nonempty set \(\Gamma\). A set function \(\mathcal {M}:\mathcal {L}\rightarrow [0,1]\) is called an uncertain measure (Liu 2007, 2009) if it satisfies: (1) \(\mathcal {M}\{\Gamma \}=1\) for the universal set \(\Gamma\); (2) \(\mathcal {M}\{\Lambda \}+\mathcal {M}\{\Lambda ^c\}=1\) for any \(\Lambda \in \mathcal {L}\); (3) \(\mathcal {M}\left\{ \bigcup _{i=1}^{\infty } \Lambda _i\right\} \le \sum _{i=1}^{\infty } \mathcal {M}\left\{ \Lambda _i\right\}\) for every countable sequence \(\Lambda _1,\Lambda _2,\ldots\). The triple \((\Gamma ,\mathcal {L},\mathcal {M})\) is then called an uncertainty space. (4) Let \((\Gamma _k, \mathcal {L}_k, \mathcal {M}_k)\) be uncertainty spaces for \(k=1, 2, \ldots\) The product uncertain measure \(\mathcal {M}\) is an uncertain measure satisfying

$$\begin{aligned} \mathcal {M}\left\{ \prod \limits _{k=1}^{\infty } \Lambda _k \right\} = \bigwedge \limits _{k=1}^{\infty } \mathcal {M}_k\{ \Lambda _k\}, \end{aligned}$$

where \(\Lambda _k\) are arbitrarily chosen sets from \(\mathcal {L}_k\) for \(k = 1,\ 2,\ldots\), respectively.

An uncertain variable (Liu 2007) \(\tau\) is a measurable function from an uncertainty space \((\Gamma ,\mathcal {L},\mathcal {M})\) to the set of real numbers, i.e., the set \(\{\tau \in B \}= \{\gamma \in \Gamma \mid \tau (\gamma ) \in B \}\) is an event in \(\mathcal {L}\) for any Borel set B. The function \(\Upsilon (x)=\mathcal {M}\{ \tau \le x \}, x\in \Re\) is called the uncertainty distribution of \(\tau\).

Theorem 1

(Liu 2010) Let \(\xi _1,\xi _2,\ldots ,\xi _n\) be independent uncertain variables with inverse uncertainty distributions \(\Phi ^{-1}_1,\Phi ^{-1}_2,\ldots ,\Phi ^{-1}_n\), respectively. If \(f(x_1,x_2,\ldots ,x_n)\) is strictly increasing with respect to \(x_1,x_2,\ldots ,\) \(x_m\), and strictly decreasing with respect to \(x_{m+1},x_{m+2}, \ldots , x_{n}\), then the uncertain variable \(\xi =f(\xi _1,\xi _2,\ldots ,\xi _n)\) has an inverse uncertainty distribution

$$\begin{aligned} \begin{aligned} \Psi ^{-1}(u)=f\left( \Phi _1^{-1}(u),\Phi _2^{-1}(u),\ldots ,\Phi _{m}^{-1}(u), \right. \\ \left. \qquad \qquad \qquad \Phi _{m+1}^{-1} (1-u), \ldots ,\Phi _n^{-1}(1-u)\right) . \\ \end{aligned} \end{aligned}$$
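As a quick illustration of Theorem 1, the following Python sketch (ours, not part of the paper) builds the inverse uncertainty distribution of \(f(\xi _1,\xi _2)=\xi _1-\xi _2\) for two linear uncertain variables; the distributions and the choice of f are hypothetical.

```python
# Illustrative sketch of Theorem 1 (not from the paper): the inverse uncertainty
# distribution of f(xi1, xi2) = xi1 - xi2, which is strictly increasing in xi1
# and strictly decreasing in xi2.

def linear_inv(a, b):
    """Inverse uncertainty distribution of a linear uncertain variable L(a, b)."""
    return lambda u: (1 - u) * a + u * b

phi1_inv = linear_inv(1.0, 3.0)   # xi1 ~ L(1, 3), hypothetical parameters
phi2_inv = linear_inv(0.0, 2.0)   # xi2 ~ L(0, 2), hypothetical parameters

def psi_inv(u):
    # Theorem 1: increasing arguments take u, decreasing arguments take 1 - u.
    return phi1_inv(u) - phi2_inv(1 - u)

print(psi_inv(0.9))   # e.g. Psi^{-1}(0.9) = 2.8 - 0.2 = 2.6
```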

Theorem 2

(Liu 2015) Assume the function \(g(\pmb {x}\), \(\xi _1\), \(\xi _2, \ldots , \xi _n)\) is strictly increasing with respect to \(\xi _1, \xi _2, \ldots , \xi _k\) and strictly decreasing with respect to \(\xi _{k+1}, \xi _{k+2}, \ldots , \xi _n\). If \(\xi _1, \xi _2, \ldots , \xi _n\) are independent uncertain variables with inverse uncertainty distributions \(\Phi _1^{-1}, \Phi _2^{-1}, \ldots , \Phi _n^{-1},\) respectively, then the chance constraint

$$\begin{aligned} \mathcal {M}\left\{ g\left( \pmb {x}, \xi _1, \xi _2, \ldots , \xi _n \right) \le 0 \right\} \ge \alpha \end{aligned}$$

holds if and only if

$$\begin{aligned} \begin{aligned}&g\left( \pmb {x}, \Phi ^{-1}_1(\alpha ), \ldots , \Phi ^{-1}_k(\alpha ), \right. \\&\left. \qquad \qquad \Phi ^{-1}_{k+1} (1-\alpha ), \ldots , \Phi ^{-1}_{n}(1-\alpha ) \right) \le 0. \end{aligned} \end{aligned}$$
(1)
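Theorem 2 is what later turns each chance constraint of the soft margin USVM into a single crisp inequality evaluated at the confidence level. A minimal sketch of this conversion, with a hypothetical function g and hypothetical linear uncertain variables (not taken from the paper), is given below.

```python
# Minimal sketch of Theorem 2 (hypothetical example, not from the paper):
# the chance constraint M{ g(x, xi1, xi2) <= 0 } >= alpha, with g increasing
# in xi1 and decreasing in xi2, holds iff
# g(x, Phi1^{-1}(alpha), Phi2^{-1}(1 - alpha)) <= 0.

def linear_inv(a, b):
    return lambda u: (1 - u) * a + u * b

phi1_inv = linear_inv(1.0, 3.0)   # xi1 ~ L(1, 3), assumed
phi2_inv = linear_inv(0.0, 2.0)   # xi2 ~ L(0, 2), assumed

def g(x, xi1, xi2):
    # Increasing in xi1 and decreasing in xi2 for x > 0.
    return x * xi1 - 2.0 * xi2 - 1.0

def chance_constraint_holds(x, alpha):
    # Crisp equivalent, Inequality (1).
    return g(x, phi1_inv(alpha), phi2_inv(1 - alpha)) <= 0

print(chance_constraint_holds(x=0.5, alpha=0.95))
```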

3 Soft margin method for linearly \(\alpha\)-nonseparable data sets

This section discusses linearly \(\alpha\)-nonseparable data sets and presents a soft margin USVM for this situation.

Suppose the observed data set is \(S= \{( \tilde{\pmb {x}}_1,y_1 ), ( \tilde{\pmb {x}}_2, y_2), \ldots , (\tilde{\pmb {x}}_{l},y_{l}) \}\), where \(\tilde{\pmb {x}}_i\) are uncertain vectors, and \(y_i\in \{ 1,-1\}\) are crisp labels for \(i=1,2,\ldots ,l\), respectively. Let \(\alpha \in (0.5,1)\) be a given confidence level. Qin and Li (2022) defined that S is linearly \(\alpha\)-separable if there exists one hyperplane \(\pmb {w}^T\pmb {x} +b=0\) such that

$$\begin{aligned} \mathcal {M}\left\{ y_i\cdot \left( \pmb {w}^T\tilde{\pmb {x}}_i +b \right) \ge 0 \right\} \ge \alpha , \quad i=1,2,\ldots , l, \end{aligned}$$
(2)

where \(\pmb {w}=(w_1, w_2, \ldots , w_n)\) is an n-dimensional vector with Euclidean norm \(\Vert \pmb {w}\Vert\). This definition implies that linearly \(\alpha\)-separable data sets are those which can be classified by a hyperplane at confidence level \(\alpha\).

If there exists no hyperplane \(\pmb {w}^T\pmb {x} +b=0\) such that

$$\begin{aligned} \mathcal {M}\left\{ y_i\cdot \left( \pmb {w}^T\tilde{\pmb {x}}_i +b \right) \ge 0 \right\} \ge \alpha \end{aligned}$$

for all \(i=1,2,\ldots , l\), then we call the data set S linearly \(\alpha\)-nonseparable. For instance, Fig. 1 gives an example of a linearly \(\alpha\)-nonseparable data set, for which no hyperplane makes Inequality (2) hold. Thus the hard margin method (Qin and Li 2022) cannot be applied in this case.

Fig. 1
figure 1

A hyperplane and a linearly \(\alpha\)-nonseparable data set

Suppose that S is linearly \(\alpha\)-nonseparable. Then, for each observation \((\tilde{\pmb {x}}_i, y_i )\), we may seek a nonnegative slack variable \(s_i\) such that

$$\begin{aligned} \mathcal {M}\left\{ y_i\cdot \left( \pmb {w}^T\tilde{\pmb {x}}_i +b \right) \ge -s_i\right\} \ge \alpha . \end{aligned}$$
(3)

Such an inequality always holds when \(s_i\) is large enough, so we want each slack variable \(s_i\) to be as small as possible. Note that if \(s_i=0\) for all i, then the set S degenerates into a linearly \(\alpha\)-separable one.

Theorem 3

Suppose that the components \(\tilde{x}_{i1}\), \(\tilde{x}_{i2}, \ldots , \tilde{x}_{in}\) of the uncertain vector \(\tilde{\pmb {x}}_{i}\) are independent. Let \(\Phi ^{-1}_{ij}\) denote the inverse uncertainty distribution of \(\tilde{x}_{ij}\) for \(i=1,2,\ldots ,l\) and \(j=1, 2, \ldots , n\), respectively. Then Inequality (3) is equivalent to the following crisp form

$$\begin{aligned} y_i \left( \sum _{j=1}^{n} w_j \Upsilon ^{-1}_{ij} \left( \alpha ,-y_iw_j \right) +b \right) \ge -s_i, \end{aligned}$$
(4)

where

$$\begin{aligned} \begin{aligned} \Upsilon _{ij}^{-1}\left( \alpha ,-y_iw_j\right)&= \Phi ^{-1}_{ij}\left( \alpha \right) \cdot I_{ \left\{ -y_iw_{j} \ge 0 \right\} } \\&+ \Phi ^{-1}_{ij}\left( 1 - \alpha \right) \cdot I_{\left\{ -y_iw_{j} < 0 \right\} } .\\ \end{aligned} \end{aligned}$$
(5)

Proof

The argument breaks into two cases. Case I: When \(y_i=1\). In this case, Inequality (3) degenerates into

$$\begin{aligned} \begin{aligned}&\mathcal {M}\left\{ \pmb {w}^T \tilde{\pmb {x}}_i +b \ge -s_i \right\} \\&\qquad =\mathcal {M}\left\{ -\pmb {w}^T \tilde{\pmb {x}}_i -b -s_i \le 0 \right\} \\&\qquad = \mathcal {M}\left\{ -\sum _{j=1}^nw_j\tilde{x}_{ij} -b - s_i \le 0 \right\} \\&\qquad \ge \alpha . \\ \end{aligned} \end{aligned}$$
(6)

The inverse uncertainty distribution of \(-\pmb {w}^T \tilde{\pmb {x}}_i -b -s_i=-\sum _{j=1}^n w_j \tilde{x}_{ij} -b -s_i\) is

$$\begin{aligned} \begin{aligned} F^{-1}_{1i} (u)&= -\sum _{j=1}^n w_j \left( \Phi ^{-1}_{ij}(1-u)\cdot I_{\left\{ w_j \ge 0 \right\} } \right. \\&\left. \qquad \qquad \qquad + \Phi ^{-1}_{ij} (u) \cdot I_{\left\{ w_j <0\right\} } \right) - b - s_i \\&= -\sum _{j=1}^n w_j \Upsilon ^{-1}_{ij} (u, -w_j) -b - s_i.\\ \end{aligned} \end{aligned}$$

Thus, it follows from Theorem 2 that Inequality (6) is satisfied if and only if

$$\begin{aligned} F^{-1}_{1i} (\alpha )= -\sum _{j=1}^n w_j \Upsilon ^{-1}_{ij} \left( \alpha , -w_j\right) -b - s_i \le 0, \end{aligned}$$

which is equivalent to Inequality (4), i.e.

$$\begin{aligned} \begin{aligned}&\sum _{j=1}^n w_j \Upsilon ^{-1}_{ij} \left( \alpha , -w_j\right) +b \\&\qquad = y_i \left( \sum _{j=1}^n w_j \Upsilon ^{-1}_{ij} \left( \alpha , -y_iw_j\right) +b \right) \\&\qquad \ge -s_i.\\ \end{aligned} \end{aligned}$$

Case II: When \(y_i=-1\). In this case, Inequality (3) degenerates into

$$\begin{aligned} \begin{aligned}&\mathcal {M}\left\{ \pmb {w}^T \tilde{\pmb {x}}_{i} +b -s_i \le 0 \right\} \\&\qquad =\mathcal {M}\left\{ \sum _{j=1}^n w_j \tilde{x}_{ij} +b -s_i \le 0 \right\} \\&\qquad \ge \alpha . \\ \end{aligned} \end{aligned}$$
(7)

Similarly, the inverse uncertainty distribution of \(\sum _{j=1}^n w_j \tilde{x}_{ij} +b -s_i\) is

$$\begin{aligned} \begin{aligned} F^{-1}_{2i} (u)&= \sum _{j=1}^{n} w_j \left( \Phi ^{-1}_{ij} (u)\cdot I_{\left\{ w_j \ge 0 \right\} } \right. \\&\qquad + \left. \Phi ^{-1}_{ij}(1-u) \cdot I_{ \left\{ w_j<0\right\} }\right) + b-s_i \\&=\sum _{j=1}^n w_j\Upsilon ^{-1}_{ij}\left( u, w_j\right) +b-s_i. \\ \end{aligned} \end{aligned}$$

Thus, Inequality (7) is satisfied if and only if

$$\begin{aligned} F^{-1}_{2i}(\alpha ) = \sum _{j=1}^{n} w_j \Upsilon ^{-1}_{ij}\left( \alpha ,w_j\right) +b - s_i \le 0, \end{aligned}$$

which is equivalent to Inequality (4), i.e.,

$$\begin{aligned} \begin{aligned}&-\left( \sum _{j=1}^{n} w_j \Upsilon ^{-1}_{ij} \left( \alpha ,w_j\right) +b \right) \\&\qquad =y_i \left( \sum _{j=1}^{n} w_j \Upsilon ^{-1}_{ij} \left( \alpha ,-y_iw_j\right) +b\right) \\&\qquad \ge - s_i. \\ \end{aligned} \end{aligned}$$

\(\square\)
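For computation, Equality (5) and the crisp constraint (4) are straightforward to evaluate once the inverse distributions \(\Phi ^{-1}_{ij}\) are available. The following Python sketch is our own illustration (the function names are not from the paper); it also shows that, for a fixed \((\pmb {w},b)\), the smallest feasible slack is \(s_i=\max \{0,\ -y_i(\sum _j w_j\Upsilon _{ij}^{-1}(\alpha ,-y_iw_j)+b)\}\).

```python
# Illustrative sketch (not the paper's code) of Equality (5) and the crisp
# constraint (4) from Theorem 3.

def upsilon_inv(phi_inv, u, c):
    """Equality (5)-style selector: Phi^{-1}(u) if c >= 0, otherwise Phi^{-1}(1 - u)."""
    return phi_inv(u) if c >= 0 else phi_inv(1.0 - u)

def constraint_lhs(w, b, y_i, phi_inv_row, alpha):
    """Left-hand side of Inequality (4): y_i (sum_j w_j Upsilon_ij^{-1}(alpha, -y_i w_j) + b)."""
    total = sum(w_j * upsilon_inv(phi_inv, alpha, -y_i * w_j)
                for w_j, phi_inv in zip(w, phi_inv_row))
    return y_i * (total + b)

def minimal_slack(w, b, y_i, phi_inv_row, alpha):
    """Smallest s_i >= 0 satisfying Inequality (4) for the given (w, b)."""
    return max(0.0, -constraint_lhs(w, b, y_i, phi_inv_row, alpha))

# Hypothetical usage with linear uncertain components L(a, b):
linear_inv = lambda a, b: (lambda u: (1 - u) * a + u * b)
phi_inv_row = [linear_inv(2.0, 4.0), linear_inv(5.0, 7.0)]   # one assumed observation
print(minimal_slack(w=[0.6, -0.8], b=1.0, y_i=1, phi_inv_row=phi_inv_row, alpha=0.95))
```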

It follows from Qin and Li (2022) that the distance from a data set S to the hyperplane \(\pmb {w}^T\pmb {x} +b=0\) is

$$\begin{aligned} \min _{i} E \left[ \frac{\left|\pmb {w}^T \tilde{\pmb {x}}_i + b \right|}{\Vert \pmb {w} \Vert } \right] , \end{aligned}$$

which determines the margin for the positive class and negative class. In order to maximize the margin and meanwhile minimize the sum of slack variables, we choose

$$\begin{aligned} \min _{i} \ E \left[ \frac{ \left|\pmb {w}^T \tilde{\pmb {x}}_i + b \right|}{\Vert \pmb {w} \Vert } \right] - C\sum _{i=1}^{l} s_i \end{aligned}$$

as the objective function, in which \(C>0\) is a penalty coefficient. Correspondingly, we formulate the following optimization model for the soft margin USVM as

$$\begin{aligned} \left\{ \begin{aligned} \max \limits _{\begin{array}{c} \pmb {w},b,\\ s_1, s_2, \ldots , s_l \end{array}} \quad&\min _{i} \ E \left[ \frac{ \left|\pmb {w}^T \tilde{\pmb {x}}_i + b \right|}{\Vert \pmb {w} \Vert } \right] - C\sum _{i=1}^{l} s_i \\ \mathrm { s.t.} \quad&\mathcal {M}\left\{ y_i\left( \pmb {w}^T \tilde{\pmb {x}}_i + b\right) \ge -s_i \right\} \ge \alpha , \\&\qquad \qquad \qquad \qquad i=1,2, \ldots , l\\&s_i \ge 0, \quad i=1,2, \ldots , l. \\ \end{aligned} \right. \end{aligned}$$
(8)

If \((\pmb {w},b,\pmb {s})\) is an optimal solution to Model (8), then for any constant \(\lambda >0\), \((\lambda \pmb {w},\lambda b,\lambda \pmb {s})\) is also an optimal solution to Model (8). To obtain a unique optimal solution, a constraint on the coefficients is necessary. Without loss of generality, we add the constraint \(\Vert \pmb {w}\Vert =1\) to Model (8). The objective function correspondingly becomes

$$\begin{aligned} \begin{aligned}&\min _{i} \ E \left[ \left|\pmb {w}^T \tilde{\pmb {x}}_i + b \right|\right] - C\sum _{i=1}^{l} s_i \\&= \min _{i} \ E \left[ \left|\sum _{j=1}^n w_j\tilde{x}_{ij}+ b \right|\right] - C\sum _{i=1}^{l} s_i.\\ \end{aligned} \end{aligned}$$

Further, we obtain the following model for the soft margin USVM,

$$\begin{aligned} \left\{ \begin{aligned} \max \limits _{\begin{array}{c} w_1, w_2,\ldots , w_n,b, \\ s_1, s_2,\ldots , s_l \end{array}} \quad&\min _{i} \ E \left[ \left|\sum _{j=1}^n w_j\tilde{x}_{ij}+ b \right|\right] - C\sum _{i=1}^{l} s_i \\ \mathrm { s.t.} \quad&\mathcal {M}\left\{ y_i \left( \sum _{j=1}^n w_j\tilde{x}_{ij} + b\right) \ge -s_i \right\} \ge \alpha , \\&\qquad \qquad \qquad \qquad i=1, 2, \ldots , l \\&\sum _{j=1}^n w_j^2 =1 \\&s_i \ge 0, \quad i=1,2, \ldots , l. \\ \end{aligned} \right. \end{aligned}$$
(9)

For the linearly \(\alpha\)-separable case, we have \(s_i=0\) for \(i=1,2,\ldots ,l\). In this case, Model (9) degenerates into the hard margin USVM proposed by Qin and Li (2022).

Next we give the crisp equivalent form of Model (9) when inverse uncertainty distributions of imprecise observations are given.

Theorem 4

Suppose that the components \(\tilde{x}_{i1}\), \(\tilde{x}_{i2}, \ldots , \tilde{x}_{in}\) of each uncertain vector \(\tilde{\pmb {x}}_{i}\) are independent for \(i=1, 2, \ldots , l\). Let \(\Phi ^{-1}_{ij}\) be the inverse uncertainty distribution of \(\tilde{x}_{ij}\) for \(i=1, 2, \ldots , l\) and \(j=1, 2, \ldots , n\), respectively. Then Model (9) is equivalent to Model (10).

$$\begin{aligned} \left\{ \begin{aligned} \max _{\begin{array}{c} w_1, w_2,\ldots , w_n, b, \\ s_1, s_2,\ldots , s_l \end{array}} \quad&\min _{i} \int _{0}^{1} \left|\sum \limits _{j=1}^{n} w_j\Upsilon _{ij}^{-1} \left( u,w_j\right) + b \right|\mathrm{d}u - C\sum _{i=1}^{l} s_i \\ \mathrm { s.t.} \quad&y_i \left( \sum _{j=1}^{n} w_j \Upsilon ^{-1}_{ij} \left( \alpha ,-y_iw_j\right) +b \right) \ge -s_i, \quad i=1,2,\ldots , l \\&\sum _{j=1}^{n} w^2_j =1 \\&s_i \ge 0, \quad i=1, 2,\ldots , l, \\ \end{aligned} \right. \end{aligned}$$
(10)

where \(\Upsilon _{ij}^{-1}(\alpha ,-y_iw_j)\) is given in Equality (5), and

$$\begin{aligned} \Upsilon _{ij}^{-1}\left( u,w_j\right) = \Phi ^{-1}_{ij}\left( u \right) \cdot I_{ \left\{ w_{j} \ge 0 \right\} } + \Phi ^{-1}_{ij}\left( 1 - u \right) \cdot I_{\left\{ w_{j} < 0 \right\} }. \end{aligned}$$

Proof

It follows from Theorem 1 that the inverse uncertainty distributions of

$$\begin{aligned} \pmb {w}^T \tilde{\pmb {x}}_i + b = \sum _{j=1}^{n} w_j \tilde{x}_{ij}+b \end{aligned}$$

are

$$\begin{aligned} \sum _{j=1}^{n} w_j \Upsilon _{ij}^{-1}\left( u, w_j\right) + b \end{aligned}$$

for \(i=1, 2, \ldots , l ,\) respectively. Thus, the objective function becomes

$$\begin{aligned} \begin{aligned}&\min _{i} \int _{0}^{1} \left|\sum \limits _{j=1}^{n} w_j\Upsilon _{ij}^{-1} (u, w_j) + b \right|\mathrm{d}u - C\sum _{i=1}^{l} s_i . \end{aligned} \end{aligned}$$

The constraints follow from Theorem 3 immediately. \(\square\)
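Numerically, the expected-distance term in the objective of Model (10) can be approximated by discretizing u on (0, 1). A small sketch (ours, using a midpoint rule with an assumed grid size) is given below; the objective of Model (10) is then the minimum of this quantity over i minus \(C\sum _i s_i\).

```python
import numpy as np

def upsilon_inv(phi_inv, u, c):
    # Phi^{-1}(u) when c >= 0, Phi^{-1}(1 - u) otherwise, as in the theorem.
    return phi_inv(u) if c >= 0 else phi_inv(1.0 - u)

def expected_abs_distance(w, b, phi_inv_row, n_grid=200):
    """Midpoint-rule approximation of int_0^1 | sum_j w_j Upsilon_ij^{-1}(u, w_j) + b | du."""
    u_grid = (np.arange(n_grid) + 0.5) / n_grid
    vals = [abs(sum(w_j * upsilon_inv(phi_inv, u, w_j)
                    for w_j, phi_inv in zip(w, phi_inv_row)) + b)
            for u in u_grid]
    return float(np.mean(vals))
```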

4 Numerical experiments

In this section, we present two examples to show the application of the soft margin USVM. We suppose that all the imprecise observations are characterized by linear uncertain variables. The uncertainty distribution and the inverse uncertainty distribution of a linear uncertain variable \(\mathcal {L}(a, b)\) are

$$\begin{aligned} \Phi (x)=\left\{ \begin{array}{cl} 0, &{} \text {if}\quad x\le a\\ \displaystyle {\frac{x-a}{b-a}},&{} \text {if}\quad a<x \le b\\ 1, &{} \text {if}\quad b<x\\ \end{array} \right. \end{aligned}$$

and

$$\begin{aligned} \Phi ^{-1}(u) = (1-u)a+u b, \end{aligned}$$

respectively.
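For reference, the linear uncertain variable above can be coded directly; the helper names in the sketch below are ours.

```python
def linear_cdf(a, b):
    """Uncertainty distribution Phi of the linear uncertain variable L(a, b)."""
    def phi(x):
        if x <= a:
            return 0.0
        if x <= b:
            return (x - a) / (b - a)
        return 1.0
    return phi

def linear_inv(a, b):
    """Inverse uncertainty distribution: Phi^{-1}(u) = (1 - u) a + u b."""
    return lambda u: (1 - u) * a + u * b
```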

Example 1

We consider the first data set \(S_1\), which contains 60 observations. Each observation is denoted as \((\tilde{x}_{i1},\tilde{x}_{i2})\) with label \(y_{i} \in \{1, -1\}\), \(i=1, 2, \ldots , 60\). Let \(\Phi ^{-1}_{i1}\) and \(\Phi ^{-1}_{i2}\) denote the inverse uncertainty distributions of \(\tilde{x}_{i1}\) and \(\tilde{x}_{i2}\) for \(i=1,2,\ldots , 60\), respectively. The detailed data are shown in Table 1.

Table 1 Data set \(S_1\)

For brevity, denote by A the negative class and denote by B the positive one. The first 20 data in A and the first 20 data in B are utilized as the training data, while the rest are utilized as the test data. The training data and the test data are plotted in Fig. 2, where black squares represent the training data in A, black stars represent the training data in B, blue squares represent the test data in A, and blue stars represent the test data in B. It can be seen that \(S_1\) is a linearly \(\alpha\)-nonseparable data set.

Fig. 2
figure 2

Training data and test data in Example 1

For the data set \(S_1\), Model (10) is reformulated and we obtain Model (11).

$$\begin{aligned} \left\{ \begin{aligned} \max _{\begin{array}{c} w_1, w_2, b, \\ s_i, i \in I \end{array}} \quad&\min _{i\in I} \int _{0}^{1} \left|w_1\Upsilon _{i1}^{-1} (u, w_1) + w_2\Upsilon _{i2}^{-1} (u, w_2)+ b \right|\mathrm{d}u - C\sum _{i \in I} s_i \\ \mathrm { s.t.} \quad&y_i \left( w_1 \Upsilon ^{-1}_{i1} \left( \alpha , -y_iw_1\right) + w_2 \Upsilon ^{-1}_{i2} \left( \alpha , -y_iw_2\right) +b \right) \ge -s_i,\quad i \in I \\&w_1^2+w_2^2=1 \\&s_i \ge 0, \quad i \in I, \\ \end{aligned} \right. \end{aligned}$$
(11)

where \(I=\{1, 2, \ldots , 20, 31, 32, \ldots , 50 \}\).

Setting the confidence level \(\alpha\) to 0.90, 0.95, and 0.99, respectively, we employ the ‘fmincon’ function in Matlab to seek the optimal solution to Model (11) under different parameters. The coefficients of the optimal hyperplanes are reported in Table 2. It can be seen that the optimal hyperplanes change only slightly under different penalty coefficients C, and a larger C seems to produce more reliable results. The results also differ slightly across confidence levels.
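The paper solves Model (11) with Matlab's ‘fmincon’. Purely as an illustration, the sketch below sets up an analogous nonlinear program in Python with scipy.optimize.minimize (SLSQP). The observations are placeholder linear uncertain variables, since Table 1 is not reproduced here, and the nonsmooth objective means the solver settings are only indicative.

```python
# Illustrative Python analogue of Model (11); the paper uses Matlab's 'fmincon'.
# The data below are PLACEHOLDERS, not the data of Table 1: each observation is
# a pair of linear uncertain variables L(a, b) with a crisp label.
import numpy as np
from scipy.optimize import minimize

alpha, C = 0.95, 1.0

# (a1, b1, a2, b2, y): hypothetical intervals for (x_i1, x_i2) and labels.
data = [
    (2.0, 4.0, 30.0, 32.0, -1),
    (5.0, 7.0, 40.0, 42.0, -1),
    (30.0, 32.0, 5.0, 7.0, 1),
    (35.0, 37.0, 10.0, 12.0, 1),
]
l = len(data)

def lin_inv(a, b, u):
    return (1 - u) * a + u * b

def upsilon_inv(a, b, u, c):
    return lin_inv(a, b, u) if c >= 0 else lin_inv(a, b, 1 - u)

u_grid = (np.arange(200) + 0.5) / 200

def unpack(z):
    return z[:2], z[2], z[3:]        # w, b, slack variables s

def neg_objective(z):
    w, b, s = unpack(z)
    dists = []
    for (a1, b1, a2, b2, _) in data:
        vals = np.abs(w[0] * np.array([upsilon_inv(a1, b1, u, w[0]) for u in u_grid])
                      + w[1] * np.array([upsilon_inv(a2, b2, u, w[1]) for u in u_grid]) + b)
        dists.append(vals.mean())    # midpoint approximation of the integral over u
    return -(min(dists) - C * s.sum())   # minimize the negative of the objective

def ineq_constraints(z):
    w, b, s = unpack(z)
    rows = []
    for i, (a1, b1, a2, b2, y) in enumerate(data):
        lhs = y * (w[0] * upsilon_inv(a1, b1, alpha, -y * w[0])
                   + w[1] * upsilon_inv(a2, b2, alpha, -y * w[1]) + b)
        rows.append(lhs + s[i])      # Inequality (4): must be >= 0
    return np.array(rows)

cons = [{'type': 'eq',   'fun': lambda z: z[0] ** 2 + z[1] ** 2 - 1.0},
        {'type': 'ineq', 'fun': ineq_constraints}]
bounds = [(None, None)] * 3 + [(0.0, None)] * l      # s_i >= 0
z0 = np.concatenate(([1.0, 0.0, 0.0], np.zeros(l)))

res = minimize(neg_objective, z0, method='SLSQP', constraints=cons, bounds=bounds)
w_opt, b_opt, s_opt = unpack(res.x)
print(w_opt, b_opt, s_opt)
```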

Table 2 Optimal solutions (\(w_1,w_2,b\)) under different penalty coefficients and confidence levels

When the parameter \(C= 1\) and the confidence level \(\alpha =0.95\), the optimal hyperplane determined by Model (11) for \(S_1\) is \(0.9966 x_1 - 0.0820 x_2 = 19.9543\). The misclassified data are the 9th, 19th, and 39th observations, with corresponding slack variables 1.0278, 0.1051, and 2.2807, respectively. These misclassified data are marked in red in Fig. 3, where the data \(\tilde{\pmb {x}}_i\) with label \(y_i = 1\) are shown as squares, and the data \(\tilde{\pmb {x}}_i\) with label \(y_i = -1\) are shown as stars for \(i=1, 2, \ldots , 40\).

Fig. 3
figure 3

Misclassified data in \(S_1\)

Next, we evaluate the generalization ability of the soft margin USVM by following the classification approach given by Qin and Li (2022). We evaluate Model (11) on the test data with the confidence level \(\alpha =0.90\). Denote the number of positive-class points that are correctly classified as true positives (TP), the number of positive-class points that are misclassified as false negatives (FN), the number of negative-class points that are correctly classified as true negatives (TN), and the number of negative-class points that are misclassified as false positives (FP). The results are presented in Table 3 and summarized in the confusion matrix in Table 4.

Table 3 Predicting the data label
Table 4 Confusion matrix in Example 1

Further, commonly used evaluation indicators for binary classification problems are accuracy, precision, recall, and F-score, which are defined as follows,

$$\begin{aligned}&\text {Accuracy} = \frac{\text {TP} + \text {TN}}{\text {TP} + \text {FN} + \text {TN} + \text {FP}}, \\&\text {Precision} = \frac{\text {TP}}{\text {TP} + \text {FP}}, \\&\text {Recall} = \frac{\text {TP}}{\text {TP} + \text {FN}}, \end{aligned}$$

and

$$\begin{aligned} F=\frac{2}{1/\text {Recall} + 1/\text {Precision} } = \frac{\text {TP} }{\text {TP} + (\text {FP} + \text {FN})/2}. \end{aligned}$$

We expect as many correct classifications as possible in both the positive class and the negative class. That is, the above metrics indicate a good classification result if they are close to 1. Accordingly, we obtain

$$\begin{aligned}&\text {Accuracy} = \frac{9+10}{9 + 1 + 10 + 0} = 95.0\%, \\&\text {Precision} = \frac{9}{9+0} = 100.0\%, \\&\text {Recall} = \frac{9}{9+1} = 90.0\%, \end{aligned}$$

and

$$\begin{aligned} F=\frac{9}{9+(0+1)/2 }=94.7 \% . \end{aligned}$$
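These values follow directly from the confusion matrix of the test data (TP = 9, FN = 1, TN = 10, FP = 0) and can be checked with a few lines of Python:

```python
# Verify the Example 1 metrics from the confusion matrix of the test data.
TP, FN, TN, FP = 9, 1, 10, 0

accuracy  = (TP + TN) / (TP + FN + TN + FP)
precision = TP / (TP + FP)
recall    = TP / (TP + FN)
f_score   = TP / (TP + (FP + FN) / 2)     # harmonic mean of precision and recall

print(f"accuracy={accuracy:.1%} precision={precision:.1%} "
      f"recall={recall:.1%} F={f_score:.1%}")
# accuracy=95.0% precision=100.0% recall=90.0% F=94.7%
```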

The results show that the soft margin USVM performs reasonably well for the data set \(S_1\).

Next, we explore the robustness of the soft margin USVM by repeating the analysis with modified data. Suppose that the uncertainty distributions of the 12th and the 35th data become \((\mathcal {L}(12,14), \mathcal {L}(7,9))\) and \((\mathcal {L}(38,40), \mathcal {L}(60,62))\), respectively, while their labels remain unchanged. Then, we obtain a new data set \(S_2\). Running the ‘fmincon’ function in Matlab, we obtain the optimal solutions under confidence levels 0.90, 0.95, and 0.99, which are reported in Table 5.

Table 5 Optimal solutions (\(w_1,w_2,b\)) under different penalty coefficients and confidence levels with two modified data

When the penalty coefficient \(C= 1\) and the confidence level \(\alpha =0.95\), the optimal hyperplane determined by Model (11) for the linearly \(\alpha\)-nonseparable data set \(S_2\) is \(0.9966 x_1- 0.0820 x_2 = 19.9543\). Figure 4 shows the optimal hyperplane as a line and the two modified data in red. The misclassified data are again the 9th, the 19th, and the 39th observations, with corresponding slack variables 1.0278, 0.1051, and 2.2807, respectively.

Table 5 indicates that the optimal solution changes as the penalty coefficient C increases from 0 to 1. However, as C continues to increase, the optimal solutions remain unchanged under the data modification. This is possibly because the penalty term in the objective function then plays the leading role.

Fig. 4
figure 4

Optimal hyperplane for data set \(S_2\)

Example 2

In this example, we examine the soft margin USVM on the linearly \(\alpha\)-separable data set given by Qin and Li (2022). Denote the data set as \(S_3\), whose elements are \((\tilde{x}_{i1},\tilde{x}_{i2})\) with labels \(y_{i} \in \{1, -1\}\), \(i=1, 2, \ldots , 40\). Let \(\Phi ^{-1}_{i1}\) and \(\Phi ^{-1}_{i2}\) denote the inverse uncertainty distributions of \(\tilde{x}_{i1}\) and \(\tilde{x}_{i2}\) for \(i=1,2,\ldots , 40\), respectively.

To classify the data in \(S_3\), Model (10) is reformulated, and we obtain Model (12) with a penalty coefficient. This calls for an examination of the classification power of Model (12) under different penalty coefficients as well as confidence levels. As before, we explore the optimal solutions for confidence levels \(\alpha \in \{0.90, 0.95, 0.99 \}\) and penalty coefficients \(C \in \{ 1, 2, 3, 4, 10, 20, 40\}.\) The optimal solution \((w_{1}, w_{2},b)\) to Model (12) is obtained by employing the ‘fmincon’ function in Matlab. The optimal solutions are listed in Table 6.

$$\begin{aligned} \left\{ \begin{aligned} \max _{\begin{array}{c} w_1,w_2, b,\\ s_1, s_2\ldots , s_{40} \end{array}} \quad&\min _{i} \int _{0}^{1} \left|w_1\Upsilon _{i1}^{-1} (u, w_1) + w_2\Upsilon _{i2}^{-1} (u, w_2)+ b \right|\mathrm{d}u - C\sum _{i=1}^{40} s_i \\ \mathrm { s.t.} \quad&y_i \left( w_1 \Upsilon ^{-1}_{i1} \left( \alpha , -y_iw_1\right) + w_2 \Upsilon ^{-1}_{i2} \left( \alpha , -y_iw_2 \right) +b \right) \ge -s_i, \quad i=1, 2,\ldots , 40 \\&w^2_1 + w^2_2 = 1 \\&s_i \ge 0, \quad i=1, 2, \ldots , 40. \\ \end{aligned} \right. \end{aligned}$$
(12)
Table 6 Optimal solutions (\(w_{1},w_{2},b\)) under different penalty coefficients and confidence levels for data set \(S_3\)
Table 7 Optimal solutions (\(w_{1},w_{2},b\)) under different penalty coefficients and confidence levels for data set \(S_4\)

As the penalty coefficient C increases, the results tend to be unchanged. Letting \(C=10\) and \(\alpha =0.90\), we plot the results in Fig. 5, which shows that the optimal hyperplane correctly classifies the data set. Suppose that the uncertainty distributions of the 26th and the 30th data become \((\mathcal {L}(1,3), \mathcal {L}(81,83))\) and \((\mathcal {L}(3,4), \mathcal {L}(55,58))\), respectively, while their labels remain unchanged. Then we obtain a new data set \(S_4\), in which the two modified data are marked with circles in Fig. 6.

Fig. 5
figure 5

Optimal hyperplane for data set \(S_3\)

Fig. 6
figure 6

Optimal hyperplane for data set \(S_4\)

Similarly, we may obtain the corresponding optimal hyperplanes, whose coefficients are presented in Table 7. The results are not sensitive to the confidence level \(\alpha\) or the penalty coefficient C. From Fig. 6, we observe that all the data with positive labels lie in one half-plane determined by the optimal hyperplane, while the data with negative labels lie in the opposite half-plane. Meanwhile, the hyperplane attains the widest margin. The results are consistent with the main conclusions in Qin and Li (2022). We can see that the soft margin USVM is capable of handling not only classification problems with linearly \(\alpha\)-separable data sets but also those with linearly \(\alpha\)-nonseparable data sets.

5 Conclusions

This paper proposed a USVM based on the soft margin method for linearly \(\alpha\)-nonseparable data sets with imprecise observations in the framework of uncertainty theory. The equivalent crisp model was derived based on the inverse uncertainty distributions. Two numerical examples were conducted to demonstrate the application and effectiveness of the proposed model. The modelling idea behind the soft margin USVM sheds light on some future research directions; for example, nonlinear classification problems and multi-class classification problems with imprecise observations may be studied further.