1 Introduction

As one of the most important classification methods, SVMs have achieved great success in numerous real-world problems (Vapnik 1999; Burges 1998), such as image recognition (VenkateswarLal et al. 2019), disease diagnosis (Wang et al. 2018; Okwuashi and Ndehedehe 2020; Gautam et al. 2021), intrusion detection (Mukkamala et al. 2002; Priyadharsini and Chitra 2021) and so on. Traditionally, SVMs are designed to search for a tube with the maximum margin based on precise data. A tube is uniquely determined by a hyperplane and the minimum distance from the data to the hyperplane. Different SVMs are constructed according to the complexity of the classification problem. The hard margin method was first developed by Vapnik (1995) for linearly separable data sets, in which the data with positive labels and those with negative labels lie in different half-spaces determined by a hyperplane. The parameters of the hyperplane can be expressed as a linear combination of support vectors (Boser et al. 1992). Nevertheless, most data sets are linearly nonseparable, and the SVM based on the hard margin method fails on such problems. Thus, Cortes and Vapnik (1995) introduced slack variables for misclassified samples so that linearly nonseparable training sets can be classified while still pursuing the maximum margin, which is known as the soft margin method.

However, when the observations are imprecise or the samples are not large enough, the results obtained in the framework of probability theory are usually not satisfactory. In order to handle such cases, Liu (2007) founded uncertainty theory, and then Liu (2009, 2010) further improved and perfected the theory. Following this idea, uncertain statistics was developed to handle problems with imprecise observations. Yao and Liu (2018) proposed a least squares method for regression by characterizing imprecise observations as uncertain variables. Then several other regression models (Hu and Gao 2020; Fang and Hong 2020; Zhang et al. 2020) were further studied, and parameter estimation methods (Liu and Yang 2020; Chen 2020; Li et al. 2022) were discussed. In the meantime, the confidence interval (Lio and Liu 2018) and hypothesis test (Lio and Liu 2020) for uncertain statistics were also introduced. For time series analysis with imprecise observations, autoregressive models (Yang and Liu 2019) and autoregressive moving average models (Lu et al. 2020; Xin et al. 2021) were introduced. Applications in other fields were also explored, such as uncertain differential games (Zhang et al. 2021; Yang and Gao 2016), uncertain extensive games (Wang et al. 2017), COVID-19 spread (Lio 2021) and uncertain queueing models (Yao 2021).

For classification problems with imprecise observations, Qin and Li (2022) introduced a USVM based on the hard margin method in the framework of uncertainty theory, which extended the traditional hard margin SVM. However, the hard margin USVM is only suitable for linearly \(\alpha\)-separable data sets with imprecise observations. In this paper, we propose a USVM based on the soft margin method for classification problems with linearly \(\alpha\)-nonseparable data sets. Similarly, we model the imprecise observations as uncertain variables and formulate an optimization model for the soft margin USVM. Further, we conduct numerical experiments to illustrate the application of the proposed method and evaluate its performance.

The paper is organized as follows. Some definitions and theorems in uncertainty theory are given in Sect. 2. In Sect. 3, we formulate a USVM based on the soft margin method for linearly \(\alpha\)-nonseparable data sets in an uncertain environment. Then, Sect. 4 presents two examples to show the application of the soft margin USVM. Finally, Sect. 5 concludes the paper.

2 Preliminaries

In this section, we sketch some definitions and theorems used in this paper.

Let \(\mathcal {L}\) be a \(\sigma\)-algebra on a nonempty set \(\Gamma\). A set function \(\mathcal {M}:\mathcal {L}\rightarrow [0,1]\) is called an uncertain measure (Liu 2007, 2009) if it satisfies: (1) \(\mathcal {M}\{\Gamma \}=1\) for the universal set \(\Gamma\); (2) \(\mathcal {M}\{\Lambda \}+\mathcal {M}\{\Lambda ^c\}=1\) for any \(\Lambda \in \mathcal {L}\); (3) \(\mathcal {M}\left\{ \bigcup _{i=1}^{\infty } \Lambda _i\right\} \le \sum _{i=1}^{\infty } \mathcal {M}\left\{ \Lambda _i\right\}\) for every countable sequence \(\Lambda _1,\Lambda _2,\ldots\). The triple \((\Gamma ,\mathcal {L},\mathcal {M})\) is then called an uncertainty space. (4) Let \((\Gamma _k, \mathcal {L}_k, \mathcal {M}_k)\) be uncertainty spaces for \(k=1, 2, \ldots\) The product uncertain measure \(\mathcal {M}\) is an uncertain measure satisfying

$$\begin{aligned} \mathcal {M}\left\{ \prod \limits _{k=1}^{\infty } \Lambda _k \right\} = \bigwedge \limits _{k=1}^{\infty } \mathcal {M}_k\{ \Lambda _k\}, \end{aligned}$$

where \(\Lambda _k\) are arbitrarily chosen sets from \(\mathcal {L}_k\) for \(k = 1,\ 2,\ldots\), respectively.

An uncertain variable (Liu 2007) \(\tau\) is a measurable function from an uncertainty space \((\Gamma ,\mathcal {L},\mathcal {M})\) to the set of real numbers, i.e., the set \(\{\tau \in B \}= \{\gamma \in \Gamma \mid \tau (\gamma ) \in B \}\) is an event in \(\mathcal {L}\) for any Borel set B. The function \(\Upsilon (x)=\mathcal {M}\{ \tau \le x \}, x\in \Re\) is called the uncertainty distribution of \(\tau\).

Theorem 1

(Liu 2010) Let \(\xi _1,\xi _2,\ldots ,\xi _n\) be independent uncertain variables with inverse uncertainty distributions \(\Phi ^{-1}_1,\Phi ^{-1}_2,\ldots ,\Phi ^{-1}_n\), respectively. If \(f(x_1,x_2,\ldots ,x_n)\) is strictly increasing with respect to \(x_1,x_2,\ldots ,\) \(x_m\), and strictly decreasing with respect to \(x_{m+1},x_{m+2}, \ldots , x_{n}\), then the uncertain variable \(\xi =f(\xi _1,\xi _2,\ldots ,\xi _n)\) has an inverse uncertainty distribution

$$\begin{aligned} \begin{aligned} \Psi ^{-1}(u)=f\left( \Phi _1^{-1}(u),\Phi _2^{-1}(u),\ldots ,\Phi _{m}^{-1}(u), \right. \\ \left. \qquad \qquad \qquad \Phi _{m+1}^{-1} (1-u), \ldots ,\Phi _n^{-1}(1-u)\right) . \\ \end{aligned} \end{aligned}$$
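As a quick illustration of Theorem 1, the following Python sketch (ours, not part of the paper) builds the inverse uncertainty distribution of \(f(\xi _1,\xi _2)=\xi _1-\xi _2\) for two linear uncertain variables; the distributions and the choice of f are hypothetical.

```python
# Illustrative sketch of Theorem 1 (not from the paper): the inverse uncertainty
# distribution of f(xi1, xi2) = xi1 - xi2, which is strictly increasing in xi1
# and strictly decreasing in xi2.

def linear_inv(a, b):
    """Inverse uncertainty distribution of a linear uncertain variable L(a, b)."""
    return lambda u: (1 - u) * a + u * b

phi1_inv = linear_inv(1.0, 3.0)   # xi1 ~ L(1, 3), hypothetical parameters
phi2_inv = linear_inv(0.0, 2.0)   # xi2 ~ L(0, 2), hypothetical parameters

def psi_inv(u):
    # Theorem 1: increasing arguments take u, decreasing arguments take 1 - u.
    return phi1_inv(u) - phi2_inv(1 - u)

print(psi_inv(0.9))   # e.g. Psi^{-1}(0.9) = 2.8 - 0.2 = 2.6
```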

Theorem 2

(Liu 2015) Assume the function \(g(\pmb {x}\), \(\xi _1\), \(\xi _2, \ldots , \xi _n)\) is strictly increasing with respect to \(\xi _1, \xi _2, \ldots , \xi _k\) and strictly decreasing with respect to \(\xi _{k+1}, \xi _{k+2}, \ldots , \xi _n\). If \(\xi _1, \xi _2, \ldots , \xi _n\) are independent uncertain variables with inverse uncertainty distributions \(\Phi _1^{-1}, \Phi _2^{-1}, \ldots , \Phi _n^{-1},\) respectively, then the chance constraint

$$\begin{aligned} \mathcal {M}\left\{ g\left( \pmb {x}, \xi _1, \xi _2, \ldots , \xi _n \right) \le 0 \right\} \ge \alpha \end{aligned}$$

holds if and only if

$$\begin{aligned} \begin{aligned}&g\left( \pmb {x}, \Phi ^{-1}_1(\alpha ), \ldots , \Phi ^{-1}_k(\alpha ), \right. \\&\left. \qquad \qquad \Phi ^{-1}_{k+1} (1-\alpha ), \ldots , \Phi ^{-1}_{n}(1-\alpha ) \right) \le 0. \end{aligned} \end{aligned}$$
(1)
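Theorem 2 is what later turns each chance constraint of the soft margin USVM into a single crisp inequality evaluated at the confidence level. A minimal sketch of this conversion, with a hypothetical function g and hypothetical linear uncertain variables (not taken from the paper), is given below.

```python
# Minimal sketch of Theorem 2 (hypothetical example, not from the paper):
# the chance constraint M{ g(x, xi1, xi2) <= 0 } >= alpha, with g increasing
# in xi1 and decreasing in xi2, holds iff
# g(x, Phi1^{-1}(alpha), Phi2^{-1}(1 - alpha)) <= 0.

def linear_inv(a, b):
    return lambda u: (1 - u) * a + u * b

phi1_inv = linear_inv(1.0, 3.0)   # xi1 ~ L(1, 3), assumed
phi2_inv = linear_inv(0.0, 2.0)   # xi2 ~ L(0, 2), assumed

def g(x, xi1, xi2):
    # Increasing in xi1 and decreasing in xi2 for x > 0.
    return x * xi1 - 2.0 * xi2 - 1.0

def chance_constraint_holds(x, alpha):
    # Crisp equivalent, Inequality (1).
    return g(x, phi1_inv(alpha), phi2_inv(1 - alpha)) <= 0

print(chance_constraint_holds(x=0.5, alpha=0.95))
```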

3 Soft margin method for linearly \(\alpha\)-nonseparable data sets

This section discusses linearly \(\alpha\)-nonseparable data sets and presents a soft margin USVM for this situation.

Suppose the observed data set is \(S= \{( \tilde{\pmb {x}}_1,y_1 ), ( \tilde{\pmb {x}}_2, y_2), \ldots , (\tilde{\pmb {x}}_{l},y_{l}) \}\), where \(\tilde{\pmb {x}}_i\) are uncertain vectors, and \(y_i\in \{ 1,-1\}\) are crisp labels for \(i=1,2,\ldots ,l\), respectively. Let \(\alpha \in (0.5,1)\) be a given confidence level. Qin and Li (2022) defined that S is linearly \(\alpha\)-separable if there exists one hyperplane \(\pmb {w}^T\pmb {x} +b=0\) such that

$$\begin{aligned} \mathcal {M}\left\{ y_i\cdot \left( \pmb {w}^T\tilde{\pmb {x}}_i +b \right) \ge 0 \right\} \ge \alpha , \quad i=1,2,\ldots , l, \end{aligned}$$
(2)

where \(\pmb {w}=(w_1, w_2, \ldots , w_n)\) is an n-dimensional vector with Euclidean norm \(\Vert \pmb {w}\Vert\). This definition implies that linearly \(\alpha\)-separable data sets are those which can be classified by a hyperplane at confidence level \(\alpha\).

If there exists no hyperplane \(\pmb {w}^T\pmb {x} +b=0\) such that

$$\begin{aligned} \mathcal {M}\left\{ y_i\cdot \left( \pmb {w}^T\tilde{\pmb {x}}_i +b \right) \ge 0 \right\} \ge \alpha \end{aligned}$$

for all \(i=1,2,\ldots , l\), then we call the data set S linearly \(\alpha\)-nonseparable. For instance, Fig. 1 gives an example of a linearly \(\alpha\)-nonseparable data set, for which no hyperplane makes Inequality (2) hold. Thus the hard margin method (Qin and Li 2022) cannot be applied in this case.

Fig. 1
figure 1

A hyperplane and a linearly \(\alpha\)-nonseparable data set

Suppose that S is linearly \(\alpha\)-nonseparable. Then, for each observation \((\tilde{\pmb {x}}_i, y_i )\), we may seek a nonnegative slack variable \(s_i\) such that

$$\begin{aligned} \mathcal {M}\left\{ y_i\cdot \left( \pmb {w}^T\tilde{\pmb {x}}_i +b \right) \ge -s_i\right\} \ge \alpha . \end{aligned}$$
(3)

Such an inequality always holds when \(s_i\) is large enough, so we want each slack variable \(s_i\) to be as small as possible. Note that if \(s_i=0\) for all i, then the set S degenerates into a linearly \(\alpha\)-separable one.

Theorem 3

Suppose that the components \(\tilde{x}_{i1}\), \(\tilde{x}_{i2}, \ldots , \tilde{x}_{in}\) of the uncertain vector \(\tilde{\pmb {x}}_{i}\) are independent. Let \(\Phi ^{-1}_{ij}\) denote the inverse uncertainty distribution of \(\tilde{x}_{ij}\) for \(i=1,2,\ldots ,l\) and \(j=1, 2, \ldots , n\), respectively. Then Inequality (3) is equivalent to the following crisp form

$$\begin{aligned} y_i \left( \sum _{j=1}^{n} w_j \Upsilon ^{-1}_{ij} \left( \alpha ,-y_iw_j \right) +b \right) \ge -s_i, \end{aligned}$$
(4)

where

$$\begin{aligned} \begin{aligned} \Upsilon _{ij}^{-1}\left( \alpha ,-y_iw_j\right)&= \Phi ^{-1}_{ij}\left( \alpha \right) \cdot I_{ \left\{ -y_iw_{j} \ge 0 \right\} } \\&+ \Phi ^{-1}_{ij}\left( 1 - \alpha \right) \cdot I_{\left\{ -y_iw_{j} < 0 \right\} } .\\ \end{aligned} \end{aligned}$$
(5)

Proof

The argument breaks into two cases. Case I: When \(y_i=1\). In this case, Inequality (3) degenerates into

$$\begin{aligned} \begin{aligned}&\mathcal {M}\left\{ \pmb {w}^T \tilde{\pmb {x}}_i +b \ge -s_i \right\} \\&\qquad =\mathcal {M}\left\{ -\pmb {w}^T \tilde{\pmb {x}}_i -b -s_i \le 0 \right\} \\&\qquad = \mathcal {M}\left\{ -\sum _{j=1}^nw_j\tilde{x}_{ij} -b - s_i \le 0 \right\} \\&\qquad \ge \alpha . \\ \end{aligned} \end{aligned}$$
(6)

The inverse uncertainty distribution of \(-\pmb {w}^T \tilde{\pmb {x}}_i -b -s_i=-\sum _{j=1}^n w_j \tilde{x}_{ij} -b -s_i\) is

$$\begin{aligned} \begin{aligned} F^{-1}_{1i} (u)&= -\sum _{j=1}^n w_j \left( \Phi ^{-1}_{ij}(1-u)\cdot I_{\left\{ w_j \ge 0 \right\} } \right. \\&\left. \qquad \qquad \qquad + \Phi ^{-1}_{ij} (u) \cdot I_{\left\{ w_j <0\right\} } \right) - b - s_i \\&= -\sum _{j=1}^n w_j \Upsilon ^{-1}_{ij} (u, -w_j) -b - s_i.\\ \end{aligned} \end{aligned}$$

Thus, it follows from Theorem 2 that Inequality (6) is satisfied if and only if

$$\begin{aligned} F^{-1}_{1i} (\alpha )= -\sum _{j=1}^n w_j \Upsilon ^{-1}_{ij} \left( \alpha , -w_j\right) -b - s_i \le 0, \end{aligned}$$

which is equivalent to Inequality (4), i.e.

$$\begin{aligned} \begin{aligned}&\sum _{j=1}^n w_j \Upsilon ^{-1}_{ij} \left( \alpha , -w_j\right) +b \\&\qquad = y_i \left( \sum _{j=1}^n w_j \Upsilon ^{-1}_{ij} \left( \alpha , -y_iw_j\right) +b \right) \\&\qquad \ge -s_i.\\ \end{aligned} \end{aligned}$$

Case II: When \(y_i=-1\). In this case, Inequality (3) degenerates into

$$\begin{aligned} \begin{aligned}&\mathcal {M}\left\{ \pmb {w}^T \tilde{\pmb {x}}_{i} +b -s_i \le 0 \right\} \\&\qquad =\mathcal {M}\left\{ \sum _{j=1}^n w_j \tilde{x}_{ij} +b -s_i \le 0 \right\} \\&\qquad \ge \alpha . \\ \end{aligned} \end{aligned}$$
(7)

Similarly, the inverse uncertainty distribution of \(\sum _{j=1}^n w_j \tilde{x}_{ij} +b -s_i\) is

$$\begin{aligned} \begin{aligned} F^{-1}_{2i} (u)&= \sum _{j=1}^{n} w_j \left( \Phi ^{-1}_{ij} (u)\cdot I_{\left\{ w_j \ge 0 \right\} } \right. \\&\qquad + \left. \Phi ^{-1}_{ij}(1-u) \cdot I_{ \left\{ w_j<0\right\} }\right) + b-s_i \\&=\sum _{j=1}^n w_j\Upsilon ^{-1}_{ij}\left( u, w_j\right) +b-s_i. \\ \end{aligned} \end{aligned}$$

Thus, Inequality (7) is satisfied if and only if

$$\begin{aligned} F^{-1}_{2i}(\alpha ) = \sum _{j=1}^{n} w_j \Upsilon ^{-1}_{ij}\left( \alpha ,w_j\right) +b - s_i \le 0, \end{aligned}$$

which is equivalent to Inequality (4), i.e.,

$$\begin{aligned} \begin{aligned}&-\left( \sum _{j=1}^{n} w_j \Upsilon ^{-1}_{ij} \left( \alpha ,w_j\right) +b \right) \\&\qquad =y_i \left( \sum _{j=1}^{n} w_j \Upsilon ^{-1}_{ij} \left( \alpha ,-y_iw_j\right) +b\right) \\&\qquad \ge - s_i. \\ \end{aligned} \end{aligned}$$

\(\square\)
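For computation, Equality (5) and the crisp constraint (4) are straightforward to evaluate once the inverse distributions \(\Phi ^{-1}_{ij}\) are available. The following Python sketch is our own illustration (the function names are not from the paper); it also shows that, for a fixed \((\pmb {w},b)\), the smallest feasible slack is \(s_i=\max \{0,\ -y_i(\sum _j w_j\Upsilon _{ij}^{-1}(\alpha ,-y_iw_j)+b)\}\).

```python
# Illustrative sketch (not the paper's code) of Equality (5) and the crisp
# constraint (4) from Theorem 3.

def upsilon_inv(phi_inv, u, c):
    """Equality (5)-style selector: Phi^{-1}(u) if c >= 0, otherwise Phi^{-1}(1 - u)."""
    return phi_inv(u) if c >= 0 else phi_inv(1.0 - u)

def constraint_lhs(w, b, y_i, phi_inv_row, alpha):
    """Left-hand side of Inequality (4): y_i (sum_j w_j Upsilon_ij^{-1}(alpha, -y_i w_j) + b)."""
    total = sum(w_j * upsilon_inv(phi_inv, alpha, -y_i * w_j)
                for w_j, phi_inv in zip(w, phi_inv_row))
    return y_i * (total + b)

def minimal_slack(w, b, y_i, phi_inv_row, alpha):
    """Smallest s_i >= 0 satisfying Inequality (4) for the given (w, b)."""
    return max(0.0, -constraint_lhs(w, b, y_i, phi_inv_row, alpha))

# Hypothetical usage with linear uncertain components L(a, b):
linear_inv = lambda a, b: (lambda u: (1 - u) * a + u * b)
phi_inv_row = [linear_inv(2.0, 4.0), linear_inv(5.0, 7.0)]   # one assumed observation
print(minimal_slack(w=[0.6, -0.8], b=1.0, y_i=1, phi_inv_row=phi_inv_row, alpha=0.95))
```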

It follows from Qin and Li (2022) that the distance from a data set S to the hyperplane \(\pmb {w}^T\pmb {x} +b=0\) is

$$\begin{aligned} \min _{i} E \left[ \frac{\left|\pmb {w}^T \tilde{\pmb {x}}_i + b \right|}{\Vert \pmb {w} \Vert } \right] , \end{aligned}$$

which determines the margin for the positive class and negative class. In order to maximize the margin and meanwhile minimize the sum of slack variables, we choose

$$\begin{aligned} \min _{i} \ E \left[ \frac{ \left|\pmb {w}^T \tilde{\pmb {x}}_i + b \right|}{\Vert \pmb {w} \Vert } \right] - C\sum _{i=1}^{l} s_i \end{aligned}$$

as the objective function, in which \(C>0\) is a penalty coefficient. Correspondingly, we formulate the following optimization model for the soft margin USVM as

$$\begin{aligned} \left\{ \begin{aligned} \max \limits _{\begin{array}{c} \pmb {w},b,\\ s_1, s_2, \ldots , s_l \end{array}} \quad&\min _{i} \ E \left[ \frac{ \left|\pmb {w}^T \tilde{\pmb {x}}_i + b \right|}{\Vert \pmb {w} \Vert } \right] - C\sum _{i=1}^{l} s_i \\ \mathrm { s.t.} \quad&\mathcal {M}\left\{ y_i\left( \pmb {w}^T \tilde{\pmb {x}}_i + b\right) \ge -s_i \right\} \ge \alpha , \\&\qquad \qquad \qquad \qquad i=1,2, \ldots , l\\&s_i \ge 0, \quad i=1,2, \ldots , l. \\ \end{aligned} \right. \end{aligned}$$
(8)

If \((\pmb {w},b,\pmb {s})\) is an optimal solution to Model (8), then for any constant \(\lambda >0\), \((\lambda \pmb {w},\lambda b,\lambda \pmb {s})\) is also an optimal solution to Model (8). To obtain a unique optimal solution, a constraint on the coefficients is necessary. Without loss of generality, we add the constraint \(\Vert \pmb {w}\Vert =1\) to Model (8). The objective function correspondingly becomes

$$\begin{aligned} \begin{aligned}&\min _{i} \ E \left[ \left|\pmb {w}^T \tilde{\pmb {x}}_i + b \right|\right] - C\sum _{i=1}^{l} s_i \\&= \min _{i} \ E \left[ \left|\sum _{j=1}^n w_j\tilde{x}_{ij}+ b \right|\right] - C\sum _{i=1}^{l} s_i.\\ \end{aligned} \end{aligned}$$

Further, we obtain the following model for the soft margin USVM,

$$\begin{aligned} \left\{ \begin{aligned} \max \limits _{\begin{array}{c} w_1, w_2,\ldots , w_n,b, \\ s_1, s_2,\ldots , s_l \end{array}} \quad&\min _{i} \ E \left[ \left|\sum _{j=1}^n w_j\tilde{x}_{ij}+ b \right|\right] - C\sum _{i=1}^{l} s_i \\ \mathrm { s.t.} \quad&\mathcal {M}\left\{ y_i \left( \sum _{j=1}^n w_j\tilde{x}_{ij} + b\right) \ge -s_i \right\} \ge \alpha , \\&\qquad \qquad \qquad \qquad i=1, 2, \ldots , l \\&\sum _{j=1}^n w_j^2 =1 \\&s_i \ge 0, \quad i=1,2, \ldots , l. \\ \end{aligned} \right. \end{aligned}$$
(9)

For the linearly \(\alpha\)-separable case, we have \(s_i=0\) for \(i=1,2,\ldots ,l\). In this case, Model (9) degenerates into the hard margin USVM proposed by Qin and Li (2022).

Next we give the crisp equivalent form of Model (9) when inverse uncertainty distributions of imprecise observations are given.

Theorem 4

Suppose that the components \(\tilde{x}_{i1}\), \(\tilde{x}_{i2}, \ldots , \tilde{x}_{in}\) of each uncertain vector \(\tilde{\pmb {x}}_{i}\) are independent for \(i=1, 2, \ldots , l\). Let \(\Phi ^{-1}_{ij}\) be the inverse uncertainty distribution of \(\tilde{x}_{ij}\) for \(i=1, 2, \ldots , l\) and \(j=1, 2, \ldots , n\), respectively. Then Model (9) is equivalent to Model (10).

$$\begin{aligned} \left\{ \begin{aligned} \max _{\begin{array}{c} w_1, w_2,\ldots , w_n, b, \\ s_1, s_2,\ldots , s_l \end{array}} \quad&\min _{i} \int _{0}^{1} \left|\sum \limits _{j=1}^{n} w_j\Upsilon _{ij}^{-1} \left( u,w_j\right) + b \right|\mathrm{d}u - C\sum _{i=1}^{l} s_i \\ \mathrm { s.t.} \quad&y_i \left( \sum _{j=1}^{n} w_j \Upsilon ^{-1}_{ij} \left( \alpha ,-y_iw_j\right) +b \right) \ge -s_i, \quad i=1,2,\ldots , l \\&\sum _{j=1}^{n} w^2_j =1 \\&s_i \ge 0, \quad i=1, 2,\ldots , l, \\ \end{aligned} \right. \end{aligned}$$
(10)

where \(\Upsilon _{ij}^{-1}(\alpha ,-y_iw_j)\) is given in Equality (5), and

$$\begin{aligned} \Upsilon _{ij}^{-1}\left( u,w_j\right) = \Phi ^{-1}_{ij}\left( u \right) \cdot I_{ \left\{ w_{j} \ge 0 \right\} } + \Phi ^{-1}_{ij}\left( 1 - u \right) \cdot I_{\left\{ w_{j} < 0 \right\} }. \end{aligned}$$

Proof

It follows from Theorem 1 that the inverse uncertainty distributions of

$$\begin{aligned} \pmb {w}^T \tilde{\pmb {x}}_i + b = \sum _{j=1}^{n} w_j \tilde{x}_{ij}+b \end{aligned}$$

are

$$\begin{aligned} \sum _{j=1}^{n} w_j \Upsilon _{ij}^{-1}\left( u, w_j\right) + b \end{aligned}$$

for \(i=1, 2, \ldots , l ,\) respectively. Thus, the objective function becomes

$$\begin{aligned} \begin{aligned}&\min _{i} \int _{0}^{1} \left|\sum \limits _{j=1}^{n} w_j\Upsilon _{ij}^{-1} (u, w_j) + b \right|\mathrm{d}u - C\sum _{i=1}^{l} s_i . \end{aligned} \end{aligned}$$

The constraints follow from Theorem 3 immediately. \(\square\)
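Numerically, the expected-distance term in the objective of Model (10) can be approximated by discretizing u on (0, 1). A small sketch (ours, using a midpoint rule with an assumed grid size) is given below; the objective of Model (10) is then the minimum of this quantity over i minus \(C\sum _i s_i\).

```python
import numpy as np

def upsilon_inv(phi_inv, u, c):
    # Phi^{-1}(u) when c >= 0, Phi^{-1}(1 - u) otherwise, as in the theorem.
    return phi_inv(u) if c >= 0 else phi_inv(1.0 - u)

def expected_abs_distance(w, b, phi_inv_row, n_grid=200):
    """Midpoint-rule approximation of int_0^1 | sum_j w_j Upsilon_ij^{-1}(u, w_j) + b | du."""
    u_grid = (np.arange(n_grid) + 0.5) / n_grid
    vals = [abs(sum(w_j * upsilon_inv(phi_inv, u, w_j)
                    for w_j, phi_inv in zip(w, phi_inv_row)) + b)
            for u in u_grid]
    return float(np.mean(vals))
```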

4 Numerical experiments

In this section, we present two examples to show the application of the soft margin USVM. We suppose that all the imprecise observations are characterized by linear uncertain variables. The uncertainty distribution and the inverse uncertainty distribution of a linear uncertain variable \(\mathcal {L}(a, b)\) are

$$\begin{aligned} \Phi (x)=\left\{ \begin{array}{cl} 0, &{} \text {if}\quad x\le a\\ \displaystyle {\frac{x-a}{b-a}},&{} \text {if}\quad a<x \le b\\ 1, &{} \text {if}\quad b<x\\ \end{array} \right. \end{aligned}$$

and

$$\begin{aligned} \Phi ^{-1}(u) = (1-u)a+u b, \end{aligned}$$

respectively.
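For reference, the linear uncertain variable above can be coded directly; the helper names in the sketch below are ours.

```python
def linear_cdf(a, b):
    """Uncertainty distribution Phi of the linear uncertain variable L(a, b)."""
    def phi(x):
        if x <= a:
            return 0.0
        if x <= b:
            return (x - a) / (b - a)
        return 1.0
    return phi

def linear_inv(a, b):
    """Inverse uncertainty distribution: Phi^{-1}(u) = (1 - u) a + u b."""
    return lambda u: (1 - u) * a + u * b
```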

Example 1

We consider the first data set \(S_1\), which contains 60 observations. Each observation is denoted as \((\tilde{x}_{i1},\tilde{x}_{i2})\) with label \(y_{i} \in \{1, -1\}\), \(i=1, 2, \ldots , 60\). Let \(\Phi ^{-1}_{i1}\) and \(\Phi ^{-1}_{i2}\) denote the inverse uncertainty distributions of \(\tilde{x}_{i1}\) and \(\tilde{x}_{i2}\) for \(i=1,2,\ldots , 60\), respectively. The detailed data are shown in Table 1.

Table 1 Data set \(S_1\)

For brevity, denote by A the negative class and denote by B the positive one. The first 20 data in A and the first 20 data in B are utilized as the training data, while the rest are utilized as the test data. The training data and the test data are plotted in Fig. 2, where black squares represent the training data in A, black stars represent the training data in B, blue squares represent the test data in A, and blue stars represent the test data in B. It can be seen that \(S_1\) is a linearly \(\alpha\)-nonseparable data set.

Fig. 2
figure 2

Training data and test data in Example 1

For the data set \(S_1\), Model (10) is reformulated and we obtain Model (11).

$$\begin{aligned} \left\{ \begin{aligned} \max _{\begin{array}{c} w_1, w_2, b, \\ s_i, i \in I \end{array}} \quad&\min _{i\in I} \int _{0}^{1} \left|w_1\Upsilon _{i1}^{-1} (u, w_1) + w_2\Upsilon _{i2}^{-1} (u, w_2)+ b \right|\mathrm{d}u - C\sum _{i \in I} s_i \\ \mathrm { s.t.} \quad&y_i \left( w_1 \Upsilon ^{-1}_{i1} \left( \alpha , -y_iw_1\right) + w_2 \Upsilon ^{-1}_{i2} \left( \alpha , -y_iw_2\right) +b \right) \ge -s_i,\quad i \in I \\&w_1^2+w_2^2=1 \\&s_i \ge 0, \quad i \in I, \\ \end{aligned} \right. \end{aligned}$$
(11)

where \(I=\{1, 2, \ldots , 20, 31, 32, \ldots , 50 \}\).

Setting the confidence level \(\alpha\) to 0.90, 0.95, and 0.99, respectively, we employ the ‘fmincon’ function in Matlab to seek the optimal solution to Model (11) under different parameters. The coefficients of the optimal hyperplanes are reported in Table 2. It can be seen that the optimal hyperplanes change only slightly under different penalty coefficients C, and a larger C seems to produce more reliable results. The results also differ slightly across confidence levels.
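The paper solves Model (11) with Matlab's ‘fmincon’. Purely as an illustration, the sketch below sets up an analogous nonlinear program in Python with scipy.optimize.minimize (SLSQP). The observations are placeholder linear uncertain variables, since Table 1 is not reproduced here, and the nonsmooth objective means the solver settings are only indicative.

```python
# Illustrative Python analogue of Model (11); the paper uses Matlab's 'fmincon'.
# The data below are PLACEHOLDERS, not the data of Table 1: each observation is
# a pair of linear uncertain variables L(a, b) with a crisp label.
import numpy as np
from scipy.optimize import minimize

alpha, C = 0.95, 1.0

# (a1, b1, a2, b2, y): hypothetical intervals for (x_i1, x_i2) and labels.
data = [
    (2.0, 4.0, 30.0, 32.0, -1),
    (5.0, 7.0, 40.0, 42.0, -1),
    (30.0, 32.0, 5.0, 7.0, 1),
    (35.0, 37.0, 10.0, 12.0, 1),
]
l = len(data)

def lin_inv(a, b, u):
    return (1 - u) * a + u * b

def upsilon_inv(a, b, u, c):
    return lin_inv(a, b, u) if c >= 0 else lin_inv(a, b, 1 - u)

u_grid = (np.arange(200) + 0.5) / 200

def unpack(z):
    return z[:2], z[2], z[3:]        # w, b, slack variables s

def neg_objective(z):
    w, b, s = unpack(z)
    dists = []
    for (a1, b1, a2, b2, _) in data:
        vals = np.abs(w[0] * np.array([upsilon_inv(a1, b1, u, w[0]) for u in u_grid])
                      + w[1] * np.array([upsilon_inv(a2, b2, u, w[1]) for u in u_grid]) + b)
        dists.append(vals.mean())    # midpoint approximation of the integral over u
    return -(min(dists) - C * s.sum())   # minimize the negative of the objective

def ineq_constraints(z):
    w, b, s = unpack(z)
    rows = []
    for i, (a1, b1, a2, b2, y) in enumerate(data):
        lhs = y * (w[0] * upsilon_inv(a1, b1, alpha, -y * w[0])
                   + w[1] * upsilon_inv(a2, b2, alpha, -y * w[1]) + b)
        rows.append(lhs + s[i])      # Inequality (4): must be >= 0
    return np.array(rows)

cons = [{'type': 'eq',   'fun': lambda z: z[0] ** 2 + z[1] ** 2 - 1.0},
        {'type': 'ineq', 'fun': ineq_constraints}]
bounds = [(None, None)] * 3 + [(0.0, None)] * l      # s_i >= 0
z0 = np.concatenate(([1.0, 0.0, 0.0], np.zeros(l)))

res = minimize(neg_objective, z0, method='SLSQP', constraints=cons, bounds=bounds)
w_opt, b_opt, s_opt = unpack(res.x)
print(w_opt, b_opt, s_opt)
```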

Table 2 Optimal solutions (\(w_1,w_2,b\)) under different penalty coefficients and confidence levels

When the parameter \(C= 1\) and the confidence level \(\alpha =0.95\), the optimal hyperplane determined by Model (11) for \(S_1\) is \(0.9966 x_1 - 0.0820 x_2 = 19.9543\). The misclassified data are the 9th, 19th, and 39th observations, with corresponding slack variables 1.0278, 0.1051, and 2.2807, respectively. These misclassified data are marked in red in Fig. 3, where the data \(\tilde{\pmb {x}}_i\) with label \(y_i = 1\) are shown as squares, and the data \(\tilde{\pmb {x}}_i\) with label \(y_i = -1\) are shown as stars for \(i=1, 2, \ldots , 40\).

Fig. 3
figure 3

Misclassified data in \(S_1\)

Next, we evaluate the generalization ability of the soft margin USVM by following the classification approach given by Qin and Li (2022). We evaluate Model (11) on the test data with the confidence level \(\alpha =0.90\). Denote the number of positive-class points that are correctly classified as true positives (TP), the number of positive-class points that are misclassified as false negatives (FN), the number of negative-class points that are correctly classified as true negatives (TN), and the number of negative-class points that are misclassified as false positives (FP). The results are presented in Table 3 and summarized in the confusion matrix in Table 4.

Table 3 Predicting the data label
Table 4 Confusion matrix in Example 1

Further, commonly used evaluation indicators for binary classification problems are accuracy, precision, recall, and F-score, which are defined as follows,

$$\begin{aligned}&\text {Accuracy} = \frac{\text {TP} + \text {TN}}{\text {TP} + \text {FN} + \text {TN} + \text {FP}}, \\&\text {Precision} = \frac{\text {TP}}{\text {TP} + \text {FP}}, \\&\text {Recall} = \frac{\text {TP}}{\text {TP} + \text {FN}}, \end{aligned}$$

and

$$\begin{aligned} F=\frac{2}{1/\text {Recall} + 1/\text {Precision} } = \frac{\text {TP} }{\text {TP} + (\text {FP} + \text {FN})/2}. \end{aligned}$$

We expect as many correct classifications as possible in both the positive class and the negative class. That is, the above metrics indicate a good classification result if they are close to 1. Accordingly, we obtain

$$\begin{aligned}&\text {Accuracy} = \frac{9+10}{9 + 1 + 10 + 0} = 95.0\%, \\&\text {Precision} = \frac{9}{9+0} = 100.0\%, \\&\text {Recall} = \frac{9}{9+1} = 90.0\%, \end{aligned}$$

and

$$\begin{aligned} F=\frac{9}{9+(0+1)/2 }=94.7 \% . \end{aligned}$$
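These values follow directly from the confusion matrix of the test data (TP = 9, FN = 1, TN = 10, FP = 0) and can be checked with a few lines of Python:

```python
# Verify the Example 1 metrics from the confusion matrix of the test data.
TP, FN, TN, FP = 9, 1, 10, 0

accuracy  = (TP + TN) / (TP + FN + TN + FP)
precision = TP / (TP + FP)
recall    = TP / (TP + FN)
f_score   = TP / (TP + (FP + FN) / 2)     # harmonic mean of precision and recall

print(f"accuracy={accuracy:.1%} precision={precision:.1%} "
      f"recall={recall:.1%} F={f_score:.1%}")
# accuracy=95.0% precision=100.0% recall=90.0% F=94.7%
```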

The results show that the soft margin USVM performs reasonably well for the data set \(S_1\).

Next, we explore the robustness of the soft margin USVM by repeating the analysis with modified data. Suppose that the uncertainty distributions of the 12th and the 35th data become \((\mathcal {L}(12,14), \mathcal {L}(7,9))\) and \((\mathcal {L}(38,40), \mathcal {L}(60,62))\), respectively, while their labels remain unchanged. Then, we obtain a new data set \(S_2\). Running the ‘fmincon’ function in Matlab, we obtain the optimal solutions under confidence levels 0.90, 0.95, and 0.99, which are reported in Table 5.

Table 5 Optimal solutions (\(w_1,w_2,b\)) under different penalty coefficients and confidence levels with two modified data

When the penalty coefficient \(C= 1\) and the confidence level \(\alpha =0.95\), the optimal hyperplane determined by Model (11) for the linearly \(\alpha\)-nonseparable data set \(S_2\) is \(0.9966 x_1- 0.0820 x_2 = 19.9543\). Figure 4 shows the optimal hyperplane as a line and the two modified data in red. The misclassified data are again the 9th, the 19th, and the 39th observations, with corresponding slack variables 1.0278, 0.1051, and 2.2807, respectively.

Table 5 indicates that the optimal solution changes as the penalty coefficient C increases from 0 to 1. However, as C continues to increase, the optimal solutions remain unchanged under the data modification. This is possibly because the penalty term in the objective function then plays the leading role.

Fig. 4
figure 4

Optimal hyperplane for data set \(S_2\)

Example 2

In this example, we examine the soft margin USVM on the linearly \(\alpha\)-separable data set given by Qin and Li (2022). Denote the data set as \(S_3\), whose elements are \((\tilde{x}_{i1},\tilde{x}_{i2})\) with labels \(y_{i} \in \{1, -1\}\), \(i=1, 2, \ldots , 40\). Let \(\Phi ^{-1}_{i1}\) and \(\Phi ^{-1}_{i2}\) denote the inverse uncertainty distributions of \(\tilde{x}_{i1}\) and \(\tilde{x}_{i2}\) for \(i=1,2,\ldots , 40\), respectively.

To classify the data in \(S_3\), Model (10) is reformulated, and we obtain Model (12) with a penalty coefficient. This calls for an examination of the classification power of Model (12) under different penalty coefficients as well as confidence levels. As before, we explore the optimal solutions for confidence levels \(\alpha \in \{0.90, 0.95, 0.99 \}\) and penalty coefficients \(C \in \{ 1, 2, 3, 4, 10, 20, 40\}.\) The optimal solution \((w_{1}, w_{2},b)\) to Model (12) is obtained by employing the ‘fmincon’ function in Matlab. The optimal solutions are listed in Table 6.

$$\begin{aligned} \left\{ \begin{aligned} \max _{\begin{array}{c} w_1,w_2, b,\\ s_1, s_2\ldots , s_{40} \end{array}} \quad&\min _{i} \int _{0}^{1} \left|w_1\Upsilon _{i1}^{-1} (u, w_1) + w_2\Upsilon _{i2}^{-1} (u, w_2)+ b \right|\mathrm{d}u - C\sum _{i=1}^{40} s_i \\ \mathrm { s.t.} \quad&y_i \left( w_1 \Upsilon ^{-1}_{i1} \left( \alpha , -y_iw_1\right) + w_2 \Upsilon ^{-1}_{i2} \left( \alpha , -y_iw_2 \right) +b \right) \ge -s_i, \quad i=1, 2,\ldots , 40 \\&w^2_1 + w^2_2 = 1 \\&s_i \ge 0, \quad i=1, 2, \ldots , 40. \\ \end{aligned} \right. \end{aligned}$$
(12)
Table 6 Optimal solutions (\(w_{1},w_{2},b\)) under different penalty coefficients and confidence levels for data set \(S_3\)
Table 7 Optimal solutions (\(w_{1},w_{2},b\)) under different penalty coefficients and confidence levels for data set \(S_4\)

As the penalty coefficient C increases, the results tend to be unchanged. Letting \(C=10\) and \(\alpha =0.90\), we plot the results in Fig. 5, which shows that the optimal hyperplane correctly classifies the data set. Suppose that the uncertainty distributions of the 26th and the 30th data become \((\mathcal {L}(1,3), \mathcal {L}(81,83))\) and \((\mathcal {L}(3,4), \mathcal {L}(55,58))\), respectively, while their labels remain unchanged. Then we obtain a new data set \(S_4\), in which the two modified data are marked with circles in Fig. 6.

Fig. 5
figure 5

Optimal hyperplane for data set \(S_3\)

Fig. 6
figure 6

Optimal hyperplane for data set \(S_4\)

Similarly, we may obtain the corresponding optimal hyperplanes, whose coefficients are presented in Table 7. The results are not sensitive to the confidence level \(\alpha\) or the penalty coefficient C. From Fig. 6, we observe that all the data with positive labels lie in one half-plane determined by the optimal hyperplane, while the data with negative labels lie in the opposite half-plane. Meanwhile, the hyperplane attains the widest margin. The results are consistent with the main conclusions in Qin and Li (2022). We can see that the soft margin USVM is capable of handling not only classification problems with linearly \(\alpha\)-separable data sets but also those with linearly \(\alpha\)-nonseparable data sets.

5 Conclusions

This paper proposed a USVM based on the soft margin method for linearly \(\alpha\)-nonseparable data sets with imprecise observations in the framework of uncertainty theory. The equivalent crisp model was derived based on the inverse uncertainty distributions. Two numerical examples were conducted to demonstrate the application and effectiveness of the proposed model. The modelling idea behind the soft margin USVM sheds light on some future research directions; for example, nonlinear classification problems and multi-class classification problems with imprecise observations may be studied further.