
A bilateral-truncated-loss based robust support vector machine for classification problems


Abstract

Support vector machine (SVM) is sensitive to outliers or noise in the training dataset. Fuzzy SVM (FSVM) and the bilateral-weighted FSVM (BW-FSVM) can partly overcome this shortcoming by assigning different fuzzy membership degrees to different training samples. However, setting the fuzzy membership degrees of the training samples is a difficult task. To avoid setting fuzzy membership degrees, this paper starts from the BW-FSVM model and constructs a bilateral-truncated-loss based robust SVM (BTL-RSVM) model for classification problems with noise. Based on its equivalent model, we theoretically analyze why BTL-RSVM is more robust than SVM and BW-FSVM. To solve the proposed BTL-RSVM model, we propose an iterative algorithm based on the concave–convex procedure and the Newton–Armijo algorithm. A set of experiments is conducted on ten real-world benchmark datasets to test the robustness of BTL-RSVM. Statistical tests of the experimental results indicate that, compared with SVM, FSVM and BW-FSVM, the proposed BTL-RSVM can significantly reduce the effects of noise and provide superior robustness.
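For intuition about the robustness claim, note that the case analysis in the appendix lets the bilateral truncated loss be written compactly as \(\mathrm{robust}(z)=1+\min (\left| {1-z} \right| _+ ,\left| {1+z} \right| _+ )\) in terms of the decision value \(z={{\mathbf w}}^T\varphi ({{\mathbf x}})+b\); unlike the hinge loss, it is bounded above by 2, so a single noisy sample cannot dominate the training objective. The Python snippet below is only a rough sketch of this comparison; the function names and the grid of decision values are illustrative and not taken from the paper.

```python
import numpy as np

def hinge(t):
    """Positive-part function |t|_+ = max(t, 0)."""
    return np.maximum(t, 0.0)

def bilateral_truncated_loss(z):
    """Compact form of the bilateral truncated loss, 1 + min(|1 - z|_+, |1 + z|_+),
    read off from the case analysis in the appendix (an assumption of this sketch)."""
    return 1.0 + np.minimum(hinge(1.0 - z), hinge(1.0 + z))

z = np.linspace(-10.0, 10.0, 9)
print(bilateral_truncated_loss(z))  # bounded: every value lies in [1, 2]
print(hinge(1.0 - z))               # the standard hinge loss grows without bound
```

The offset of 1 matches Eq. (47) in the appendix, where the bilateral truncated loss is shown to upper-bound the 0-1 misclassification error.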



Acknowledgments

The work presented in this paper is supported by the National Science Foundation of China (61273295), the Major Project of the National Social Science Foundation of China (11&ZD156), and the Open Project of the Key Laboratory of Symbolic Computation and Knowledge Engineering of the Chinese Ministry of Education (93K-17-2009-K04).

Author information


Corresponding author

Correspondence to Xiaowei Yang.

Additional information

Communicated by V. Loia.

Appendix

Proof of Proposition 1

The loss function \(L({{\mathbf w}},b,m,{{\mathbf x}})\) can be rewritten as follows:

$$\begin{aligned} L({{\mathbf w}},b,m,{{\mathbf x}})&= m\left| {1-({{\mathbf w}}^T\varphi ({{\mathbf x}})+b)} \right| _+ +(1-m)\left| {1+{{\mathbf w}}^T\varphi ({{\mathbf x}})+b} \right| _+ +1 \nonumber \\ &= m\left( \left| {1-({{\mathbf w}}^T\varphi ({{\mathbf x}})+b)} \right| _+ -\left| {1+{{\mathbf w}}^T\varphi ({{\mathbf x}})+b} \right| _+ \right) +\left| {1+{{\mathbf w}}^T\varphi ({{\mathbf x}})+b} \right| _+ +1. \end{aligned}$$
(42)

If \(\left| {1-({{\mathbf w}}^T\varphi ({{\mathbf x}})+b)} \right| _+ \ge \left| {1+{{\mathbf w}}^T\varphi ({{\mathbf x}})+b} \right| _+ \), then \(L({{\mathbf w}},b,m,{{\mathbf x}})\ge 1+\left| {1+{{\mathbf w}}^T\varphi ({{\mathbf x}})+b} \right| _+ \). If \(\left| {1-({{\mathbf w}}^T\varphi ({{\mathbf x}})+b)} \right| _+ <\left| {1+{{\mathbf w}}^T\varphi ({{\mathbf x}})+b} \right| _+ \), then \(L({{\mathbf w}},b,m,{{\mathbf x}})\ge 1+\left| {1-({{\mathbf w}}^T\varphi ({{\mathbf x}})+b)} \right| _+ \). By the definition of the bilateral truncated loss function \(\mathrm{robust}({{\mathbf w}},b,{{\mathbf x}})\), in both cases \(L({{\mathbf w}},b,m,{{\mathbf x}})\ge \mathrm{robust}({{\mathbf w}},b,{{\mathbf x}})\) holds.
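A minimal numerical sanity check of this bound, written in terms of the decision value \(z={{\mathbf w}}^T\varphi ({{\mathbf x}})+b\) and the compact form \(\mathrm{robust}(z)=1+\min (\left| {1-z} \right| _+, \left| {1+z} \right| _+ )\) implied by the case analysis in the proof of Proposition 2 below (the grid and the helper names are illustrative assumptions, not part of the paper):

```python
import numpy as np

def hinge(t):
    return np.maximum(t, 0.0)  # |t|_+

def bwfsvm_loss(z, m):
    """The loss L(w, b, m, x) of Eq. (42), expressed via z = w^T phi(x) + b."""
    return m * hinge(1.0 - z) + (1.0 - m) * hinge(1.0 + z) + 1.0

def robust_loss(z):
    """Bilateral truncated loss, 1 + min(|1 - z|_+, |1 + z|_+)."""
    return 1.0 + np.minimum(hinge(1.0 - z), hinge(1.0 + z))

z = np.linspace(-5.0, 5.0, 501)            # decision values
for m in np.linspace(0.0, 1.0, 21):        # membership degrees in [0, 1]
    assert np.all(bwfsvm_loss(z, m) >= robust_loss(z) - 1e-12)
print("L(w, b, m, x) >= robust(w, b, x) on the whole grid (Proposition 1).")
```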

Proof of Proposition 2

In the following proof, \(z={{\mathbf w}}^T\varphi ({{\mathbf x}})+b\). The graphs of the functions \(f(z)=| {1-z} |_+ \) and \(f(z)=| {1+z} |_+ \) are shown in Fig. 2:

Fig. 2  The graphs of the functions \(f(z)=| {1-z} |_+ \) and \(f(z)=| {1+z} |_+ \)

From Fig. 2, we know that

$$\begin{aligned} \left| {1-z} \right| _+ \ge 1\ge \left| {1+z} \right| _+, \end{aligned}$$
(43)

or

$$\begin{aligned} \left| {1-z} \right| _+ \le 1\le \left| {1+z} \right| _+ . \end{aligned}$$
(44)

If (43) holds, then \(\mathrm{robust}({{\mathbf w}},b,{{\mathbf x}})=1+\left| {1+{{\mathbf w}}^T\varphi ({{\mathbf x}})+b} \right| _+ \). If (44) holds, then \(\mathrm{robust}({{\mathbf w}},b,{{\mathbf x}})=1+\left| {1-{{\mathbf w}}^T\varphi ({{\mathbf x}})-b} \right| _+ \). From the proof procedure of Proposition 1, we know

$$\begin{aligned} \min \limits _{0\le m\le 1} L({{\mathbf w}},b,m,{{\mathbf x}})=\mathrm{robust}({{\mathbf w}},b,{{\mathbf x}}). \end{aligned}$$
(45)

Let \(\mathrm{ErrorClass}({{\mathbf w}},b,{{\mathbf x}},y)\) be a misclassification error function,

$$\begin{aligned} \mathrm{ErrorClass}({{\mathbf w}},b,{{\mathbf x}},y)=\left\{ \begin{array}{ll} 1, & y({{\mathbf w}}^T\varphi ({{\mathbf x}})+b)<0 \\ 0, & y({{\mathbf w}}^T\varphi ({{\mathbf x}})+b)\ge 0 \end{array} \right. . \end{aligned}$$
(46)

From the definition of \(\mathrm{robust}({{\mathbf w}},b,{{\mathbf x}})\) and (46), the following inequality holds:

$$\begin{aligned} {\mathrm{robust}}({{\mathbf w}},b,{{\mathbf x}})\ge 1\ge {\mathrm{ErrorClass}}({{\mathbf w}},b,{{\mathbf x}},y), \end{aligned}$$
(47)

which shows that the bilateral truncated loss function \(\mathrm{robust}({{\mathbf w}},b,{{\mathbf x}})\) in the optimization problem (14) is an upper bound on the misclassification error function.
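The same compact form gives a direct numerical check that the bilateral truncated loss dominates the 0-1 loss for either label (a minimal sketch; all names and the grid are illustrative, not from the paper):

```python
import numpy as np

def hinge(t):
    return np.maximum(t, 0.0)  # |t|_+

def robust_loss(z):
    """Bilateral truncated loss, 1 + min(|1 - z|_+, |1 + z|_+)."""
    return 1.0 + np.minimum(hinge(1.0 - z), hinge(1.0 + z))

def error_class(z, y):
    """Misclassification indicator of Eq. (46), with z = w^T phi(x) + b."""
    return (y * z < 0.0).astype(float)

z = np.linspace(-5.0, 5.0, 501)
for y in (-1.0, 1.0):
    assert np.all(robust_loss(z) >= error_class(z, y))   # inequality (47)
print("robust(w, b, x) >= ErrorClass(w, b, x, y) for both labels.")
```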

Proof of Theorem 1

Define

$$\begin{aligned}&F_{\mathrm{rob}} ({{\mathbf w}},b)=\frac{1}{2}{{\mathbf w}}^T{{\mathbf w}}+C\sum \limits _{i=1}^l {\mathrm{robust}({{\mathbf w}},b,{{\mathbf x}}_i )} , \end{aligned}$$
(48)
$$\begin{aligned}&F_L ({{\mathbf w}},b,{{\mathbf m}})=\frac{1}{2}{{\mathbf w}}^T{{\mathbf w}}+C\sum \limits _{i=1}^l {L({{\mathbf w}},b,m_i, {{\mathbf x}}_i )}, \end{aligned}$$
(49)
$$\begin{aligned}&({{\mathbf w}}_r, b_r )=\arg \min \limits _{w,b} F_{\mathrm{rob}} ({{\mathbf w}},b), \end{aligned}$$
(50)
$$\begin{aligned}&({{\mathbf w}}_L, b_L, {{\mathbf m}}_L)=\arg \min \limits _{w,b} \min \limits _{0\le m\le 1} F_L ({{\mathbf w}},b,{{\mathbf m}}), \end{aligned}$$
(51)
$$\begin{aligned}&{{\mathbf m}}_r =\arg \min \limits _{0\le m\le 1} F_L ({{\mathbf w}}_r, b_r, {{\mathbf m}}). \end{aligned}$$
(52)

From Proposition 2, we know the following two equalities hold:

$$\begin{aligned}&F_{\mathrm{rob}} ({{\mathbf w}}_r, b_r )=\min \limits _{0\le {{\mathbf m}}\le 1} F_L ({{\mathbf w}}_r, b_r, {{\mathbf m}}), \end{aligned}$$
(53)
$$\begin{aligned}&\min \limits _{0\le {{\mathbf m}}\le 1} F_L ({{\mathbf w}}_L, b_L ,{{\mathbf m}})=F_{\mathrm{rob}} ({{\mathbf w}}_L, b_L ). \end{aligned}$$
(54)

Considering that \(({{\mathbf w}}_r, b_r )\) and \(({{\mathbf w}}_L ,b_L )\) are the optimal solutions of the optimization problems \(\min \nolimits _{w,b} F_{\mathrm{rob}} ({{\mathbf w}},b)\) and \(\min \nolimits _{w,b} \min \nolimits _{0\le m\le 1} F_L ({{\mathbf w}},b,{{\mathbf m}})\), respectively, we have:

$$\begin{aligned}&F_{\mathrm{rob}} ({{\mathbf w}}_L, b_L )\ge \min \limits _{w,b} F_{\mathrm{rob}} ({{\mathbf w}},b),\end{aligned}$$
(55)
$$\begin{aligned}&\min \limits _{0\le {{\mathbf m}}\le 1} F_L ({{\mathbf w}}_r, b_r, {{\mathbf m}})\ge \min \limits _{{{\mathbf w}},b} \min \limits _{0\le {{\mathbf m}}\le 1} F_L ({{\mathbf w}},b,{{\mathbf m}}). \end{aligned}$$
(56)

Based on (50), (53), and (56), we can obtain

$$\begin{aligned} \min \limits _{w,b} F_{\mathrm{rob}} ({{\mathbf w}},b)\ge \min \limits _{{{\mathbf w}},b} \min \limits _{0\le {{\mathbf m}}\le 1} F_L ({{\mathbf w}},b,{{\mathbf m}}). \end{aligned}$$
(57)

Based on (51), (54), and (55), we have

$$\begin{aligned} \min \limits _{w,b} \min \limits _{0\le {{\mathbf m}}\le 1} F_L ({{\mathbf w}},b,{{\mathbf m}})\ge \min \limits _{w,b} F_{\mathrm{rob}} ({{\mathbf w}},b). \end{aligned}$$
(58)

Comparing (57) with (58) yields

$$\begin{aligned}&\min \limits _{{{\mathbf w}},b} \min \limits _{0\le m\le 1} \frac{1}{2}{{\mathbf w}}^T{{\mathbf w}}+C\sum \limits _{i=1}^l {L({{\mathbf w}},b,m_i, {{\mathbf x}}_i )} \\&\quad =\min \limits _{{{\mathbf w}},b} \frac{1}{2}{{\mathbf w}}^T{{\mathbf w}}+C\sum \limits _{i=1}^l {\mathrm{robust}({{\mathbf w}},b,{{\mathbf x}}_i )} . \end{aligned}$$

From the following inequality

$$\begin{aligned} F_{\mathrm{rob}} ({{\mathbf w}}_r, b_r )&= F_L ({{\mathbf w}}_r, b_r ,{{\mathbf m}}_r )\ge F_L ({{\mathbf w}}_L, b_L, {{\mathbf m}}_L )\\&= F_{\mathrm{rob}} ({{\mathbf w}}_L, b_L )\ge F_{\mathrm{rob}} ({{\mathbf w}}_r, b_r ), \end{aligned}$$

we know that the optimal solutions \(({{\mathbf w}}_r, b_r )\) and \(({{\mathbf w}}_L, b_L )\) of the optimization problems (13) and (14) with respect to \(({{\mathbf w}},b)\) are interchangeable.

Proof of Theorem 2

Based on the decision values \(z_i =\sum \nolimits _{j=1}^l {\alpha _j K({{\mathbf x}}_j, {{\mathbf x}}_i )} +b\), we divide the training samples into seven sets \(U_1 =\{ i\mid \vert z_i -1\vert \le h \}\), \(U_2 =\{ i\mid \vert z_i +1\vert \le h \}\), \(B_1 =\{ i\mid h<z_i <1-h \}\), \(B_2 =\{ i\mid \vert z_i \vert \le h \}\), \(B_3 =\{ i\mid -1+h<z_i <-h \}\), \(N_1 =\{ i\mid z_i >1+h \}\) and \(N_2 =\{ i\mid z_i <-1-h \}\), which are illustrated in Fig. 3. Let \(n_{U_1}\), \(n_{U_2}\), \(n_{B_1}\), \(n_{B_2}\), \(n_{B_3}\), \(n_{N_1}\) and \(n_{N_2}\) denote the numbers of training samples located in the sets \(U_1\), \(U_2\), \(B_1\), \(B_2\), \(B_3\), \(N_1\) and \(N_2\), respectively, and assume the training samples are ordered according to these sets. \({{\mathbf I}}_{U_1}\) denotes the \(l\times l\) diagonal matrix whose first \(n_{U_1}\) diagonal elements are 1 and whose remaining elements are zeros. \({{\mathbf I}}_{U_2}\) (\({{\mathbf I}}_{B_1}\), \({{\mathbf I}}_{B_2}\), \({{\mathbf I}}_{B_3}\), \({{\mathbf I}}_{N_1}\) and \({{\mathbf I}}_{N_2}\)) denotes the \(l\times l\) diagonal matrix whose first \(n_{U_1}\) (\(n_{U_1} +n_{U_2}\), \(n_{U_1} +n_{U_2} +n_{B_1}\), \(n_{U_1} +n_{U_2} +n_{B_1} +n_{B_2}\), \(n_{U_1} +n_{U_2} +n_{B_1} +n_{B_2} +n_{B_3}\), \(n_{U_1} +n_{U_2} +n_{B_1} +n_{B_2} +n_{B_3} +n_{N_1}\)) diagonal elements are zeros, whose following \(n_{U_2}\) (\(n_{B_1}\), \(n_{B_2}\), \(n_{B_3}\), \(n_{N_1}\) and \(n_{N_2}\)) elements are 1, and whose remaining elements are zeros. Similarly, \({{\mathbf e}}_{U_1}\) denotes the \(l\times 1\) vector whose first \(n_{U_1}\) elements are 1 and whose remaining elements are zeros, and \({{\mathbf e}}_{U_2}\) (\({{\mathbf e}}_{B_1}\), \({{\mathbf e}}_{B_2}\), \({{\mathbf e}}_{B_3}\), \({{\mathbf e}}_{N_1}\) and \({{\mathbf e}}_{N_2}\)) denotes the \(l\times 1\) vector whose first \(n_{U_1}\) (\(n_{U_1} +n_{U_2}\), \(n_{U_1} +n_{U_2} +n_{B_1}\), \(n_{U_1} +n_{U_2} +n_{B_1} +n_{B_2}\), \(n_{U_1} +n_{U_2} +n_{B_1} +n_{B_2} +n_{B_3}\), \(n_{U_1} +n_{U_2} +n_{B_1} +n_{B_2} +n_{B_3} +n_{N_1}\)) elements are zeros, whose following \(n_{U_2}\) (\(n_{B_1}\), \(n_{B_2}\), \(n_{B_3}\), \(n_{N_1}\) and \(n_{N_2}\)) elements are 1, and whose remaining elements are zeros. A small sketch of this bookkeeping is given after Fig. 3.

Fig. 3  The partitioned seven sets
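To make the bookkeeping concrete, the sketch below builds the seven index sets and the corresponding indicator matrices and vectors from a vector of decision values. It works with boolean masks in the original sample order, which is equivalent, up to a permutation of the samples, to the reordered construction above; the toy decision values, the value of \(h\) and all names are illustrative assumptions.

```python
import numpy as np

def partition_sets(z, h):
    """Index masks for the seven sets U1, U2, B1, B2, B3, N1, N2 of the proof."""
    return {
        "U1": np.abs(z - 1.0) <= h,
        "U2": np.abs(z + 1.0) <= h,
        "B1": (z > h) & (z < 1.0 - h),
        "B2": np.abs(z) <= h,
        "B3": (z > -1.0 + h) & (z < -h),
        "N1": z > 1.0 + h,
        "N2": z < -1.0 - h,
    }

def indicators(masks):
    """Diagonal indicator matrices I_S and indicator vectors e_S for each set S."""
    I = {name: np.diag(mask.astype(float)) for name, mask in masks.items()}
    e = {name: mask.astype(float) for name, mask in masks.items()}
    return I, e

# Toy decision values and smoothing parameter (illustrative only).
z = np.array([1.02, 0.4, -0.03, -0.6, -0.98, 1.7, -2.1])
h = 0.1
masks = partition_sets(z, h)
I, e = indicators(masks)
assert sum(m.sum() for m in masks.values()) == len(z)  # the seven sets partition the samples
```

With the samples kept in their original order, each \({{\mathbf I}}_S\) is simply the diagonal matrix of the mask for the set \(S\) and each \({{\mathbf e}}_S\) is the mask itself, which is all that the gradient and Hessian expressions below require.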

From

$$\begin{aligned} \frac{\partial G_1^*(z_i)}{\partial z_i }=\left\{ \begin{array}{ll} -1, & z_i <1-h \\ \frac{z_i -(1+h)}{2h}, & \left| {z_i -1} \right| \le h \\ 0, & z_i >1+h \end{array} \right. , \end{aligned}$$
(59)

and

$$\begin{aligned} \frac{\partial H_1^*(z_i)}{\partial z_i}=\left\{ \begin{array}{ll} 1, & z_i >-1+h \\ \frac{z_i +(1+h)}{2h}, & \left| {z_i +1} \right| \le h \\ 0, & z_i <-1-h \end{array} \right. , \end{aligned}$$
(60)

we can obtain the first- and second-order partial derivatives of \(J({\varvec{\alpha }},b)\) with respect to \({\varvec{\alpha }}\) and \(b\) as follows:

$$\begin{aligned} \frac{\partial J({\varvec{\alpha }},b)}{\partial {\varvec{\alpha }}}&= {{\mathbf K}}{\varvec{\alpha }}+C\sum \limits _{i=1}^l \left( \frac{\partial G_1^*(z_i )}{\partial z_i }+\frac{\partial H_1^*(z_i )}{\partial z_i }\right) {{\mathbf K}}_i +C\sum \limits _{i=1}^l \lambda _i^t {{\mathbf K}}_i \nonumber \\ &= {{\mathbf K}}{\varvec{\alpha }}+C\left( \frac{{{\mathbf K}}({{\mathbf I}}_{U_1 } +{{\mathbf I}}_{U_2 })({{\mathbf K}}{\varvec{\alpha }}+b{{\mathbf e}})}{2h}+\frac{{{\mathbf K}}({{\mathbf I}}_{U_2 } -{{\mathbf I}}_{U_1 })(1+h){{\mathbf e}}}{2h}\right) \nonumber \\ &\quad +C{{\mathbf K}}({{\mathbf I}}_{U_1 } +{{\mathbf I}}_{N_1 } -{{\mathbf I}}_{U_2 } -{{\mathbf I}}_{N_2 }){{\mathbf e}}+C{{\mathbf K}}{\varvec{\lambda }}^t, \end{aligned}$$
(61)
$$\begin{aligned} \frac{\partial J({\varvec{\alpha }},b)}{\partial b}&= \delta b+C\sum \limits _{i=1}^l \left( \frac{\partial G_1^*(z_i )}{\partial z_i }+\frac{\partial H_1^*(z_i )}{\partial z_i }\right) +C\sum \limits _{i=1}^l \lambda _i^t \nonumber \\ &= \delta b+C\left( \frac{({{\mathbf e}}_{U_1 } +{{\mathbf e}}_{U_2 })^T({{\mathbf K}}{\varvec{\alpha }}+b{{\mathbf e}})}{2h}+\frac{({{\mathbf e}}_{U_2 } -{{\mathbf e}}_{U_1 })^T(1+h){{\mathbf e}}}{2h}\right) \nonumber \\ &\quad +C({{\mathbf e}}_{U_1 } +{{\mathbf e}}_{N_1 } -{{\mathbf e}}_{U_2 } -{{\mathbf e}}_{N_2 })^T{{\mathbf e}}+C({\varvec{\lambda }}^t)^T{{\mathbf e}}, \end{aligned}$$
(62)
$$\begin{aligned} \frac{\partial ^2J({\varvec{\alpha }},b)}{\partial {\varvec{\alpha }}^2}&= {{\mathbf K}}+C\frac{{{\mathbf K}}({{\mathbf I}}_{U_1} +{{\mathbf I}}_{U_2}){{\mathbf K}}}{2h},\end{aligned}$$
(63)
$$\begin{aligned} \frac{\partial ^2J({\varvec{\alpha }},b)}{\partial {\varvec{\alpha }}\partial b}&= C\frac{{{\mathbf K}}({{\mathbf e}}_{U_1} +{{\mathbf e}}_{U_2} )}{2h}, \end{aligned}$$
(64)
$$\begin{aligned} \frac{\partial ^2J({\varvec{\alpha }},b)}{\partial b^2}&= \delta +C\frac{({{\mathbf e}}_{U_1} +{{\mathbf e}}_{U_2})^T({{\mathbf e}}_{U_1}+{{\mathbf e}}_{U_2})}{2h}, \end{aligned}$$
(65)

where \({{\mathbf e}}\) is the \(l\times 1\) vector whose elements are all 1, and the vector \({\varvec{\lambda }}^t\) is composed of the \(\lambda _i^t\) arranged according to the order of the training samples in the sets \(U_1\), \(U_2\), \(B_1\), \(B_2\), \(B_3\), \(N_1\) and \(N_2\), i.e., \({\varvec{\lambda }}^t=( {{\varvec{\lambda }}_{U_1 }^t, {\varvec{\lambda }}_{U_2 }^t, {\varvec{\lambda }}_{B_1 }^t, {\varvec{\lambda }}_{B_2 }^t, {\varvec{\lambda }}_{B_3 }^t, {\varvec{\lambda }}_{N_1 }^t, {\varvec{\lambda }}_{N_2 }^t })^T\).

From (61)–(65), we can obtain the Hessian matrix and the gradient of the objective function \(J({\varvec{\alpha }},b)\) as follows:

$$\begin{aligned} {\mathbf H}=\left( \begin{array}{ll} \delta +\frac{C({{\mathbf e}}_{U_1 } +{{\mathbf e}}_{U_2 })^T({{\mathbf e}}_{U_1 } +{{\mathbf e}}_{U_2 })}{2h} & \frac{C({{\mathbf e}}_{U_1 } +{{\mathbf e}}_{U_2 })^T{{\mathbf K}}}{2h}\\ \frac{C{{\mathbf K}}({{\mathbf e}}_{U_1 } +{{\mathbf e}}_{U_2 })}{2h} & {{\mathbf K}}+\frac{C{{\mathbf K}}({{\mathbf I}}_{U_1 } +{{\mathbf I}}_{U_2 }){{\mathbf K}}}{2h} \end{array}\right) , \end{aligned}$$
(66)

and

$$\begin{aligned} \nabla =\left( \begin{array}{l} \frac{\partial J({\varvec{\alpha }},b)}{\partial b} \\ \frac{\partial J({\varvec{\alpha }},b)}{\partial {\varvec{\alpha }}} \end{array}\right) ={\mathbf H}\left( \begin{array}{l} b\\ {\varvec{\alpha }} \end{array}\right) +\left( \begin{array}{l} \frac{C({{\mathbf e}}_{U_2 } -{{\mathbf e}}_{U_1 })^T(1+h){{\mathbf e}}}{2h}+C({{\mathbf e}}_{U_1 } +{{\mathbf e}}_{N_1 } -{{\mathbf e}}_{U_2 } -{{\mathbf e}}_{N_2 })^T{{\mathbf e}} +C({\varvec{\lambda }}^t)^T{{\mathbf e}} \\ \frac{C{{\mathbf K}}({{\mathbf I}}_{U_2 } -{{\mathbf I}}_{U_1 })(1+h){{\mathbf e}}}{2h} +C{{\mathbf K}}({{\mathbf I}}_{U_1 } +{{\mathbf I}}_{N_1 } -{{\mathbf I}}_{U_2 } -{{\mathbf I}}_{N_2 }){{\mathbf e}}+C{{\mathbf K}}{\varvec{\lambda }}^t \end{array}\right) . \end{aligned}$$
(67)

For any nonzero vector \((b\;\;{\varvec{\alpha }}^T)\in R^{l+1}\),

$$\begin{aligned}&(b\;\;{\varvec{\alpha }}^T){\mathbf H}\left( \begin{array}{l} b \\ {\varvec{\alpha }} \end{array}\right) =(b\;\;{\varvec{\alpha }}^T) \left( \begin{array}{ll} \delta +\frac{C({{\mathbf e}}_{U_1 } +{{\mathbf e}}_{U_2 })^T({{\mathbf e}}_{U_1 } +{{\mathbf e}}_{U_2 })}{2h} & \frac{C({{\mathbf e}}_{U_1 } +{{\mathbf e}}_{U_2 })^T{{\mathbf K}}}{2h} \\ \frac{C{{\mathbf K}}({{\mathbf e}}_{U_1 } +{{\mathbf e}}_{U_2 })}{2h} & {{\mathbf K}}+\frac{C{{\mathbf K}}({{\mathbf I}}_{U_1 } +{{\mathbf I}}_{U_2 }){{\mathbf K}}}{2h} \end{array}\right) \left( \begin{array}{l} b \\ {\varvec{\alpha }} \end{array}\right) \\&\quad =b^2\delta +b^2C\frac{({{\mathbf e}}_{U_1 } +{{\mathbf e}}_{U_2 })^T({{\mathbf e}}_{U_1 } +{{\mathbf e}}_{U_2 })}{2h} +bC{\varvec{\alpha }}^T\frac{{{\mathbf K}}({{\mathbf e}}_{U_1 } +{{\mathbf e}}_{U_2 })}{2h} +bC\frac{({{\mathbf e}}_{U_1 } +{{\mathbf e}}_{U_2 })^T{{\mathbf K}}}{2h}{\varvec{\alpha }}+{\varvec{\alpha }}^T\left( {{\mathbf K}}+\frac{C{{\mathbf K}}({{\mathbf I}}_{U_1 } +{{\mathbf I}}_{U_2 }){{\mathbf K}}}{2h}\right) {\varvec{\alpha }} \\&\quad =b^2\delta +{\varvec{\alpha }}^T{{\mathbf K}}{\varvec{\alpha }}+C\left( b^2\frac{({{\mathbf e}}_{U_1 } +{{\mathbf e}}_{U_2 })^T({{\mathbf e}}_{U_1 } +{{\mathbf e}}_{U_2 })}{2h} +b{\varvec{\alpha }}^T\frac{{{\mathbf K}}({{\mathbf e}}_{U_1 } +{{\mathbf e}}_{U_2 })}{2h}+b\frac{({{\mathbf e}}_{U_1 } +{{\mathbf e}}_{U_2 })^T{{\mathbf K}}}{2h}{\varvec{\alpha }}+{\varvec{\alpha }}^T\frac{{{\mathbf K}}({{\mathbf I}}_{U_1 } +{{\mathbf I}}_{U_2 }){{\mathbf K}}}{2h}{\varvec{\alpha }}\right) \\&\quad =b^2\delta +{\varvec{\alpha }}^T{{\mathbf K}}{\varvec{\alpha }}+\frac{C}{2h}\left( b({{\mathbf e}}_{U_1 } +{{\mathbf e}}_{U_2 })+({{\mathbf I}}_{U_1 } +{{\mathbf I}}_{U_2 }){{\mathbf K}}{\varvec{\alpha }}\right) ^T \left( b({{\mathbf e}}_{U_1 } +{{\mathbf e}}_{U_2 })+({{\mathbf I}}_{U_1 } +{{\mathbf I}}_{U_2 }){{\mathbf K}}{\varvec{\alpha }}\right) >0. \end{aligned}$$

Therefore, the optimization problem (31) is a strictly convex QP problem.
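To connect Theorem 2 with the Newton–Armijo iteration mentioned in the abstract, the sketch below assembles \({\mathbf H}\) as in (66), confirms its positive definiteness numerically via a Cholesky factorization, and performs one Newton step with Armijo backtracking on a stand-in objective. The toy kernel matrix, the choice of \(U_1\) and \(U_2\), the stand-in objective and the step-size constants are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def hessian(K, eU1, eU2, IU1, IU2, C, h, delta):
    """Hessian H of Eq. (66), with the (b, alpha) ordering used in the proof."""
    eU = eU1 + eU2
    top = np.concatenate(([delta + C * (eU @ eU) / (2 * h)], C * (eU @ K) / (2 * h)))
    bottom = np.hstack(((C * (K @ eU) / (2 * h))[:, None],
                        K + C * (K @ (IU1 + IU2) @ K) / (2 * h)))
    return np.vstack((top[None, :], bottom))

def newton_armijo_step(theta, grad, H, J, rho=0.5, sigma=1e-4):
    """One Newton step with an Armijo backtracking line search (illustrative)."""
    d = np.linalg.solve(H, -grad)          # Newton direction; H is PD by Theorem 2
    t = 1.0
    while J(theta + t * d) > J(theta) + sigma * t * grad @ d:
        t *= rho                           # shrink the step until the Armijo rule holds
    return theta + t * d

# Toy problem: random positive definite kernel matrix and arbitrary U1/U2 indicators.
rng = np.random.default_rng(0)
l, C, h, delta = 6, 1.0, 0.1, 1.0
A = rng.standard_normal((l, l))
K = A @ A.T + 1e-3 * np.eye(l)             # positive definite kernel matrix
mask_U1, mask_U2 = np.zeros(l), np.zeros(l)
mask_U1[:2], mask_U2[2:4] = 1.0, 1.0
H = hessian(K, mask_U1, mask_U2, np.diag(mask_U1), np.diag(mask_U2), C, h, delta)
np.linalg.cholesky(H)                      # succeeds only if H is positive definite

# Stand-in objective J(theta) = 0.5 theta^T H theta (not the paper's J), theta = (b, alpha).
J = lambda th: 0.5 * th @ H @ th
theta = rng.standard_normal(l + 1)
theta = newton_armijo_step(theta, H @ theta, H, J)  # H @ theta is the stand-in gradient
```

In the actual algorithm the objective would be \(J({\varvec{\alpha }},b)\) and the gradient would be \(\nabla \) from (67); the strict convexity established above guarantees that the Newton direction is well defined.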


Cite this article

Yang, X., Han, L., Li, Y. et al. A bilateral-truncated-loss based robust support vector machine for classification problems. Soft Comput 19, 2871–2882 (2015). https://doi.org/10.1007/s00500-014-1448-9
