
Sparse discriminant twin support vector machine for binary classification

  • S.I.: NCAA 2021
Neural Computing and Applications

Abstract

For a binary classification problem, the twin support vector machine (TSVM) learns faster than the support vector machine (SVM) by seeking a pair of nonparallel hyperplanes. However, TSVM has two deficiencies: poor discriminant ability and poor sparsity. To alleviate them, we propose a novel sparse discriminant twin support vector machine (SD-TSVM). Inspired by the Fisher criterion, which maximizes the between-class scatter while minimizing the within-class scatter, SD-TSVM introduces twin Fisher regularization terms that may improve its discriminant ability. Moreover, SD-TSVM achieves good sparsity by utilizing both the 1-norm of the model coefficients and the hinge loss, so it can efficiently perform data reduction. Classification results on nine real-world datasets show that SD-TSVM performs satisfactorily compared with related methods.



References

  1. Chen T, Guo Y, Hao S (2020) Unsupervised feature selection based on joint spectral learning and general sparse regression. Neural Comput Appl 32:6581–6589


  2. Cortes C, Vapnik V (1995) Support vector networks. Mach Learn 20(3):273–297


  3. Cristianini N, Shawe-Taylor J (2000) An introduction to support vector machines and other kernel-based learning methods. Cambridge University Press, Cambridge


  4. Demsar J (2006) Statistical comparisons of classifiers over multiple data sets. J Mach Learn Res 7(1):1–30


  5. Dheeru D, Karra Taniskidou E (2017) UCI machine learning repository. http://archive.ics.uci.edu/ml

  6. Duda RO, Hart PE, Stork DG (2000) Pattern classification. Wiley, Hoboken


  7. Dunn OJ (1961) Multiple comparisons among means. J Am Stat Assoc 56(293):52–64


  8. den Hertog D (1994) Interior point approach to linear, quadratic and convex programming: algorithms and complexity. Kluwer Academic Publishers, Dordrecht


  9. Gu Z, Zhang Z, Sun J, Li B (2017) Robust image recognition by l1-norm twin-projection support vector machine. Neurocomputing 223:1–11


  10. Horn RA, Johnson RC (1985) Matrix analysis. Cambridge University Press, Cambridge


  11. Jayadeva, Khemchandani R, Chandra S (2007) Twin support vector machine for pattern classification. IEEE Trans Pattern Anal Mach Intell 29(5):905–910


  12. Jiang J, Ma J, Chen C, Jiang X, Wang Z (2017) Noise robust face image super-resolution through smooth sparse representation. IEEE Trans Cybern 47(11):3991–4002


  13. Kumar MA, Gopal M (2009) Least squares twin support vector machine for pattern classification. Expert Syst Appl 36:7535–7543


  14. Liu L, Chu M, Yang Y, Gong R (2020) Twin support vector machine based on adjustable large margin distribution for pattern classification. Int J Mach Learn Cybern 11:2371–2389


  15. Ma J, Tian J, Bai X, Tu Z (2013) Regularized vector field learning with sparse approximation for mismatch removal. Pattern Recognit 46:3519–3532


  16. Mangasarian OL (2006) Exact 1-norm support vector machines via unconstrained convex differentiable minimization. J Mach Learn Res 7(3):1517–1530


  17. Mangasarian OL, Wild E (2006) Multisurface proximal support vector machine classification via generalized eigenvalues. IEEE Trans Pattern Anal Mach Intell 28(1):69–74


  18. Richhariya B, Tanveer M (2021) A fuzzy universum least squares twin support vector machine (FULSTSVM). Neural Comput Appl. https://doi.org/10.1007/s00521-021-05721-4


  19. Shao Y, Zhang C, Wang X, Deng N (2011) Improvements on twin support vector machine. IEEE Trans Neural Netw 22(6):962–968


  20. Shi Y, Miao J, Niu L (2019) Feature selection with MCP2 regularization. Neural Comput Appl 31:6699–6709


  21. Tanveer M (2015) Robust and sparse linear programming twin support vector machines. Cogn Comput 7(1):137–149


  22. Thi HAL, Phan DN (2017) DC programming and DCA for sparse fisher linear discriminant analysis. Neural Comput Appl 28:2809–2822


  23. Tian Y, Ju X, Qi Z (2014) Efficient sparse nonparallel support vector machines for classification. Neural Comput Appl 24(5):1089–1099


  24. Vapnik VN (2000) The nature of statistical learning theory. Springer, Berlin


  25. Zhang L, Zhou W, Chang P, Liu J, Yan Z, Wang T, Li F (2012) Kernel sparse representation-based classifier. IEEE Trans Signal Process 60(4):1684–1695


  26. Zhang L, Zhou W, Jiao L (2004) Hidden space support vector machine. IEEE Trans Neural Netw 15(6):1424–1434


  27. Zhang L, Zhou WD (2016) Fisher-regularized support vector machine. Inf Sci 343–344:79–93


  28. Zhang Z, Zhen L, Deng N, Tan J (2014) Sparse least square twin support vector machine with adaptive norm. Appl Intell 41(4):1097–1107


  29. Zheng X, Zhang L, Xu Z (2021) L1-norm Laplacian support vector machine for data reduction in semi-supervised learning. Neural Comput Appl. https://doi.org/10.1007/s00521-020-05609-9


  30. Zheng X, Zhang L, Yan L (2020) Feature selection using sparse twin bounded support vector machine. In: International conference on neural information processing. Springer, Bangkok, pp 357–369

  31. Zheng X, Zhang L, Yan L (2021) CTSVM: a robust twin support vector machine with correntropy-induced loss function for binary classification problems. Inf Sci 559:22–45


  32. Zheng X, Zhang L, Yan L (2021) Sample reduction using \(\ell 1\)-norm twin bounded support vector machine. In: Zhang H, Yang Z, Zhang Z, Wu Z, Hao TY (eds) Neural computing for advanced applications. Springer, Singapore, pp 141–153



Acknowledgements

This work was supported in part by the Natural Science Foundation of the Jiangsu Higher Education Institutions of China under Grant No. 19KJA550002, by the Six Talent Peak Project of Jiangsu Province of China under Grant No. XYDXX-054, by the Priority Academic Program Development of Jiangsu Higher Education Institutions, and by the Collaborative Innovation Center of Novel Software Technology and Industrialization.

Author information


Corresponding author

Correspondence to Li Zhang.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Human and animal rights

This article does not contain any studies with human participants or animals performed by any of the authors.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices


1.1 Appendix 1: Proof of Theorem 1

According to the properties of convex programming [10], an optimization problem is a convex program if and only if both its objective function and its constraints are convex. In other words, an optimization problem is non-convex if its objective function or its constraints are non-convex. We now prove that the objective of the optimization problem (17) or (18) is non-convex. Because the two problems are similar and symmetric, we mainly discuss one of them, namely the optimization problem (17).

For the sake of simplification, we define

$$\begin{aligned} J_1({\mathbf {w}}_1, b_1)&= \frac{1}{2} \left\| f_1\left( {\mathbf{X}}_1\right) \right\| ^2_2, \end{aligned}$$
(42)
$$\begin{aligned} J_2({\mathbf {w}}_1)&= C_1\left\| f_1({\mathbf {X}}_1)-\overline{f_1({\mathbf{X}}_1)}\right\| _2^2, \end{aligned}$$
(43)

and

$$\begin{aligned} J_3({\mathbf {w}}_1, b_1)=-C_1\left( \left\| -f_1({\mathbf{X}}_2)-\overline{f_1({\mathbf {X}}_1)}\right\| _2^2\right) . \end{aligned}$$
(44)

In doing so, we can express the optimization problem (17) as

$$\begin{aligned} \min _{{\mathbf {w}}_1, b_1}\,\,\,\,\,\,J_1({\mathbf {w}}_1, b_1)+J_2({\mathbf{w}}_1)+J_3({\mathbf {w}}_1, b_1). \end{aligned}$$
(45)

If we assume that (45) is a convex program, then \(J_1({\mathbf {w}}_1, b_1)\), \(J_2({\mathbf {w}}_1)\), and \(J_3({\mathbf {w}}_1, b_1)\) must all be convex functions.

By substituting \(f_1({\mathbf {X}}_1)={\mathbf {X}}_1{\mathbf {w}}_1+{\mathbf {e}}_1 b_1\) into (42), the first term in the optimization problem (45) can be expressed as:

$$\begin{aligned} \begin{aligned}&J_1({\mathbf {w}}_1, b_1)\\&\quad =\frac{1}{2}({\mathbf {X}}_1{\mathbf {w}}_1+{\mathbf {e}}_1 b_1)^T({\mathbf {X}}_1{\mathbf {w}}_1+{\mathbf {e}}_1 b_1)\\&\quad =\frac{1}{2}\left( {\mathbf {w}}_1^T {\mathbf {X}}_1^T{\mathbf {X}}_1{\mathbf {w}}_1+2{\mathbf {w}}_1^T{\mathbf {X}}_1^T{\mathbf {e}}_1 b_1+n_1 b_1^2\right) . \end{aligned} \end{aligned}$$
(46)

Obviously, the second-order derivative of \(J_1({\mathbf {w}}_1, b_1)\) with respect to \({\mathbf {w}}_1\) is \({\mathbf {X}}_1^T{\mathbf {X}}_1\), which is a positive semi-definite matrix. Thus, \(J_1({\mathbf {w}}_1, b_1)\) is convex.

By substituting \(f_1({\mathbf {X}}_1)={\mathbf {X}}_1{\mathbf {w}}_1+{\mathbf {e}}_1 b_1\) and \(\overline{f_1({\mathbf {X}}_1)}={\mathbf {m}}_1^T{\mathbf {w}}_1+b_1\) into (43), the second term in the optimization problem (45) can be expressed as:

$$\begin{aligned} \begin{aligned}&J_2({\mathbf {w}}_1)\\&\quad =C_1\left( {\mathbf {X}}_1 {\mathbf {w}}_1-{\mathbf {e}}_1 {\mathbf {m}}_1^T {\mathbf {w}}_1\right) ^T\left( {\mathbf {X}}_1 {\mathbf {w}}_1-{\mathbf {e}}_1 {\mathbf {m}}_1^T {\mathbf {w}}_1\right) \\&\quad =C_1{\mathbf {w}}_1^T\left( {\mathbf {X}}_1-{\mathbf {e}}_1{\mathbf{m}}_1^T\right) ^T\left( {\mathbf {X}}_1-{\mathbf {e}}_1{\mathbf{m}}_1^T\right) {\mathbf {w}}_1. \end{aligned} \end{aligned}$$
(47)

The second-order derivative of \(J_2({\mathbf {w}}_1)\) is \(2C_1\left( {\mathbf {X}}_1-{\mathbf {e}}_1{\mathbf {m}}_1^T\right) ^T\left( {\mathbf {X}}_1-{\mathbf {e}}_1{\mathbf {m}}_1^T\right)\), which is a positive semi-definite matrix. Thus, \(J_2({\mathbf {w}}_1)\) is convex.

By substituting \(f_1({\mathbf {X}}_2)={\mathbf {X}}_2{\mathbf {w}}_1+{\mathbf {e}}_2 b_1\) and \(\overline{f_1({\mathbf {X}}_1)}={\mathbf {m}}_1^T{\mathbf {w}}_1+b_1\) into (44), the third term \(J_3({\mathbf {w}}_1, b_1)\) in the optimization problem (45) can be expressed as:

$$\begin{aligned} \begin{aligned}&J_3({\mathbf {w}}_1, b_1)\\&\quad =-C_1\left\| -{\mathbf {X}}_2{\mathbf {w}}_1-{\mathbf {e}}_2 b_1-{\mathbf {e}}_2{\mathbf {m}}_1^T{\mathbf {w}}_1-{\mathbf {e}}_2 b_1\right\| _2^2\\&\quad =-C_1\left\| \left( -{\mathbf {X}}_2-{\mathbf {e}}_2{\mathbf {m}}_1^T\right) {\mathbf {w}}_1-2{\mathbf {e}}_2 b_1\right\| _2^2\\&\quad =-C_1\left( {\mathbf {w}}_1^T\left( -{\mathbf {X}}_2-{\mathbf {e}}_2{\mathbf {m}}_1^T\right) ^T\left( -{\mathbf {X}}_2-{\mathbf {e}}_2{\mathbf {m}}_1^T\right) {\mathbf {w}}_1-4{\mathbf {w}}_1^T\left( -{\mathbf {X}}_2-{\mathbf {e}}_2{\mathbf {m}}_1^T\right) ^T{\mathbf {e}}_2b_1+4n_2b_1^2\right) . \end{aligned} \end{aligned}$$
(48)

The second-order derivative of \(J_3({\mathbf {w}}_1,b_1)\) with respect to \({\mathbf {w}}_1\) is \(-2C_1\left( -{\mathbf {X}}_2-{\mathbf {e}}_2{\mathbf {m}}_1^T\right) ^T\left( -{\mathbf {X}}_2-{\mathbf {e}}_2{\mathbf {m}}_1^T\right)\), which is not positive semi-definite. Thus, \(J_3({\mathbf {w}}_1, b_1)\) is non-convex.

We have thus shown that both \(J_1({\mathbf {w}}_1, b_1)\) and \(J_2({\mathbf {w}}_1)\) are convex, whereas \(J_3({\mathbf {w}}_1, b_1)\) is non-convex. This contradicts the assumption that the optimization problem (17) is convex.
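The sign pattern of these second-order derivative blocks can be checked numerically. The following sketch is only an illustration on assumed toy data (random \({\mathbf {X}}_1\), \({\mathbf {X}}_2\) and \(C_1=1\), none of which come from the paper); it evaluates the three blocks, up to positive scaling, and confirms that the blocks of \(J_1\) and \(J_2\) are positive semi-definite while the block of \(J_3\) is not.

```python
# Minimal numerical sketch (toy data assumed) for the Hessian blocks used in
# the proof of Theorem 1.
import numpy as np

rng = np.random.default_rng(0)
n1, n2, m, C1 = 8, 6, 4, 1.0
X1 = rng.standard_normal((n1, m))          # class-1 samples
X2 = rng.standard_normal((n2, m))          # class-2 samples
m1 = X1.mean(axis=0, keepdims=True)        # row vector m_1^T
e1 = np.ones((n1, 1))
e2 = np.ones((n2, 1))

H_J1 = X1.T @ X1                                   # block for J_1
H_J2 = (X1 - e1 @ m1).T @ (X1 - e1 @ m1)           # block for J_2
H_J3 = -C1 * (-X2 - e2 @ m1).T @ (-X2 - e2 @ m1)   # block for J_3

for name, H in [("J1", H_J1), ("J2", H_J2), ("J3", H_J3)]:
    eigs = np.linalg.eigvalsh(H)
    # min eigenvalue: >= 0 for J1 and J2, < 0 for J3 (non-convex part)
    print(name, "min eigenvalue:", eigs.min())
```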

In the same way, the non-convexity of the optimization problem (18) can be proved; we omit the details here. This completes the proof of Theorem 1.

1.2 Appendix 2: Derivation of optimization problems

To construct convex forms of the optimization problems (17) and (18), we first remove the non-convex parts from the objective functions and then recast them as constraints. We illustrate the transformation using (17); the same procedure applies to (18).

In (17), minimizing the non-convex part \(-\left\| -f_1({\mathbf {X}}_2)-\overline{f_1({{\mathbf {X}}_1})}\right\| ^2_2\) is equivalent to maximizing \(\left\| -f_1({\mathbf {X}}_2)-\overline{f_1({{\mathbf {X}}_1})}\right\| ^2_2\). We take the latter as constraints and obtain the following variant of (17):

$$\begin{aligned} \begin{aligned} \min _{{\mathbf {w}}_1, b_1}\,\,\,\,\,\,&\frac{1}{2} \left\| f_1({\mathbf {X}}_1)\right\| ^2_2+C_1\left\| f_1({\mathbf {X}}_1)-\overline{f_1({\mathbf {X}}_1)}\right\| ^2_2, \\ {\mathrm{s.t.}}\,\,\,\,\,\,&\left( -f_1({\mathbf {x}}_{2i})-\overline{f_1({\mathbf {X}}_1)}\right) ^2 \ge \epsilon _1^2,\,\,\, i=1,\dots ,n_2,\\ \end{aligned} \end{aligned}$$
(49)

where \(\epsilon _1\) is a constant greater than or equal to 0.

Moreover, the constraints of (49) can be replaced by

$$\begin{aligned} \left\{ \begin{array}{ll} -f_1({\mathbf {x}}_{2i})-\overline{f_1({\mathbf {X}}_1)} \ge \epsilon _1,\,\,\, i=1,\dots ,n_2, \\ -f_1({\mathbf {x}}_{2i})-\overline{f_1({\mathbf {X}}_1)} \le -\epsilon _1,\,\,\, i=1,\dots ,n_2. \end{array} \right. \end{aligned}$$
(50)

If the second group of inequalities in (50) held, the value of \(\overline{f_1({\mathbf {X}}_1)}\) would have to be as large as possible and that of \(f_1({\mathbf {x}}_{2i})\) as small as possible, owing to \(f_1({\mathbf {X}}_2)\le {0}\) and \(f_1({\mathbf {X}}_1)\ge {0}\); this is inconsistent with the goal of minimizing \(\left\| f_1({\mathbf {X}}_1)\right\| ^2_2\). Consequently, these inequalities in (50) are redundant and can be ignored. Therefore, (49) can be simplified to:

$$\begin{aligned} \begin{aligned} \min _{{\mathbf {w}}_1, b_1}\,\,\,\,\,\,&\frac{1}{2} \left\| f_1({\mathbf {X}}_1)\right\| ^2_2+C_1\left\| f_1({\mathbf {X}}_1)-\overline{f_1({\mathbf {X}}_1)}\right\| ^2_2, \\ {\mathrm{s.t.}}\,\,\,\,\,\,&-f_1({\mathbf {X}}_2)-\overline{f_1({\mathbf {X}}_1)} \ge \epsilon _1 {\mathbf {e}}_2.\\ \end{aligned} \end{aligned}$$
(51)

Considering that outliers or noise may exist in the data, we relax the constraints by adding slack variables \(\xi _{2i}\ge 0\) and setting \(\epsilon _1=1\). Finally, we obtain the following convex variant of (17):

$$\begin{aligned} \begin{aligned} \min _{{\mathbf {w}}_1, b_1,{\varvec{\xi }}_2}\,\,\,\,\,\,&\frac{1}{2} ||f_1({\mathbf {X}}_1)||^2_2+C_1\left\| f_1({\mathbf {X}}_1)-\overline{f_1({\mathbf {X}}_1)}\right\| ^2_2+C_3{\mathbf {e}}^T_2{\varvec{\xi }}_2, \\ {\mathrm{s.t.} }\quad \quad&-f_1({\mathbf {X}}_2)-\overline{f_1({\mathbf{X}}_1)}+{\varvec{\xi }}_2\ge {\mathbf {e}}_{2},\,\,\,{\varvec{\xi }}_2\ge {\mathbf {0}}_{n_2}, \end{aligned} \end{aligned}$$

where \(C_3>0\) and \({\varvec{\xi }}_2=[\xi _{21},\dots ,\xi _{2n_2}]^T\).

Thus, we have completed the derivation from (17) to (19). The optimization problem (18) can be transformed into (20) in the same way.
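To make the role of the slack variables concrete, the following sketch (assumed toy data and an arbitrary candidate \(({\mathbf {w}}_1, b_1)\), not taken from the paper) evaluates the left-hand side of the constraint in (19) and the smallest slacks \(\xi _{2i}\ge 0\) that satisfy it, which is a hinge-type quantity.

```python
# Illustrative sketch (toy data, arbitrary w1 and b1): evaluates the constraint
# of problem (19), -f1(X2) - mean(f1(X1)) + xi_2 >= e_2, and the smallest
# nonnegative slacks that satisfy it.
import numpy as np

rng = np.random.default_rng(1)
n1, n2, m = 8, 6, 4
X1 = rng.standard_normal((n1, m))
X2 = rng.standard_normal((n2, m))
w1 = rng.standard_normal(m)
b1 = 0.1

f1_X2 = X2 @ w1 + b1                      # f_1 on class-2 samples
f1_X1_mean = X1.mean(axis=0) @ w1 + b1    # mean of f_1 over class-1 samples
lhs = -f1_X2 - f1_X1_mean                 # left-hand side without slack
xi2 = np.maximum(0.0, 1.0 - lhs)          # smallest xi_2 >= 0 with lhs + xi_2 >= 1
print("constraint margins:", lhs)
print("required slacks   :", xi2)
```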

1.3 Appendix 3: Proof of Theorem 2

Here, we need to prove that the objective functions and constraints of optimization problems (19) and (20) are convex. Since the structure of (19) and (20) is similar, we discuss only one of them in detail.

To simplify the expression of (19), we use \(J_1({\mathbf {w}}_1,b_1)\) (42) and \(J_2({\mathbf {w}}_1)\) (43) in “Appendix 1” and define \(J_4({\varvec{\xi }}_2)=C_3{\mathbf {e}}_2^T{\varvec{\xi }}_2\). Therefore, (19) can be rewritten as

$$\begin{aligned} \begin{aligned} \min _{{\mathbf {w}}_1, b_1, {\varvec{\xi }}_2} \,\,\,\,\,\,&J_1({\mathbf {w}}_1, b_1)+J_2({\mathbf {w}}_1)+J_4({\varvec{\xi }}_2),\\ {\mathrm{s.t.}} \,\,\,\,\,\,&J_{c_1}({\varvec{\nu }}_2)\ge {\mathbf {0}}_{n_2} ,\\&J_{c_2}({\varvec{\nu }}_2)\ge {\mathbf {0}}_{n_2},\\ \end{aligned} \end{aligned}$$
(52)

where \({\varvec{\nu }}_2=[{\mathbf {w}}_1^T,b_1,{\varvec{\xi }}_2^T]^T\), \(J_{c_1}({\varvec{\nu }}_2) = {\mathbf {G}}_1 {\varvec{\nu }}_2-{\mathbf {e}}_2\) with \({\mathbf {G}}_1 = \left[ -{\mathbf {X}}_2-{\mathbf {e}}_2{\mathbf {m}}_1^T,\,-2{\mathbf {e}}_2,\,{\mathbf {I}}_{n_2\times n_2}\right]\), and \(J_{c_2}({\varvec{\nu }}_2) = {\mathbf {G}}_2 {\varvec{\nu }}_2\) with \({\mathbf {G}}_2 = \left[ {\mathbf {O}}_{n_2\times m},\,{\mathbf {0}}_{n_2},\,{\mathbf {I}}_{n_2\times n_2}\right]\).
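As an illustration, the sketch below (assumed toy shapes and values) builds \({\varvec{\nu }}_2\), \({\mathbf {G}}_1\), and \({\mathbf {G}}_2\) with NumPy and checks that \({\mathbf {G}}_1{\varvec{\nu }}_2\) coincides with \(-f_1({\mathbf {X}}_2)-\overline{f_1({\mathbf {X}}_1)}+{\varvec{\xi }}_2\), so that \(J_{c_1}({\varvec{\nu }}_2)\ge {\mathbf {0}}_{n_2}\) reproduces the first constraint of (19).

```python
# Sketch (toy shapes assumed) of the stacked variable nu_2 and the constraint
# matrices G_1, G_2 used in Appendix 3.
import numpy as np

rng = np.random.default_rng(2)
n1, n2, m = 8, 6, 4
X1 = rng.standard_normal((n1, m))
X2 = rng.standard_normal((n2, m))
m1 = X1.mean(axis=0)                       # m_1 as a 1-D vector
e2 = np.ones(n2)

w1 = rng.standard_normal(m)
b1 = 0.3
xi2 = rng.random(n2)
nu2 = np.concatenate([w1, [b1], xi2])      # nu_2 = [w_1^T, b_1, xi_2^T]^T

G1 = np.hstack([-X2 - np.outer(e2, m1), -2.0 * e2[:, None], np.eye(n2)])
G2 = np.hstack([np.zeros((n2, m)), np.zeros((n2, 1)), np.eye(n2)])

direct = -(X2 @ w1 + b1) - (m1 @ w1 + b1) + xi2   # constraint lhs of (19)
print(np.allclose(G1 @ nu2, direct))              # True: both sides agree
print(np.allclose(G2 @ nu2, xi2))                 # G_2 selects the slacks
```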

According to the proof of Theorem 1, both \(J_1({\mathbf {w}}_1, b_1)\) and \(J_2({\mathbf {w}}_1)\) are convex functions. In addition, \(J_4({\varvec{\xi }}_2)\) is also convex since it is a linear function. Thus, the objective function of the problem (19) is convex.

Next, we discuss the constraint functions \(J_{c_1}({\varvec{\nu }}_2)\) and \(J_{c_2}({\varvec{\nu }}_2)\). Their second-order derivatives are:

$$\begin{aligned} \begin{aligned}&\frac{\partial ^2 J_{c_1}({\varvec{\nu }}_2)}{\partial {\varvec{\nu }}_2\partial {\varvec{\nu }}_2}={\mathbf {0}}, \\&\frac{\partial ^2 J_{c_2}({\varvec{\nu }}_2)}{\partial {\varvec{\nu }}_2\partial {\varvec{\nu }}_2}={\mathbf {0}}. \\ \end{aligned} \end{aligned}$$
(53)

A function whose second-order derivative is positive semi-definite is convex. Therefore, both \(J_{c_1}({\varvec{\nu }}_2)\) and \(J_{c_2}({\varvec{\nu }}_2)\) are convex.

Since both the objective function and the constraint functions of (19) are convex, the optimization problem (19) is a convex program. In a similar way, we can prove that the optimization problem (20) is also convex.

This completes the proof.

1.4 Appendix 4: Derivation from (23) to (25)

For convenience, we restate the optimization problem (23):

$$\begin{aligned} \begin{aligned} \min _{{\varvec{\beta }}_1^*,{\varvec{\beta }}_1 ,\gamma _1^*, \gamma _1,{\varvec{\xi }}_2} \quad&\frac{1}{2} ||{\mathbf {X}}_1\left( {\varvec{\beta }}_1^*-{\varvec{\beta }}_1\right) +{\mathbf {e}}_1 (\gamma _1^*-\gamma _1)||^2_2+{{C_1}}||({\mathbf {X}}_1-{\mathbf {e}}_1{\mathbf {m}}_1^T)({\varvec{\beta }}_1^* - {\varvec{\beta }}_1)||^2_2\\&+C_3{\mathbf {e}}_2^T {\varvec{\xi }}_2+C_5\left( \Vert {\varvec{\beta }}_1^*\Vert _1+\Vert {\varvec{\beta }}_1\Vert _1 +\gamma ^*_{1}+\gamma _{1}\right) , \\ {\mathrm{s.t.}} \quad \quad&-({\mathbf {X}}_2+{\mathbf {e}}_2{\mathbf {m}}_1^T)\left( {\varvec{\beta }}_1^*-{\varvec{\beta }}_1\right) -2{\mathbf {e}}_2\left( \gamma _1^*-\gamma _1\right) +{\varvec{\xi} }_2 \ge {\mathbf {e}}_2, \\ \quad&{\varvec{\xi }}_2 \ge {\mathbf {0}}_{n_2},\,\,{\varvec{\beta} }_1^*\ge {\mathbf {0}}_{m},\,\,{\varvec{\beta }}_1 \ge {\mathbf {0}}_{m},\gamma _1^*\ge 0,\gamma _1\ge 0, \end{aligned} \end{aligned}$$

The first term in the objective function of the optimization problem (23) can be expressed as:

$$\begin{aligned} \begin{aligned}&\frac{1}{2} ||{\mathbf {X}}_1({\varvec{\beta }}_1^*-{\varvec{\beta }}_1)+{\mathbf {e}}_1 (\gamma _1^*-\gamma _1)||^2_2 \\&\quad = \frac{1}{2} \left( {\mathbf {X}}_1({\varvec{\beta }}_1^*-{\varvec{\beta }}_1)+{\mathbf {e}}_1 (\gamma _1^*-\gamma _1)\right) ^T\left( {\mathbf {X}}_1({\varvec{\beta }}_1^*-{\varvec{\beta }}_1)+{\mathbf {e}}_1(\gamma _1^*-\gamma _1)\right) \\&\quad = \frac{1}{2}({\varvec{\beta }}_1^*-{\varvec{\beta }}_1)^T {\mathbf {X}}_1^T {\mathbf {X}}_1({\varvec{\beta }}_1^*-{\varvec{\beta }}_1)+\frac{1}{2}({\varvec{\beta }}_1^*-{\varvec{\beta }}_1)^T{\mathbf {X}}_1^T {\mathbf {e}}_1 (\gamma _1^*-\gamma _1) \\&\qquad + \frac{1}{2}(\gamma _1^*-\gamma _1) {\mathbf {e}}_1^T {\mathbf {X}}_1 ({\varvec{\beta }}_1^*-{\varvec{\beta }}_1) + \frac{1}{2}(\gamma _1^*-\gamma _1) {\mathbf {e}}_1^T {\mathbf {e}}_1 (\gamma _1^*-\gamma _1) \\&\quad = \frac{1}{2}\Big({{\varvec{\beta }}_1^*}^T{\mathbf {X}}_1^T {\mathbf {X}}_1 {{\varvec{\beta }}_1^*} -{{\varvec{\beta }}_1}^T{\mathbf {X}}_1^T {\mathbf {X}}_1 {{\varvec{\beta }}_1^*} -{{\varvec{\beta }}_1^*}^T{\mathbf {X}}_1^T {\mathbf {X}}_1 {{\varvec{\beta }}_1} +{{\varvec{\beta }}_1}^T{\mathbf {X}}_1^T {\mathbf {X}}_1 {{\varvec{\beta }}_1} \\&\qquad +{{\varvec{\beta }}_1^*}^T{\mathbf {X}}_1^T {\mathbf {e}}_1 {\gamma _1^*} -{{\varvec{\beta }}_1^*}^T{\mathbf {X}}_1^T {\mathbf {e}}_1 {\gamma _1} -{{\varvec{\beta }}_1}^T{\mathbf {X}}_1^T {\mathbf {e}}_1 {\gamma _1^*} +{{\varvec{\beta }}_1}^T{\mathbf {X}}_1^T {\mathbf {e}}_1 {\gamma _1} \\&\qquad +{\gamma _1^*}{\mathbf {e}}_1^T {\mathbf {X}}_1 {{\varvec{\beta }}_1^*} -{\gamma _1}{\mathbf {e}}_1^T {\mathbf {X}}_1 {{\varvec{\beta }}_1^*} -{\gamma _1^*}{\mathbf {e}}_1^T {\mathbf {X}}_1 {{\varvec{\beta }}_1} +{\gamma _1}{\mathbf {e}}_1^T {\mathbf {X}}_1 {{\varvec{\beta }}_1}\\&\qquad +\gamma _1^*{\mathbf {e}}_1^T {\mathbf {e}}_1 \gamma _1^* -\gamma _1^*{\mathbf {e}}_1^T {\mathbf {e}}_1 \gamma _1 -\gamma _1 {\mathbf {e}}_1^T {\mathbf {e}}_1 \gamma _1^* +\gamma _1 {\mathbf {e}}_1^T {\mathbf {e}}_1 \gamma _1\Big)\\&\quad =\frac{1}{2}[{{\varvec{\beta }}_1^*}^T,\,{\varvec{\beta }}_1^{T},\,\gamma _1^*,\,\gamma _1] \left[ \begin{array}{cccc} {{\mathbf {X}}_1^T} {\mathbf {X}}_1 &\quad -{{\mathbf {X}}_1^T} {\mathbf {X}}_1 &\quad 0.5{{\mathbf {X}}_1^T}{\mathbf {e}}_1 &\quad -0.5{{\mathbf {X}}_1^T}{\mathbf {e}}_1 \\ -{{\mathbf {X}}_1^T} {\mathbf {X}}_1 &\quad {{\mathbf {X}}_1^T} {\mathbf {X}}_1 &\quad -0.5{{\mathbf {X}}_1^T} {\mathbf {e}}_1 &\quad 0.5{{\mathbf {X}}_1^T} {\mathbf {e}}_1 \\ 0.5{{\mathbf {e}}_1^T} {\mathbf {X}}_1 &\quad -0.5{{\mathbf {e}}_1^T} {\mathbf {X}}_1 &\quad {{\mathbf {e}}_1^T}{{\mathbf {e}}_1} &\quad -{{\mathbf {e}}_1^T}{{\mathbf {e}}_1} \\ -0.5{{\mathbf {e}}_1^T} {\mathbf {X}}_1 &\quad 0.5{{\mathbf {e}}_1^T} {\mathbf {X}}_1 &\quad -{{\mathbf {e}}_1^T}{{\mathbf {e}}_1} &\quad {{\mathbf {e}}_1^T}{{\mathbf {e}}_1} \end{array} \right] [{{\varvec{\beta }}_1^*}^T,\,{\varvec{\beta }}_1^{T},\,\gamma _1^*,\,\gamma _1]^T. \end{aligned} \end{aligned}$$
(54)

The second term in the objective function of (23) can be rewritten as:

$$\begin{aligned} \begin{aligned}&{{C_1}}||({\mathbf {X}}_1-{\mathbf {e}}_1{\mathbf {m}}_1^T)({\varvec{\beta }}_1^* - {\varvec{\beta }}_1)||^2_2\\&\quad ={{C_1}}\left( ({\mathbf {X}}_1-{\mathbf {e}}_1{\mathbf {m}}_1^T)({\varvec{\beta }}_1^* - {\varvec{\beta }}_1)\right) ^T\left( ({\mathbf {X}}_1-{\mathbf {e}}_1{\mathbf {m}}_1^T)({\varvec{\beta }}_1^* - {\varvec{\beta }}_1)\right) \\&\quad ={{\varvec{\beta }}_1^*}^T{\mathbf {X}}_0^T{\mathbf {X}}_0{\varvec{\beta }}_1^* -{{\varvec{\beta }}_1^*}^T{\mathbf {X}}_0^T{\mathbf {X}}_0{\varvec{\beta }}_1 -{{\varvec{\beta }}_1^{T}}{\mathbf {X}}_0^T{\mathbf {X}}_0{\varvec{\beta }}_1^* +{{\varvec{\beta }}_1^{T}}{\mathbf {X}}_0^T{\mathbf {X}}_0{\varvec{\beta }}_1 \\&\quad =[{{\varvec{\beta }}_1^*}^T,\,{\varvec{\beta }}_1^{T}] \left[ \begin{array}{cc} {{\mathbf {X}}_0^T} {\mathbf {X}}_0 &\quad -{{\mathbf {X}}_0^T} {\mathbf {X}}_0 \\ -{{\mathbf {X}}_0^T} {\mathbf {X}}_0 &\quad {{\mathbf {X}}_0^T} {\mathbf {X}}_0 \end{array} \right] {[{{\varvec{\beta }}_1^*}^T,\,{\varvec{\beta }}_1^{T}]}^T, \end{aligned} \end{aligned}$$
(55)

where \({\mathbf {X}}_0=C_1^{\frac{1}{2}}({\mathbf {X}}_1-{\mathbf {e}}_1{\mathbf{m}}_1^T)\).
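The block form in (55) can be verified numerically. The following sketch (assumed toy data and random coefficient vectors, chosen only for illustration) checks that \(\Vert {\mathbf {X}}_0({\varvec{\beta }}_1^*-{\varvec{\beta }}_1)\Vert _2^2\) equals the quadratic form built from the \(2\times 2\) block matrix with blocks \(\pm {\mathbf {X}}_0^T{\mathbf {X}}_0\).

```python
# Numerical check of the block form in (55) on assumed toy data.
import numpy as np

rng = np.random.default_rng(3)
n1, m, C1 = 8, 4, 0.7
X1 = rng.standard_normal((n1, m))
m1 = X1.mean(axis=0)
X0 = np.sqrt(C1) * (X1 - np.outer(np.ones(n1), m1))   # X_0 = C_1^{1/2}(X_1 - e_1 m_1^T)

beta_star = rng.random(m)
beta = rng.random(m)
M = X0.T @ X0
Q = np.block([[M, -M], [-M, M]])
v = np.concatenate([beta_star, beta])

lhs = np.linalg.norm(X0 @ (beta_star - beta)) ** 2
rhs = v @ Q @ v
print(np.allclose(lhs, rhs))   # True: both expressions coincide
```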

Next, we rewrite the last two terms in the objective of (23) as:

$$\begin{aligned} \begin{aligned}&C_5\left( \Vert {\varvec{\beta }}_1^*\Vert _1+\Vert {\varvec{\beta }}_1\Vert _1+\gamma _1^{*}+\gamma _1\right) +C_3{\mathbf {e}}_2^T {\varvec{\xi }}_2 \\&\quad = C_5 \left( {\mathbf {1}}_{m}^T{\varvec{\beta }}_1^*+{\mathbf {1}}_{m}^T{\varvec{\beta }}_1+\gamma _1^*+\gamma _1\right) +C_3{\mathbf {e}}_2^T{\varvec{\xi }}_2 \\&\quad = \left[ \begin{array}{ccccc} C_5 {\mathbf {1}}_{m}^T,\, &\quad C_5 {\mathbf {1}}_{m}^T,\, &\quad C_5,\, &\quad C_5,\, &\quad C_3{\mathbf {e}}_2^T \\ \end{array}\right] \left[ \begin{array}{ccccc} {{\varvec{\beta }}_1^{*T}},\,&\quad {{\varvec{\beta }}_1^T},\,&\quad \gamma _1^*,\,&\quad \gamma _1,\,&\quad {{\varvec{\xi }}_2^T}\end{array} \right] ^T. \end{aligned} \end{aligned}$$
(56)

The inequality constraints in (23) can be directly written as:

$$\begin{aligned} \begin{aligned}&-({\mathbf {X}}_2+{\mathbf {e}}_2{\mathbf {m}}_1^T)\left( {\varvec{\beta }}_1^*-{\varvec{\beta }}_1\right) -2{\mathbf {e}}_2\left( \gamma _1^*-\gamma _1\right) +{\varvec{\xi }}_2 \ge {\mathbf {e}}_2 \\&\quad \Rightarrow -({\mathbf {X}}_2+{\mathbf {e}}_2{\mathbf {m}}_1^T) {\varvec{\beta }}_1^*+({\mathbf {X}}_2+{\mathbf {e}}_2{\mathbf {m}}_1^T) {\varvec{\beta }}_1-2{\mathbf {e}}_2\gamma _1^*+2{\mathbf {e}}_2\gamma _1+{\varvec{\xi }}_2 \ge {\mathbf {e}}_2 \\&\quad \Rightarrow \left[ \begin{array}{ccccc} -{\mathbf {X}}_2-{\mathbf {e}}_2{\mathbf {m}}_1^T, &\quad {\mathbf {X}}_2+{\mathbf {e}}_2{\mathbf {m}}_1^T,\, &\quad -2{\mathbf {e}}_2, &\quad 2{\mathbf {e}}_2, &\quad {\mathbf {I}}_{{n_2}\times {n_2}} \\ \end{array} \right] \left[ \begin{array}{ccccc} {{\varvec{\beta }}_1^{*T}},\,&\quad {{\varvec{\beta }}_1^T},\,&\quad \gamma _1^*,\,&\quad \gamma _1,\,&\quad {{\varvec{\xi }}_2^T}\end{array} \right] ^T \ge {\mathbf {e}}_2. \end{aligned} \end{aligned}$$
(57)

For simplicity, we denote \({\varvec{\alpha }}_1=\left[ \begin{array}{ccccc} {{\varvec{\beta }}_1^{*T}},\,&\quad {{\varvec{\beta }}_1^T},\,&\quad \gamma _1^*,\,&\quad \gamma _1,\,&\quad {{\varvec{\xi }}_2^T}\end{array} \right] ^T \in {\mathbb {R}}^{2m+2+n_2}\). Thus, the optimization problem (23) can be represented as (25). Namely,

$$\begin{aligned} \begin{aligned} \min _{{\varvec{\alpha }}_1}\,\,\,\,&\frac{1}{2}{\varvec{\alpha }}_1^T{\mathbf {Q}}_1{\varvec{\alpha }}_1+{\mathbf {H}}_1^T{\varvec{\alpha }}_1, \\ {\mathrm{s.t.}} \quad&{\mathbf {P}}_1{\varvec{\alpha }}_1\ge {\mathbf{e}}_2,\,\,{\varvec{\alpha }}_1\ge {\mathbf {0}}_{2m+2+n_2}, \end{aligned} \end{aligned}$$

where

$$\begin{aligned} {\mathbf {Q}}_1= \left[ \begin{array}{ll} {\mathbf {Q}}'_1 & {\mathbf {O}}_{(2m+2)\times n_2}\\ {\mathbf {O}}_{n_2\times (2m+2)} & {\mathbf {O}}_{n_2\times n_2} \end{array} \right] \end{aligned}$$

with

$$\begin{aligned} \begin{aligned} {\mathbf {Q}}'_1&= \left[ \begin{array}{cccc} {{\mathbf {X}}_1^T} {\mathbf {X}}_1+{{\mathbf {X}}_0^T} {\mathbf {X}}_0 &\quad -{{\mathbf {X}}_1^T}{\mathbf {X}}_1-{{\mathbf {X}}_0^T} {\mathbf {X}}_0 &\quad 0.5{{\mathbf {X}}_1^T}{\mathbf {e}}_1 &\quad -0.5{{\mathbf {X}}_1^T}{\mathbf {e}}_1 \\ -{{\mathbf {X}}_1^T}{\mathbf {X}}_1-{{\mathbf {X}}_0^T} {\mathbf {X}}_0 &\quad {{\mathbf {X}}_1^T}{\mathbf {X}}_1+{{\mathbf {X}}_0^T} {\mathbf {X}}_0 &\quad -0.5{{\mathbf {X}}_1^T} {\mathbf {e}}_1 &\quad 0.5{{\mathbf {X}}_1^T} {\mathbf {e}}_1\\ 0.5{{\mathbf {e}}_1^T} {\mathbf {X}}_1 &\quad -0.5{{\mathbf {e}}_1^T} {\mathbf {X}}_1 &\quad {{\mathbf {e}}_1^T}{{\mathbf {e}}_1} &\quad -{{\mathbf {e}}_1^T}{{\mathbf {e}}_1} \\ -0.5{{\mathbf {e}}_1^T} {\mathbf {X}}_1 &\quad 0.5{{\mathbf {e}}_1^T} {\mathbf {X}}_1 &\quad -{{\mathbf {e}}_1^T}{{\mathbf {e}}_1} &\quad {{\mathbf {e}}_1^T}{{\mathbf {e}}_1} \end{array}\right] ,\\ {\mathbf {H}}_1&=\left[ \begin{array}{ccccc} C_5 {\mathbf {1}}_{m}^T,\, &\quad C_5 {\mathbf {1}}_{m}^T, &\quad C_5, &\quad C_5, & \quad C_3{\mathbf {e}}_2^T \\ \end{array}\right] ^T, \end{aligned} \end{aligned}$$

and

$$\begin{aligned} {\mathbf {P}}_1=\left[ \begin{array}{ccccc} -{\mathbf {X}}_2-{\mathbf {e}}_2{\mathbf {m}}_1^T, &\quad {\mathbf {X}}_2+{\mathbf {e}}_2{\mathbf {m}}_1^T,\, & \quad -2{\mathbf {e}}_2, &\quad 2{\mathbf {e}}_2, & \quad {\mathbf {I}}_{{n_2}\times {n_2}} \\ \end{array} \right] . \end{aligned}$$
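To show how (25) can be used in practice, the sketch below assembles \({\mathbf {Q}}_1\), \({\mathbf {H}}_1\), and \({\mathbf {P}}_1\) exactly as written above for assumed toy data and parameter values (\(C_1\), \(C_3\), \(C_5\) chosen arbitrarily) and solves the resulting convex quadratic program with SciPy's generic SLSQP routine as a stand-in solver. It is an illustration only, not the authors' implementation; any quadratic programming solver could be substituted.

```python
# Assembles the matrices of problem (25) for toy data and solves the QP.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(4)
n1, n2, m = 10, 8, 3
C1, C3, C5 = 1.0, 1.0, 0.1
X1 = rng.standard_normal((n1, m))
X2 = rng.standard_normal((n2, m)) + 2.0
m1 = X1.mean(axis=0)
e1, e2 = np.ones(n1), np.ones(n2)
X0 = np.sqrt(C1) * (X1 - np.outer(e1, m1))

A = X1.T @ X1 + X0.T @ X0              # beta-beta block of Q'_1
c = 0.5 * X1.T @ e1                    # beta-gamma block of Q'_1
s = np.array([[e1 @ e1]])              # e_1^T e_1
Qp = np.block([[ A, -A,  c[:, None], -c[:, None]],
               [-A,  A, -c[:, None],  c[:, None]],
               [ c[None, :], -c[None, :],  s, -s],
               [-c[None, :],  c[None, :], -s,  s]])
d = 2 * m + 2 + n2
Q = np.zeros((d, d)); Q[:2 * m + 2, :2 * m + 2] = Qp
H = np.concatenate([C5 * np.ones(m), C5 * np.ones(m), [C5, C5], C3 * e2])
P = np.hstack([-X2 - np.outer(e2, m1), X2 + np.outer(e2, m1),
               -2 * e2[:, None], 2 * e2[:, None], np.eye(n2)])

obj = lambda a: 0.5 * a @ Q @ a + H @ a
cons = [{"type": "ineq", "fun": lambda a: P @ a - e2, "jac": lambda a: P}]
res = minimize(obj, np.ones(d), jac=lambda a: Q @ a + H,
               constraints=cons, bounds=[(0, None)] * d, method="SLSQP")
w1 = res.x[:m] - res.x[m:2 * m]          # w_1 = beta_1^* - beta_1
b1 = res.x[2 * m] - res.x[2 * m + 1]     # b_1 = gamma_1^* - gamma_1
print(res.success, w1, b1)
```

The sketch recovers the decision variables from the nonnegative split used above, namely \({\mathbf {w}}_1={\varvec{\beta }}_1^*-{\varvec{\beta }}_1\) and \(b_1=\gamma _1^*-\gamma _1\).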


Cite this article

Zheng, X., Zhang, L. & Yan, L. Sparse discriminant twin support vector machine for binary classification. Neural Comput & Applic 34, 16173–16198 (2022). https://doi.org/10.1007/s00521-022-07001-1

