
Sparse discriminant twin support vector machine for binary classification

  • S.I.: NCAA 2021
Neural Computing and Applications

Abstract

For a binary classification problem, the twin support vector machine (TSVM) learns faster than the support vector machine (SVM) by seeking a pair of nonparallel hyperplanes. However, TSVM has two deficiencies: poor discriminant ability and poor sparsity. To alleviate them, we propose a novel sparse discriminant twin support vector machine (SD-TSVM). Inspired by the Fisher criterion, which maximizes the between-class scatter while minimizing the within-class scatter, SD-TSVM introduces twin Fisher regularization terms that may improve its discriminant ability. Moreover, SD-TSVM achieves good sparsity by utilizing both the 1-norm of the model coefficients and the hinge loss, so it can efficiently perform data reduction. Classification results on nine real-world datasets show that SD-TSVM performs satisfactorily compared with related methods.



References

  1. Chen T, Guo Y, Hao S (2020) Unsupervised feature selection based on joint spectral learning and general sparse regression. Neural Comput Appl 32:6581–6589


  2. Cortes C, Vapnik V (1995) Support vector networks. Mach Learn 20(3):273–297


  3. Cristianini N, Shawe-Taylor J (2000) An introduction to support vector machines and other kernel-based learning methods. Cambridge University Press, Cambridge


  4. Demsar J (2006) Statistical comparisons of classifiers over multiple data sets. J Mach Learn Res 7(1):1–30


  5. Dheeru D, Karra Taniskidou E (2017) UCI machine learning repository. http://archive.ics.uci.edu/ml

  6. Duda RO, Hart PE, Stork DG (2000) Pattern classification. Wiley, Hoboken


  7. Dunn OJ (1961) Multiple comparisons among means. J Am Stat Assoc 56(293):52–64


  8. den Hertog D (1994) Interior point approach to linear, quadratic and convex programming: algorithms and complexity. Kluwer Academic Publishers, Dordrecht


  9. Gu Z, Zhang Z, Sun J, Li B (2017) Robust image recognition by l1-norm twin-projection support vector machine. Neurocomputing 223:1–11


  10. Horn RA, Johnson RC (1985) Matrix analysis. Cambridge University Press, Cambridge


  11. Jayadeva, Khemchandani R, Chandra S (2007) Twin support vector machine for pattern classification. IEEE Trans Pattern Anal Mach Intell 29(5):905–910


  12. Jiang J, Ma J, Chen C, Jiang X, Wang Z (2017) Noise robust face image super-resolution through smooth sparse representation. IEEE Trans Cybern 47(11):3991–4002


  13. Kumar MA, Gopal M (2009) Least squares twin support vector machine for pattern classification. Expert Syst Appl 36:7535–7543


  14. Liu L, Chu M, Yang Y, Gong R (2020) Twin support vector machine based on adjustable large margin distribution for pattern classification. Int J Mach Learn Cybern 11:2371–2389


  15. Ma J, Tian J, Bai X, Tu Z (2013) Regularized vector field learning with sparse approximation for mismatch removal. Pattern Recognit 46:3519–3532


  16. Mangasarian OL (2006) Exact 1-norm support vector machines via unconstrained convex differentiable minimization. J Mach Learn Res 7(3):1517–1530


  17. Mangasarian OL, Wild E (2006) Multisurface proximal support vector machine classification via generalized eigenvalues. IEEE Trans Pattern Anal Mach Intell 28(1):69–74


  18. Richhariya B, Tanveer M (2021) A fuzzy universum least squares twin support vector machine (FULSTSVM). Neural Comput Appl. https://doi.org/10.1007/s00521-021-05721-4


  19. Shao Y, Zhang C, Wang X, Deng N (2011) Improvements on twin support vector machine. IEEE Trans Neural Netw 22(6):962–968


  20. Shi Y, Miao J, Niu L (2019) Feature selection with MCP2 regularization. Neural Comput Appl 31:6699–6709


  21. Tanveer M (2015) Robust and sparse linear programming twin support vector machines. Cogn Comput 7(1):137–149


  22. Thi HAL, Phan DN (2017) DC programming and DCA for sparse fisher linear discriminant analysis. Neural Comput Appl 28:2809–2822


  23. Tian Y, Ju X, Qi Z (2014) Efficient sparse nonparallel support vector machines for classification. Neural Comput Appl 24(5):1089–1099


  24. Vapnik VN (2000) The nature of statistical learning theory. Springer, Berlin


  25. Zhang L, Zhou W, Chang P, Liu J, Yan Z, Wang T, Li F (2012) Kernel sparse representation-based classifier. IEEE Trans Signal Process 60(4):1684–1695


  26. Zhang L, Zhou W, Jiao L (2004) Hidden space support vector machine. IEEE Trans Neural Netw 15(6):1424–1434


  27. Zhang L, Zhou WD (2016) Fisher-regularized support vector machine. Inf Sci 343–344:79–93


  28. Zhang Z, Zhen L, Deng N, Tan J (2014) Sparse least square twin support vector machine with adaptive norm. Appl Intell 41(4):1097–1107


  29. Zheng X, Zhang L, Xu Z (2021) L1-norm Laplacian support vector machine for data reduction in semi-supervised learning. Neural Comput Appl. https://doi.org/10.1007/s00521-020-05609-9


  30. Zheng X, Zhang L, Yan L (2020) Feature selection using sparse twin bounded support vector machine. In: International conference on neural information processing. Springer, Bangkok, pp 357–369

  31. Zheng X, Zhang L, Yan L (2021) CTSVM: a robust twin support vector machine with correntropy-induced loss function for binary classification problems. Inf Sci 559:22–45


  32. Zheng X, Zhang L, Yan L (2021) Sample reduction using \(\ell 1\)-norm twin bounded support vector machine. In: Zhang H, Yang Z, Zhang Z, Wu Z, Hao TY (eds) Neural computing for advanced applications. Springer, Singapore, pp 141–153



Acknowledgements

This work was supported in part by the Natural Science Foundation of the Jiangsu Higher Education Institutions of China under Grant No. 19KJA550002, by the Six Talent Peak Project of Jiangsu Province of China under Grant No. XYDXX-054, by the Priority Academic Program Development of Jiangsu Higher Education Institutions, and by the Collaborative Innovation Center of Novel Software Technology and Industrialization.

Author information


Corresponding author

Correspondence to Li Zhang.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Human and animal rights

This article does not contain any studies with human participants or animals performed by any of the authors.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices


1.1 Appendix 1: Proof of Theorem 1

According to the properties of convex programming [10], an optimization problem is a convex program if and only if both its objective function and its constraints are convex. In other words, an optimization problem is non-convex if its objective function or its constraints are non-convex. We now prove that the objective of the optimization problem (17) or (18) is non-convex. Because the two problems are similar and symmetric, we mainly discuss one of them, namely the optimization problem (17).

For the sake of simplification, we define

$$\begin{aligned} J_1({\mathbf {w}}_1, b_1)&= \frac{1}{2} \left\| f_1\left( {\mathbf{X}}_1\right) \right\| ^2_2, \end{aligned}$$
(42)
$$\begin{aligned} J_2({\mathbf {w}}_1)&= C_1\left\| f_1({\mathbf {X}}_1)-\overline{f_1({\mathbf{X}}_1)}\right\| _2^2, \end{aligned}$$
(43)

and

$$\begin{aligned} J_3({\mathbf {w}}_1, b_1)=-C_1\left( \left\| -f_1({\mathbf{X}}_2)-\overline{f_1({\mathbf {X}}_1)}\right\| _2^2\right) . \end{aligned}$$
(44)

In doing so, we can express the optimization problem (17) as

$$\begin{aligned} \min _{{\mathbf {w}}_1, b_1}\,\,\,\,\,\,J_1({\mathbf {w}}_1, b_1)+J_2({\mathbf{w}}_1)+J_3({\mathbf {w}}_1, b_1). \end{aligned}$$
(45)

If we assume that (45) is a convex program, then \(J_1({\mathbf {w}}_1, b_1)\), \(J_2({\mathbf {w}}_1)\), and \(J_3({\mathbf {w}}_1, b_1)\) must all be convex functions.

By substituting \(f_1({\mathbf {X}}_1)={\mathbf {X}}_1{\mathbf {w}}_1+{\mathbf {e}}_1 b_1\) into (42), the first term in the optimization problem (45) can be expressed as:

$$\begin{aligned} \begin{aligned}&J_1({\mathbf {w}}_1, b_1)\\&\quad =\frac{1}{2}({\mathbf {X}}_1{\mathbf {w}}_1+{\mathbf {e}}_1 b_1)^T({\mathbf {X}}_1{\mathbf {w}}_1+{\mathbf {e}}_1 b_1)\\&\quad =\frac{1}{2}\left( {\mathbf {w}}_1^T {\mathbf {X}}_1^T{\mathbf {X}}_1{\mathbf {w}}_1+2{\mathbf {w}}_1^T{\mathbf {X}}_1^T{\mathbf {e}}_1 b_1+n_1 b_1^2\right) . \end{aligned} \end{aligned}$$
(46)

Obviously, the second-order derivative of \(J_1({\mathbf {w}}_1, b_1)\) with respect to \({\mathbf {w}}_1\) is \({\mathbf {X}}_1^T{\mathbf {X}}_1\), which is a positive semi-definite matrix. Thus, \(J_1({\mathbf {w}}_1, b_1)\) is convex.

By substituting \(f_1({\mathbf {X}}_1)={\mathbf {X}}_1{\mathbf {w}}_1+{\mathbf {e}}_1 b_1\) and \(\overline{f_1({\mathbf {X}}_1)}={\mathbf {m}}_1^T{\mathbf {w}}_1+b_1\) into (43), the second term in the optimization problem (45) can be expressed as:

$$\begin{aligned} \begin{aligned}&J_2({\mathbf {w}}_1)\\&\quad =C_1\left( {\mathbf {X}}_1 {\mathbf {w}}_1-{\mathbf {e}}_1 {\mathbf {m}}_1^T {\mathbf {w}}_1\right) ^T\left( {\mathbf {X}}_1 {\mathbf {w}}_1-{\mathbf {e}}_1 {\mathbf {m}}_1^T {\mathbf {w}}_1\right) \\&\quad =C_1{\mathbf {w}}_1^T\left( {\mathbf {X}}_1-{\mathbf {e}}_1{\mathbf{m}}_1^T\right) ^T\left( {\mathbf {X}}_1-{\mathbf {e}}_1{\mathbf{m}}_1^T\right) {\mathbf {w}}_1. \end{aligned} \end{aligned}$$
(47)

The second-order derivative of \(J_2({\mathbf {w}}_1)\) is \(2C_1\left( {\mathbf {X}}_1-{\mathbf {e}}_1{\mathbf {m}}_1^T\right) ^T\left( {\mathbf {X}}_1-{\mathbf {e}}_1{\mathbf {m}}_1^T\right)\), which is a positive semi-definite matrix. Thus, \(J_2({\mathbf {w}}_1)\) is convex.

By substituting \(f_1({\mathbf {X}}_2)={\mathbf {X}}_2{\mathbf {w}}_1+{\mathbf {e}}_2 b_1\) and \(\overline{f_1({\mathbf {X}}_1)}={\mathbf {m}}_1^T{\mathbf {w}}_1+b_1\) into (44), the third term \(J_3({\mathbf {w}}_1, b_1)\) in the optimization problem (45) can be expressed as:

$$\begin{aligned} \begin{aligned}&J_3({\mathbf {w}}_1, b_1)\\&\quad =-C_1\left\| -{\mathbf {X}}_2{\mathbf {w}}_1-{\mathbf {e}}_2 b_1-{\mathbf {e}}_2{\mathbf {m}}_1^T{\mathbf {w}}_1-{\mathbf {e}}_2 b_1\right\| _2^2\\&\quad =-C_1\left\| \left( -{\mathbf {X}}_2-{\mathbf {e}}_2{\mathbf {m}}_1^T\right) {\mathbf {w}}_1-2{\mathbf {e}}_2 b_1\right\| _2^2\\&\quad =-C_1\left( {\mathbf {w}}_1^T\left( -{\mathbf {X}}_2-{\mathbf {e}}_2{\mathbf {m}}_1^T\right) ^T\left( -{\mathbf {X}}_2-{\mathbf {e}}_2{\mathbf {m}}_1^T\right) {\mathbf {w}}_1-4{\mathbf {w}}_1^T\left( -{\mathbf {X}}_2-{\mathbf {e}}_2{\mathbf {m}}_1^T\right) ^T{\mathbf {e}}_2b_1+4n_2b_1^2\right) . \end{aligned} \end{aligned}$$
(48)

The second-order derivative of \(J_3({\mathbf {w}}_1,b_1)\) with respect to \({\mathbf {w}}_1\) is \(-2C_1\left( -{\mathbf {X}}_2-{\mathbf {e}}_2{\mathbf {m}}_1^T\right) ^T\left( -{\mathbf {X}}_2-{\mathbf {e}}_2{\mathbf {m}}_1^T\right)\), which is not positive semi-definite. Thus, \(J_3({\mathbf {w}}_1, b_1)\) is non-convex.

We have thus shown that both \(J_1({\mathbf {w}}_1, b_1)\) and \(J_2({\mathbf {w}}_1)\) are convex, whereas \(J_3({\mathbf {w}}_1, b_1)\) is non-convex. This contradicts the assumption that the optimization problem (17) is convex.
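The sign pattern of these second-order derivative blocks can be checked numerically. The following sketch is only an illustration on assumed toy data (random \({\mathbf {X}}_1\), \({\mathbf {X}}_2\) and \(C_1=1\), none of which come from the paper); it evaluates the three blocks, up to positive scaling, and confirms that the blocks of \(J_1\) and \(J_2\) are positive semi-definite while the block of \(J_3\) is not.

```python
# Minimal numerical sketch (toy data assumed) for the Hessian blocks used in
# the proof of Theorem 1.
import numpy as np

rng = np.random.default_rng(0)
n1, n2, m, C1 = 8, 6, 4, 1.0
X1 = rng.standard_normal((n1, m))          # class-1 samples
X2 = rng.standard_normal((n2, m))          # class-2 samples
m1 = X1.mean(axis=0, keepdims=True)        # row vector m_1^T
e1 = np.ones((n1, 1))
e2 = np.ones((n2, 1))

H_J1 = X1.T @ X1                                   # block for J_1
H_J2 = (X1 - e1 @ m1).T @ (X1 - e1 @ m1)           # block for J_2
H_J3 = -C1 * (-X2 - e2 @ m1).T @ (-X2 - e2 @ m1)   # block for J_3

for name, H in [("J1", H_J1), ("J2", H_J2), ("J3", H_J3)]:
    eigs = np.linalg.eigvalsh(H)
    # min eigenvalue: >= 0 for J1 and J2, < 0 for J3 (non-convex part)
    print(name, "min eigenvalue:", eigs.min())
```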

In the same way, the non-convexity of the optimization problem (18) can be proved; we omit the details here. This completes the proof of Theorem 1.

1.2 Appendix 2: Derivation of optimization problems

To construct convex forms of the optimization problems (17) and (18), we first remove the non-convex parts from the objective functions and then recast them as constraints. We illustrate the transformation using (17); the same procedure applies to (18).

In (17), minimizing the non-convex part \(-\left\| -f_1({\mathbf {X}}_2)-\overline{f_1({{\mathbf {X}}_1})}\right\| ^2_2\) is equivalent to maximizing \(\left\| -f_1({\mathbf {X}}_2)-\overline{f_1({{\mathbf {X}}_1})}\right\| ^2_2\). We take the latter as constraints and obtain the following variant of (17):

$$\begin{aligned} \begin{aligned} \min _{{\mathbf {w}}_1, b_1}\,\,\,\,\,\,&\frac{1}{2} \left\| f_1({\mathbf {X}}_1)\right\| ^2_2+C_1\left\| f_1({\mathbf {X}}_1)-\overline{f_1({\mathbf {X}}_1)}\right\| ^2_2, \\ {\mathrm{s.t.}}\,\,\,\,\,\,&\left( -f_1({\mathbf {x}}_{2i})-\overline{f_1({\mathbf {X}}_1)}\right) ^2 \ge \epsilon _1^2,\,\,\, i=1,\dots ,n_2,\\ \end{aligned} \end{aligned}$$
(49)

where \(\epsilon _1\) is a constant greater than or equal to 0.

Moreover, the constraints of (49) can be replaced by

$$\begin{aligned} \left\{ \begin{array}{ll} -f_1({\mathbf {x}}_{2i})-\overline{f_1({\mathbf {X}}_1)} \ge \epsilon _1,\,\,\, i=1,\dots ,n_2, \\ -f_1({\mathbf {x}}_{2i})-\overline{f_1({\mathbf {X}}_1)} \le -\epsilon _1,\,\,\, i=1,\dots ,n_2. \end{array} \right. \end{aligned}$$
(50)

If the second group of inequalities in (50) held, the value of \(\overline{f_1({\mathbf {X}}_1)}\) would have to be as large as possible and that of \(f_1({\mathbf {x}}_{2i})\) as small as possible, owing to \(f_1({\mathbf {X}}_2)\le {0}\) and \(f_1({\mathbf {X}}_1)\ge {0}\); this is inconsistent with the goal of minimizing \(\left\| f_1({\mathbf {X}}_1)\right\| ^2_2\). Consequently, these inequalities in (50) are redundant and can be ignored. Therefore, (49) can be simplified to:

$$\begin{aligned} \begin{aligned} \min _{{\mathbf {w}}_1, b_1}\,\,\,\,\,\,&\frac{1}{2} \left\| f_1({\mathbf {X}}_1)\right\| ^2_2+C_1\left\| f_1({\mathbf {X}}_1)-\overline{f_1({\mathbf {X}}_1)}\right\| ^2_2, \\ {\mathrm{s.t.}}\,\,\,\,\,\,&-f_1({\mathbf {X}}_2)-\overline{f_1({\mathbf {X}}_1)} \ge \epsilon _1 {\mathbf {e}}_2.\\ \end{aligned} \end{aligned}$$
(51)

Considering that outliers or noise may exist in the data, we relax the constraints by adding slack variables \(\xi _{2i}\ge 0\) and setting \(\epsilon _1=1\). Finally, we obtain the following convex variant of (17):

$$\begin{aligned} \begin{aligned} \min _{{\mathbf {w}}_1, b_1,{\varvec{\xi }}_2}\,\,\,\,\,\,&\frac{1}{2} ||f_1({\mathbf {X}}_1)||^2_2+C_1\left\| f_1({\mathbf {X}}_1)-\overline{f_1({\mathbf {X}}_1)}\right\| ^2_2+C_3{\mathbf {e}}^T_2{\varvec{\xi }}_2, \\ {\mathrm{s.t.} }\quad \quad&-f_1({\mathbf {X}}_2)-\overline{f_1({\mathbf{X}}_1)}+{\varvec{\xi }}_2\ge {\mathbf {e}}_{2},\,\,\,{\varvec{\xi }}_2\ge {\mathbf {0}}_{n_2}, \end{aligned} \end{aligned}$$

where \(C_3>0\) and \({\varvec{\xi }}_2=[\xi _{21},\dots ,\xi _{2n_2}]^T\).

Thus, we have completed the derivation from (17) to (19). The optimization problem (18) can be transformed into (20) in the same way.
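To make the role of the slack variables concrete, the following sketch (assumed toy data and an arbitrary candidate \(({\mathbf {w}}_1, b_1)\), not taken from the paper) evaluates the left-hand side of the constraint in (19) and the smallest slacks \(\xi _{2i}\ge 0\) that satisfy it, which is a hinge-type quantity.

```python
# Illustrative sketch (toy data, arbitrary w1 and b1): evaluates the constraint
# of problem (19), -f1(X2) - mean(f1(X1)) + xi_2 >= e_2, and the smallest
# nonnegative slacks that satisfy it.
import numpy as np

rng = np.random.default_rng(1)
n1, n2, m = 8, 6, 4
X1 = rng.standard_normal((n1, m))
X2 = rng.standard_normal((n2, m))
w1 = rng.standard_normal(m)
b1 = 0.1

f1_X2 = X2 @ w1 + b1                      # f_1 on class-2 samples
f1_X1_mean = X1.mean(axis=0) @ w1 + b1    # mean of f_1 over class-1 samples
lhs = -f1_X2 - f1_X1_mean                 # left-hand side without slack
xi2 = np.maximum(0.0, 1.0 - lhs)          # smallest xi_2 >= 0 with lhs + xi_2 >= 1
print("constraint margins:", lhs)
print("required slacks   :", xi2)
```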

1.3 Appendix 3: Proof of Theorem 2

Here, we need to prove that the objective functions and constraints of optimization problems (19) and (20) are convex. Since the structure of (19) and (20) is similar, we discuss only one of them in detail.

To simplify the expression of (19), we use \(J_1({\mathbf {w}}_1,b_1)\) (42) and \(J_2({\mathbf {w}}_1)\) (43) in “Appendix 1” and define \(J_4({\varvec{\xi }}_2)=C_3{\mathbf {e}}_2^T{\varvec{\xi }}_2\). Therefore, (19) can be rewritten as

$$\begin{aligned} \begin{aligned} \min _{{\mathbf {w}}_1, b_1, {\varvec{\xi }}_2} \,\,\,\,\,\,&J_1({\mathbf {w}}_1, b_1)+J_2({\mathbf {w}}_1)+J_4({\varvec{\xi }}_2),\\ {\mathrm{s.t.}} \,\,\,\,\,\,&J_{c_1}({\varvec{\nu }}_2)\ge {\mathbf {0}}_{n_2} ,\\&J_{c_2}({\varvec{\nu }}_2)\ge {\mathbf {0}}_{n_2},\\ \end{aligned} \end{aligned}$$
(52)

where \({\varvec{\nu }}_2=[{\mathbf {w}}_1^T,b_1,{\varvec{\xi }}_2^T]^T\), \(J_{c_1}({\varvec{\nu }}_2) = {\mathbf {G}}_1 {\varvec{\nu }}_2-{\mathbf {e}}_2\) with \({\mathbf {G}}_1 = \left[ -{\mathbf {X}}_2-{\mathbf {e}}_2{\mathbf {m}}_1^T,\,-2{\mathbf {e}}_2,\,{\mathbf {I}}_{n_2\times n_2}\right]\), and \(J_{c_2}({\varvec{\nu }}_2) = {\mathbf {G}}_2 {\varvec{\nu }}_2\) with \({\mathbf {G}}_2 = \left[ {\mathbf {O}}_{n_2\times m},\,{\mathbf {0}}_{n_2},\,{\mathbf {I}}_{n_2\times n_2}\right]\).
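As an illustration, the sketch below (assumed toy shapes and values) builds \({\varvec{\nu }}_2\), \({\mathbf {G}}_1\), and \({\mathbf {G}}_2\) with NumPy and checks that \({\mathbf {G}}_1{\varvec{\nu }}_2\) coincides with \(-f_1({\mathbf {X}}_2)-\overline{f_1({\mathbf {X}}_1)}+{\varvec{\xi }}_2\), so that \(J_{c_1}({\varvec{\nu }}_2)\ge {\mathbf {0}}_{n_2}\) reproduces the first constraint of (19).

```python
# Sketch (toy shapes assumed) of the stacked variable nu_2 and the constraint
# matrices G_1, G_2 used in Appendix 3.
import numpy as np

rng = np.random.default_rng(2)
n1, n2, m = 8, 6, 4
X1 = rng.standard_normal((n1, m))
X2 = rng.standard_normal((n2, m))
m1 = X1.mean(axis=0)                       # m_1 as a 1-D vector
e2 = np.ones(n2)

w1 = rng.standard_normal(m)
b1 = 0.3
xi2 = rng.random(n2)
nu2 = np.concatenate([w1, [b1], xi2])      # nu_2 = [w_1^T, b_1, xi_2^T]^T

G1 = np.hstack([-X2 - np.outer(e2, m1), -2.0 * e2[:, None], np.eye(n2)])
G2 = np.hstack([np.zeros((n2, m)), np.zeros((n2, 1)), np.eye(n2)])

direct = -(X2 @ w1 + b1) - (m1 @ w1 + b1) + xi2   # constraint lhs of (19)
print(np.allclose(G1 @ nu2, direct))              # True: both sides agree
print(np.allclose(G2 @ nu2, xi2))                 # G_2 selects the slacks
```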

According to the proof of Theorem 1, both \(J_1({\mathbf {w}}_1, b_1)\) and \(J_2({\mathbf {w}}_1)\) are convex functions. In addition, \(J_4({\varvec{\xi }}_2)\) is also convex since it is a linear function. Thus, the objective function of the problem (19) is convex.

Next, we discuss the constraint functions \(J_{c_1}({\varvec{\nu }}_2)\) and \(J_{c_2}({\varvec{\nu }}_2)\). Their second-order derivatives are:

$$\begin{aligned} \begin{aligned}&\frac{\partial ^2 J_{c_1}({\varvec{\nu }}_2)}{\partial {\varvec{\nu }}_2\partial {\varvec{\nu }}_2}={\mathbf {0}}, \\&\frac{\partial ^2 J_{c_2}({\varvec{\nu }}_2)}{\partial {\varvec{\nu }}_2\partial {\varvec{\nu }}_2}={\mathbf {0}}. \\ \end{aligned} \end{aligned}$$
(53)

A function whose second-order derivative is positive semi-definite is convex. Therefore, both \(J_{c_1}({\varvec{\nu }}_2)\) and \(J_{c_2}({\varvec{\nu }}_2)\) are convex.

Since both the objective function and the constraint functions of (19) are convex, the optimization problem (19) is a convex program. In a similar way, we can prove that the optimization problem (20) is also convex.

This completes the proof.

1.4 Appendix 4: Derivation from (23) to (25)

For convenience, we restate the optimization problem (23):

$$\begin{aligned} \begin{aligned} \min _{{\varvec{\beta }}_1^*,{\varvec{\beta }}_1 ,\gamma _1^*, \gamma _1,{\varvec{\xi }}_2} \quad&\frac{1}{2} ||{\mathbf {X}}_1\left( {\varvec{\beta }}_1^*-{\varvec{\beta }}_1\right) +{\mathbf {e}}_1 (\gamma _1^*-\gamma _1)||^2_2+{{C_1}}||({\mathbf {X}}_1-{\mathbf {e}}_1{\mathbf {m}}_1^T)({\varvec{\beta }}_1^* - {\varvec{\beta }}_1)||^2_2\\&+C_3{\mathbf {e}}_2^T {\varvec{\xi }}_2+C_5\left( \Vert {\varvec{\beta }}_1^*\Vert _1+\Vert {\varvec{\beta }}_1\Vert _1 +\gamma ^*_{1}+\gamma _{1}\right) , \\ {\mathrm{s.t.}} \quad \quad&-({\mathbf {X}}_2+{\mathbf {e}}_2{\mathbf {m}}_1^T)\left( {\varvec{\beta }}_1^*-{\varvec{\beta }}_1\right) -2{\mathbf {e}}_2\left( \gamma _1^*-\gamma _1\right) +{\varvec{\xi} }_2 \ge {\mathbf {e}}_2, \\ \quad&{\varvec{\xi }}_2 \ge {\mathbf {0}}_{n_2},\,\,{\varvec{\beta} }_1^*\ge {\mathbf {0}}_{m},\,\,{\varvec{\beta }}_1 \ge {\mathbf {0}}_{m},\gamma _1^*\ge 0,\gamma _1\ge 0, \end{aligned} \end{aligned}$$

The first term in the objective function of the optimization problem (23) can be expressed as:

$$\begin{aligned} \begin{aligned}&\frac{1}{2} ||{\mathbf {X}}_1({\varvec{\beta }}_1^*-{\varvec{\beta }}_1)+{\mathbf {e}}_1 (\gamma _1^*-\gamma _1)||^2_2 \\&\quad = \frac{1}{2} \left( {\mathbf {X}}_1({\varvec{\beta }}_1^*-{\varvec{\beta }}_1)+{\mathbf {e}}_1 (\gamma _1^*-\gamma _1)\right) ^T\left( {\mathbf {X}}_1({\varvec{\beta }}_1^*-{\varvec{\beta }}_1)+{\mathbf {e}}_1(\gamma _1^*-\gamma _1)\right) \\&\quad = \frac{1}{2}({\varvec{\beta }}_1^*-{\varvec{\beta }}_1)^T {\mathbf {X}}_1^T {\mathbf {X}}_1({\varvec{\beta }}_1^*-{\varvec{\beta }}_1)+\frac{1}{2}({\varvec{\beta }}_1^*-{\varvec{\beta }}_1)^T{\mathbf {X}}_1^T {\mathbf {e}}_1 (\gamma _1^*-\gamma _1) \\&\qquad + \frac{1}{2}(\gamma _1^*-\gamma _1) {\mathbf {e}}_1^T {\mathbf {X}}_1 ({\varvec{\beta }}_1^*-{\varvec{\beta }}_1) + \frac{1}{2}(\gamma _1^*-\gamma _1) {\mathbf {e}}_1^T {\mathbf {e}}_1 (\gamma _1^*-\gamma _1) \\&\quad = \frac{1}{2}\Big({{\varvec{\beta }}_1^*}^T{\mathbf {X}}_1^T {\mathbf {X}}_1 {{\varvec{\beta }}_1^*} -{{\varvec{\beta }}_1}^T{\mathbf {X}}_1^T {\mathbf {X}}_1 {{\varvec{\beta }}_1^*} -{{\varvec{\beta }}_1^*}^T{\mathbf {X}}_1^T {\mathbf {X}}_1 {{\varvec{\beta }}_1} +{{\varvec{\beta }}_1}^T{\mathbf {X}}_1^T {\mathbf {X}}_1 {{\varvec{\beta }}_1} \\&\qquad +{{\varvec{\beta }}_1^*}^T{\mathbf {X}}_1^T {\mathbf {e}}_1 {\gamma _1^*} -{{\varvec{\beta }}_1^*}^T{\mathbf {X}}_1^T {\mathbf {e}}_1 {\gamma _1} -{{\varvec{\beta }}_1}^T{\mathbf {X}}_1^T {\mathbf {e}}_1 {\gamma _1^*} +{{\varvec{\beta }}_1}^T{\mathbf {X}}_1^T {\mathbf {e}}_1 {\gamma _1} \\&\qquad +{\gamma _1^*}{\mathbf {e}}_1^T {\mathbf {X}}_1 {{\varvec{\beta }}_1^*} -{\gamma _1}{\mathbf {e}}_1^T {\mathbf {X}}_1 {{\varvec{\beta }}_1^*} -{\gamma _1^*}{\mathbf {e}}_1^T {\mathbf {X}}_1 {{\varvec{\beta }}_1} +{\gamma _1}{\mathbf {e}}_1^T {\mathbf {X}}_1 {{\varvec{\beta }}_1}\\&\qquad +\gamma _1^*{\mathbf {e}}_1^T {\mathbf {e}}_1 \gamma _1^* -\gamma _1^*{\mathbf {e}}_1^T {\mathbf {e}}_1 \gamma _1 -\gamma _1 {\mathbf {e}}_1^T {\mathbf {e}}_1 \gamma _1^* +\gamma _1 {\mathbf {e}}_1^T {\mathbf {e}}_1 \gamma _1\Big)\\&\quad =\frac{1}{2}[{{\varvec{\beta }}_1^*}^T,\,{\varvec{\beta }}_1^{T},\,\gamma _1^*,\,\gamma _1] \left[ \begin{array}{cccc} {{\mathbf {X}}_1^T} {\mathbf {X}}_1 &\quad -{{\mathbf {X}}_1^T} {\mathbf {X}}_1 &\quad 0.5{{\mathbf {X}}_1^T}{\mathbf {e}}_1 &\quad -0.5{{\mathbf {X}}_1^T}{\mathbf {e}}_1 \\ -{{\mathbf {X}}_1^T} {\mathbf {X}}_1 &\quad {{\mathbf {X}}_1^T} {\mathbf {X}}_1 &\quad -0.5{{\mathbf {X}}_1^T} {\mathbf {e}}_1 &\quad 0.5{{\mathbf {X}}_1^T} {\mathbf {e}}_1 \\ 0.5{{\mathbf {e}}_1^T} {\mathbf {X}}_1 &\quad -0.5{{\mathbf {e}}_1^T} {\mathbf {X}}_1 &\quad {{\mathbf {e}}_1^T}{{\mathbf {e}}_1} &\quad -{{\mathbf {e}}_1^T}{{\mathbf {e}}_1} \\ -0.5{{\mathbf {e}}_1^T} {\mathbf {X}}_1 &\quad 0.5{{\mathbf {e}}_1^T} {\mathbf {X}}_1 &\quad -{{\mathbf {e}}_1^T}{{\mathbf {e}}_1} &\quad {{\mathbf {e}}_1^T}{{\mathbf {e}}_1} \end{array} \right] [{{\varvec{\beta }}_1^*}^T,\,{\varvec{\beta }}_1^{T},\,\gamma _1^*,\,\gamma _1]^T. \end{aligned} \end{aligned}$$
(54)

The second term in the objective function of (23) can be rewritten as:

$$\begin{aligned} \begin{aligned}&{{C_1}}||({\mathbf {X}}_1-{\mathbf {e}}_1{\mathbf {m}}_1^T)({\varvec{\beta }}_1^* - {\varvec{\beta }}_1)||^2_2\\&\quad ={{C_1}}\left( ({\mathbf {X}}_1-{\mathbf {e}}_1{\mathbf {m}}_1^T)({\varvec{\beta }}_1^* - {\varvec{\beta }}_1)\right) ^T\left( ({\mathbf {X}}_1-{\mathbf {e}}_1{\mathbf {m}}_1^T)({\varvec{\beta }}_1^* - {\varvec{\beta }}_1)\right) \\&\quad ={{\varvec{\beta }}_1^*}^T{\mathbf {X}}_0^T{\mathbf {X}}_0{\varvec{\beta }}_1^* -{{\varvec{\beta }}_1^*}^T{\mathbf {X}}_0^T{\mathbf {X}}_0{\varvec{\beta }}_1 -{{\varvec{\beta }}_1^{T}}{\mathbf {X}}_0^T{\mathbf {X}}_0{\varvec{\beta }}_1^* +{{\varvec{\beta }}_1^{T}}{\mathbf {X}}_0^T{\mathbf {X}}_0{\varvec{\beta }}_1 \\&\quad =[{{\varvec{\beta }}_1^*}^T,\,{\varvec{\beta }}_1^{T}] \left[ \begin{array}{cc} {{\mathbf {X}}_0^T} {\mathbf {X}}_0 &\quad -{{\mathbf {X}}_0^T} {\mathbf {X}}_0 \\ -{{\mathbf {X}}_0^T} {\mathbf {X}}_0 &\quad {{\mathbf {X}}_0^T} {\mathbf {X}}_0 \end{array} \right] {[{{\varvec{\beta }}_1^*}^T,\,{\varvec{\beta }}_1^{T}]}^T, \end{aligned} \end{aligned}$$
(55)

where \({\mathbf {X}}_0=C_1^{\frac{1}{2}}({\mathbf {X}}_1-{\mathbf {e}}_1{\mathbf{m}}_1^T)\).
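The block form in (55) can be verified numerically. The following sketch (assumed toy data and random coefficient vectors, chosen only for illustration) checks that \(\Vert {\mathbf {X}}_0({\varvec{\beta }}_1^*-{\varvec{\beta }}_1)\Vert _2^2\) equals the quadratic form built from the \(2\times 2\) block matrix with blocks \(\pm {\mathbf {X}}_0^T{\mathbf {X}}_0\).

```python
# Numerical check of the block form in (55) on assumed toy data.
import numpy as np

rng = np.random.default_rng(3)
n1, m, C1 = 8, 4, 0.7
X1 = rng.standard_normal((n1, m))
m1 = X1.mean(axis=0)
X0 = np.sqrt(C1) * (X1 - np.outer(np.ones(n1), m1))   # X_0 = C_1^{1/2}(X_1 - e_1 m_1^T)

beta_star = rng.random(m)
beta = rng.random(m)
M = X0.T @ X0
Q = np.block([[M, -M], [-M, M]])
v = np.concatenate([beta_star, beta])

lhs = np.linalg.norm(X0 @ (beta_star - beta)) ** 2
rhs = v @ Q @ v
print(np.allclose(lhs, rhs))   # True: both expressions coincide
```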

Next, we rewrite the last two terms in the objective of (23) as:

$$\begin{aligned} \begin{aligned}&C_5\left( \Vert {\varvec{\beta }}_1^*\Vert _1+\Vert {\varvec{\beta }}_1\Vert _1+\gamma _1^{*}+\gamma _1\right) +C_3{\mathbf {e}}_2^T {\varvec{\xi }}_2 \\&\quad = C_5 \left( {\mathbf {1}}_{m}^T{\varvec{\beta }}_1^*+{\mathbf {1}}_{m}^T{\varvec{\beta }}_1+\gamma _1^*+\gamma _1\right) +C_3{\mathbf {e}}_2^T{\varvec{\xi }}_2 \\&\quad = \left[ \begin{array}{ccccc} C_5 {\mathbf {1}}_{m}^T,\, &\quad C_5 {\mathbf {1}}_{m}^T,\, &\quad C_5,\, &\quad C_5,\, &\quad C_3{\mathbf {e}}_2^T \\ \end{array}\right] \left[ \begin{array}{ccccc} {{\varvec{\beta }}_1^{*T}},\,&\quad {{\varvec{\beta }}_1^T},\,&\quad \gamma _1^*,\,&\quad \gamma _1,\,&\quad {{\varvec{\xi }}_2^T}\end{array} \right] ^T. \end{aligned} \end{aligned}$$
(56)

The inequality constraints in (23) can be directly written as:

$$\begin{aligned} \begin{aligned}&-({\mathbf {X}}_2+{\mathbf {e}}_2{\mathbf {m}}_1^T)\left( {\varvec{\beta }}_1^*-{\varvec{\beta }}_1\right) -2{\mathbf {e}}_2\left( \gamma _1^*-\gamma _1\right) +{\varvec{\xi }}_2 \ge {\mathbf {e}}_2 \\&\quad \Rightarrow -({\mathbf {X}}_2+{\mathbf {e}}_2{\mathbf {m}}_1^T) {\varvec{\beta }}_1^*+({\mathbf {X}}_2+{\mathbf {e}}_2{\mathbf {m}}_1^T) {\varvec{\beta }}_1-2{\mathbf {e}}_2\gamma _1^*+2{\mathbf {e}}_2\gamma _1+{\varvec{\xi }}_2 \ge {\mathbf {e}}_2 \\&\quad \Rightarrow \left[ \begin{array}{ccccc} -{\mathbf {X}}_2-{\mathbf {e}}_2{\mathbf {m}}_1^T, &\quad {\mathbf {X}}_2+{\mathbf {e}}_2{\mathbf {m}}_1^T,\, &\quad -2{\mathbf {e}}_2, &\quad 2{\mathbf {e}}_2, &\quad {\mathbf {I}}_{{n_2}\times {n_2}} \\ \end{array} \right] \left[ \begin{array}{ccccc} {{\varvec{\beta }}_1^{*T}},\,&\quad {{\varvec{\beta }}_1^T},\,&\quad \gamma _1^*,\,&\quad \gamma _1,\,&\quad {{\varvec{\xi }}_2^T}\end{array} \right] ^T \ge {\mathbf {e}}_2. \end{aligned} \end{aligned}$$
(57)

For simplicity, we denote \({\varvec{\alpha }}_1=\left[ \begin{array}{ccccc} {{\varvec{\beta }}_1^{*T}},\,&\quad {{\varvec{\beta }}_1^T},\,&\quad \gamma _1^*,\,&\quad \gamma _1,\,&\quad {{\varvec{\xi }}_2^T}\end{array} \right] ^T \in {\mathbb {R}}^{2m+2+n_2}\). Thus, the optimization problem (23) can be represented as (25). Namely,

$$\begin{aligned} \begin{aligned} \min _{{\varvec{\alpha }}_1}\,\,\,\,&\frac{1}{2}{\varvec{\alpha }}_1^T{\mathbf {Q}}_1{\varvec{\alpha }}_1+{\mathbf {H}}_1^T{\varvec{\alpha }}_1, \\ {\mathrm{s.t.}} \quad&{\mathbf {P}}_1{\varvec{\alpha }}_1\ge {\mathbf{e}}_2,\,\,{\varvec{\alpha }}_1\ge {\mathbf {0}}_{2m+2+n_2}, \end{aligned} \end{aligned}$$

where

$$\begin{aligned} {\mathbf {Q}}_1= \left[ \begin{array}{ll} {\mathbf {Q}}'_1 & {\mathbf {O}}_{(2m+2)\times n_2}\\ {\mathbf {O}}_{n_2\times (2m+2)} & {\mathbf {O}}_{n_2\times n_2} \end{array} \right] \end{aligned}$$

with

$$\begin{aligned} \begin{aligned} {\mathbf {Q}}'_1&= \left[ \begin{array}{cccc} {{\mathbf {X}}_1^T} {\mathbf {X}}_1+{{\mathbf {X}}_0^T} {\mathbf {X}}_0 &\quad -{{\mathbf {X}}_1^T}{\mathbf {X}}_1-{{\mathbf {X}}_0^T} {\mathbf {X}}_0 &\quad 0.5{{\mathbf {X}}_1^T}{\mathbf {e}}_1 &\quad -0.5{{\mathbf {X}}_1^T}{\mathbf {e}}_1 \\ -{{\mathbf {X}}_1^T}{\mathbf {X}}_1-{{\mathbf {X}}_0^T} {\mathbf {X}}_0 &\quad {{\mathbf {X}}_1^T}{\mathbf {X}}_1+{{\mathbf {X}}_0^T} {\mathbf {X}}_0 &\quad -0.5{{\mathbf {X}}_1^T} {\mathbf {e}}_1 &\quad 0.5{{\mathbf {X}}_1^T} {\mathbf {e}}_1\\ 0.5{{\mathbf {e}}_1^T} {\mathbf {X}}_1 &\quad -0.5{{\mathbf {e}}_1^T} {\mathbf {X}}_1 &\quad {{\mathbf {e}}_1^T}{{\mathbf {e}}_1} &\quad -{{\mathbf {e}}_1^T}{{\mathbf {e}}_1} \\ -0.5{{\mathbf {e}}_1^T} {\mathbf {X}}_1 &\quad 0.5{{\mathbf {e}}_1^T} {\mathbf {X}}_1 &\quad -{{\mathbf {e}}_1^T}{{\mathbf {e}}_1} &\quad {{\mathbf {e}}_1^T}{{\mathbf {e}}_1} \end{array}\right] ,\\ {\mathbf {H}}_1&=\left[ \begin{array}{ccccc} C_5 {\mathbf {1}}_{m}^T,\, &\quad C_5 {\mathbf {1}}_{m}^T, &\quad C_5, &\quad C_5, & \quad C_3{\mathbf {e}}_2^T \\ \end{array}\right] ^T, \end{aligned} \end{aligned}$$

and

$$\begin{aligned} {\mathbf {P}}_1=\left[ \begin{array}{ccccc} -{\mathbf {X}}_2-{\mathbf {e}}_2{\mathbf {m}}_1^T, &\quad {\mathbf {X}}_2+{\mathbf {e}}_2{\mathbf {m}}_1^T,\, & \quad -2{\mathbf {e}}_2, &\quad 2{\mathbf {e}}_2, & \quad {\mathbf {I}}_{{n_2}\times {n_2}} \\ \end{array} \right] . \end{aligned}$$
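To show how (25) can be used in practice, the sketch below assembles \({\mathbf {Q}}_1\), \({\mathbf {H}}_1\), and \({\mathbf {P}}_1\) exactly as written above for assumed toy data and parameter values (\(C_1\), \(C_3\), \(C_5\) chosen arbitrarily) and solves the resulting convex quadratic program with SciPy's generic SLSQP routine as a stand-in solver. It is an illustration only, not the authors' implementation; any quadratic programming solver could be substituted.

```python
# Assembles the matrices of problem (25) for toy data and solves the QP.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(4)
n1, n2, m = 10, 8, 3
C1, C3, C5 = 1.0, 1.0, 0.1
X1 = rng.standard_normal((n1, m))
X2 = rng.standard_normal((n2, m)) + 2.0
m1 = X1.mean(axis=0)
e1, e2 = np.ones(n1), np.ones(n2)
X0 = np.sqrt(C1) * (X1 - np.outer(e1, m1))

A = X1.T @ X1 + X0.T @ X0              # beta-beta block of Q'_1
c = 0.5 * X1.T @ e1                    # beta-gamma block of Q'_1
s = np.array([[e1 @ e1]])              # e_1^T e_1
Qp = np.block([[ A, -A,  c[:, None], -c[:, None]],
               [-A,  A, -c[:, None],  c[:, None]],
               [ c[None, :], -c[None, :],  s, -s],
               [-c[None, :],  c[None, :], -s,  s]])
d = 2 * m + 2 + n2
Q = np.zeros((d, d)); Q[:2 * m + 2, :2 * m + 2] = Qp
H = np.concatenate([C5 * np.ones(m), C5 * np.ones(m), [C5, C5], C3 * e2])
P = np.hstack([-X2 - np.outer(e2, m1), X2 + np.outer(e2, m1),
               -2 * e2[:, None], 2 * e2[:, None], np.eye(n2)])

obj = lambda a: 0.5 * a @ Q @ a + H @ a
cons = [{"type": "ineq", "fun": lambda a: P @ a - e2, "jac": lambda a: P}]
res = minimize(obj, np.ones(d), jac=lambda a: Q @ a + H,
               constraints=cons, bounds=[(0, None)] * d, method="SLSQP")
w1 = res.x[:m] - res.x[m:2 * m]          # w_1 = beta_1^* - beta_1
b1 = res.x[2 * m] - res.x[2 * m + 1]     # b_1 = gamma_1^* - gamma_1
print(res.success, w1, b1)
```

The sketch recovers the decision variables from the nonnegative split used above, namely \({\mathbf {w}}_1={\varvec{\beta }}_1^*-{\varvec{\beta }}_1\) and \(b_1=\gamma _1^*-\gamma _1\).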


Cite this article

Zheng, X., Zhang, L. & Yan, L. Sparse discriminant twin support vector machine for binary classification. Neural Comput & Applic 34, 16173–16198 (2022). https://doi.org/10.1007/s00521-022-07001-1

