LMSVCR: novel effective method of semi-supervised multi-classification

  • Original Article
  • Published in Neural Computing and Applications

Abstract

Most previous work on the learning performance of multi-classification algorithms is based on supervised samples, yet much of the data generated in real life is unlabeled. This paper introduces a novel Laplacian multi-classification support vector classification and regression (LMSVCR) algorithm for the semi-supervised setting. We first establish a fast learning rate for the LMSVCR algorithm with semi-supervised multi-classification samples and prove that the algorithm is consistent. We then investigate its learning performance numerically. The experimental studies indicate that the proposed LMSVCR algorithm outperforms other semi-supervised multi-classification algorithms in terms of prediction accuracy and total sampling and training time.
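
To make the setting concrete, here is a minimal sketch of the kind of objective LMSVCR optimizes, not the authors' implementation: one binary subproblem of a 1-versus-1-versus-rest scheme (cf. K-SVCR [17]), trained by plain subgradient descent on the loss W used in Appendix A (hinge loss on the two focused classes, squared loss pulling the remaining classes toward 0), plus the kernel penalty \(\lambda _1\Vert f\Vert _{{\mathcal {K}}}^2\) and a graph-Laplacian penalty \(\lambda _2\Vert f\Vert _I^2\) that also sees the unlabeled points. All function names and hyperparameter defaults are illustrative assumptions.

```python
import numpy as np

def rbf_kernel(X, Y, gamma=1.0):
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def graph_laplacian(K, k=5):
    # crude k-nearest-neighbour graph built from kernel similarities
    W = np.zeros_like(K)
    nbrs = np.argsort(-K, axis=1)[:, 1:k + 1]   # skip self (largest entry)
    for i in range(len(K)):
        W[i, nbrs[i]] = K[i, nbrs[i]]
    W = np.maximum(W, W.T)                      # symmetrize
    return np.diag(W.sum(axis=1)) - W

def fit_pair(X, y, C1=1.0, C2=1.0, lam1=1e-2, lam2=1e-2, lr=1e-2, iters=2000):
    # y[i] in {+1, -1, 0} for labeled points; np.nan marks unlabeled points,
    # which enter the objective only through the Laplacian penalty.
    K = rbf_kernel(X, X)
    L = graph_laplacian(K)
    m = len(y)
    labeled = ~np.isnan(y)
    pm = labeled & (y != 0)      # the two classes being separated
    rest = labeled & (y == 0)    # samples from all remaining classes
    alpha = np.zeros(m)          # representer coefficients: f = K @ alpha
    for _ in range(iters):
        f = K @ alpha
        dloss = np.zeros(m)
        dloss[pm] = -C1 * y[pm] * ((1 - y[pm] * f[pm]) > 0)  # hinge subgradient
        dloss[rest] = 2 * C2 * f[rest]                       # squared loss to 0
        grad = (K @ dloss) / max(labeled.sum(), 1) \
               + 2 * lam1 * (K @ alpha) + 2 * lam2 * (K @ (L @ f))
        alpha -= lr * grad
    return alpha
```

A full multi-class predictor would train one such f per pair of classes and combine the resulting \(\{+1,-1,0\}\) outputs by voting, as in K-SVCR [17].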

Notes

  1. http://archive.ics.uci.edu/ml/datasets.html.

References

  1. Altun Y, McAllester D, Belkin M (2005) Maximum margin semi-supervised learning for structured variables. Adv Neural Inf Process Syst 18:33–40

  2. Hady MFA, Schwenker F (2013) Semi-supervised learning. In: Handbook on neural information processing. Springer, Berlin, Heidelberg, pp 215–239

  3. Chapelle O, Schölkopf B, Zien A (2009) Semi-supervised learning. IEEE Trans Neural Netw 20(3):542

  4. Zhu XJ (2005) Semi-supervised learning literature survey. Technical report, Department of Computer Sciences, University of Wisconsin-Madison

  5. Liu Y, Liu W, Obaid MA, Abbas IA (2016) Exponential stability of Markovian jumping Cohen–Grossberg neural networks with mixed mode-dependent time-delays. Neurocomputing 177:409–415

  6. Du B, Liu Y, Abbas IA (2016) Existence and asymptotic behavior results of periodic solution for discrete-time neutral-type neural networks. J Franklin Inst 353(2):448–461

  7. Rebai I, BenAyed Y, Mahdi W (2016) Deep multilayer multiple kernel learning. Neural Comput Appl 27:2305–2314

  8. Li X, Mao W, Jiang W (2016) Multiple-kernel-learning-based extreme learning machine for classification design. Neural Comput Appl 27:175–184

  9. Carballal A, Fernandez-Lozano C, Heras J, Romero J (2020) Transfer learning features for predicting aesthetics through a novel hybrid machine learning method. Neural Comput Appl 32:5889–5900

  10. Joachims T (1999) Transductive inference for text classification using support vector machines. In: Proceedings of the International Conference on Machine Learning, pp 200–209

  11. Belkin M, Niyogi P, Sindhwani V (2006) Manifold regularization: a geometric framework for learning from labeled and unlabeled examples. J Mach Learn Res 7:2399–2434

  12. Bennett K, Mangasarian OL (1999) Combining support vector and mathematical programming methods for induction. In: Advances in kernel methods: support vector learning, pp 307–326

  13. Weston J, Watkins C (1999) Support vector machines for multi-class pattern recognition. In: Proceedings of ESANN, pp 219–224

  14. Lee Y, Lin Y, Wahba G (2004) Multicategory support vector machines: theory and application to the classification of microarray data and satellite radiance data. J Am Stat Assoc 99(465):67–81

  15. Bottou L, Cortes C, Denker JS, Drucker H, Guyon I, Jackel LD, LeCun Y, Muller UA, Sackinger E, Simard P, Vapnik V (1994) Comparison of classifier methods: a case study in handwritten digit recognition. In: Proceedings of the 12th IAPR International Conference on Pattern Recognition, pp 77–82

  16. Krebel UHG (1999) Pairwise classification and support vector machines. In: Advances in kernel methods: support vector learning, pp 255–268

  17. Angulo C, Parra X, Catala A (2003) K-SVCR: a support vector machine for multi-class classification. Neurocomputing 55(1–2):57–77

  18. Aronszajn N (1950) Theory of reproducing kernels. Trans Am Math Soc 68(3):337–404

  19. Evgeniou T, Pontil M, Poggio T (2000) Regularization networks and support vector machines. Adv Comput Math 13(1):1–50

  20. Feng Y, Yang Y, Zhao Y, Lv S, Suykens JA (2014) Learning with kernelized elastic net regularization. Technical report, KU Leuven, Leuven, Belgium

  21. Xu Y, Yang Z (2014) Elastic-net regression algorithm based on multi-scale Gaussian kernel. Sci J Inf Eng 4(1):19–25

  22. Wang W, Xu Z, Lu W, Zhang X (2003) Determination of the spread parameter in the Gaussian kernel for classification and regression. Neurocomputing 55(3–4):643–663

  23. Wu Q, Zhou DX (2005) SVM soft margin classifiers: linear programming versus quadratic programming. Neural Comput 17(5):1160–1187

  24. Wu Q, Ying Y, Zhou DX (2006) Learning rates of least-square regularized regression. Found Comput Math 6(2):171–192

  25. Lv SG, Zhou F (2015) Optimal learning rates of \(l^{p}\)-type multiple kernel learning under general conditions. Inf Sci 294:255–268

  26. Chen DR, Wu Q, Ying Y, Zhou DX (2004) Support vector machine soft margin classifiers: error analysis. J Mach Learn Res 5:1143–1175

  27. Tong H, Chen DR, Peng L (2009) Analysis of support vector machines regression. Found Comput Math 9(2):243–257

  28. Chen DR, Xiang DH (2006) The consistency of multicategory support vector machines. Adv Comput Math 24(1–4):155–169

  29. Chen H, Li L (2009) Semisupervised multicategory classification with imperfect model. IEEE Trans Neural Netw 20(10):1594–1603

  30. Bamakan SMH, Wang H, Shi Y (2017) Ramp loss K-support vector classification-regression; a robust and sparse multi-class approach to the intrusion detection problem. Knowl-Based Syst 126:113–126

  31. Huang CL, Dun JF (2008) A distributed PSO–SVM hybrid system with feature selection and parameter optimization. Appl Soft Comput 8(4):1381–1391

  32. Lin SW, Ying KC, Chen SC, Lee ZJ (2008) Particle swarm optimization for parameter determination and feature selection of support vector machines. Expert Syst Appl 35(4):1817–1824

  33. Qian M, Nie F, Zhang C (2009) Efficient multi-class unlabeled constrained semi-supervised SVM. In: Proceedings of the 18th ACM Conference on Information and Knowledge Management, pp 1665–1668

  34. Pan H, Kang Z (2018) Robust graph learning for semi-supervised classification. In: Proceedings of the 10th International Conference on Intelligent Human-Machine Systems and Cybernetics, pp 265–268

  35. Wilcoxon F (1992) Individual comparisons by ranking methods. In: Breakthroughs in statistics. Springer, New York, pp 196–202

  36. Cucker F, Smale S (2002) Best choices for regularization parameters in learning theory: on the bias-variance problem. Found Comput Math 2(4):413–428

Acknowledgements

This work is supported in part by the National Natural Science Foundation of China (No. 61772011), the Open Project Foundation of Intelligent Information Processing Key Laboratory of Shanxi Province (No. CICIP2018002), and the National Key Research and Development Program of China (No. 2020YFA0714200).

Author information

Corresponding authors

Correspondence to Bin Zou or Jie Xu.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest regarding the publication of this paper.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Appendix A

To bound the excess generalization error \({{\mathcal {E}}}(f_{\mathbf{z}}) - {{\mathcal {E}}}(f_B)\) of LMSVCR, it suffices, by Proposition 1, to estimate the three terms \(T_1, T_2, T_3\). We first present the main tools:

Lemma 1

[24] Let \(\xi\) be a random variable on a probability space Z with mean \(E(\xi )\), variance \(\sigma ^2(\xi ) = \sigma ^2\), and satisfying \(|\xi (z) - E(\xi )| \le M_{\xi }\) for almost all \(z \in Z\). Then for all \(\varepsilon > 0\),

$$\begin{aligned} \mathrm{P} \left \{ \frac{1}{m}\sum _{i=1}^{m} \xi (z_i) - E(\xi ) \ge \varepsilon \right \} \le \exp \left \{ -\frac{m \varepsilon ^2}{2(\sigma ^2 + \frac{1}{3} M_{\xi } \varepsilon )}\right \}. \end{aligned}$$
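
As a quick sanity check of Lemma 1, the following Monte Carlo sketch compares the empirical tail of a sample mean with the Bernstein bound; the uniform variable, sample size, and seed are arbitrary choices.

```python
# Monte Carlo sanity check of the one-sided Bernstein bound in Lemma 1,
# for xi ~ Uniform(0, 1): E(xi) = 1/2, sigma^2 = 1/12, M_xi = 1/2.
import numpy as np

rng = np.random.default_rng(0)
m, eps, trials = 200, 0.1, 100_000
xs = rng.uniform(0.0, 1.0, size=(trials, m))
empirical = np.mean(xs.mean(axis=1) - 0.5 >= eps)
bound = np.exp(-m * eps**2 / (2 * (1 / 12 + (1 / 3) * (1 / 2) * eps)))
print(f"empirical tail {empirical:.2e} <= Bernstein bound {bound:.2e}")
```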

Lemma 2

[24] Let \({\mathcal {G}}\) be a set of functions on Z such that, for some \({c_\rho } \ge 0\), \(|g - E(g)| \le B\) almost everywhere and \(E(g^2) \le {c_\rho }E(g)\) for each \(g \in {{\mathcal {G}}}\). Then for every \(\varepsilon > 0\) and \(0 < \alpha \le 1\),

$$\begin{aligned} \mathrm{P} \left\{ \sup _{g \in {{\mathcal {G}}}} \frac{E(g) - \frac{1}{m}\sum _{i=1}^{m} g(z_i)}{\sqrt{E(g) + \varepsilon }} \ge 4\alpha \sqrt{\varepsilon } \right\} \le {{\mathcal {N}}}({{\mathcal {G}}}, \alpha \varepsilon ) \exp \left\{ -\frac{\alpha ^{2} m \varepsilon }{2c_{\rho } + \frac{2}{3}B} \right\}. \end{aligned}$$

Lemma 3

[36] Let \(c_1,c_2 > 0\) and \(p_1> p_2 > 0\). Then the equation \(x^{p_1} - c_1 x^{p_2} - c_2 = 0\) has a unique positive zero \(x^{*}\), and \(x^{*} \le \max \{(2 c_1)^{1/(p_1-p_2)}, (2 c_2)^{1/{p_1}}\}\).
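
Lemma 3 is easy to probe numerically; the following sketch verifies the bound on \(x^{*}\) for random coefficients (the bracket passed to brentq is a generous assumption that always contains the root, since \(x^{*} \le \max \{\cdot \}\)).

```python
# Numeric check of Lemma 3: the unique positive zero x* of
# x^p1 - c1*x^p2 - c2 satisfies x* <= max{(2c1)^(1/(p1-p2)), (2c2)^(1/p1)}.
import numpy as np
from scipy.optimize import brentq

rng = np.random.default_rng(1)
for _ in range(1000):
    c1, c2 = rng.uniform(0.1, 5.0, size=2)
    p2 = rng.uniform(0.1, 3.0)
    p1 = p2 + rng.uniform(0.1, 3.0)
    cap = max((2 * c1) ** (1 / (p1 - p2)), (2 * c2) ** (1 / p1))
    # the polynomial is negative near 0 and positive past the root,
    # so [1e-12, 10*cap + 10] brackets the unique positive zero
    root = brentq(lambda x: x**p1 - c1 * x**p2 - c2, 1e-12, 10 * cap + 10)
    assert root <= cap + 1e-8, (c1, c2, p1, p2)
print("Lemma 3 bound held in 1000 random trials")
```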

Proof of Proposition 1:

Since for any \({{\mathbf {z}}} \in {Z}^m\), \(\lambda _1\Vert f_{{\mathbf {z}}}\Vert _{{\mathcal {K}}}^2\ge 0, \lambda _2\Vert f_{{\mathbf {z}}}\Vert _I^2 \ge 0\), we have the following error decomposition

$$\begin{aligned}&{{\mathcal {E}}}(f_{{\mathbf {z}}}) - {{\mathcal {E}}}(f_B)\\&\quad \le {{\mathcal {E}}}(f_{{\mathbf {z}}}) - {{\mathcal {E}}}(f_B) + \lambda _1\Vert f_{{\mathbf {z}}}\Vert _{{\mathcal {K}}}^2 + \lambda _2\Vert f_{{\mathbf {z}}}\Vert _I^2\\&\quad = \big \{{{\mathcal {E}}}(f_{{\mathbf {z}}}) - {{{\mathcal {E}}}_{\mathbf{z}}(f_{{\mathbf {z}}})} + {{{\mathcal {E}}}_{{\mathbf {z}}}(f_{{\mathbf {z}}})} - {{{\mathcal {E}}}(f_B)} + {{{\mathcal {E}}}(f_B)}\\&\qquad + {{{\mathcal {E}}}_{{\mathbf {z}}}(f_B)} - {{{\mathcal {E}}}_{{\mathbf {z}}}(f_B)}\\&\qquad + {{{\mathcal {E}}}_{{\mathbf {z}}}(f_{\lambda _1})} - {{\mathcal E}_{{\mathbf {z}}}(f_{\lambda _1})} - {{{\mathcal {E}}}(f_{\lambda _1})} + {{{\mathcal {E}}}(f_{\lambda _1})} - {{{\mathcal {E}}}_{\mathbf{z}}(f_{{{\mathbf {z}}},\lambda _1})}\\&\qquad + {{{\mathcal {E}}}_{\mathbf{z}}(f_{{{\mathbf {z}}},\lambda _1})} - {{\mathcal {E}}}(f_B)\big \}\\&\qquad +\big \{ {\lambda _1\Vert f_{{{\mathbf {z}}},\lambda _1}\Vert _{{\mathcal {K}}}^2} - {\lambda _1\Vert f_{{{\mathbf {z}}},\lambda _1}\Vert _{{\mathcal {K}}}^2} + {\lambda _1\Vert f_{\lambda _1}\Vert _{{\mathcal {K}}}^2} - {\lambda _1\Vert f_{\lambda _1}\Vert _{{\mathcal {K}}}^2}\\&\qquad + \lambda _1\Vert f_{{\mathbf {z}}}\Vert _{{\mathcal {K}}}^2 + \lambda _2\Vert f_{{\mathbf {z}}}\Vert _I^2\big \} \\&\quad =\big \{{{\mathcal {E}}}(f_{{\mathbf {z}}}) - {{\mathcal {E}}}_{{\mathbf {z}}}(f_{{\mathbf {z}}}) - {{\mathcal {E}}}(f_B) \\&\qquad + {{\mathcal {E}}}_{{\mathbf {z}}}(f_B)\big \}\\&\qquad +\big \{{{\mathcal {E}}}_{{\mathbf {z}}}(f_{{\mathbf {z}}}) - {\mathcal E}_{{\mathbf {z}}}(f_{{{\mathbf {z}}},\lambda _1})\\&\qquad + \lambda _1\Vert f_{\mathbf{z}}\Vert _{{\mathcal {K}}}^2 +\lambda _2\Vert f_{{\mathbf {z}}}\Vert _I^2 - \lambda _1\Vert f_{{{\mathbf {z}}},\lambda _1}\Vert _{{\mathcal {K}}}^2 \big \}\\&\qquad +\big \{{{\mathcal {E}}}_{{\mathbf {z}}}(f_{\lambda _1}) - {{\mathcal {E}}}(f_{\lambda _1}) - {{\mathcal {E}}}_{{\mathbf {z}}}(f_B)\\&\qquad + {{\mathcal {E}}}(f_B)\big \}\\&\qquad +\big \{{{\mathcal {E}}}_{{\mathbf {z}}}(f_{{{\mathbf {z}}},\lambda _1}) - {{\mathcal {E}}}_{{\mathbf {z}}}(f_{\lambda _1}) +\\&\qquad \lambda _1\Vert f_{{{\mathbf {z}}},\lambda _1}\Vert _{{\mathcal {K}}}^2 - \lambda _1\Vert f_{\lambda _1}\Vert _{{\mathcal {K}}}^2\big \}\\&\qquad +\big \{{{\mathcal {E}}}(f_{\lambda _1}) - {{\mathcal {E}}}(f_B) + \lambda _1\Vert f_{\lambda _1}\Vert _{{\mathcal {K}}}^2\big \}\\&\quad =: T_1 + T_2 + T_3 + T_4 + D(\lambda _1)\\&\quad \le T_1 + T_2 + T_3 + D(\lambda _1). \end{aligned}$$

The last inequality follows from the fact that \(T_4 \le 0\): by the definition of \(f_{{{\mathbf {z}}},\lambda _1}\), we have \({{\mathcal {E}}}_{{\mathbf {z}}}(f_{{{\mathbf {z}}},\lambda _1}) + \lambda _1\Vert f_{{{\mathbf {z}}},\lambda _1}\Vert _{{\mathcal {K}}}^2 \le {{\mathcal {E}}}_{{\mathbf {z}}}(f_{\lambda _1}) + \lambda _1\Vert f_{\lambda _1}\Vert _{{\mathcal {K}}}^2\). Here \(T_1, T_2, T_3\) are as defined in Proposition 1. This completes the proof of Proposition 1. \(\square\)

Proposition 4

Assume that \({{\mathbf {z}}} = \{z_i\}_{i=1}^m \in {Z}^m\) is an i.i.d. sample set. Then for any \(0< \delta < 1\), with confidence at least \(1 - \delta /2\),

$$\begin{aligned} T_1&\le \frac{1}{2}[{{\mathcal {E}}}(f_{{\mathbf {z}}})-{{\mathcal {E}}}(f_B)] + {\varepsilon }^{*}(m,\delta /2), \end{aligned}$$

where \({\varepsilon }^{*}(m,\delta /2) = \max \big \{\frac{150C (\kappa +1)^2 R^2 \ln (2/\delta )}{m}, (\frac{150C (\kappa +1)^2 R^2 C_s (4R)^s}{m})^{\frac{1}{s+1}} \big \}\).

Proof

Set \(\zeta _1 = W(y,f) - W(y,f_B)\). As the sample \({\mathbf {z}}\) varies, \(\zeta _1\) ranges over a set of functions. We will apply Lemma 2 to the function set

$$\begin{aligned} {{\mathcal {F}}}_{R} = \{ g := W(y, f) - W(y, f_B), f \in {\mathcal B}_{R} \}. \end{aligned}$$

We first verify that the functions in \({{\mathcal {F}}}_{R}\) are bounded. Note that \(E(g) = {{\mathcal {E}}}(f) - {{\mathcal {E}}}(f_B) \ge 0\), \(\frac{1}{m} \sum _{i = 1}^{m} g(z_i) = {{\mathcal {E}}}_{{\mathbf {z}}}(f) - {{\mathcal {E}}}_{{\mathbf {z}}}(f_B)\), and \(g = C_1 [(1 - y f(x))_{+} - (1 - y f_{B}(x))_{+}] \cdot {{\mathbf {1}}}_{\{y \ne 0\}} + C_2 [(f(x) - f_{B}(x))(f(x) + f_{B}(x))] \cdot {{\mathbf {1}}}_{\{y = 0\}}\). Moreover, \(\Vert f\Vert _{\infty } \le \kappa \Vert f\Vert _{{\mathcal {K}}} \le \kappa R\) and \(|f_B(x)| \le M\) almost everywhere. Hence we have

$$\begin{aligned} |g|\le & {} C_{1}(\kappa R + M)+ C_{2}(\kappa R + M)(\kappa R + M)\\\le & {} 2C(\kappa R + M)^2, \end{aligned}$$

where \(C=\max \{C_{1}, C_{2}\}\). Then we get \(|g - E(g)| \le 4C(\kappa R + M)^2\) almost everywhere. Also,

$$\begin{aligned} g^2 =&[ C_1 ((1 - y f(x))_{+} - (1 - y f_{B}(x))_{+}) \cdot {{\mathbf {1}}}_{\{y \ne 0\}} \\&+ C_2 (f^2(x) - f_{B}^2(x)) \cdot {{\mathbf {1}}}_{\{y = 0\}}]^2\\ =&C_1^2 [(1 - y f(x))_{+} - (1 - y f_{B}(x))_{+}]^2 \cdot {{\mathbf {1}}}_{\{y \ne 0\}} \\&+ C_2^2 [ f^2(x) - f_{B}^2(x)]^2 \cdot {{\mathbf {1}}}_{\{y = 0\}}\\ \le&C C_1 [(1 - y f(x))_{+} - (1 - y f_{B}(x))_{+}](\kappa R + M) \cdot {{\mathbf {1}}}_{\{y \ne 0\}} \\&+ C C_2 [ f^2(x) - f_{B}^2(x)]({\kappa }^2 R^2 + M^2) \cdot {{\mathbf {1}}}_{\{y = 0\}}\\ \le&C(\kappa R + M)^2 [C_1 ((1 - y f(x))_{+} - (1 - y f_{B}(x))_{+}) \cdot {{\mathbf {1}}}_{\{y \ne 0\}} \\&+ C_2 (f^2(x) - f_{B}^2(x)) \cdot {{\mathbf {1}}}_{\{y = 0\}}]. \end{aligned}$$

Thus \(E(g^2) \le C(\kappa R + M)^2 E(g)\). Applying Lemma 2 to the function set \({{\mathcal {F}}}_R\) (with \(\alpha = 1/4\)), we have that the inequality

$$\begin{aligned}&\sup _{f \in {{\mathcal {B}}}_R} \frac{{{\mathcal {E}}}(f)-{\mathcal E}(f_B)-({{\mathcal {E}}}_{{\mathbf {z}}}(f)-{{\mathcal {E}}}_{\mathbf{z}}(f_B))}{\sqrt{{{\mathcal {E}}}(f)-{{\mathcal {E}}}(f_B)+\varepsilon }}\\&\quad = \sup _{g \in {{\mathcal {F}}}_R} \frac{E(g) - \frac{1}{m} \sum _{i=1}^{m} g(z_i)}{\sqrt{E(g) + \varepsilon }} \le \sqrt{\varepsilon }, \end{aligned}$$

is valid with probability at least

$$\begin{aligned}&1-{{\mathcal {N}}}({{\mathcal {F}}}_R, \frac{\varepsilon }{4}) \exp \\&\quad \Big \{-\frac{m \varepsilon }{16(2 \cdot C(\kappa R + M)^2+\frac{2}{3} \cdot 4C(\kappa R + M)^2)} \Big \}\\&\quad \ge 1-{{\mathcal {N}}}({{\mathcal {F}}}_R, \frac{\varepsilon }{4}) \exp \Big \{ -\frac{3m \varepsilon }{224C(\kappa +1)^2 R^2}\Big \}\\&\quad \ge 1-{{\mathcal {N}}}({{\mathcal {F}}}_R, \frac{\varepsilon }{4}) \exp \Big \{ -\frac{m \varepsilon }{75C(\kappa +1)^2 R^2}\Big \}. \end{aligned}$$

Here we use the restriction \(R \ge M\). By Definition 3, we can get

$$\begin{aligned}&\mathrm{P} \left \{ \sup _{f \in {{\mathcal {B}}}_R} \frac{{{\mathcal {E}}}(f)-{{\mathcal {E}}}(f_B)-({{\mathcal {E}}}_{{\mathbf {z}}}(f)-{{\mathcal {E}}}_{{\mathbf {z}}}(f_B))}{\sqrt{{{\mathcal {E}}}(f)-{{\mathcal {E}}}(f_B)+\varepsilon }} \ge \sqrt{\varepsilon } \right \}\\&\quad \le {{\mathcal {N}}}({{\mathcal {F}}}_R, \frac{\varepsilon }{4}) \exp \left \{ -\frac{m \varepsilon }{75C(\kappa +1)^2 R^2}\right \}\\&\quad \le \exp \left \{ C_s (\frac{4R}{\varepsilon })^s - \frac{m \varepsilon }{75C(\kappa +1)^2 R^2}\right \}. \end{aligned}$$

For any \(\delta \in (0,1)\), let

$$\begin{aligned} \delta = \exp \left \{ C_s (\frac{4R}{\varepsilon })^s - \frac{m \varepsilon }{75C(\kappa +1)^2 R^2}\right \}. \end{aligned}$$

Taking logarithms and rearranging, \(\varepsilon\) satisfies

$$\begin{aligned} {\varepsilon }^{s+1} - \frac{75C (\kappa +1)^2 R^2 \ln (\frac{1}{\delta })}{m} \cdot {\varepsilon }^s - \frac{75C (\kappa +1)^2 R^2 C_s (4R)^s}{m} = 0. \end{aligned}$$

By Lemma 3, we have \(\varepsilon \le {\varepsilon }^{*}(m,\delta )\), where

$${\varepsilon }^{*}(m,\delta )= \max \left \{\frac{150C (\kappa +1)^2 R^2 \ln (\frac{1}{\delta })}{m}, \left( \frac{150C (\kappa +1)^2 R^2 C_s (4R)^s}{m}\right) ^{\frac{1}{s+1}} \right \}.$$

Because \(\sqrt{\varepsilon }\,\sqrt{{{\mathcal {E}}}(f) - {{\mathcal {E}}}(f_B) + \varepsilon } \le \frac{1}{2} [{{\mathcal {E}}}(f) - {{\mathcal {E}}}(f_B)] + \varepsilon\) holds for any \(\varepsilon >0\), we have that for any \(\delta \in (0,1)\), the following inequality holds with probability at least \(1-\delta\),

$$\begin{aligned}&{{\mathcal {E}}}(f)-{{\mathcal {E}}}(f_B)-({{\mathcal {E}}}_{\mathbf{z}}(f)-{{\mathcal {E}}}_{{\mathbf {z}}}(f_B)) \le \frac{1}{2}[{\mathcal E}(f)-{{\mathcal {E}}}(f_B)]\\&\quad + {\varepsilon }^{*}(m,\delta ). \end{aligned}$$

Replacing \(\delta\) by \(\delta /2\) and \(f\) by \(f_{{\mathbf {z}}}\), we have that with probability at least \(1-\delta /2\),

$$\begin{aligned} T_1&= {{\mathcal {E}}}(f_{{\mathbf {z}}})-{{\mathcal {E}}}(f_B)-({\mathcal E}_{{\mathbf {z}}}(f_{{\mathbf {z}}})-{{\mathcal {E}}}_{{\mathbf {z}}}(f_B))\\&\le \frac{1}{2}[{{\mathcal {E}}}(f_{{\mathbf {z}}})-{{\mathcal {E}}}(f_B)] + {\varepsilon }^{*}(m,\delta /2) \end{aligned}$$

is valid. This completes the proof of Proposition 4. \(\square\)

Proposition 5

For any \({{\mathbf {z}}} = \{z_i\}_{i=1}^m \in {Z}^m\), \(T_2 \le 1\).

Proof

By the representer theorem in [20], we know that \(f_{{{\mathbf {z}}},\lambda _1}\) can be written as \(f_{{{\mathbf {z}}},\lambda _1} = \sum _{i=1}^{m} {\alpha }_i^{\lambda _1} {{\mathcal {K}}}_{x_i}\), and \(f_{{\mathbf {z}}} = \arg \min \left \{ \lambda _1 \Vert f\Vert _{{\mathcal {K}}}^2 + \lambda _2 \Vert f\Vert _I^2 + {{\mathcal {E}}}_{\mathbf{z}}(f) \right \}\). It follows that

$$\begin{aligned} T_2 \le {{\mathcal {E}}}_{{\mathbf {z}}}(f_{{\mathbf {z}}}) + \lambda _1\Vert f_{{\mathbf {z}}}\Vert _{{\mathcal {K}}}^2 + \lambda _2\Vert f_{\mathbf{z}}\Vert _I^2 \le {{\mathcal {E}}}_{{\mathbf {z}}}(0) \le 1, \end{aligned}$$

where the first inequality holds since \({{\mathcal {E}}}_{{\mathbf {z}}}(f_{{{\mathbf {z}}},\lambda _1}) \ge 0\) and \(\lambda _1\Vert f_{{{\mathbf {z}}},\lambda _1}\Vert _{{\mathcal {K}}}^2 \ge 0\), and the second holds because \(f_{{\mathbf {z}}}\) minimizes the regularized empirical risk. This completes the proof of Proposition 5. \(\square\)
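
The representer theorem is what makes this minimization computable: it reduces the search over the whole function space to m coefficients. As a hedged illustration, if the hinge part of W is replaced by squared loss, the minimizer even has a closed form (the Laplacian RLS system in the spirit of [11]); the constant conventions below are assumptions, not the paper's.

```python
# Sketch: with squared loss in place of the hinge term of W, the expansion
# f = sum_i alpha_i K(x_i, .) turns
#   min_f  E_z(f) + lam1 * ||f||_K^2 + lam2 * ||f||_I^2
# into a finite m x m linear system (Laplacian RLS normal equations, cf. [11]).
import numpy as np

def laprls_coefficients(K, L, y, labeled, lam1, lam2):
    # K: (m, m) kernel Gram matrix over labeled + unlabeled points,
    # L: graph Laplacian over the same points,
    # y: targets (entries on unlabeled points are ignored),
    # labeled: boolean mask selecting the labeled points.
    m = len(y)
    J = np.diag(labeled.astype(float))   # keeps only the labeled loss terms
    A = J @ K + lam1 * m * np.eye(m) + lam2 * m * (L @ K)
    return np.linalg.solve(A, J @ y)     # alpha, with f = K @ alpha
```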

Proposition 6

For any \(0< \delta < 1\), the following inequality holds with probability at least \(1-\delta /2\),

$$\begin{aligned} T_3 \le D(\lambda _1)\left ( 1 + \frac{7C \kappa ^2 \ln (\frac{2}{\delta })}{m \lambda _1} \right ) + \frac{7 M^2 C \ln (\frac{2}{\delta })}{m} + 1. \end{aligned}$$

Proof

From the definitions of \(f_{\lambda _1}\) and \(D(\lambda _1)\), we have

$$\begin{aligned} \lambda _1 \Vert f_{\lambda _1}\Vert _{{\mathcal {K}}}^2 \le {\mathcal E}(f_{\lambda _1}) - {{\mathcal {E}}}(f_B) + \lambda _1 \Vert f_{\lambda _1}\Vert _{{\mathcal {K}}}^2 = D(\lambda _1). \end{aligned}$$
(9)

It follows from inequality (9) that \(\Vert f_{\lambda _1}\Vert _{\infty } \le \kappa \Vert f_{\lambda _1}\Vert _{{\mathcal {K}}} \le \kappa \sqrt{D(\lambda _1)/\lambda _1}\). Set

$$\begin{aligned} \zeta _2&= W(y,f_{\lambda _1}) - W(y,f_B)\\&= C_1 (1 - y f_{\lambda _1}(x))_{+} \cdot {{\mathbf {1}}}_{\{y \ne 0\}} + C_2 (y - f_{\lambda _1}(x))^2 \cdot {{\mathbf {1}}}_{\{y = 0\}} \\&- C_1 (1 - y f_{B}(x))_{+} \cdot {{\mathbf {1}}}_{\{y \ne 0\}} - C_2 (y - f_{B}(x))^2 \cdot {{\mathbf {1}}}_{\{y = 0\}}\\&= C_1 [(1 - y f_{\lambda _1}(x))_{+} - (1 - y f_{B}(x))_{+}] \cdot {{\mathbf {1}}}_{\{y \ne 0\}} \\&\quad + C_2 [f_{\lambda _1}^2(x) - f_{B}^2(x)] \cdot {{\mathbf {1}}}_{\{y = 0\}}, \end{aligned}$$

then \(T_3 = \frac{1}{m} \sum _{i=1}^{m} \zeta _2(z_i) - E(\zeta _2)\). Since \(|f_B| \le M\) almost everywhere, we have

$$\begin{aligned} |\zeta _2|&\le | C_1 [(1 - y f_{\lambda _1}(x))_{+} - (1 - y f_{B}(x))_{+}] \cdot {{\mathbf {1}}}_{\{y \ne 0\}} \\&\quad + C_2 [(f_{\lambda _1}(x)-f_{B}(x))(f_{\lambda _1}(x) + f_{B}(x))] \cdot {{\mathbf {1}}}_{\{y = 0\}} |\\&\le C(\kappa \sqrt{D(\lambda _1)/\lambda _1} + M) + C(\kappa \sqrt{D(\lambda _1)/\lambda _1} + M)\\&\quad (\kappa \sqrt{D(\lambda _1)/\lambda _1} + M)\\&\le 2Cb := 2C(\kappa \sqrt{D(\lambda _1)/\lambda _1} + M)^2. \end{aligned}$$

Hence \(|\zeta _2 - E(\zeta _2)| \le M_{\zeta _2} := 4Cb\). Moreover,

$$\begin{aligned} E(\zeta _2^2)&= E[C_1 ((1 - y f_{\lambda _1}(x))_{+} - (1 - y f_{B}(x))_{+}) \cdot {{\mathbf {1}}}_{\{y \ne 0\}} \\&\quad + C_2 (f_{\lambda _1}(x)^2 - f_{B}(x)^2) \cdot {{\mathbf {1}}}_{\{y = 0\}}]^2\\&= E\{C_1 [(1 - y f_{\lambda _1}(x))_{+} - (1 - y f_{B}(x))_{+}]\\&\quad \cdot {{\mathbf {1}}}_{\{y \ne 0\}}\}^2 \\&\quad + E\{C_2 [(f_{\lambda _1}(x)-f_{B}(x))(f_{\lambda _1}(x) + f_{B}(x))]\\&\quad \cdot {{\mathbf {1}}}_{\{y =0\}} \}^2\\&\le C^2(\kappa \sqrt{D(\lambda _1)/\lambda _1} + M)^2 + C^2 \Vert f_{\lambda _1}(x)-f_{B}(x)\Vert ^2_{\rho } (\kappa \sqrt{D(\lambda _1)/\lambda _1} + M)^2\\&\le C^2(\kappa \sqrt{D(\lambda _1)/\lambda _1} + M)^2 + C^2 D(\lambda _1) \\&\quad (\kappa \sqrt{D(\lambda _1)/\lambda _1} + M)^2\\&\le C^2 (D(\lambda _1)+1) (\kappa \sqrt{D(\lambda _1)/\lambda _1} \\&\quad + M)^2 \le C^2 (D(\lambda _1)+1) b. \end{aligned}$$

By the one-sided Bernstein inequality (Lemma 1), we have that for any \(t > 0\), \(\frac{1}{m} \sum _{i=1}^{m} \zeta _2 (z_i) - E(\zeta _2) \le t\) with confidence at least

$$\begin{aligned}&1 - \exp \left \{ -\frac{m t^2}{2( \sigma ^2(\zeta _2) + \frac{1}{3} M_{\zeta _2} t)} \right \}\\&\quad \ge 1 - \exp \left \{ -\frac{m t^2}{2[C^2 (D(\lambda _1)+1) b + \frac{1}{3} \cdot 4Cbt]} \right \}\\&\quad = 1 - \exp \left \{ - \frac{m t^2}{2Cb [C(D(\lambda _1)+1) + \frac{4}{3}t]} \right \}. \end{aligned}$$

Let \(t^{*}\) be the unique positive solution of the equation

$$\begin{aligned} - \frac{m t^2}{2Cb [C(D(\lambda _1)+1) + \frac{4}{3}t]} = \ln (\delta ). \end{aligned}$$

So, \(\frac{1}{m} \sum _{i=1}^{m} \zeta _2 (z_i) - E(\zeta _2) \le t^{*}\) holds with probability at least \(1 - \delta\), where

$$\begin{aligned} t^{*}&= \frac{\frac{4Cb}{3} \ln (\frac{1}{\delta }) + \sqrt{(\frac{4Cb}{3} \ln (\frac{1}{\delta }))^2 + 2C^2bm(D(\lambda _1)+1) \ln (\frac{1}{\delta })}}{m}\\&\le \frac{ 8Cb \ln (\frac{1}{\delta })}{3m} + \frac{1}{m}\sqrt{2C^2bm(D(\lambda _1)+1) \ln (\frac{1}{\delta })}\\&\le \frac{ 8Cb \ln (\frac{1}{\delta })}{3m} + D(\lambda _1) + 1 + \frac{ Cb \ln (\frac{1}{\delta })}{2m}. \end{aligned}$$

Recall \(b = (\kappa \sqrt{D(\lambda _1)/\lambda _1} + M)^2\). It follows that

$$\begin{aligned} t^{*} \le D(\lambda _1)\Big ( 1 + \frac{7C \kappa ^2 \ln (\frac{1}{\delta })}{m \lambda _1} \Big ) + \frac{7 M^2 C \ln (\frac{1}{\delta })}{m} + 1. \end{aligned}$$

Replacing \(\delta\) by \(\delta /2\) gives the bound stated in Proposition 6. This completes the proof of Proposition 6. \(\square\)
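
The quadratic-formula step above, from the exponent equation to the expression for \(t^{*}\), can be checked symbolically; in the sketch below, D1 abbreviates \(D(\lambda _1)+1\) and lnd abbreviates \(\ln (1/\delta )\).

```python
# Symbolic check: t* is the positive root of the rearranged exponent equation
#   m*t^2 - (8/3)*C*b*lnd*t - 2*C^2*b*D1*lnd = 0.
import sympy as sp

t, m, C, b, D1, lnd = sp.symbols('t m C b D1 lnd', positive=True)
quadratic = m * t**2 - sp.Rational(8, 3) * C * b * lnd * t - 2 * C**2 * b * D1 * lnd
A = sp.Rational(4, 3) * C * b * lnd
t_star = (A + sp.sqrt(A**2 + 2 * C**2 * b * m * D1 * lnd)) / m
assert sp.simplify(sp.expand(quadratic.subs(t, t_star))) == 0
print("t* solves the Bernstein quadratic")
```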

Appendix B

Proof of Proposition 2:

By Propositions 4–6 and Definition 3, we have that for any \(\delta \in (0,1)\), with confidence at least \(1 - \delta\), the following inequality is valid:

$$\begin{aligned} {{\mathcal {E}}}(f_{{\mathbf {z}}}) - {{\mathcal {E}}}(f_B)&\le T_1 + T_2 + T_3 + D(\lambda _1)\\&\le \frac{1}{2}[{{\mathcal {E}}}(f_{{\mathbf {z}}})-{{\mathcal {E}}}(f_B)] + {\varepsilon }^{*}(m,\delta /2) + 2 \\&\quad + D(\lambda _1)\Big ( 2 + \frac{7C \kappa ^2 \ln (\frac{2}{\delta })}{m \lambda _1} \Big ) + \frac{7 M^2 C \ln (\frac{2}{\delta })}{m}. \end{aligned}$$

For \({\varepsilon }^{*}(m,\delta /2)\), the inequality \(\left(\frac{150C (\kappa +1)^2 R^2 C_s (4R)^s}{m}\right)^{\frac{1}{s+1}} \ge \frac{150C (\kappa +1)^2 R^2 \ln (2/\delta )}{m}\) holds whenever \(m \ge 37C (\kappa +1)^2 R \ln ({2}/{\delta }) ({\ln ({2}/{\delta })}/{C_s})^{1/s}\), in which case \({\varepsilon }^{*}(m,\delta /2) = (\frac{150C (\kappa +1)^2 R^2 C_s (4R)^s}{m})^{\frac{1}{s+1}}\). Thus, for any \(0< \delta < 1\), with probability at least \(1- \delta\), we have

$$\begin{aligned} {{\mathcal {E}}}(f_{{\mathbf {z}}}) - {{\mathcal {E}}}(f_B)&\le D(\lambda _1)\Big ( 4 + \frac{14C \kappa ^2 \ln (\frac{2}{\delta })}{m \lambda _1} \Big ) + \frac{14 M^2 C \ln (\frac{2}{\delta })}{m} \\&\quad + 4 + 2\Big ( \frac{150C (\kappa +1)^2 R^2 C_s (4R)^s}{m} \Big )^{\frac{1}{s+1}}. \end{aligned}$$

We accomplished the proof of Proposition 2. \(\square\)

Proof of Theorem 1:

By Definition 1, for any \(\lambda _1>0\), we have \(D(\lambda _1) \le {\lambda _1}^q\). Let \(R = M\). Then for any \(0< \delta < 1\), with probability at least \(1- \delta\),

$$\begin{aligned}&{{\mathcal {E}}}(f_{{\mathbf {z}}}) - {{\mathcal {E}}}(f_B) \le D(\lambda _1)\left ( 4 + \frac{14C \kappa ^2 \ln (\frac{2}{\delta })}{m \lambda _1} \right) + \frac{14 M^2 C \ln (\frac{2}{\delta })}{m}\\&\quad + 4 + 2{\varepsilon }^{*}(m,\delta /2) \\&\le {\lambda _1}^q \left ( 4 + \frac{14C \kappa ^2 \ln (\frac{2}{\delta })}{m \lambda _1} \right ) + \frac{14 M^2 C \ln (\frac{2}{\delta })}{m} + 4 \\&\quad + \frac{300C (\kappa +1)^2 M^2 \ln (\frac{2}{\delta })}{m}\\&\quad + 2\left(\frac{150C (\kappa +1)^2 M^2 C_s (4M)^s}{m}\right)^{\frac{1}{s+1}} \\&\le {\widetilde{C}} \left ( {\lambda _1}^q + \frac{{\lambda _1}^q}{m \lambda _1} + \frac{1}{m} + \frac{1}{m} + \left(\frac{1}{m}\right)^{\frac{1}{1+s}} \right ), \end{aligned}$$

where \({\widetilde{C}} = 300 C (\kappa +1)^2 M^2(4 {C_s}^{\frac{1}{s+1}} + 3 \ln ({2}/{\delta }) + 2)\). Setting \(\frac{{\lambda _1}^{q-1}}{m} = {\lambda _1}^q\) yields \(\lambda _1 = \frac{1}{m}\). Since \(0< \lambda _1 < 1\) and \(0 < q \le 1\), letting \(s\) tend to 0 and \(q\) tend to 1, the inequality

$$\begin{aligned} {{\mathcal {E}}}(f_{{\mathbf {z}}}) - {{\mathcal {E}}}(f_B) \le {\widetilde{C}} \left ( {\lambda _1}^q + \frac{{\lambda _1}^q}{m \lambda _1} + \frac{2}{m} + (\frac{1}{m})^{\frac{1}{1+s}} \right ) \le {\widetilde{C}} (\frac{1}{m}) \end{aligned}$$

is valid with probability at least \(1- \delta\), where \({\widetilde{C}}\) is the constant defined above. This completes the proof of Theorem 1. \(\square\)
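
As a rough illustration of the resulting \(O(\frac{1}{m})\) rate, one can evaluate the bound \({\widetilde{C}}/m\) for placeholder constants; the values of \(C\), \(\kappa\), \(M\), \(C_s\), and \(\delta\) below are arbitrary assumptions, not quantities from the experiments.

```python
# Illustrative decay of the Theorem 1 bound C_tilde/m, with placeholder
# constants C = kappa = M = C_s = 1 and delta = 0.05 (assumptions only).
import numpy as np

C, kappa, M, C_s, delta = 1.0, 1.0, 1.0, 1.0, 0.05
C_tilde = 300 * C * (kappa + 1) ** 2 * M**2 * (4 * C_s + 3 * np.log(2 / delta) + 2)
for m in (100, 1_000, 10_000, 100_000):
    print(f"m = {m:>7}: excess risk bound ~ {C_tilde / m:.5f}")
```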

About this article

Cite this article

Dong, Z., Qin, Y., Zou, B. et al. LMSVCR: novel effective method of semi-supervised multi-classification. Neural Comput & Applic 34, 3857–3873 (2022). https://doi.org/10.1007/s00521-021-06647-7
