
Double-fold localized multiple matrix learning machine with Universum

  • Theoretical Advances
  • Pattern Analysis and Applications

Abstract

Matrix learning, multiple-view learning, Universum learning, and local learning are four active topics of current research. Matrix learning aims to design feasible machines that process matrix patterns directly. Multiple-view learning takes pattern information from multiple aspects, i.e., multiple-view information, into account. Universum learning can reflect prior knowledge about the application domain and improve classification performance. A good local learning approach is important for discovering local structures and pattern information. Our previously proposed learning machine, the double-fold localized multiple matrix learning machine, combines multiple-view information, local structures, and matrix learning, but it does not take Universum learning into account. This paper therefore proposes a double-fold localized multiple matrix learning machine with Universum (Uni-DLMMLM) to improve the performance of such a learning machine. Experimental results have validated that Uni-DLMMLM (1) makes full use of the domain knowledge of the whole data distribution and inherits the advantages of matrix learning; (2) combines Universum learning with matrix learning to capture more global knowledge; (3) has a good ability to process different kinds of data sets; (4) has superior classification performance and leads to a low empirical generalization risk bound.




Notes

  1. http://www.cs.columbia.edu/CAVE/software/softlib/coil-20.php.

  2. http://sun16.cecs.missouri.edu/pgader/CECS477/NNdigits.zip.

  3. http://www.cl.cam.ac.uk/research/dtg/attarchive/facedatabase.html.

  4. http://www.cs.columbia.edu/CAVE/software/softlib/coil-20.php.

  5. http://sun16.cecs.missouri.edu/pgader/CECS477/NNdigits.zip.

  6. http://www.cl.cam.ac.uk/research/dtg/attarchive/facedatabase.html.


Acknowledgments

This work was supported by the Shanghai Natural Science Foundation under grant number 16ZR1414500. The author would like to thank the foundation for its support.

Author information


Corresponding author

Correspondence to Changming Zhu.

Appendix

To minimize Eq. (5), we adopt a two-step alternating optimization algorithm. For convenience and simplicity, we use \(\mathcal {Q}(i,p)=\varphi _{i}g^p(A_i^p)-1-b_i^p\) and \(\mathcal {Q^{*}}(j,p)=g^p({A_j^{*}}^p)-1-{b_j^{*}}^p\) to denote parts of Eq. (5). The details of this algorithm are given below.
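Before detailing each step, a minimal sketch of the overall alternating scheme is given below in Python. The callables `step1`, `step2`, and `objective` are hypothetical placeholders standing for the two update steps derived in this appendix and for the evaluation of Eq. (5); they are not part of the paper's implementation.

```python
def alternating_optimization(params, step1, step2, objective, xi=1e-4, max_iter=100):
    """Alternate the two steps until the relative change of Eq. (5) is small.

    step1(params): fix the gating weights eta and update u^p, v~^p, v0^p, b^p, b*^p.
    step2(params): fix those parameters and update h_{q1}, h_{q2}, h_{q0}.
    objective(params): evaluate Eq. (5) for the current parameters.
    """
    prev = objective(params)
    for _ in range(max_iter):
        params = step1(params)      # Step 1 of the alternating optimization
        params = step2(params)      # Step 2 of the alternating optimization
        cur = objective(params)
        if abs(cur - prev) / abs(prev) <= xi:   # stopping rule of Eq. (48)
            break
        prev = cur
    return params
```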

First, we fix each \(\eta _q(A_i^q)\), and then the gradients of Eq. (5) with respect to \({u^p}, {\tilde{v}^p}\), \(v_0^p\), \({b^p}\), and \({b^{*}}^p\) are given below.

$$\begin{aligned}&\frac{\partial {L}}{\partial {u^p}}=2\left(\sum \limits _{i=1}^{N}(\Omega A_{i}^{p}\tilde{v}^p)+E\sum \limits _{j=1}^{L}(\Omega ^{*} {A_{j}^{*}}^{p}\tilde{v}^p)+CS_1^pu^p\right) \end{aligned}$$
(10)
$$\begin{aligned}&\frac{\partial {L}}{\partial {\tilde{v}^p}}=2\left(\sum \limits _{i=1}^{N}\left(\Omega ({u^p}^TA_i^p)^T\right) +E\sum \limits _{j=1}^{L}\left(\Omega ^{*}({u^p}^T{A_j^{*}}^p)^T\right) +CS_2^p\tilde{v}^p\right) \end{aligned}$$
(11)
$$\begin{aligned}&\frac{\partial {L}}{\partial {v_0^p}}=2(\Omega +\Omega ^{*})\end{aligned}$$
(12)
$$\begin{aligned}&\frac{\partial {L}}{\partial {b^p}}=-2(B-I-b^p)\end{aligned}$$
(13)
$$\begin{aligned}&\frac{\partial {L}}{\partial {{b^{*}}^p}}=-2E(B^{*}-I^{*}-{b^{*}}^p) \end{aligned}$$
(14)

where

$$\begin{aligned}&\Omega =\mathcal {Q}(i,p)\varphi _{i}-\gamma \sum \limits _{k=1}^{M}H(g_k,A_i)\eta _{p}(A_i^p)+\gamma H(g_p,A_i)\end{aligned}$$
(15)
$$\begin{aligned}&\Omega ^{*}=\mathcal {Q}^{*}(j,p)-D\sum \limits _{l=1}^{M}H(g_l,A_j^{*})\eta _{p}({A_j^{*}}^p)+D H(g_p,A_j^{*})\end{aligned}$$
(16)
$$\begin{aligned}&H(g_k,A_i)=g^k(A_i^k)-\sum \limits _{q=1}^{M}\eta _{q}(A_i^q)g^q(A_{i}^{q})\end{aligned}$$
(17)
$$\begin{aligned}&H(g_l,A_j^{*})=g^l({A_j^{*}}^l)-\sum \limits _{h=1}^{M}\eta _{h}({A_j^{*}}^h)g^h({A_j^{*}}^{h})\end{aligned}$$
(18)
$$\begin{aligned}&{B}=\left(\begin{array}{cc} \varphi _{1}g^p(A_1^p)\\ \varphi _{2}g^p(A_2^p)\\ \ldots \\ \varphi _{N}g^p(A_N^p)\\ \end{array} \right) \end{aligned}$$
(19)
$$\begin{aligned}&{B^{*}}=\left(\begin{array}{cc} g^p({A_1^{*}}^p)\\ g^p({A_2^{*}}^p)\\ ...\\ g^p({A_L^{*}}^p)\\ \end{array} \right) \end{aligned}$$
(20)

\(I_{N\times 1}\) is the all-ones vector of size \(N\times 1\) and \(I^{*}_{L\times 1}\) is the all-ones vector of size \(L\times 1\).
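To make the quantities above concrete, the following small sketch computes \(H(g_k,A_i)\) of Eq. (17) and \(\Omega\) of Eq. (15) from the per-view decision values and gating weights of one labeled pattern. The array layout (a vector of decision values \(g^1(A_i^1),\ldots ,g^M(A_i^M)\), a vector of gating weights, and 0-based view indices) is an assumption made only for this illustration.

```python
import numpy as np

def consistency_terms(g_vals, eta):
    """H(g_k, A_i) = g^k(A_i^k) - sum_q eta_q(A_i^q) g^q(A_i^q), for every view k."""
    return g_vals - eta @ g_vals            # vector of H(g_1, A_i), ..., H(g_M, A_i)

def omega(g_vals, eta, phi_i, b_ip, p, gamma):
    """The scalar Omega of Eq. (15) for labeled pattern A_i and (0-based) view p."""
    H = consistency_terms(g_vals, eta)
    Q_ip = phi_i * g_vals[p] - 1.0 - b_ip   # Q(i, p)
    return Q_ip * phi_i - gamma * H.sum() * eta[p] + gamma * H[p]
```

\(\Omega ^{*}\) of Eq. (16) follows the same pattern with \(\varphi _{i}\) dropped and \(\gamma\) replaced by D.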

Then we set \(\frac{\partial {L}}{\partial {u^p}}\), \(\frac{\partial {L}}{\partial {\tilde{v}^p}}\), and \(\frac{\partial {L}}{\partial {v_0^p}}\) to zero and obtain the parameters \(u^p\), \(\tilde{v}^p\), and \(v_0^p\) as follows.

$$\begin{aligned}&u^p=\nonumber \\&-\left\{ \left[ C{S_1^p}^T+\sum \limits _{i=1}^{N}\left(A_i^p\tilde{v}^p\left(A_i^p\tilde{v}^p\right) ^TU(i)\right) +E\sum \limits _{j=1}^{L}\left({A_j^{*}}^p\tilde{v}^p\left({A_j^{*}}^p\tilde{v}^p\right) ^TU(j)^{*}\right) \right] ^{-1}\right\} ^T\nonumber \\& \times \left\{ \sum \limits _{i=1}^{N}\left[ W(i)\left(A_i^p\tilde{v}^p\right) ^T\right] +E\sum \limits _{j=1}^{L}\left[ W(j)^{*}\left({A_j^{*}}^p\tilde{v}^p\right) ^T\right] \right\} ^T \end{aligned}$$
(21)
$$\begin{aligned}&\tilde{v}^p=\nonumber \\&-\left\{ \left[ CS_2^p+\sum \limits _{i=1}^{N}\left(\left({u^p}^TA_i^{p}\right) ^T{u^p}^TA_i^{p}U(i)\right) +E\sum \limits _{j=1}^{L}\left(\left({u^p}^T{A_j^{*}}^{p}\right) ^T{u^p}^T{A_j^{*}}^{p}U(j)^{*}\right) \right] \right\} ^{-1}\nonumber \\& \times \left\{ \sum \limits _{i=1}^{N}\left[ \left({u^p}^TA_i^p\right) ^TW(i)\right] +E\sum \limits _{j=1}^{L}\left[ \left({u^p}^T{A_j^{*}}^p\right) ^TW(j)^{*}\right] \right\} \end{aligned}$$
(22)
$$\begin{aligned}&v_0^p=-\left\{ N+\sum \limits _{i=1}^{N}\left[ \gamma \left(M\eta _{p}^2(A_i^p)-3\eta _{p}(A_i^p)+2\right) \right] \right. \nonumber \\&\left. +L+\sum \limits _{j=1}^{L}\left[ D\left(M\eta _{p}^2({A_j^{*}}^p)-3\eta _{p}({A_j^{*}}^p)+2\right) \right] \right\} ^{-1} \nonumber \\&\times \left\{ \sum \limits _{i=1}^{N}\left[ {u^p}^TA_i^{p}\tilde{v}^p\left(\gamma M\eta _{p}^2(A_i^p)+\gamma \eta _{p}^2(A_i^p)-5\gamma \eta _{p}(A_i^p)+3\gamma +1\right) \right. \right. \nonumber \\&\left. \left. -(1+b_i^p)\varphi _{i}+\sum \limits _{q=1,q\ne p}^{M}\left[ \left(M\eta _q(A_i^q)\eta _{p}(A_i^p)-\eta _q(A_i^q)-2\eta _{p}(A_i^p)\right) \gamma g^q(A_i^q)\right] \right] \right. \nonumber \\&\left. +\sum \limits _{j=1}^{L}\left[ {u^p}^T{A_j^{*}}^{p}\tilde{v}^p\left(D M\eta _{p}^2({A_j^{*}}^p)+D\eta _{p}^2({A_j^{*}}^p)-5D\eta _{p}({A_j^{*}}^p)+3D+1\right) -\left(1+{b_j^{*}}^p\right) \right. \right. \nonumber \\&\left. \left. +\sum \limits _{h=1,h\ne p}^{M}\left[ \left(M\eta _h({A_j^{*}}^h)\eta _{p}({A_j^{*}}^p)-\eta _h({A_j^{*}}^h)-2\eta _{p}({A_j^{*}}^p)\right) D g^h\left({A_j^{*}}^h\right) \right] \right] \right\} \end{aligned}$$
(23)

Here, U(i), W(i), \(U(j)^{*}\), and \(W(j)^{*}\) are used to simplify Eqs. (21) and (22), namely,

$$\begin{aligned} U(i)=\gamma M\eta _{p}^2(A_i^p)-2\gamma \eta _{p}(A_i^p)+\gamma +1 \end{aligned}$$
(24)
$$\begin{aligned} U(j)^{*}=D M\eta _{p}^2({A_j^{*}}^p)-2D \eta _{p}({A_j^{*}}^p)+D+1 \end{aligned}$$
(25)
$$\begin{aligned}&W(i)=\left(\gamma M\eta _{p}^2(A_i^p)-2\gamma \eta _{p}(A_i^p)+\gamma +1\right) v_0^p-(1+b_i^p)\varphi _{i}+\nonumber \\&\gamma \sum \limits _{q=1,q\ne p}^{M}\Big [g^q(A_i^q)\left(M\eta _q(A_i^q)\eta _{p}(A_i^p)-\eta _q(A_i^q)-\eta _{p}(A_i^p)\right) \Big ] \end{aligned}$$
(26)
$$\begin{aligned}&W(j)^{*}=\left(D M\eta _{p}^2({A_j^{*}}^p)-2D\eta _{p}({A_j^{*}})+D+1\right) v_0^p-(1+{b_j^{*}}^p)+\nonumber \\&D\sum \limits _{h=1,h\ne p}^{M}\Big [g^h\left({A_j^{*}}^h\right) \left(M\eta _h\left({A_j^{*}}^h\right) \eta _{p}\left({A_j^{*}}^p\right) -\eta _h\left({A_j^{*}}^h\right) -\eta _{p}\left({A_j^{*}}^p\right) \right) \Big ] \end{aligned}$$
(27)
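As an illustration of how the closed-form updates are used, the following sketch assembles the update of Eq. (22) for \(\tilde{v}^p\) from the scalar weights U(i), W(i), \(U(j)^{*}\), and \(W(j)^{*}\) of Eqs. (24)–(27). The function signature and the NumPy representation (one matrix per pattern, the scalar weights passed as arrays, and S2 standing for the regularization matrix \(S_2^p\)) are assumptions made only for this sketch.

```python
import numpy as np

def update_v_tilde(u, A_list, U, W, A_star_list, U_star, W_star, S2, C, E):
    """Closed-form update of v~^p (Eq. 22) with u^p and the gating weights fixed."""
    d = A_list[0].shape[1]                      # column dimension of the p-th view
    lhs = C * np.asarray(S2, dtype=float)       # C * S_2^p
    rhs = np.zeros(d)
    for A_i, U_i, W_i in zip(A_list, U, W):     # labeled patterns
        z = A_i.T @ u                           # (u^T A_i^p)^T
        lhs += U_i * np.outer(z, z)
        rhs += W_i * z
    for A_j, U_j, W_j in zip(A_star_list, U_star, W_star):   # Universum patterns
        z = A_j.T @ u
        lhs += E * U_j * np.outer(z, z)
        rhs += E * W_j * z
    return -np.linalg.solve(lhs, rhs)           # minus the inverse times the sum
```

The update of Eq. (21) for \(u^p\) follows the same pattern with the roles of \(u^p\) and \(\tilde{v}^p\) exchanged.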

Then we let

$$\begin{aligned} {D^p(k)}=\left(\begin{array}{cc} \varphi _{1}^Tu^{p^T}(k)A_1^p\tilde{v}^p(k)+\varphi _{1}v_0^{p^T}(k)\\ \varphi _{2}^Tu^{p^T}(k)A_2^p\tilde{v}^p(k)+\varphi _{2}v_0^{p^T}(k)\\ \ldots \\ \varphi _{N}^Tu^{p^T}(k)A_N^p\tilde{v}^p(k)+\varphi _{N}v_0^{p^T}(k)\\ \end{array} \right) \end{aligned}$$
(28)

and

$$\begin{aligned} {{D^{*}}^p(k)}=\left(\begin{array}{cc} u^{p^T}(k){A_1^{*}}^p\tilde{v}^p(k)+v_0^{p^T}(k)\\ u^{p^T}(k){A_2^{*}}^p\tilde{v}^p(k)+v_0^{p^T}(k)\\ \ldots \\ u^{p^T}(k){A_L^{*}}^p\tilde{v}^p(k)+v_0^{p^T}(k)\\ \end{array} \right) \end{aligned}$$
(29)

Then, at the k-th iteration, the error vectors for the p-th matrix view, i.e., \(e^{p}(k)\) and \({e^{*}}^{p}(k)\), can be computed by Eqs. (30) and (31).

$$\begin{aligned} e^{p}(k)=D^p(k)-I-b^{p}(k) \end{aligned}$$
(30)
$$\begin{aligned} {e^{*}}^{p}(k)={D^{*}}^p(k)-I^{*}-{b^{*}}^p(k) \end{aligned}$$
(31)

The margins \(b^p\) and \({b^{*}}^p\) are updated as follows.

$$\begin{aligned} b^{p}(k+1)=b^{p}(k)+\rho (e^{p}(k)+|e^{p}(k)|) \end{aligned}$$
(32)
$$\begin{aligned} {b^{*}}^{p}(k+1)={b^{*}}^{p}(k)+\rho ^{*}({e^{*}}^{p}(k)+|{e^{*}}^{p}(k)|) \end{aligned}$$
(33)

where \(b^{p}(1)\ge 0_{N\times 1}\) and \({b^{*}}^{p}(1)\ge 0_{L\times 1}\). The learning rates satisfy \(0<\rho <1\) and \(0<\rho ^{*}<1\). \({D^{*}}^p(k)\), \({b^{*}}^{p}(k)\), \(D^p(k)\), \(b^{p}(k)\), \(u^{p}(k)\), \(\tilde{v}^p(k)\), and \(v_0^{p}(k)\) represent the values of \({D^{*}}^p\), \({b^{*}}^{p}\), \(D^p\), \(b^{p}\), \(u^{p}\), \(\tilde{v}^p\), and \(v_0^{p}\) at the k-th iteration, respectively; the corresponding quantities for the Universum patterns carry the same meaning.
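For concreteness, a small sketch of the error and margin updates of Eqs. (30)–(33) is given below; `D_p` and `D_star_p` stand for the decision-value vectors of Eqs. (28)–(29) at the current iteration, and all names are illustrative rather than taken from the paper.

```python
import numpy as np

def update_margins(D_p, b_p, D_star_p, b_star_p, rho=0.5, rho_star=0.5):
    """One margin update for the p-th view (Eqs. 30-33)."""
    ones = np.ones_like(b_p)                    # the all-ones vector I_{N x 1}
    ones_star = np.ones_like(b_star_p)          # the all-ones vector I*_{L x 1}
    e_p = D_p - ones - b_p                      # Eq. (30)
    e_star_p = D_star_p - ones_star - b_star_p  # Eq. (31)
    # e + |e| = 2 * max(e, 0), so only positive errors enlarge the margins.
    b_p_next = b_p + rho * (e_p + np.abs(e_p))                           # Eq. (32)
    b_star_p_next = b_star_p + rho_star * (e_star_p + np.abs(e_star_p))  # Eq. (33)
    return b_p_next, b_star_p_next
```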

Second, we fix each \(u^p\), \(\tilde{v}^p\), \(v_0^{p}\), \(b^p\), and \({b^{*}}^{p}\), and the gradients of Eq. (5) with respect to \(h_{q1}\), \(h_{q2}\), and \(h_{q0}\) are given below. Here \(p=1,2,\ldots ,M\) and \(q=1,2,\ldots ,M\).

$$\begin{aligned}&\frac{\partial {L}}{\partial {h_{qs}}}\nonumber \\&=2\gamma \sum \limits _{i=1}^{N}\sum \limits _{p=1}^{M}\left\{ \left[ -\sum \limits _{h=1}^{M}\frac{\partial {\eta _h\left(A_{i}^{h}\right) }}{\partial {h_{qs}}}g^h(A_i^h)\right] \left[ g^p(A_{i}^{p})-\sum \limits _{h=1}^{M}\eta _h(A_i^h)g^h(A_{i}^{h})\right] \right\} \nonumber \\&+2D\sum \limits _{j=1}^{L}\sum \limits _{r=1}^{M}\left\{ \left[ -\sum \limits _{k=1}^{M}\frac{\partial {\eta _k\left({A_j^{*}}^{k}\right) }}{\partial {h_{qs}}}g^k\left({A_j^{*}}^k\right) \right] \left[ g^r\left({A_j^{*}}^{r}\right) -\sum \limits _{k=1}^{M}\eta _k\left({A_j^{*}}^k\right) g^k\left({A_j^{*}}^{k}\right) \right] \right\} \end{aligned}$$
(34)

where \(s=0,1,2\). To simplify this equation, we give some other equations below.

$$\begin{aligned}&\frac{\partial {\eta _h(A_{i}^{h})}}{\partial {h_{q1}}}=E_1A_{i}^qh_{q2} \quad \quad (h \ne q)\end{aligned}$$
(35)
$$\begin{aligned}&\frac{\partial {\eta _q(A_{i}^{q})}}{\partial {h_{q1}}}=E_2A_i^qh_{q2}\end{aligned}$$
(36)
$$\begin{aligned}&\frac{\partial {\eta _h(A_{i}^{h})}}{\partial {h_{q2}}}=E_1A_{i}^{q^T}h_{q1} \quad \quad (h \ne q)\end{aligned}$$
(37)
$$\begin{aligned}&\frac{\partial {\eta _q(A_{i}^{q})}}{\partial {h_{q2}}}=E_2A_i^{q^T}h_{q1}\end{aligned}$$
(38)
$$\begin{aligned}&\frac{\partial {\eta _k({A_j^{*}}^{k})}}{\partial {h_{q1}}}=E_1^{*}{A_j^{*}}^qh_{q2} \quad \quad (k \ne q)\end{aligned}$$
(39)
$$\begin{aligned}&\frac{\partial {\eta _q({A_j^{*}}^{q})}}{\partial {h_{q1}}}=E_2^{*}{A_j^{*}}^qh_{q2}\end{aligned}$$
(40)
$$\begin{aligned}&\frac{\partial {\eta _k({A_j^{*}}^{k})}}{\partial {h_{q2}}}=E_1^{*}{A_j^{*}}^{q^T}h_{q1} \quad \quad (k \ne q)\end{aligned}$$
(41)
$$\begin{aligned}&\frac{\partial {\eta _q({A_j^{*}}^{q})}}{\partial {h_{q2}}}=E_2^{*}{A_j^{*}}^{q^T}h_{q1} \end{aligned}$$
(42)

and

$$\begin{aligned}&\frac{\partial {\eta _h(A_{i}^{h})}}{\partial {h_{q0}}}=E_1=\frac{-e^{\left(h_{h1}^TA_i^hh_{h2}+h_{h0}\right) }e^{\left(h_{q1}^TA_i^qh_{q2}+h_{q0}\right) }}{\left[ \sum \nolimits _{t=1}^{M}e^{\left(h_{t1}^TA_i^th_{t2}+h_{t0}\right) }\right] ^2} \quad \quad (h \ne q)\end{aligned}$$
(43)
$$\begin{aligned}&\frac{\partial {\eta _q(A_{i}^{q})}}{\partial {h_{q0}}}=E_2=\frac{e^{\left(h_{q1}^TA_i^qh_{q2}+h_{q0}\right) }}{\sum \nolimits _{t=1}^{M}e^{\left(h_{t1}^TA_i^th_{t2}+h_{t0}\right) }}-\frac{e^{2\left(h_{q1}^TA_i^qh_{q2}+h_{q0}\right) }}{\left[ \sum \nolimits _{t=1}^{M}e^{\left(h_{t1}^TA_i^th_{t2}+h_{t0}\right) }\right] ^2}\end{aligned}$$
(44)
$$\begin{aligned}&\frac{\partial {\eta _k\left({A_j^{*}}^{k}\right) }}{\partial {h_{q0}}}=E_1^{*}=\frac{-e^{\left(h_{k1}^T{A_j^{*}}^kh_{k2}+h_{k0}\right) }e^{\left(h_{q1}^T{A_j^{*}}^qh_{q2}+h_{q0}\right) }}{\left[ \sum \nolimits _{t=1}^{M}e^{\left(h_{t1}^T{A_j^{*}}^th_{t2}+h_{t0}\right) }\right] ^2} \quad \quad (k \ne q)\end{aligned}$$
(45)
$$\begin{aligned}&\frac{\partial {\eta _q\left({A_j^{*}}^{q}\right) }}{\partial {h_{q0}}}=E_2^{*}=\frac{e^{\left(h_{q1}^T{A_j^{*}}^qh_{q2}+h_{q0}\right) }}{\sum \nolimits _{t=1}^{M}e^{\left(h_{t1}^T{A_j^{*}}^th_{t2}+h_{t0}\right) }}-\frac{e^{2\left(h_{q1}^T{A_j^{*}}^qh_{q2}+h_{q0}\right) }}{\left[ \sum \nolimits _{t=1}^{M}e^{\left(h_{t1}^T{A_j^{*}}^th_{t2}+h_{t0}\right) }\right] ^2} \end{aligned}$$
(46)
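Equations (43)–(46) are exactly the derivatives of a softmax over the bilinear view scores \(h_{t1}^TA_i^th_{t2}+h_{t0}\), so the gating weights \(\eta _t\) can be computed as in the sketch below. The array layout (one matrix and one pair of weight vectors per view, 0-based indexing) is an assumption made only for this illustration.

```python
import numpy as np

def gating_weights(A_views, h1, h2, h0):
    """eta_t(A_i^t) = exp(h_t1^T A_i^t h_t2 + h_t0) / sum_s exp(h_s1^T A_i^s h_s2 + h_s0)."""
    scores = np.array([h1[t] @ A_views[t] @ h2[t] + h0[t] for t in range(len(A_views))])
    exp_scores = np.exp(scores - scores.max())   # shift by the max for numerical stability
    return exp_scores / exp_scores.sum()

def gating_grad_h0(eta, q):
    """Derivatives of every eta_h with respect to h_{q0}: E_1 for h != q, E_2 for h == q."""
    grad = -eta * eta[q]                 # E_1 (and E_1^*) = -eta_h * eta_q   (h != q)
    grad[q] = eta[q] * (1.0 - eta[q])    # E_2 (and E_2^*) = eta_q - eta_q^2
    return grad
```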

Then, we need to update the parameters of \(\eta _q(A_i^q)\) after the k-th iteration, i.e.,

$$\begin{aligned} h_{qs}^{(k+1)}=h_{qs}^{(k)}-\mu \frac{\partial {L}}{\partial {h_{qs}^{(k)}}} \end{aligned}$$
(47)

Here, \(\mu\) is the attenuation coefficient, which is a constant within each iteration, and \(h_{qs}^{(k)}\) represents \(h_{qs}\) at the k-th iteration. To compute \(\frac{\partial {L}}{\partial {h_{qs}^{(k)}}}\), we replace \(h_{qs}\) in Eqs. (34)–(46) with \(h_{qs}^{(k)}\). Finally, once \({u^p}\), \(\tilde{v}^p\), \(v_0^{p}\), \({b^p}\), \({{b^{*}}^p}\), \(h_{q1}\), \(h_{q2}\), and \(h_{q0}\) have been updated, we can compute Eq. (5). In practice, the termination criterion is given in the following equation.

$$\begin{aligned} \frac{\parallel L(k+1)-L(k)\parallel _2}{\parallel L(k)\parallel _2}\le \xi \end{aligned}$$
(48)

where L(k) represents the value of Eq. (5) at the k-th iteration and \(\parallel {.}\parallel _2\) denotes the 2-norm.

If Eq. (48) is satisfied, we can stop the procedure and get the optimal parameters.
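Putting the second step together, the sketch below applies the gradient update of Eq. (47) to every gating parameter and then evaluates the stopping rule of Eq. (48). The dictionary layout keyed by (q, s) with s in {0, 1, 2}, and all names, are illustrative assumptions rather than the paper's implementation.

```python
def gating_gradient_step(h, grad_h, mu=0.01):
    """h_{qs} <- h_{qs} - mu * dL/dh_{qs} for every view q and every s (Eq. 47)."""
    return {key: h[key] - mu * grad_h[key] for key in h}

def converged(L_prev, L_cur, xi=1e-4):
    """Relative-change stopping rule of Eq. (48)."""
    return abs(L_cur - L_prev) / abs(L_prev) <= xi
```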


Cite this article

Zhu, C. Double-fold localized multiple matrix learning machine with Universum. Pattern Anal Applic 20, 1091–1118 (2017). https://doi.org/10.1007/s10044-016-0548-9
