
Double-fold localized multiple matrix learning machine with Universum

  • Theoretical Advances
  • Pattern Analysis and Applications

Abstract

Matrix learning, multiple-view learning, Universum learning, and local learning are four active topics of current research. Matrix learning aims to design feasible machines that process matrix patterns directly. Multiple-view learning takes pattern information from multiple aspects, i.e., multiple-view information, into account. Universum learning can reflect prior knowledge about the application domain and improve classification performance. A good local learning approach is important for discovering local structures and pattern information. Our previously proposed learning machine, the double-fold localized multiple matrix learning machine, combines multiple-view information, local structures, and matrix learning, but it does not take Universum learning into account. This paper therefore proposes a double-fold localized multiple matrix learning machine with Universum (Uni-DLMMLM) to improve the performance of such a learning machine. Experimental results have validated that Uni-DLMMLM (1) makes full use of the domain knowledge of the whole data distribution and inherits the advantages of matrix learning; (2) combines Universum learning with matrix learning to capture more global knowledge; (3) has a good ability to process different kinds of data sets; (4) has superior classification performance and leads to a low empirical generalization risk bound.




Notes

  1. http://www.cs.columbia.edu/CAVE/software/softlib/coil-20.php.

  2. http://sun16.cecs.missouri.edu/pgader/CECS477/NNdigits.zip.

  3. http://www.cl.cam.ac.uk/research/dtg/attarchive/facedatabase.html.

  4. http://www.cs.columbia.edu/CAVE/software/softlib/coil-20.php.

  5. http://sun16.cecs.missouri.edu/pgader/CECS477/NNdigits.zip.

  6. http://www.cl.cam.ac.uk/research/dtg/attarchive/facedatabase.html.


Acknowledgments

This work was supported by the Shanghai Natural Science Foundation under grant number 16ZR1414500. The author would like to thank the foundation for its support.

Author information


Corresponding author

Correspondence to Changming Zhu.

Appendix

To minimize Eq. (5), we adopt a two-step alternating optimization algorithm. For convenience and simplicity, we use \(\mathcal {Q}(i,p)=\varphi _{i}g^p(A_i^p)-1-b_i^p\) and \(\mathcal {Q^{*}}(j,p)=g^p({A_j^{*}}^p)-1-{b_j^{*}}^p\) to denote parts of Eq. (5). The details of this algorithm are given below.
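Before detailing each step, a minimal sketch of the overall alternating scheme is given below in Python. The callables `step1`, `step2`, and `objective` are hypothetical placeholders standing for the two update steps derived in this appendix and for the evaluation of Eq. (5); they are not part of the paper's implementation.

```python
def alternating_optimization(params, step1, step2, objective, xi=1e-4, max_iter=100):
    """Alternate the two steps until the relative change of Eq. (5) is small.

    step1(params): fix the gating weights eta and update u^p, v~^p, v0^p, b^p, b*^p.
    step2(params): fix those parameters and update h_{q1}, h_{q2}, h_{q0}.
    objective(params): evaluate Eq. (5) for the current parameters.
    """
    prev = objective(params)
    for _ in range(max_iter):
        params = step1(params)      # Step 1 of the alternating optimization
        params = step2(params)      # Step 2 of the alternating optimization
        cur = objective(params)
        if abs(cur - prev) / abs(prev) <= xi:   # stopping rule of Eq. (48)
            break
        prev = cur
    return params
```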

First, we fix each \(\eta _q(A_i^q)\), and then the gradients of Eq. (5) with respect to \({u^p}, {\tilde{v}^p}\), \(v_0^p\), \({b^p}\), and \({b^{*}}^p\) are given below.

$$\begin{aligned}&\frac{\partial {L}}{\partial {u^p}}=2\left(\sum \limits _{i=1}^{N}(\Omega A_{i}^{p}\tilde{v}^p)+E\sum \limits _{j=1}^{L}(\Omega ^{*} {A_{j}^{*}}^{p}\tilde{v}^p)+CS_1^pu^p\right) \end{aligned}$$
(10)
$$\begin{aligned}&\frac{\partial {L}}{\partial {\tilde{v}^p}}=2\left(\sum \limits _{i=1}^{N}\left(\Omega ({u^p}^TA_i^p)^T\right) +E\sum \limits _{j=1}^{L}\left(\Omega ^{*}({u^p}^T{A_j^{*}}^p)^T\right) +CS_2^p\tilde{v}^p\right) \end{aligned}$$
(11)
$$\begin{aligned}&\frac{\partial {L}}{\partial {v_0^p}}=2(\Omega +\Omega ^{*})\end{aligned}$$
(12)
$$\begin{aligned}&\frac{\partial {L}}{\partial {b^p}}=-2(B-I-b^p)\end{aligned}$$
(13)
$$\begin{aligned}&\frac{\partial {L}}{\partial {{b^{*}}^p}}=-2E(B^{*}-I^{*}-{b^{*}}^p) \end{aligned}$$
(14)

where

$$\begin{aligned}&\Omega =\mathcal {Q}(i,p)\varphi _{i}-\gamma \sum \limits _{k=1}^{M}H(g_k,A_i)\eta _{p}(A_i^p)+\gamma H(g_p,A_i)\end{aligned}$$
(15)
$$\begin{aligned}&\Omega ^{*}=\mathcal {Q}^{*}(j,p)-D\sum \limits _{l=1}^{M}H(g_l,A_j^{*})\eta _{p}({A_j^{*}}^p)+D H(g_p,A_j^{*})\end{aligned}$$
(16)
$$\begin{aligned}&H(g_k,A_i)=g^k(A_i^k)-\sum \limits _{q=1}^{M}\eta _{q}(A_i^q)g^q(A_{i}^{q})\end{aligned}$$
(17)
$$\begin{aligned}&H(g_l,A_j^{*})=g^l({A_j^{*}}^l)-\sum \limits _{h=1}^{M}\eta _{h}({A_j^{*}}^h)g^h({A_j^{*}}^{h})\end{aligned}$$
(18)
$$\begin{aligned}&{B}=\left(\begin{array}{cc} \varphi _{1}g^p(A_1^p)\\ \varphi _{2}g^p(A_2^p)\\ \ldots \\ \varphi _{N}g^p(A_N^p)\\ \end{array} \right) \end{aligned}$$
(19)
$$\begin{aligned}&{B^{*}}=\left(\begin{array}{cc} g^p({A_1^{*}}^p)\\ g^p({A_2^{*}}^p)\\ ...\\ g^p({A_L^{*}}^p)\\ \end{array} \right) \end{aligned}$$
(20)

\(I_{N\times 1}\) is the all-ones vector of size \(N\times 1\) and \(I^{*}_{L\times 1}\) is the all-ones vector of size \(L\times 1\).
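To make the quantities above concrete, the following small sketch computes \(H(g_k,A_i)\) of Eq. (17) and \(\Omega\) of Eq. (15) from the per-view decision values and gating weights of one labeled pattern. The array layout (a vector of decision values \(g^1(A_i^1),\ldots ,g^M(A_i^M)\), a vector of gating weights, and 0-based view indices) is an assumption made only for this illustration.

```python
import numpy as np

def consistency_terms(g_vals, eta):
    """H(g_k, A_i) = g^k(A_i^k) - sum_q eta_q(A_i^q) g^q(A_i^q), for every view k."""
    return g_vals - eta @ g_vals            # vector of H(g_1, A_i), ..., H(g_M, A_i)

def omega(g_vals, eta, phi_i, b_ip, p, gamma):
    """The scalar Omega of Eq. (15) for labeled pattern A_i and (0-based) view p."""
    H = consistency_terms(g_vals, eta)
    Q_ip = phi_i * g_vals[p] - 1.0 - b_ip   # Q(i, p)
    return Q_ip * phi_i - gamma * H.sum() * eta[p] + gamma * H[p]
```

\(\Omega ^{*}\) of Eq. (16) follows the same pattern with \(\varphi _{i}\) dropped and \(\gamma\) replaced by D.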

Then we set \(\frac{\partial {L}}{\partial {u^p}}\), \(\frac{\partial {L}}{\partial {\tilde{v}^p}}\), and \(\frac{\partial {L}}{\partial {v_0^p}}\) to zero and obtain the parameters \(u^p\), \(\tilde{v}^p\), and \(v_0^p\) as follows.

$$\begin{aligned}&u^p=\nonumber \\&-\left\{ \left[ C{S_1^p}^T+\sum \limits _{i=1}^{N}\left(A_i^p\tilde{v}^p\left(A_i^p\tilde{v}^p\right) ^TU(i)\right) +E\sum \limits _{j=1}^{L}\left({A_j^{*}}^p\tilde{v}^p\left({A_j^{*}}^p\tilde{v}^p\right) ^TU(j)^{*}\right) \right] ^{-1}\right\} ^T\nonumber \\& \times \left\{ \sum \limits _{i=1}^{N}\left[ W(i)\left(A_i^p\tilde{v}^p\right) ^T\right] +E\sum \limits _{j=1}^{L}\left[ W(j)^{*}\left({A_j^{*}}^p\tilde{v}^p\right) ^T\right] \right\} ^T \end{aligned}$$
(21)
$$\begin{aligned}&\tilde{v}^p=\nonumber \\&-\left\{ \left[ CS_2^p+\sum \limits _{i=1}^{N}\left(\left({u^p}^TA_i^{p}\right) ^T{u^p}^TA_i^{p}U(i)\right) +E\sum \limits _{j=1}^{L}\left(\left({u^p}^T{A_j^{*}}^{p}\right) ^T{u^p}^T{A_j^{*}}^{p}U(j)^{*}\right) \right] \right\} ^{-1}\nonumber \\& \times \left\{ \sum \limits _{i=1}^{N}\left[ \left({u^p}^TA_i^p\right) ^TW(i)\right] +E\sum \limits _{j=1}^{L}\left[ \left({u^p}^T{A_j^{*}}^p\right) ^TW(j)^{*}\right] \right\} \end{aligned}$$
(22)
$$\begin{aligned}&v_0^p=-\left\{ N+\sum \limits _{i=1}^{N}\left[ \gamma \left(M\eta _{p}^2(A_i^p)-3\eta _{p}(A_i^p)+2\right) \right] \right. \nonumber \\&\left. +L+\sum \limits _{j=1}^{L}\left[ D\left(M\eta _{p}^2({A_j^{*}}^p)-3\eta _{p}({A_j^{*}}^p)+2\right) \right] \right\} ^{-1} \nonumber \\&\times \left\{ \sum \limits _{i=1}^{N}\left[ {u^p}^TA_i^{p}\tilde{v}^p\left(\gamma M\eta _{p}^2(A_i^p)+\gamma \eta _{p}^2(A_i^p)-5\gamma \eta _{p}(A_i^p)+3\gamma +1\right) \right. \right. \nonumber \\&\left. \left. -(1+b_i^p)\varphi _{i}+\sum \limits _{q=1,q\ne p}^{M}\left[ \left(M\eta _q(A_i^q)\eta _{p}(A_i^p)-\eta _q(A_i^q)-2\eta _{p}(A_i^p)\right) \gamma g^q(A_i^q)\right] \right] \right. \nonumber \\&\left. +\sum \limits _{j=1}^{L}\left[ {u^p}^T{A_j^{*}}^{p}\tilde{v}^p\left(D M\eta _{p}^2({A_j^{*}}^p)+D\eta _{p}^2({A_j^{*}}^p)-5D\eta _{p}({A_j^{*}}^p)+3D+1\right) -\left(1+{b_j^{*}}^p\right) \right. \right. \nonumber \\&\left. \left. +\sum \limits _{h=1,h\ne p}^{M}\left[ \left(M\eta _h({A_j^{*}}^h)\eta _{p}({A_j^{*}}^p)-\eta _h({A_j^{*}}^h)-2\eta _{p}({A_j^{*}}^p)\right) D g^h\left({A_j^{*}}^h\right) \right] \right] \right\} \end{aligned}$$
(23)

Here, U(i), W(i), \(U(j)^{*}\), and \(W(j)^{*}\) are used to simplify Eqs. (21) and (22), namely,

$$\begin{aligned} U(i)=\gamma M\eta _{p}^2(A_i^p)-2\gamma \eta _{p}(A_i^p)+\gamma +1 \end{aligned}$$
(24)
$$\begin{aligned} U(j)^{*}=D M\eta _{p}^2({A_j^{*}}^p)-2D \eta _{p}({A_j^{*}}^p)+D+1 \end{aligned}$$
(25)
$$\begin{aligned}&W(i)=\left(\gamma M\eta _{p}^2(A_i^p)-2\gamma \eta _{p}(A_i^p)+\gamma +1\right) v_0^p-(1+b_i^p)\varphi _{i}+\nonumber \\&\gamma \sum \limits _{q=1,q\ne p}^{M}\Big [g^q(A_i^q)\left(M\eta _q(A_i^q)\eta _{p}(A_i^p)-\eta _q(A_i^q)-\eta _{p}(A_i^p)\right) \Big ] \end{aligned}$$
(26)
$$\begin{aligned}&W(j)^{*}=\left(D M\eta _{p}^2({A_j^{*}}^p)-2D\eta _{p}({A_j^{*}})+D+1\right) v_0^p-(1+{b_j^{*}}^p)+\nonumber \\&D\sum \limits _{h=1,h\ne p}^{M}\Big [g^h\left({A_j^{*}}^h\right) \left(M\eta _h\left({A_j^{*}}^h\right) \eta _{p}\left({A_j^{*}}^p\right) -\eta _h\left({A_j^{*}}^h\right) -\eta _{p}\left({A_j^{*}}^p\right) \right) \Big ] \end{aligned}$$
(27)
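As an illustration of how the closed-form updates are used, the following sketch assembles the update of Eq. (22) for \(\tilde{v}^p\) from the scalar weights U(i), W(i), \(U(j)^{*}\), and \(W(j)^{*}\) of Eqs. (24)–(27). The function signature and the NumPy representation (one matrix per pattern, the scalar weights passed as arrays, and S2 standing for the regularization matrix \(S_2^p\)) are assumptions made only for this sketch.

```python
import numpy as np

def update_v_tilde(u, A_list, U, W, A_star_list, U_star, W_star, S2, C, E):
    """Closed-form update of v~^p (Eq. 22) with u^p and the gating weights fixed."""
    d = A_list[0].shape[1]                      # column dimension of the p-th view
    lhs = C * np.asarray(S2, dtype=float)       # C * S_2^p
    rhs = np.zeros(d)
    for A_i, U_i, W_i in zip(A_list, U, W):     # labeled patterns
        z = A_i.T @ u                           # (u^T A_i^p)^T
        lhs += U_i * np.outer(z, z)
        rhs += W_i * z
    for A_j, U_j, W_j in zip(A_star_list, U_star, W_star):   # Universum patterns
        z = A_j.T @ u
        lhs += E * U_j * np.outer(z, z)
        rhs += E * W_j * z
    return -np.linalg.solve(lhs, rhs)           # minus the inverse times the sum
```

The update of Eq. (21) for \(u^p\) follows the same pattern with the roles of \(u^p\) and \(\tilde{v}^p\) exchanged.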

Then we let

$$\begin{aligned} {D^p(k)}=\left(\begin{array}{cc} \varphi _{1}^Tu^{p^T}(k)A_1^p\tilde{v}^p(k)+\varphi _{1}v_0^{p^T}(k)\\ \varphi _{2}^Tu^{p^T}(k)A_2^p\tilde{v}^p(k)+\varphi _{2}v_0^{p^T}(k)\\ \ldots \\ \varphi _{N}^Tu^{p^T}(k)A_N^p\tilde{v}^p(k)+\varphi _{N}v_0^{p^T}(k)\\ \end{array} \right) \end{aligned}$$
(28)

and

$$\begin{aligned} {{D^{*}}^p(k)}=\left(\begin{array}{cc} u^{p^T}(k){A_1^{*}}^p\tilde{v}^p(k)+v_0^{p^T}(k)\\ u^{p^T}(k){A_2^{*}}^p\tilde{v}^p(k)+v_0^{p^T}(k)\\ \ldots \\ u^{p^T}(k){A_L^{*}}^p\tilde{v}^p(k)+v_0^{p^T}(k)\\ \end{array} \right) \end{aligned}$$
(29)

Then, at the k-th iteration, the error vectors for the p-th matrix view, i.e., \(e^{p}(k)\) and \({e^{*}}^{p}(k)\), can be computed by Eqs. (30) and (31).

$$\begin{aligned} e^{p}(k)=D^p(k)-I-b^{p}(k) \end{aligned}$$
(30)
$$\begin{aligned} {e^{*}}^{p}(k)={D^{*}}^p(k)-I^{*}-{b^{*}}^p(k) \end{aligned}$$
(31)

The margins \(b^p\) and \({b^{*}}^p\) are updated as follows.

$$\begin{aligned} b^{p}(k+1)=b^{p}(k)+\rho (e^{p}(k)+|e^{p}(k)|) \end{aligned}$$
(32)
$$\begin{aligned} {b^{*}}^{p}(k+1)={b^{*}}^{p}(k)+\rho ^{*}({e^{*}}^{p}(k)+|{e^{*}}^{p}(k)|) \end{aligned}$$
(33)

where \(b^{p}(1)\ge 0_{N\times 1}\) and \({b^{*}}^{p}(1)\ge 0_{L\times 1}\). The learning rates satisfy \(0<\rho <1\) and \(0<\rho ^{*}<1\). \({D^{*}}^p(k)\), \({b^{*}}^{p}(k)\), \(D^p(k)\), \(b^{p}(k)\), \(u^{p}(k)\), \(\tilde{v}^p(k)\), and \(v_0^{p}(k)\) represent the values of \({D^{*}}^p\), \({b^{*}}^{p}\), \(D^p\), \(b^{p}\), \(u^{p}\), \(\tilde{v}^p\), and \(v_0^{p}\) at the k-th iteration, respectively; the corresponding quantities for the Universum patterns carry the same meaning.
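For concreteness, a small sketch of the error and margin updates of Eqs. (30)–(33) is given below; `D_p` and `D_star_p` stand for the decision-value vectors of Eqs. (28)–(29) at the current iteration, and all names are illustrative rather than taken from the paper.

```python
import numpy as np

def update_margins(D_p, b_p, D_star_p, b_star_p, rho=0.5, rho_star=0.5):
    """One margin update for the p-th view (Eqs. 30-33)."""
    ones = np.ones_like(b_p)                    # the all-ones vector I_{N x 1}
    ones_star = np.ones_like(b_star_p)          # the all-ones vector I*_{L x 1}
    e_p = D_p - ones - b_p                      # Eq. (30)
    e_star_p = D_star_p - ones_star - b_star_p  # Eq. (31)
    # e + |e| = 2 * max(e, 0), so only positive errors enlarge the margins.
    b_p_next = b_p + rho * (e_p + np.abs(e_p))                           # Eq. (32)
    b_star_p_next = b_star_p + rho_star * (e_star_p + np.abs(e_star_p))  # Eq. (33)
    return b_p_next, b_star_p_next
```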

Second, we fix each \(u^p\), \(\tilde{v}^p\), \(v_0^{p}\), \(b^p\), and \({b^{*}}^{p}\), and the gradients of Eq. (5) with respect to \(h_{q1}\), \(h_{q2}\), and \(h_{q0}\) are given below. Here \(p=1,2,\ldots ,M\) and \(q=1,2,\ldots ,M\).

$$\begin{aligned}&\frac{\partial {L}}{\partial {h_{qs}}}\nonumber \\&=2\gamma \sum \limits _{i=1}^{N}\sum \limits _{p=1}^{M}\left\{ \left[ -\sum \limits _{h=1}^{M}\frac{\partial {\eta _h\left(A_{i}^{h}\right) }}{\partial {h_{qs}}}g^h(A_i^h)\right] \left[ g^p(A_{i}^{p})-\sum \limits _{h=1}^{M}\eta _h(A_i^h)g^h(A_{i}^{h})\right] \right\} \nonumber \\&+2D\sum \limits _{j=1}^{L}\sum \limits _{r=1}^{M}\left\{ \left[ -\sum \limits _{k=1}^{M}\frac{\partial {\eta _k\left({A_j^{*}}^{k}\right) }}{\partial {h_{qs}}}g^k\left({A_j^{*}}^k\right) \right] \left[ g^r\left({A_j^{*}}^{r}\right) -\sum \limits _{k=1}^{M}\eta _k\left({A_j^{*}}^k\right) g^k\left({A_j^{*}}^{k}\right) \right] \right\} \end{aligned}$$
(34)

where \(s=0,1,2\). To simplify this equation, we give some other equations below.

$$\begin{aligned}&\frac{\partial {\eta _h(A_{i}^{h})}}{\partial {h_{q1}}}=E_1A_{i}^qh_{q2} \quad \quad (h \ne q)\end{aligned}$$
(35)
$$\begin{aligned}&\frac{\partial {\eta _q(A_{i}^{q})}}{\partial {h_{q1}}}=E_2A_i^qh_{q2}\end{aligned}$$
(36)
$$\begin{aligned}&\frac{\partial {\eta _h(A_{i}^{h})}}{\partial {h_{q2}}}=E_1A_{i}^{q^T}h_{q1} \quad \quad (h \ne q)\end{aligned}$$
(37)
$$\begin{aligned}&\frac{\partial {\eta _q(A_{i}^{q})}}{\partial {h_{q2}}}=E_2A_i^{q^T}h_{q1}\end{aligned}$$
(38)
$$\begin{aligned}&\frac{\partial {\eta _k({A_j^{*}}^{k})}}{\partial {h_{q1}}}=E_1^{*}{A_j^{*}}^qh_{q2} \quad \quad (k \ne q)\end{aligned}$$
(39)
$$\begin{aligned}&\frac{\partial {\eta _q({A_j^{*}}^{q})}}{\partial {h_{q1}}}=E_2^{*}{A_j^{*}}^qh_{q2}\end{aligned}$$
(40)
$$\begin{aligned}&\frac{\partial {\eta _k({A_j^{*}}^{k})}}{\partial {h_{q2}}}=E_1^{*}{A_j^{*}}^{q^T}h_{q1} \quad \quad (k \ne q)\end{aligned}$$
(41)
$$\begin{aligned}&\frac{\partial {\eta _q({A_j^{*}}^{q})}}{\partial {h_{q2}}}=E_2^{*}{A_j^{*}}^{q^T}h_{q1} \end{aligned}$$
(42)

and

$$\begin{aligned}&\frac{\partial {\eta _h(A_{i}^{h})}}{\partial {h_{q0}}}=E_1=\frac{-e^{\left(h_{h1}^TA_i^hh_{h2}+h_{h0}\right) }e^{\left(h_{q1}^TA_i^qh_{q2}+h_{q0}\right) }}{\left[ \sum \nolimits _{t=1}^{M}e^{\left(h_{t1}^TA_i^th_{t2}+h_{t0}\right) }\right] ^2} \quad \quad (h \ne q)\end{aligned}$$
(43)
$$\begin{aligned}&\frac{\partial {\eta _q(A_{i}^{q})}}{\partial {h_{q0}}}=E_2=\frac{e^{\left(h_{q1}^TA_i^qh_{q2}+h_{q0}\right) }}{\sum \nolimits _{t=1}^{M}e^{\left(h_{t1}^TA_i^th_{t2}+h_{t0}\right) }}-\frac{e^{2\left(h_{q1}^TA_i^qh_{q2}+h_{q0}\right) }}{\left[ \sum \nolimits _{t=1}^{M}e^{\left(h_{t1}^TA_i^th_{t2}+h_{t0}\right) }\right] ^2}\end{aligned}$$
(44)
$$\begin{aligned}&\frac{\partial {\eta _k\left({A_j^{*}}^{k}\right) }}{\partial {h_{q0}}}=E_1^{*}=\frac{-e^{\left(h_{k1}^T{A_j^{*}}^kh_{k2}+h_{k0}\right) }e^{\left(h_{q1}^T{A_j^{*}}^qh_{q2}+h_{q0}\right) }}{\left[ \sum \nolimits _{t=1}^{M}e^{\left(h_{t1}^T{A_j^{*}}^th_{t2}+h_{t0}\right) }\right] ^2} \quad \quad (k \ne q)\end{aligned}$$
(45)
$$\begin{aligned}&\frac{\partial {\eta _q\left({A_j^{*}}^{q}\right) }}{\partial {h_{q0}}}=E_2^{*}=\frac{e^{\left(h_{q1}^T{A_j^{*}}^qh_{q2}+h_{q0}\right) }}{\sum \nolimits _{t=1}^{M}e^{\left(h_{t1}^T{A_j^{*}}^th_{t2}+h_{t0}\right) }}-\frac{e^{2\left(h_{q1}^T{A_j^{*}}^qh_{q2}+h_{q0}\right) }}{\left[ \sum \nolimits _{t=1}^{M}e^{\left(h_{t1}^T{A_j^{*}}^th_{t2}+h_{t0}\right) }\right] ^2} \end{aligned}$$
(46)
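Equations (43)–(46) are exactly the derivatives of a softmax over the bilinear view scores \(h_{t1}^TA_i^th_{t2}+h_{t0}\), so the gating weights \(\eta _t\) can be computed as in the sketch below. The array layout (one matrix and one pair of weight vectors per view, 0-based indexing) is an assumption made only for this illustration.

```python
import numpy as np

def gating_weights(A_views, h1, h2, h0):
    """eta_t(A_i^t) = exp(h_t1^T A_i^t h_t2 + h_t0) / sum_s exp(h_s1^T A_i^s h_s2 + h_s0)."""
    scores = np.array([h1[t] @ A_views[t] @ h2[t] + h0[t] for t in range(len(A_views))])
    exp_scores = np.exp(scores - scores.max())   # shift by the max for numerical stability
    return exp_scores / exp_scores.sum()

def gating_grad_h0(eta, q):
    """Derivatives of every eta_h with respect to h_{q0}: E_1 for h != q, E_2 for h == q."""
    grad = -eta * eta[q]                 # E_1 (and E_1^*) = -eta_h * eta_q   (h != q)
    grad[q] = eta[q] * (1.0 - eta[q])    # E_2 (and E_2^*) = eta_q - eta_q^2
    return grad
```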

Then, we need to update the parameters of \(\eta _q(A_i^q)\) after the k-th iteration, i.e.,

$$\begin{aligned} h_{qs}^{(k+1)}=h_{qs}^{(k)}-\mu \frac{\partial {L}}{\partial {h_{qs}^{(k)}}} \end{aligned}$$
(47)

Here, \(\mu\) is the attenuation coefficient, which is a constant within each iteration, and \(h_{qs}^{(k)}\) represents \(h_{qs}\) at the k-th iteration. To compute \(\frac{\partial {L}}{\partial {h_{qs}^{(k)}}}\), we replace \(h_{qs}\) in Eqs. (34)–(46) with \(h_{qs}^{(k)}\). Finally, once \({u^p}\), \(\tilde{v}^p\), \(v_0^{p}\), \({b^p}\), \({{b^{*}}^p}\), \(h_{q1}\), \(h_{q2}\), and \(h_{q0}\) have been updated, we can compute Eq. (5). In practice, the termination criterion is given in the following equation.

$$\begin{aligned} \frac{\parallel L(k+1)-L(k)\parallel _2}{\parallel L(k)\parallel _2}\le \xi \end{aligned}$$
(48)

where L(k) represents the value of Eq. (5) at the k-th iteration and \(\parallel {.}\parallel _2\) denotes the 2-norm.

If Eq. (48) is satisfied, we can stop the procedure and get the optimal parameters.
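Putting the second step together, the sketch below applies the gradient update of Eq. (47) to every gating parameter and then evaluates the stopping rule of Eq. (48). The dictionary layout keyed by (q, s) with s in {0, 1, 2}, and all names, are illustrative assumptions rather than the paper's implementation.

```python
def gating_gradient_step(h, grad_h, mu=0.01):
    """h_{qs} <- h_{qs} - mu * dL/dh_{qs} for every view q and every s (Eq. 47)."""
    return {key: h[key] - mu * grad_h[key] for key in h}

def converged(L_prev, L_cur, xi=1e-4):
    """Relative-change stopping rule of Eq. (48)."""
    return abs(L_cur - L_prev) / abs(L_prev) <= xi
```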


Cite this article

Zhu, C. Double-fold localized multiple matrix learning machine with Universum. Pattern Anal Applic 20, 1091–1118 (2017). https://doi.org/10.1007/s10044-016-0548-9
