Optimal Learning Rates for Kernel Partial Least Squares

  • Published in: Journal of Fourier Analysis and Applications

Abstract

We study two learning algorithms generated by kernel partial least squares (KPLS) and kernel minimal residual (KMR) methods. In these algorithms, regularization against overfitting is obtained by early stopping, which makes stopping rules crucial to their learning capabilities. We propose a stopping rule for determining the number of iterations based on cross-validation, without assuming a priori knowledge of the underlying probability measure, and show that optimal learning rates can be achieved. Our novel analysis consists of a nice bound for the number of iterations in a priori knowledge-based stopping rule for KMR and a stepping stone from KMR to KPLS. Technical tools include a recently developed integral operator approach based on a second order decomposition of inverse operators and an orthogonal polynomial argument.
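Early stopping as regularization can be pictured with a simpler iterative scheme than KPLS or KMR. The sketch below is an illustration only, not the paper's algorithm: it runs kernel gradient descent on a training set and picks the stopping iteration that minimizes the error on a hold-out set, in the spirit of the cross-validation stopping rule studied here. The Gaussian kernel, step size, and synthetic data are all assumptions.

```python
import numpy as np

# Illustrative sketch (not the paper's KPLS/KMR iteration): kernel gradient
# descent with the number of iterations chosen on a hold-out set.
rng = np.random.default_rng(1)
x = rng.uniform(-1, 1, 60)
y = np.sin(np.pi * x) + 0.3 * rng.standard_normal(60)
x_tr, y_tr, x_va, y_va = x[:40], y[:40], x[40:], y[40:]

K_tr = np.exp(-(x_tr[:, None] - x_tr[None, :]) ** 2 / 0.1)   # assumed kernel
K_va = np.exp(-(x_va[:, None] - x_tr[None, :]) ** 2 / 0.1)

alpha = np.zeros(40)
eta = 1.0 / np.linalg.norm(K_tr, 2)        # step size below 1 / ||K||
errs = []
for m in range(200):
    alpha += eta * (y_tr - K_tr @ alpha)   # gradient-type step on the residual
    errs.append(np.mean((K_va @ alpha - y_va) ** 2))

m_star = int(np.argmin(errs))              # validation-chosen stopping time
```

Running more iterations keeps shrinking the training residual, but the hold-out error typically turns upward once the iterate starts fitting noise; `m_star` is the data-driven stopping time.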


References

  1. Bauer, F., Pereverzev, S., Rosasco, L.: On regularization algorithms in learning theory. J. Complex. 23, 52–72 (2007)

  2. Blanchard, G., Krämer, N.: Kernel partial least squares is universally consistent. In: Proceedings of the 13th International Conference on Artificial Intelligence and Statistics, JMLR Workshop & Conference Proceedings, vol. 9, pp. 57–64 (2010)

  3. Blanchard, G., Krämer, N.: Optimal learning rates for kernel conjugate gradient regression. In: Advances in Neural Information Processing Systems (NIPS), pp. 226–234 (2010)

  4. Caponnetto, A., De Vito, E.: Optimal rates for the regularized least-squares algorithm. Found. Comput. Math. 7, 331–368 (2007)

  5. Caponnetto, A., Yao, Y.: Cross-validation based adaptation for regularization operators in learning theory. Anal. Appl. 8, 161–183 (2010)

  6. Chun, H., Keles, S.: Sparse partial least squares regression for simultaneous dimension reduction and variable selection. J. R. Stat. Soc. Ser. B 72, 3–25 (2010)

  7. De Vito, E., Pereverzyev, S., Rosasco, L.: Adaptive kernel methods using the balancing principle. Found. Comput. Math. 10, 455–479 (2010)

  8. Engl, H.W., Hanke, M., Neubauer, A.: Regularization of Inverse Problems. Kluwer Academic, Dordrecht (2000)

  9. Evgeniou, T., Pontil, M., Poggio, T.: Regularization networks and support vector machines. Adv. Comput. Math. 13, 1–50 (2000)

  10. Guo, X., Zhou, D.X.: An empirical feature-based learning algorithm producing sparse approximations. Appl. Comput. Harmon. Anal. 32, 389–400 (2012)

  11. Guo, Z.C., Lin, S.B., Zhou, D.X.: Optimal learning rates for spectral algorithms. Inverse Problems (under revision) (2016)

  12. Hanke, M.: Conjugate Gradient Type Methods for Ill-Posed Problems. Pitman Research Notes in Mathematics Series, vol. 327 (1995)

  13. Hu, T., Fan, J., Wu, Q., Zhou, D.X.: Regularization schemes for minimum error entropy principle. Anal. Appl. 13, 437–455 (2015)

  14. Li, S., Liao, C., Kwok, J.: Gene feature extraction using T-test statistics and kernel partial least squares. In: Neural Information Processing, pp. 11–20. Springer, Berlin (2006)

  15. Lin, S.B., Guo, X., Zhou, D.X.: Distributed learning with regularization schemes. J. Mach. Learn. Res. (under revision) (2016). arXiv:1608.03339

  16. Lo Gerfo, L., Rosasco, L., Odone, F., De Vito, E., Verri, A.: Spectral algorithms for supervised learning. Neural Comput. 20, 1873–1897 (2008)

  17. Raskutti, G., Wainwright, M., Yu, B.: Early stopping and non-parametric regression: an optimal data-dependent stopping rule. J. Mach. Learn. Res. 15, 335–366 (2014)

  18. Rosipal, R., Trejo, L.: Kernel partial least squares regression in reproducing kernel Hilbert spaces. J. Mach. Learn. Res. 2, 97–123 (2001)

  19. Smale, S., Zhou, D.X.: Learning theory estimates via integral operators and their approximations. Constr. Approx. 26, 153–172 (2007)

  20. Wold, H.: Path models with latent variables: the NIPALS approach. In: Quantitative Sociology: International Perspectives on Mathematical and Statistical Model Building, pp. 307–357. Academic Press, New York (1975)

  21. Wu, Q., Ying, Y.M., Zhou, D.X.: Learning rates of least-square regularized regression. Found. Comput. Math. 6, 171–192 (2006)

  22. Yao, Y., Rosasco, L., Caponnetto, A.: On early stopping in gradient descent learning. Constr. Approx. 26, 289–315 (2007)


Acknowledgements

The work described in this paper is partially supported by the NSFC/RGC Joint Research Scheme [RGC Project No. N_CityU120/14 and NSFC Project No. 11461161006] and by the National Natural Science Foundation of China [Grant Nos. 61502342 and 11471292]. The paper was written while the second author was visiting Shanghai Jiao Tong University, a visit jointly sponsored by the Ministry of Education of China; the hospitality is gratefully acknowledged.

Author information

Correspondence to Shao-Bo Lin.

Additional information

Communicated by Massimo Fornasier.

Appendix

This appendix provides five technical lemmas and the proofs of two propositions concerning the a priori knowledge-based learning algorithms. The first lemma, concerning the range of the operator \(L_{K,D}\), is well known in the literature [1, 3, 10, 16].

Lemma 1

Denote \({\mathscr {H}}_{K, \mathbf{x}} = \hbox {span}\{K(\cdot , x_i)\}_{i=1}^N\) for \(D \in {\mathscr {Z}}^N\). Then \(f_{K,D} \in {\mathscr {H}}_{K, \mathbf{x}}\). The space \({\mathscr {H}}_{K, \mathbf{x}}\) equals the range of \(L_{K,D}\) and is spanned by all eigenfunctions of \(L_{K,D}\) with positive eigenvalues. Its dimension equals the rank \(d_\mathbf{x}\) of the Gramian matrix \({\mathbb K}\).
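The last claim of Lemma 1 can be checked numerically: for a strictly positive definite kernel and distinct sample points, the Gram matrix \({\mathbb K}\) has full rank, so \(d_\mathbf{x}=N\). A minimal sketch, assuming a Gaussian kernel and a handful of fixed, distinct points:

```python
import numpy as np

# Illustration of the last claim of Lemma 1: for a strictly positive
# definite kernel (a Gaussian kernel is assumed here) and distinct points,
# the Gram matrix has full rank, so dim H_{K,x} = d_x = N.
x = np.array([-0.9, -0.3, 0.4, 1.0])               # distinct sample points
gram = np.exp(-(x[:, None] - x[None, :]) ** 2)     # entries K(x_i, x_j)
d_x = np.linalg.matrix_rank(gram)
```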

The following two lemmas found in [12] describe some properties of \(p_m^{[u]}\) and \(q_{m-1}^{[u]}\).

Lemma 2

Let \(m\in \mathbb {N}_0\). The following identities hold

$$\begin{aligned} \big (p^{[1]}_m\big )'(0)-\big (p^{[1]}_{m+1}\big )'(0)= & {} \frac{\big [p^{[1]}_m, p^{[1]}_m\big ]_{[0]} -\big [p^{[1]}_{m+1}, p^{[1]}_{m+1}\big ]_{[0]}}{[p^{[2]}_m, p^{[2]}_m]_{[1]}}, \end{aligned}$$
(36)
$$\begin{aligned} \big (p^{[1]}_{m+1}\big )'(0)-\big (p^{[0]}_{m+1}\big )'(0)= & {} \frac{\big [p^{[1]}_{m+1}, p^{[1]}_{m+1}\big ]_{[0]}}{\big [p^{[2]}_{m},p^{[2]}_{m}\big ]_{[1]}}, \end{aligned}$$
(37)
$$\begin{aligned} p_m^{[2]}(t)= & {} \frac{p_{m+1}^{[1]}(t)-p_{m+1}^{[0]}(t)}{t\big [\big (p^{[1] }_{m+1}\big )'(0)-\big (p^{[0]}_{m+1}\big )'(0)\big ]},\qquad \forall \ 0<t\le \kappa ^2, \nonumber \\ \end{aligned}$$
(38)

where \((p^{[1]}_{m+1})'(0)\ne (p^{[0]}_{m+1})'(0)\).

The above three identities are stated in Corollary 2.6, Corollary 2.9, and Proposition 2.8 of [12].

Lemma 3

Let \(u\in \{0,1,2\}\), \(m\in \mathbb {N}\), and let \(\{t^{[u]}_{k,m}\}_{k=1}^m\) be the simple zeros of \(p^{[u]}_m\) in increasing order. Then the following statements hold

$$\begin{aligned} 0<t^{[u]}_{k,m}< & {} t^{[u]}_{k,m-1}<t^{[u]}_{k+1,m}, \qquad \text{ for }\ m\ge 2, \end{aligned}$$
(39)
$$\begin{aligned} t^{[0]}_{k,m}< & {} t^{[1]}_{k,m}<t^{[2]}_{k,m}, \end{aligned}$$
(40)
$$\begin{aligned} q_{m-1}^{[u]}(0)= & {} -(p^{[u]}_m)'(0)=\sum _{k=1}^m\big (t^{[u]}_{k,m}\big )^{-1}=\max _{0\le t\le t^{[u]}_{1,m}} q^{[u]}_{m-1}(t), \end{aligned}$$
(41)
$$\begin{aligned} q^{[u]}_{m-1}(0)\le & {} q^{[u]}_m(0)\le \big (t^{[u]}_{1,m+1}\big )^{-1}+q^{[u]}_{m-1}(0). \end{aligned}$$
(42)

The first two statements above are stated in Corollary 2.7 of [12], while the last two follow from the first statement and the representation of \(p_m^{[u]}\) in terms of its constant term 1 and zeros as

$$\begin{aligned} p_m^{[u]}(t)= \prod _{k=1}^m\left( 1-t/t^{[u]}_{k,m}\right) , \qquad m\in \mathbb {N}. \end{aligned}$$
(43)
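The representation (43) makes the identity \(q^{[u]}_{m-1}(0)=-(p^{[u]}_m)'(0)=\sum _{k}(t^{[u]}_{k,m})^{-1}\) from (41) easy to verify numerically; the zeros below are hypothetical values, not computed from any kernel:

```python
import numpy as np

# Numerical check of -(p_m)'(0) = sum_k 1/t_k, as implied by the product
# representation (43); the zeros are illustrative values only.
zeros = np.array([0.5, 1.2, 3.0])          # hypothetical simple zeros t_k
p = np.polynomial.Polynomial([1.0])        # p(t) = prod_k (1 - t / t_k)
for tk in zeros:
    p = p * np.polynomial.Polynomial([1.0, -1.0 / tk])
deriv_at_zero = p.deriv()(0.0)             # p'(0)
inverse_zero_sum = np.sum(1.0 / zeros)     # sum_k 1 / t_k
```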

The fourth lemma focuses on bounding \(\mathscr {Q}_{D,\lambda }, \mathscr {P}_{D,\lambda }\) and \(\mathscr {R}_D\).

Lemma 4

Let D be a sample drawn independently according to \(\rho \) and \(0< \delta <1\). Then each of the following estimates holds with confidence at least \(1-\delta \),

$$\begin{aligned} \mathscr {Q}_{D,\lambda }\le & {} \frac{2\sqrt{2}(\kappa ^2 +\kappa ) \mathscr {A}_{D,\lambda }\log \frac{2}{\delta }}{\sqrt{\lambda }} +\sqrt{2}, \end{aligned}$$
(44)
$$\begin{aligned} \mathscr {P}_{D,\lambda }\le & {} 2(\kappa ^2 +\kappa ) \mathscr {A}_{D,\lambda } \log \bigl (2/\delta \bigr ), \end{aligned}$$
(45)
$$\begin{aligned} \mathscr {R}_D\le & {} \frac{2\kappa ^2}{\sqrt{|D|}} \log \frac{2}{\delta }. \end{aligned}$$
(46)

The proofs of (45) and (46) can be found in [4, 22], and the proof of (44) in [11].

The last lemma refers to a concentration inequality stated in [5, Proposition 11].

Lemma 5

Let \(\{\xi _i\}_{i=1}^n\) be a sequence of real valued independent random variables with mean \(\mu \), satisfying \(|\xi _i|\le B\) and \( E[(\xi _i-\mu )^2]\le \tau ^2\) for \(i\in \{1,2,\dots ,n\}\). Then for any \(a>0\) and \(\varepsilon >0\), there hold

$$\begin{aligned} \mathbf P\left[ \frac{1}{n}\sum _{i=1}^n\xi _i-\mu \ge a\tau ^2+\varepsilon \right] \le e^{-\frac{6na\varepsilon }{3+4aB}}, \end{aligned}$$

and

$$\begin{aligned} \mathbf P\left[ \mu -\frac{1}{n}\sum _{i=1}^n\xi _i\ge a\tau ^2+\varepsilon \right] \le e^{-\frac{6na\varepsilon }{3+4aB}}. \end{aligned}$$
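Lemma 5 can be illustrated by Monte Carlo: for bounded i.i.d. variables, the empirical frequency of the deviation event should not exceed the stated exponential bound. A minimal sketch, assuming Bernoulli(1/2) variables and arbitrary illustrative choices of \(n\), \(a\), and \(\varepsilon \):

```python
import numpy as np

# Monte Carlo illustration of the one-sided inequality in Lemma 5 for
# Bernoulli(1/2) variables: B = 1, mu = 1/2, tau^2 = 1/4 (assumed example).
rng = np.random.default_rng(0)
n, a, B, eps, trials = 100, 1.0, 1.0, 0.1, 20000
mu, tau2 = 0.5, 0.25
samples = rng.integers(0, 2, size=(trials, n)).astype(float)
deviations = samples.mean(axis=1) - mu
empirical = np.mean(deviations >= a * tau2 + eps)       # tail frequency
bound = np.exp(-6 * n * a * eps / (3 + 4 * a * B))      # Lemma 5 bound
```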

With the help of the above lemmas, we can now prove Propositions 1 and 2.

Proof of Proposition 1

We start by proving the first statement (24). Since \(p^{[1]}_{0}(t)=1\) and \((p^{[1]}_{0})'(t)=0\) for all \(t\in [0,\kappa ^2]\), (24) obviously holds for \(\hat{m}=1\). It then suffices to prove (24) for \(\hat{m}\ge 2\). It was shown in [12, p. 41] (see also [3, p. 16]) that

$$\begin{aligned} \Vert L_{K,D}f^{[1]}_{D,\hat{m}-1}-f_{K,D}\Vert _K \le \Vert F_{t^{[1]}_{1,\hat{m}-1}} \phi ^{[1]}_{\hat{m}-1}(L_{K,D})f_{K,D}\Vert . \end{aligned}$$

Here \(F_{t^{[1]}_{1,\hat{m}-1}} \phi ^{[1]}_{\hat{m}-1}(L_{K,D})\) is the linear operator on \({\mathscr {H}}_K\) defined in terms of the orthonormal basis \(\{\phi _j^\mathbf{x}\}_j\) and the orthogonal projection \(F_{t^{[1]}_{1,\hat{m}-1}}\) by spectral calculus as

$$\begin{aligned} F_{t^{[1]}_{1,\hat{m}-1}} \phi ^{[1]}_{\hat{m}-1}(L_{K,D}) \left( \sum _jb_j\phi _j^\mathbf{x}\right) = \sum _{\sigma _j^\mathbf{x}<t^{[1]}_{1,\hat{m}-1}} \phi ^{[1]}_{\hat{m}-1}(\sigma _j^\mathbf{x})b_j\phi _j^\mathbf{x}, \end{aligned}$$

where \(\phi ^{[1]}_{\hat{m}-1}(t)\) is the function defined on \([0, t^{[1]}_{1, \hat{m}-1})\) by

$$\begin{aligned} \phi ^{[1]}_{\hat{m}-1}(t)=p^{[1]}_{\hat{m}-1}(t)\left( \frac{t^{[1]}_{1, \hat{m}-1}}{t^{[1]}_{1, \hat{m}-1}-t}\right) ^{1/2}, \qquad 0\le t < t^{[1]}_{1, \hat{m}-1}. \end{aligned}$$
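In finite dimensions, this spectral-calculus construction amounts to applying the function only on the eigenspaces whose eigenvalue lies below the cutoff. A minimal numpy sketch, where the symmetric matrix standing in for \(L_{K,D}\), the function, and the cutoff are all illustrative assumptions:

```python
import numpy as np

# Sketch of the spectral-calculus operator F_{t1} phi(L): apply phi only on
# eigenspaces of a symmetric PSD matrix L with eigenvalue below cutoff t1.
def spectral_cutoff_apply(L, phi, t1):
    w, V = np.linalg.eigh(L)              # eigenpairs (sigma_j, phi_j)
    g = np.where(w < t1, phi(w), 0.0)     # keep only sigma_j < t1
    return V @ np.diag(g) @ V.T

L = np.diag([0.1, 0.5, 2.0])              # assumed stand-in for L_{K,D}
out = spectral_cutoff_apply(L, lambda t: 1.0 - t, t1=1.0)
```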

Then we decompose \(f_{K,D}\) as \(f_{K,D} - L_{K,D }f_\rho + L_{K,D }f_\rho \) and bound the norm as

$$\begin{aligned}&\Vert L_{K,D}f^{[1]}_{D,\hat{m}-1}-f_{K,D}\Vert _K \le \Vert F_{t^{[1]}_{1,{\hat{m}-1}}} \phi ^{[1]}_{{\hat{m}-1}}(L_{K,D })(f_{K,D}-L_{K,D }f_\rho )\Vert _K \nonumber \\&\qquad + \Vert F_{t^{[1]}_{1,{\hat{m}-1}}}\phi ^{[1]}_{{\hat{m}-1}}(L_{K,D })L_{K,D }f_\rho \Vert _K =: I + II. \end{aligned}$$
(47)

We continue our estimates by bounding the first term I. Applying (26) with \(\alpha =1/2\) gives

$$\begin{aligned} I= & {} \Vert F_{t^{[1]}_{1,{\hat{m}-1}}}\phi ^{[1]}_{{\hat{m}-1}}(L_{K,D})(L_{K,D}+\lambda I)^{1/2} (L_{K,D}+\lambda I)^{-1/2}(L_K+\lambda I)^{1/2}\\&(L_K+\lambda I)^{-1/2}(f_{K,D}-L_{K,D}f_\rho )\Vert _K\\\le & {} \Vert F_{t^{[1]}_{1,{\hat{m}-1}}}\phi ^{[1]}_{{\hat{m}-1}}(L_{K,D})(L_{K,D}+\lambda I)^{1/2}\Vert \mathscr {Q}_{D,\lambda }\mathscr {P}_{D,\lambda }\\\le & {} \left( \sup _{t\in [0,t^{[1]}_{1,{\hat{m}-1}})}t^{1/2}|\phi ^{[1]}_{{\hat{m}-1}}(t)| +\lambda ^{1/2}\sup _{t\in [0,t^{[1]}_{1,{\hat{m}-1}})}|\phi ^{[1]}_{{\hat{m}-1}}(t)|\right) \mathscr {Q}_{D,\lambda }\mathscr {P}_{D,\lambda }. \end{aligned}$$

Furthermore, the representation (43) for \(p^{[1]}_{\hat{m}-1}\) and the definition of \(\phi ^{[1]}_{\hat{m}-1}\) yield

$$\begin{aligned} |\phi ^{[1]}_{{\hat{m}-1}}(t)|=\left| (1-t/t^{[1]}_{1,{\hat{m}-1}})^{1/2}\prod _{k=2}^{{\hat{m}-1}} (1-t/t^{[1]}_{k,{\hat{m}-1}})\right| \le 1, \qquad \forall \ 0\le t < t^{[1]}_{1,{\hat{m}-1}}. \end{aligned}$$

It was shown in [12, Equation (3.10)] that for an arbitrary \(\nu >0\),

$$\begin{aligned} \sup _{t\in [0,t^{[1]}_{1,{\hat{m}-1}})}t^\nu (\phi ^{[1]}_{{\hat{m}-1}}(t))^2\le \nu ^\nu |(p^{[1]}_{{\hat{m}-1}})' (0)|^{-\nu }. \end{aligned}$$
(48)

Combining the above three bounds yields an estimate for the first term of (47) as

$$\begin{aligned} I \le (|(p^{[1]}_{{\hat{m}-1}})'(0)|^{-1/2}+\lambda ^{1/2})\mathscr {Q}_{D,\lambda }\mathscr {P}_{D,\lambda }. \end{aligned}$$
(49)

We now turn to the second term II of (47). By the regularity condition (7) for \(f_\rho = L_K^r h_\rho \) and the identity \(\Vert L_K^{1/2} h_\rho \Vert _K = \Vert h_\rho \Vert _\rho \), we find

$$\begin{aligned} II \le \Vert F_{t^{[1]}_{1,{\hat{m}-1}}}\phi ^{[1]}_{{\hat{m}-1}}(L_{K,D })L_{K,D } L_K^{r-1/2}\Vert \Vert h_\rho \Vert _\rho \le \widetilde{II} \Vert h_\rho \Vert _\rho , \end{aligned}$$
(50)

where for simplicity we denote the norm as

$$\begin{aligned} \widetilde{II} := \Vert F_{t^{[1]}_{1,{\hat{m}-1}}}\phi ^{[1]}_{{\hat{m}-1}}(L_{K, D})L_{K, D} (L_K + \lambda I)^{r-1/2}\Vert . \end{aligned}$$

When \(1/2\le r\le 3/2\), we express \((L_K + \lambda I)^{r-1/2}\) as \((L_{K, D} + \lambda I)^{r-1/2} (L_{K, D} + \lambda I)^{1/2 -r}(L_K + \lambda I)^{r-1/2}\) and apply (26) with \(\alpha =r-1/2\) to get

$$\begin{aligned} \widetilde{II} \le \Vert F_{t^{[1]}_{1,{\hat{m}-1}}}\phi ^{[1]}_{{\hat{m}-1}}(L_{K, D})L_{K, D} (L_{K, D} + \lambda I)^{r-1/2}\Vert \mathscr {Q}_{D,\lambda }^{2r-1}. \end{aligned}$$
(51)

When \(r>3/2\), we decompose the operator \((L_K + \lambda I)^{r-1/2}\) in \(\widetilde{II}\) as

$$\begin{aligned} (L_{K,D }+\lambda I)^{r-1/2} + \left\{ (L_K + \lambda I)^{r-1/2} - (L_{K,D }+\lambda I)^{r-1/2}\right\} . \end{aligned}$$

The bounds \(\Vert L_{K,D}\Vert \le \kappa ^2\), \(\Vert L_K\Vert \le \kappa ^2\), and the Lipschitz property of the function \(x\mapsto x^{r-1/2}\) imply

$$\begin{aligned} \Vert L_{K,D}^{r-1/2}-L_K^{r-1/2} \Vert \le (r-1/2)\kappa ^{2r-3}\Vert L_{K,D}-L_K\Vert . \end{aligned}$$
(52)
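The scalar inequality behind (52), namely \(|x^{s}-y^{s}|\le s\,\kappa ^{2(s-1)}|x-y|\) for \(0\le x,y\le \kappa ^2\) with \(s=r-1/2\ge 1\), follows from the mean value theorem and can be sanity-checked on a grid; the values of \(\kappa \) and \(s\) below are illustrative assumptions:

```python
import numpy as np

# Grid check of |x**s - y**s| <= s * kappa**(2*(s-1)) * |x - y| on
# [0, kappa**2]^2, the scalar inequality underlying (52).
kappa, s = 1.5, 1.2                      # s plays the role of r - 1/2 >= 1
ts = np.linspace(0.0, kappa ** 2, 200)
X, Y = np.meshgrid(ts, ts)
lhs = np.abs(X ** s - Y ** s)
rhs = s * kappa ** (2 * (s - 1)) * np.abs(X - Y)
```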

Hence

$$\begin{aligned} \widetilde{II}\le & {} \Vert F_{t^{[1]}_{1,{\hat{m}-1}}}\phi ^{[1]}_{{\hat{m}-1}}(L_{K, D})L_{K, D} (L_{K, D} + \lambda I)^{r-1/2}\Vert \\&+ \Vert F_{t^{[1]}_{1,{\hat{m}-1}}}\phi ^{[1]}_{\hat{m}-1}(L_{K,D }) L_{K,D }\Vert (r-1/2)\kappa ^{2r-3} \mathscr {R}_{D}. \end{aligned}$$

Combining this with (51) and the following norm estimate with \(\gamma , \beta \ge 0\),

$$\begin{aligned}&\Vert F_{t^{[1]}_{1,{\hat{m}-1}}}\phi ^{[1]}_{{\hat{m}-1}}(L_{K, D})L_{K, D}^{\gamma } (L_{K, D} + \lambda I)^{\beta }\Vert = \sup _{t\in [0,t^{[1]}_{1,{\hat{m}-1}})} \left\{ t^\gamma (t + \lambda )^{\beta } \left| \phi ^{[1]}_{{\hat{m}-1}}(t)\right| \right\} \\&\qquad \le 2^{\beta } \max \left\{ (2\gamma + 2\beta )^{\gamma + \beta } |(p^{[1]}_{{\hat{m}-1}})' (0)|^{-(\gamma + \beta )}, \lambda ^\beta (2\gamma )^{\gamma } |(p^{[1]}_{{\hat{m}-1}})' (0)|^{-\gamma }\right\} \end{aligned}$$

derived from spectral calculus and the inequality (48), we have with the notation \({\mathscr {I}} = \lambda |(p^{[1]}_{{\hat{m}-1}})' (0)|\),

$$\begin{aligned} \widetilde{II} \le \left\{ \begin{array}{ll} \left( 2^{r - \frac{1}{2}} (2 r + 1)^{r + \frac{1}{2}} {\mathscr {I}}^{-(r + \frac{1}{2})} + 2^{r + \frac{1}{2}} {\mathscr {I}}^{-1}\right) \lambda ^{r + \frac{1}{2}} \mathscr {Q}_{D,\lambda }^{2r-1}, &{} \hbox {when} \ \frac{1}{2} \le r \le \frac{3}{2}, \\ 2^{r - \frac{1}{2}} (2 r + 1)^{r + \frac{1}{2}} {\mathscr {I}}^{-(r + \frac{1}{2})} \lambda ^{r + \frac{1}{2}} &{} \\ + {\mathscr {I}}^{-1} \left( 2^{r + \frac{1}{2}} \lambda ^{r + \frac{1}{2}}+ 2(r-1/2)\kappa ^{2r-3}\lambda \mathscr {R}_{D}\right) , &{} \hbox {when} \ r > \frac{3}{2}. \end{array}\right. \end{aligned}$$

This together with (50), the bound (49) for I, (47), and the definition (16) of the quantity \(\Lambda _{\rho , \lambda , r}\) tells us that

$$\begin{aligned} \Vert L_{K,D}f^{[1]}_{D,\hat{m}-1}-f_{K,D}\Vert _K \le \left( \frac{1}{3}{\mathscr {I}}^{-\frac{1}{2}} +\frac{1}{3} + \frac{1}{6} {\mathscr {I}}^{-(r + \frac{1}{2})} + \frac{1}{3} {\mathscr {I}}^{-1}\right) \lambda ^{\frac{1}{2}} \Lambda _{\rho , \lambda , r}.\nonumber \\ \end{aligned}$$
(53)

On the other hand, \(\hat{m}\ge 2\) is the smallest nonnegative integer satisfying (16), so for the smaller integer \(\hat{m}-1\), we must have

$$\begin{aligned} \Vert L_{K,D}f^{[1]}_{D,\hat{m}-1}-f_{K,D}\Vert _K > \lambda ^{\frac{1}{2}} \Lambda _{\rho , \lambda , r}. \end{aligned}$$

This together with (53) implies

$$\begin{aligned} \lambda ^{\frac{1}{2}} \Lambda _{\rho , \lambda , r} \le \left( \frac{1}{3}{\mathscr {I}}^{-\frac{1}{2}} +\frac{1}{3} + \frac{1}{6} {\mathscr {I}}^{-(r + \frac{1}{2})} + \frac{1}{3} {\mathscr {I}}^{-1}\right) \lambda ^{\frac{1}{2}} \Lambda _{\rho , \lambda , r} \end{aligned}$$

and thereby

$$\begin{aligned} {\mathscr {I}}^{-\frac{1}{2}} + \frac{1}{2} {\mathscr {I}}^{-(r + \frac{1}{2})} + {\mathscr {I}}^{-1} \ge 2. \end{aligned}$$

Since the three terms on the left-hand side sum to at least 2, at least one of them is at least \(\frac{2}{3}\), and each of the three cases forces \({\mathscr {I}} \le \frac{9}{4}<3\). Consequently, \(|(p_{\hat{m}-1}^{[1]})'(0)| = \frac{{\mathscr {I}}}{\lambda } <\frac{3}{\lambda }\). This proves the first statement (24).
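For completeness, the three cases behind the bound \({\mathscr {I}}\le \frac{9}{4}\) can be written out:

```latex
% Case analysis: each case assumes the corresponding term is at least 2/3.
\begin{align*}
{\mathscr I}^{-1/2} \ge \tfrac{2}{3}
  &\ \Longrightarrow\ {\mathscr I} \le \tfrac{9}{4},\\
\tfrac{1}{2}\,{\mathscr I}^{-(r+\frac{1}{2})} \ge \tfrac{2}{3}
  &\ \Longrightarrow\ {\mathscr I} \le \Bigl(\tfrac{3}{4}\Bigr)^{\frac{1}{r+1/2}} < 1,\\
{\mathscr I}^{-1} \ge \tfrac{2}{3}
  &\ \Longrightarrow\ {\mathscr I} \le \tfrac{3}{2}.
\end{align*}
```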

To prove the second statement, we first claim that for \(v\in \{1, 2\}\),

$$\begin{aligned} \Vert F_\varepsilon [p_{\hat{m}-1}^{[v]}( L_{K,D})f_{K,D}]\Vert _K \le \Vert F_\varepsilon [f_{K,D}]\Vert _K. \end{aligned}$$
(54)

This claim is obviously true for \(\hat{m} =1\) with equality valid since in this case \(p_{\hat{m}-1}^{[v]} \equiv 1\) and \(p_{\hat{m}-1}^{[v]}(L_{K,D})\) is the identity operator.

Consider the case \(\hat{m}\ge 2\). Since \(\varepsilon =\lambda /3\), we have from (24), (41) and (40) that

$$\begin{aligned} \varepsilon =\frac{\lambda }{3}\le |(p_{\hat{m}-1}^{[1]})'(0)|^{-1} =\left[ \sum _{k=1}^{\hat{m}-1}(t_{k,\hat{m}-1}^{[1]})^{-1}\right] ^{-1}\le t_{1,\hat{m}-1}^{[1]}<t_{1,\hat{m}-1}^{[2]}. \end{aligned}$$
(55)

It follows from (43) that for \(v\in \{1, 2\}\), there holds

$$\begin{aligned} \max _{0\le t\le \varepsilon }p_{\hat{m}-1}^{[v]}(t) \le \max _{0\le t\le t^{[v]}_{1,\hat{m}-1}}p_{\hat{m}-1}^{[v]}(t) = \max _{0\le t\le t^{[v]}_{1,\hat{m}-1}} \Pi _{k=1}^{\hat{m}-1} \left( 1- t/t^{[v]}_{k,\hat{m}-1}\right) \le 1. \end{aligned}$$

Recall the eigenpairs \(\{(\sigma _i^\mathbf{x},\phi _i^\mathbf{x})\}_{i}\) of \(L_{K,D}\). Expressing \(f_{K,D}=\sum _jc_j\phi _j^\mathbf{x}\) implies

$$\begin{aligned} \Vert F_\varepsilon [p_{\hat{m}-1}^{[v]}( L_{K,D})f_{K,D}]\Vert _K= & {} \left\| F_\varepsilon \left[ \sum _{j }p_{\hat{m}-1}^{[v]}( \sigma _j^\mathbf{x}) c_j\phi _j^\mathbf{x} \right] \right\| _K\\= & {} \left\| \sum _{j:\sigma _j^\mathbf{x}<\varepsilon } p_{\hat{m}-1}^{[v]}( \sigma _j^\mathbf{x})c_j\phi _j^\mathbf{x} \right\| _K = \sqrt{ \sum _{j:\sigma _j^\mathbf{x}<\varepsilon } [ p_{\hat{m}-1}^{[v]}( \sigma _j^\mathbf{x})c_j]^2}\\\le & {} \sqrt{ \sum _{j:\sigma _j^\mathbf{x}<\varepsilon } c_j^2}= \Vert F_\varepsilon [f_{K,D}]\Vert _K. \end{aligned}$$

So the claim (54) also holds in the case \(\hat{m}\ge 2\), which completes the proof of the claim.

To prove the second statement of the proposition, we estimate the norm \(\Vert F_\varepsilon [f_{K,D}]\Vert _K\). Under the condition (7),

$$\begin{aligned}&\Vert F_\varepsilon [f_{K,D}]\Vert _K \le \Vert F_\varepsilon [f_{K,D}-L_{K,D}f_\rho ]\Vert _K + \Vert F_\varepsilon [L_{K,D}f_\rho ]\Vert _K\nonumber \\&\quad \le \Vert F_\varepsilon [(L_{K,D}+\lambda I)^{1/2}] \Vert \Vert (L_{K,D}+\lambda I)^{-1/2}(L_{K}+\lambda I)^{1/2}\Vert \nonumber \\&\quad \quad \times \Vert (L_{K}+\lambda I)^{-1/2}(f_{K,D}-L_{K,D}f_\rho )\Vert _K + \Vert F_\varepsilon L_{K,D}L_K^{r-1/2}\Vert \Vert L_{K}^{1/2}h_\rho \Vert _K \nonumber \\&\quad \le (\varepsilon +\lambda )^{1/2}\mathscr {Q}_{D,\lambda }\mathscr {P}_{D,\lambda }+ \Vert F_\varepsilon L_{K,D}L_K^{r-1/2}\Vert \Vert h_\rho \Vert _\rho , \end{aligned}$$
(56)

where the operators \(F_\varepsilon (L_{K,D}+\lambda I)^{1/2} \) and \(F_\varepsilon L_{K,D}L_K^{r-1/2}\) are defined by spectral calculus.

When \(1/2\le r\le 3/2\), we have

$$\begin{aligned} \Vert F_\varepsilon L_{K,D}L_K^{r-1/2}\Vert \le \Vert F_\varepsilon L_{K,D} (L_{K,D}+\lambda I)^{r-1/2}\Vert \mathscr {Q}_{D,\lambda }^{2r-1}\le \varepsilon (\lambda +\varepsilon )^{r-1/2}\mathscr {Q}_{D,\lambda }^{2r-1}. \end{aligned}$$

When \(r>3/2\), it follows from (52) that

$$\begin{aligned} \Vert F_\varepsilon L_{K,D}L_K^{r-1/2}\Vert\le & {} \Vert F_\varepsilon L_{K,D}L_{K,D}^{r-1/2}\Vert +\Vert F_\varepsilon L_{K,D}(L_K^{r-1/2}-L_{K,D}^{r-1/2})\Vert \\\le & {} \varepsilon ^{r+1/2}+(r-1/2)\kappa ^{2r-3}\varepsilon \mathscr {R}_{D}. \end{aligned}$$

Combining the above bounds for \(\Vert F_\varepsilon L_{K,D}L_K^{r-1/2}\Vert \) with (56) and noticing the choice \(\varepsilon = \lambda /3\) and the definition (17) of the quantity \(\Lambda _{\rho , \lambda , r}\), we find

$$\begin{aligned} \Vert F_\varepsilon [p_{\hat{m}-1}^{[v]}(L_{K,D}) f_{K,D}]\Vert _K \le \frac{1}{2} \lambda ^{1/2} \Lambda _{\rho , \lambda , r}. \end{aligned}$$

But since \(\hat{m}\) is the smallest nonnegative integer satisfying (16), the integer \(\hat{m}-1\) does not satisfy (16). Hence (22) implies

$$\begin{aligned} \lambda ^{1/2} \Lambda _{\rho , \lambda , r} \le \Vert L_{K,D}f_{D,\hat{m}-1}^{[1]}- f_{K,D}\Vert _K = [p_{\hat{m}-1}^{[1]}, p_{\hat{m}-1}^{[1]}]_{[0]}^{1/2}. \end{aligned}$$

Then the desired statement of the proposition is verified. The proof of Proposition 1 is complete. \(\square \)

Proof of Proposition 2

We first prove (30). Since \(|(p^{[1]}_{0})'(0)|=0\), (30) obviously holds for \(\hat{m}=0\). We then consider the case \(\hat{m}\ge 1\). By (41),

$$\begin{aligned} |(p_{\hat{m}}^{[1] })'(0)|=-(p_{\hat{m}}^{[1] })'(0) =(p_{\hat{m}-1}^{[1] })'(0)-(p_{\hat{m}}^{[1] })'(0)+|(p_{\hat{m}-1}^{[1] })'(0)|. \end{aligned}$$
(57)

From (36), we have

$$\begin{aligned} (p_{\hat{m}-1}^{[1] })'(0)-(p_{\hat{m}}^{[1] })'(0) =\frac{[p_{\hat{m}-1}^{[1]},p_{\hat{m}-1}^{[1]}]_{[0]} -[p_{\hat{m}}^{[1]},p_{\hat{m}}^{[1]}]_{[0]}}{[p^{[2]}_{\hat{m}-1},p^{[2] }_{\hat{m}-1}]_{[1]}}. \end{aligned}$$

Therefore,

$$\begin{aligned} (p_{\hat{m}-1}^{[1] })'(0)-(p_{\hat{m}}^{[1] })'(0)\le \frac{[p_{\hat{m}-1}^{[1]}, p_{\hat{m}-1}^{[1]}]_{[0]}}{[p^{[2]}_{\hat{m}-1},p^{[2]}_{\hat{m}-1}]_{[1]}}. \end{aligned}$$
(58)

Then, it follows from (55), (22) and (5) that

$$\begin{aligned}{}[p_{\hat{m}-1}^{[1]}, p_{\hat{m}-1}^{[1]}]_0^{1/2}= & {} \Vert p^{[1]}_{\hat{m}-1}(L_{K,D })f_{K,D}\Vert _K = \Vert L_{K,D }f_{D,\hat{m}-1}^{[1]}-f_{K,D}\Vert _K \\\le & {} \Vert L_{K,D }f_{D,\hat{m}-1}^{[2]}-f_{K,D}\Vert _K = \Vert p^{[2]}_{\hat{m}-1}(L_{K,D })f_{K,D}\Vert _K\\\le & {} \Vert F_\varepsilon [p^{[2]}_{\hat{m}-1}(L_{K,D })f_{K,D}]\Vert _K +\Vert F^\bot _\varepsilon [p^{[2]}_{\hat{m}-1}(L_{K,D })f_{K,D}]\Vert _K. \end{aligned}$$

But Proposition 1 with \(v=2\) gives

$$\begin{aligned} \Vert F_\varepsilon [p_{\hat{m}-1}^{[2]}( L_{K,D })f_{K,D}]\Vert _K \le \frac{1}{2} [p_{\hat{m}-1}^{[1]}, p_{\hat{m}-1}^{[1]}]_{[0]}^{1/2}. \end{aligned}$$

Hence

$$\begin{aligned} \, [p_{\hat{m}-1}^{[1]}, p_{\hat{m}-1}^{[1]}]_{[0]}^{1/2}\le & {} \frac{1}{2} [p_{\hat{m}-1}^{[1]}, p_{\hat{m}-1}^{[1]}]_{[0]}^{1/2}+\varepsilon ^{-1/2} \Vert p^{[2]}_{\hat{m}-1}(L_{K,D})L_{K,D}^{1/2}f_{K,D}\Vert _K\\= & {} \frac{1}{2} [p_{\hat{m}-1}^{[1]}, p_{\hat{m}-1}^{[1]}]_{[0]}^{1/2} +\varepsilon ^{-1/2}[p^{[2]}_{\hat{m}-1},p^{[2]}_{\hat{m}-1}]_{[1]}^{1/2}. \end{aligned}$$

Therefore,

$$\begin{aligned} \, [p_{\hat{m}-1}^{[1]}, p_{\hat{m}-1}^{[1]}]_{[0]}^{1/2} \le 2\varepsilon ^{-1/2}[p^{[2]}_{\hat{m}-1} ,p^{[2]}_{\hat{m}-1} ]_{[1]}^{1/2}, \end{aligned}$$

which together with (57), (58) and Proposition 1 yields

$$\begin{aligned} |(p_{\hat{m}}^{[1]})'(0)|\le 3\lambda ^{-1}+12\lambda ^{-1}=15\lambda ^{-1}. \end{aligned}$$

This proves the first statement (30) of Proposition 2.

To prove the second statement, we denote \(\varepsilon _0=\lambda /15\) and

$$\begin{aligned} f^{[1]*}_{D,\hat{m}} = \left\{ \begin{array}{ll} q^{[1]}_{\hat{m}-1}(L_{K,D})L_{K,D}f_\rho , &{} \hbox {if} \ \hat{m} \ge 1, \\ 0, &{} \hbox {if} \ \hat{m} =0. \end{array}\right. \end{aligned}$$
(59)

We can decompose \(\Vert f^{[1]}_{D,\hat{m}}-f_\rho \Vert _\rho \) as

$$\begin{aligned}&\Vert f^{[1]}_{D,\hat{m}}-f_\rho \Vert _\rho = \Vert L_K^{1/2}(f^{[1]}_{D,\hat{m}}-f_\rho )\Vert _K \le \Vert (L_K+\lambda I)^{1/2}(f^{[1]}_{D,\hat{m}}-f_\rho )\Vert _K \nonumber \\&\quad \le \mathscr {Q}_{D,\lambda }\Vert F_{\varepsilon _0}[(L_{K,D}+\lambda I)^{1/2}(f^{[1]}_{D,\hat{m}}-f^{[1]*}_{D,\hat{m}})]\Vert _K + \mathscr {Q}_{D,\lambda }\Vert F_{\varepsilon _0}[(L_{K,D}\nonumber \\&\quad +\lambda I)^{1/2}(f^{[1]*}_{D,\hat{m}}-f_\rho )]\Vert _K\nonumber \\&\quad \quad + \mathscr {Q}_{D,\lambda }\Vert F_{\varepsilon _0}^{\bot }[(L_{K,D}+\lambda I)^{1/2}(f^{[1]}_{D,\hat{m}}-f_\rho )]\Vert _K\nonumber \\&\quad =: \mathscr {Q}_{D,\lambda }(A_1+A_2+A_3). \end{aligned}$$
(60)

Due to (41) and (30), we have

$$\begin{aligned} \varepsilon _0=\lambda /15\le |(p^{[1] }_{\hat{m}})'(0)|^{-1} \le \left[ \sum _{k=1}^{\hat{m}}(t_{k,\hat{m}})^{-1}\right] ^{-1} \le t^{[1]}_{1,\hat{m}}. \end{aligned}$$
(61)

Note that \(A_1 =0\) and \(f^{[1]*}_{D,\hat{m}}-f_\rho = -f_\rho =- p^{[1]}_{\hat{m}}(L_{K,D}) f_\rho \) when \(\hat{m}=0\). If \(\hat{m} \ge 1\), we use (20), (59), (26), and the definitions of \(\mathscr {P}_{D,\lambda }\) and \(\mathscr {Q}_{D,\lambda }\) to bound \(A_1\) as

$$\begin{aligned}&A_1 = \Vert F_{\varepsilon _0}[(L_{K,D}+\lambda I)^{1/2}q_{\hat{m}-1}^{[1]}(L_{K,D})(f_{K,D}-L_{K,D}f_\rho )]\Vert _K\\&\quad \le \Vert F_{\varepsilon _0} [(L_{K,D}+\lambda I)^{1/2}q^{[1]}_{\hat{m}-1}(L_{K,D})(L_K+\lambda I)^{1/2}]\Vert \Vert (L_K+\lambda I)^{-1/2} (f_{K,D}\\&\qquad -L_{K,D }f_\rho ) \Vert _K\\&\quad \le \mathscr {Q}_{D,\lambda }\mathscr {P}_{D,\lambda }\max _{0\le t < {\varepsilon _0}}|(t+\lambda )q^{[1]}_{\hat{m}-1}(t)|. \end{aligned}$$

By (43), for \(0\le t < {\varepsilon _0} \le t^{[1]}_{1,\hat{m}}\), we have

$$\begin{aligned} |tq^{[1]}_{\hat{m}-1}(t)|=|1-p^{[1]}_{\hat{m}}(t)|\le 1. \end{aligned}$$

Furthermore, (43), (41), (61) and (30) imply

$$\begin{aligned} \max _{0\le t < {\varepsilon _0}}|q^{[1]}_{\hat{m}-1}(t)|\le q^{[1]}_{\hat{m}-1}(0) =|(p^{[1] }_{\hat{m}})'(0)| \le 15\lambda ^{-1}. \end{aligned}$$

Therefore, the first term in (60) can be bounded as

$$\begin{aligned} A_1\le 16\mathscr {Q}_{D,\lambda }\mathscr {P}_{D,\lambda }. \end{aligned}$$
(62)

We then bound the second term \(A_2\) in two cases involving r.

When \(1/2\le r\le 3/2\), we have \(r-1/2\le 1\), and the bound \(\sup _{0\le t < {\varepsilon _0}\le t^{[1]}_{1,\hat{m}}} |p^{[1]}_{\hat{m}}(t)| \le 1\) together with (26) and the regularization condition (7) yields

$$\begin{aligned} A_2\le & {} \Vert F_{\varepsilon _0}(L_{K,D}+\lambda I)^{1/2}p^{[1]}_{\hat{m}}(L_{K,D})L_K^{r-1/2}\Vert \Vert h_\rho \Vert _\rho \nonumber \\\le & {} \mathscr {Q}_{D,\lambda }^{2r-1}\Vert F_{\varepsilon _0}(L_{K,D}+\lambda I)^{1/2}(L_{K,D}+\lambda I)^{r-1/2}\Vert \Vert h_\rho \Vert _\rho \nonumber \\\le & {} \mathscr {Q}_{D,\lambda }^{2r-1}\Vert h_\rho \Vert _\rho ({\varepsilon _0}+\lambda )^r. \end{aligned}$$
(63)

When \(r>3/2\), we use (21), (59) and the regularization condition (7) to get

$$\begin{aligned} A_2&\le \Vert F_{\varepsilon _0}(L_{K,D}+\lambda I)^{1/2}p^{[1]}_{\hat{m}}(L_{K,D}) (L_{K,D}^{r-1/2}-L_K^{r-1/2})\Vert \Vert h_\rho \Vert _\rho \\&\quad + \Vert F_{\varepsilon _0}(L_{K,D}+\lambda I)^{1/2}p^{[1]}_{\hat{m}}(L_{K,D})L_{K,D}^{r-1/2}\Vert \Vert h_\rho \Vert _\rho . \end{aligned}$$

Since \(|p^{[1]}_{\hat{m}}(t)|\le 1\) for all \(0\le t < {\varepsilon _0}\le t^{[1]}_{1,\hat{m}}\), we get

$$\begin{aligned} \Vert F_{\varepsilon _0}(L_{K,D}+\lambda I)^{1/2}p^{[1]}_{\hat{m}}(L_{K,D})L_{K,D}^{r-1/2}\Vert \le \varepsilon _0^{r-1/2}({\varepsilon _0}+\lambda )^{1/2}. \end{aligned}$$

Combining these with (52) and the definition of \(\mathscr {R}_{D}\) yields

$$\begin{aligned} A_2 \le (\varepsilon _0^{r-1/2}+(r-1/2)\kappa ^{2r-3}\mathscr {R}_{D}) ({\varepsilon _0}+\lambda )^{1/2}\Vert h_\rho \Vert _\rho . \end{aligned}$$
(64)

Finally, we bound \(A_3\). From Lemma 1 and \(f_\rho \in \mathscr {H}_K\), we see that \(f_\rho \) is in the range of \(L_{K,D}\). Since \((a+b)^{1/2}\le a^{1/2}+b^{1/2}\) for \(a,b>0\), we then have

$$\begin{aligned} A_3\le & {} \Vert F^\bot _{\varepsilon _0} [L^{1/2}_{K,D}(f^{[1]}_{D,\hat{m}}-f_\rho )]\Vert _K + \lambda ^{1/2}\Vert F^\bot _{\varepsilon _0} (f^{[1]}_{D,\hat{m}}-f_\rho )\Vert _K\\\le & {} \left( \frac{({\varepsilon _0}+\lambda )^{1/2}}{{\varepsilon _0}^{1/2}}+ \lambda ^{1/2}\frac{({\varepsilon _0}+\lambda )^{1/2}}{{\varepsilon _0}}\right) \Vert F^\bot _{\varepsilon _0} (L_{K,D}+\lambda I)^{-1/2} L_{K,D}(f^{[1]}_{D,\hat{m}}-f_\rho )\Vert _K\\\le & {} \left( 1+\frac{\lambda ^{1/2}}{{\varepsilon _0}^{1/2}}\right) \left( 1+ \frac{\lambda }{{\varepsilon _0}}\right) ^{1/2} \Vert F^\bot _{\varepsilon _0} (L_{K,D}+\lambda I)^{-1/2} (L_{K,D}f^{[1]}_{D,\hat{m}}-f_{K,D})\Vert _K\\+ & {} \quad \left( 1+\frac{\lambda ^{1/2}}{{\varepsilon _0}^{1/2}}\right) \left( 1+ \frac{\lambda }{{\varepsilon _0}}\right) ^{1/2} \Vert F^\bot _{\varepsilon _0} (L_{K,D }+\lambda I)^{-1/2} (f_{K,D}-L_{K,D}f_\rho )\Vert _K\\\le & {} \left( 1+\frac{\lambda ^{1/2}}{{\varepsilon _0}^{1/2}}\right) {\varepsilon _0}^{-1/2} \Vert L_{K,D}f^{[1]}_{D,\hat{m}}-f_{K,D}\Vert _K + \sqrt{2}\mathscr {Q}_{D,\lambda }\mathscr {P}_{D,\lambda }\left( 1+ \frac{\lambda }{{\varepsilon _0}}\right) . \end{aligned}$$

But \(\hat{m}\) satisfies (16). It follows that

$$\begin{aligned} A_3\le \left( 1+\frac{\lambda ^{1/2}}{{\varepsilon _0}^{1/2}}\right) {\varepsilon _0}^{-1/2} \lambda ^{1/2}\Lambda _{\rho ,\lambda ,r}+\sqrt{2}\mathscr {Q}_{D,\lambda }\mathscr {P}_{D,\lambda }\left( 1+ \frac{\lambda }{{\varepsilon _0}}\right) . \end{aligned}$$
(65)

Inserting (62), (63), (64) and (65) into (60) and noticing \(\varepsilon _0=\lambda /15\), we obtain

$$\begin{aligned} \Vert f^{[1]}_{D,\hat{m}}-f_\rho \Vert _\rho \le 32\mathscr {Q}_{D,\lambda }\Lambda _{\rho ,\lambda ,r}. \end{aligned}$$

This verifies the second statement of the proposition. The proof of Proposition 2 is complete. \(\square \)


Cite this article

Lin, S.B., Zhou, D.X.: Optimal Learning Rates for Kernel Partial Least Squares. J. Fourier Anal. Appl. 24, 908–933 (2018). https://doi.org/10.1007/s00041-017-9544-8

Download citation

  • Received:

  • Revised:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s00041-017-9544-8

Keywords

Mathematics Subject Classification

Navigation