Abstract
We consider estimation of the slope function in functional linear regression, where a scalar response Y is modelled as depending on a random function X, and only Y together with a panel Z_1,…,Z_L of noisy measurements of X is observable. Assuming an i.i.d. sample of (Y,Z_1,…,Z_L) of size n, we propose an estimator of the slope based on a dimension reduction technique and additional thresholding. We derive, in terms of both the sample size n and the panel size L, a lower bound for a maximal weighted risk over a certain ellipsoid of slope functions and a certain class of covariance operators associated with the regressor X. The proposed estimator is shown to attain this lower bound up to a constant and is hence minimax-optimal. The results are illustrated for different configurations, covering in particular estimation of the slope as well as of its derivatives.
Notes
The limit “∞” is admitted, with \(\lim_{n\to\infty} a_{n}=\infty :\Leftrightarrow \forall\, K>0\,\exists\, n_{o}\in \mathbb {N}\,\forall\, n\geq n_{o}:a_{n}\geq K\).
References
Bereswill M (2009) Minimax-optimal estimation in functional linear model with noisy regressor. Master’s thesis, Ruprecht-Karls-Universität Heidelberg
Bosq D (2000) Linear processes in function spaces. Lecture notes in statistics, vol 149. Springer, Berlin
Cardot H, Johannes J (2010) Thresholding projection estimators in functional linear models. J Multivar Anal 101(2):395–408
Cardot H, Ferraty F, Sarda P (1999) Functional linear model. Stat Probab Lett 45(1):11–22
Cardot H, Ferraty F, Sarda P (2003) Spline estimators for the functional linear model. Stat Sin 13:571–591
Cardot H, Mas A, Sarda P (2007) CLT in functional linear regression models. Probab Theory Relat Fields 138(3–4):325–361
Crambes C, Kneip A, Sarda P (2009) Smoothing splines estimators for functional linear regression. Ann Stat 37(1):35–72
Dahlhaus R, Polonik W (2006) Nonparametric quasi-maximum likelihood estimation for Gaussian locally stationary processes. Ann Stat 34(6):2790–2824
Davidson KR, Szarek SJ (2001) Local operator theory, random matrices and Banach spaces. In: Johnson WB, Lindenstrauss J (eds) Handbook on the geometry of Banach spaces, vol 1. North-Holland/Elsevier, Amsterdam, pp 317–366
Efromovich S, Koltchinskii V (2001) On inverse problems with unknown operators. IEEE Trans Inf Theory 47(7):2876–2894
Engl HW, Hanke M, Neubauer A (2000) Regularization of inverse problems. Kluwer Academic, Dordrecht
Ferraty F, Vieu P (2006) Nonparametric functional data analysis: methods, theory, applications and implementations. Springer, London
Forni M, Reichlin L (1998) Let’s get real: a factor analytical approach to disaggregated business cycle dynamics. Rev Econ Stud 65:453–473
Frank I, Friedman J (1993) A statistical view of some chemometrics regression tools. Technometrics 35:109–148
Hall P, Horowitz JL (2007) Methodology and convergence rates for functional linear regression. Ann Stat 35(1):70–91
Heinz E (1951) Beiträge zur Störungstheorie der Spektralzerlegung. Math Ann 123:415–438
Hoffmann M, Reiß M (2008) Nonlinear estimation for linear inverse problems with error in the operator. Ann Stat 36(1):310–336
Johannes J, Schenk R (2013) On rate optimal local estimation in functional linear regression. Electron J Stat 7:191–216
Korostolev AP, Tsybakov AB (1993) Minimax theory for image reconstruction. Lecture notes in statistics, vol 82. Springer, Berlin
Marx BD, Eilers PH (1999) Generalized linear regression on sampled signals and curves: a p-spline approach. Technometrics 41:1–13
Meister A (2011) Asymptotic equivalence of functional linear regression and a white noise inverse problem. Ann Stat 39(3):1471–1495
Müller H-G, Stadtmüller U (2005) Generalized functional linear models. Ann Stat 33:774–805
Natterer F (1984) Error bounds for Tikhonov regularization in Hilbert scales. Appl Anal 18:29–37
Neubauer A (1988) When do Sobolev spaces form a Hilbert scale? Proc Am Math Soc 103(2):557–562
Preda C, Saporta G (2005) PLS regression on a stochastic process. Comput Stat Data Anal 48:149–158
Ramsay J, Silverman B (2005) Functional data analysis, 2nd edn. Springer, New York
Yao F, Müller H-G, Wang J-L (2005) Functional linear regression analysis for longitudinal data. Ann Stat 33(6):2873–2903
Acknowledgements
We are grateful to two referees and the Associate Editor for constructive criticism.
Support from the IAP Research Network P7/06 of the Belgian State (Belgian Science Policy) is gratefully acknowledged.
Appendix A: Proofs
We begin by defining and recalling the notation used in the proofs of this section. Given m≥1, \(\mathbb {H}_{m}\) denotes the subspace of \(\mathbb {H}\) spanned by the functions ψ_1,…,ψ_m. Π_m and \(\varPi_{m}^{\perp}\) denote the orthogonal projections onto \(\mathbb {H}_{m}\) and its orthogonal complement \(\mathbb {H}_{m}^{\perp}\), respectively. If K is an operator mapping \(\mathbb {H}\) to itself and we restrict Π_m KΠ_m to an operator from \(\mathbb {H}_{m}\) to itself, then it can be represented by a matrix \([K]_{{\underline {m}}}\) with generic entries \(\langle \psi _{j},K\psi _{l}\rangle_{\mathbb {H}}=:[K]_{j,l}\) for 1≤j,l≤m. The spectral norm of \([K]_{{\underline {m}}}\) is denoted by \(\lVert[K]_{{\underline {m}}}\rVert_{s}\) and the inverse matrix of \([K]_{{\underline {m}}}\) by \([K]_{{\underline {m}}}^{-1}\). We denote by \(\operatorname{\lVert\cdot\rVert}\) the Euclidean norm, by \([\mathop{\nabla}\nolimits _{\omega }]_{{\underline {m}}}\) the m-dimensional diagonal matrix with entries (ω_1,…,ω_m), and by \([{\rm Id}]_{{\underline {m}}}\) the m-dimensional identity matrix. Consider the Galerkin solution \(\beta ^{m}\in \mathbb {H}_{m}\) and \(h\in \mathbb {H}_{m}\); then the random variables \(\langle \beta - \beta ^{m},X\rangle_{\mathbb {H}}\) and \(\langle X,h\rangle_{\mathbb {H}}\) are jointly normally distributed and independent, because \(\mathbb{E}[\langle \beta - \beta ^{m},X\rangle_{\mathbb {H}}\langle X,h\rangle_{\mathbb {H}}]= \langle \varGamma (\beta - \beta ^{m}),h\rangle_{\mathbb {H}}=[h]_{{\underline {m}}}^{t} [\varGamma (\beta -\beta ^{m})]_{{\underline {m}}}=[h]_{{\underline {m}}}^{t} ([g]_{{\underline {m}}}- [\varGamma ]_{{\underline {m}}}[ \beta ^{m}]_{{\underline {m}}})=0\).
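The defining property of the Galerkin solution used in this computation, \([\varGamma (\beta -\beta ^{m})]_{{\underline {m}}}=0\), can be checked numerically in a finite-dimensional surrogate (a sketch only; the dimensions p and m, the matrix Γ and the vector β are hypothetical and randomly generated):

```python
import numpy as np

rng = np.random.default_rng(3)
p, m = 8, 3                                   # ambient and Galerkin dimensions (hypothetical)
A = rng.normal(size=(p, p))
Gamma = A @ A.T + np.eye(p)                   # a generic symmetric positive definite "covariance" Γ
beta = rng.normal(size=p)
g = Gamma @ beta                              # g = Γβ
beta_m = np.zeros(p)
beta_m[:m] = np.linalg.solve(Gamma[:m, :m], g[:m])   # Galerkin system: [Γ]_m [β^m]_m = [g]_m
# defining property: the first m coordinates of Γ(β − β^m) vanish (up to rounding)
print(np.max(np.abs((Gamma @ (beta - beta_m))[:m])))
```

The residual is zero up to floating-point error, mirroring the orthogonality \([h]_{{\underline {m}}}^{t}([g]_{{\underline {m}}}-[\varGamma ]_{{\underline {m}}}[\beta ^{m}]_{{\underline {m}}})=0\) for every \(h\in \mathbb {H}_{m}\).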
Recall that \(\widehat {X}_{1}^{(i)}:=\frac{2}{L}\sum_{\ell=1}^{L/2} Z_{\ell}^{(i)}\) and \(\widehat {X}_{2}^{(i)}:=\frac{2}{L}\sum_{\ell=L/2+1}^{L} Z_{\ell}^{(i)}\); then \([\widehat {X}_{1}^{(i)}]_{{\underline {m}}}\) and \([\widehat {X}_{2}^{(i)}]_{{\underline {m}}}\) are jointly normally distributed with marginal mean vector zero, marginal covariance matrix \([\varSigma]_{{\underline {m}}}:= [\varGamma ]_{{\underline {m}}}+2\varsigma ^{2}L^{-1} [{\rm Id}]_{{\underline {m}}}\), and cross-covariance matrix \([\varGamma ]_{{\underline {m}}}\). Moreover, it follows that \(U^{(i)}:= Y^{(i)} -\langle \beta ^{m},\widehat {X}_{2}^{(i)}\rangle_{\mathbb {H}}\) and \([\widehat {X}_{1}^{(i)}]_{{\underline {m}}}\) are independent and normally distributed with mean zero and, respectively, variance \(\rho^{2}_{m}:=\sigma ^{2}+ \langle \varGamma (\beta -\beta ^{m}),(\beta - \beta ^{m})\rangle_{\mathbb {H}}+ \frac{2\varsigma ^{2}}{L}\lVert \beta ^{m}\rVert_{\mathbb {H}}^{2}\) and covariance matrix \([\varSigma]_{{\underline {m}}}\). Note that \(\lVert[\varSigma]_{{\underline {m}}}^{-1}\rVert_{s}=(\lVert [\varGamma ]_{{\underline {m}}}^{-1}\rVert _{s}^{-1}+2\varsigma ^{2}L^{-1})^{-1}\) and \(\lVert[\varSigma]_{{\underline {m}}}^{1/2} [\varGamma ]_{{\underline {m}}}^{-1}[\varSigma]_{{\underline {m}}}^{1/2}\rVert_{s}=1+2\varsigma ^{2}L^{-1}\lVert [\varGamma ]_{{\underline {m}}}^{-1}\rVert_{s}\). Moreover, \([\widehat {\varGamma }]_{{\underline {m}}}=\frac {1}{n}\sum _{i=1}^{n} [\widehat {X}^{(i)}_{1}]_{{\underline {m}}}[\widehat {X}^{(i)}_{2}]_{{\underline {m}}}^{t}\) and \([\widehat {g}]_{{\underline {m}}}= \frac{1}{n}\sum_{i=1}^{n} Y^{(i)}[\widehat {X}^{(i)}_{1}]_{{\underline {m}}}\) satisfy \(\mathbb{E} [\widehat {\varGamma }]_{{\underline {m}}}= [\varGamma ]_{{\underline {m}}}\) and \(\mathbb{E}[\widehat {g}]_{{\underline {m}}}=[g]_{{\underline {m}}}\).
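The unbiasedness \(\mathbb{E} [\widehat {\varGamma }]_{{\underline {m}}}= [\varGamma ]_{{\underline {m}}}\), which rests on the independence of the measurement noise in the two panel halves, can be illustrated by simulation (a sketch only, using a diagonal surrogate for Γ and hypothetical values for n, L, m and ς):

```python
import numpy as np

rng = np.random.default_rng(0)
n, L, m, noise = 20000, 10, 5, 0.5          # sample size, panel size, dimension, ς (all hypothetical)
gamma = 1.0 / np.arange(1, m + 1) ** 2      # surrogate eigenvalues of Γ (diagonal case)

X = rng.normal(size=(n, m)) * np.sqrt(gamma)            # [X]_j ~ N(0, γ_j)
Z = X[:, None, :] + noise * rng.normal(size=(n, L, m))  # panel Z_ℓ = X + ς·(white noise)
X1 = Z[:, : L // 2].mean(axis=1)    # first-half average  \hat X_1
X2 = Z[:, L // 2 :].mean(axis=1)    # second-half average \hat X_2

# cross-product estimator: unbiased for [Γ]_m because the two noise averages are independent
Gamma_hat = (X1[:, :, None] * X2[:, None, :]).mean(axis=0)
print(np.max(np.abs(Gamma_hat - np.diag(gamma))))
```

The entry-wise deviation shrinks at the parametric rate n^{-1/2}, whereas using the same half twice would bias the diagonal upwards by 2ς²/L.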
Define the random matrix \([\varXi]_{{\underline {m}}}\) and the random vector \([W]_{{\underline {m}}}\) respectively by
where \(\mathbb{E}[\varXi]_{{\underline {m}}}= 0\) and \(\mathbb{E}[W]_{{\underline {m}}}=0\). Moreover, we define the events
Observe that \(\mho_{n,L} \subset\Omega_{n,L}\) for all L≥2 and \(n \geq2 \lVert [\varGamma ]_{m}^{-1}\rVert_{s}\). Indeed, on the event \(\mho _{n,L}\), i.e., \(\lVert[\varXi]_{{\underline {m}}}\rVert_{s}\lVert[\varSigma]_{{\underline {m}}}^{1/2}[\varGamma ]_{{\underline {m}}}^{-1}[\varSigma]_{{\underline {m}}}^{1/2}\rVert_{s}\leq1/2\), the identity
implies by the usual Neumann series argument that
Moreover, we have
Thereby, if \(n \geqslant2\lVert [\varGamma ]_{{\underline {m}}}^{-1}\rVert_{s}\), then \(\mho_{n,L} \subset\Omega_{n,L}\) for all L≥2. These results will be used below without further reference. The technical Lemmas A.2–A.6, which are used in the following proofs, are gathered at the end of this section. Furthermore, we denote by C universal numerical constants and by C(⋅) constants depending only on their arguments. In both cases, the values of the constants may change from line to line.
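The "usual Neumann series argument" invoked on the event \(\mho_{n,L}\) is the following standard perturbation bound, stated here generically as a sketch (with A playing the role of \([\varGamma ]_{{\underline {m}}}\) and E that of the perturbation \([\widehat {\varGamma }]_{{\underline {m}}}-[\varGamma ]_{{\underline {m}}}\)); it is not the paper's exact display:

```latex
% If A is invertible and \lVert A^{-1}E\rVert \le 1/2, then A+E is invertible with
\[
  (A+E)^{-1} \;=\; \sum_{k\ge 0}\bigl(-A^{-1}E\bigr)^{k}A^{-1},
  \qquad
  \bigl\lVert (A+E)^{-1}\bigr\rVert
  \;\le\; \frac{\lVert A^{-1}\rVert}{1-\lVert A^{-1}E\rVert}
  \;\le\; 2\,\lVert A^{-1}\rVert .
\]
```

On \(\mho_{n,L}\) the condition \(\lVert A^{-1}E\rVert\leq 1/2\) holds by definition of the event, which yields the invertibility of \([\widehat {\varGamma }]_{{\underline {m}}}\) together with a spectral-norm bound on its inverse.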
A.1 Proof of the consistency result
Proof of Proposition 3.1
We use the identity
and obtain
where \(\lVert \beta ^{m}-\beta \rVert_{\omega }^{2}=o(1)\) as m→∞ due to condition (3.1); we bound the remaining two terms on the right-hand side separately. Consider the last right-hand side term. From (3.2) it follows, on the one hand, that \(n \geqslant2 \lVert [\varGamma ]_{{\underline {m}}}^{-1}\rVert_{s}\) for all sufficiently large n (and hence \(\Omega_{n,L}^{c}\subset\mho_{n,L}^{c}\)) and, on the other hand, by employing (A.24) in Lemma A.6, that \(\lVert \beta \rVert_{\omega }^{2}P(\Omega_{n,L}^{c})=o(1)\) as n→∞ for all \(\beta \in \mathcal {F}_{\omega }\). Regarding the first right-hand side term in (A.2), we have
By using , the identity (A.1) and
it follows that
By employing Lemma A.6 and (A.21)–(A.23) we conclude that
Keeping in mind condition (3.2), we deduce from (A.24) in Lemma A.6 that \(n m^{2}(P(\mho _{n,L}^{c}))^{1/2}=O(1)\), which in turn implies the assertion and completes the proof. □
Proof of Corollary 3.2
First, we prove that \(\varGamma \in \mathcal {G}_{\gamma }^{d}\) implies (3.1). On the one hand, we have \(\lVert\varPi _{m}^{\perp} \beta \rVert_{\omega }=o(1)\) as m→∞ by Lebesgue’s dominated convergence theorem. On the other hand, from the identity \([\varPi_{m} \beta - \beta ^{m}]_{{\underline {m}}} = -[\varGamma ]_{{\underline {m}}}^{-1}[\varGamma \varPi_{m}^{\perp} \beta ]_{{\underline {m}}}\) we conclude \(\lVert\varPi_{m} \beta - \beta ^{m}\rVert_{\omega }^{2} \leq2(1+d^{2})\lVert\varPi_{m}^{\perp} \beta \rVert_{\omega }^{2}\) for all \(\varGamma \in \mathcal {G}_{\gamma }^{d}\). By combining the two results, we obtain the assertion. It remains to show that (3.2) can be substituted by (3.4). If \(\varGamma \in \mathcal {G}_{\gamma }^{d}\) then
due to (A.12) and (A.13) in Lemma A.2. Taking into account these bounds the condition (3.4) implies (3.2), which proves the result. □
A.2 Proof of the lower bound
Proof of Theorem 3.3
Consider i.i.d. standard normally distributed random variables {U_j}_{j≥1} and ϵ. Let \(X:=\sum_{j\geq1}\gamma _{j}^{1/2}\,U_{j}\,\psi _{j}\), which is a centred Gaussian random function whose associated covariance operator Γ belongs to \(\mathcal {G}_{\gamma }^{d}\) and has eigenfunctions given by the basis {ψ_j}_{j≥1}. Then \([X]_{j}=\langle X,\psi _{j}\rangle_{\mathbb {H}}\), j≥1, are independent and normally distributed random variables with mean zero and variance γ_j. Let \(Z_{\ell}=X+\varsigma \,\dot{B}_{\ell}\), ℓ=1,…,L, be a panel of noisy observations of X, where \(\dot{B}_{1},\dotsc, \dot{B}_{L}\) are independent Gaussian white noises, i.e., \([\dot{B}_{\ell}]_{j}=\langle \dot{B}_{\ell},\psi_{j}\rangle _{\mathbb {H}}\), j≥1, ℓ=1,…,L, are i.i.d. standard normally distributed random variables, which are independent of ϵ and X. Consequently, \(\frac{1}{L}\sum_{\ell =1}^{L}[Z_{\ell}]_{j}\), j≥1, are independent and normally distributed with mean zero and variance \(\gamma _{j}+\varsigma ^{2}L^{-1}\). Consider \(\theta\in\{-1,1\}^{{m^{*}}}\), where \({m^{*}}:={m^{*}_{n,L}}\) is defined in (3.6). Let u be an m∗-dimensional vector with coefficients u_j to be chosen below such that
Then for each θ the slope function \(\beta ^{\theta}=\sum _{j=1}^{{m^{*}}}\theta_{j}u_{j}\psi _{j}\) belongs to \(\mathcal {F}_{b}^{r}\). Moreover, let \(Y=\langle \beta ^{\theta},X\rangle_{\mathbb {H}}+\sigma \,\epsilon \), then (Y,Z 1,…,Z L ) obey model (1.1a)–(1.1b). Consider an i.i.d. sample \(\{(Y^{(i)},Z_{1}^{(i)},\dotsc ,Z_{L}^{(i)})\}_{i=1}^{n}\) from (Y,Z 1,…,Z L ) of size n and denote its joint distribution by \(P_{\theta}^{n}\). Furthermore, for j=1,…,m ∗ and each θ we introduce θ (j) by \(\theta^{(j)}_{k}=\theta_{k}\) for k≠j and \(\theta ^{(j)}_{j}=-\theta_{j}\). As in case of \(P_{\theta}^{n}\) the conditional distribution of Y (i) given \(Z_{1}^{(i)},\dotsc ,Z_{L}^{(i)}\) is Gaussian with conditional mean
and conditional variance
it is easily seen that the log-likelihood of \(P_{\theta^{(j)}}^{n}\) w.r.t. \(P_{\theta}^{n}\) is
and its expectation w.r.t. \(P_{\theta}^{n}\) satisfies \(\mathbb{E}_{P_{\theta }^{n}}[\log(dP_{\theta^{(j)}}^{n}/dP_{\theta}^{n})]\geq-\frac{2n}{\sigma ^{2}} \frac{\gamma _{j}^{2} u_{j}^{2}}{(\gamma _{j}+\varsigma ^{2}\,L^{-1})}\) because \(\sigma^{2}_{u}\geq \sigma ^{2}\). In terms of Kullback–Leibler divergence this means \(KL(P_{\theta^{(j)}}^{n},P_{\theta}^{n}) \leq\frac{2n}{\sigma ^{2}} \frac {\gamma _{j}^{2} u_{j}^{2}}{(\gamma _{j}+\varsigma ^{2}\,L^{-1})}\). Since the Hellinger distance \(H(P_{\theta^{(j)}}^{n},P_{\theta}^{n})\) satisfies \(H^{2}(P_{\theta^{(j)}}^{n},P_{\theta}^{n}) \leqslant KL(P_{\theta ^{(j)}}^{n},P_{\theta}^{n})\), from (A.5) it follows that
Consider the Hellinger affinity \(\rho(P_{\theta^{(j)}}^{n},P_{\theta }^{n})= \int\sqrt{dP_{\theta^{(j)}}^{n}dP_{\theta}^{n}}\), then we obtain for any estimator \(\widetilde {\beta }\) of β that
Due to the identity \(\rho(P_{\theta^{(j)}}^{n},P_{\theta}^{n})=1-\frac {1}{2}H^{2}(P_{\theta^{(j)}}^{n},P_{\theta}^{n})\) combining (A.6) with (A.7) yields
From this we conclude for each estimator \(\widetilde {\beta }\) that
We will obtain the claimed result of the theorem by evaluating (A.8) for two special choices of the vector u satisfying (A.5), which we will construct in the following. Define \(\zeta:=\Delta\min (r,\frac{\sigma ^{2}}{2})\) with Δ given by (3.8). We distinguish in the following the two cases: (i) \(\sum_{j=1}^{{m^{*}}}\frac{\omega _{j}}{\gamma _{j}}\geq\sum_{j=1}^{{m^{*}}}\frac{\varsigma ^{2}\,\omega _{j}}{L\, \gamma _{j}^{2}}\), and (ii) \(\sum_{j=1}^{{m^{*}}}\frac{\omega _{j}}{\gamma _{j}}< \sum _{j=1}^{{m^{*}}}\frac{\varsigma ^{2}\,\omega _{j}}{L\, \gamma _{j}^{2}}\). Consider first (i). Given \(\alpha:= {R^{*}_{n,L}}(\sum_{j=1}^{{m^{*}}}\frac{\omega _{j}}{n\gamma _{j}})^{-1}\leq\Delta^{-1}\) by employing (3.8) let u be the vector with coefficients \(u_{j}=(\zeta\, \alpha\, n^{-1})^{1/2} \gamma _{j}^{-1/2}\) which satisfies the condition (A.5). Indeed, since b/ω is monotonically increasing and by using successively the definition of α, Δ and ζ it follows that \(\sum_{j=1}^{{m^{*}}}b_{j}u_{j}^{2}\leq\zeta\, \frac{b_{{m^{*}}}}{\omega _{{m^{*}}}}\alpha\sum_{j=1}^{{m^{*}}}\frac{\omega _{j}}{n\gamma _{j}}= \zeta\, \frac{b_{{m^{*}}}}{\omega _{{m^{*}}}} {R^{*}_{n,L}}\leq \zeta\,\Delta^{-1}\leq r\) and \(\frac{2n}{\sigma ^{2}} \frac{\gamma _{j}^{2} u_{j}^{2}}{(\gamma _{j}+\varsigma ^{2}\,L^{-1})}\leq\frac{2n}{\sigma ^{2}} \gamma _{j} u_{j}^{2}= \frac{2}{\sigma ^{2}} \zeta\alpha\leq \frac{2}{\sigma ^{2}} \zeta\Delta^{-1}\leq1\) for j=1,…,m ∗. Consequently, by evaluating (A.8) we obtain in case (i) the result of the theorem:
On the other hand, in case (ii) let \(\alpha:= {R^{*}_{n,L}}(\sum _{j=1}^{{m^{*}}}\frac{\varsigma ^{2}\omega _{j}}{Ln\gamma _{j}})^{-1}\leq\Delta ^{-1}\) and u be the vector with coefficients \(u_{j}=(\zeta\, \alpha\, n^{-1} \, \varsigma ^{2}\,L^{-1})^{1/2} \gamma _{j}^{-1}\) satisfying (A.5), because \(\sum_{j=1}^{{m^{*}}}b_{j}u_{j}^{2}\leq\zeta\, \frac{b_{{m^{*}}}}{\omega _{{m^{*}}}}\alpha\sum_{j=1}^{{m^{*}}}\frac{\varsigma ^{2}\omega _{j}}{nL\gamma _{j}}= \zeta\, \frac{b_{{m^{*}}}}{\omega _{{m^{*}}}} {R^{*}_{n,L}}\leq r\) and \(\frac{2n}{\sigma ^{2}} \frac{\gamma _{j}^{2} u_{j}^{2}}{(\gamma _{j}+\varsigma ^{2}\, L^{-1})}\leq\frac{2n}{\sigma ^{2}} \frac{\gamma _{j}^{2} u_{j}^{2}}{\varsigma ^{2}\,L^{-1}} = \frac{2}{\sigma ^{2}} \zeta\alpha\leq1\) for j=1,…,m ∗. From (A.8) follows
which proves the claimed result in case (ii) and completes the proof. □
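The variance computation at the start of the proof, namely that \(\frac{1}{L}\sum_{\ell=1}^{L}[Z_{\ell}]_{j}\) has variance \(\gamma _{j}+\varsigma ^{2}L^{-1}\), can be verified by a short simulation (a sketch only; the values of γ_j, ς and L are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(2)
n, L, varsigma, gamma_j = 200_000, 4, 0.7, 0.5    # hypothetical values
X_j = rng.normal(scale=np.sqrt(gamma_j), size=n)  # [X]_j ~ N(0, γ_j)
B_j = rng.normal(size=(n, L))                     # [Ḃ_ℓ]_j i.i.d. N(0,1)
Zbar_j = X_j + varsigma * B_j.mean(axis=1)        # (1/L) Σ_ℓ [Z_ℓ]_j
print(Zbar_j.var())   # should be close to gamma_j + varsigma**2 / L
```

Averaging the panel reduces the noise contribution from ς² to ς²/L, which is exactly the L-dependence that drives the lower bound in case (ii).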
A.3 Proof of the upper bound
The following technical lemma is used in the proof of Theorem 3.5.
Lemma A.1
If the assumptions of Theorem 3.5 hold true, then there exists a constant \(K:=K(\varsigma ,\mathcal {F}_{b}^{r}, \mathcal {G}_{\gamma }^{d})\) depending on ς and the classes \(\mathcal {F}_{b}^{r}\) and \(\mathcal {G}_{\gamma }^{d}\) only such that (i) \(n^{2}({m^{*}_{n,L}})^{4}P(\mho ^{c}_{n,L})\leq K\) and (ii) \(nP(\Omega^{c}_{n,L})\leq K\) for all n≥1,L≥2.
Proof
We observe that \(n^{-1}b_{{m^{*}_{n,L}}}\omega _{{m^{*}_{n,L}}}^{-1} \max(\omega _{{m^{*}_{n,L}}}\gamma _{{m^{*}_{n,L}}}^{-1}, \varsigma ^{2}L^{-1}\omega _{{m^{*}_{n,L}}}\gamma _{{m^{*}_{n,L}}}^{-2})\leq \Delta^{-1}\) for all n,L≥1 by exploiting condition (3.8) and that \(\lVert[\varSigma]_{{\underline {m}}}^{1/2} [\varGamma ]_{{\underline {m}}}^{-1}[\varSigma]_{{\underline {m}}}^{1/2}\rVert_{s}\) \(\leq C(d) \max(1, \varsigma ^{2}L^{-1}\gamma _{m}^{-1})\) and \(\lVert [\varGamma ]_{{\underline {m}}}^{-1}\rVert_{s}\leq C(d) \gamma _{m}^{-1}\) for all \(\varGamma \in \mathcal {G}_{\gamma }^{d}\) and m≥1 due to (A.12) in Lemma A.2. Combining the estimates we obtain \(\lVert[\varSigma]_{{\underline {m}^{*}_{n,L}}}^{1/2} [\varGamma ]_{{\underline {m}^{*}_{n,L}}}^{-1}[\varSigma]_{{\underline {m}^{*}_{n,L}}}^{1/2}\rVert_{s}^{2} \leq C(d) \gamma _{{m^{*}_{n,L}}}^{2} b_{{m^{*}_{n,L}}}^{-2} n^{2} \Delta ^{-2}\) and \(\lVert [\varGamma ]_{{\underline {m}}}^{-1}\rVert_{s}\leq C(d) b_{{m^{*}_{n,L}}}^{-1} n \Delta^{-1}\) for all \(\varGamma \in \mathcal {G}_{\gamma }^{d}\) and n,L≥1. Consider each bound separately. On the one hand, from the first bound and the first condition in (3.9) follows
which in turn, by employing (A.24) in Lemma A.6, implies \(n^{2}({m^{*}_{n,L}})^{4}P(\mho_{n,L}^{c})\leq K\) for some constant \(K:=K(\varsigma , \mathcal {F}_{b}^{r}, \mathcal {G}_{\gamma }^{d})\) depending on ς and the classes \(\mathcal {F}_{b}^{r}\) and \(\mathcal {G}_{\gamma }^{d}\) only. On the other hand, combining the second bound with \(b_{{m^{*}_{n,L}}}^{-1}=o(1)\) as n→∞ due to Assumption 2.1, we conclude \(2\lVert [\varGamma ]_{{\underline {m}^{*}_{n,L}}}^{-1}\rVert_{s} =o(n)\). Therefore, there exists an integer n_o such that \(\mho_{n,L}\subset\Omega_{n,L}\) for all n≥n_o. We distinguish in the following the cases n<n_o and n≥n_o. Consider first n≥n_o: from (i) we obtain \(nP(\Omega^{c}_{n,L})\leq nP(\mho _{n,L}^{c})\leqslant K \). On the other hand, if n<n_o then trivially \(P(\Omega^{c}_{n,L})\leq n^{-1}n_{o}\). Since n_o depends on ς and the classes \(\mathcal {F}_{b}^{r}\) and \(\mathcal {G}_{\gamma }^{d}\) only, we obtain the assertion (ii) by combining both cases, which completes the proof. □
Proof of Theorem 3.5
Let \(K:=K(\varsigma ,\mathcal {F}_{b}^{r}, \mathcal {G}_{\gamma }^{d})\) denote a constant depending on ς and the classes \(\mathcal {F}_{b}^{r}\) and \(\mathcal {G}_{\gamma }^{d}\) only, which may change from line to line. Consider the decomposition (A.2). Since \(\lVert \beta - \beta ^{m}\rVert_{\omega }^{2} \leq C(d) \, r\, \omega _{m}b_{m}^{-1}\) for all m≥1 by employing (A.15) in Lemma A.2 together with Assumption 2.1, i.e., γω −1 is monotonically nonincreasing, and \(n\lVert \beta \rVert_{\omega }^{2}P(\Omega_{n,L}^{c})\leq K\) due to Lemma A.1, we obtain
From the last estimate and the definition of \({R^{*}_{n,L}}\) given in (3.6) we conclude
On the other hand, combining (A.3), \(\lVert [\varSigma]_{{\underline {m}}}\rVert_{s}=\lVert[\varGamma ]_{{\underline {m}}}\rVert_{s}+\varsigma ^{2}L^{-1}\leq d+\varsigma ^{2}L^{-1}\), \(\rho_{m}^{2}\leq \sigma ^{2} + C(d) r(1 + 2\varsigma ^{2}L^{-1})\) for all m≥1 ((A.14) and (A.16) in Lemma A.2), \(\mathbb{E}\lVert X\rVert_{\mathbb {H}}^{2}\leq d\sum_{j\geq1}\gamma _{j}\) and Lemma A.1 we have
Moreover, by employing
for all m≥1 the condition (3.9) implies
Combination of the last bound and (A.11) implies the assertion of the theorem. □
Proof of Proposition 3.6
Under the stated conditions it is easy to verify that the assumptions of Theorem 3.5 are satisfied. The result follows by applying Theorem 3.5 and we omit the details. □
A.4 Technical assertions
The following Lemmas A.2–A.6 gather technical results used in the proofs of Proposition 3.1 and Theorem 3.5. The proof of the next lemma can be found in Johannes and Schenk (2013).
Lemma A.2
Let \(\varGamma \in \mathcal {G}_{\gamma }^{d}\) where the sequence γ satisfies Assumption 2.1, then we have
Let in addition \(\beta \in \mathcal {F}_{b}^{r}\) with sequence b satisfying Assumption 2.1. If β m denotes a Galerkin solution of g=Γβ then for each strictly positive sequence w:=(w j ) j≥1 such that w/b is nonincreasing, we obtain for all \(m\in \mathbb {N}\),
The following bound for the spectral norm of a standard Wishart-distributed random matrix is due to Davidson and Szarek (2001, Theorem 2.13). Recall that given independent and standard normally distributed m-dimensional random vectors Z_1,…,Z_n, the random matrix \(W=\sum_{i=1}^{n}Z_{i}Z_{i}^{t}\) follows a standard Wishart distribution with parameters (n,m).
Lemma A.3
Let W be a standard Wishart-distributed random matrix with parameters (n,m). For all t>0 we have \(P(\lVert W\rVert_{s}\geq n(1+\sqrt{m/n}+t)^{2})\leq\exp(-nt^{2}/2)\).
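Lemma A.3 can be sanity-checked by Monte Carlo; in the sketch below (hypothetical values of n, m and t) no draw should exceed the threshold \(n(1+\sqrt{m/n}+t)^{2}\), since the bound \(\exp(-nt^{2}/2)\) is astronomically small for these values:

```python
import numpy as np

rng = np.random.default_rng(1)
n, m, t, trials = 200, 50, 0.5, 200                 # hypothetical parameters
threshold = n * (1 + np.sqrt(m / n) + t) ** 2       # the bound of Lemma A.3
exceed = 0
for _ in range(trials):
    Z = rng.normal(size=(m, n))   # n standard normal m-vectors as columns
    W = Z @ Z.T                   # standard Wishart with parameters (n, m)
    if np.linalg.norm(W, 2) >= threshold:           # spectral norm
        exceed += 1
print(exceed)
```

Here the spectral norm of W concentrates near \(n(1+\sqrt{m/n})^{2}\), so exceedances of the threshold are never observed in practice.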
Moreover, from Theorem 2.4 in Davidson and Szarek (2001) we derive the following bound for the spectral norm of a random matrix with i.i.d. standard normally distributed components.
Lemma A.4
Let W be an (m×m)-dimensional random matrix with i.i.d. standard normally distributed components. There exist numerical constants C>2, c_0∈(0,1) and β_0≥1 such that for all \(m\in \mathbb {N}\) we have \(\mathbb{E}\lVert m^{-1/2}W\rVert_{s}^{8}\leq C\) and \(P(\lVert m^{-1/2}W\rVert_{s}>t)\leq \exp(-c_{0} t^{2})\) for all t≥β_0.
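The \(m^{-1/2}\) scaling in Lemma A.4 reflects that the spectral norm of an (m×m) Gaussian matrix concentrates near \(2\sqrt{m}\); a quick Monte Carlo sketch (hypothetical m; the constants C, c_0 and β_0 of the lemma are not computed here):

```python
import numpy as np

rng = np.random.default_rng(4)
m, trials = 100, 200                                # hypothetical parameters
norms = [np.linalg.norm(rng.normal(size=(m, m)), 2) / np.sqrt(m)  # ||m^{-1/2} W||_s
         for _ in range(trials)]
print(max(norms))   # concentrates near 2
```

The observed values cluster tightly around 2, consistent with the bounded eighth moment and the Gaussian tail asserted in the lemma.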
Let us further state elementary inequalities for Gaussian random variables.
Lemma A.5
Let {U i ,V ij ,1≤i≤n,1≤j≤m} be independent and standard normally distributed random variables. Then we have for all η>0 that
and for all c≥1 and a 1,…,a m ≥0 we have
Proof of Lemma A.5
Define \(W:=\sum_{i=1}^{n} U_{i}^{2}\) and \(Z_{j}:=(\sum_{i=1}^{n}U_{i}^{2})^{-1/2}\sum_{i=1}^{n} U_{i}V_{ij}\). Obviously, W has a \(\chi^{2}_{n}\) distribution with n degrees of freedom, and Z_1,…,Z_m given U_1,…,U_n are independent and standard normally distributed, which we use below without further reference. From the estimate (A.17) given in Dahlhaus and Polonik (2006, Proposition A.1) it follows that
which implies (A.18). It remains to prove (A.19) and (A.20) which can be realised as follows (keep in mind that \(\mathbb{E}[W]=n\) and \(\mathbb{E}[Z_{j}^{2}\big|U_{1},\dotsc,U_{n}]=1\)):
Finally, by applying \(\mathbb{E}[Z_{j}^{8}|U_{1},\dotsc,U_{n}]=105\) and \(\mathbb{E}W^{4}=n(n+2)(n+4)(n+6)\) we obtain \(\mathbb{E}[W^{4}Z_{j}^{8}]\leq (11n)^{4}\) and hence
which shows (A.20) and completes the proof. □
Lemma A.6
For all n,m≥1 we have
Furthermore, there exist numerical constants C,c 0>0 such that for all n,m≥1,
Proof of Lemma A.6
Let (λ j ,e j )1≤j≤m denote an eigenvalue decomposition of \([\varSigma]_{{\underline {m}}}\).
Proof of (A.21) and (A.22). Define \(U_{i}:= \rho_{m}^{-1}U^{(i)}= \rho_{m}^{-1}(Y^{(i)}-\langle \beta ^{m},\widehat {X}^{(i)}_{2}\rangle_{\mathbb {H}})\) and \(V_{ij}:=(\lambda_{j}^{-1/2}e_{j}^{t}[\widehat {X}^{(i)}_{1}]_{{\underline {m}}})\), 1≤i≤n, 1≤j≤m, where U 1,…,U n ,V 11,…,V nm are independent and standard normally distributed random variables. Taking into account \(\sum_{j=1}^{m}\lambda_{j}=\mathop{\rm tr}\nolimits ([\varSigma]_{{\underline {m}}})=\mathop{\rm tr}\nolimits ([\varGamma ]_{{\underline {m}}}) + 2\varsigma ^{2}L^{-1}m \leq \mathbb{E}\lVert X\rVert_{\mathbb {H}}^{2} + 2\varsigma ^{2}L^{-1}m\) and the identities \(n^{2}\rho _{m}^{-2}\lVert [\varSigma]_{{\underline {m}}}^{-1/2}[W]_{{\underline {m}}}\rVert ^{2}=\sum_{j=1}^{m} (\sum_{i=1}^{n}U_{i}V_{ij})^{2}\) and \(n^{8}\rho_{m}^{-8}\lVert [W]_{{\underline {m}}}\rVert^{8} = (\sum_{j=1}^{m} \lambda_{j}(\sum_{i=1}^{n}U_{i}V_{ij})^{2})^{4}\) the assertions (A.21) and (A.22) follow, respectively, from (A.19) and (A.20) in Lemma A.5 (with a j =1 and a j =λ j respectively).
Proof of (A.23) and (A.24). Define the random matrix \(A_{m}:=\sum_{i=1}^{n}[\varSigma]_{{\underline {m}}}^{-1/2}[\widehat {X}^{(i)}_{2}]_{{\underline {m}}} [\widehat {X}^{(i)}_{2}]_{{\underline {m}}}^{t}[\varSigma]_{{\underline {m}}}^{-1/2}=C_{nm}C_{nm}^{t}\) with \(C_{nm}:=([\varSigma]_{{\underline {m}}}^{-1/2}[\widehat {X}^{(1)}_{2}]_{{\underline {m}}},\dotsc,[\varSigma]_{{\underline {m}}}^{-1/2}[\widehat {X}^{(n)}_{2}]_{{\underline {m}}})\) and for i=1,…,n the random vector \([U_{i}]_{{\underline {m}}}:=[\varSigma]_{{\underline {m}}}^{-1/2}[\widehat {X}^{(i)}_{1}]_{{\underline {m}}}-([\varSigma]_{{\underline {m}}}^{-1/2} [\varGamma ]_{{\underline {m}}} [\varSigma]_{{\underline {m}}}^{-1/2})[\varSigma ]_{{\underline {m}}}^{-1/2}[\widehat {X}^{(i)}_{2}]_{{\underline {m}}}\). Observe that the conditional distribution of \([U_{i}]_{{\underline {m}}}\) given \([\varSigma]_{{\underline {m}}}^{-1/2}[\widehat {X}^{(i)}_{2}]_{{\underline {m}}}\) is Gaussian with mean zero and covariance matrix \(\varSigma_{U}:=([{\rm Id}]_{{\underline {m}}}- [\varSigma]_{{\underline {m}}}^{-1/2} [\varGamma ]_{{\underline {m}}}[\varSigma]_{{\underline {m}}}^{-1} [\varGamma ]_{{\underline {m}}}[\varSigma]_{{\underline {m}}}^{-1/2})\), and given \([\varSigma]_{{\underline {m}}}^{-1/2}[\widehat {X}^{(1)}_{2}]_{{\underline {m}}},\dotsc,[\varSigma ]_{{\underline {m}}}^{-1/2}[\widehat {X}^{(n)}_{2}]_{{\underline {m}}}\) the components of the (m×m)-dimensional matrix
are i.i.d. standard normally distributed. By employing this notation it is easily seen that
For all 1≤j,l≤m let δ jl =1 if j=l and zero otherwise. It is easily verified that \(\lVert n^{-1}A_{m} -[{\rm Id}]_{{\underline {m}}}\rVert_{s}^{2}\leq\sum_{j=1}^{m}\sum _{l=1}^{m}|n^{-1}\sum_{i=1}^{n}(V_{ij}V_{il}-\delta_{jl})|^{2}\) with \(V_{ij}:=(\lambda_{j}^{-1/2}e_{j}^{t}[\widehat {X}^{(i)}_{2}]_{{\underline {m}}})\), 1≤i≤n, 1≤j≤m. Moreover, for j≠l we have \(\mathbb{E}|\sum_{i=1}^{n} V_{ij}V_{il}|^{8} \leq(11n)^{4}\) by employing (A.20) in Lemma A.5 (take m=1 and a 1=1), while \(\mathbb{E}|\sum_{i=1}^{n}(V_{ij}^{2}-1)|^{8}=n^{4} 256(105/16+595/(2n)+ 1827/n^{2}+2520/n^{3})\leq(34n)^{4}\). From these estimates we get \(m^{-8}\mathbb{E}\lVert n^{-1}A_{m} -[{\rm Id}]_{{\underline {m}}}\rVert_{s}^{8}\leq C n^{-4}\) for all m≥1 which implies \(\mathbb{E}\lVert n^{-1/2}A_{m}^{1/2}\rVert_{s}^{8}\leq C (m^{4}n^{-2}+1)\). Combining the last bound and \(\mathbb{E}[\lVert m^{-1/2} B_{m}\rVert_{s}^{8}\big|C_{nm}]\leq C\) due to Lemma A.4 we obtain
If \(\{v_{j}\}_{j=1}^{m}\) denote the eigenvalues of \([\varGamma ]_{{\underline {m}}}\) in a decreasing order then it follows that \(\{v_{j}+2\varsigma ^{2}L^{-1}\}_{j=1}^{m}\) are the eigenvalues of \([\varSigma]_{{\underline {m}}}\) and, hence \(\{v_{j}(v_{j}+2\varsigma ^{2}L^{-1})^{-1}\}_{j=1}^{m}\) are the eigenvalues of \([\varSigma]_{{\underline {m}}}^{-1/2} [\varGamma ]_{{\underline {m}}}[\varSigma]_{{\underline {m}}}^{-1/2}\) which implies \(\lVert [\varSigma]_{{\underline {m}}}^{-1/2} [\varGamma ]_{{\underline {m}}}[\varSigma]_{{\underline {m}}}^{-1/2}\rVert_{s} \leq1\), \(\lVert [\varSigma]_{{\underline {m}}}^{1/2}[\varGamma ]_{{\underline {m}}}^{-1}[\varSigma]_{{\underline {m}}}^{1/2}\rVert_{s}=1+2\varsigma ^{2}L^{-1}\lVert [\varGamma ]_{{\underline {m}}}^{-1}\rVert_{s}\) and analogously \(\lVert\varSigma_{U}\rVert_{s}\leq1\). Combining the last estimates and (A.26) we obtain the assertion (A.23). Moreover, we have
and hence
Since \(n\lVert n^{-1}A_{m} -[{\rm Id}]_{{\underline {m}}}\rVert_{s}\leq m\max_{1\leq j,l\leq m}|\sum_{i=1}^{n}(V_{ij}V_{il}-\delta_{jl})|\), we obtain due to (A.17) and (A.18) in Lemma A.5 that
Moreover, for all η≤m/2 the last bound simplifies to
and it is easily seen that the last bound, Lemmas A.3 and A.4, by making use of the decomposition (A.27), imply (A.24), which completes the proof. □
Bereswill, M., Johannes, J. On the effect of noisy measurements of the regressor in functional linear models. TEST 22, 488–513 (2013). https://doi.org/10.1007/s11749-013-0325-7