
Robust functional estimation in the multivariate partial linear model


Abstract

We consider the problem of adaptive estimation of the functional component in a partial linear model where the argument of the function is defined on a q-dimensional grid. Obtaining an adaptive estimator of this functional component is an important practical problem in econometrics, where the exact distributions of the random errors and of the parametric component are mostly unknown. We construct an estimator of the functional component that is adaptive over a wide range of multivariate Besov classes and robust to a wide choice of distributions of the linear component and random errors. We also show that the same estimator is locally adaptive over the same range of Besov classes and robust over large collections of distributions of the linear component and random errors. At any fixed point, this estimator attains a local adaptive minimax rate.


References

  • Amato, U., Antoniadis, A., Pensky, M. (2006). Wavelet kernel penalized estimation for non-equispaced design regression. Statistics and Computing, 16(1), 37–55.

  • Antoniadis, A., Pham, D. T. (1998). Wavelet regression for random or irregular design. Computational Statistics & Data Analysis, 28(4), 353–369.

  • Autin, F., Claeskens, G., Freyermuth, J.-M. (2014). Hyperbolic wavelet thresholding methods and the curse of dimensionality through the maxiset approach. Applied and Computational Harmonic Analysis, 36(2), 239–255.

  • Besov, O. V., Il'in, V. P., Nikol'skii, S. M. (1979). Integral representations of functions and imbedding theorems (Vol. II). Washington: Wiley.

  • Brown, L. D., Cai, T. T., Low, M. G., Zhang, C.-H. (2002). Asymptotic equivalence theory for nonparametric regression with random design. Annals of Statistics, 30(3), 688–707.

  • Brown, L. D., Cai, T. T., Zhou, H. H. (2008). Robust nonparametric estimation via wavelet median regression. Annals of Statistics, 36(5), 2055–2084.

  • Brown, L. D., Levine, M., Wang, L. (2016). A semiparametric multivariate partially linear model: A difference approach. Journal of Statistical Planning and Inference, 178, 99–111.

  • Cai, T. T. (1999). Adaptive wavelet estimation: A block thresholding and oracle inequality approach. Annals of Statistics, 27(3), 898–924.


  • Cai, T. T., Brown, L. D. (1998). Wavelet shrinkage for nonequispaced samples. Annals of Statistics, 26(5), 1783–1799.

  • Casella, G., Berger, R. L. (2002). Statistical inference (Vol. 2). Pacific Grove: Duxbury.

  • Daubechies, I. (1992). Ten lectures on wavelets. Philadelphia: SIAM.


  • Donoho, D. L., Johnstone, I. M., Kerkyacharian, G., Picard, D. (1995). Wavelet shrinkage: Asymptopia? Journal of the Royal Statistical Society. Series B (Methodological), 57(2), 301–369.

  • Fang, K.-T., Kotz, S., Ng, K. W. (1990). Symmetric multivariate and related distributions. London: Chapman and Hall.

  • Hall, P., Turlach, B. A. (1997). Interpolation methods for nonlinear wavelet regression with irregularly spaced design. Annals of Statistics, 25(5), 1912–1925.

  • Härdle, W., Liang, H., Gao, J. (2012). Partially linear models. Heidelberg: Springer.

  • He, X., Shi, P. (1996). Bivariate tensor-product B-splines in a partly linear model. Journal of Multivariate Analysis, 58(2), 162–181.

  • Horowitz, J. L. (2009). Semiparametric and nonparametric methods in econometrics (Vol. 12). Berlin: Springer.


  • Kohler, M. (2008). Multivariate orthogonal series estimates for random design regression. Journal of Statistical Planning and Inference, 138(10), 3217–3237.


  • Kovac, A., Silverman, B. W. (2000). Extending the scope of wavelet regression methods by coefficient-dependent thresholding. Journal of the American Statistical Association, 95(449), 172–183.

  • Levine, M. (2015). Minimax rate of convergence for an estimator of the functional component in a semiparametric multivariate partially linear model. Journal of Multivariate Analysis, 140, 283–290.


  • Meyer, Y. (1995). Wavelets and operators (Vol. 1). Cambridge: Cambridge University Press.


  • Müller, U. U., Schick, A., Wefelmeyer, W. (2012). Estimating the error distribution function in semiparametric additive regression models. Journal of Statistical Planning and Inference, 142(2), 552–566.

  • Neumann, M. H. (2000). Multivariate wavelet thresholding in anisotropic function spaces. Statistica Sinica, 10(2), 399–431.


  • Pensky, M., Vidakovic, B. (2001). On non-equally spaced wavelet regression. Annals of the Institute of Statistical Mathematics, 53(4), 681–690.

  • Robinson, P. M. (1988). Root-n-consistent semiparametric regression. Econometrica, 56(4), 931–954.

  • Sardy, S., Percival, D. B., Bruce, A. G., Gao, H.-Y., Stuetzle, W. (1999). Wavelet shrinkage for unequally spaced data. Statistics and Computing, 9(1), 65–75.

  • Schick, A. (1996). Root-n consistent estimation in partly linear regression models. Statistics & Probability Letters, 28(4), 353–358.


  • Schmalensee, R., Stoker, T. M. (1999). Household gasoline demand in the United States. Econometrica, 67(3), 645–662.

  • Scott, D. W. (2015). Multivariate density estimation: Theory, practice, and visualization. New York: Wiley.

  • Stuck, B. W. (2000). An historical overview of stable probability distributions in signal processing. In IEEE International Conference on Acoustics, Speech, and Signal Processing (Vol. 6, p. VI-3795). IEEE.

  • Stuck, B. W., Kleiner, B. (1974). A statistical analysis of telephone noise. The Bell System Technical Journal, 53(7), 1263–1320.

  • Triebel, H. (2006). Theory of function spaces III. Monographs in Mathematics, Vol. 100. Basel: Birkhäuser Verlag.


  • Vidakovic, B. (2009). Statistical modeling by wavelets (Vol. 503). New York: Wiley.


  • Wang, J.-L., Xue, L., Zhu, L., Chong, Y. S. (2010). Estimation for a partial-linear single-index model. Annals of Statistics, 38(1), 246–274.

  • Zhang, S., Wong, M.-Y., Zheng, Z. (2002). Wavelet threshold estimation of a regression function with random design. Journal of Multivariate Analysis, 80(2), 256–284.

Author information

Correspondence to Michael Levine.

Appendix

In order to prove our adaptation results, we start with the following proposition, which provides an expansion of the median of the binned observations. This expansion shows that, up to small stochastic and deterministic errors, the median has an approximately normal distribution. An important fact that we need in order to prove this proposition is that any Besov ball \(B^{\alpha }_{s,t}(M)\) can be embedded into a Hölder ball with smoothness index \(d=\min \left( \alpha -\frac{q}{s},1\right) \); for details, see, e.g., Meyer (1995). We also remark that, in this section, C denotes a generic positive constant whose value can change from one line to another.

Proposition 2

Let \(f\in B^{\alpha }_{s,t}(M)\) and let \(d\doteq \min \left( \alpha -\frac{q}{s},1\right) \). The median \(Q_{\mathbf {l}}\) of the observations that belong to the \(\mathbf {l}\)th bin can be written as

$$\begin{aligned} \sqrt{\kappa }Q_{\mathbf {l}}=\sqrt{\kappa }f\left( \frac{\mathbf {l}}{T}\right) +\sqrt{\kappa }b_{\mathbf {l}}+\frac{1}{2}Z_{\mathbf {l}}+\varepsilon _{\mathbf {l}}+\zeta _{\mathbf {l}}, \end{aligned}$$

where

  1. \(Z_{\mathbf {l}}\sim N(0,1/h^{2}(0));\)

  2. \(\varepsilon _{\mathbf {l}}\) are constants such that \(\vert \varepsilon _{\mathbf {l}}\vert \le C\sqrt{\kappa }q^{d/2}T^{-d};\)

  3. \(\zeta _{\mathbf {l}}\) are independent random variables such that, for any \(r>0\),

$$\begin{aligned} \mathbb {E}\,\vert \zeta _{\mathbf {l}}\vert ^{r} \le C_{r}\kappa ^{-r/2}+C_{r}\kappa ^{r/2}q^{dr/2}T^{-dr}, \end{aligned}$$

where \(C_{r}\) is a positive constant that depends on r only; moreover, for any \(a>0\)

$$\begin{aligned} P(\vert \zeta _{\mathbf {l}}\vert >a)\le C_{r}(a^{2}\kappa )^{-r/2}+C_{r}(a^{2}T^{2d}/\kappa q^{d})^{-r/2}. \end{aligned}$$
(13)

Proof

In this proof, we denote by \(\varPhi (\cdot )\) the cdf of the standard normal distribution. Define \(Z_{\mathbf {l}}=\frac{1}{h(0)}\varPhi ^{-1}(G(\eta _{\mathbf {l}}))\), where G is the distribution function of the median \(\eta _{\mathbf {l}}\). Due to Theorem 1, the rescaled median of the errors \(\sqrt{4\kappa }\eta _{\mathbf {l}}\) can be well approximated by a mean-zero normal random variable with variance \(\frac{1}{h^{2}(0)}\). Next, we define

$$\begin{aligned} \varepsilon _{\mathbf {l}}&=\sqrt{\kappa }\mathbb {E}\,Q_{\mathbf {l}}-\sqrt{\kappa }f\left( \frac{\mathbf {l}}{T}\right) -\sqrt{\kappa }b_{\mathbf {l}}\\&=\mathbb {E}\,\left\{ \sqrt{\kappa }Q_{\mathbf {l}}-\sqrt{\kappa }f\left( \frac{\mathbf {l}}{T}\right) -\sqrt{\kappa }\eta _{\mathbf {l}}\right\} . \end{aligned}$$

This is the deterministic component of the approximation error due to binning. Clearly, for the \(\mathbf {l}\)th bin \(D_{\mathbf {l}}\), we have

$$\begin{aligned}&\min _{u_{\mathbf {i}}\in D_{\mathbf {l}}}\left[ f(u_{\mathbf {i}})-f\left( \frac{\mathbf {l}}{T}\right) \right] \le Q_{\mathbf {l}}-\eta _{\mathbf {l}}-f\left( \frac{\mathbf {l}}{T}\right) \le \max _{u_{\mathbf {i}}\in D_{\mathbf {l}}}\left[ f(u_{\mathbf {i}})-f\left( \frac{\mathbf {l}}{T}\right) \right] . \end{aligned}$$
(14)

Since the function f is in a Hölder ball with the smoothness index \(d=\min \left( \alpha -\frac{q}{s},1\right) \), we have

$$\begin{aligned} \vert \varepsilon _{\mathbf {l}}\vert&\le \sqrt{\kappa }\mathbb {E}\,\left| Q_{\mathbf {l}}-f\left( \frac{\mathbf {l}}{T}\right) -\eta _{\mathbf {l}}\right| \\&\le \sqrt{\kappa }\max _{u_{\mathbf {i}}\in D_{\mathbf {l}}}\left| f(u_{\mathbf {i}})-f\left( \frac{\mathbf {l}}{T}\right) \right| \le C\sqrt{\kappa }q^{d/2}T^{-d}. \end{aligned}$$

Now we characterize the random error of our approximation. First, we define \(\zeta _{\mathbf {l}}=\sqrt{\kappa }Q_{\mathbf {l}}-\sqrt{\kappa }f\left( \frac{\mathbf {l}}{T}\right) -\sqrt{\kappa }b_{\mathbf {l}}-\varepsilon _{\mathbf {l}}-\frac{1}{2}Z_{\mathbf {l}}\). Note that \(\mathbb {E}\,\zeta _{\mathbf {l}}=0\) and that this random error can be represented as the sum of two components, \(\zeta _{1\mathbf {l}}=\sqrt{\kappa }Q_{\mathbf {l}}-\sqrt{\kappa }f\left( \frac{\mathbf {l}}{T}\right) -\sqrt{\kappa }\eta _{\mathbf {l}}-\varepsilon _{\mathbf {l}}\) and \(\zeta _{2\mathbf {l}}=\sqrt{\kappa }\eta _{\mathbf {l}}-\sqrt{\kappa }b_{\mathbf {l}}-\frac{1}{2}Z_{\mathbf {l}}\). The first component, \(\zeta _{1\mathbf {l}}\), represents the random error resulting from binning the observations, while \(\zeta _{2\mathbf {l}}\) is the error resulting from approximating the median of the random errors by a normal random variable. The error \(\zeta _{1\mathbf {l}}\) is bounded, due to (14), as \(\vert \zeta _{1\mathbf {l}}\vert \le C\sqrt{\kappa }q^{d/2}T^{-d}\). Next, using Corollary 1, we can bound the absolute value of the second random error term as \(\vert \zeta _{2\mathbf {l}}\vert \le \frac{C}{\kappa ^{1/2}}(1+\vert Z_{\mathbf {l}}\vert ^{2})\) whenever \(\vert Z_{\mathbf {l}}\vert \le \varepsilon \sqrt{\kappa }\) for some \(\varepsilon >0\). Thus, for any fixed \(r\ge 0\),

$$\begin{aligned} \mathbb {E}\,\vert \zeta _{2\mathbf {l}}\vert ^{r}&=\mathbb {E}\,\vert \zeta _{2\mathbf {l}}\vert ^{r}I(\vert Z_{\mathbf {l}}\vert \le \varepsilon \sqrt{\kappa })+\mathbb {E}\,\vert \zeta _{2\mathbf {l}}\vert ^{r}I(\vert Z_{\mathbf {l}}\vert> \varepsilon \sqrt{\kappa })\\&\le C\kappa ^{-r/2}+\mathbb {E}\,\vert \zeta _{2\mathbf {l}}\vert ^{r}I(\vert Z_{\mathbf {l}}\vert > \varepsilon \sqrt{\kappa })\\&\le C\kappa ^{-r/2}+\left\{ \mathbb {E}\,\vert \zeta _{2\mathbf {l}}\vert ^{2r}\right\} ^{1/2}\left\{ P(\vert Z_{\mathbf {l}}\vert > \varepsilon \sqrt{\kappa })\right\} ^{1/2}, \end{aligned}$$

where the last step uses the Cauchy–Schwarz inequality; by the Mills ratio bound, \(P(\vert Z_{\mathbf {l}}\vert > \varepsilon \sqrt{\kappa })\le \exp \left( -\frac{\varepsilon ^{2}\kappa h^{2}(0)}{2}\right) \), so the last term decays exponentially in \(\kappa \).

To continue, we need Assumption (A2). Using the same argument as in Brown et al. (2008), we can show that the density g(x) of the centered median \(\eta _{\mathbf {l}}-b_{\mathbf {l}}\) is such that the sample centered median has finite moments of any order and, therefore, \(\mathbb {E}\,\vert \sqrt{\kappa }(\eta _{\mathbf {l}}-b_{\mathbf {l}})\vert ^{2r}\le \kappa ^{r}\mathbb {E}\,\vert \eta _{\mathbf {l}}\vert ^{2r}\le D_{r}\kappa ^{r}\) for some positive constant \(D_{r}\) that does not depend on n. This allows us to conclude that \(\mathbb {E}\,\vert \zeta _{2\mathbf {l}}\vert ^{r}\le C_{r}\kappa ^{-r/2}\), since the normal random variable \(Z_{\mathbf {l}}\) has finite moments of any order. Finally,

$$\begin{aligned}&\mathbb {E}\,\vert \zeta _{\mathbf {l}}\vert ^{r}\le 2^{r-1}\left( \mathbb {E}\,\vert \zeta _{1\mathbf {l}}\vert ^{r}+\mathbb {E}\,\vert \zeta _{2\mathbf {l}}\vert ^{r}\right) \le C_{r}\kappa ^{-r/2}+C_{r}\kappa ^{r/2}q^{dr/2}T^{-dr}. \end{aligned}$$

The inequality (13) now follows immediately from Markov's inequality. \(\square \)
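To make the construction of the bin medians concrete, the following minimal numpy sketch simulates \(\kappa \) observations per bin on a q-dimensional grid and computes the medians \(Q_{\mathbf {l}}\); the test function f and the Cauchy error density are purely illustrative assumptions, not the setting of the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

q, T, kappa = 2, 16, 9          # dimension, bins per axis, observations per bin
V = T ** q                      # total number of bins

def f(u):                       # a smooth test function on [0,1]^q (an assumption)
    return np.sin(2 * np.pi * u).sum(axis=-1)

# Bin centers l/T for 1 <= l <= T in each coordinate, flattened to shape (V, q).
grid = np.stack(np.meshgrid(*[np.arange(1, T + 1)] * q, indexing="ij"), axis=-1)
centers = grid.reshape(V, q) / T

# kappa observations per bin with heavy-tailed errors; the Cauchy density has
# no mean, but the bin median remains well behaved, which is the point of
# using medians rather than bin averages.
obs = f(centers)[:, None] + rng.standard_cauchy((V, kappa))

Q = np.median(obs, axis=1)      # the bin medians Q_l of Proposition 2
```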

Remark 4

In the proofs of our adaptation results, we assume everywhere that h(0) is known and equal to 1. This can be done because \(h^{-2}(0)\) can always be estimated in such a way that the difference between \(h^{-2}(0)\) and \({\hat{h}}^{-2}(0)\) is bounded in probability as \(O_{p}(n^{-\delta })\) for some \(\delta >0\) and, moreover, \(P(|{\hat{h}}^{-2}(0)-h^{-2}(0)|\ge n^{-\delta })\le c_{l}n^{-l}\) for any \(l \ge 1\). Such an estimator can be constructed as a properly normalized sum of squared differences of ordered medians of observations \(Q_{\mathbf {l}}\), \(\mathbf {1} \le \mathbf {l} \le \mathbf {T}\); to order the medians, one can use, for example, the lexicographic order of their indices \(\mathbf {l}\). To simplify the notation, we denote two successive medians \(Q_{2k-1}\) and \(Q_{2k}\), using a scalar index to avoid confusion. The needed estimator of \(h^{-2}(0)\) is then proportional to \(\sum _{k}(Q_{2k-1}-Q_{2k})^{2}\). Once such an estimator is constructed, one can check that the asymptotic properties everywhere in the proofs do not change if \(\lambda ^{*}(1+O(n^{-\delta }))\) is used instead of \(\lambda ^{*}\). The details are similar to the argument of Brown et al. (2008) and are omitted for conciseness.
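As an illustration, a minimal sketch of such an estimator is given below; the normalization \(2\kappa \), dictated by the asymptotic variance \(1/(4\kappa h^{2}(0))\) of a single bin median, is our assumption, since the exact constant used in the paper is not reproduced here.

```python
import numpy as np

def estimate_h_inv2(Q, kappa):
    """Estimate 1/h^2(0) from the bin medians, ordered lexicographically.

    By Proposition 2, a bin median is approximately normal with variance
    1/(4*kappa*h^2(0)), so the difference of two successive medians has
    variance about 1/(2*kappa*h^2(0)), plus a bias of order T^(-d) caused
    by the change in f between neighboring bins.
    """
    Q = np.asarray(Q, dtype=float).ravel()         # lexicographic order of l
    pairs = Q[: 2 * (Q.size // 2)].reshape(-1, 2)  # (Q_{2k-1}, Q_{2k})
    return 2.0 * kappa * np.mean((pairs[:, 0] - pairs[:, 1]) ** 2)
```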

The next proposition is needed to obtain a uniform bound on the mean squared error of estimating the expected error median. Its proof is very similar to that of Lemma 5 in Brown et al. (2008) and is omitted for brevity.

Proposition 3

Let the expectation of the error median for the \(\mathbf {l}\)th bin and its estimator be \(b_{\mathbf {l}}\) and \({\hat{b}}_{\mathbf {l}}\), as defined earlier in (8). Then,

$$\begin{aligned} \sup _{h\in {{\mathscr {H}}}}\left| b_{\mathbf {l}}+\frac{h^{'}(0)}{8h^{3}(0)\kappa }\right| \le C\kappa ^{-2}, \end{aligned}$$

and

$$\begin{aligned} \sup _{h\in {{\mathscr {H}}}}\sup _{f\in B^{\alpha }_{s,t}(M)}\mathbb {E}\,({\hat{b}}_{\mathbf {l}}-b_{\mathbf {l}})^{2}\le C\max \left\{ q^{d}T^{-2d},\kappa ^{-4}\right\} \end{aligned}$$

for any index \(\mathbf {l}\).

Proof

Without loss of generality, assume that \(\kappa =2\nu +1\). Then, the expectation of the \(\mathbf {l}\)th median is

$$\begin{aligned} \mathbb {E}\,\eta _{\mathbf {l}}=\int x\frac{(2\nu +1)!}{(\nu !)^{2}}H^{\nu }(x)[1-H(x)]^{\nu }\,\mathrm{d}H(x), \end{aligned}$$

where H(x) is the distribution function corresponding to h(x). For any \(\delta >0\), define the set \(A_{\delta }=\{x:\left| H(x)-\frac{1}{2}\right| \le \delta \}\). It follows from the definition of the class \({{\mathscr {H}}}\) that there exist constants \(\delta >0\) and \(\varepsilon >0\) such that \(h^{(3)}(x)\le \frac{1}{\varepsilon }\) and \(\varepsilon \le h(x)\le \frac{1}{\varepsilon }\) for any \(x\in A_{\delta }\), uniformly for all \(h\in {{\mathscr {H}}}\). This property implies that \(H^{-1}(x)\) is well defined and differentiable up to the fourth order for any \(x\in A_{\delta }\). Now, we can split the expectation of the median into two parts:

$$\begin{aligned} \mathbb {E}\,\eta _{\mathbf {l}}=\left( \int _{A_{\delta }}+\int _{A_{\delta }^{c}}\right) x\frac{(2\nu +1)!}{(\nu !)^{2}}H^{\nu }(x)[1-H(x)]^{\nu }\,\mathrm{d}H(x)\equiv Q_{1}+Q_{2}. \end{aligned}$$

Since we established earlier that all moments of the median are finite, \(Q_{2}\) goes to zero exponentially fast as \(\nu \rightarrow \infty \). Next, we find that

$$\begin{aligned} Q_{1}&=\int _{1/2-\delta }^{1/2+\delta }\left( H^{-1}(x)-H^{-1}\left( \frac{1}{2}\right) \right) \frac{(2\nu +1)!}{(\nu !)^{2}}x^{\nu }(1-x)^{\nu }\,\mathrm{d}x\\&=\int _{1/2-\delta }^{1/2+\delta }\left[ \frac{1}{2}(H^{-1})^{''}\left( \frac{1}{2}\right) \left( x-\frac{1}{2}\right) ^{2}+\frac{(H^{-1})^{(4)}(\tau )}{24}\left( x-\frac{1}{2}\right) ^{4}\right] \\&\quad \times \frac{(2\nu +1)!}{(\nu !)^{2}}x^{\nu }(1-x)^{\nu }\,\mathrm{d}x \end{aligned}$$

since \(x^{\nu }(1-x)^{\nu }\) is symmetric around \(\frac{1}{2}\), so the odd-order terms of the Taylor expansion integrate to zero. The expression \(\frac{(2\nu +1)!}{(\nu !)^{2}}x^{\nu }(1-x)^{\nu }\) is the density of the Beta\((\nu +1,\nu +1)\) distribution, whose mean equals \(\frac{1}{2}\); this, and the fact that \((H^{-1})^{(4)}(\tau )\) is bounded uniformly for all \(h\in {{\mathscr {H}}}\), implies, in the same way as in Brown et al. (2008), that \(Q_{1}=-\frac{h^{'}(0)}{8h^{3}(0)\kappa }+O\left( \frac{1}{\kappa ^{2}}\right) \). Recall that we established earlier (see Proposition 2) that

$$\begin{aligned} Q_{\mathbf {l}}=f\left( \frac{\mathbf {l}}{T}\right) +b_{\mathbf {l}}+\frac{1}{2\sqrt{\kappa }}Z_{\mathbf {l}}+\frac{1}{\sqrt{\kappa }}\varepsilon _{\mathbf {l}}+\frac{1}{\sqrt{\kappa }}\zeta _{\mathbf {l}}. \end{aligned}$$

In a similar way, we can write for the median \(Q_{\mathbf {l}}^{*}\) of the “half” \(\mathbf {l}\)th bin that

$$\begin{aligned} Q_{\mathbf {l}}^{*}=f\left( \frac{\mathbf {l}-\mathbf { 1/2}}{T}\right) +b_{\mathbf {l}}^{*}+\frac{1}{2\sqrt{\nu }}Z_{\mathbf {l}}^{*}+\frac{1}{\sqrt{\nu }}\varepsilon _{\mathbf {l}}^{*}+\frac{1}{\sqrt{\nu }}\zeta _{\mathbf {l}}^{*}, \end{aligned}$$

where \(\mathbf {1/2}\) is the q-dimensional vector \(\left( \frac{1}{2},\ldots ,\frac{1}{2}\right) ^{'}\), \(b_{\mathbf {l}}^{*}\) is the expected median of the errors of all observations in the “half” \(\mathbf {l}\)th bin, and \(Z_{\mathbf {l}}^{*}\), \(\varepsilon _{\mathbf {l}}^{*}\), and \(\zeta _{\mathbf {l}}^{*}\) satisfy Proposition 2. Then, the error of the median estimation, \({\hat{b}}_{\mathbf {l}}-b_{\mathbf {l}}=\frac{1}{V}\sum _{\mathbf {l}}(Q_{\mathbf {l}}^{*}-Q_{\mathbf {l}})-b_{\mathbf {l}}\), can be written as follows:

$$\begin{aligned} {\hat{b}}_{\mathbf {l}}-b_{\mathbf {l}}&=\frac{1}{V}\sum _{\mathbf {l}}\left( f\left( \frac{\mathbf {l}-\mathbf {1/2}}{T}\right) -f\left( \frac{\mathbf {l}}{T}\right) \right) +(b_{\mathbf {l}}^{*}-2b_{\mathbf {l}})\\&\quad +\,\left[ \frac{1}{\sqrt{\nu }}\frac{1}{V}\sum _{\mathbf {l}}\varepsilon _{\mathbf {l}}^{*}-\frac{1}{\sqrt{\kappa }}\frac{1}{V}\sum _{\mathbf {l}}\varepsilon _{\mathbf {l}}\right] \\&\quad +\,\left[ \frac{1}{2\sqrt{\nu }}\frac{1}{V}\sum _{\mathbf {l}}Z_{\mathbf {l}}^{*}-\frac{1}{2\sqrt{\kappa }}\frac{1}{V}\sum _{\mathbf {l}}Z_{\mathbf {l}}\right] \\&\quad +\,\left[ \frac{1}{\sqrt{\nu }}\frac{1}{V}\sum _{\mathbf {l}}\zeta _{\mathbf {l}}^{*}-\frac{1}{\sqrt{\kappa }}\frac{1}{V}\sum _{\mathbf {l}}\zeta _{\mathbf {l}}\right] \\&\equiv R_{1}+R_{2}+R_{3}+R_{4}+R_{5}. \end{aligned}$$

Due to the embedding of the Besov ball \(B_{s,t}^{\alpha }(M)\) into the Hölder ball with smoothness index \(d=\min \left( \alpha -\frac{q}{s},1\right) \), the first term is uniformly bounded: \(\sup _{f\in B^{\alpha }_{s,t}(M)}R_{1}^{2}\le CT^{-2d}\). The second term is bounded as \(\sup _{h\in {{\mathscr {H}}}}R_{2}^{2}\le C\kappa ^{-4}\). By Proposition 2, the third term is bounded as \(\sup _{h\in {{\mathscr {H}}}, f\in B^{\alpha }_{s,t}(M)}R_{3}^{2}\le Cq^{d}T^{-2d}\). Since the differences \(Z_{\mathbf {l}}^{*}-Z_{\mathbf {l}}\) are independent across bins, we have \(\mathbb {E}\,R_{4}^{2}\le \frac{1}{h^{2}(0)}\left( \frac{1}{\kappa }+\frac{1}{\nu }\right) \frac{1}{V}\le Cn^{-1}\). Finally, by Proposition 2, we have \(\mathbb {E}\,R_{5}^{2}=o(n^{-1})\). Thus, the overall bound is

$$\begin{aligned} \sup _{h\in {{\mathscr {H}}},f\in B^{\alpha }_{s,t}(M)}\mathbb {E}\,({\hat{b}}_{\mathbf {l}}-b_{\mathbf {l}})^{2}\le C\max \left\{ q^{d}T^{-2d},\kappa ^{-4}\right\} . \end{aligned}$$

\(\square \)
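For intuition, here is a schematic numpy version of the half-bin construction used in this proof; since (8) is not restated in the appendix, the use of the first half of each bin below is an assumption.

```python
import numpy as np

def estimate_b(obs):
    """Schematic estimator of the expected error-median bias b.

    obs has shape (V, kappa): the kappa observations in each of the V bins.
    Q_l is the median of the full bin and Q_l^* the median of a "half" bin
    with roughly kappa/2 observations; since the bias of a median of m
    observations scales like 1/m, E(Q_l^* - Q_l) is approximately
    b_l^* - b_l, which is close to b_l, and we average over the bins.
    """
    Q = np.median(obs, axis=1)
    Q_star = np.median(obs[:, : obs.shape[1] // 2], axis=1)
    return np.mean(Q_star - Q)
```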

Due to Proposition 2, we can write

$$\begin{aligned} \frac{1}{\sqrt{V}}Q_{\mathbf {l}}=\frac{g(\mathbf {l}/T)}{\sqrt{V}}+\frac{\varepsilon _{\mathbf {l}}}{\sqrt{n}}+\frac{Z_{\mathbf {l}}}{2\sqrt{n}}+\frac{\zeta _{\mathbf {l}}}{\sqrt{n}}. \end{aligned}$$
(15)

Let Q be the vector of all bin medians \(Q_{\mathbf {l}}\); this vector has length V. Applying the discrete wavelet transform to both sides of (15), we can expand the empirical wavelet coefficients \(y^{i}_{j,\mathbf {k}}\) as

$$\begin{aligned} y^{i}_{j,\mathbf {k}}=\breve{\theta }_{j,\mathbf {k}}^{i}+\varepsilon _{j,\mathbf {k}}^{i}+\frac{1}{2h(0)\sqrt{n}}z_{j,\mathbf {k}}^{i}+\xi _{j,\mathbf {k}}^{i}, \end{aligned}$$
(16)

where \(\breve{\theta }_{j,\mathbf {k}}^{i}\) are the discrete wavelet coefficients of \(g\left( \frac{\mathbf {l}}{T}\right) _{\mathbf {1}\le \mathbf {l} \le \mathbf {T}}\), which are approximately equal to the true wavelet coefficients \(\theta _{j,\mathbf {k}}^{i}\) of g; \(\varepsilon _{j,\mathbf {k}}^{i}\) are “small” deterministic approximation errors; \(z_{j,\mathbf {k}}^{i}\) are i.i.d. N(0, 1); and \(\xi _{j,\mathbf {k}}^{i}\) are “small” stochastic errors. If \(\varepsilon _{j,\mathbf {k}}^{i}\) and \(\xi _{j,\mathbf {k}}^{i}\) are both negligible in a suitable sense, we can treat the model in the wavelet domain as the idealized sequence model

$$\begin{aligned} y_{j,\mathbf {k}}^{i}\approx \breve{\theta }_{j,\mathbf {k}}^{i}+\frac{1}{2h(0)\sqrt{n}}z_{j,\mathbf {k}}^{i}, \end{aligned}$$

where \(\frac{1}{2h(0)\sqrt{n}}\) plays the role of the noise level. At this point, we can define a simple estimation procedure for the function f. Some auxiliary results are necessary before stating the main result. The first of these results, stated next, is needed to bound the difference between the true wavelet coefficients \(\theta ^{i}_{j,\mathbf {k}}\) of the function \(f\left( \frac{\mathbf {l}}{T}\right) _{\mathbf {1} \le \mathbf {l} \le \mathbf {T}}\) and the discrete wavelet coefficients \(\breve{\theta }^{i}_{j,\mathbf {k}}\); its proof is straightforward and is, therefore, omitted. A schematic illustration of the wavelet-domain step itself is given immediately after the lemma.

Lemma 2

Let \(T=2^{J}\) and define \(f_{J}(u)=\frac{1}{\sqrt{V}}\sum _{\mathbf {l}\le \mathbf {T}}\sum _{i=1}^{2^{q}-1}f\left( \frac{\mathbf {l}}{T}\right) \psi _{J,\mathbf {l}}^{i}(u)\). Then,

$$\begin{aligned} \sup _{f\in B^{\alpha }_{s,t}(M)}\vert |f_{J}-f\vert |^{2}_{2}\le C\cdot q^{d}T^{-2d}, \end{aligned}$$

where \(d=\min \left( \alpha -\frac{q}{s},1\right) \). Moreover, \(|\breve{\theta }_{j,\mathbf {k}}^{i}-\theta _{j,\mathbf {k}}^{i}|\le CT^{-d}2^{-j/2}\) and so \(\sum _{j=j_{0}}^{J-1}\sum _{\mathbf {1} \le \mathbf {k} \le {\mathbf {2}^{j}}}(\breve{\theta }_{j,\mathbf {k}}^{i}-\theta _{j,\mathbf {k}}^{i})^{2}\le CT^{-2d}\).
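As promised, here is a self-contained orthonormal Haar transform that serves as a one-dimensional stand-in for the multivariate wavelet transform used in (16); the choice of the Haar basis is our assumption, made only to keep the sketch short. Applied to \(Q/\sqrt{V}\), its detail coefficients play the role of the empirical coefficients \(y_{j,\mathbf {k}}^{i}\), with noise level roughly \(1/(2h(0)\sqrt{n})\).

```python
import numpy as np

def haar_dwt(x):
    """Orthonormal 1-D Haar DWT of a vector of length 2^J.

    Returns the final scaling coefficient and the detail coefficients,
    ordered from the coarsest to the finest level.
    """
    x = np.asarray(x, dtype=float)
    details = []
    while x.size > 1:
        even, odd = x[0::2], x[1::2]
        details.append((even - odd) / np.sqrt(2.0))  # detail coefficients
        x = (even + odd) / np.sqrt(2.0)              # coarser approximation
    return x, details[::-1]

# Example: approx, details = haar_dwt(Q / np.sqrt(V)) for the bin medians Q.
```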

Another result that we need is the following proposition, which bounds the risk of the proposed block thresholding procedure.

Proposition 4

Let the empirical wavelet coefficients \(y_{j,\mathbf {k}}^{i}=\breve{\theta }_{j,\mathbf {k}}^{i}+\varepsilon _{j,\mathbf {k}}^{i}+\frac{1}{2h(0)\sqrt{n}}z_{j,\mathbf {k}}^{i}+\xi _{j,\mathbf {k}}^{i}\) be as given in (16) and let the estimated block thresholding coefficients \({\hat{\theta }}_{j,\mathbf {k}}^{i}\) be as defined in (7). Then, for some constant \(C>0\),

$$\begin{aligned} \mathbb {E}\,\sum _{\mathbf {k}\in B_{j,u}^{i}}({\hat{\theta }}_{j,\mathbf {k}}^{i}-\breve{\theta }_{j,\mathbf {k}}^{i})^{2} \le \min \left\{ 4\sum _{\mathbf {k}\in B^{i}_{j,u}}[\breve{\theta }_{j,\mathbf {k}}^{i}]^{2},8\lambda L n^{-1}\right\} +6\sum _{\mathbf {k}\in B^{i}_{j,u}}[\varepsilon _{j,\mathbf {k}}^{i}]^{2}+C L n^{-2}; \end{aligned}$$
(17)

also, for any \(0<\tau <1\), there exists a constant \(C_{\tau }>0\) that depends on \(\tau \) only such that for all \(\mathbf {k}\in B^{i}_{j,u}\)

$$\begin{aligned} \mathbb {E}\,({\hat{\theta }}_{j,\mathbf {k}}^{i}-\breve{\theta }_{j,\mathbf {k}}^{i})^{2}\le C_{\tau }\min \left\{ \max _{\mathbf {k} \in B^{i}_{j,u}}\{(\breve{\theta }_{j,\mathbf {k}}^{i}+\varepsilon _{j,\mathbf {k}}^{i})^{2}\},Ln^{-1}\right\} +n^{-2+\tau }. \end{aligned}$$
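The proof below uses only the risk bounds stated above; nevertheless, for concreteness, here is a sketch of a block-James–Stein rule of the kind defined in (7). Since (7) is not restated in the appendix, the exact shrinkage factor and the noise level \(1/(2h(0)\sqrt{n})\) below are assumptions modeled on Brown et al. (2008).

```python
import numpy as np

def block_threshold(y, L, lam, n, h0=1.0):
    """Block thresholding of one resolution level of detail coefficients.

    Within each block B of length L, the coefficients are shrunk by the
    factor (1 - lam*L*sigma^2/S_B^2)_+, where sigma = 1/(2*h0*sqrt(n)) is
    the noise level of the sequence model, S_B^2 is the block energy, and
    lam plays the role of lambda*.
    """
    y = np.asarray(y, dtype=float)
    sigma2 = 1.0 / (4.0 * h0 ** 2 * n)
    theta = np.zeros_like(y)
    for u in range(0, y.size - y.size % L, L):
        block = y[u:u + L]
        S2 = float(np.sum(block ** 2))
        if S2 > 0.0:
            theta[u:u + L] = max(0.0, 1.0 - lam * L * sigma2 / S2) * block
    return theta
```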

Proof

This proof is similar to the proof of Proposition 2 in Brown et al. (2008); therefore, we only sketch the proof of the inequality (17). First, recall that \(\vert \varepsilon _{\mathbf {l}} \vert \le C\sqrt{\kappa }q^{d/2}T^{-d}\). The discrete wavelet transform of \(\frac{\varepsilon _{\mathbf {l}}}{\sqrt{n}}\) in our case is equal to \(\varepsilon ^{i}_{j,\mathbf {k}}=\sum _{\mathbf {l}\in \mathbb {Z}^{q}}\frac{\varepsilon _{\mathbf {l}}}{\sqrt{n}}\int \phi _{j,\mathbf {l}}\psi _{j,\mathbf {k}}^{i}\). By the orthogonality of the discrete wavelet transform and Proposition 2, \(\sum _{j}\sum _{\mathbf {k}}\sum _{i}[\varepsilon _{j,\mathbf {k}}^{i}]^{2}=\frac{1}{n}\sum _{\mathbf {l} \in \mathbb {Z}^{q}}\varepsilon _{\mathbf {l}}^{2}\le Cq^{d}T^{-2d}\) for some positive constant C. Thus, we have, for any \(r>0\),

$$\begin{aligned} \mathbb {E}\,\vert \xi _{j,\mathbf {k}}^{i}\vert ^{r}\le C_{r}(\kappa n)^{-r/2}+C_{r}\left( \frac{\kappa }{n}\right) ^{r/2}q^{dr/2}T^{-dr}, \end{aligned}$$

and for any \(a>0\)

$$\begin{aligned} P(\vert \xi _{j,\mathbf {k}}^{i}\vert >a)\le C_{r}^{'}(a^{2}\kappa n)^{-r/2}+C_{r}^{'}(a^{2}n T^{2d}/\kappa q^{d})^{-r/2}, \end{aligned}$$
(18)

where \(C_{r}\) and \(C_{r}^{'}\) are constants that do not depend on n. At this point, we use Lemma 2 of Brown et al. (2008) with the number of observations equal to the block size L. For the ith wavelet function, \(i=1,\ldots ,2^{q}-1\), the expected risk over each block is bounded as

$$\begin{aligned}&\mathbb {E}\,\sum _{\mathbf {k}\in B^{i}_{j,u}}({\hat{\theta }}_{j,\mathbf {k}}^{i}-\breve{\theta }_{j,\mathbf {k}}^{i})^{2}\le \min \left\{ 4\sum _{\mathbf {k}\in B^{i}_{j,u}}(\breve{\theta }_{j,\mathbf {k}}^{i})^{2},8\lambda Ln^{-1}\right\} +6\sum _{\mathbf {k}\in B^{i}_{j,u}}(\varepsilon _{j,\mathbf {k}}^{i})^2\\&\quad +\, 2n^{-1}\mathbb {E}\sum _{\mathbf {k}\in B^{i}_{j,u}}(z_{j,\mathbf {k}}^{i}+2\sqrt{n}\xi _{j,\mathbf {k}}^{i})^{2}I\left( \sum _{\mathbf {k}\in B^{i}_{j,u}}(z_{j,\mathbf {k}}^{i}+2\sqrt{n}\xi _{j,\mathbf {k}}^{i})^{2}>\lambda L\right) . \end{aligned}$$

Denote by A the event that all \(\vert \xi _{j,\mathbf {k}}^{i}\vert \) are bounded by \(\frac{1}{2\sqrt{n}L}\), that is,

$$\begin{aligned} A=\{\vert 2\sqrt{n}\xi _{j,\mathbf {k}}^{i}\vert \le L^{-1} \text{ for } \text{ all } \mathbf {k} \in B^{i}_{j,u}\}. \end{aligned}$$

Then, it follows from (18) that, for any \(r\ge 1\), the probability of the complement of A satisfies

$$\begin{aligned} P(A^{c})\le \sum _{\mathbf {k}\in B^{i}_{j,u}}P\left( \vert 2\sqrt{n}\xi _{j,\mathbf {k}}^{i}\vert >L^{-1}\right) \le C_{r}^{'}(L^{-2}\kappa )^{-r/2}+C_{r}^{'}(L^{-2}T^{2d}/\kappa q^{d})^{-r/2}. \end{aligned}$$

Thus, we have

$$\begin{aligned} D&=\mathbb {E}\,\sum _{\mathbf {k}\in B^{i}_{j,u}}(z_{j,\mathbf {k}}^{i}+2\sqrt{n}\xi _{j,\mathbf {k}}^{i})^{2}I\left( \sum _{\mathbf {k}\in B^{i}_{j,u}}(z_{j,\mathbf {k}}^{i}+2\sqrt{n}\xi _{j,\mathbf {k}}^{i})^{2}>\lambda L\right) \\&=\mathbb {E}\,\sum _{\mathbf {k}\in B^{i}_{j,u}}(z_{j,\mathbf {k}}^{i}+2\sqrt{n}\xi _{j,\mathbf {k}}^{i})^{2}I\left( A\cap \left\{ \sum _{\mathbf {k}\in B^{i}_{j,u}}(z_{j,\mathbf {k}}^{i}+2\sqrt{n}\xi _{j,\mathbf {k}}^{i})^{2}>\lambda L\right\} \right) \\&\quad +\,\mathbb {E}\,\sum _{\mathbf {k}\in B^{i}_{j,u}}(z_{j,\mathbf {k}}^{i}+2\sqrt{n}\xi _{j,\mathbf {k}}^{i})^{2}I\left( A^{c}\cap \left\{ \sum _{\mathbf {k}\in B^{i}_{j,u}}(z_{j,\mathbf {k}}^{i}+2\sqrt{n}\xi _{j,\mathbf {k}}^{i})^{2}>\lambda L\right\} \right) \equiv D_{1}+D_{2}. \end{aligned}$$

Recall that \((x+y)^{2}\le 2x^{2}+2y^{2}\) for any x and y. At the next step, we use the inequality from Lemma 3 of Brown et al. (2008) (with \(\tilde{\lambda }=\frac{\lambda L-\lambda -1}{L}\)) and Hölder's inequality to obtain

$$\begin{aligned} D_{1}&=\mathbb {E}\,\sum _{\mathbf {k}\in B^{i}_{j,u}}(z_{j,\mathbf {k}}^{i}+2\sqrt{n}\xi _{j,\mathbf {k}}^{i})^{2}I\left( A\cap \left\{ \sum _{\mathbf {k}\in B^{i}_{j,u}}(z_{j,\mathbf {k}}^{i}+2\sqrt{n}\xi _{j,\mathbf {k}}^{i})^{2}>\lambda L\right\} \right) \\&\le 2\mathbb {E}\,\sum _{\mathbf {k}\in B^{i}_{j,u}}[z_{j,\mathbf {k}}^{i}]^{2}I\left( \sum _{\mathbf {k}\in B^{i}_{j,u}}[z_{j,\mathbf {k}}^{i}]^{2}>\lambda L-\lambda -1\right) \\&\quad +8n\,\mathbb {E}\,\sum _{\mathbf {k}\in B^{i}_{j,u}}[\xi _{j,\mathbf {k}}^{i}]^{2}I\left( \sum _{\mathbf {k}\in B^{i}_{j,u}}[z_{j,\mathbf {k}}^{i}]^{2}>\lambda L-\lambda -1\right) \\&\le 2(\lambda L-\lambda -1)\exp \left( -\frac{L}{2}\left( \tilde{\lambda }-\log \tilde{\lambda }-1\right) \right) \\&\quad +8n\sum _{\mathbf {k}\in B^{i}_{j,u}}(\mathbb {E}\,[\xi _{j,\mathbf {k}}^{i}]^{2v})^{1/v}\left( P\left( \sum _{\mathbf {k}\in B^{i}_{j,u}}[z_{j,\mathbf {k}}^{i}]^{2}>\lambda L-\lambda -1\right) \right) ^{1/\omega }, \end{aligned}$$

where \(v,\omega >1\) and \(\frac{1}{v}+\frac{1}{\omega }=1\). Recall that \(\kappa =n^{1/4}\) and choose \(\frac{1}{\omega }=1-\frac{1}{4}=\frac{3}{4}\). This lets us conclude that \(D_{1}\le CLn^{-1}\). Arguing similarly, we conclude that \(D_{2}\le n^{-1}\), and the final inequality follows. \(\square \)

Remark 5

Note that the tail probability \(P(\vert \xi _{j,\mathbf {k}}^{i}\vert >a)\) must decay faster than any polynomial in n to ensure that the contribution of \(\xi _{j,\mathbf {k}}^{i}\) to the squared risk of the proposed procedure is negligible compared to that of \(z_{j,\mathbf {k}}^{i}\). Recall that \(\kappa =n^{1/4}\). Then, \(\frac{T^{2d}}{\kappa }=n^{\frac{6d-q}{4q}}\), and we have to require that \(6d-q>0\), that is, \(d=\min \left( \alpha -\frac{q}{s},1\right) >\frac{q}{6}\). Since d characterizes the smoothness of the Hölder ball into which the Besov ball is embedded, it is natural to express this requirement in terms of the original smoothness index \(\alpha \). Note that (see Remark 6), due to the approximation error over multivariate Besov spaces, we must have \(\frac{3d}{2q}> \frac{2\alpha }{2\alpha +q}\). To guarantee that \(d>\frac{q}{6}\), we may require that \( \frac{4\alpha q}{3(2\alpha +q)}> \frac{q}{6}\), which is equivalent to \(\alpha > \frac{q}{6}\). This is the origin of the lower bound on \(\alpha \) in the statements of Theorems 2 and 3.
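As a purely illustrative sanity check of the last equivalence, one can verify it exactly with rational arithmetic:

```python
from fractions import Fraction

def lhs(alpha, q):
    # 4*alpha*q / (3*(2*alpha + q)), computed exactly over the rationals
    return (4 * alpha * q) / (3 * (2 * alpha + q))

# The condition 4*alpha*q/(3*(2*alpha+q)) > q/6 holds precisely when alpha > q/6.
for q in range(1, 6):
    for num in range(1, 61):
        alpha = Fraction(num, 12)
        assert (lhs(alpha, q) > Fraction(q, 6)) == (alpha > Fraction(q, 6))
```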

Proof of Theorem 2

First, note that

$$\begin{aligned} \mathbb {E}\,\Vert {\hat{f}}_{n}-f\Vert ^{2}_{2}\le 2\mathbb {E}\,\Vert {\hat{g}}_{n}-g\Vert ^{2}_{2}+2\mathbb {E}\,({\hat{b}}_{\mathbf {l}}-b_{\mathbf {l}})^{2}. \end{aligned}$$

By selecting \(\kappa \asymp n^{1/4}\), we ensure that \(\mathbb {E}\,({\hat{b}}_{\mathbf {l}}-b_{\mathbf {l}})^{2}=o(n^{-2\alpha /(2\alpha +q)})\), so we only need to focus on bounding \(\mathbb {E}\,\Vert {\hat{g}}_{n}-g\Vert ^{2}_{2}\). Note that, if the intercept a and/or the median of the vector \(X_{\mathbf {i}}\) in the model (1) is nonzero, the corresponding term in the model can be estimated at the rate \(n^{-1}=o(n^{-2\alpha /(2\alpha +q)})\). Using the notation of Section 1, let \(A=a+(\mathrm {med}\, X_{\mathbf {i}})^{'}\beta \) and let \({\hat{A}}_{n}\) be an asymptotically normal \(\sqrt{n}\)-consistent estimator of A. Such an estimator can easily be obtained as in Brown et al. (2016). In that case, we have \(\mathbb {E}\,\Vert {\hat{f}}_{n}-f\Vert ^{2}_{2}\le 2\mathbb {E}\,\Vert {\hat{g}}_{n}-g\Vert ^{2}_{2}+2\mathbb {E}\,({\hat{b}}_{\mathbf {l}}-b_{\mathbf {l}})^{2}+2\mathbb {E}\,({\hat{A}}_{n}-A)^{2}\), where \(\mathbb {E}\,({\hat{A}}_{n}-A)^{2}=o(n^{-2\alpha /(2\alpha +q)})\). Since the functions f and g differ only by the constant \(b_{\mathbf {l}}\), their wavelet coefficients coincide, that is, \(\theta _{j,\mathbf {k}}^{i}=\int _{[0,1]^{q}} f\psi _{j,\mathbf {k}}^{i}=\int _{[0,1]^{q}}g\psi _{j,\mathbf {k}}^{i}\). To streamline the analysis, we expand \(\mathbb {E}\,\Vert {\hat{g}}_{n}-g\Vert ^{2}_{2}\) as follows:

$$\begin{aligned} \mathbb {E}\,\Vert {\hat{g}}_{n}-g\Vert ^{2}_{2}&=\sum _{\mathbf {1} \le \mathbf {k}\le \mathbf {2^{j_0}}} \mathbb {E}\,[\hat{\theta }_{j_{0},\mathbf {k}}-\theta _{j_{0},\mathbf {k}}]^{2}+\sum _{j=j_{0}}^{J-1}\sum _{\mathbf {1} \le \mathbf {k}\le \mathbf {2^{j}}} \sum _{i=1}^{2^{q}-1}\mathbb {E}\,(\hat{\theta }_{j,\mathbf {k}}^{i}-\theta _{j,\mathbf {k}}^{i})^{2}\\&\quad +\sum _{j=J}^{\infty }\sum _{\mathbf {1} \le \mathbf {k}\le \mathbf {2^{j}}}\sum _{i=1}^{2^{q}-1}(\theta _{j,\mathbf {k}}^{i} )^{2}\equiv S_{1}+S_{2}+S_{3}. \end{aligned}$$

First, we note that the term \(S_{1}\) is asymptotically small. Indeed, by definition, \({\hat{\theta }}_{j_{0},\mathbf {k}}=y_{j_{0},\mathbf {k}}\). Since \({\hat{\theta }}_{j_{0},\mathbf {k}}-\theta _{j_{0},\mathbf {k}}=(y_{j_{0},\mathbf {k}}-\breve{\theta }_{j_{0},\mathbf {k}})+(\breve{\theta }_{j_{0},\mathbf {k}}-\theta _{j_{0},\mathbf {k}})\), we have

$$\begin{aligned} S_{1}\le C\cdot 2^{j_{0}q}n^{-1}\varepsilon ^{2}+CT^{-2d}=o(n^{-2\alpha /(2\alpha +q)}). \end{aligned}$$

The term \(S_{3}\) is also asymptotically small. To show this, note first that \(2^{j\left( \alpha +q\left( \frac{1}{2}-\frac{1}{s}\right) \right) }\left( \sum _{\mathbf {1} \le \mathbf {k}\le \mathbf {2^{j}}}\sum _{i=1}^{2^{q}-1}|\theta _{j,\mathbf {k}}^{i}|^s\right) ^{1/s}\le M\) for any function \(f\in B^{\alpha }_{s,t}(M)\). Then, using the inequality \(\Vert x\Vert _{p_{2}}\le \Vert x\Vert _{p_{1}}\le q^{1/p_{1}-1/p_{2}}\Vert x\Vert _{p_{2}}\) for any \(0<p_{1}\le p_{2}\le \infty \) and \(x\in {\mathbb {R}}^{q}\), we obtain

$$\begin{aligned} S_{3}\le C2^{-2J\min \left( \alpha ,\alpha +q\left( \frac{1}{2}-\frac{1}{s}\right) \right) }=o(n^{-2\alpha /(2\alpha +q)}) \end{aligned}$$

due to the assumptions on J and \(\alpha \). In the next step, we use Proposition 4 to analyze the term \(S_{2}\), and we find that

$$\begin{aligned} S_{2}&\le 2\sum _{j=j_{0}}^{J-1}\sum _{\mathbf {1} \le \mathbf {k}\le \mathbf {2^{j}}}\sum _{i=1}^{2^{q}-1}\mathbb {E}\,(\hat{\theta }^{i}_{j,\mathbf {k}}-\breve{\theta }_{j,\mathbf {k}}^{i})^{2}+2\sum _{j=j_{0}}^{J-1}\sum _{\mathbf {1} \le \mathbf {k}\le \mathbf {2^{j}}}\sum _{i=1}^{2^{q}-1}(\breve{\theta }_{j,\mathbf {k}}^{i}-\theta ^{i}_{j,\mathbf {k}})^{2}\\&\le \sum _{j=j_{0}}^{J-1}\sum _{u=1}^{2^{qj}/L}\sum _{i=1}^{2^{q}-1}\min \left\{ 8\sum _{\mathbf {k}\in B^{i}_{j,u}}[\breve{\theta }_{j,\mathbf {k}}^{i}]^{2},8\lambda ^{*}Ln^{-1}\right\} +6\sum _{j=j_{0}}^{J-1}\sum _{\mathbf {1}\le \mathbf {k}\le \mathbf {2^{j}}}\sum _{i=1}^{2^{q}-1}[\varepsilon _{j,\mathbf {k}}^{i}]^{2}\\&\quad +\,Cn^{-1}+2\sum _{j=j_{0}}^{J-1}\sum _{\mathbf {1} \le \mathbf {k}\le \mathbf {2^{j}}}\sum _{i=1}^{2^{q}-1}[\breve{\theta }_{j,\mathbf {k}}^{i}-\theta _{j,\mathbf {k}}^{i}]^{2}\\&\le \sum _{j=j_{0}}^{J-1}\sum _{u=1}^{2^{qj}/L}\sum _{i=1}^{2^{q}-1}\min \left\{ 8\sum _{\mathbf {k}\in B^{i}_{j,u}}[\breve{\theta }_{j,\mathbf {k}}^{i}]^{2},8\lambda ^{*}Ln^{-1}\right\} +Cn^{-1}+CT^{-2d}. \end{aligned}$$

At this point, we consider two different cases. First, let \(s\ge 2\). Select \(J_{1}=\left\lfloor \frac{q}{2\alpha +q}\log _{2}n\right\rfloor \), which implies that \(2^{J_{1}}\approx n^{q/(2\alpha +q)}\). Thus, using the result of Lemma 2, we obtain

$$\begin{aligned} S_{2}&\le 8\lambda ^{*} \sum _{j=j_{0}}^{J_{1}-1}\sum _{u=1}^{2^{qj}/L}\sum _{i=1}^{2^{q}-1}Ln^{-1}\nonumber \\&\quad +\,8\sum _{j=J_{1}}^{J-1}\sum _{\mathbf {1} \le \mathbf {k} \le \mathbf {2^{j}}}\sum _{i=1}^{2^{q}-1}[\breve{\theta }_{j,\mathbf {k}}^{i}]^{2}+Cn^{-1}+CT^{-2d}\le Cn^{-2\alpha /(2\alpha +q)}. \end{aligned}$$

Next, consider the case \(s<2\). First, note that

$$\begin{aligned} \sum _{u=1}^{2^{jq}/L}\sum _{i=1}^{2^{q}-1}\left( \sum _{\mathbf {k}\in B^{i}_{j,u}}[\theta _{j,\mathbf {k}}^{i}]^{2}\right) ^{s/2} \le \sum _{\mathbf {1} \le \mathbf {k}\le \mathbf {2}^{j}}([\theta _{j,\mathbf {k}}^{i}]^{2})^{s/2}\le M2^{-jws}. \end{aligned}$$

Select \(J_{2}\) such that \(2^{J_{2}}\asymp n^{1/(2\alpha +q)}(\log n)^{(2-s)/[s(2\alpha +q)+2(1-q)]}\). Using Lemma 6 from Brown et al. (2008), one obtains

$$\begin{aligned} \sum _{j=J_{2}}^{J-1}\sum _{u=1}^{2^{jq}/L}\min \left\{ 8\sum _{\mathbf {k}\in B^{i}_{j,u}}[\breve{\theta }_{j,\mathbf {k}}^{i}]^{2},8\lambda ^{*}Ln^{-1}\right\} \le Cn^{-2\alpha /(2\alpha +q)}(\log n)^{(2-s)/[s(2\alpha +q)+2(1-q)]}. \end{aligned}$$

On the other hand, we also have

$$\begin{aligned}&\sum _{j=j_{0}}^{J_{2}-1}\sum _{u=1}^{2^{qj}/L}\min \left\{ 8\sum _{\mathbf {k}\in B^{i}_{j,u}}[\breve{\theta }_{j,\mathbf {k}}^{i}]^{2},8\lambda ^{*}Ln^{-1}\right\} \\&\quad \le \sum _{j=j_{0}}^{J_{2}-1}\sum _{u=1}^{2^{qj}/L}8\lambda ^{*}Ln^{-1}\le Cn^{-2\alpha /(2\alpha +q)}(\log n)^{(2-s)/[s(2\alpha +q)+2(1-q)]}. \end{aligned}$$

Thus, we can now confirm that the \(L_{2}\) risk in the case \(s<2\) is bounded from above uniformly as

$$\begin{aligned} \mathbb {E}\,\Vert {\hat{f}}_{n}-f\Vert ^{2}_{2}\le Cn^{-2\alpha /(2\alpha +q)}(\log n)^{(2-s)/[s(2\alpha +q)+2(1-q)]}. \end{aligned}$$

\(\square \)

Remark 6

In order to ensure that the risk of \({\hat{b}}_{\mathbf {l}}\) is negligible, we need \(\kappa ^{-4}=o(n^{-2\alpha /(2\alpha +q)})\); note that \(\kappa =n^{1/4}\) satisfies this requirement. Also, to make the approximation error \(\Vert f_{J}-f\Vert ^{2}_{2}\) negligible, we need \(T^{-2d}=O(n^{-2\alpha /(2\alpha +q)})\). It is easy to see that this is guaranteed by the inequality \(\frac{3d}{2q}>\frac{2\alpha }{2\alpha +q}\). The latter, rather ponderous assumption is needed because of the approximation over q-dimensional Besov spaces.

Proof of Theorem 3

As in the proof of Theorem 2, and without loss of generality, we can assume that \(\mathrm {med}(X_{\mathbf {i}})\) is identically equal to zero; if this is not the case, the additional term can be estimated at the rate \(n^{-1}=o\left( \left( \frac{\log n}{n}\right) ^{\alpha /(2\alpha +q)}\right) \). Next, note that for all \(f\in \Lambda ^{\alpha }(M)\), the absolute values of the wavelet coefficients satisfy \(|\theta _{j,\mathbf {k}}^{i}|=|\langle f, \psi _{j,\mathbf {k}}^{i}\rangle |\le C2^{-j(q/2+\alpha )}\) for some constant \(C>0\) that does not depend on f. Also, note that for any random variables \(X_{i}\), \(i=1,\ldots ,n\), \(\mathbb {E}\,\left( \sum _{i=1}^{n}X_{i}\right) ^{2}\le \left( \sum _{i=1}^{n}(\mathbb {E}\,X_{i}^{2})^{1/2}\right) ^{2}\). Then, we have

$$\begin{aligned} \mathbb {E}\,({\hat{f}}_{n}(\mathbf {u}_{0})-f(\mathbf {u}_{0}))^{2}&=\mathbb {E}\,\Big [\sum _{\mathbf {1} \le \mathbf {k}\le \mathbf {2}^{j_{0}}}({\hat{\theta }}_{j_{0},\mathbf {k}}-\theta _{j_{0},\mathbf {k}})\phi _{j_{0},\mathbf {k}}(\mathbf {u}_{0})\\&\quad +\,\sum _{j=j_{0}}^{\infty }\sum _{\mathbf {1} \le \mathbf {k}\le \mathbf {2}^{j}}\sum _{i=1}^{2^{q}-1}({\hat{\theta }}_{j,\mathbf {k}}^{i}-\theta _{j,\mathbf {k}}^{i})\psi _{j,\mathbf {k}}^{i}(\mathbf {u}_{0})-({\hat{b}}_{\mathbf {l}}-b_{\mathbf {l}})\Big ]^{2}\\&\le \Big [(\mathbb {E}\,({\hat{b}}_{\mathbf {l}}-b_{\mathbf {l}})^{2})^{1/2}+\sum _{\mathbf {1} \le \mathbf {k}\le \mathbf {2}^{j_{0}}}(\mathbb {E}\,({\hat{\theta }}_{j_{0},\mathbf {k}}-\theta _{j_{0},\mathbf {k}})^{2}\phi _{j_{0},\mathbf {k}}^{2}(\mathbf {u}_{0}))^{1/2}\\&\quad +\sum _{j=j_{0}}^{J-1}\sum _{\mathbf {1} \le \mathbf {k}\le \mathbf {2}^{j}}\sum _{i=1}^{2^{q}-1}(\mathbb {E}\,({\hat{\theta }}_{j,\mathbf {k}}^{i}-\theta _{j,\mathbf {k}}^{i})^{2}[\psi _{j,\mathbf {k}}^{i}]^{2}(\mathbf {u}_{0}))^{1/2}\\&\quad +\sum _{j=J}^{\infty }\sum _{\mathbf {1} \le \mathbf {k}\le \mathbf {2}^{j}}\sum _{i=1}^{2^{q}-1}|\theta _{j,\mathbf {k}}^{i}\psi _{j,\mathbf {k}}^{i}(\mathbf {u}_{0})|\Big ]^{2}\\&\equiv (Q_{1}+Q_{2}+Q_{3}+Q_{4})^{2}. \end{aligned}$$

First of all, note that, due to Proposition 3, we have \(Q_{1} =(\mathbb {E}\,({\hat{b}}_{\mathbf {l}}-b_{\mathbf {l}})^{2})^{1/2}=o(n^{-\alpha /(2\alpha +q)})\). Next, clearly, \(Q_{2}=\sum _{\mathbf {1} \le \mathbf {k}\le \mathbf {2}^{j_{0}}}(\mathbb {E}\,({\hat{\theta }}_{j_{0},\mathbf {k}}-\theta _{j_{0},\mathbf {k}})^{2}\phi ^{2}_{j_{0},\mathbf {k}}(\mathbf {u}_{0}))^{1/2}=O(n^{-1/2})\). Recall that, for any sequence of translated and rescaled ith wavelets \(\psi _{j,\mathbf {k}}^{i}\), at most N of them are nonvanishing at the point \(\mathbf {u}_{0}\); here, N is the length of the support of \(\psi ^{i}\). Formally, the set \(K(\mathbf {u}_{0},j)=\{\mathbf {k}:\psi _{j,\mathbf {k}}^{i}(\mathbf {u}_{0})\ne 0\}\) satisfies \(\vert K(\mathbf {u}_{0},j) \vert \le N\). Thus, we have

$$\begin{aligned} Q_{4}=\sum _{j=J}^{\infty }\sum _{\mathbf {1} \le \mathbf {k}\le \mathbf {2}^{j}}\sum _{i=1}^{2^{q}-1}|\theta _{j,\mathbf {k}}^{i}||\psi _{j,\mathbf {k}}^{i}(\mathbf {u}_{0})|\le \sum _{j=J}^{\infty }N 2^{q}\Vert \psi \Vert _{\infty }2^{jq/2}C2^{-j(q/2+\alpha )}\le C T^{-\alpha }. \end{aligned}$$
(19)

Finally, if we select a sufficiently small \(\tau \) and use the second inequality of Proposition 4, we have

$$\begin{aligned} Q_{3}&\le \sum _{j=j_{0}}^{J-1}\sum _{\mathbf {k}\in K(\mathbf {u}_{0},j)}\sum _{i=1}^{2^{q}-1}2^{jq/2}\Vert \psi \Vert _{\infty }(\mathbb {E}\,({\hat{\theta }}_{j,\mathbf {k}}^{i}-\theta _{j,\mathbf {k}}^{i})^{2})^{1/2}\\&\le C\sum _{j=j_{0}}^{J-1}2^{jq/2}\left[ \min (2^{-j(q+2\alpha )}+T^{-2\alpha \wedge 1}2^{-jq},Ln^{-1})+n^{-2+\tau }\right] ^{1/2}\\&\le C\left( \frac{\log n}{n}\right) ^{\alpha /(2\alpha +q)}. \end{aligned}$$

The statement of the theorem is then obtained by combining the inequalities for \(Q_{1}\), \(Q_{2}\), \(Q_{3}\), and \(Q_{4}\). \(\square \)

About this article

Cite this article

Levine, M. Robust functional estimation in the multivariate partial linear model. Annals of the Institute of Statistical Mathematics, 71, 743–770 (2019). https://doi.org/10.1007/s10463-018-0661-1