Abstract
Variable selection and dimension reduction have been considered in nonparametric regression for improving the precision of estimation, via the formulation of a semiparametric multiple index model. However, most existing methods are ill-equipped to cope with a high-dimensional setting where the number of variables may grow exponentially fast with sample size. We propose a new procedure for simultaneous variable selection and dimension reduction in high-dimensional nonparametric regression problems. It consists essentially of penalised local polynomial regression, with the bandwidth matrix regularised to facilitate variable selection, dimension reduction and optimal estimation at the oracle convergence rate, all in one go. Unlike most existing methods, the proposed procedure does not require explicit bandwidth selection or an additional step of dimension determination using techniques like cross-validation or principal components. Empirical performance of the procedure is illustrated with both simulated and real data examples.
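The abstract's central device is a local polynomial fit with one bandwidth per coordinate, where driving a bandwidth to a very large value effectively removes that variable from the fit. A minimal sketch of that idea follows; the function `local_linear` and the toy data are illustrative assumptions, not the paper's penalised estimator, which regularises the whole bandwidth matrix jointly.

```python
import numpy as np

def local_linear(x0, X, y, h):
    """Local linear estimate of m(x0) with a product Gaussian kernel and
    per-coordinate bandwidths h; a huge h[d] makes the kernel weights
    (and hence the fit) essentially ignore coordinate d."""
    U = (X - x0) / h                                   # scaled differences
    w = np.exp(-0.5 * np.sum(U * U, axis=1))           # product-kernel weights
    Z = np.hstack([np.ones((X.shape[0], 1)), X - x0])  # local design matrix
    WZ = Z * w[:, None]
    beta, *_ = np.linalg.lstsq(WZ.T @ Z, WZ.T @ y, rcond=None)
    return beta[0]                                     # intercept = estimate of m(x0)

# y depends on X[:, 0] only; an enormous bandwidth on the second
# coordinate effectively deselects it.
rng = np.random.default_rng(0)
X = rng.uniform(-2.0, 2.0, size=(500, 2))
y = np.sin(X[:, 0])                                    # noiseless, for clarity
est = local_linear(np.array([0.5, 0.0]), X, y, np.array([0.3, 1e6]))
```

With the second bandwidth set to `1e6`, `est` is close to `sin(0.5)`, as a one-dimensional local linear fit on the first coordinate would give.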
References
Allen, G.I.: Automatic feature selection via weighted kernels and regularization. J. Comput. Graph. Stat. 22, 284–299 (2013)
Chen, X., Zou, C., Cook, R.D.: Coordinate-independent sparse sufficient dimension reduction and variable selection. Ann. Stat. 38, 3696–3723 (2010)
Chen, J., Zhang, C., Kosorok, M.R., Liu, Y.: Double sparsity kernel learning with automatic variable selection and data extraction. Stat. Interface 11, 401–420 (2018)
Conn, D., Li, G.: An oracle property of the Nadaraya–Watson kernel estimator for high dimensional nonparametric regression. Scand. J. Stat. 46, 735–764 (2019)
Cook, R.D., Li, B.: Dimension reduction for conditional mean in regression. Ann. Stat. 30, 455–474 (2002)
Cook, R.D., Weisberg, S.: Comment on “Sliced inverse regression for dimension reduction”. J. Am. Stat. Assoc. 86, 328–332 (1991)
Fan, J., Gijbels, I.: Local Polynomial Modelling and Its Applications. Chapman and Hall, London (1996)
Giordano, F., Lahiri, S.N., Parrella, M.L.: GRID: A variable selection and structure discovery method for high dimensional nonparametric regression. Ann. Stat. 48, 1848–1874 (2020)
Jiang, R., Qian, W.M., Zhou, Z.G.: Single-index composite quantile regression with heteroscedasticity and general error distributions. Stat. Pap. 57, 185–203 (2016)
Lafferty, J., Wasserman, L.: Rodeo: sparse, greedy nonparametric regression. Ann. Stat. 36, 28–63 (2008)
Li, K.C.: Sliced inverse regression for dimension reduction. J. Am. Stat. Assoc. 86, 316–327 (1991)
Li, K.C.: On principal Hessian directions for data visualization and dimension reduction: another application of Stein’s lemma. J. Am. Stat. Assoc. 87, 1025–1039 (1992)
Li, L.: Sparse sufficient dimension reduction. Biometrika 94, 603–613 (2007)
Li, B., Dong, Y.: Dimension reduction for nonelliptically distributed predictors. Ann. Stat. 37, 1272–1298 (2009)
Li, L., Cook, R.D., Nachtsheim, C.J.: Model-free variable selection. J. R. Stat. Soc. Ser. B 67, 285–299 (2005)
Rekabdarkolaee, H.M., Wang, Q.: Variable selection through adaptive MAVE. Stat. Probab. Lett. 128, 44–51 (2017)
van der Vaart, A.W., Wellner, J.A.: Weak Convergence and Empirical Processes. Springer, New York (1996)
Wang, Q., Yin, X.: A nonlinear multi-dimensional variable selection method for high dimensional data: sparse MAVE. Comput. Stat. Data Anal. 52, 4512–4520 (2008)
Wang, T., Xu, P., Zhu, L.: Penalized minimum average variance estimation. Stat. Sin. 23, 543–569 (2013)
White, K.R., Stefanski, L.A., Wu, Y.: Variable selection in kernel regression using measurement error selection likelihoods. J. Am. Stat. Assoc. 112, 1587–1597 (2017)
Wu, W., Hilafu, H., Xue, Y.: Simultaneous estimation for semi-parametric multi-index models. J. Stat. Comput. Simul. 89, 2354–2372 (2019)
Xia, Y.: Asymptotic distributions for two estimators of the single-index model. Econometric Theory 22, 1112–1137 (2006)
Xia, Y., Tong, H., Li, W.K., Zhu, L.X.: An adaptive estimation of dimension reduction space. J. R. Stat. Soc. Ser. B 64, 363–410 (2002)
Yu, P., Du, J., Zhang, Z.: Single-index partially functional linear regression model. Stat. Pap. 61, 1107–1123 (2020)
Zhang, J.: Estimation and variable selection for partial linear single-index distortion measurement errors models. Stat. Pap. 62, 887–913 (2021)
Zhao, W., Zhang, F., Li, R., Lian, H.: Principal single-index varying-coefficient models for dimension reduction in quantile regression. J. Stat. Comput. Simul. 90, 800–818 (2020)
Zhao, W., Li, R., Lian, H.: High-dimensional quantile varying-coefficient models with dimension reduction. Metrika 85, 1–19 (2022)
Author information
Contributions
Lee designed and directed the research. Cheung wrote the main manuscript and performed the numerical studies. All authors reviewed the manuscript.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendix
I Lemmas and proofs
Consider a fixed subset \(\mathcal {B}\subset \{1,\ldots ,D\}\) with \(|\mathcal {B}|\) bounded, and a bandwidth vector \(\pmb {h}=(h_1,\ldots ,h_D)^\top \in (0,\infty ]^D\) satisfying \(\bar{h}\equiv \max _{d\in \mathcal {B}}h_d=o(1)\) and \(h_d^{-1}=O(1)\) for \(d\notin \mathcal {B}\). Define, for \(\mathcal {T}\in \mathbb {R}^{r\times D}\) and \(i=1,\ldots ,n\), \(w_{i,\mathcal {T}}=Y_i-m_{\mathcal {T}}(\mathcal {T}\pmb {X}_i)\). For \(\pmb {x}\in \mathbb {R}^D\), \(T\in \mathscr {T}^\circ \), \(\pmb {\gamma }\in \mathbb {Z}_+^D\), \(r\ge 1\) and any function g on \(\mathbb {R}^D\), define
We state four technical lemmas before presenting our main proofs.
Lemma 1 establishes asymptotic expansions for the mean and variance of a general kernel-weighted sample average commonly found in local polynomial regression, with index-specific bandwidths set to be \(\pmb {h}\).
Lemma 1
Let \(\pmb {\gamma }\in \pmb {\Gamma }^{2p+1}\), \(T\in \mathscr {T}^\circ \) and \(\pmb {x}\in \mathbb {R}^D\) be fixed. Let g be a \((\Delta (\Vert \pmb {\gamma }_{\mathcal {B}}\Vert _1)+1)\) times differentiable function on \(\mathbb {R}^D\), with \(\mathbb {E}\left[ \Vert T\pmb {X}\Vert _2^{2p+1}|g(T\pmb {X})|\right] <\infty \). Let \(W_1,\ldots ,W_n\) be independent random variables such that \(\mathbb {E}[W_i^r|T\pmb {X}_1,\ldots ,T\pmb {X}_n]=\omega _r\), for \(r=1,2\) and \(i=1,\ldots ,n\). Then we have
and
Proof of Lemma 1
Noting that
(S.1) follows by Taylor expanding \(f_Tg\) in powers of \(\pmb {h}_{\mathcal {B}}\), that is
term-by-term integration and (A2). The result (S.2) follows by noting that
and that \(n^{-1}\omega _1^2=O\left( n^{-1}\pmb {h}_\mathcal {B}^{-\varvec{1}_{\mathcal {B}}}\bar{h}\right) \). \(\square \)
In particular, we deduce by setting \(g(\cdot )\equiv 1\) and \(W_i\equiv 1\) for all i in Lemma 1 that
Since \(\mathcal {K}_{\mathcal {B}}^1(\pmb {x};f_T,\pmb {\gamma },T)\) does not depend on \(\pmb {\gamma }\) for \(\pmb {\gamma }\in \pmb {\Gamma }^p_{\mathcal {B}}\), we can construct, by (A1), an inverse \(\{\breve{\kappa }_{\pmb {\gamma },\pmb {\psi }}(\pmb {x};T):\pmb {\gamma },\pmb {\psi }\in \pmb {\Gamma }^p_{\mathcal {B}}\}\) such that
Setting \(\pmb {Q}=\textrm{diag}(\pmb {h})^{-1}T\), the local coefficient estimators \({\hat{\beta _{\pmb {\gamma }}}}^{-\emptyset }(\pmb {x};\pmb {Q})\) defined in (1) solve the following system of equations for \(\beta _{\pmb {\gamma }}\):
where \(e(a)=\mathbb {I}\{a>0\}-\mathbb {I}\{a<0\}\) for \(a\ne 0\) and \(|e(0)|\le 1\). For \(\pmb {\gamma }\in \pmb {\Gamma }^p\setminus \pmb {\Gamma }^p_{\mathcal {B}}\), \({\hat{\beta _{\pmb {\gamma }}}}^{-\emptyset }(\pmb {x};\pmb {Q})\) is associated with at least one index which has a large bandwidth \(h_d\) with \(h_d^{-1}=O(1)\), and is therefore heavily penalised. The following lemma states the order of such estimators.
Lemma 2
For any \(\pmb {x}\in \mathbb {R}^D\) and \(T=\textrm{diag}(\pmb {h})\pmb {Q}\in \mathscr {T}^\circ \), we have
Proof of Lemma 2
Let \(M_\beta (\pmb {x})=\max \big \{|{\hat{\beta }}^{-\emptyset }_{\pmb {\gamma }}(\pmb {x};\pmb {Q})|:\pmb {\gamma }\in \pmb {\Gamma }^p\setminus \pmb {\Gamma }^p_{\mathcal {B}}\big \}\). For \(\alpha =1\), it follows by minimality of \(\big \{{\hat{\beta }}^{-\emptyset }_{\pmb {\gamma }}(\pmb {x};\pmb {Q}):\pmb {\gamma }\in \pmb {\Gamma }^p\big \}\) that
Thus, (S.4) holds for \(\alpha =1\). For \(\alpha >1\), we have by (S.3) that
For \(\alpha \in (1,2]\), (S.5) reduces to \(M_\beta (\pmb {x})^{\alpha -1} =O_p(n^{-1}D^{-p}+n^{-1}M_\beta (\pmb {x}))\), so that \(M_\beta (\pmb {x})=O_p(n^{-1}D^{-p})\). For \(\alpha >2\), we have by (S.5) either \(M_\beta (\pmb {x})=O_p (n^{-1}D^{-p})\) or \(M_\beta (\pmb {x}) =O_p(n^{-(\alpha -1)/(\alpha -2)}D^{-p})\). It follows that (S.4) also holds for any \(\alpha >1\). \(\square \)
Define, for \(i=1,\ldots ,n\), \(\pmb {\gamma }\in \pmb {\Gamma }^p_{\mathcal {B}}\), \(\pmb {\psi }\in \pmb {\Gamma }^{p+1}_{\mathcal {B}}\) and \(\pmb {x},\pmb {y}\in \mathbb {R}^D\),
The next lemma establishes asymptotic expansions for the regression function estimator \({\hat{\beta _{\varvec{0}}}}^{-\emptyset }(\pmb {x};\pmb {Q})\) obtained by setting the index-specific bandwidths and the direction matrix to be \(\pmb {h}\) and T, respectively.
Lemma 3
For any \(\pmb {x}\in \mathbb {R}^D\) and \(T=\textrm{diag}(\pmb {h})\pmb {Q}\in \mathscr {T}^\circ \), we have
Suppose further that \(|\mathcal {B}|\ge r_0\). Then we have
with
Proof of Lemma 3
Noting \(\sum _{\pmb {\psi }\in (\pmb {\Gamma }^p\backslash \pmb {\Gamma }^p_{\mathcal {B}})}\pmb {h}_\mathcal {B}^{\pmb {\gamma }+\pmb {\psi }_{\mathcal {B}}}\kappa _{\pmb {\psi }}(\pmb {x};T)\beta _{\pmb {\psi }}=O_p(n^{-1})\) and
for any \(\pmb {\gamma }\in \pmb {\Gamma }^p_{\mathcal {B}}\), by (S.3), we have
Applying Lemma 1, we have \(\sum _{i=1}^n\nu _{\varvec{0},i}(\pmb {x};\pmb {Q})w_{i,T_{\mathcal {B}\cdot }}=\Omega _p\big (n^{-1/2}\pmb {h}_{\mathcal {B}}^{-(1/2)\varvec{1}_{\mathcal {B}}}\big )\) and
so that the first expression for \({\hat{\beta _{\varvec{0}}}}^{-\emptyset }(\pmb {x};\pmb {Q})\) follows from (S.7).
Consider next the case \(|\mathcal {B}|\ge r_0\). The expansion (S.6) holds by substituting
in (S.7). That \(R^*_{\varvec{0}}(\pmb {x};\pmb {Q})=\Omega _p(\bar{h}^{p^*}+n^{-1/2}\bar{h}^{p+1}\pmb {h}_\mathcal {B}^{-(1/2)\varvec{1}_{\mathcal {B}}})\) follows from the fact that \(\mathbb {E}\big [R^*_{\varvec{0}}(\pmb {x};\pmb {Q})\big ]=\Omega _p(\bar{h}^{p^*})\) and \(\textrm{Var}(R^*_{\varvec{0}}(\pmb {x};\pmb {Q}))=\Omega _p(n^{-1/2}\bar{h}^{p+1}\pmb {h}_\mathcal {B}^{-(1/2)\varvec{1}_{\mathcal {B}}})\). And \(\sum _{i=1}^n\nu _{\varvec{0},i}(\pmb {x};\pmb {Q})w_{i,T_\mathcal {B\cdot }}=\Omega _p(n^{-1/2}\pmb {h}_{\mathcal {B}}^{-(1/2)\varvec{1}_{\mathcal {B}}})\) follows from the fact that expectation and variance of \(\sum _{i=1}^n\nu _{\varvec{0},i}(\pmb {x};\pmb {Q})w_{i,T_\mathcal {B\cdot }}\) equal to zero and \(\Omega _p(n^{-1}\pmb {h}_{\mathcal {B}}^{-\varvec{1}_{\mathcal {B}}})\) respectively by Lemma 1. \(\square \)
Define, for \(j\notin S_k\) and \(\pmb {\gamma }\in \pmb {\Gamma }_{\mathcal {B}}^p\), \(R^{*-S_k}_{\pmb {\gamma }}\) and \(\nu ^{-S_k}_{\pmb {\gamma },j}\) to be the counterparts of \(R^*_{\pmb {\gamma }}\) and \(\nu _{\pmb {\gamma },j}\), respectively, evaluated on the sample obtained by removing the observations \(\big \{(\pmb {X}_i,Y_i): i \in S_k\big \}\). The next lemma establishes an expansion for the K-fold cross-validated squared prediction error.
Lemma 4
Suppose that \(|\mathcal {B}|\ge r_0\) and (A5) holds. Let \(\tilde{\mathcal {C}}\supset \mathcal {A}\) be a fixed subset in \(\{1,\ldots ,D\}\) with \(|\tilde{\mathcal {C}}|\) bounded. Then we have
uniformly over \(T=\textrm{diag}(\pmb {h})\pmb {Q}\in \mathscr {T}^\circ \) satisfying \(\Vert T_{\mathcal {B},\{d\}}\Vert _2=\Omega _p(\mathbb {I}\{d\in \tilde{\mathcal {C}}\})\), \(d=1,\ldots ,D\), and \(d_{\mathcal {B}}(T)\) sufficiently small, where
Moreover, we have in general
provided that \(m_{T^*}\) is not functionally related to the distribution function of \((\pmb {X},Y)\).
Proof of Lemma 4
Using Lemma 3, the cross-validated squared error has the expansion
To prove (S.8), it suffices to show that \(n_0^{-1}\sum _{i\in S_1}\mathscr {R}_{i,1}(T)^2=\Omega _p\big (\bar{h}^{2p^*}+n^{-1}\pmb {h}_{\mathcal {B}}^{-\varvec{1}_{\mathcal {B}}}+d_{\mathcal {B}}(T)^2\big )\). Note that for \(T,T'\in \mathscr {T}^\circ \) with \(T_{\mathcal {B}\cdot }\ne T'_{\mathcal {B}\cdot }\), \(|\mathscr {R}_{i,1}(T)-\mathscr {R}_{i,1}(T')|/\Vert T_{\mathcal {B}\cdot }-T'_{\mathcal {B}\cdot }\Vert _1\) has uniformly bounded moments. Applying Example 19.7 in, we have, for large n and some sufficiently small constant \(k_0\),
uniformly over \(T\in \mathscr {T}^\circ \). Noting that
we have in general that
Similarly, the tail bound \(\mathbb {P}\big (|n_0^{-1}\sum _{i\in S_1}\mathscr {R}_{i,1}(T)w_{i,T^*}|\ge t\big )\le k_0^{-1}e^{-k_0nt^2/\mathbb {E}[\mathscr {R}_{i,1}(T)^2]}\) implies that
uniformly over \(T\in \mathscr {T}^\circ \). The lemma then follows from (S.9), (S.10) and (S.11). \(\square \)
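Lemma 4 concerns the K-fold cross-validated squared prediction error: the data are split into folds \(S_1,\ldots ,S_K\), the estimator is refitted with each fold held out, and the held-out squared residuals are averaged. A generic sketch of that quantity (the `fit` interface and the trivial mean predictor below are hypothetical, not the paper's estimator):

```python
import numpy as np

def kfold_cv_error(X, y, fit, K=5, seed=0):
    """K-fold cross-validated squared prediction error:
    n^{-1} * sum_k sum_{i in S_k} (y_i - mhat^{-S_k}(x_i))^2,
    where mhat^{-S_k} is fitted without the observations in fold S_k."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    folds = np.array_split(idx, K)
    total = 0.0
    for S_k in folds:
        train = np.setdiff1d(idx, S_k)
        predict = fit(X[train], y[train])   # fit returns a predictor x -> mhat(x)
        preds = np.array([predict(x) for x in X[S_k]])
        total += np.sum((y[S_k] - preds) ** 2)
    return total / len(y)

# A trivial "estimator" predicting the training mean; on constant data
# the cross-validated error is zero.
mean_fit = lambda Xtr, ytr: (lambda x: ytr.mean())
cv_err = kfold_cv_error(np.zeros((20, 2)), np.ones(20), mean_fit)
```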
II Proof of Theorem 1
It follows from (A5) that
Setting \(\pmb {A}=n^{1/(2p^*+r_0)}T^*\), it follows by (S.12) and Lemma 4 that
Define \(r^\dag =\big |\{d:{\hat{h}}_d=o_p(1)\}\big |\) and \( \mathcal {C}=\{d:\Vert \hat{T}_{\{1,\dots ,r^\dag \},\{d\}}\Vert _2=\Omega _p(1)\}\). Using Lemma 3 and comparing the objective values in (2) with \((\mathcal {B},\pmb {Q})\) set to be \(\big (\{1,\ldots ,r_0\},\pmb {A}\big )\) and \(\big (\{1,\ldots ,r^\dag \},\hat{\pmb {Q}}\big )\), respectively, we have
It follows that \(m_{\hat{T}_{\{1,\dots ,r^\dag \}\cdot }}(\hat{T}_{\{1,\dots ,r^\dag \}\cdot }\,\pmb {X})=m_{T^*}(T^*\pmb {X})+o(1)\) almost surely. This happens only if \(r^\dag \ge r_0\) and \(\mathcal {C}\supset \mathcal {A}\). We assume for technical convenience that \(\hat{T}_{\{1,\ldots ,r^\dag \},\mathcal {C}^c}\) is a zero matrix with probability converging to one. This condition can be ensured, for example, by a simple thresholding step to reset the elements in \(\hat{T}_{\hat{\mathcal {I}},\hat{\mathcal {A}}^c}\) to zero. Our empirical experience reveals that the effects of such thresholding are negligibly small. Setting \((\mathcal {B},\tilde{\mathcal {C}})=\big (\{1,\ldots ,r^\dag \},\mathcal {C}\big )\) and writing \(\hat{\mathscr {R}}_{i,k}\) for \(\mathscr {R}_{i,k}\) with \(\pmb {h}\) replaced by \(\hat{\pmb {h}}\), we have
Noting that \(r^\dag \ge r_0\), \(|\mathcal {C}|\ge |\mathcal {A}|\) and \(n^{-2p^*/(2p^*+r_0)}=o\big (\min \{\lambda _1,\lambda _2\}\big )\), (S.13) implies that \(r^\dag =r_0\) and \(\mathcal {C}=\mathcal {A}\). Noting that \(\Vert \hat{\pmb {Q}}_{\cdot d}\Vert _2^{-2p^*}=O_p\big (\max _{d'\le r_0}\hat{h}_{d'}^{2p^*}\big )\) for \(d\in \mathcal {A}\), we have, by (S.8) and (S.13), that
so that \(O_p\big ( n^{-2p^*/(2p^*+r_0)}\big )\ge \Omega _p\big (\max _{d\le r_0}\{\hat{h}^{2p^*}_d\}+n^{-1}\hat{\pmb {h}}_{\{1,\dots ,r_0\}}^{-\varvec{1}_{\{1,\dots ,r_0\}}}\big )>0\) for general \(m_{T^*}\) functionally unrelated to the distribution function of \((\pmb {X},Y)\). This implies \(\hat{h}_d=\Omega _p(n^{-1/(2p^*+r_0)})\) for \(d\le r_0\) and \(d_{\{1,\dots ,r_0\}}({\hat{T}})=O_p(n^{-p^*/(2p^*+r_0)})\), which proves parts (i) and (ii). Part (iv) then follows by noting that \(n^{1/(2p^*+r_0)}=\Omega _p\big (\min _{d'\le r_0}\hat{h}_{d'}^{-1}\big )=O_p\big ( \Vert \hat{\pmb {Q}}_{\cdot d}\Vert _2\big )\) for \(d\in \mathcal {A}\). For \(d>r_0\), we have, by (ii) and (S.13), that
which proves part (iii). Part (v) follows by applying similar arguments to the penalty terms \(\lambda _2\big (1+\Vert \hat{\pmb {Q}}_{\cdot d}\Vert _2^{-2p^*}\big )^{-1}\), \(d\not \in \mathcal {A}\), in (S.13). \(\square \)
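The proof invokes a simple thresholding step that resets small elements of the estimated direction matrix to exactly zero. A minimal sketch of such a hard-threshold rule (the function `hard_threshold`, the cutoff `tau`, and the example matrix are illustrative assumptions, not the authors' choice of threshold):

```python
import numpy as np

def hard_threshold(T, tau):
    """Reset entries of T with magnitude below tau to exactly zero."""
    T = np.asarray(T, dtype=float).copy()
    T[np.abs(T) < tau] = 0.0
    return T

# Small spurious loadings on irrelevant coordinates are removed.
T_hat = np.array([[0.83, 0.01, -0.02],
                  [0.02, 0.95,  0.00]])
T_thr = hard_threshold(T_hat, 0.05)
```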
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Cheung, K.Y., Lee, S.M.S. High-dimensional local polynomial regression with variable selection and dimension reduction. Stat Comput 34, 1 (2024). https://doi.org/10.1007/s11222-023-10308-1