
High-dimensional local polynomial regression with variable selection and dimension reduction

  • Original Paper
  • Published in Statistics and Computing

Abstract

Variable selection and dimension reduction have been considered in nonparametric regression for improving the precision of estimation, via the formulation of a semiparametric multiple index model. However, most existing methods are ill-equipped to cope with a high-dimensional setting where the number of variables may grow exponentially fast with sample size. We propose a new procedure for simultaneous variable selection and dimension reduction in high-dimensional nonparametric regression problems. It consists essentially of penalised local polynomial regression, with the bandwidth matrix regularised to facilitate variable selection, dimension reduction and optimal estimation at the oracle convergence rate, all in one go. Unlike most existing methods, the proposed procedure does not require explicit bandwidth selection or an additional step of dimension determination using techniques like cross-validation or principal components. Empirical performance of the procedure is illustrated with both simulated and real data examples.
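The mechanism described in the abstract can be illustrated with a toy local linear smoother in which each coordinate has its own bandwidth and the local slope coefficients carry a quadratic penalty. This is a minimal sketch, not the authors' estimator: the Gaussian product kernel, the ridge-type penalty `lam` and all numerical values are illustrative assumptions.

```python
import numpy as np

def local_linear_fit(X, y, x0, h, lam=0.0):
    """Local linear estimate of m(x0) with a separate bandwidth per coordinate.

    A coordinate with a very large bandwidth gets an almost constant kernel
    weight, so it drops out of the fit -- the mechanism behind bandwidth-driven
    variable selection.  `lam` is a ridge-type penalty on the local slopes
    (a simplified stand-in for the paper's penalty, not the exact criterion).
    """
    U = (X - x0) / h
    w = np.exp(-0.5 * (U**2).sum(axis=1))          # Gaussian product kernel
    Z = np.hstack([np.ones((len(X), 1)), X - x0])  # intercept + linear terms
    P = lam * np.eye(Z.shape[1])
    P[0, 0] = 0.0                                  # leave the intercept unpenalised
    beta = np.linalg.solve(Z.T @ (w[:, None] * Z) + P, Z.T @ (w * y))
    return beta[0]                                 # intercept estimates m(x0)

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(400, 2))
y = np.sin(2 * X[:, 0]) + 0.05 * rng.standard_normal(400)   # X[:, 1] is irrelevant
est = local_linear_fit(X, y, np.array([0.3, 0.0]), h=np.array([0.2, 1e6]), lam=1e-4)
```

Sending a coordinate's bandwidth to infinity makes its kernel factor essentially constant, so the fit ignores that coordinate; this is how a regularised bandwidth matrix can perform variable selection without a separate selection step.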



References

  • Allen, G.I.: Automatic feature selection via weighted kernels and regularization. J. Comput. Graph. Stat. 22, 284–299 (2013)

  • Chen, X., Zou, C., Cook, R.D.: Coordinate-independent sparse sufficient dimension reduction and variable selection. Ann. Stat. 38, 3696–3723 (2010)

  • Chen, J., Zhang, C., Kosorok, M.R., Liu, Y.: Double sparsity kernel learning with automatic variable selection and data extraction. Stat. Interface 11, 401–420 (2018)

  • Conn, D., Li, G.: An oracle property of the Nadaraya–Watson kernel estimator for high dimensional nonparametric regression. Scand. J. Stat. 46, 735–764 (2019)

  • Cook, R.D., Li, B.: Dimension reduction for conditional mean in regression. Ann. Stat. 30, 455–474 (2002)

  • Cook, R.D., Weisberg, S.: Comment on “Sliced inverse regression for dimension reduction”. J. Am. Stat. Assoc. 86, 328–332 (1991)

  • Fan, J., Gijbels, I.: Local Polynomial Modelling and Its Applications. Chapman and Hall, London (1996)

  • Giordano, F., Lahiri, S.N., Parrella, M.L.: GRID: A variable selection and structure discovery method for high dimensional nonparametric regression. Ann. Stat. 48, 1848–1874 (2020)

  • Jiang, R., Qian, W.M., Zhou, Z.G.: Single-index composite quantile regression with heteroscedasticity and general error distributions. Stat. Pap. 57, 185–203 (2016)

  • Lafferty, J., Wasserman, L.: Rodeo: sparse, greedy nonparametric regression. Ann. Stat. 36, 28–63 (2008)

  • Li, K.C.: Sliced inverse regression for dimension reduction. J. Am. Stat. Assoc. 86, 316–327 (1991)

  • Li, K.C.: On principal Hessian directions for data visualization and dimension reduction: another application of Stein’s lemma. J. Am. Stat. Assoc. 87, 1025–1039 (1992)

  • Li, L.: Sparse sufficient dimension reduction. Biometrika 94, 603–613 (2007)

  • Li, B., Dong, Y.: Dimension reduction for nonelliptically distributed predictors. Ann. Stat. 37, 1272–1298 (2009)

  • Li, L., Cook, R.D., Nachtsheim, C.J.: Model-free variable selection. J. R. Stat. Soc. Ser. B 67, 285–299 (2005)

  • Rekabdarkolaee, H.M., Wang, Q.: Variable selection through adaptive MAVE. Stat. Probab. Lett. 128, 44–51 (2017)

  • van der Vaart, A.W., Wellner, J.A.: Weak Convergence and Empirical Processes. Springer, New York (1996)

  • Wang, Q., Yin, X.: A nonlinear multi-dimensional variable selection method for high dimensional data: sparse MAVE. Comput. Stat. Data Anal. 52, 4512–4520 (2008)

  • Wang, T., Xu, P., Zhu, L.: Penalized minimum average variance estimation. Stat. Sin. 23, 543–569 (2013)

  • White, K.R., Stefanski, L.A., Wu, Y.: Variable selection in kernel regression using measurement error selection likelihoods. J. Am. Stat. Assoc. 112, 1587–1597 (2017)

  • Wu, W., Hilafu, H., Xue, Y.: Simultaneous estimation for semi-parametric multi-index models. J. Stat. Comput. Simul. 89, 2354–2372 (2019)

  • Xia, Y.: Asymptotic distributions for two estimators of the single-index model. Econometric Theory 22, 1112–1137 (2006)

  • Xia, Y., Tong, H., Li, W.K., Zhu, L.X.: An adaptive estimation of dimension reduction space. J. R. Stat. Soc. Ser. B 64, 363–410 (2002)

  • Yu, P., Du, J., Zhang, Z.: Single-index partially functional linear regression model. Stat. Pap. 61, 1107–1123 (2020)

  • Zhang, J.: Estimation and variable selection for partial linear single-index distortion measurement errors models. Stat. Pap. 62, 887–913 (2021)

  • Zhao, W., Zhang, F., Li, R., Lian, H.: Principal single-index varying-coefficient models for dimension reduction in quantile regression. J. Stat. Comput. Simul. 90, 800–818 (2020)

  • Zhao, W., Li, R., Lian, H.: High-dimensional quantile varying-coefficient models with dimension reduction. Metrika 85, 1–19 (2022)


Author information


Contributions

Lee designed and directed the research. Cheung wrote the main manuscript and performed the numerical studies. All authors reviewed the manuscript.

Corresponding author

Correspondence to Kin Yap Cheung.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix

I. Lemmas and proofs

Consider a fixed subset \(\mathcal {B}\subset \{1,\ldots ,D\}\) with \(|\mathcal {B}|\) bounded, and a bandwidth vector \(\pmb {h}=(h_1,\ldots ,h_D)^\top \in (0,\infty ]^D\) satisfying \(\bar{h}\equiv \max _{d\in \mathcal {B}}h_d=o(1)\) and \(h_d^{-1}=O(1)\) for \(d\notin \mathcal {B}\). Define, for \(\mathcal {T}\in \mathbb {R}^{r\times D}\) and \(i=1,\ldots ,n\), \(w_{i,\mathcal {T}}=Y_i-m_{\mathcal {T}}(\mathcal {T}\pmb {X}_i)\). For \(\pmb {x}\in \mathbb {R}^D\), \(T\in \mathscr {T}^\circ \), \(\pmb {\gamma }\in \mathbb {Z}_+^D\), \(r\ge 1\) and any function g on \(\mathbb {R}^D\), define

$$\begin{aligned} \kappa _{\pmb {\gamma }}(\pmb {x};T)&=n^{-1}\pmb {h}_\mathcal {B}^{-\pmb {\gamma }_\mathcal {B}}\pmb {h}_{\mathcal {B}^c}^{\varvec{1}_{\mathcal {B}^c}}\sum _{i=1}^n(T\pmb {X}_i-T\pmb {x})^{\pmb {\gamma }}K_{\textrm{diag}(\pmb {h})^{-1}T}(\pmb {X}_i-\pmb {x}),\\ \mathcal {K}_{\mathcal {B}}^r(\pmb {x};g,\pmb {\gamma },T)&=\int g(T_{\mathcal {B}\cdot }\,\pmb {x},T_{\mathcal {B}^c\cdot }\,\pmb {x}+\pmb {u}_{\mathcal {B}^c})\,\pmb {u}_{\mathcal {B}^c}^{\pmb {\gamma }_{\mathcal {B}^c}}\prod _{d\in \mathcal {B}^c}K(u_d/h_d)^r\,d\pmb {u}_{\mathcal {B}^c}. \end{aligned}$$

We state four technical lemmas before presenting our main proofs.
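As a numerical sanity check on the definition of \(\kappa _{\pmb {\gamma }}\), consider the simplest case \(D=1\), \(\mathcal {B}=\{1\}\), \(T=1\), with a Gaussian kernel and \(X\sim U(0,1)\): then \(\kappa _0\rightarrow f(x)\), \(\kappa _1\rightarrow 0\) and \(\kappa _2\rightarrow f(x)\mu _{1,2}=1\). The normalisation \(K_h(u)=K(u/h)/h\) used below is an assumed convention for this sketch, not necessarily the paper's.

```python
import numpy as np

rng = np.random.default_rng(1)
n, h, x0 = 200_000, 0.05, 0.5
X = rng.uniform(0, 1, n)        # density f(x0) = 1 at the interior point x0
U = (X - x0) / h
Kh = np.exp(-0.5 * U**2) / (np.sqrt(2 * np.pi) * h)   # Gaussian K, K_h(u) = K(u/h)/h

def kappa(gamma):
    # n^{-1} h^{-gamma} * sum_i (X_i - x0)^gamma K_h(X_i - x0)
    return np.mean(U**gamma * Kh)

k0, k1, k2 = kappa(0), kappa(1), kappa(2)
# Limits: f(x0) * mu_{1,0} = 1;  O(h) since f' = 0;  f(x0) * mu_{1,2} = 1
```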

Lemma 1 establishes asymptotic expansions for the mean and variance of a general kernel-weighted sample average of the type that arises in local polynomial regression, with the index-specific bandwidths set to \(\pmb {h}\).

Lemma 1

Let \(\pmb {\gamma }\in \pmb {\Gamma }^{2p+1}\), \(T\in \mathscr {T}^\circ \) and \(\pmb {x}\in \mathbb {R}^D\) be fixed. Let g be a \((\Delta (\Vert \pmb {\gamma }_{\mathcal {B}}\Vert _1)+1)\) times differentiable function on \(\mathbb {R}^D\), with \(\mathbb {E}\left[ \Vert T\pmb {X}\Vert _2^{2p+1}|g(T\pmb {X})|\right] <\infty \). Let \(W_1,\ldots ,W_n\) be independent random variables such that \(\mathbb {E}[W_i^r|T\pmb {X}_1,\ldots ,T\pmb {X}_n]=\omega _r\), for \(r=1,2\) and \(i=1,\ldots ,n\). Then we have

$$\begin{aligned}&\mathbb {E}\Big [n^{-1}\pmb {h}_\mathcal {B}^{-\pmb {\gamma }_\mathcal {B}}\pmb {h}_{\mathcal {B}^c}^{\varvec{1}_{\mathcal {B}^c}}\sum _{i=1}^n(T\pmb {X}_i-T\pmb {x})^{\pmb {\gamma }}K_{\textrm{diag}(\pmb {h})^{-1}T}(\pmb {X}_i-\pmb {x})\,g(T\pmb {X}_i)W_i\Big ]\\&\quad =\omega _1\sum _{\pmb {\psi }\in \pmb {\Gamma }^{\Delta (\Vert \pmb {\gamma }_{\mathcal {B}}\Vert _1)}_{\mathcal {B}}}\pmb {h}_\mathcal {B}^{\pmb {\psi }_\mathcal {B}}\mathcal {K}_{\mathcal {B}}^1(\pmb {x};\nabla _{\pmb {\psi }}(f_Tg),\pmb {\gamma },T)\prod _{d\in \mathcal {B}}\mu _{1,\gamma _d+\psi _d}+O\left( \omega _1\bar{h}^{\Delta (\Vert \pmb {\gamma }_{\mathcal {B}}\Vert _1)+1}\right) , \end{aligned}$$
(S.1)

and

$$\begin{aligned}&\textrm{Var}\Big (n^{-1}\pmb {h}_\mathcal {B}^{-\pmb {\gamma }_\mathcal {B}}\pmb {h}_{\mathcal {B}^c}^{\varvec{1}_{\mathcal {B}^c}}\sum _{i=1}^n(T\pmb {X}_i-T\pmb {x})^{\pmb {\gamma }}K_{\textrm{diag}(\pmb {h})^{-1}T}(\pmb {X}_i-\pmb {x})\,g(T\pmb {X}_i)W_i\Big )\\&\quad =n^{-1}\pmb {h}_\mathcal {B}^{-\varvec{1}_{\mathcal {B}}}\omega _2\,\mathcal {K}_{\mathcal {B}}^2(\pmb {x};f_Tg^2,2\pmb {\gamma },T)\prod _{d\in \mathcal {B}}\mu _{2,2\gamma _d}\left\{ 1+O\left( \bar{h}\right) \right\} . \end{aligned}$$
(S.2)

Proof of Lemma 1

Noting that

$$\begin{aligned}&\mathbb {E}\left[ (T\pmb {X}_1-T\pmb {x})^{\pmb {\gamma }}K_{\textrm{diag}(\pmb {h})^{-1}T}(\pmb {X}_1-\pmb {x})g(T\pmb {X}_1)W_1\right] \\&\quad =\omega _1\pmb {h}_\mathcal {B}^{\pmb {\gamma }_\mathcal {B}}\pmb {h}_{\mathcal {B}^c}^{-\varvec{1}_{\mathcal {B}^c}}\int (f_Tg)(T_{\mathcal {B}\cdot }\pmb {x}+\textrm{diag}(\pmb {h}_{\mathcal {B}})\pmb {u}_{\mathcal {B}},T_{\mathcal {B}^c\cdot }\pmb {x}+\pmb {u}_{\mathcal {B}^c})\Big \{\pmb {u}_\mathcal {B}^{\pmb {\gamma }_\mathcal {B}}\prod _{d\in \mathcal {B}}K(u_d)\Big \}\Big \{\pmb {u}_{\mathcal {B}^c}^{\pmb {\gamma }_{\mathcal {B}^c}}\prod _{d\in \mathcal {B}^c}K(u_d/h_d)\Big \}d\pmb {u}, \end{aligned}$$

(S.1) follows by Taylor expanding \(f_Tg\) in powers of \(\pmb {h}_{\mathcal {B}}\), that is

$$\begin{aligned}&(f_Tg)(T_{\mathcal {B}\cdot }\pmb {x}+\textrm{diag}(\pmb {h}_{\mathcal {B}})\pmb {u}_{\mathcal {B}},T_{\mathcal {B}^c\cdot }\pmb {x}+\pmb {u}_{\mathcal {B}^c})\\&\quad =\sum _{\pmb {\psi }\in \pmb {\Gamma }^{\Delta (\Vert \pmb {\gamma }_{\mathcal {B}}\Vert _1)}_{\mathcal {B}}}\frac{1}{\pmb {\psi }!}\nabla _{\pmb {\psi }}(f_Tg)(T_{\mathcal {B}\cdot }\pmb {x},T_{\mathcal {B}^c\cdot }\pmb {x}+\pmb {u}_{\mathcal {B}^c})\,\pmb {h}_{\mathcal {B}}^{\pmb {\psi }_{\mathcal {B}}}\pmb {u}_{\mathcal {B}}^{\pmb {\psi }_{\mathcal {B}}}\\&\qquad +\sum _{\pmb {\psi }\in \pmb {\Gamma }^{\Delta (\Vert \pmb {\gamma }_{\mathcal {B}}\Vert _1)+1}_{\mathcal {B}},\,\Vert \pmb {\psi }\Vert _1=\Delta (\Vert \pmb {\gamma }_{\mathcal {B}}\Vert _1)+1}\frac{\Delta (\Vert \pmb {\gamma }_{\mathcal {B}}\Vert _1)+1}{\pmb {\psi }!}\,\pmb {h}_{\mathcal {B}}^{\pmb {\psi }_{\mathcal {B}}}\pmb {u}_{\mathcal {B}}^{\pmb {\psi }_{\mathcal {B}}}\int _0^1(1-t)^{\Delta (\Vert \pmb {\gamma }_{\mathcal {B}}\Vert _1)}\nabla _{\pmb {\psi }}(f_Tg)\big (T_{\mathcal {B}\cdot }\pmb {x}+t\,\textrm{diag}(\pmb {h}_{\mathcal {B}})\pmb {u}_{\mathcal {B}},T_{\mathcal {B}^c\cdot }\pmb {x}+\pmb {u}_{\mathcal {B}^c}\big )dt, \end{aligned}$$

term-by-term integration and (A2). The result (S.2) follows by noting that

$$\begin{aligned}&\textrm{Var}\Big (n^{-1}\pmb {h}_\mathcal {B}^{-\pmb {\gamma }_\mathcal {B}}\pmb {h}_{\mathcal {B}^c}^{\varvec{1}_{\mathcal {B}^c}}\sum _{i=1}^n(T\pmb {X}_i-T\pmb {x})^{\pmb {\gamma }}K_{\textrm{diag}(\pmb {h})^{-1}T}(\pmb {X}_i-\pmb {x})\,g(T\pmb {X}_i)W_i\Big )\\&\quad =n^{-1}\pmb {h}_\mathcal {B}^{-2\pmb {\gamma }_\mathcal {B}}\pmb {h}_{\mathcal {B}^c}^{2\varvec{1}_{\mathcal {B}^c}}\omega _2\,\mathbb {E}\left[ (T\pmb {X}_1-T\pmb {x})^{2\pmb {\gamma }}K_{\textrm{diag}(\pmb {h})^{-1}T}(\pmb {X}_1-\pmb {x})^2 g(T\pmb {X}_1)^2\right] -O(n^{-1}\omega _1^2)\\&\quad =n^{-1}\omega _2\pmb {h}_\mathcal {B}^{-\varvec{1}_{\mathcal {B}}}\Big \{\mathcal {K}^2_{\mathcal {B}}(\pmb {x};f_Tg^2,2\pmb {\gamma },T)\prod _{d\in \mathcal {B}}\mu _{2,2\gamma _d}+O\left( \bar{h}\right) \Big \}-O(n^{-1}\omega _1^2) \end{aligned}$$

and that \(n^{-1}\omega _1^2=O\left( n^{-1}\pmb {h}_\mathcal {B}^{-\varvec{1}_{\mathcal {B}}}\bar{h}\right) \). \(\square \)

In particular, we deduce by setting \(g(\cdot )\equiv 1\) and \(W_i\equiv 1\) for all i in Lemma 1 that

$$\begin{aligned} \kappa _{\pmb {\gamma }}(\pmb {x};T)=\left\{ \begin{array}{ll} \displaystyle \mathcal {K}_{\mathcal {B}}^1(\pmb {x};f_T,\pmb {\gamma },T)\prod _{d\in \mathcal {B}}\mu _{1,\gamma _d}\big \{1+O_p\big (\bar{h}+n^{-1/2}\pmb {h}_\mathcal {B}^{-(1/2)\varvec{1}_{\mathcal {B}}}\big )\big \},&\quad \Delta (\Vert \pmb {\gamma }_{\mathcal {B}}\Vert _1)=0,\\ \Omega _p\big (\bar{h}+n^{-1/2}\pmb {h}_{\mathcal {B}}^{-(1/2)\varvec{1}_{\mathcal {B}}}\big ),&\quad \Delta (\Vert \pmb {\gamma }_{\mathcal {B}}\Vert _1)=1. \end{array}\right. \end{aligned}$$

Since \(\mathcal {K}_{\mathcal {B}}^1(\pmb {x};f_T,\pmb {\gamma },T)\) does not depend on \(\pmb {\gamma }\) for \(\pmb {\gamma }\in \pmb {\Gamma }^p_{\mathcal {B}}\), we can construct, by (A1), an inverse \(\{\breve{\kappa }_{\pmb {\gamma },\pmb {\psi }}(\pmb {x};T):\pmb {\gamma },\pmb {\psi }\in \pmb {\Gamma }^p_{\mathcal {B}}\}\) such that

$$\begin{aligned} \sum _{\pmb {\psi }\in \pmb {\Gamma }^p_{\mathcal {B}}}\breve{\kappa }_{\pmb {\gamma },\pmb {\psi }}(\pmb {x};T)\kappa _{\pmb {\gamma }'+\psi }(\pmb {x};T)=\mathbb {I}\{\pmb {\gamma }=\pmb {\gamma }'\}, \;\;\;\pmb {\gamma },\pmb {\gamma }'\in \pmb {\Gamma }_{\mathcal {B}}^p. \end{aligned}$$

Setting \(\pmb {Q}=\textrm{diag}(\pmb {h})^{-1}T\), the local coefficient estimators \({\hat{\beta _{\pmb {\gamma }}}}^{-\emptyset }(\pmb {x};\pmb {Q})\) defined in (1) solve the following system of equations for \(\beta _{\pmb {\gamma }}\):

$$\begin{aligned}&n^{-1}\pmb {h}_{\mathcal {B}}^{-\pmb {\gamma }_{\mathcal {B}}}\pmb {h}_{\mathcal {B}^c}^{\varvec{1}_{\mathcal {B}^c}}\sum _{i=1}^n(T\pmb {X}_i-T\pmb {x})^{\pmb {\gamma }}K_{\pmb {Q}}(\pmb {X}_i-\pmb {x})Y_i\\&\quad =\sum _{\pmb {\psi }\in \pmb {\Gamma }^p}\pmb {h}_\mathcal {B}^{\pmb {\psi }_{\mathcal {B}}}\kappa _{\pmb {\gamma }+\pmb {\psi }}(\pmb {x};T)\beta _{\pmb {\psi }}+\pmb {h}_{\mathcal {B}}^{-\pmb {\gamma }_{\mathcal {B}}}\alpha |\beta _{\pmb {\gamma }}|^{\alpha -1}e(\beta _{\pmb {\gamma }})\,\kappa _{\varvec{0}}(\pmb {x};T)\sum _{d=1}^DC_n(h_d)\mathbb {I}\{\gamma _d>0\},\quad \pmb {\gamma }\in \pmb {\Gamma }^p, \end{aligned}$$
(S.3)

where \(e(a)=\mathbb {I}\{a>0\}-\mathbb {I}\{a<0\}\) for \(a\ne 0\) and \(|e(0)|\le 1\). For \(\pmb {\gamma }\in \pmb {\Gamma }^p\setminus \pmb {\Gamma }^p_{\mathcal {B}}\), the estimator \({\hat{\beta _{\pmb {\gamma }}}}^{-\emptyset }(\pmb {x};\pmb {Q})\) involves at least one index whose bandwidth \(h_d\) is large, in the sense that \(h_d^{-1}=O(1)\), and is therefore heavily penalised. The following lemma states the order of such estimators.

Lemma 2

For any \(\pmb {x}\in \mathbb {R}^D\) and \(T=\textrm{diag}(\pmb {h})\pmb {Q}\in \mathscr {T}^\circ \), we have

$$\begin{aligned} {\hat{\beta _{\pmb {\gamma }}}}^{-\emptyset }(\pmb {x};\pmb {Q})=O_p\left( n^{-1}D^{-p}\right) ,\;\; \pmb {\gamma }\in \pmb {\Gamma }^p\setminus \pmb {\Gamma }^p_{\mathcal {B}}. \end{aligned}$$
(S.4)

Proof of Lemma 2

Let \(M_\beta (\pmb {x})=\max \big \{|{\hat{\beta }}^{-\emptyset }_{\pmb {\gamma }}(\pmb {x};\pmb {Q})|:\pmb {\gamma }\in \pmb {\Gamma }^p\setminus \pmb {\Gamma }^p_{\mathcal {B}}\big \}\). For \(\alpha =1\), it follows by minimality of \(\big \{{\hat{\beta }}^{-\emptyset }_{\pmb {\gamma }}(\pmb {x};\pmb {Q}):\pmb {\gamma }\in \pmb {\Gamma }^p\big \}\) that

$$\begin{aligned}&\sum _{\pmb {\gamma }\in \pmb {\Gamma }^p}|{\hat{\beta }}^{-\emptyset }_{\pmb {\gamma }}(\pmb {x};\pmb {Q})|^\alpha \sum _{d=1}^DC_n(h_{d})\mathbb {I}\{\gamma _d>0\}\\&\quad \le \sum _{i=1}^n K_{\pmb {Q}}(\pmb {X}_i-\pmb {x})Y_i^2\Big /\sum _{i=1}^n K_{\pmb {Q}}(\pmb {X}_i-\pmb {x})=O_p(1). \end{aligned}$$

Thus, (S.4) holds for \(\alpha =1\). For \(\alpha >1\), we have by (S.3) that

$$\begin{aligned} M_\beta (\pmb {x})^{\alpha -1}n^{\alpha \vee 2-1}D^{p(\alpha \vee 2-1)}=O_p(1+D^pM_\beta (\pmb {x})). \end{aligned}$$
(S.5)

For \(\alpha \in (1,2]\), (S.5) reduces to \(M_\beta (\pmb {x})^{\alpha -1} =O_p(n^{-1}D^{-p}+n^{-1}M_\beta (\pmb {x}))\), so that \(M_\beta (\pmb {x})=O_p(n^{-1}D^{-p})\). For \(\alpha >2\), we have by (S.5) either \(M_\beta (\pmb {x})=O_p (n^{-1}D^{-p})\) or \(M_\beta (\pmb {x}) =O_p(n^{-(\alpha -1)/(\alpha -2)}D^{-p})\). It follows that (S.4) also holds for any \(\alpha >1\). \(\square \)
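For \(\alpha =2\) the penalty \(|\beta _{\pmb {\gamma }}|^\alpha \) is quadratic, so the first-order conditions (S.3) take the form of ridge normal equations. The sketch below uses hypothetical penalty weights `C` (a stand-in for the role of \(C_n(h_d)\), whose exact form is not reproduced here) to show how a heavily penalised local coefficient is driven to essentially zero even when the corresponding variable matters:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 500
X = rng.uniform(-1, 1, (n, 2))
y = 1.0 + 2.0 * X[:, 0] + 3.0 * X[:, 1] + 0.1 * rng.standard_normal(n)

x0 = np.zeros(2)
w = np.exp(-0.5 * (((X - x0) / 0.5) ** 2).sum(axis=1))   # kernel weights
Z = np.hstack([np.ones((n, 1)), X - x0])                 # local linear design

# Quadratic (alpha = 2) penalty: the stationarity conditions are ridge
# normal equations.  A huge weight on the second slope mimics a coordinate
# whose bandwidth stays large and is therefore heavily penalised.
C = np.diag([0.0, 0.0, 1e8])
beta = np.linalg.solve(Z.T @ (w[:, None] * Z) + C, Z.T @ (w * y))
# beta[2] is forced to ~0; beta[0], beta[1] remain close to 1 and 2.
```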

Define, for \(i=1,\ldots ,n\), \(\pmb {\gamma }\in \pmb {\Gamma }^p_{\mathcal {B}}\), \(\pmb {\psi }\in \pmb {\Gamma }^{p+1}_{\mathcal {B}}\) and \(\pmb {x},\pmb {y}\in \mathbb {R}^D\),

$$\begin{aligned} \nu _{\pmb {\gamma },i}(\pmb {x};\pmb {Q})&=n^{-1}\pmb {h}_{\mathcal {B}}^{-\pmb {\gamma }_{\mathcal {B}}}\pmb {h}_{\mathcal {B}^c}^{\varvec{1}_{\mathcal {B}^c}}\sum _{\pmb {\gamma }'\in \pmb {\Gamma }^p_{\mathcal {B}}}\breve{\kappa }_{\pmb {\gamma },\pmb {\gamma }'}(\pmb {x};T)\,\pmb {h}_{\mathcal {B}}^{-\pmb {\gamma }'_{\mathcal {B}}}(T\pmb {X}_i-T\pmb {x})^{\pmb {\gamma }'}K_{\pmb {Q}}(\pmb {X}_i-\pmb {x}),\\ R_{\pmb {\psi },\pmb {x}}(\pmb {y};T)&=\frac{p+1}{\pmb {\psi }!}\int _0^1 (1-t)^p\nabla _{\pmb {\psi }}m_{T_{\mathcal {B}\cdot }}\left( (1-t)T_{\mathcal {B}\cdot }\pmb {x}+tT_{\mathcal {B}\cdot }\pmb {y}\right) dt,\\ R^*_{\pmb {\gamma }}(\pmb {x};\pmb {Q})&=\pmb {h}_\mathcal {B}^{-\pmb {\gamma }_{\mathcal {B}}}\sum _{\pmb {\psi }\in \pmb {\Gamma }^{p+1}_{\mathcal {B}},\,\Vert \pmb {\psi }\Vert _1=p+1}\pmb {h}_{\mathcal {B}}^{\pmb {\psi }_{\mathcal {B}}}\sum _{\pmb {\gamma }'\in \pmb {\Gamma }^p_{\mathcal {B}}}\breve{\kappa }_{\pmb {\gamma },\pmb {\gamma }'}(\pmb {x};T)\,n^{-1}\pmb {h}_{\mathcal {B}}^{-\pmb {\gamma }'_{\mathcal {B}}-\pmb {\psi }_{\mathcal {B}}}\pmb {h}_{\mathcal {B}^c}^{\varvec{1}_{\mathcal {B}^c}}\sum _{i=1}^n(T\pmb {X}_{i}-T\pmb {x})^{\pmb {\gamma }'+\pmb {\psi }}K_{\pmb {Q}}(\pmb {X}_i-\pmb {x})\,R_{\pmb {\psi },\pmb {x}}(\pmb {X}_{i};T). \end{aligned}$$

The next lemma establishes asymptotic expansions for the regression function estimator \({\hat{\beta _{\varvec{0}}}}^{-\emptyset }(\pmb {x};\pmb {Q})\) obtained by setting the index-specific bandwidths and the direction matrix to be \(\pmb {h}\) and T, respectively.

Lemma 3

For any \(\pmb {x}\in \mathbb {R}^D\) and \(T=\textrm{diag}(\pmb {h})\pmb {Q}\in \mathscr {T}^\circ \), we have

$$\begin{aligned} {\hat{\beta _{\varvec{0}}}}^{-\emptyset }(\pmb {x};\pmb {Q})=\frac{\mathcal {K}^1_{\mathcal {B}}(\pmb {x};f_{T}m_{T},\varvec{0},T)}{\mathcal {K}^1_{\mathcal {B}}(\pmb {x};f_{T},\varvec{0},T)}+O_p\big (\bar{h}+n^{-1/2}\pmb {h}_\mathcal {B}^{-(1/2)\varvec{1}_{\mathcal {B}}}+n^{p+(\alpha \vee 2)-1}D^{p(\alpha \vee 2-1)}e^{-1/\bar{h}}\big ). \end{aligned}$$

Suppose further that \(|\mathcal {B}|\ge r_0\). Then we have

$$\begin{aligned} {\hat{\beta _{\varvec{0}}}}^{-\emptyset }(\pmb {x};\pmb {Q})=&\;m_{T_{\mathcal {B}\cdot }}(T_{\mathcal {B}\cdot }\,\pmb {x})+\sum _{i=1}^n\nu _{\varvec{0},i}(\pmb {x};\pmb {Q})w_{i,T_{\mathcal {B}\cdot }}+R^*_{\varvec{0}}(\pmb {x};\pmb {Q})\\&+O_p\big (n^{-1}+n^{p+(\alpha \vee 2)-1}D^{p(\alpha \vee 2-1)}e^{-1/\bar{h}}\big ), \end{aligned}$$
(S.6)

with

$$\begin{aligned} \sum _{i=1}^n\nu _{\varvec{0},i}(\pmb {x};\pmb {Q})w_{i,T_{\mathcal {B}\cdot }}&=\Omega _p(n^{-1/2}\pmb {h}_{\mathcal {B}}^{-(1/2)\varvec{1}_{\mathcal {B}}}),\\ R^*_{\varvec{0}}(\pmb {x};\pmb {Q})&=\Omega _p(\bar{h}^{p^*}+n^{-1/2}\bar{h}^{p+1}\pmb {h}_\mathcal {B}^{-(1/2)\varvec{1}_{\mathcal {B}}}). \end{aligned}$$

Proof of Lemma 3

Noting \(\sum _{\pmb {\psi }\in (\pmb {\Gamma }^p\backslash \pmb {\Gamma }^p_{\mathcal {B}})}\pmb {h}_\mathcal {B}^{\pmb {\gamma }+\pmb {\psi }_{\mathcal {B}}}\kappa _{\pmb {\psi }}(\pmb {x};T)\beta _{\pmb {\psi }}\) \(=O_p(n^{-1})\) and

$$\begin{aligned}&\pmb {h}_{\mathcal {B}}^{-\pmb {\gamma }_{\mathcal {B}}}\alpha |\beta _{\pmb {\gamma }}|^{\alpha -1}e(\beta _{\pmb {\gamma }})\kappa _{\varvec{0}}(\pmb {x};T)\sum _{d=1}^DC_n(h_d)\mathbb {I}\{\gamma _d>0\}\\&\quad =O_p\big (n^{-1}+n^{p+(\alpha \vee 2)-1}D^{p(\alpha \vee 2-1)}e^{-1/\bar{h}}\big ) \end{aligned}$$

for any \(\pmb {\gamma }\in \pmb {\Gamma }^p_{\mathcal {B}}\), by (S.3), we have

$$\begin{aligned} {\hat{\beta _{\varvec{0}}}}^{-\emptyset }(\pmb {x};\pmb {Q})=&\;n^{-1}\pmb {h}_{\mathcal {B}^c}^{\varvec{1}_{\mathcal {B}^c}}\sum _{i=1}^n\sum _{\pmb {\gamma }'\in \pmb {\Gamma }^p_{\mathcal {B}}}\breve{\kappa }_{\varvec{0},\pmb {\gamma }'}(\pmb {x};T)\,\pmb {h}_{\mathcal {B}}^{-\pmb {\gamma }'_{\mathcal {B}}}(T\pmb {X}_i-T\pmb {x})^{\pmb {\gamma }'}K_{\pmb {Q}}(\pmb {X}_i-\pmb {x})\,m_{T_{\mathcal {B}\cdot }}(T_{\mathcal {B}\cdot }\pmb {X}_i)\\&+\sum _{i=1}^n\nu _{\varvec{0},i}(\pmb {x};\pmb {Q})w_{i,T_{\mathcal {B}\cdot }}+O_p\big (n^{-1}+n^{p+(\alpha \vee 2)-1}D^{p(\alpha \vee 2-1)}e^{-1/\bar{h}}\big ). \end{aligned}$$
(S.7)

Applying Lemma 1, we have \(\sum _{i=1}^n\nu _{\varvec{0},i}(\pmb {x};\pmb {Q})w_{i,T_{\mathcal {B}\cdot }}=\Omega _p\big (n^{-1/2}\pmb {h}_{\mathcal {B}}^{-(1/2)\varvec{1}_{\mathcal {B}}}\big )\) and

$$\begin{aligned}&\sum _{\pmb {\gamma }'\in \pmb {\Gamma }^p_{\mathcal {B}}}\breve{\kappa }_{\varvec{0},\pmb {\gamma }'}(\pmb {x};T)\,n^{-1}\pmb {h}_{\mathcal {B}}^{-\pmb {\gamma }'_{\mathcal {B}}}\pmb {h}_{\mathcal {B}^c}^{\varvec{1}_{\mathcal {B}^c}}\sum _{i=1}^n(T\pmb {X}_i-T\pmb {x})^{\pmb {\gamma }'}K_{\pmb {Q}}(\pmb {X}_i-\pmb {x})\,m_{T_{\mathcal {B}\cdot }}(T_{\mathcal {B}\cdot }\pmb {X}_i)\\&\quad =\sum _{\pmb {\gamma }'\in \pmb {\Gamma }^p_{\mathcal {B}}}\breve{\kappa }_{\varvec{0},\pmb {\gamma }'}(\pmb {x};T)\Big \{\frac{\mathcal {K}^1_{\mathcal {B}}(\pmb {x};f_Tm_T,\varvec{0},T)}{\mathcal {K}^1_{\mathcal {B}}(\pmb {x};f_T,\varvec{0},T)}\,\kappa _{\pmb {\gamma }'}(\pmb {x};T)+O_p\big (\bar{h}+n^{-1/2}\pmb {h}_\mathcal {B}^{-(1/2)\varvec{1}_{\mathcal {B}}}\big )\Big \}\\&\quad =\mathcal {K}^1_{\mathcal {B}}(\pmb {x};f_Tm_T,\varvec{0},T)/\mathcal {K}^1_{\mathcal {B}}(\pmb {x};f_T,\varvec{0},T)+O_p\big (\bar{h}+n^{-1/2}\pmb {h}_\mathcal {B}^{-(1/2)\varvec{1}_{\mathcal {B}}}\big ), \end{aligned}$$

so that the first expression for \({\hat{\beta _{\varvec{0}}}}^{-\emptyset }(\pmb {x};\pmb {Q})\) follows from (S.7).

Consider next the case \(|\mathcal {B}|\ge r_0\). The expansion (S.6) holds by substituting

$$\begin{aligned} m_{T_{\mathcal {B}\cdot }}(T_{\mathcal {B}\cdot }\pmb {X})=&\sum _{\pmb {\psi }\in \pmb {\Gamma }_{\mathcal {B}}^{p}}\frac{1}{\pmb {\psi }!}\nabla _{\pmb {\psi }}m_{T_{\mathcal {B}\cdot }}(T_{\mathcal {B}\cdot }\pmb {x})(T\pmb {X}-T\pmb {x})^{\pmb {\psi }}\\&+\sum _{\pmb {\psi }\in \pmb {\Gamma }^{p+1}_{\mathcal {B}},\,\Vert \pmb {\psi }\Vert _1=p+1}(T\pmb {X}-T\pmb {x})^{\pmb {\psi }}R_{\pmb {\psi },\pmb {x}}(\pmb {X};T) \end{aligned}$$

in (S.7). That \(R^*_{\varvec{0}}(\pmb {x};\pmb {Q})=\Omega _p(\bar{h}^{p^*}+n^{-1/2}\bar{h}^{p+1}\pmb {h}_\mathcal {B}^{-(1/2)\varvec{1}_{\mathcal {B}}})\) follows from the facts that \(\mathbb {E}\big [R^*_{\varvec{0}}(\pmb {x};\pmb {Q})\big ]=\Omega _p(\bar{h}^{p^*})\) and \(\textrm{Var}\big (R^*_{\varvec{0}}(\pmb {x};\pmb {Q})\big )=\Omega _p(n^{-1}\bar{h}^{2(p+1)}\pmb {h}_\mathcal {B}^{-\varvec{1}_{\mathcal {B}}})\). That \(\sum _{i=1}^n\nu _{\varvec{0},i}(\pmb {x};\pmb {Q})w_{i,T_{\mathcal {B}\cdot }}=\Omega _p(n^{-1/2}\pmb {h}_{\mathcal {B}}^{-(1/2)\varvec{1}_{\mathcal {B}}})\) follows from Lemma 1, by which this sum has mean zero and variance \(\Omega _p(n^{-1}\pmb {h}_{\mathcal {B}}^{-\varvec{1}_{\mathcal {B}}})\). \(\square \)

Define, for \(j\notin S_k\) and \(\pmb {\gamma }\in \pmb {\Gamma }_{\mathcal {B}}^p\), \(R^{*-S_k}_{\pmb {\gamma }}\) and \(\nu ^{-S_k}_{\pmb {\gamma },j}\) to be the counterparts of \(R^*_{\pmb {\gamma }}\) and \(\nu _{\pmb {\gamma },j}\), respectively, evaluated on the sample obtained by removing the observations \(\big \{(\pmb {X}_i,Y_i): i \in S_k\big \}\). The next lemma establishes an expansion for the K-fold cross-validated squared prediction error.

Lemma 4

Suppose that \(|\mathcal {B}|\ge r_0\) and (A5) holds. Let \(\tilde{\mathcal {C}}\supset \mathcal {A}\) be a fixed subset in \(\{1,\ldots ,D\}\) with \(|\tilde{\mathcal {C}}|\) bounded. Then we have

$$\begin{aligned}&n^{-1}\sum _{k=1}^K\sum _{i\in S_k}\big \{Y_i-{\hat{\beta _{\varvec{0}}}}^{-S_k}(\pmb {X}_i;\pmb {Q})\big \}^2\\&\quad =n^{-1}\sum _{i=1}^nw_{i,T^*}^2+n^{-1}\sum _{k=1}^K\sum _{i\in S_k}\mathscr {R}_{i,k}(T)^2\{1+o_p(1)\}+O_p\big (n^{p+(\alpha \vee 2)-1}D^{p(\alpha \vee 2-1)}e^{-1/\bar{h}}\big ), \end{aligned}$$

uniformly over \(T=\textrm{diag}(\pmb {h})\pmb {Q}\in \mathscr {T}^\circ \) satisfying \(\Vert T_{\mathcal {B},\{d\}}\Vert _2=\Omega _p(\mathbb {I}\{d\in \tilde{\mathcal {C}}\})\), \(d=1,\ldots ,D\), and \(d_{\mathcal {B}}(T)\) sufficiently small, where

$$\begin{aligned} \mathscr {R}_{i,k}(T)\equiv R^{*-{S_k}}_{\varvec{0}}(\pmb {X}_i;\pmb {Q})+\frac{n}{n-n_0}\sum _{j\notin S_k}\nu ^{-S_k}_{\varvec{0},j}(\pmb {X}_i;\pmb {Q})w_{j,T_{\mathcal {B}\cdot }}+m_{T_{\mathcal {B}\cdot }}(T_{\mathcal {B}\cdot }\,\pmb {X}_i)-m_{T^*}(T^*\pmb {X}_{i}). \end{aligned}$$

Moreover, we have in general

$$\begin{aligned} n^{-1}\sum _{k=1}^K\sum _{i\in S_k}\mathscr {R}_{i,k}(T)^2=\Omega _p\big (\bar{h}^{2p^*}+n^{-1}\pmb {h}_{\mathcal {B}}^{-\varvec{1}_{\mathcal {B}}}+d_{\mathcal {B}}(T)^2\big ), \end{aligned}$$
(S.8)

provided that \(m_{T^*}\) is not functionally related to the distribution function of \((\pmb {X},Y)\).

Proof of Lemma 4

By Lemma 3, the cross-validated squared prediction error admits, for the first fold \(S_1\), the expansion

$$\begin{aligned}&n_0^{-1}\sum _{i\in S_1}\big \{Y_i-\hat{\beta }_{\varvec{0}}^{-S_1}(\pmb {X}_i;\pmb {Q})\big \}^2\\&\quad =n_0^{-1}\sum _{i\in S_1}\big \{w_{i,T^*}^2+\mathscr {R}_{i,1}(T)^2-2\mathscr {R}_{i,1}(T)w_{i,T^*}\big \}+O_p\big (n^{-1}+n^{p+(\alpha \vee 2)-1}D^{p(\alpha \vee 2-1)}e^{-1/\bar{h}}\big ). \end{aligned}$$

To prove (S.8), it suffices to show that \(n_0^{-1}\sum _{i\in S_1}\mathscr {R}_{i,1}(T)^2=\Omega _p\big (\bar{h}^{2p^*}+n^{-1}\pmb {h}_{\mathcal {B}}^{-\varvec{1}_{\mathcal {B}}}+d_{\mathcal {B}}(T)^2\big )\). Note that for \(T,T'\in \mathscr {T}^\circ \) with \(T_{\mathcal {B}\cdot }\ne T'_{\mathcal {B}\cdot }\), \(|\mathscr {R}_{i,1}(T)-\mathscr {R}_{i,1}(T')|/\Vert T_{\mathcal {B}\cdot }-T'_{\mathcal {B}\cdot }\Vert _1\) has uniformly bounded moments. Applying Example 19.7 in, we have, for large n and some sufficiently small constant \(k_0\),

$$\begin{aligned} \mathbb {P}\Big (\Big |n_0^{-1}\sum _{i\in S_1}\mathscr {R}_{i,1}(T)^2-\mathbb {E}\big [\mathscr {R}_{i,1}(T)^2\big ]\Big |\ge t\Big )\le k_0^{-1}e^{-k_0n\pmb {h}_{\mathcal {B}}^{\varvec{1}_{\mathcal {B}}}t^2} \end{aligned}$$
(S.9)

uniformly over \(T\in \mathscr {T}^\circ \). Noting that

$$\begin{aligned} \mathbb {E}\big [\mathscr {R}_{i,1}(T)^2\big ]&=d_{\mathcal {B}}(T)^2+\mathbb {E}\big [R^{*-{S_1}}_{\varvec{0}}(\pmb {X}_i;\pmb {Q})^2\big ]+\frac{n^2}{(n-n_0)^2}\mathbb {E}\Big [\sum _{j\notin S_1}\nu ^{-S_1}_{\varvec{0},j}(\pmb {X}_i;\pmb {Q})^2\sigma _T(T\pmb {X})^2\Big ]\\&\qquad +2\,\mathbb {E}\big [R^{*-{S_1}}_{\varvec{0}}(\pmb {X}_i;\pmb {Q})\{m_{T_{\mathcal {B}\cdot }}(T_{\mathcal {B}\cdot }\,\pmb {X}_i)-m_{T^*}(T^*\pmb {X}_{i})\}\big ]\\&=d_{\mathcal {B}}(T)^2+\Omega _p(\bar{h}^{2p^*})+\Omega _p(n^{-1}\pmb {h}_{\mathcal {B}}^{-\varvec{1}_{\mathcal {B}}})+\Omega _p(\bar{h}^{p^*}d_{\mathcal {B}}(T)), \end{aligned}$$

we have in general that

$$\begin{aligned} \mathbb {E}\big [\mathscr {R}_{i,1}(T)^2\big ]=\Omega _p\big (\bar{h}^{2p^*}+n^{-1}\pmb {h}_{\mathcal {B}}^{-\varvec{1}_{\mathcal {B}}}+d_{\mathcal {B}}(T)^2\big ). \end{aligned}$$
(S.10)

Similarly, the tail bound \(\mathbb {P}\big (|n_0^{-1}\sum _{i\in S_1}\mathscr {R}_{i,1}(T)w_{i,T^*}|\ge t\big )\le k_0^{-1}e^{-k_0nt^2/\mathbb {E}[\mathscr {R}_{i,1}(T)^2]}\) implies that

$$\begin{aligned} n_0^{-1}\sum _{i\in S_1}\mathscr {R}_{i,1}(T)w_{i,T^*}=O_p\Big (\sqrt{\mathbb {E}\big [\mathscr {R}_{i,1}(T)^2\big ]/n}\,\Big )=o_p\big (\bar{h}^{2p^*}+n^{-1}\pmb {h}_{\mathcal {B}}^{-\varvec{1}_{\mathcal {B}}}+d_{\mathcal {B}}(T)^2\big ) \end{aligned}$$
(S.11)

uniformly over \(T\in \mathscr {T}^\circ \). The lemma then follows from (S.9), (S.10) and (S.11). \(\square \)
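The K-fold cross-validated squared prediction error analysed in Lemma 4 can be sketched with a plain Nadaraya–Watson smoother standing in for the penalised local polynomial estimator (an illustrative simplification, not the paper's procedure); the criterion cleanly separates a reasonable bandwidth from a badly oversmoothed one:

```python
import numpy as np

rng = np.random.default_rng(3)
n, K = 300, 5
X = rng.uniform(0, 1, n)
y = np.sin(4 * X) + 0.2 * rng.standard_normal(n)

def nw(Xtr, ytr, xq, h):
    """Nadaraya-Watson (local constant) estimate at query points xq."""
    W = np.exp(-0.5 * ((xq[:, None] - Xtr[None, :]) / h) ** 2)
    return (W @ ytr) / W.sum(axis=1)

def cv_error(h, folds):
    # n^{-1} sum_k sum_{i in S_k} {Y_i - mhat^{-S_k}(X_i)}^2
    err = 0.0
    for S in folds:
        train = np.setdiff1d(np.arange(n), S)
        err += ((y[S] - nw(X[train], y[train], X[S], h)) ** 2).sum()
    return err / n

folds = np.array_split(rng.permutation(n), K)
good, oversmoothed = cv_error(0.05, folds), cv_error(0.8, folds)
```

With the near-optimal bandwidth the criterion sits close to the noise variance, while the oversmoothed fit pays the squared-bias term that (S.8) makes explicit.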

II. Proof of Theorem 1

It follows from (A5) that

$$\begin{aligned} n^{p+\alpha \vee 2-1}D^{p(\alpha \vee 2-1)}e^{-c/h}=O\big (n^{p+\alpha \vee 2-1}e^{-cp(h^{-1}-p^{-1}n^{\zeta _n})}\big )=o(n^{-1}). \end{aligned}$$
(S.12)

Setting \(\pmb {A}=n^{1/(2p^*+r_0)}T^*\), it follows by (S.12) and Lemma 4 that

$$\begin{aligned}&n^{-1}\sum _{k=1}^K\sum _{i\in S_k}\big \{Y_i-{\hat{\beta _{\varvec{0}}}}^{-S_k}(\pmb {X}_i;\pmb {A})\big \}^2+\lambda _1\sum _{j=1}^D\Lambda _n(\pmb {A}_{j\cdot })+\lambda _2\sum _{\ell =1}^D\Lambda _n(\pmb {A}_{\cdot \ell })\\&\quad =n^{-1}\sum _{i=1}^nw_{i,T^*}^2+r_0\lambda _1\left\{ 1+o(1)\right\} +|\mathcal {A}|\lambda _2\left\{ 1+o(1)\right\} +O_p\big (n^{-2p^*/(2p^*+r_0)}\big ). \end{aligned}$$

Define \(r^\dag =\big |\{d:{\hat{h}}_d=o_p(1)\}\big |\) and \( \mathcal {C}=\{d:\Vert \hat{T}_{\{1,\dots ,r^\dag \},\{d\}}\Vert _2=\Omega _p(1)\}\). Using Lemma 3 and comparing the objective values in (2) with \((\mathcal {B},\pmb {Q})\) set to be \(\big (\{1,\ldots ,r_0\},\pmb {A}\big )\) and \(\big (\{1,\ldots ,r^\dag \},\hat{\pmb {Q}}\big )\), respectively, we have

$$\begin{aligned} 0\ge&\; n^{-1}\sum _{k=1}^K\sum _{i\in S_k}\big \{Y_i-{\hat{\beta _{\varvec{0}}}}^{-S_k}(\pmb {X}_i;{\hat{\pmb {Q}}})\big \}^2+\lambda _1\sum _{j=1}^D\Lambda _n(\hat{\pmb {Q}}_{j\cdot })+\lambda _2\sum _{k=1}^D\Lambda _n(\hat{\pmb {Q}}_{\cdot k})\\&-n^{-1}\sum _{k=1}^K\sum _{i\in S_k}\big \{Y_i-{\hat{\beta _{\varvec{0}}}}^{-S_k}(\pmb {X}_i;\pmb {A})\big \}^2-\lambda _1\sum _{j=1}^D\Lambda _n(\pmb {A}_{j\cdot })-\lambda _2\sum _{k=1}^D\Lambda _n(\pmb {A}_{\cdot k})\\ \ge&\; n^{-1}\sum _{i=1}^n\left\{ \frac{\mathcal {K}^1_{\{1,\dots ,r^\dag \}}(\pmb {X}_i;f_{{\hat{T}}}m_{{\hat{T}}},\varvec{0},{\hat{T}})}{\mathcal {K}^1_{\{1,\dots ,r^\dag \}}(\pmb {X}_i;f_{{\hat{T}}},\varvec{0},{\hat{T}})}-\frac{\mathcal {K}^1_{\{1,\dots ,r_0\}}(\pmb {X}_i;f_{T^*}m_{T^*},\varvec{0},T^*)}{\mathcal {K}^1_{\{1,\dots ,r_0\}}(\pmb {X}_i;f_{T^*},\varvec{0},T^*)}\right\} ^2+o_p(1). \end{aligned}$$

It follows that \(m_{\hat{T}_{\{1,\dots ,r^\dag \}\cdot }}(\hat{T}_{\{1,\dots ,r^\dag \}\cdot }\,\pmb {X})=m_{T^*}(T^*\pmb {X})+o(1)\) almost surely, which can happen only if \(r^\dag \ge r_0\) and \(\mathcal {C}\supseteq \mathcal {A}\). For technical convenience we assume that \(\hat{T}_{\{1,\ldots ,r^\dag \},\mathcal {C}^c}\) is a zero matrix with probability converging to one. This condition can be ensured, for example, by a simple thresholding step that resets the elements of \(\hat{T}_{\hat{\mathcal {I}},\hat{\mathcal {A}}^c}\) to zero; in our empirical experience the effect of such thresholding is negligible. Setting \((\mathcal {B},\tilde{\mathcal {C}})=\big (\{1,\ldots ,r^\dag \},\mathcal {C}\big )\) and writing \(\hat{\mathscr {R}}_{i,k}\) for \(\mathscr {R}_{i,k}\) with \(\pmb {h}\) replaced by \(\hat{\pmb {h}}\), we have

$$\begin{aligned}{} & {} O_p\big ( n^{-2p^*/(2p^*+r_0)}\big )\ge n^{-1}\sum _{k=1}^K\sum _{i\in S_k}\hat{\mathscr {R}}_{i,k}({\hat{T}})^2\big \{1+o_p(1)\big \}\nonumber \\{} & {} \quad +\lambda _1\big \{r^\dag -r_0 -\sum _{d\le r^\dag }\hat{h}_d^{2p^*} (1+o_p(1))\big \}\nonumber \\{} & {} \quad +\lambda _1\sum _{d>r^\dag }\big (1+\hat{h}_d^{2p^*}\big )^{-1}+\lambda _2\{|\mathcal {C}|-|\mathcal {A}|\nonumber \\{} & {} \quad -\sum _{d\in \mathcal {C}}\Vert \hat{\pmb {Q}}_{\cdot d}\Vert _2^{-{2p^*}} (1+o_p(1))\}\nonumber \\{} & {} \quad +\lambda _2\sum _{d\notin \mathcal {C}}\big (1+\Vert \hat{\pmb {Q}}_{\cdot d}\Vert _2^{-2p^*}\big )^{-1}. \end{aligned}$$
(S.13)
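The thresholding step mentioned above admits a simple numerical sketch. The snippet below is an illustration written for this note, not code from the paper; the threshold `tau`, the function name `hard_threshold` and the example matrix are assumptions made here.

```python
import numpy as np

# Hypothetical sketch of the thresholding step: entries of the
# estimated matrix T_hat whose magnitude falls below a small
# threshold tau are reset to exactly zero, so that deselected
# covariates receive exactly-zero loadings.
def hard_threshold(T_hat, tau=1e-3):
    """Return a copy of T_hat with entries below tau in magnitude zeroed."""
    T = np.asarray(T_hat, dtype=float).copy()
    T[np.abs(T) < tau] = 0.0
    return T

# Tiny entries in off-diagonal positions are zeroed out,
# while the large loadings are left untouched.
T_hat = np.array([[0.9, 1e-5],
                  [2e-4, -1.2]])
print(hard_threshold(T_hat))
```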

Noting that \(r^\dag \ge r_0\), \(|\mathcal {C}|\ge |\mathcal {A}|\) and \(n^{-2p^*/(2p^*+r_0)}=o\big (\min \{\lambda _1,\lambda _2\}\big )\), (S.13) implies that \(r^\dag =r_0\) and \(\mathcal {C}=\mathcal {A}\). Noting that \(\Vert \hat{\pmb {Q}}_{\cdot d}\Vert _2^{-2p^*}=O_p\big (\max _{d'\le r_0}\hat{h}_{d'}^{2p^*}\big )\) for \(d\in \mathcal {A}\), we have, by (S.8) and (S.13), that

$$\begin{aligned}{} & {} {o_p\big (\max _{d\le r_0}\{\hat{h}^{2p^*}_d\}\big )+ O_p\big (n^{-2p^*/(2p^*+r_0)}\big )}\\\ge & {} \lambda _1\sum _{d\le r_0}\hat{h}_d^{2p^*} (1+o_p(1))\\{} & {} +\lambda _2\sum _{d\in \mathcal {A}}\Vert \hat{\pmb {Q}}_{\cdot d}\Vert _2^{-{2p^*}} (1+o_p(1))+O_p\big ( n^{-2p^*/(2p^*+r_0)}\big )\\\ge & {} n^{-1}\sum _{k=1}^K\sum _{i\in S_k}\hat{\mathscr {R}}_{i,k}({\hat{T}})^2\big \{1+o_p(1)\big \}\\= & {} \Omega _p\big (\max _{d\le r_0}\{\hat{h}^{2p^*}_d\}+n^{-1}\hat{\pmb {h}}_{\{1,\dots ,r_0\}}^{-\varvec{1}_{\{1,\dots ,r_0\}}}+d_{\{1,\dots ,r_0\}}(\hat{T})^2\big )>0, \end{aligned}$$

so that \(O_p\big ( n^{-2p^*/(2p^*+r_0)}\big )\ge \Omega _p\big (\max _{d\le r_0}\{\hat{h}^{2p^*}_d\}+n^{-1}\hat{\pmb {h}}_{\{1,\dots ,r_0\}}^{-\varvec{1}_{\{1,\dots ,r_0\}}}\big )>0\) for general \(m_{T^*}\) functionally unrelated to the distribution function of \((\pmb {X},Y)\). This implies \(\hat{h}_d=\Omega _p(n^{-1/(2p^*+r_0)})\) for \(d\le r_0\) and \(d_{\{1,\dots ,r_0\}}({\hat{T}})=O_p\big (n^{-p^*/(2p^*+r_0)}\big )\), which proves parts (i) and (ii). Part (iv) then follows by noting that \(n^{1/(2p^*+r_0)}=\Omega _p\big (\min _{d'\le r_0}\hat{h}_{d'}^{-1}\big )=O_p\big ( \Vert \hat{\pmb {Q}}_{\cdot d}\Vert _2\big )\) for \(d\in \mathcal {A}\). For \(d>r_0\), we have, by (ii) and (S.13), that

$$\begin{aligned} \hat{h}_d\rightarrow \infty ,\qquad \lambda _1\big (1+{\hat{h}}_d^{2p^*}\big )^{-1}&=O_p\big (n^{-2p^*/(2p^*+r_0)}\big ),\\ \lambda _1(D-r_0)\big (1+\max _{d>r_0} {\hat{h}}_d^{2p^*}\big )^{-1}&\le \lambda _1\sum _{d>r_0}\big (1+{\hat{h}}_d^{2p^*}\big )^{-1}\\&= O_p\big (n^{-2p^*/(2p^*+r_0)}\big ), \end{aligned}$$

which proves part (iii). Part (v) follows by applying similar arguments to the penalty terms \(\lambda _2\big (1+\Vert \hat{\pmb {Q}}_{\cdot d}\Vert _2^{-2p^*}\big )^{-1}\), \(d\not \in \mathcal {A}\), in (S.13). \(\square \)
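As a side illustration (not part of the proof), the oracle rate \(n^{-p^*/(2p^*+r_0)}\) established above depends only on the effective smoothness \(p^*\) and the intrinsic dimension \(r_0\), not on the ambient dimension \(D\). The helper function below is a sketch written for this note.

```python
# Hypothetical numerical illustration of the oracle convergence
# rate n^{-p*/(2p* + r0)}: it involves only the smoothness p* and
# the intrinsic dimension r0 of the index space.
def oracle_rate(n, p_star, r0):
    """Oracle convergence rate n^{-p*/(2p* + r0)}."""
    return n ** (-p_star / (2 * p_star + r0))

# With p* = 2 and r0 = 1 the classical one-dimensional
# nonparametric rate n^{-2/5} is recovered.
print(oracle_rate(10**4, 2, 1))  # 10^{-8/5}, approximately 0.0251
```

Increasing \(r_0\) slows the rate, reflecting the curse of dimensionality acting on the intrinsic, not the ambient, dimension.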


Cite this article

Cheung, K.Y., Lee, S.M.S. High-dimensional local polynomial regression with variable selection and dimension reduction. Stat Comput 34, 1 (2024). https://doi.org/10.1007/s11222-023-10308-1
