Abstract
Partially linear models (PLMs) have been widely used in statistical modeling, yet they typically require prior knowledge of which variables have linear effects and which have nonlinear effects. In this paper, we propose a model-free structure selection method for PLMs, which discovers the model structure by automatically identifying the variables that have linear or nonlinear effects on the response. The proposed method is formulated in a framework of gradient learning, equipped with a flexible reproducing kernel Hilbert space, and the resulting optimization task is solved by an efficient proximal gradient descent algorithm. More importantly, the asymptotic estimation and selection consistencies of the proposed method are established without specifying any explicit model assumption, ensuring that the true model structure of the PLMs can be correctly identified with high probability. The effectiveness of the proposed method is further supported by a variety of simulated and real-life examples.
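The proximal gradient descent algorithm mentioned above applies to composite objectives of the form "smooth loss plus nonsmooth group penalty." As a minimal illustration (not the paper's actual estimator; the data, step size, and group structure below are hypothetical), the sketch minimizes a least-squares loss plus a group-lasso penalty, whose proximal step is blockwise soft-thresholding (Yuan and Lin 2006; Combettes and Wajs 2005):

```python
# A minimal sketch of proximal gradient descent for a composite objective:
# smooth least-squares loss plus a group-lasso penalty. This illustrates the
# general algorithmic template only; it is not the paper's implementation.
import numpy as np

def group_soft_threshold(v, tau):
    """Proximal operator of tau * ||v||_2: shrinks the whole block toward zero."""
    norm = np.linalg.norm(v)
    if norm <= tau:
        return np.zeros_like(v)
    return (1.0 - tau / norm) * v

def proximal_gradient(X, y, groups, lam, step, n_iter=500):
    """Minimize 0.5 * ||y - X b||^2 + lam * sum_g ||b_g||_2.

    `groups` is a partition of the coefficient indices; each block is
    shrunk jointly, so entire groups can be set exactly to zero.
    """
    b = np.zeros(X.shape[1])
    for _ in range(n_iter):
        grad = X.T @ (X @ b - y)          # gradient of the smooth part
        z = b - step * grad               # forward (gradient) step
        for g in groups:                  # backward (proximal) step, blockwise
            b[g] = group_soft_threshold(z[g], step * lam)
    return b
```

With a sufficiently large penalty the proximal step zeroes out entire blocks at once, which is the mechanism that allows group-penalized methods to set a component's contribution exactly to zero.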
References
Boyd, S., Vandenberghe, L. (2004). Convex optimization. Cambridge: Cambridge University Press.
Braun, M. (2006). Accurate error bounds for the eigenvalues of the kernel matrix. Journal of Machine Learning Research, 7, 2303–2328.
Combettes, P., Wajs, V. (2005). Signal recovery by proximal forward–backward splitting. Multiscale Modeling and Simulation, 4, 1168–1200.
Engle, R. F., Granger, C. W. J., Rice, J., Weiss, A. (1986). Nonparametric estimates of the relation between weather and electricity sales. Journal of the American Statistical Association, 81, 310–320.
Fan, J., Lv, J. (2008). Sure independence screening for ultrahigh dimensional feature space (with discussion). Journal of the Royal Statistical Society Series B, 70, 849–911.
Fan, J., Feng, Y., Song, R. (2011). Nonparametric independence screening in sparse ultra-high dimensional additive models. Journal of the American Statistical Association, 106, 544–557.
Hoeffding, W. (1963). Probability inequalities for sums of bounded random variables. Journal of the American Statistical Association, 58, 13–30.
He, X., Wang, J., Lv, S. (2018). Scalable kernel-based variable selection with sparsistency. arXiv:1802.09246.
Huang, J., Wei, F., Ma, S. (2012). Semiparametric regression pursuit. Statistica Sinica, 22, 1403–1426.
Jaakkola, T., Diekhans, M., Haussler, D. (1999). Using the Fisher kernel method to detect remote protein homologies. In Proceedings of seventh international conference on intelligent systems for molecular biology (pp. 149–158).
Lian, H., Liang, H., Ruppert, D. (2015). Separation of covariates into nonparametric and parametric parts in high-dimensional partially linear additive models. Statistica Sinica, 25, 591–607.
Lin, D., Ying, Z. (1994). Semiparametric analysis of the additive risk model. Biometrika, 81, 61–71.
Moreau, J. (1962). Fonctions convexes duales et points proximaux dans un espace Hilbertien. Comptes Rendus de l'Académie des Sciences de Paris, Série A, 255, 2897–2899.
Prada-Sánchez, J., Febrero-Bande, M., Cotos-Yáñez, T., González-Manteiga, W., Bermúdez-Cela, J., Lucas-Dominguez, T. (2000). Prediction of SO2 pollution incidents near a power station using partially linear models and an historical matrix of predictor-response vectors. Environmetrics, 11, 209–225.
Raskutti, G., Wainwright, M., Yu, B. (2012). Minimax-optimal rates for sparse additive models over kernel classes via convex programming. Journal of Machine Learning Research, 13, 389–427.
Rotnitzky, A., Jewell, N. (1990). Hypothesis testing of regression parameters in semiparametric generalized linear models for cluster correlated data. Biometrika, 77, 485–497.
Schmalensee, R., Stoker, T. M. (1999). Household gasoline demand in the United States. Econometrica, 67, 645–662.
Sun, W., Wang, J., Fang, Y. (2013). Consistent selection of tuning parameters via variable selection stability. Journal of Machine Learning Research, 14, 3419–3440.
Wahba, G. (1998). Support vector machines, reproducing kernel Hilbert spaces, and randomized GACV. In B. Schölkopf, C. Burges, A. Smola (Eds.), Advances in kernel methods: support vector learning (pp. 69–88). Cambridge: MIT Press.
Wu, Y., Stefanski, L. (2015). Automatic structure recovery for additive models. Biometrika, 102, 381–395.
Xue, L. (2009). Consistent variable selection in additive models. Statistica Sinica, 19, 1281–1296.
Yafeh, Y., Yosha, O. (2003). Large shareholders and banks: Who monitors and how? The Economic Journal, 113, 128–146.
Yang, L., Lv, S., Wang, J. (2016). Model-free variable selection in reproducing kernel Hilbert space. Journal of Machine Learning Research, 17, 1–24.
Ye, G., Xie, X. (2012). Learning sparse gradients for variable selection and dimension reduction. Machine Learning, 87, 303–355.
Yuan, M., Lin, Y. (2006). Model selection and estimation in regression with grouped variables. Journal of the Royal Statistical Society Series B, 68, 49–67.
Zhang, H., Cheng, G., Liu, Y. (2011). Linear or nonlinear? Automatic structure discovery for partially linear models. Journal of the American Statistical Association, 106, 1099–1112.
Zhou, D. (2007). Derivative reproducing properties for kernel methods in learning theory. Journal of Computational and Applied Mathematics, 220, 456–463.
Acknowledgements
This research is supported in part by HK GRF-11302615, HK GRF-11331016, and City SRG-7004865. The authors would like to thank the associate editor and two anonymous referees for their constructive suggestions. The authors would also like to thank Dr. Heng Lian (City University of Hong Kong) for sharing his code on the DPLM method.
Author information
Authors and Affiliations
Corresponding author
Electronic supplementary material
Appendix: technical proofs
Proof of Theorem 1
For some constant \({a_1}\), denote
Then it suffices to bound \(P(\mathcal{C})\). First,
where \(U_n=\frac{1}{n(n-1)}\sum _{i,j=1}^n(y_i-y_j)^2\), and \(M_0 = 4A^2 + 2\sigma ^2+1\) with \(A\) being the upper bound of \(f^*(\mathbf{x})\) on \(\mathcal{X}\) and \(\sigma ^2=\mathrm{Var}(\epsilon )\). Next, we bound \(P_1, P_2,\) and \(P_3\) separately. To bound \(P_1\), we have \(P_1\le {E(|y|)}n^{-1/8}\) by Markov's inequality, where \(E(|y|)\) is a bounded quantity. To bound \(P_2\), note that \(E(U_n)=E(E(U_n|\mathbf{x}_i,\mathbf{x}_j))=E((f^*(\mathbf{x}_i)-f^*(\mathbf{x}_j))^2)+E((\epsilon _i-\epsilon _j)^2) \le 4A^2 + 2\sigma ^2\). Thus, by Bernstein's inequality for U-statistics (Hoeffding 1963), we have that
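For completeness, the inequality for U-statistics with a bounded kernel from Hoeffding (1963) can be stated as follows; the application to \(P_2\) sketched after the display is our reading of the argument, not the paper's exact constants.

```latex
% Hoeffding's inequality for a U-statistic U_n whose kernel takes values in [a, b]:
P\bigl(U_n - E(U_n) \ge t\bigr)
  \;\le\;
  \exp\!\left( -\frac{2\lfloor n/2 \rfloor\, t^2}{(b-a)^2} \right).
```

On the event \(\max_i |y_i|\le n^{1/8}\), the kernel \((y_i-y_j)^2\) lies in \([0,\,4n^{1/4}]\); since \(E(U_n)\le 4A^2+2\sigma^2=M_0-1\), taking \(t=1\) bounds the probability that \(U_n>M_0\) by roughly \(\exp(-n^{1/2}/16)\).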
To bound \(P_3\), within the set \(\{ |y| \le n^{\frac{1}{8}}~\text{ and }~ U_n \le M_0\}\), by equality (1) and Lemma 3 in the supplementary file, we have with probability at least \(1-\delta _n\) that
which implies \(P_3\le \delta _n\), and thus \(P(\mathcal {C})\le \delta _n+O(n^{-{1}/{8}})\) for some constant \(a_1\). In particular, when \(\lambda _0= n^{-{1}/{4}}\), \(\lambda _1=n^{-{1}/{4(p+2)}}\) and \(s=n^{-{1}/{4(p+6)(p+2+2\theta )}}\), there exists a constant \(c_5\) such that with probability at least \(1-\delta _n\)
Next, we establish the estimation consistency. By Assumptions 1 and 2 and equality (1), for some constant \(a_2\) there holds
where \(a_2=c_0^2c_4\int e^{-\mathbf{t}^\mathrm{T}\mathbf{t}}{\mathbf{t}}^\mathrm{T}{\mathbf{t}} \mathrm{d}\mathbf{t}\), \(\mathbf{t}=(\mathbf{u}-\mathbf{x})/s\) and \(\int e^{-\mathbf{t}^\mathrm{T}\mathbf{t}}{\mathbf{t}}^\mathrm{T}{\mathbf{t}} \mathrm{d}\mathbf{t}\) is a bounded quantity. In particular, with \(s=n^{-{1}/{4(p+6)(p+2+2\theta )}}\), we have \(\mathcal{E}({\mathbf{g}}^*,{\mathbf{H}}^*) - 2 \sigma _s^2\le a_2n^{-{1}/{4(p+2+2\theta )}}\). Therefore, for some constant \(c_6\), the triangle inequality implies that
This completes the proof of Theorem 1. \(\square \)
Proof of Theorem 2
First, we show that for any \(l \in \mathcal{L}^*\), \(\Vert \widehat{H}_{ll'}\Vert _{L_{\rho _{\mathbf{x}}}^2}=0\) for any \(l'\in \mathcal{S}\). Note that \(\Vert \widehat{\mathbf{c}}_{ll'}\Vert _2=0\) implies that \(\Vert \widehat{H}_{ll'}\Vert _{L_{\rho _{\mathbf{x}}}^2}=0\) based on the representer theorem for the RKHS, and thus it suffices to show \( \Vert \widehat{\mathbf{c}}_{ll'}\Vert _2=0\) for any \(l \in \mathcal{L}^*\) and \(l'\in \mathcal{S}\).
Suppose \(\Vert \widehat{\mathbf{c}}_{ll'}\Vert _2>0\) for some \(l \in \mathcal{L}^*\) and \(l' \in \mathcal{S}\). The derivative of (5) with respect to \({\mathbf{c}}_{ll'}\) yields that
where
\(A_2(\widehat{\mathbf{c}}_{ll'})=\frac{ \pi _{ll'} {\mathbf{K}} \widehat{\mathbf{c}}_{ll'} }{\left( \widehat{\mathbf{c}}_{ll'}^\mathrm{T} {\mathbf{K}} \widehat{\mathbf{c}}_{ll'} \right) ^{{1}/{2}}}\), and \(x_{ijl}=x_{il}-x_{jl}.\) For the right-hand side of (8), its norm divided by \(n^{{1}/{2}}\) is \(n^{-{1}/{2}}\lambda _1\Vert A_2(\widehat{\mathbf{c}}_{ll'})\Vert _2 \ge n^{-{1}/{2}} \lambda _{1}\pi _{ll'} \psi _{min}\psi _{max}^{-{1}/{2}}\), which diverges to infinity by Assumption 5. For the left-hand side of (8), by Assumption 1, \(x_{ijl}, x_{jil'}\), and every element of \({\mathbf{K}}_{\mathbf{x}}\) are bounded. Denote \(A_{\mathcal{Z}^n}(\widehat{{\varvec{\alpha }}},\widehat{\mathbf{c}})=\sum _{i,j=1}^n A_1 ({\mathbf{x}}_i,y_i,{\mathbf{x}}_j,y_j)\); we will show that \(|A_{\mathcal{Z}^n}(\widehat{{\varvec{\alpha }}},\widehat{\mathbf{c}})|\) is bounded as well.
For some constant \(a_{3}\) and \(\delta _n\in (0,1)\), denote
and thus it suffices to bound \(P(\mathcal{D})\). First, we have
where \(U_n\) and \(M_0\) are defined as in Theorem 1. Note that \(P_1+P_2=O(n^{-1/8})\) as in the proof of Theorem 1. To bound \(P_4\), by the Cauchy–Schwarz inequality, we conclude that
Within the set \(\{|y|\le n^{{1}/{8}} ~\text{ and }~U_n\le M_0\}\), following arguments similar to the proofs of Lemma 1 and Proposition 2, we have for some constant \(a_{3}\), with probability at least \(1-{\delta _n}\), that
which implies \(P_4\le \delta _n\), and thus \(P(\mathcal {D})\le {\delta _n} + O(n^{-{1}/{8}})\). Combining the above results, the norm of the left-hand side of (8) divided by \(n^{{1}/{2}}\) converges to zero in probability, which contradicts the fact that the right-hand side of (8) diverges to infinity as \(n\rightarrow \infty \). Therefore, for any \(l \in \mathcal{L}^*\) and \(l'\in \mathcal{S}\), \(\Vert \widehat{\mathbf{c}}_{ll'}\Vert _2\equiv 0\), implying \(\Vert \widehat{H}_{ll'}\Vert _{L_{\rho _{\mathbf{x}}}^2}=0\) for any \(l \in \mathcal{L}^*\) and \(l'\in \mathcal{S}\), and thus there holds \(\widehat{\mathcal{L}} \subset \mathcal{L}^*\).
Next, we show \(\Vert \widehat{H}_{ll'}\Vert _{L^2_{\rho _{\mathbf{x}}}}^2\ne 0\) for any \(l\in \mathcal{N}^* \) and some \(l'\in \mathcal{S}\). By Lemma 4, for the set \(\mathcal{X}_s=\{ \mathbf{x}\in \mathcal{X}: d(\mathbf{x},\partial \mathcal{X})>s, p(\mathbf{x})>s+c_1s^{\theta }\}\) and a constant specified in the supplementary file, there holds
which converges to zero in probability by Theorem 1. Suppose that there exists some \(l\in \mathcal{N}^*\) such that \(\Vert \widehat{H}_{ll'}\Vert ^2_{L_{\rho _X}^2}=0\) for any \(l'\in \mathcal{S}\); then
However, Assumption 4 implies that for some \(l'\in \mathcal{S}\), \(\int \nolimits _{\mathcal{X}_{s}} ({H}^*_{ll'}(\mathbf{x}))^2 \mathrm{d} \rho _{\mathbf{x}}>\int \nolimits _{\mathcal{X}\backslash \mathcal{X}_{s}} ({H}^*_{ll'}(\mathbf{x}))^2 \mathrm{d} \rho _{\mathbf{x}}\) when \(s\) is sufficiently small, so the left-hand side is bounded below by a positive constant, which leads to a contradiction. Therefore, \(\Vert \widehat{H}_{ll'}\Vert _{L^2_{\rho _{\mathbf{x}}}}^2\ne 0\) for any \(l\in \mathcal{N}^* \) and some \(l'\in \mathcal{S}\), and thus there holds \(\widehat{\mathcal{N}}\subset \mathcal{N}^*\).
Finally, since \(\mathcal{S}=\mathcal{L}^*\cup \mathcal{N}^*=\widehat{\mathcal{L}}\cup \widehat{\mathcal{N}}\) and \(\mathcal{L}^*\cap \mathcal{N}^*=\widehat{\mathcal{L}}\cap \widehat{\mathcal{N}}=\varnothing \), combining these facts with the above results yields \(P(\widehat{\mathcal{L}} = \mathcal{L}^*)\rightarrow 1\) and \(P(\widehat{\mathcal{N}}=\mathcal{N}^*)\rightarrow 1\) as \(n\) diverges. Moreover, we have \(P(\widehat{\mathcal{L}}=\mathcal{L}^*,\widehat{\mathcal{N}}=\mathcal{N}^*) \ge 1 - P(\widehat{\mathcal{L}} \ne \mathcal{L}^*) - P(\widehat{\mathcal{N}}\ne \mathcal{N}^*) \rightarrow 1\) as \(n \rightarrow \infty \). This completes the proof of Theorem 2. \(\square \)
Cite this article
He, X., Wang, J. Discovering model structure for partially linear models. Ann Inst Stat Math 72, 45–63 (2020). https://doi.org/10.1007/s10463-018-0682-9