Abstract
In this paper, a minimizing average check loss estimation (MACLE) procedure is proposed for the single-index coefficient model (SICM) in the framework of quantile regression (QR). The resulting estimators are asymptotically normal and achieve the optimal convergence rate. Furthermore, a variable selection method for the QR SICM is developed by combining the MACLE method with the adaptive LASSO penalty, and the oracle property of the proposed variable selection method is established. Extensive simulations are conducted to assess the finite-sample performance of the proposed estimation and variable selection procedures under various error settings. Finally, we present a real-data application of the proposed approach.
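To illustrate the "minimizing average check loss" idea underlying MACLE, the sketch below fits a plain linear quantile regression by directly minimizing the average check loss on simulated data. This is a hedged toy example (simulated data, scipy's Nelder–Mead optimizer, and a linear model are all assumptions for illustration), not the paper's actual SICM procedure, which additionally involves local-linear smoothing and an index parameter.

```python
import numpy as np
from scipy.optimize import minimize

def check_loss(u, tau):
    """Quantile check loss: rho_tau(u) = u * (tau - I(u < 0))."""
    return u * (tau - (u < 0))

rng = np.random.default_rng(0)
n, tau = 500, 0.5
x = rng.uniform(-1.0, 1.0, size=n)
y = 1.0 + 2.0 * x + rng.standard_normal(n)   # error has median zero

def avg_check(beta):
    # average check loss of the residuals for candidate (intercept, slope)
    return np.mean(check_loss(y - beta[0] - beta[1] * x, tau))

fit = minimize(avg_check, x0=np.zeros(2), method="Nelder-Mead")
# fit.x estimates the median-regression coefficients, near (1, 2) here
```

At \(\tau=0.5\) this reduces to median regression; other \(\tau\) values trace out conditional quantiles.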
References
Cai, Z., Xu, X. (2008). Nonparametric quantile estimations for dynamic smooth coefficient models. Journal of the American Statistical Association, 103, 1595–1608.
Fan, J., Li, R. (2001). Variable selection via non-concave penalized likelihood and its oracle properties. Journal of the American Statistical Association, 96, 1348–1360.
Fan, J., Yao, Q., Cai, Z. (2003). Adaptive varying-coefficient linear models. Journal of the Royal Statistical Society Series B (Statistical Methodology), 65, 57–80.
Feng, S., Xue, L. (2013). Variable selection for single-index varying-coefficient model. Frontiers of Mathematics in China, 8, 541–565.
Geyer, C. J. (1994). On the asymptotics of constrained m-estimation. The Annals of Statistics, 22, 1993–2010.
Härdle, W., Hall, P., Ichimura, H. (1993). Optimal smoothing in single-index models. The Annals of Statistics, 21, 157–178.
Hjort, N., Pollard, D. (1993). Asymptotics for minimizers of convex processes (preprint).
Honda, T. (2004). Quantile regression in varying coefficient models. Journal of Statistical Planning and Inference, 121, 113–125.
Huang, Z., Zhang, R. (2013). Profile empirical-likelihood inferences for the single-index-coefficient regression model. Statistics and Computing, 23, 455–465.
Jiang, R., Zhou, Z., Qian, W., Shao, W. (2012). Single-index composite quantile regression. Journal of the Korean Statistical Society, 41, 323–332.
Kai, B., Li, R., Zou, H. (2011). New efficient estimation and variable selection methods for semiparametric varying-coefficient partially linear models. The Annals of Statistics, 39, 305–332.
Kim, M.-O. (2007). Quantile regression with varying coefficients. The Annals of Statistics, 35, 92–108.
Knight, K. (1998). Limiting distributions for \(l_1\) regression estimators under general conditions. The Annals of Statistics, 26, 755–770.
Koenker, R., Bassett, G. (1978). Regression quantiles. Econometrica, 46, 33–50.
Lu, Z., Tjøstheim, D., Yao, Q. (2007). Adaptive varying-coefficient linear models for stochastic processes: Asymptotic theory. Statistica Sinica, 17, 177–197.
Mack, Y. P., Silverman, B. W. (1982). Weak and strong uniform consistency of kernel regression estimates. Probability Theory and Related Fields, 61, 405–415.
Shapiro, S. S., Wilk, M. B. (1965). An analysis of variance test for normality (complete samples). Biometrika, 52, 591–611.
Wang, H., Leng, C. (2007). Unified lasso estimation via least squares approximation. Journal of the American Statistical Association, 102, 1039–1048.
Wu, T., Yu, K., Yu, Y. (2010). Single-index quantile regression. Journal of Multivariate Analysis, 101, 1607–1621.
Xia, Y., Tong, H., Li, W. K. (1999). On extended partially linear single-index models. Biometrika, 86, 831–842.
Xue, L., Pang, Z. (2013). Statistical inference for a single-index varying-coefficient model. Statistics and Computing, 23, 589–599.
Zhang, C. H. (2010). Nearly unbiased variable selection under minimax concave penalty. The Annals of Statistics, 38, 894–942.
Zou, H. (2006). The adaptive lasso and its oracle properties. Journal of the American Statistical Association, 101, 1418–1429.
Additional information
The research was supported in part by National Natural Science Foundation of China (11501372, 11571112), Project of National Social Science Fund (15BTJ027), Doctoral Fund of Ministry of Education of China (20130076110004), Program of Shanghai Subject Chief Scientist (14XD1401600) and the 111 Project of China (B14019).
Appendix
To establish the asymptotic properties and the oracle property of the proposed methods, we need the following regularity conditions:
- A.1 The kernel function \(K(\cdot )\) is a symmetric Lipschitz continuous density function with a compact support, and it satisfies \(\int _{-\infty }^{\infty }z^2K(z)dz<\infty \) and \(\int _{-\infty }^{\infty }z^jK^2(z)dz<\infty ,~j=0,1,2\);
- A.2 Denote \(\varvec{\Theta }\) as the local neighborhood of \(\varvec{\theta }\) and \(\Xi \) as the compact support of the covariate \(\mathbf {X}\). Let \(\mathcal {U}=\left\{ u=\mathbf {x}^T\varvec{\theta };\mathbf {x} \in \Xi , \varvec{\theta }\in \varvec{\Theta } \right\} \) be the compact support of \(\mathbf {X}^T\varvec{\theta }\) with marginal density \(f_{\mathcal {U}}(u)\). Furthermore, \(f_{\mathcal {U}}(u)\) is first-order Lipschitz continuous and its lower bound is positive;
- A.3 Denote \(u_{\varvec{\theta }}=\mathbf {x}^T\varvec{\theta }\); the index function \(\mathbf {\alpha }(u_{\varvec{\theta }})\) is second-order differentiable with respect to \(u_{\varvec{\theta }}\) and Lipschitz continuous with respect to \(\varvec{\theta }\);
- A.4 Given \(\mathbf {X}^T\varvec{\theta }=u\), the conditional density f(y|u) is Lipschitz continuous with respect to y and u;
- A.5 The matrix functions \(\mathrm {E}(\mathbf {X}|\mathbf {X}^T\varvec{\theta }=u)\), \(\mathrm {E}(\mathbf {Z}|\mathbf {X}^T\varvec{\theta }=u)\), \(\mathrm {E}(\mathbf {X}^{\otimes 2}|\mathbf {X}^T\varvec{\theta }=u)\), \(\mathrm {E}(\mathbf {Z}^{\otimes 2}|\mathbf {X}^T\varvec{\theta }=u)\) and \(\mathrm {E}(\mathbf {X}\mathbf {Z}^T|\mathbf {X}^T\varvec{\theta }=u)\) are uniformly Lipschitz continuous with respect to \(u\in \mathcal {U}\) and \(\varvec{\theta }\in \Theta \), where \(A^{\otimes 2}=AA^T\) and A is a matrix or vector;
- A.6 The bandwidth h satisfies \(h\sim n^{-\delta }\), where \(1/6<\delta <1/4\);
- A.7 For all \(u\in \mathcal {U}\) and \(\varvec{\theta }\in \Theta \), the matrix \(\mathrm {E}(\mathbf {Z}^{\otimes 2}|\mathbf {X}^T\varvec{\theta }=u)\) is invertible;
- A.8 For all \(\varvec{\theta }\in \Theta \), the matrix \(\mathcal {G}\) defined in Theorem 1 is positive definite.
Remark 5
The above conditions are commonly used in the semi-parametric literature and are easily satisfied in many applications. Condition A.1 simply requires that the kernel function be a proper density with a finite second moment, which is needed to derive the asymptotic variance of the estimators. Condition A.2 guarantees the existence of ratio terms in which the density appears in the denominator. Conditions A.3 and A.4 are commonly used in the single-index model and quantile regression literature; see Wu et al. (2010), Kai et al. (2011) and Xue and Pang (2013). Condition A.5 lists some common assumptions for semi-parametric models; see, for example, Huang and Zhang (2013), Kai et al. (2011) and Xue and Pang (2013). Condition A.6 admits the optimal bandwidth in nonparametric estimation. Condition A.7 comes from Lu et al. (2007) and Kai et al. (2011). Condition A.8 is used to derive the consistency of the variable selection method.
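Condition A.1 can be checked numerically for any concrete kernel. The sketch below verifies it for the Epanechnikov kernel, an assumed choice for illustration (the paper does not fix a particular kernel): it is a symmetric Lipschitz continuous density with compact support, and the required moments are finite.

```python
import numpy as np

def epanechnikov(z):
    """Epanechnikov kernel: symmetric, Lipschitz continuous, support [-1, 1]."""
    return 0.75 * (1.0 - z**2) * (np.abs(z) <= 1)

z = np.linspace(-1.0, 1.0, 200001)
dz = z[1] - z[0]
K = epanechnikov(z)

total = K.sum() * dz                     # integrates to 1 (a proper density)
m2 = (z**2 * K).sum() * dz               # ∫ z^2 K(z) dz = 0.2 < ∞
mj = [(z**j * K**2).sum() * dz for j in (0, 1, 2)]  # ∫ z^j K^2(z) dz, j = 0, 1, 2
```

Here \(\int z^2K(z)dz=1/5\), \(\int K^2(z)dz=3/5\), \(\int zK^2(z)dz=0\) and \(\int z^2K^2(z)dz=3/35\), all finite as Condition A.1 requires.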
The following two lemmas will be frequently used in our proof.
Lemma 1
Suppose \(A_n(s)\) is convex and can be represented as \(\frac{1}{2}s^TVs+U_n^Ts+C_n+r_n(s)\), where V is symmetric and positive definite, \(U_n\) is stochastically bounded, \(C_n\) is arbitrary, and \(r_n(s)\) goes to zero in probability for each s. Then the argmin of \(A_n\) is only \(o_p(1)\) away from \(\beta _n=-V^{-1}U_n\), the argmin of \(\frac{1}{2}s^TVs+U_n^Ts+C_n\).
Proof
This lemma comes from the Basic proposition in Hjort and Pollard (1993). \(\square \)
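The content of Lemma 1 can be illustrated numerically: add a vanishing perturbation \(r_n(s)\) to a quadratic and check that the minimizer stays close to \(-V^{-1}U_n\). The matrix V, the vector U, and the perturbation below are arbitrary stand-ins chosen for illustration (the perturbation is merely vanishing rather than convexity-preserving, which suffices for this sketch).

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)
V = np.array([[2.0, 0.5], [0.5, 1.0]])   # symmetric positive definite
U = rng.standard_normal(2)               # a (bounded) realization of U_n
n = 10_000

def A_n(s):
    # quadratic part plus a vanishing remainder r_n(s) = O(n^{-1/2})
    r_n = np.sin(s).sum() / np.sqrt(n)
    return 0.5 * s @ V @ s + U @ s + r_n

argmin_An = minimize(A_n, x0=np.zeros(2)).x
beta_n = -np.linalg.solve(V, U)          # argmin of the quadratic part
# argmin_An differs from beta_n only by a vanishing amount
```

As n grows, the gap between the two minimizers shrinks at the same rate as the remainder, which is exactly the message of the lemma.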
Lemma 2
Let \((U_1,Y_1),\ldots ,(U_n,Y_n)\) be independent and identically distributed random vectors, where \(Y_i\) and \(U_i\) are scalar random variables. Assume further that \(\mathrm {E}|Y|^s<\infty \) and \(\sup \limits _u \int |y|^sf(u,y)\mathrm {d}y<\infty \), where \(f(\cdot ,\cdot )\) denotes the joint density of (U, Y). Let \(K(\cdot )\) be a bounded positive function with a bounded support satisfying a Lipschitz condition. Then
provided that \(n^{2\varepsilon -1}h\rightarrow \infty \) for some \(\varepsilon <1-s^{-1}\), where \(\mathscr {U}\) is the compact support of U.
Proof
This follows from the result by Mack and Silverman (1982). \(\square \)
Let \(\tilde{\varvec{\theta }}\) be an initial consistent estimate of the parameter \(\varvec{\theta }\), which can be obtained using existing methods; see Remark 1. In the following, we assume \(\tilde{\varvec{\theta }}-\varvec{\theta }=o_p(1)\). Denote \(\delta _n=\left[ {\ln (1/h) }/{nh} \right] ^{1/2}\), \(\tau _n=h^2+\delta _n\), \(\delta _{\varvec{\theta }}=\Vert \tilde{\varvec{\theta }}-\varvec{\theta }\Vert \) and \(K_{ih}^{\varvec{\theta }}=K_{i,h}^{\varvec{\theta }}(\mathbf {x})=K_h(\mathbf {X}_{i0}^T\varvec{\theta })\), where \(\mathbf {X}_{i0}=\mathbf {X}_i-\mathbf {x}\). Then we have the following Lemma 3.
Lemma 3
Assume \(\mathbf {x}\) is an interior point of \(\Xi \), and denote
then we have
where \(\mu _{\varvec{\theta }}(\mathbf {x})=\mathrm {E}(X|\mathbf {X}^T\varvec{\theta }=\mathbf {x}^T\varvec{\theta })\), \(\nu _{\varvec{\theta }}(\mathbf {x})=\mathrm {E}(Z|\mathbf {X}^T\varvec{\theta }=\mathbf {x}^T\varvec{\theta })\), \(\pi _{\varvec{\theta }}(\mathbf {x})=\mathrm {E}(\mathbf {Z}\mathbf {Z}^T|\mathbf {X}^T\varvec{\theta }=\mathbf {x}^T\varvec{\theta })\), \(\Sigma _{\varvec{\theta }}(\mathbf {x})=\mathrm {E}\left( (\mathbf {X}-\mu _{\varvec{\theta }}(\mathbf {x}))(\mathbf {X}-\mu _{\varvec{\theta }}(\mathbf {x}))^T|\mathbf {X}^T\varvec{\theta }=\mathbf {x}^T\varvec{\theta } \right) \).
Proof
By Condition A.2, after some direct calculations, we can easily obtain the above conclusions. \(\square \)
Lemma 4
For a given interior point \(\mathbf {x}\) of \(\Xi \), the estimates of \(\mathbf {g}(\mathbf {x}^T\tilde{\varvec{\theta }})\) and \(\mathbf {g}'(\cdot )\) are
Under the conditions A.1–A.7, we have
where \(\varvec{\theta }_d=\tilde{\varvec{\theta }}-\varvec{\theta }\), \(\mathbf {X}_{i0}=\mathbf {X}_i-\mathbf {x}\), \(\psi _{\tau }(u)=\tau -I(u<0)\),
In particular, \(\sup \limits _{\mathbf {x}\in \Xi } \Vert \hat{\mathbf {g}}'(\mathbf {x}^T\tilde{\varvec{\theta }})-\mathbf {g}'(\mathbf {x}^T\tilde{\varvec{\theta }} )\Vert =O(h^2+h^{-1}\delta _n+\delta _{\varvec{\theta }})\) holds.
Proof
For notational simplicity, let \(\mathbf {x}^T\tilde{\varvec{\theta }}=u\) and denote
and
Then \(\hat{\eta }_n\) is the minimizer of the following objective function
By the identity equation in Knight (1998), \(\rho _{\tau }(x-y)-\rho _{\tau }(x)=-y\psi _{\tau }(x)+\int _0^y \left\{ I(x\le z)-I(x<0)\right\} \mathrm {d}z\),
it follows that \(Q_n(\eta )\) can be restated as
where \(W_n=\frac{1}{\sqrt{nh}}\sum \limits _{i=1}^nK_iM_i\psi _{\tau }(\varepsilon _i)\),
We next consider \(B_n(\eta )\). Denote \(\tilde{\mathcal {X}}\) as the \(\sigma \)-field generated by \(\{\mathbf {X}_1^T\tilde{\varvec{\theta }},\mathbf {X}_2^T\tilde{\varvec{\theta }},\ldots ,\mathbf {X}_n^T\tilde{\varvec{\theta }} \}\). Taking the conditional expectation of \(B_n(\eta )\), we have
where \( B_{n1}(\eta ) =\frac{1}{2} f_Y(q_{\tau }(\mathbf {x},\mathbf {z})|u) \eta ^T \left( \frac{1}{nh}\sum \limits _{i=1}^nM_i{M_i}^TK_i \right) \eta \),
We next calculate \(\mathrm {Var}(B_n(\eta )| \tilde{\mathcal {X}})\). Denote
Since
Therefore, we have \(\mathrm {Var}(B_n(\eta )| \tilde{\mathcal {X}})=o(1)\), and it follows that
Denote \( \mathbb {S}_n=\frac{1}{nh}f_Y(q_{\tau }(\mathbf {x},\mathbf {z})|u)\sum \limits _{i=1}^n M_i{M_i}^T K_i~\). By Lemma 3, it is easy to show that \(\mathbb {S}_n=\mathbb {S}+O_p(\tau _n+\delta _{\varvec{\theta }})\), where
and \(A\otimes B\) denotes the Kronecker product of two matrices.
Combining the above results, we have
Now we begin to consider \(B_{n2}(\eta )\). Note that
hence it follows that
Combining the results from (17), (18), (19) and (20), we have
By Lemma 1, the minimizer of \(Q_n(\eta )\) can be expressed as
According to the definitions of \(\hat{\eta }_n\) and \(W_n\), the result of the first part follows. Meanwhile, by Lemma 2, the second part also follows. \(\square \)
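The Knight (1998) identity used in the proof, \(\rho _{\tau }(x-y)-\rho _{\tau }(x)=-y\psi _{\tau }(x)+\int _0^y \{ I(x\le z)-I(x<0)\} \mathrm {d}z\) with \(\psi _{\tau }(u)=\tau -I(u<0)\), can be verified numerically. The sketch below approximates the integral by a midpoint rule and compares the two sides at a few arbitrary points (the test points and grid size are illustration choices).

```python
import numpy as np

def rho(u, tau):
    """Check loss rho_tau(u) = u * (tau - I(u < 0))."""
    return u * (tau - (u < 0))

def psi(u, tau):
    """psi_tau(u) = tau - I(u < 0), as in Lemma 4."""
    return tau - (u < 0)

def knight_rhs(x, y, tau, m=200_000):
    # midpoint approximation of \int_0^y {I(x <= z) - I(x < 0)} dz
    # (the signed step y/m also handles y < 0 correctly)
    z = (np.arange(m) + 0.5) * y / m
    integrand = (x <= z).astype(float) - float(x < 0)
    return -y * psi(x, tau) + integrand.sum() * (y / m)

points = [(1.0, 2.0, 0.5), (-1.0, 1.0, 0.3), (0.5, -2.0, 0.7)]
# for each (x, y, tau): rho(x - y, tau) - rho(x, tau) matches knight_rhs(x, y, tau)
```

The agreement is exact up to the discretization error of the integral, which is of order \(|y|/m\) here.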
Proof of Theorem 1
Given the estimates \(\hat{\mathbf {g}}(\mathbf {X}_j^T\tilde{\varvec{\theta }}),~\hat{\mathbf {g}}'(\mathbf {X}_j^T\tilde{\varvec{\theta }})\) of \(\mathbf {g}(\mathbf {X}_j^T\tilde{\varvec{\theta }})\) and \(\mathbf {g}'(\mathbf {X}_j^T\tilde{\varvec{\theta }})\), \(j=1,\ldots ,n\), by (6), the estimate of \(\varvec{\theta }\) can be obtained as
Denote \(\tilde{U}_i=\mathbf {X}_i^T\tilde{\varvec{\theta }},~\tilde{U}_j=\mathbf {X}_j^T\tilde{\varvec{\theta }}\). Let
then \(\hat{\varvec{\theta }}^{*}\) is the minimizer of
By the Knight (1998) identity Eq. (16), we can rewrite \(\mathcal {Q}_n(\varvec{\theta }^{*})\) as
where \( \mathcal {Q}_{1n}(\varvec{\theta }^{*})= -\frac{1}{\sqrt{n}}\sum \limits _{j=1}^n\sum \limits _{i=1}^n\omega _{ij} \psi _{\tau }(\varepsilon _i)M_{ij}^T\varvec{\theta }^{*} \),
Firstly, we consider the conditional expectation of \(\mathcal {Q}_{2n}(\varvec{\theta }^{*})\) given \( \tilde{\mathcal {X}}\). By direct calculation, we have
where \(\mathcal {Q}_{2n1}(\varvec{\theta }^{*})=\frac{1}{2} \varvec{\theta }^{*T} \left( \frac{1}{n} \sum \limits _{j=1}^n \sum \limits _{i=1}^n\omega _{ij} f_Y(q_{\tau }(\mathbf {X}_i,\mathbf {Z}_i)|\tilde{U}_i) M_{ij}M_{ij}^T \right) \varvec{\theta }^{*}\),
Denote \(\mathcal {R}_n(\varvec{\theta }^{*})=\mathcal {Q}_{2n}(\varvec{\theta }^{*}) -\mathrm {E}(\mathcal {Q}_{2n}(\varvec{\theta }^{*})|\tilde{\mathcal {X}})\). It is easy to obtain \(\mathcal {R}_n(\varvec{\theta }^{*})=o_p(1)\), then we have \( \mathcal {Q}_{2n}(\varvec{\theta }^{*})= \mathcal {Q}_{2n1}(\varvec{\theta }^{*})+\mathcal {Q}_{2n2}(\varvec{\theta }^{*})+o_p(1)\).
Next, we consider \(\mathcal {Q}_{2n1}(\varvec{\theta }^{*})\) and \(\mathcal {Q}_{2n2}(\varvec{\theta }^{*})\), respectively. For \(\mathcal {Q}_{2n1}(\varvec{\theta }^{*})\), let
By Lemma 2, it is easy to show that \(\mathcal {G}_n^{\tilde{\varvec{\theta }}}=2\mathcal {G}+O(h^2+\delta _n+\delta _{\varvec{\theta }})\), where the definition of \(\mathcal {G}\) is given in Theorem 1.
Denote \(W_{\varvec{\theta }}(\mathbf {x})=\mathrm {E}(f_Y(q_{\tau }(\mathbf {X},\mathbf {Z})|\mathbf {X}^T\varvec{\theta })\mathbf {ZZ}^T|\mathbf {X}^T\varvec{\theta }=\mathbf {x}^T\varvec{\theta })\), then
For \(\mathcal {Q}_{2n2}(\varvec{\theta }^{*})\), note that
Hence, we obtain
where
Now, we begin to consider \(\mathcal {Q}_{2n21}\) and \(\mathcal {Q}_{2n22}\). By the asymptotic expressions of \(\hat{\mathbf {g}}(\mathbf {x}^T\tilde{\varvec{\theta }})\) and \(\hat{\mathbf {g}}'(\mathbf {x}^T\tilde{\varvec{\theta }})\) obtained in Lemma 4, we have
where
By direct calculation, it follows that
Combining \(T_1\) and \(\mathcal {Q}_{1n}(\varvec{\theta }^{*})\), we have
where \(\mathcal {W}_n=\frac{1}{\sqrt{n}} \sum \limits _{i=1}^n \sum \limits _{j=1}^n \psi _{\tau }(\varepsilon _i) \omega _{ij} \hat{\mathbf {g}}'(\tilde{U}_j) ^T\mathbf {Z}_i\left[ \mathbf {X}_i-\mu _{\varvec{\theta }}(\mathbf {X}_j) \right] \). By Lemma 2, we obtain
According to the Cramér–Wold device and the central limit theorem, we have
where the definition of \(\mathcal {G}_0\) is given in Theorem 1.
Merging \(T_2\) and \(\mathcal {Q}_{2n22}\), we obtain
By Lemmas 2 and 3, it is easy to obtain
Therefore, by (21), (22) and (25), we have
By Lemma 1, the minimizer \(\hat{\varvec{\theta }}^{*}\) of \(\mathcal {Q}_n(\varvec{\theta }^{*})\) can be written as \(\hat{\varvec{\theta }}^{*}=\frac{1}{2}\mathcal {G}^{-1}\mathcal {W}_n+\frac{1}{2}\sqrt{n} \varvec{\theta }_d +o_p(1).\) Note that \(\hat{\varvec{\theta }}^{*}=\sqrt{n} \left( \hat{\varvec{\theta }}-\varvec{\theta } \right) \); then we have
The convergence of the estimation algorithm follows from the above equation.
Define \(\tilde{\varvec{\theta }}_k\) as the kth iterate. For any k, Eq. (26) still holds if we replace \(\tilde{\varvec{\theta }}\) and \(\hat{\varvec{\theta }}\) by \(\tilde{\varvec{\theta }}_k\) and \(\tilde{\varvec{\theta }}_{k+1}\), respectively. Therefore, for sufficiently large k, we have \( \hat{\varvec{\theta }}-\varvec{\theta } =\frac{1}{2}\mathcal {G}^{-1} \frac{1}{\sqrt{n}} \mathcal {W}_n+\frac{1}{2}\left( \hat{\varvec{\theta }}-\varvec{\theta } \right) +o_p(1/\sqrt{n})\). Then \(\hat{\varvec{\theta }}-\varvec{\theta } =\mathcal {G}^{-1} \frac{1}{\sqrt{n}} \mathcal {W}_n+o_p(1/\sqrt{n})\).
Combining the above result with (24), we complete the proof of Theorem 1. \(\square \)
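The fixed-point step at the end of the proof can be mimicked with scalars: an iteration of the form \(x_{k+1}=c+x_k/2\) contracts geometrically to the fixed point \(2c\), mirroring how the factor \(\frac{1}{2}(\hat{\varvec{\theta }}-\varvec{\theta })\) is absorbed. Here \(c\) is a fixed scalar stand-in (an illustration assumption) for the stochastic leading term, held constant across iterations.

```python
c = 0.3   # stand-in for the leading term, fixed for illustration
x = 5.0   # arbitrary starting error
errors = []
for _ in range(60):
    x = c + 0.5 * x      # one update of the iterative scheme
    errors.append(x)
# x converges geometrically to the fixed point 2c = 0.6
```

Each sweep halves the distance to the limit, so the algorithm's error after k sweeps is of order \(2^{-k}\) plus the statistical error, matching the claimed convergence of the estimation algorithm.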
Lemma 5
Suppose u is an interior point of the compact support of \(f_{\mathcal {U}}(\cdot )\), and that Conditions A.1–A.7 in the Appendix hold; then we have
where \(\Gamma _{\tau }(\cdot )\) is defined in Theorem 2.
Proof of Lemma 5
When the parameter \(\varvec{\theta }\) is known, for a given interior point \(u=\mathbf {x}^T\varvec{\theta }\) of \(\mathcal {U}\), denote \(R_{n1}^{\varvec{\theta }}(\mathbf {x})\) as \(R_{n1}^{\varvec{\theta }}\). By a proof similar to that of Lemma 4, the estimate of g(u) can be written as
By the central limit theorem, it is easy to prove
By Lemma 4, we consider the difference between the two estimates
Since \(\varvec{\theta }_d=O_p(1/\sqrt{n})\), we only need to prove
When the bandwidth h satisfies \(nh^4 \rightarrow \infty \), since \(\varvec{\theta }_d=O_p(1/\sqrt{n})\), by direct calculation we have
Therefore (28) holds and the proof of Lemma 5 is complete. \(\square \)
Proof of Theorem 2
Given an interior point \(\mathbf {x}\) of \(\Xi \), we have
By Taylor expansion,
By the result of Lemma 5, we can conclude that Theorem 2 holds. \(\square \)
Proof of Theorem 3
For convenience, redefine \(\mathbf {u}=\sqrt{n}(\hat{\varvec{\theta }}^{\lambda }-\varvec{\theta })\) and \(\hat{\varvec{\theta }}_d=\hat{\varvec{\theta }}^{QR}-\varvec{\theta }\), where \(\hat{\varvec{\theta }}^{QR}\) is the estimate of \(\varvec{\theta }\) in Theorem 1. Then \(\mathbf {u}\) is the minimizer of the following objective function:
Similar to the proof of Theorem 1, we can write \(G_n(\mathbf {u})\) as:
For \(1\le k\le p_0\), \({\theta }_k\ne 0\), we have \(|\hat{{\theta }}_k^{QR}|^2\rightarrow _p |{\theta }_k|^2\) and \(\sqrt{n}(|{\theta }_k+u_k/\sqrt{n}|-|{\theta }_k|)\rightarrow u_k \mathrm {sgn}({\theta }_k)\). By Slutsky's theorem, \(\frac{\lambda }{\sqrt{n}|\hat{{\theta }}_k^{QR}|^2} \sqrt{n}(|{\theta }_{ k}+u_k/\sqrt{n}|-|{\theta }_{ k}|)\rightarrow _p 0\).
For \(p_0<k\le p\), \({\theta }_{k}=0\), we have \(\sqrt{n}(|{\theta }_k+u_k/\sqrt{n}|-|{\theta }_k|)=|u_k|\), while \(\frac{\lambda }{\sqrt{n}|\hat{{\theta }}_k^{QR}|^2}\rightarrow _p\infty \), so the corresponding penalty term diverges whenever \(u_k\ne 0\). Therefore, we have
For \(\varvec{\theta }=\left( {\begin{array}{c} \varvec{\theta }^1 \\ \varvec{\theta }^2 \end{array}} \right) \), denote \(\mathbf {u}=\left( {\begin{array}{c} \mathbf {u}_1\\ \mathbf {u}_2 \end{array}} \right) \), we have
Note that \(G_n(\mathbf {u})\) is convex in \(\mathbf {u}\) and \(L(\mathbf {u})\) has a unique minimizer. By the epi-convergence result of Geyer (1994), we can obtain the asymptotic normality by following the proof of Theorem 1.
Next, we consider the consistency of the model selection. Note that the forms of \(G_n(\mathbf {u})\) and \(L(\mathbf {u})\) are similar to those in Zou (2006), and by Condition A.8, \(\mathcal {G}\) is positive definite; hence, we can easily obtain the model selection consistency by following the idea of Zou (2006). \(\square \)
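The dichotomy driving the oracle property — the scaled adaptive-LASSO penalty vanishes for nonzero coefficients and diverges for zero ones — can be seen numerically. In the sketch below, the tuning rate \(\lambda =n^{1/4}\) (so that \(\lambda \rightarrow \infty \) while \(\lambda /\sqrt{n}\rightarrow 0\)) and the deterministic \(O(1/\sqrt{n})\) stand-ins for the root-n-consistent estimates are assumptions chosen purely for illustration.

```python
import numpy as np

def penalty_term(theta_k, theta_hat_k, u_k, n):
    """Scaled penalty increment from the proof of Theorem 3:
    (lambda / (sqrt(n) |theta_hat_k|^2)) * sqrt(n)(|theta_k + u_k/sqrt(n)| - |theta_k|)."""
    lam = n**0.25   # assumed tuning rate: lam -> inf, lam/sqrt(n) -> 0
    return lam / (np.sqrt(n) * theta_hat_k**2) * np.sqrt(n) * (
        abs(theta_k + u_k / np.sqrt(n)) - abs(theta_k))

ns = (1e2, 1e4, 1e6)
# nonzero coefficient theta_k = 1, estimate 1 + O(1/sqrt(n)): term -> 0
nonzero = [penalty_term(1.0, 1.0 + 1.0/np.sqrt(n), 1.0, n) for n in ns]
# zero coefficient theta_k = 0, estimate O(1/sqrt(n)): term -> infinity
zero = [penalty_term(0.0, 1.0/np.sqrt(n), 1.0, n) for n in ns]
```

As n grows, the first sequence shrinks toward 0 while the second blows up like \(\lambda \sqrt{n}\), which is what forces the zero coefficients to be estimated as exactly zero.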
Zhao, W., Zhang, R., Lv, Y. et al. Quantile regression and variable selection of single-index coefficient model. Ann Inst Stat Math 69, 761–789 (2017). https://doi.org/10.1007/s10463-016-0558-9