Shrinkage estimation of varying covariate effects based on quantile regression

Statistics and Computing

Abstract

Varying covariate effects often manifest meaningful heterogeneity in covariate-response associations. In this paper, we adopt a quantile regression model that assumes linearity at a continuous range of quantile levels as a tool to explore such data dynamics. The consideration of potential non-constancy of covariate effects necessitates a new perspective for variable selection, which, under the assumed quantile regression model, is to retain variables that have effects on all quantiles of interest as well as those that influence only part of the quantiles considered. Current work on ℓ1-penalized quantile regression either does not concern varying covariate effects or may not produce consistent variable selection in the presence of covariates with partial effects, a practical scenario of interest. In this work, we propose a shrinkage approach by adopting a novel uniform adaptive LASSO penalty. The new approach enjoys easy implementation without requiring smoothing. Moreover, it can consistently identify the true model (uniformly across quantiles) and achieve the oracle estimation efficiency. We further extend the proposed shrinkage method to the case where responses are subject to random right censoring. Numerical studies confirm the theoretical results and support the utility of our proposals.
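
To make the idea of a penalty that is uniform across quantiles concrete, the following is a minimal sketch rather than the authors' implementation. It assumes, based on the weight expression used in Appendix A, that covariate j receives the weight 1 / sup over τ of |initial estimate of its coefficient at τ|, and that the intercept is left unpenalized. The function names, the finite τ grid, and the toy numbers are hypothetical.

```python
from typing import Dict, List, Sequence


def check_loss(residual: float, tau: float) -> float:
    """Quantile check function: rho_tau(r) = r * (tau - I(r < 0))."""
    return residual * (tau - (1.0 if residual < 0 else 0.0))


def uniform_adaptive_weights(initial_fits: Dict[float, Sequence[float]]) -> List[float]:
    """One weight per coefficient, shared by every quantile level.

    initial_fits maps each tau on a grid to an unpenalized coefficient
    vector (index 0 is the intercept). Assumed form of the weight:
    1 / max over the grid of |coefficient|. The intercept gets weight 0.
    """
    p = len(next(iter(initial_fits.values())))
    weights = [0.0]  # assumption: intercept is not penalized
    for j in range(1, p):
        sup_abs = max(abs(beta[j]) for beta in initial_fits.values())
        weights.append(1.0 / sup_abs if sup_abs > 0 else float("inf"))
    return weights


def penalized_objective(
    beta: Sequence[float],
    tau: float,
    Z: Sequence[Sequence[float]],
    y: Sequence[float],
    lam: float,
    weights: Sequence[float],
) -> float:
    """Check-loss fit at level tau plus the weighted L1 penalty."""
    fit = sum(
        check_loss(yi - sum(z * b for z, b in zip(zi, beta)), tau)
        for zi, yi in zip(Z, y)
    )
    # Skip exact zeros so an infinite weight never multiplies 0.
    penalty = sum(w * abs(b) for w, b in zip(weights, beta) if b != 0.0)
    return fit + lam * penalty


# Toy initial fits: covariate 1 matters only at upper quantiles,
# covariate 2 matters at every quantile.
initial_fits = {
    0.25: [1.0, 0.0, 2.0],
    0.50: [1.2, 0.5, 2.0],
    0.75: [1.1, -1.0, 2.1],
}
weights = uniform_adaptive_weights(initial_fits)
print(weights)
```

Because the weight for covariate 1 is driven by its largest initial effect over the grid, the covariate is not penalized heavily at the quantile level where its initial estimate is zero, which is the behaviour a partial-effect covariate needs in order to be retained. Minimizing the objective at each τ is a linear program and is not shown here.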

References

  • Belloni, A., Chernozhukov, V.: ℓ1-penalized quantile regression in high-dimensional sparse models. Ann. Stat. 39, 82–130 (2011)

  • Carey, J.R., Liedo, P., Orozco, D., Tatar, M., Vaupel, J.W.: A male-female longevity paradox in medfly cohorts. J. Anim. Ecol. 64, 107–116 (1995)

  • Dickson, E., Grambsch, P., Fleming, T., Fisher, L., Langworthy, A.: Prognosis in primary biliary cirrhosis: model for decision making. Hepatology 10, 1–7 (1989)

  • Fan, J., Li, R.: Variable selection via nonconcave penalized likelihood and its oracle properties. J. Am. Stat. Assoc. 96, 1348–1360 (2001)

  • Frank, I., Friedman, J.: A statistical view of some chemometrics regression tools. Technometrics (1993)

  • Goodman, V., Kuelbs, J., Zinn, J.: Some results on the LIL in Banach space with applications to weighted empirical processes. Ann. Probab. 9, 713–752 (1981)

  • Huang, Y.: Calibration regression of censored lifetime medical cost. J. Am. Stat. Assoc. 98 (2002)

  • Huang, Y.: Quantile calculus and censored regression. Ann. Stat. 38(3), 1607–1637 (2010)

  • Jensen, G., Torp-Pedersen, C., Hildebrandt, P., Kober, L., Nielsen, F., Melchior, T., Joen, T., Andersen, P.: Does in-hospital fibrillation affect prognosis after myocardial infarction? Eur. Heart J. 18, 919–924 (1997)

  • Kai, B., Li, R., Zou, H.: New efficient estimation and variable selection methods for semiparametric varying-coefficient partially linear models. Ann. Stat. 39, 305–332 (2011)

  • Kaslow, R., Ostrow, D., Detels, R., Phair, J., Polk, B., Rinaldo, C.: The Multicenter AIDS Cohort Study: rationale, organization and selected characteristics of the participants. Am. J. Epidemiol. 126, 310–318 (1987)

  • Knight, K., Fu, W.: Asymptotics for lasso-type estimators. Ann. Stat., 1356–1378 (2000)

  • Koenker, R.: Quantile Regression. Cambridge University Press, Cambridge (2005)

  • Koenker, R.: quantreg: quantile regression (2011). http://www.r-project.org

  • Koenker, R., Bassett, G.: Regression quantiles. Econometrica 46, 33–50 (1978)

  • Koenker, R., d’Orey, V.: Computing regression quantiles. Appl. Stat. 36, 383–393 (1987)

  • Kutner, N., Clow, P., Zhang, R., Aviles, X.: Association of fish intake and survival in a cohort of incident dialysis patients. Am. J. Kidney Dis. 39, 1018–1024 (2002)

  • Li, Y., Zhu, J.: Quantile regression in reproducing kernel Hilbert spaces. J. Comput. Graph. Stat. 17, 163–185 (2005)

  • Lustig, I., Marsden, R., Shanno, D.: Interior point methods for linear programming: computational state of the art with discussion. ORSA J. Comput. 6, 1–14 (1994)

  • Ma, Y., Yin, G.: Semiparametric median residual life model and inference. Can. J. Stat. 38, 665–679 (2010)

  • Madsen, K., Nielsen, H.: A finite smoothing algorithm for linear ℓ1 estimation. SIAM J. Optim. 3, 223–235 (1993)

  • McDonald, G., Schwing, R.: Instabilities of regression estimates relating air pollution to mortality. Technometrics 15, 463–481 (1973)

  • Neocleous, T., Vanden Branden, K., Portnoy, S.: Correction to "Censored regression quantiles" by S. Portnoy, 98 (2003), 1001–1012. J. Am. Stat. Assoc. 101(474), 860–861 (2006)

  • Peng, L., Fine, J.: Competing risks quantile regression. J. Am. Stat. Assoc. 104 (2009)

  • Peng, L., Huang, Y.: Survival analysis with quantile regression models. J. Am. Stat. Assoc. 103(482), 637–649 (2008)

  • Portnoy, S.: Censored regression quantiles. J. Am. Stat. Assoc. 98(464), 1001–1012 (2003)

  • Portnoy, S., Lin, G.: Asymptotics for censored regression quantiles. J. Nonparametr. Stat. 22, 115–130 (2010)

  • Rocha, G., Wang, X., Yu, B.: Asymptotic distribution and sparsistency for ℓ1-penalized parametric M-estimators with applications to linear SVM and logistic regression (2009). http://arxiv.org/abs/0908.1940

  • Thorogood, J., Persijn, G., Schreuder, G., D’Amaro, J., Zantvoort, F., Van Houwelingen, J., Van Rood, J.: The effect of HLA matching on kidney graft survival in separate posttransplantation intervals. Transplantation 50, 146–150 (1990)

  • Tibshirani, R.: Regression shrinkage and selection via the Lasso. J. R. Stat. Soc. B 58, 267–288 (1996)

  • Tsiatis, A.A.: Estimating regression parameters using linear rank tests for censored data. Ann. Stat. 18, 354–372 (1990)

  • Van der Vaart, A., Wellner, J.: Weak Convergence and Empirical Processes: With Applications to Statistics. Springer, Berlin (2000)

  • Verweij, P., Van Houwelingen, H.: Time-dependent effects of fixed covariates in Cox regression. Biometrics 51, 1550–1556 (1995)

  • Wang, H., Leng, C.: Unified lasso estimation by least squares approximation. J. Am. Stat. Assoc. 102, 1039–1048 (2007)

  • Wang, H., Xia, Y.: Shrinkage estimation of the varying coefficient model. J. Am. Stat. Assoc. 104, 747–757 (2009)

  • Wei, L.J., Gail, M.H.: Nonparametric estimation for a scale-change with censored observations. J. Am. Stat. Assoc. 78, 382–388 (1983)

  • Wu, Y., Liu, Y.: Variable selection in quantile regression. Stat. Sin. 19, 801–817 (2009)

  • Ying, Z.: A large sample study of rank estimation for censored data. Ann. Stat. 21, 76–99 (1993)

  • Zhang, H., Lu, W.: Adaptive lasso for Cox’s proportional hazards model. Biometrika 94, 691–703 (2007)

  • Zou, H.: The adaptive lasso and its oracle properties. J. Am. Stat. Assoc. 101, 1418–1429 (2006)

  • Zou, H., Yuan, M.: Composite quantile regression and the oracle model selection theory. Ann. Stat. 36, 1108–1126 (2008)

Acknowledgements

The authors are grateful to the editor, associate editor, and the two referees for many helpful comments. This research has been supported by the National Heart, Lung, and Blood Institute of the National Institutes of Health under Award Number R01HL113548 (the first author). The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.

Author information

Corresponding author

Correspondence to Limin Peng.

Appendices

Appendix A: Proof of Theorem 1

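Define \(B_{n,\tau}(C)=\{{\boldsymbol {\beta }}:\ {\boldsymbol {\beta }}={\boldsymbol {\beta }}_{0}(\tau)+(n^{-1/2}\ell_{2} n)\boldsymbol {u},\ \|\boldsymbol {u}\|\leq C\}\), and let \(\partial B_{n,\tau}(C)\) denote the boundary set of \(B_{n,\tau}(C)\). Since \(W_{n, \lambda_{n}}({\boldsymbol {\beta }}; \tau)\) is convex in \({\boldsymbol {\beta }}\) for all \(\tau\in \varDelta \), it is sufficient to show that for any \(\epsilon>0\), there exist \(C_{0}>0\) and \(N_{0}>0\) such that for \(n\geq N_{0}\),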

$$\begin{aligned} &{\operatorname {pr}\Bigl(\inf_{\tau\in \varDelta }\Bigl[\inf_{{\boldsymbol {\beta }}\in\partial B_{n, \tau}(C_0)} W_{n, \lambda_n}({\boldsymbol {\beta }}; \tau)-W_{n, \lambda_n}\bigl\{{\boldsymbol {\beta }}_0( \tau); \tau\bigr\}\Bigr]>0\Bigr) } \\ &{\quad \geq1-\epsilon. } \end{aligned}$$
(2)

To show (2), first note that

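Write […] and \(\boldsymbol {u}(\tau;{\boldsymbol {\beta }})={\boldsymbol {\beta }}-{\boldsymbol {\beta }}_{0}(\tau)\), and let \(\mathcal {D}\) and \(\mathcal {D}_{i}\) respectively denote operators such that \(\mathcal {D}(U)=U-E(U)\) and \(\mathcal {D}_{i}(U)=U-E(U|\boldsymbol {Z}_{i})\) for a random variable \(U\). We can further decompose the term \(\mathit{I}\) as \(\mathit{I}=\mathit{III}+\mathit{IV}\), where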

and

For the term \(\mathit{IV}_{1}\), we can show that the function class \(\mathcal{F}\), […], is a Donsker class (page 81 in Van der Vaart and Wellner (2000)), where \(\mathcal{A}(C_{0})=\{\boldsymbol {b}\in R^{p}:\ \inf_{\tau\in \varDelta }\|\boldsymbol {b}-{\boldsymbol {\beta }}_{0}(\tau)\|\leq C_{0}\}\). Given the uniform boundedness of the function class \(\mathcal{F}\) and since \(\mathcal{A}(C_{0})\) covers \(\mathcal{B}_{n,\tau}(C_{0})\) for all \(\tau\in \varDelta \), we can apply the functional law of the iterated logarithm (LIL) (Goodman et al. 1981) to \(n^{-1}\sum_{i=1}^{n}\)[…] and get

$$\sup_{\tau\in \varDelta , {\boldsymbol {\beta }}\in\partial B_{n, \tau }(C_0)}|\mathit{IV}_1|\leq O_p \bigl(n^{-1/2}\ell_2 n\bigr) \bigl(C_0 n^{-1/2}\ell_2 n\bigr). $$

Similarly, we can show that, for j=2,…,8,

$$\sup_{\tau\in \varDelta , {\boldsymbol {\beta }}\in\partial B_{n, \tau}(C_0)}\|\mathit{IV}_j\|\leq O_p \bigl(n^{-1/2}\ell_2 n\bigr) \bigl(C_0 n^{-1/2}\ell_2 n\bigr). $$

Therefore, we have

$$ \sup_{\tau\in \varDelta , {\boldsymbol {\beta }}\in\partial B_{n, \tau}(C_0)}|\mathit{IV}|\leq O_p \bigl(n^{-1/2}\ell_2 n\bigr) \bigl(C_0 n^{-1/2}\ell_2 n\bigr). $$
(3)

For the term \(\mathit{III}\), note that, under condition C2 (i), \(\inf_{\tau\in \varDelta ,z}f_{\tau}(0|z)>0\). By the definition of \(B_{n,\tau}(C_{0})\), there exists \(N_{1}>0\) such that for \(n\geq N_{1}\),

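This, coupled with condition C2 (ii), implies that for \(n>N_{1}\), \(f_{\tau}(x|z)>\inf_{\tau\in \varDelta ,z}f_{\tau}(0|z)/2\) for any […] with \({\boldsymbol {\beta }}\in\partial B_{n,\tau}(C_{0})\). Let \(\delta_{2}\equiv\inf_{\tau,z}f_{\tau}(0|z)/2\) and […], where \({\rm eig}\min(\cdot)\) denotes the minimum eigenvalue of a matrix. Let \(E_{Z}(\cdot)\) denote the expectation with regard to \(Z\). We get, for any \(\tau\in \varDelta \) and \({\boldsymbol {\beta }}\in\partial B_{n,\tau}(C_{0})\),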

Since a similar result can be shown for the second term in III, it follows that

$$ \inf_{{\boldsymbol {\beta }}\in\partial B_{n, \tau}(C_0), \tau\in \varDelta } \mathit{III}\geq C_0^2 O_p\bigl((\ell_2 n)^2/n\bigr). $$
(4)

For the term II, it is easy to see that

$$\begin{aligned} \mathit{II}\geq -\frac{\lambda_n}{n}\sum_{j=2}^s\bigl| \beta^{(j)}-\beta_0^{(j)}(\tau )\bigr|w_{n,j}( \tau). \end{aligned}$$

By the uniform consistency of \(\tilde{\beta}_{n}(\tau)\), for \(2\leq j\leq s\), \(\sup_{\tau\in \varDelta } |w_{n,j}(\tau)|=(\sup_{\tau\in \varDelta }|\tilde{\beta}_{n}^{(j)}(\tau )|)^{-1}=O_{p}(1)\). Therefore, with \((n^{1/2}\ell_{2} n)^{-1}\lambda_{n}=O(1)\),

$$ \inf_{{\boldsymbol {\beta }}\in\partial B_{n, \tau}(C_0), \tau\in \varDelta } \mathit{II}\geq-C_0 \cdot O_p\bigl((\ell_2 n)^2/n\bigr). $$
(5)

Based on equations (3), (4), and (5), it follows that

$$\begin{aligned} &{\inf_{\tau\in \varDelta , {\boldsymbol {\beta }}\in\partial B_{n, \tau}(C_0)}\bigl[n^{-1}W_{n, \lambda_n}({\boldsymbol {\beta }}; \tau)-n^{-1} W_{n, \lambda_n}\bigl\{{\boldsymbol {\beta }}_0(\tau); \tau \bigr\}\bigr]} \\ &{\quad \geq C_0^2\cdot O_p\bigl(( \ell_2 n)^2/n\bigr)-C_0\cdot O_p \bigl((\ell_2 n)^2/n\bigr).} \end{aligned}$$

Therefore, (2) holds if we choose C 0 large enough. This completes the proof of Theorem 1.
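
The last step can be spelled out as follows. This is only a sketch under a simplifying assumption: on an event of probability at least \(1-\epsilon\), the \(O_{p}\) terms in (3), (4), and (5) are replaced by fixed constants, written here as hypothetical \(c>0\) for the lower bound on \(\mathit{III}\) and \(K>0\) for the combined bounds on \(\mathit{II}\) and \(\mathit{IV}\), with the shorthand \(r_{n}=(\ell_{2} n)^{2}/n\). It also assumes the difference of the objective functions equals \(\mathit{II}+\mathit{III}+\mathit{IV}\), as the decomposition above indicates. Then, uniformly over \(\tau\in \varDelta \) and \({\boldsymbol {\beta }}\in\partial B_{n,\tau}(C_{0})\),

$$\begin{aligned} n^{-1}W_{n, \lambda_n}({\boldsymbol {\beta }}; \tau)-n^{-1}W_{n, \lambda_n}\bigl\{{\boldsymbol {\beta }}_0(\tau); \tau\bigr\} &\geq c\,C_0^2\, r_n-K\,C_0\, r_n \\ &= C_0\, r_n\,(c\,C_0-K) >0 \quad\text{whenever } C_0>K/c. \end{aligned}$$

The quadratic growth in \(C_{0}\) of the curvature term therefore dominates the linear growth of the penalty and empirical-process terms once \(C_{0}\) exceeds \(K/c\).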

Appendix B: Proof of Theorem 2

Define \(\operatorname {sgn}(x)=I(x>0)-I(x<0)\), \(U_{n, j}({\boldsymbol {\beta }}; \tau)= \frac {\partial W_{n, \lambda_{n}}({\boldsymbol {\beta }}; \tau)}{\partial\beta^{(j)}}\), and let […]. Define […], […], and […].

First, by Theorem 1, for j=2,…,s, we have

$$\begin{aligned} &{\lim_{n\rightarrow\infty} \operatorname {pr}\Bigl(\sup _{\tau\in \varDelta }\bigl|\widehat{\beta}_{n, \lambda_n}^{\mathrm {US}(j)}(\tau)\bigr|=0 \Bigr) } \\ &{\quad \leq\lim_{n\rightarrow\infty} \operatorname {pr}\Bigl(\sup_{\tau\in \varDelta }\bigl| \widehat{\beta}_{n, \lambda_n}^{\mathrm {US}(j)}(\tau)-\beta_0^{(j)}( \tau)\bigr|\geq \sup_{\tau\in \varDelta }\bigr|\beta_0^{(j)}(\tau)\bigr| \Bigr) } \\ &{\quad =0.} \end{aligned}$$
(6)

Next, we note that given the uniform consistency of \(\widehat{{\boldsymbol {\beta }}}_{n, \lambda_{n}}^{\mathrm {US}}(\tau)\) implied by Theorem 1, it can be shown by following the lines of Lemma 1 in Peng and Huang (2008) that

(7)

This implies, for j=s+1,…,p,

$$\begin{aligned} &{\sup_{\tau\in \varDelta }\bigl|E_{n, j}(\tau)\bigr| }\\ &{\quad \equiv\sup_{\tau\in \varDelta }\bigl|n^{-1/2} U_{n, j} \bigl\{\widehat{{\boldsymbol {\beta }}}_{n, \lambda_n}^{\mathrm {US}}(\tau); \tau\bigr\} -n^{-1/2} U_{n, j}\bigl\{{\boldsymbol {\beta }}_0(\tau); \tau\bigr\} } \\ &{\qquad{} -n^{1/2}\bigl[\mu_j\bigl\{\widehat{{\boldsymbol {\beta }}}_{n, \lambda_n}^{\mathrm {US}}(\tau)\bigr\}-\mu_j\bigl\{ {\boldsymbol {\beta }}_0(\tau)\bigr\}\bigr] } \\ &{\qquad{} -n^{-1/2}\lambda_n w_{n, j}(\tau)\operatorname {sgn}\bigl\{\widehat{\beta}_{n,\lambda_n}^{\mathrm {US}(j)}(\tau)\bigr\}\bigr|=o_p(1)} \end{aligned}$$
(8)

By the definition of \(\widehat{{\boldsymbol {\beta }}}_{n, \lambda_{n}}^{\mathrm {US}}(\tau)\),

$$ \sup_{\tau\in \varDelta }\bigl|n^{-1/2} U_{n, j} \bigl\{\widehat{{\boldsymbol {\beta }}}_{n, \lambda_n}^{\mathrm {US}}(\tau); \tau\bigr \}\bigr|=o_p(1). $$
(9)

Applying the functional LIL to \(n^{-1}U_{n,j}\{{\boldsymbol {\beta }}_{0}(\tau);\tau\}\) gives

$$ \sup_{\tau\in \varDelta }\bigl|n^{-1/2}U_{n,j}\bigl \{{\boldsymbol {\beta }}_0(\tau); \tau\bigr\}\bigr|=O_p(\ell_2 n). $$
(10)

An application of Taylor expansion to \(\mu_{j}\{\widehat{{\boldsymbol {\beta }}}_{n, \lambda_{n}}^{\mathrm {US}}(\tau)\}-\mu_{j}\{{\boldsymbol {\beta }}_{0}(\tau)\}\), coupled with Theorem 1 and the uniform boundedness of \(A_{j}({\boldsymbol {\beta }})\), shows that

$$ \sup_{\tau\in \varDelta }\bigl|n^{1/2}\bigl[ \mu_j\bigl\{\widehat{{\boldsymbol {\beta }}}_{n, \lambda_n}^{\mathrm {US}}(\tau)\bigr \}-\mu_j\bigl\{{\boldsymbol {\beta }}_0(\tau)\bigr\} \bigr]\bigr|=O_p(\ell_2 n). $$
(11)

In addition, the proof of Theorem 1 can be used to justify that \(\sup_{\tau\in \varDelta }|\tilde{\beta}_{n}^{(j)}(\tau )|=O_{p}(n^{-1/2}\ell_{2} n)\) and thus

$$ \lim_{n\rightarrow\infty} \biggl(\frac{1}{\ell_2 n} \biggr) \bigl(n^{-1/2}\lambda_n w_{n,j}(\tau) \bigr)=\infty. $$
(12)

By (9), (10), (11), and (12), with a fixed M>0,

$$\lim_{n\rightarrow\infty} \operatorname {pr}\Bigl(\sup_{\tau\in \varDelta }\bigl|E_{n,j}( \tau )\bigr|\leq M, \sup_{\tau\in \varDelta }\bigl|\widehat{{\boldsymbol {\beta }}}_{n, \lambda_n}^{\mathrm {US}(j)}( \tau)\bigr|\ne0\Bigr)=0. $$

This, coupled with (8), implies that, for j=s+1,…,p,

$$\begin{aligned} &{\lim_{n\rightarrow\infty} \operatorname {pr}\Bigl(\sup_{\tau\in \varDelta }\bigl| \widehat{{\boldsymbol {\beta }}}_{n, \lambda_n}^{\mathrm {US}(j)}(\tau)\bigr|\ne 0\Bigr) } \\ &{\quad \leq\lim_{n\rightarrow\infty}\Bigl\{\operatorname {pr}\Bigl(\sup_{\tau\in \varDelta }\bigl|E_{n,j}( \tau )\bigr|\leq M, \sup_{\tau\in \varDelta }\bigr|\widehat{{\boldsymbol {\beta }}}_{n, \lambda_n}^{\mathrm {US}(j)}( \tau)\bigr|\ne0\Bigr) } \\ &{\qquad{} +\operatorname {pr}\bigl(\bigl|E_{n,j}(\tau)\bigr|>M\bigr)\Bigr\}=0.} \end{aligned}$$
(13)

The proof of Theorem 2(i) is completed based on (6) and (13).
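
The comparison of orders behind (13) can be made explicit with a triangle-inequality sketch; this is a reading of (8) to (12), not an additional result. At any \(\tau\) with \(\widehat{\beta}_{n,\lambda_n}^{\mathrm {US}(j)}(\tau)\ne0\), the sign term in (8) has absolute value one, so rearranging the definition of \(E_{n,j}(\tau)\) gives

$$\begin{aligned} \bigl|n^{-1/2}\lambda_n w_{n,j}(\tau)\bigr| \leq{}& \bigl|E_{n,j}(\tau)\bigr| +\bigl|n^{-1/2}U_{n,j}\bigl\{\widehat{{\boldsymbol {\beta }}}_{n, \lambda_n}^{\mathrm {US}}(\tau); \tau\bigr\}\bigr| +\bigl|n^{-1/2}U_{n,j}\bigl\{{\boldsymbol {\beta }}_0(\tau); \tau\bigr\}\bigr| \\ &{}+\bigl|n^{1/2}\bigl[\mu_j\bigl\{\widehat{{\boldsymbol {\beta }}}_{n, \lambda_n}^{\mathrm {US}}(\tau)\bigr\}-\mu_j\bigl\{{\boldsymbol {\beta }}_0(\tau)\bigr\}\bigr]\bigr|. \end{aligned}$$

On the event \(\sup_{\tau\in \varDelta }|E_{n,j}(\tau)|\leq M\), the right-hand side is at most \(M+o_{p}(1)+O_{p}(\ell_{2} n)+O_{p}(\ell_{2} n)\) by (9), (10), and (11). Provided \(\ell_{2} n\) stays bounded away from zero, dividing by \(\ell_{2} n\) leaves a quantity that is bounded in probability, whereas (12) states that the left-hand side divided by \(\ell_{2} n\) diverges. The two can hold together only on an event whose probability tends to zero, which is the limit displayed before (13).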

Let \({\buildrel a\over =}\) denote asymptotic equivalence in the sense that the difference converges to zero in probability uniformly in \(\tau\in \varDelta \). Define […]. By the result in Theorem 2(i), we have

$$n^{-1/2}\boldsymbol {U}_n(\bar{{\boldsymbol {\beta }}}_n; \tau)\,{\buildrel a\over =}\, n^{-1/2} \boldsymbol {U}_n\bigl(\widehat{{\boldsymbol {\beta }}}_{n, \lambda_n}^{\mathrm {US}}; \tau\bigr) $$

and hence \(n^{-1/2}\boldsymbol {U}_{n}(\bar{{\boldsymbol {\beta }}}_{n}; \tau){\buildrel a\over =}0\). Using the result in (7) and applying Taylor expansion to \(\mu_{j}\{\bar{{\boldsymbol {\beta }}}_{n}(\tau)\}-\mu_{j}\{{\boldsymbol {\beta }}_{0}(\tau)\}\), we get

(14)

where \(\check{{\boldsymbol {\beta }}}(\tau)\) is on the line segment between \({\boldsymbol {\beta }}_{0}(\tau)\) and \(\bar{{\boldsymbol {\beta }}}(\tau)\), and […].

Since \(\lim_{n\rightarrow\infty}n^{-1/2}\lambda_{n}=0\), \(\sup_{\tau\in \varDelta }|w_{n,j}(\tau)|=O_{p}(1)\) for \(j=1,\ldots,s\), and \(A\{\check{{\boldsymbol {\beta }}}(\tau)\}{\buildrel a\over =}A\{{\boldsymbol {\beta }}_{0}(\tau)\}\) given the uniform convergence of \(\widehat{{\boldsymbol {\beta }}}_{n,\lambda_{n}}^{\mathrm {US}}(\tau)\) to \({\boldsymbol {\beta }}_{0}(\tau)\), it follows from (14) that

(15)

where \(A_{11}(\cdot)\) stands for the submatrix of \(A(\cdot)\) formed by the first \(s\) rows and columns. An application of the Donsker theorem based on (15) thus shows that \(n^{1/2}\{\widehat{{\boldsymbol {\beta }}}_{n,\lambda_{n}}^{\mathrm {US}(1:s)}(\tau)-{\boldsymbol {\beta }}_{0}^{(1:s)}(\tau)\}\) converges weakly to a mean zero Gaussian process with the covariance matrix,

Using similar steps, we can show that (15) still holds when \(\widehat{{\boldsymbol {\beta }}}_{\rm oracle}(\tau)\) is in place of \(\widehat{{\boldsymbol {\beta }}}_{n,\lambda_{n}}^{\mathrm {US}(1:s)}(\tau)\). This completes the proof of Theorem 2.

About this article

Cite this article

Peng, L., Xu, J. & Kutner, N. Shrinkage estimation of varying covariate effects based on quantile regression. Stat Comput 24, 853–869 (2014). https://doi.org/10.1007/s11222-013-9406-4

  • DOI: https://doi.org/10.1007/s11222-013-9406-4
