Abstract
In this paper, we study model selection and model averaging for quantile regression with a randomly right-censored response. We consider a semiparametric censored quantile regression model that requires no distributional assumptions. Under general conditions, a focused information criterion and a frequentist model averaging estimator are proposed, and the theoretical properties of the proposed methods are established. The performance of the procedures is illustrated by extensive simulations and an application to the primary biliary cirrhosis data.
References
Akaike H (1973) Maximum likelihood identification of Gaussian autoregressive moving average models. Biometrika 60:255–265
Bang H, Tsiatis A (2002) Median regression with censored cost data. Biometrics 58:643–649
Behl P, Claeskens G, Dette H (2014) Focused model selection in quantile regression. Stat Sin 24:601–624
Claeskens G, Croux C, Van Kerckhoven J (2006) Variable selection for logistic regression using a prediction-focused information criterion. Biometrics 62:972–979
Claeskens G, Carroll RJ (2007) An asymptotic theory for model selection inference in general semiparametric problems. Biometrika 94:249–265
Claeskens G, Hjort NL (2003) The focused information criterion (with discussion). J Am Stat Assoc 98:900–916
Deng GH, Liang H (2010) Model averaging for semiparametric additive partial linear models. Sci China Math 53:1363–1376
Du J, Zhang ZZ, Xie TF (2013) Focused information criterion and model averaging in quantile regression. Commun Stat Theory Methods 42:3716–3734
Hansen BE (2007) Least squares model averaging. Econometrica 75:1175–1189
Hansen BE (2008) Least squares forecast averaging. J Econ 146:342–350
Hjort NL, Claeskens G (2003) Frequentist model average estimators (with discussion). J Am Stat Assoc 98:879–945
Hjort NL, Claeskens G (2006) Focused information criteria and model averaging for the Cox hazard regression model. J Am Stat Assoc 101:1449–1464
Kaplan EL, Meier P (1958) Nonparametric estimation from incomplete observations. J Am Stat Assoc 53:457–481
Kitagawa T, Muris C (2016) Model averaging in semiparametric estimation of treatment effects. J Econ 193:271–289
Koenker R (2004) Quantile regression for longitudinal data. J Multivar Anal 91:74–89
Koenker R (2005) Quantile regression. Cambridge University Press, Cambridge
Liu C (2015) Distribution theory of the least squares averaging estimator. J Econ 186:142–159
Pang L, Lu W, Wang H (2012) Variance estimation in censored quantile regression via induced smoothing. Comput Stat Data Anal 56:785–796
Peng L, Huang Y (2008) Survival analysis with quantile regression models. J Am Stat Assoc 103:637–649
Pircalabelu E, Claeskens G, Waldorp L (2015) A focused information criterion for graphical models. Stat Comput 25:1071–1092
Pollard D (1991) Asymptotics for least absolute deviation regression estimators. Econom Theory 7:186–199
Powell JL (1984) Least absolute deviations estimation for the censored regression model. J Econ 25:303–325
Powell JL (1986) Censored regression quantiles. J Econ 32:143–155
Qian J, Peng L (2010) Censored quantile regression with partially functional effects. Biometrika 97:839–850
Schwarz G (1978) Estimating the dimension of a model. Ann Stat 6:461–464
Shows J, Lu W, Zhang H (2010) Sparse estimation and inference for censored median regression. J Stat Plann Inference 140:1903–1917
Therneau TM, Grambsch PM (2000) Modeling survival data: extending the Cox model. Springer, New York
Tibshirani R (1997) The LASSO method for variable selection in the Cox model. Stat Med 16:385–395
Wang H (2009) Inference on quantile regression for heteroscedastic mixed models. Stat Sin 19:1247–1261
Wang H, Fygenson M (2009) Inference for censored quantile regression models in longitudinal studies. Ann Stat 37:756–781
Wu Y, Liu Y (2009) Variable selection in quantile regression. Stat Sin 19:801–817
Xu J, Leng C, Ying Z (2010) Rank-based variable selection in the accelerated failure time model. Stat Comput 20:165–176
Xu G, Wang S, Huang JZ (2014) Focused information criterion and model averaging based on weighted composite quantile regression. Scand J Stat 41:365–381
Ying Z, Jung SH, Wei LJ (1995) Survival analysis with median regression models. J Am Stat Assoc 90:178–184
Zeng D, Lin DY (2008) Efficient resampling methods for non-smooth estimating functions. Biostatistics 9:355–363
Zhang X, Wan ATK, Zhou SZ (2012) Focused information criteria, model selection and model averaging in a Tobit model with a non-zero threshold. J Bus Econ Stat 30:132–142
Zhang X, Zou G, Liang H (2013) Choice of weights in FMA estimators under general parametric models. Sci China Math 56:443–457
Zhang X, Liang H (2011) Focused information criterion and model averaging for generalized additive partial linear models. Ann Stat 39:174–200
Zhang H, Lu W (2007) Adaptive LASSO for Cox’s proportional hazards model. Biometrika 94:1–13
Acknowledgements
The authors would like to thank Prof. Norbert Henze and the anonymous reviewer for their valuable suggestions, which improved the presentation and the results of the paper.
Additional information
Du’s research was supported by Grants from the National Natural Science Foundation of China (No. 11261025) and Program for Rixin Talents in Beijing University of Technology (No. 006000514116003). Zhang’s research was supported by the National Natural Science Foundation of China (No. 11271039) and Research Fund of Beijing Education Committee (No. 00600054K1002). Xie’s research was supported by Grants from the National Natural Science Foundation of China (No. 11571340) and the Science and Technology Project of Beijing Municipal Education Commission (KM201710005032).
Appendix
We first state some regularity conditions.
Regularity Conditions
- C1: \(\varepsilon _1,\ldots , \varepsilon _n\) are independent and have a common continuous conditional probability density function \(f(\cdot |X = x)\) satisfying \(f(0|X=x)\ge b_0>0\), \(|\dot{f}(0|X = x)| \le B_0\), and \(\sup _s |\dot{f}(s|X = x)| \le B_0\) for all possible values x of X, where \(b_0\) and \(B_0\) are positive constants and \(\dot{f}\) denotes the derivative of f.
- C2: The covariate vectors \(X_1,\ldots , X_n\) are independent and have a common compact support, and the parameter \(\beta _0\) belongs to the interior of a known compact set \(\mathcal {B}_0\).
- C3: \(P(t \le T \le C) \ge \zeta _0 > 0\) for any \(t \in [0, c]\), where \(\zeta _0\) is a positive constant and c is the maximum follow-up time.
These regularity conditions guarantee asymptotic normality of the proposed estimator.
In order to prove the theorems, we make a linear approximation to \(\rho _\tau (\varepsilon _i-t)\) through \(D_i=(1-\tau )I_{\{\varepsilon _i<0\}} -\tau I_{\{\varepsilon _i\ge 0\}}\); intuitively, \(D_i\) is the first derivative of \( \rho _\tau (\varepsilon _i-t)\) with respect to t at \(t=0\) (Pollard 1991). The assumption that \(\varepsilon _i\) has \(\tau \)th quantile zero implies \(E(D_i)=0\) and \(Var(D_i)=\tau (1-\tau )\). We begin by stating an auxiliary lemma that plays an important role in the proofs of our main theorems.
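The two moments of \(D_i\) quoted above can be checked numerically. The following sketch (an illustration, not part of the paper; it uses a uniform \(\varepsilon\) on \((-\tau, 1-\tau)\), whose \(\tau\)th quantile is zero) verifies \(E(D_i)\approx 0\) and \(Var(D_i)\approx \tau(1-\tau)\) for several values of \(\tau\):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000
for tau in (0.25, 0.5, 0.75):
    # epsilon with tau-th quantile zero: uniform on (-tau, 1 - tau)
    eps = rng.uniform(-tau, 1.0 - tau, n)
    # D_i = (1 - tau) I{eps < 0} - tau I{eps >= 0}
    D = (1.0 - tau) * (eps < 0) - tau * (eps >= 0)
    # sample mean and variance should match E(D) = 0, Var(D) = tau (1 - tau)
    print(tau, D.mean(), D.var(), tau * (1.0 - tau))
```

Since \(D_i\) takes only the two values \(1-\tau\) and \(-\tau\) with probabilities \(\tau\) and \(1-\tau\), the moments also follow by direct calculation.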
Lemma 1
Denote
Under conditions C1–C3, for fixed \(\beta _S\) and \(\beta _0\), we have
where \(W_{n1}=\frac{1}{\sqrt{n}}\sum \limits _{i=1}^n\frac{\delta _i}{G_0(Y_i)}D_i X_i\), and \(\Sigma _0\) is defined in the following proof.
Proof of Lemma 1
Denote \(M(t)=E_X[\frac{\delta }{G_0(Y)}(\rho _\tau (\varepsilon -t)-\rho _\tau (\varepsilon ))]\), and note that \(M(t)=E_X(\rho _\tau (\varepsilon -t)-\rho _\tau (\varepsilon ))\); hereinafter, \(E_X\) denotes the conditional expectation given the \(X_i\)s. Under condition C1, it is easy to show that M(t) has a unique minimizer at zero and that its Taylor expansion at the origin has the form \(M(t)=\frac{f(0)}{2}t^2+o(t^2)\). Hence, for large n, we have
where
Invoking the law of large numbers and conditions C1 and C2, one has
almost surely, where \(\Sigma _0\) is a \(d\times d\) positive definite matrix.
Therefore, combining this with condition C2, one has \(A_n=O(1)\) almost surely, uniformly on the compact parameter space. Then, we have
\(G_{n}(\beta _S)\) can be rewritten as
where
By routine calculation, we get
Due to the cancellation of the cross-product terms, by conditions C2 and C3, we obtain
where \(\parallel \cdot \parallel \) denotes the Euclidean norm. By condition C2, the last term converges to zero almost surely, because
and
almost surely, where \(\Sigma =E(X_1X_1^T).\) Therefore, \(\sum \limits _{i=1}^n[R_{i,n,S} -E(R_{i,n,S})]=o_p(1)\) uniformly on the compact parameter space. Further, one has
This completes the proof. \(\square \)
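The local quadratic behavior \(M(t)=\frac{f(0)}{2}t^2+o(t^2)\) used in the proof above can also be seen numerically. The sketch below (illustrative only; it takes \(\tau = 0.5\) and standard normal errors, so \(f(0)=1/\sqrt{2\pi}\)) compares a Monte Carlo estimate of \(M(t)=E[\rho_\tau(\varepsilon-t)-\rho_\tau(\varepsilon)]\) with the quadratic approximation for small t:

```python
import numpy as np

rng = np.random.default_rng(2)
tau = 0.5
# N(0,1) errors: the tau-th quantile is 0 when tau = 0.5
eps = rng.standard_normal(2_000_000)

def rho(u):
    """Check (pinball) loss rho_tau(u) = u * (tau - I{u < 0})."""
    return u * (tau - (u < 0))

f0 = 1.0 / np.sqrt(2.0 * np.pi)  # density of eps at zero
for t in (-0.2, -0.1, 0.1, 0.2):
    # Monte Carlo estimate of M(t), compared with f(0) t^2 / 2
    M_hat = np.mean(rho(eps - t) - rho(eps))
    print(t, M_hat, 0.5 * f0 * t * t)
```

The positivity of the estimates on both sides of zero also illustrates that M(t) is minimized at the origin.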
Lemma 2
Denote
Suppose that conditions C1–C3 hold. Then we have the following asymptotic representations:
where W is a d-dimensional normal random vector with mean \(\varvec{0}\).
Proof of Lemma 2
It is easy to show that \(G_{n}({\widehat{G}},\beta _S)\) can be written as
where
First, we consider \(I_{n1}\). By a Taylor expansion (Shows et al. 2010), we have
where \(y(s)=\lim \limits _{n\rightarrow \infty }\frac{1}{n}\sum _{i=1}^nI(Y_i\ge s), M_i^C(s)=(1-\delta _i)I(Y_i\le s)-\int _0^c I(Y_i\ge s)d\Lambda _C(s),\) and \(\Lambda _C(s)\) is the cumulative hazard function of the censoring time C. This leads to
Similarly, we get
Therefore, arguing as in the proof of Lemma 1, one has
where
Combining this with Lemma 1, one has
where \( W_{n}=W_{n1}+W_{n2}.\)
Notice that
where \(s_i=\frac{\delta _i}{G_0(Y_i)}X_iD_i-\int _0^c \frac{h(s)}{y(s)}dM_i^C(s)\) and \(h(s)=\lim \limits _{n\rightarrow \infty } \frac{1}{n}\sum _{i=1}^n \frac{\delta _i D_i X_i}{G_0(Y_i)} I(Y_i\ge s)\). By the martingale central limit theorem, \(\frac{1}{\sqrt{n}}\sum \limits _{i=1}^ns_i\) converges in distribution to a d-dimensional normal vector W with mean \(\varvec{0}\) and variance-covariance matrix \(\Sigma _1=E(s_1s_1^T)\). As a consequence, one has
\(\square \)
Lemma 3
(Convexity Lemma) Let \(\{T_n(\theta ):\theta \in \Theta \}\) be a sequence of random convex functions defined on a convex, open subset \(\Theta \) of \(R^p\). Suppose \(T(\cdot )\) is a real-valued function on \(\Theta \) for which \(T_n(\theta )\rightarrow T(\theta )\) in probability for each \(\theta \) in \(\Theta \). Then for each compact subset K of \(\Theta \),
in probability, and the function \(T(\cdot )\) is necessarily convex on \(\Theta \).
Proof of Lemma 3
There are many versions of the proof of this well-known Convexity Lemma. To save space, we omit it; interested readers are referred to Pollard (1991). \(\square \)
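The content of the Convexity Lemma, that pointwise convergence of convex functions upgrades to uniform convergence on compacts, can be illustrated numerically. The sketch below (an illustration under assumed standard normal errors, not the paper's code) takes the random convex function \(T_n(\theta)=\frac{1}{n}\sum_i|\varepsilon_i-\theta|\), whose pointwise limit is \(T(\theta)=E|Z-\theta|=2\phi(\theta)+\theta(2\Phi(\theta)-1)\), and measures the gap over a compact grid:

```python
from math import erf, exp, pi, sqrt

import numpy as np

rng = np.random.default_rng(3)
eps = rng.standard_normal(500_000)

def T_n(theta):
    # random convex function: sample mean of the absolute loss
    return np.mean(np.abs(eps - theta))

def T(theta):
    # pointwise limit: E|Z - theta| for Z ~ N(0, 1)
    phi = exp(-theta ** 2 / 2.0) / sqrt(2.0 * pi)
    Phi = 0.5 * (1.0 + erf(theta / sqrt(2.0)))
    return 2.0 * phi + theta * (2.0 * Phi - 1.0)

# sup-norm gap over a compact set [-2, 2]
grid = np.linspace(-2.0, 2.0, 81)
gap = max(abs(T_n(th) - T(th)) for th in grid)
print(gap)  # small: pointwise LLN plus convexity yields uniformity on compacts
```

For non-convex \(T_n\), pointwise convergence would not suffice; convexity is what makes the supremum over K controllable by finitely many points.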
Proof of Theorem 1
Observe that \({\widehat{\beta }}_S(\tau )=\arg \min \limits _{\beta _S}\sum \limits _{i=1}^n \frac{\delta _i}{{\widehat{G}}(Y_i)}\rho _\tau (Y_i-(\Pi _SX)_i^T\beta _S) \), so \({\widehat{\beta }}_S(\tau )\) minimizes
This completes the proof. \(\square \)
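As a sanity check on the inverse-probability-of-censoring weighting \(\delta_i/{\widehat{G}}(Y_i)\) that appears in the objective function above, the following sketch (an illustration with an assumed exponential toy model, not the paper's implementation; ties and the left-limit convention for \({\widehat{G}}\) are ignored) estimates \({\widehat{G}}\) by the Kaplan–Meier method and verifies that the weighted \(\tau\)th quantile of the observed times recovers the \(\tau\)th quantile of the latent survival time T:

```python
import numpy as np

rng = np.random.default_rng(0)
n, tau = 20000, 0.5
T = rng.exponential(1.0, n)        # latent survival times, Exp(1)
C = rng.exponential(1.0 / 0.3, n)  # independent censoring times, Exp(0.3)
Y = np.minimum(T, C)               # observed times
delta = (T <= C).astype(int)       # failure indicator

# Kaplan-Meier estimate of the censoring survival G(t) = P(C > t):
# here the "events" are the censored observations (delta == 0)
order = np.argsort(Y)
Ys, ds = Y[order], delta[order]
at_risk = np.arange(n, 0, -1)                    # size of risk set at each time
G_hat = np.cumprod(1.0 - (ds == 0) / at_risk)    # Ghat evaluated at Y_(i)

# inverse-probability-of-censoring weights delta_i / Ghat(Y_i)
w = np.zeros(n)
w[ds == 1] = 1.0 / G_hat[ds == 1]

# the weighted tau-th quantile minimizes sum_i w_i * rho_tau(Y_i - b)
cum = np.cumsum(w)
b_hat = Ys[np.searchsorted(cum, tau * cum[-1])]
print(b_hat)  # should be close to the true median of T, log(2) ~ 0.693
```

Without the weights, the plain sample median of Y would be biased toward zero, since censoring truncates the large survival times.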
The proofs of Theorems 2 and 3 are similar to those of Theorems 2 and 3 in Zhang and Liang (2011), respectively, and we omit them.
Cite this article
Du, J., Zhang, Z. & Xie, T. Focused information criterion and model averaging in censored quantile regression. Metrika 80, 547–570 (2017). https://doi.org/10.1007/s00184-017-0616-1