
Focused information criterion and model averaging in censored quantile regression


Abstract

In this paper, we study model selection and model averaging for quantile regression with a randomly right-censored response. We consider a semiparametric censored quantile regression model that requires no distributional assumptions. Under general conditions, we propose a focused information criterion and a frequentist model averaging estimator, and we establish the theoretical properties of the proposed methods. The performance of the proposed procedures is illustrated by extensive simulations and an application to the primary biliary cirrhosis data.



Acknowledgements

The authors would like to thank Prof. Norbert Henze and the anonymous reviewer for their valuable suggestions, which improved the presentation and the results of the paper.

Author information

Corresponding author

Correspondence to Jiang Du.

Additional information

Du’s research was supported by grants from the National Natural Science Foundation of China (No. 11261025) and the Program for Rixin Talents in Beijing University of Technology (No. 006000514116003). Zhang’s research was supported by the National Natural Science Foundation of China (No. 11271039) and the Research Fund of Beijing Education Committee (No. 00600054K1002). Xie’s research was supported by grants from the National Natural Science Foundation of China (No. 11571340) and the Science and Technology Project of Beijing Municipal Education Commission (KM201710005032).

Appendix

We first state some regularity conditions.

Regularity Conditions

  • C1: \(\varepsilon _1,\ldots , \varepsilon _n\) are independent and have a common continuous conditional probability density function \(f(\cdot |X = x)\) satisfying \(f(0|X=x)\ge b_0>0\), \(|\dot{f}(0|X = x)| \le B_0\), and \(\sup _s |\dot{f}(s|X = x)| \le B_0\) for all possible values x of X, where \(b_0\) and \(B_0\) are positive constants and \(\dot{f}\) is the derivative of f.

  • C2: The covariate vectors \(X_1,\ldots , X_n\) are independent and have a common compact support, and the parameter \(\beta _0\) belongs to the interior of a known compact set \(\mathcal {B}_0\).

  • C3: \(P(t \le T \le C) \ge \zeta _0 > 0\) for any \(t \in [0, c]\), where \(\zeta _0\) is a constant and c is the maximum follow-up.

These regularity conditions guarantee asymptotic normality of the proposed estimator.

In order to prove these theorems, we linearly approximate \(\rho _\tau (\varepsilon _i-t)\) near \(t=0\) by means of \(D_i=(1-\tau )I_{\{\varepsilon _i<0\}} -\tau I_{\{\varepsilon _i\ge 0\}}\). Intuitively, \(D_i\) is the first derivative of \(\rho _\tau (\varepsilon _i-t)\) with respect to t at \(t=0\) (Pollard 1991). The assumption that \(\varepsilon _i\) has \(\tau \)th quantile zero implies \(E(D_i)=0\) and \(Var(D_i)=\tau (1-\tau )\).
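Indeed, recalling that \(\rho _\tau (u)=u(\tau -I_{\{u<0\}})\) and differentiating with respect to t at \(t=0\) (valid whenever \(\varepsilon _i\ne 0\)), one finds

$$\begin{aligned} \frac{\partial }{\partial t}\rho _\tau (\varepsilon _i-t)\Big |_{t=0}=I_{\{\varepsilon _i<0\}}-\tau =(1-\tau )I_{\{\varepsilon _i<0\}}-\tau I_{\{\varepsilon _i\ge 0\}}=D_i, \end{aligned}$$

and, since \(P(\varepsilon _i<0)=\tau \) by the quantile restriction, \(E(D_i)=P(\varepsilon _i<0)-\tau =0\) and \(Var(D_i)=E(D_i^2)=(1-\tau )^2\tau +\tau ^2(1-\tau )=\tau (1-\tau )\). We begin by stating an auxiliary lemma, which plays an important role in the proofs of our main theorems.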

Lemma 1

Denote

$$\begin{aligned} G_{n}(\beta _S)=\sum \limits _{i=1}^n\frac{\delta _i}{G_0(Y_i)} \left[ \rho _\tau (\varepsilon _i-\frac{1}{\sqrt{n}}X_i^T\Pi _S^T\beta _S+\frac{1}{\sqrt{n}}X_i^T\beta _0)-\rho _\tau (\varepsilon _i)\right] . \end{aligned}$$

Under conditions C1–C3, for fixed \(\beta _S\) and \(\beta _0\), we have

$$\begin{aligned} G_{n}(\beta _S)=\frac{1}{2}(\beta _S^T\Pi _S-\beta _0^T)\Sigma _0(\Pi _S^T\beta _S-\beta _0)+W_{n1}^T(\Pi _S^T\beta _S-\beta _0)+o_p(1), \end{aligned}$$

where \(W_{n1}=\frac{1}{\sqrt{n}}\sum \limits _{i=1}^n\frac{\delta _i}{G_0(Y_i)}D_i X_i\), and \(\Sigma _0\) is defined in the following proof.

Proof of Lemma 1

Denote \(M(t)=E_X[\frac{\delta }{G_0(Y)}(\rho _\tau (\varepsilon -t)-\rho _\tau (\varepsilon ))]\), and note that \(M(t)=E_X(\rho _\tau (\varepsilon -t)-\rho _\tau (\varepsilon ))\), because the weight \(\delta /G_0(Y)\) has conditional mean one given (T, X) under independent censoring; hereafter \(E_X\) denotes the conditional expectation given the \(X_i\)s. Under condition C1, M(t) has a unique minimizer at zero, and its Taylor expansion at the origin has the form \(M(t)=\frac{f(0|X)}{2}t^2+o(t^2)\).
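To verify this expansion, write \(F(\cdot |X)\) for the conditional distribution function of \(\varepsilon \) and differentiate under the expectation:

$$\begin{aligned} \dot{M}(t)=E_X\left[ I_{\{\varepsilon <t\}}-\tau \right] =F(t|X)-\tau ,\qquad \ddot{M}(t)=f(t|X), \end{aligned}$$

so \(M(0)=0\), \(\dot{M}(0)=F(0|X)-\tau =0\) by the quantile restriction on \(\varepsilon \), and a second-order Taylor expansion gives \(M(t)=\frac{f(0|X)}{2}t^2+o(t^2)\); since \(f(0|X)\ge b_0>0\) by condition C1, zero is indeed the unique minimizer. Hence, for large n, we have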

$$\begin{aligned} E_X(G_{n}(\beta _S))= & {} \frac{1}{2n}(\beta _S^T\Pi _S-\beta _0^T)\left( \sum \limits _{i=1}^n f(0|X_i)X_iX_i^T\right) (\Pi _S^T\beta _S-\beta _0)+o(A_n), \end{aligned}$$

where

$$\begin{aligned} A_n=\frac{1}{2n}(\beta _S^T\Pi _S-\beta _0^T)\left( \sum \limits _{i=1}^n f(0|X_i)X_iX_i^T\right) (\Pi _S^T\beta _S-\beta _0). \end{aligned}$$

Invoking the law of large numbers and conditions C1 and C2, one has

$$\begin{aligned} \frac{1}{ n} \sum \limits _{i=1}^n f(0|X_i)X_iX_i^T \rightarrow \Sigma _0, \end{aligned}$$

almost surely, where \(\Sigma _0\) is a \(d\times d\) positive definite matrix.

Therefore, combining this with condition C2, one has \(A_n=O(1)\) almost surely, uniformly on the compact parameter space. Then we have

$$\begin{aligned} E(G_{n}(\beta _S))=\frac{1}{2}(\beta _S^T\Pi _S-\beta _0^T)\Sigma _0(\Pi _S^T\beta _S-\beta _0)+o(1). \end{aligned}$$

\(G_{n}(\beta _S)\) can be rewritten as

$$\begin{aligned} G_{n}(\beta _S)=EG_{n}(\beta _S)+W_{n1}^T(\Pi _S^T\beta _S-\beta _0)+\sum \limits _{i=1}^n[R_{i,n,S} -E(R_{i,n,S})], \end{aligned}$$

where

$$\begin{aligned} R_{i,n,S}= & {} \frac{\delta _i}{G_0(Y_i)}\left( \rho _\tau \left( \varepsilon _i-\frac{1}{\sqrt{n}}X_i^T\Pi _S^T\beta _S+\frac{1}{\sqrt{n}}X_i^T\beta _0\right) \right. \\&\left. -\rho _\tau (\varepsilon _i) -D_i\frac{1}{\sqrt{n}}\left( (\Pi _SX)_i^T\beta _S-X_i^T\beta _0\right) \right) . \end{aligned}$$

By a routine calculation, we get

$$\begin{aligned} |R_{i,n,S}|\le \frac{\delta _i}{G_0(Y_i)} |\frac{1}{\sqrt{n}} (\Pi _SX)_i^T\beta _S-\frac{1}{\sqrt{n}}X_i^T\beta _0| I_{\left\{ |\varepsilon _i|\le |\frac{1}{\sqrt{n}}(\Pi _SX)_i^T\beta _S-\frac{1}{\sqrt{n}}X_i^T\beta _0|\right\} }. \end{aligned}$$
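This bound follows from Knight's identity: for any real u and v,

$$\begin{aligned} \rho _\tau (u-v)-\rho _\tau (u)=-v(\tau -I_{\{u<0\}})+\int _0^v \left( I_{\{u\le s\}}-I_{\{u\le 0\}}\right) ds. \end{aligned}$$

Taking \(u=\varepsilon _i\) and \(v=\frac{1}{\sqrt{n}}\left( (\Pi _SX)_i^T\beta _S-X_i^T\beta _0\right) \), the linear term equals \(D_iv\), so \(R_{i,n,S}\) is \(\frac{\delta _i}{G_0(Y_i)}\) times the integral remainder; the integrand vanishes unless \(\varepsilon _i\) lies between 0 and s, whence the absolute value of the remainder is at most \(|v|I_{\{|\varepsilon _i|\le |v|\}}\).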

Since the summands are independent and centered, the cross-product terms cancel; thus, by conditions C2 and C3, we obtain

$$\begin{aligned} E_X\left( \sum \limits _{i=1}^n[R_{i,n,S} -E(R_{i,n,S})]\right) ^2= & {} \sum \limits _{i=1}^n E_X\left( R_{i,n,S} -E(R_{i,n,S})\right) ^2\\\le & {} \sum \limits _{i=1}^n E_X \left( R_{i,n,S}\right) ^2\\\le & {} \sum \limits _{i=1}^n\frac{1}{\zeta ^2_0} \left| \frac{1}{\sqrt{n}}(\Pi _S X)_i^T\beta _S-\frac{1}{\sqrt{n}}X_i^T\beta _0\right| ^2\\&\qquad E_X I_{\{|\varepsilon _i|\le |\frac{1}{\sqrt{n}}(\Pi _SX)_i^T\beta _S-\frac{1}{\sqrt{n}}X_i^T\beta _0|\}}\\\le & {} \left( \sum \limits _{i=1}^n\frac{1}{\zeta ^2_0}\left| \frac{1}{\sqrt{n}}(\Pi _SX)_i^T\beta _S-\frac{1}{\sqrt{n}}X_i^T\beta _0\right| ^2\right) \\&\qquad E_X I_{\{|\varepsilon |\le \frac{1}{\sqrt{n}}\parallel \Pi _S^T \beta _S-\beta _0\parallel \ \max \limits _{i=1,2,\ldots ,n}\parallel X_i\parallel \}} , \end{aligned}$$

where \(\parallel \cdot \parallel \) denotes the Euclidean norm. By condition C2, the last term converges to zero almost surely, because

$$\begin{aligned} \sum \limits _{i=1}^n\left| \frac{1}{\sqrt{n}}X_i^T(\Pi _S^T\beta _S-\beta _0)\right| ^2= & {} (\beta _S^T\Pi _S-\beta _0^T)\frac{1}{n}\sum \limits _{i=1}^nX_iX_i^T(\Pi _S^T\beta _S-\beta _0)\\&\rightarrow (\beta _S^T\Pi _S-\beta _0^T)\Sigma (\Pi _S^T\beta _S-\beta _0), \end{aligned}$$

and

$$\begin{aligned} \max \limits _{i=1,2,\ldots ,n}\parallel X_i\parallel /\sqrt{n}\rightarrow 0, \end{aligned}$$

almost surely, where \(\Sigma =E(X_1X_1^T)\); the latter convergence holds because, by condition C2, the \(X_i\)s lie in a common compact set, so \(\max \limits _{i}\parallel X_i\parallel =O(1)\) almost surely. Therefore, \(\sum \limits _{i=1}^n[R_{i,n,S} -E(R_{i,n,S})]=o_p(1)\) uniformly on the compact parameter space. Further, one has

$$\begin{aligned} G_{n}(\beta _S)= & {} \frac{1}{2}(\beta _S^T\Pi _S-\beta _0^T)\Sigma _0(\Pi _S^T\beta _S-\beta _0)+W_{n1}^T(\Pi _S^T\beta _S-\beta _0)+o_p(1). \end{aligned}$$

This completes the proof. \(\square \)

Lemma 2

Denote

$$\begin{aligned} G_{n}({\widehat{G}},\beta _S)=\sum \limits _{i=1}^n\frac{\delta _i}{{\widehat{G}}(Y_i)} \left[ \rho _\tau (\varepsilon _i-\frac{1}{\sqrt{n}}X_i^T\Pi _S^T\beta _S+\frac{1}{\sqrt{n}}X_i^T\beta _0)-\rho _\tau (\varepsilon _i)\right] . \end{aligned}$$

Suppose that conditions C1–C3 hold. Then we have the following asymptotic representation:

$$\begin{aligned} G_{n}(\widehat{G},\beta _S)=\frac{1}{2}(\beta _S^T\Pi _S-\beta _0^T)\Sigma _0(\Pi _S^T\beta _S-\beta _0)+W^T(\Pi _S^T\beta _S-\beta _0) +o_p(1), \end{aligned}$$

where W is a d-dimensional normal random vector with mean \(\varvec{0}\) and variance-covariance matrix \(\Sigma _1\) given in the proof below.

Proof of Lemma 2

It is easy to show that \(G_{n}({\widehat{G}},\beta _S)\) can be written as

$$\begin{aligned} G_{n}({\widehat{G}},\beta _S)=G_{n}(\beta _S)+I_{n1}-I_{n2}, \end{aligned}$$

where

$$\begin{aligned} I_{n1}= & {} \sum \limits _{i=1}^n \delta _i\rho _\tau \left( \varepsilon _i-\frac{1}{\sqrt{n}}X_i^T\Pi _S^T\beta _S+\frac{1}{\sqrt{n}}X_i^T\beta _0\right) \left( \frac{1}{\widehat{G}(Y_i)}-\frac{1}{G_0(Y_i)}\right) ,\\ I_{n2}= & {} \sum \limits _{i=1}^n \delta _i\rho _\tau (\varepsilon _i)\left( \frac{1}{\widehat{G}(Y_i)}- \frac{1}{G_0(Y_i)}\right) . \end{aligned}$$

First, we consider \(I_{n1}\). By a Taylor expansion (Shows et al. 2010), we have

$$\begin{aligned} \sqrt{n}\left( \frac{1}{{\widehat{G}}(Y_i)}-\frac{1}{G_0(Y_i)}\right)= & {} -\frac{\sqrt{n} \{{\widehat{G}}(Y_i)-G_0(Y_i)\}}{G^2_0(Y_i)}+o_p(1)\\= & {} \frac{1}{G_0(Y_i)}\frac{1}{\sqrt{n}}\sum _{j=1}^n \int _0^c I(Y_i\ge s)\frac{dM_j^C(s)}{y(s)}+o_p(1) , \end{aligned}$$

where \(y(s)=\lim \limits _{n\rightarrow \infty }\frac{1}{n}\sum _{i=1}^nI(Y_i\ge s)\), \(M_i^C(s)=(1-\delta _i)I(Y_i\le s)-\int _0^s I(Y_i\ge u)\,d\Lambda _C(u)\), and \(\Lambda _C(\cdot )\) is the cumulative hazard function of the censoring time C.
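The second equality above uses the classical martingale representation of the Kaplan–Meier estimator of the censoring survival function: uniformly in \(t\in [0,c]\),

$$\begin{aligned} \sqrt{n}\left\{ {\widehat{G}}(t)-G_0(t)\right\} =-\frac{G_0(t)}{\sqrt{n}}\sum _{j=1}^n\int _0^t\frac{dM_j^C(s)}{y(s)}+o_p(1); \end{aligned}$$

evaluating at \(t=Y_i\) and dividing by \(-G_0^2(Y_i)\) yields the displayed expansion, with the indicator \(I(Y_i\ge s)\) truncating the integral at \(Y_i\). This leads to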

$$\begin{aligned} I_{n1}= & {} \sum \limits _{i=1}^n \delta _i\rho _\tau \left( \varepsilon _i-\frac{1}{\sqrt{n}}X_i^T\Pi _S^T\beta _S+\frac{1}{\sqrt{n}}X_i^T\beta _0\right) \\&\frac{1}{G_0(Y_i)}\frac{1}{ n}\sum _{j=1}^n \int _0^c I(Y_i\ge s)\frac{dM_j^C(s)}{y(s)}+o_p(1). \end{aligned}$$

Similarly, we get

$$\begin{aligned} I_{n2}=\sum \limits _{i=1}^n \delta _i\rho _\tau (\varepsilon _i)\frac{1}{G_0(Y_i)}\frac{1}{ n}\sum _{j=1}^n \int _0^c I(Y_i\ge s)\frac{dM_j^C(s)}{y(s)}+o_p(1). \end{aligned}$$

Therefore, arguing as in the proof of Lemma 1, one has

$$\begin{aligned} I_{n1}-I_{n2}= & {} \sum \limits _{i=1}^n \delta _i \rho _\tau \left( \varepsilon _i-\frac{1}{\sqrt{n}}X_i^T\Pi _S^T\beta _S+\frac{1}{\sqrt{n}}X_i^T\beta _0\right) \\&\frac{1}{G_0(Y_i)} \frac{1}{ n}\sum _{j=1}^n \int _0^c I(Y_i\ge s)\frac{dM_j^C(s)}{y(s)}\\&- \sum \limits _{i=1}^n \delta _i \rho _\tau (\varepsilon _i) \frac{1}{G_0(Y_i)} \frac{1}{ n}\sum _{j=1}^n \int _0^c I(Y_i\ge s)\frac{dM_j^C(s)}{y(s)}+o_p(1)\\= & {} \sum \limits _{i=1}^n \delta _i\left[ \frac{1}{\sqrt{n}}X_i^T\Pi _S^T\beta _S-\frac{1}{\sqrt{n}}X_i^T\beta _0\right] \\&\frac{D_i}{G_0(Y_i)}\frac{1}{ n}\sum _{j=1}^n \int _0^c I(Y_i\ge s)\frac{dM_j^C(s)}{y(s)}+o_p(1)\\= & {} (\beta _S^T\Pi _S-\beta _0^T)W_{n2}+o_p(1), \end{aligned}$$

where

$$\begin{aligned} W_{n2}=\sum \limits _{i=1}^n \delta _i\frac{1}{\sqrt{n}}X_i\frac{D_i}{G_0(Y_i)}\frac{1}{ n}\sum _{j=1}^n \int _0^c I(Y_i\ge s)\frac{dM_j^C(s)}{y(s)}. \end{aligned}$$

Combining this with Lemma 1, one has

$$\begin{aligned} G_{n}({\widehat{G}},\beta _S)= & {} \frac{1}{2}(\beta _S^T\Pi _S-\beta _0^T) \Sigma _0(\Pi _S^T\beta _S-\beta _0)\\&+\,(W_{n1}+W_{n2})^T(\Pi _S^T\beta _S- \beta _0)+o_p(1)\\= & {} \frac{1}{2}(\beta _S^T\Pi _S-\beta _0^T)\Sigma _0(\Pi _S^T\beta _S-\beta _0)\\&+\,W_n^T(\Pi _S^T\beta _S-\beta _0)+o_p(1), \end{aligned}$$

where \( W_{n}=W_{n1}+W_{n2}.\)

Notice that

$$\begin{aligned} W_{n}= & {} W_{n1}+W_{n2}\\= & {} \frac{1}{\sqrt{n}}\sum \limits _{i=1}^n\frac{\delta _i}{G_0(Y_i)}D_i X_i\\&+ \sum \limits _{i=1}^n \delta _i\frac{1}{\sqrt{n}}X_i\frac{D_i}{G_0(Y_i)}\frac{1}{ n}\sum _{j=1}^n \int _0^c I(Y_i\ge s)\frac{dM_j^C(s)}{y(s)}\\= & {} \frac{1}{\sqrt{n}}\sum \limits _{i=1}^n\frac{\delta _i}{G_0(Y_i)}X_iD_i+ \sum \limits _{i=1}^n\frac{1}{\sqrt{n}}\int _0^c \frac{h(s)}{y(s)}dM_i^C(s)+o_p(1) \\= & {} \frac{1}{\sqrt{n}}\sum \limits _{i=1}^n\left( \frac{\delta _i}{G_0(Y_i)}X_iD_i+ \int _0^c \frac{h(s)}{y(s)}dM_i^C(s)\right) +o_p(1) \\= & {} \frac{1}{\sqrt{n}}\sum \limits _{i=1}^ns_i+o_p(1), \end{aligned}$$

where \(s_i=\frac{\delta _i}{G_0(Y_i)}X_iD_i+\int _0^c \frac{h(s)}{y(s)}dM_i^C(s)\) and \(h(s)=\lim \limits _{n\rightarrow \infty } \frac{1}{n}\sum _{i=1}^n \frac{\delta _i D_i X_i}{G_0(Y_i)} I(Y_i\ge s)\); the third equality above follows by interchanging the order of the two summations and replacing the resulting average \(\frac{1}{n}\sum _{i=1}^n \frac{\delta _i D_i X_i}{G_0(Y_i)} I(Y_i\ge s)\) with its limit h(s). By the martingale central limit theorem, \(\frac{1}{\sqrt{n}}\sum \limits _{i=1}^ns_i\) converges in distribution to a d-dimensional normal vector W with mean \(\varvec{0}\) and variance-covariance matrix \(\Sigma _1=E(s_1s_1^T)\). As a consequence, one has

$$\begin{aligned} G_{n}(\widehat{G},\beta _S)=\frac{1}{2}(\beta _S^T\Pi _S-\beta _0^T) \Sigma _0(\Pi _S^T\beta _S-\beta _0)+W^T(\Pi _S^T\beta _S-\beta _0)+o_p(1). \end{aligned}$$

\(\square \)

Lemma 3

(Convexity Lemma)  Let \(\{T_n(\theta ):\theta \in \Theta \}\) be a sequence of random convex functions defined on a convex, open subset \(\Theta \) of \(R^p\). Suppose \(T(\cdot )\) is a real-valued function on \(\Theta \) for which \(T_n(\theta )\rightarrow T(\theta )\) in probability for each \(\theta \) in \(\Theta \). Then, for each compact subset K of \(\Theta \),

$$\begin{aligned} \sup \limits _{\theta \in K}|T_n(\theta )-T(\theta )|\rightarrow 0 \end{aligned}$$

in probability, and the function \(T(\cdot )\) is necessarily convex on \(\Theta \).

Proof of Lemma 3

There are many versions of the proof of this well-known convexity lemma; to save space we omit it and refer interested readers to Pollard (1991). \(\square \)

Proof of Theorem 1

Observe that \({\widehat{\beta }}_S(\tau )=\arg \min \limits _{\beta _S}\sum \limits _{i=1}^n \frac{\delta _i}{{\widehat{G}}(Y_i)}\rho _\tau (Y_i-(\Pi _SX)_i^T\beta _S) \), so \({\widehat{\beta }}_S(\tau )\) minimizes

$$\begin{aligned}&\sum \limits _{i=1}^n\frac{\delta _i}{{\widehat{G}}(Y_i)}[\rho _\tau (Y_i-(\Pi _SX)_i^T\beta _S)-\rho _\tau (Y_i-X_i^T\beta _0)]\\&\quad =\sum \limits _{i=1}^n\frac{\delta _i}{{\widehat{G}}(Y_i)}[\rho _\tau (\varepsilon _i+X_i^T\beta _0-(\Pi _SX)_i^T\beta _S)-\rho _\tau (\varepsilon _i)], \end{aligned}$$

since \(\varepsilon _i=Y_i-X_i^T\beta _0\).

By Lemma 2, applied to the suitably localized objective, the criterion above is asymptotically the convex quadratic map \(\beta _S\mapsto \frac{1}{2}(\beta _S^T\Pi _S-\beta _0^T)\Sigma _0(\Pi _S^T\beta _S-\beta _0)+W^T(\Pi _S^T\beta _S-\beta _0)\); by Lemma 3, this approximation holds uniformly on compact sets, so the corresponding minimizers converge as well (Pollard 1991).
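The limiting quadratic is minimized where its gradient vanishes, i.e., at the solution of

$$\begin{aligned} \Pi _S\Sigma _0(\Pi _S^T\beta _S-\beta _0)+\Pi _SW=0, \end{aligned}$$

namely at \((\Pi _S\Sigma _0\Pi _S^T)^{-1}\Pi _S(\Sigma _0\beta _0-W)\). Combining this with the local misspecification framework of Theorem 1, in which the components of \(\beta _0\) outside the narrow model are local perturbations of order \(\delta /\sqrt{n}\) (see the main text for the precise parametrization), we have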

$$\begin{aligned}&\sqrt{n} \left\{ \widehat{\beta }_S(\tau )- \left( \begin{array}{c} \ddot{\beta }_0\\ 0 \end{array} \right) \right\} \mathop {\longrightarrow }\limits ^{d}- (\Pi _S\Sigma _0\Pi _S^{T})^{-1} \Pi _SW\\&+\,(\Pi _S\Sigma _0\Pi _S^{T})^{-1} \Pi _S\Sigma _0\left( \begin{array}{c} 0\\ \delta \end{array} \right) . \end{aligned}$$

This completes the proof. \(\square \)

Proofs of Theorems 2 and 3

The proofs of Theorems 2 and 3 are similar to those of Theorems 2 and 3 in Zhang and Liang (2011), respectively, and are therefore omitted.


Cite this article

Du, J., Zhang, Z. & Xie, T. Focused information criterion and model averaging in censored quantile regression. Metrika 80, 547–570 (2017). https://doi.org/10.1007/s00184-017-0616-1
