Quantile function regression analysis for interval censored data, with application to salary survey data

  • Original Paper
  • Recent Statistical Methods for Survival Analysis
Japanese Journal of Statistics and Data Science

Abstract

This study concerns regression analysis for quantile functions, in which the quantile regression coefficients are treated as functions over a continuum of quantile levels. We propose a general inference procedure for quantile regression coefficient functions with interval-censored outcome data. The modeling framework follows a recent proposal that uses a set of parametric basis functions to approximate the quantile regression coefficient functions. The new proposal accommodates outcome data subject to general types of interval censoring, including fixed, random, and partly interval censoring. Large-sample theory for the proposed estimator is established for inference, and a goodness-of-fit testing procedure is developed to guide the choice of basis functions. We apply the proposed methodology to a survey dataset on the monthly salaries of workers in Taiwan, where only part of the salary data is exact and the rest is interval-censored according to the salary intervals prespecified in the survey questionnaire.

References

  • Frumento, P., & Bottai, M. (2016). Parametric modeling of quantile regression coefficient functions. Biometrics, 72, 74–84.

  • Frumento, P., & Bottai, M. (2017). Parametric modeling of quantile regression coefficient functions with censored and truncated data. Biometrics, 73, 1179–1188.

  • Frydman, H. (1994). A note on nonparametric estimation of the distribution function from interval-censored and truncated observations. Journal of the Royal Statistical Society, Series B, 56, 71–74.

  • Kim, Y.-J., Cho, H., Kim, J., & Jhun, M. (2010). Median regression model with interval censored data. Biometrical Journal, 52, 201–208.

  • Koenker, R. (2005). Quantile Regression. Cambridge University Press.

  • Koenker, R., & Bassett, G. (1978). Regression quantiles. Econometrica, 46, 33–50.

  • Newey, W. K., & McFadden, D. L. (1994). Large sample estimation and hypothesis testing. In R. F. Engle & D. L. McFadden (Eds.), Handbook of Econometrics (Vol. 4, pp. 2113–2148). Elsevier.

  • Ruder, S. (2016). An overview of gradient descent optimization algorithms. arXiv preprint. arXiv:1609.04747.

  • Shen, P. S. (2013). Median regression model with left truncated and interval-censored data. The Journal of the Korean Statistical Society, 42, 469–479.

  • Shen, P. S. (2020). Quantile regression for doubly truncated data. Statistics, 54, 649–666.

  • Sun, J. (2006). The statistical analysis of interval-censored failure time data. Springer.

  • Turnbull, B. W. (1976). The empirical distribution function with arbitrarily grouped, censored and truncated data. Journal of the Royal Statistical Society, Series B, 38, 290–295.

  • Zhang, Z., & Sun, J. (2010). Interval censoring. Statistical Methods in Medical Research, 19, 53–70.

  • Zhou, X., Feng, Y., & Du, X. (2017). Quantile regression for interval censored data. Communications in Statistics-Theory and Methods, 46, 3848–3863.

Author information

Corresponding author

Correspondence to Yi-Hau Chen.

Appendix

1.1 Derivation of Equation (2.2)

Since \(\tau _{y_i}=\{\tau :\,Q(\tau ;{\varvec{x}}_i)={\varvec{x}}_i^{\mathrm{T}}{\varvec{\beta }}(\tau )=y_i\}\) is assumed to be unique and \(Q(\tau ;{\varvec{x}}_i)\) is a nondecreasing function of \(\tau \), we have \(I(y_i\le {\varvec{x}}_i^{\mathrm{T}}{\varvec{\beta }}(\tau )<\infty )=I(\tau _{y_i}\le \tau \le 1)\). Then,

$$\begin{aligned} {\varvec{0}}=&\sum _{i=1}^n \int _0^1 ({\varvec{b}}(\tau )\otimes {\varvec{x}}_i){S}_i(\tau ){\text {d}}\tau \\ =&\sum _{i=1}^n \Big \{\int _0^1 {\varvec{b}}(\tau )I(\tau _{y_i}\le \tau \le 1){\text {d}}\tau -\int _0^1 \tau {\varvec{b}}(\tau ) {\text {d}}\tau \Big \} \otimes {\varvec{x}}_i\\ =&\sum _{i=1}^n \left\{ {{\varvec{B}}}(1)-{{\varvec{B}}}(\tau _{y_i})-\bar{{\varvec{B}}}(1)\right\} \otimes {\varvec{x}}_i. \end{aligned}$$
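The step from the second to the third line of this derivation can be checked numerically for a single observation. The sketch below is illustrative only: it assumes a hypothetical two-term basis \({\varvec{b}}(\tau )=(1,\tau )^{\mathrm{T}}\), takes \({\varvec{B}}(\tau )=\int _0^{\tau }{\varvec{b}}(u){\text {d}}u\), and evaluates the term written \(\bar{{\varvec{B}}}(1)\) as \(\int _0^1 \tau {\varvec{b}}(\tau ){\text {d}}\tau \), which is the quantity it replaces in the preceding line. The function names are made up for this example.

```python
import numpy as np

def b(tau):
    """Hypothetical basis b(tau) = (1, tau)^T, evaluated on an array of tau."""
    tau = np.asarray(tau, dtype=float)
    return np.stack([np.ones_like(tau), tau])

def B(tau):
    """Closed-form integral of b(u) over [0, tau] for this particular basis."""
    return np.array([tau, tau ** 2 / 2.0])

def score_integral(tau_y, n_cells=1000):
    """Midpoint-rule value of int_0^1 b(tau) {I(tau_y <= tau) - tau} d tau."""
    mid = (np.arange(n_cells) + 0.5) / n_cells          # cell midpoints in (0, 1)
    weight = (mid >= tau_y).astype(float) - mid          # indicator minus tau
    return (b(mid) * weight).sum(axis=1) / n_cells

def closed_form(tau_y):
    """B(1) - B(tau_y) - int_0^1 tau * b(tau) d tau for the basis above."""
    tau_b_integral = np.array([1.0 / 2.0, 1.0 / 3.0])   # int tau*(1, tau) d tau
    return B(1.0) - B(tau_y) - tau_b_integral

# The two vectors should agree up to quadrature error.
print(score_integral(0.3), closed_form(0.3))
```

The Kronecker product with \({\varvec{x}}_i\) and the sum over \(i\) are omitted, since they act on both sides in the same way.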

1.2 Proof of \(E\{{{\tilde{e}}}_i(\tau )\}=\tau \)

It suffices to prove that \(E\{e_i(\tau )\}=\tau \). For ease of presentation, we drop the subscript i from \(c_{i1},c_{i2}\) and \({\varvec{x}}_i\), and write \(F_Y\) as shorthand for \(F_{Y|{\varvec{x}}}\).

$$\begin{aligned} E\{I(c_2 \le {\varvec{x}}^{\mathrm{T}}{\varvec{\beta }})\}&=\int _{-\infty }^{{\varvec{x}}^{\mathrm{T}}{\varvec{\beta }}}\int _{-\infty }^{c_2} f_{c_1,c_2}(c_1,c_2){\text {d}}c_1{\text {d}}c_2\nonumber \\&=\int _{-\infty }^{{\varvec{x}}^{\mathrm{T}}{\varvec{\beta }}}\Big \{\int _{-\infty }^{c_2}\frac{F_Y(c_2)}{F_Y(c_2)-F_Y(c_1)}f_{c_1,c_2}(c_1,c_2){\text {d}}c_1\Big \}{\text {d}}c_2\nonumber \\&\quad -\int _{-\infty }^{{\varvec{x}}^{\mathrm{T}}{\varvec{\beta }}}\Big \{\Big (\int _{c_1}^{{\varvec{x}}^{\mathrm{T}}{\varvec{\beta }}} +\int _{{\varvec{x}}^{\mathrm{T}}{\varvec{\beta }}}^{\infty }\Big ) \frac{F_Y(c_1)}{F_Y(c_2)-F_Y(c_1)}f_{c_1,c_2}(c_1,c_2){\text {d}}c_2\Big \}{\text {d}}c_1\nonumber \\&\quad +\int _{-\infty }^{{\varvec{x}}^{\mathrm{T}}{\varvec{\beta }}}\Big \{\int _{{\varvec{x}}^{\mathrm{T}}{\varvec{\beta }}}^{\infty } \frac{F_Y(c_1)}{F_Y(c_2)-F_Y(c_1)}f_{c_1,c_2}(c_1,c_2){\text {d}}c_2\Big \}{\text {d}}c_1\nonumber \\&= - \int _{-\infty }^{{\varvec{x}}^{\mathrm{T}}{\varvec{\beta }}}F_Y(c) d\Big \{\int _{-\infty }^{c}\int _{c}^{\infty }\frac{f_{c_1,c_2}(c_1,c_2)}{F_Y(c_2)-F_Y(c_1)}{\text {d}}c_2{\text {d}}c_1\Big \}\nonumber \\&\quad +\int _{-\infty }^{{\varvec{x}}^{\mathrm{T}}{\varvec{\beta }}}\int _{{\varvec{x}}^{\mathrm{T}}{\varvec{\beta }}}^{\infty } \frac{F_Y(c_1)}{F_Y(c_2)-F_Y(c_1)}f_{c_1,c_2}(c_1,c_2){\text {d}}c_2{\text {d}}c_1. \end{aligned}$$
(7.1)

By integration by parts,

$$\begin{aligned}&-\int _{-\infty }^{{\varvec{x}}^{\mathrm{T}}{\varvec{\beta }}}F_Y(c) d\Big \{\int _{-\infty }^{c}\int _{c}^{\infty }\frac{f_{c_1,c_2}(c_1,c_2)}{F_Y(c_2)-F_Y(c_1)}{\text {d}}c_2{\text {d}}c_1\Big \}\nonumber \\&\quad =\int _{-\infty }^{{\varvec{x}}^{\mathrm{T}}{\varvec{\beta }}}\int _{{\varvec{x}}^{\mathrm{T}}{\varvec{\beta }}}^{\infty } \frac{-\tau }{F_Y(c_2)-F_Y(c_1)}f_{c_1,c_2}(c_1,c_2){\text {d}}c_2{\text {d}}c_1 \nonumber \\&\qquad +\int _{-\infty }^{{\varvec{x}}^{\mathrm{T}}{\varvec{\beta }}}\int _{-\infty }^c\int _c^{\infty } \frac{f_Y(c)}{F_Y(c_2)-F_Y(c_1)}f_{c_1,c_2}(c_1,c_2){\text {d}}c_2{\text {d}}c_1{\text {d}}c. \end{aligned}$$
(7.2)

and

$$\begin{aligned}&\int _{-\infty }^{{\varvec{x}}^{\mathrm{T}}{\varvec{\beta }}}\int _{-\infty }^c\int _c^{\infty } \frac{f_Y(c)}{F_Y(c_2)-F_Y(c_1)}f_{c_1,c_2}(c_1,c_2){\text {d}}c_2{\text {d}}c_1{\text {d}}c\nonumber \\&\quad =\int _{-\infty }^{{\varvec{x}}^{\mathrm{T}}{\varvec{\beta }}}\int _{-\infty }^c\int _c^{\infty } f_{Y|\,c_1,c_2}(c|\,c_1,c_2)f_{c_1,c_2}(c_1,c_2){\text {d}}c_2{\text {d}}c_1{\text {d}}c=P(Y \le {\varvec{x}}^{\mathrm{T}}{\varvec{\beta }})=\tau . \end{aligned}$$
(7.3)
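The boundary terms behind (7.2) can be spelled out as follows. This is a sketch in which \(G\) and \(q\) are shorthand introduced here rather than the paper's notation, and the boundary term at \(-\infty \) is assumed to vanish.

$$\begin{aligned} G(c)&=\int _{-\infty }^{c}\int _{c}^{\infty }\frac{f_{c_1,c_2}(c_1,c_2)}{F_Y(c_2)-F_Y(c_1)}{\text {d}}c_2{\text {d}}c_1,\qquad q={\varvec{x}}^{\mathrm{T}}{\varvec{\beta }},\\ -\int _{-\infty }^{q}F_Y(c)\,{\text {d}}G(c)&=-\Big [F_Y(c)G(c)\Big ]_{c=-\infty }^{c=q}+\int _{-\infty }^{q}G(c)f_Y(c){\text {d}}c\\&=-\tau \,G(q)+\int _{-\infty }^{q}G(c)f_Y(c){\text {d}}c, \end{aligned}$$

where the last line uses \(F_Y(q)=\tau \). Written out in full, the two terms on the last line are the two terms on the right-hand side of (7.2).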

Combining (7.1)–(7.3) with

$$\begin{aligned} E\Big \{\frac{\tau -F_Y(c_1)}{F_Y(c_2)-F_Y(c_1)}I(c_1< {\varvec{x}}^{\mathrm{T}}{\varvec{\beta }} < c_2)\Big \} =\int _{-\infty }^{{\varvec{x}}^{\mathrm{T}}{\varvec{\beta }}}\int _{{\varvec{x}}^{\mathrm{T}}{\varvec{\beta }}}^{\infty } \frac{\tau -F_Y(c_1)}{F_Y(c_2)-F_Y(c_1)}f_{c_1,c_2}(c_1,c_2){\text {d}}c_2{\text {d}}c_1 \end{aligned}$$

yields \(E\{e_i(\tau )\}=\tau \).
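The identity can also be illustrated by simulation. The sketch below rests on several assumptions that are not taken from the paper: \(e_i(\tau )\) is assumed to be the sum of the two terms whose expectations are evaluated above, namely \(I(c_2\le {\varvec{x}}^{\mathrm{T}}{\varvec{\beta }})\) and the weighted indicator \(I(c_1<{\varvec{x}}^{\mathrm{T}}{\varvec{\beta }}<c_2)\); the outcome is taken to be uniform on (0, 1) with no covariates, so that \(F_Y(y)=y\) and the \(\tau \)-th quantile equals \(\tau \); and the censoring interval is generated by two inspection points drawn independently of the outcome. The function name is hypothetical.

```python
import numpy as np

def simulate_mean_e(tau, n=200_000, seed=0):
    """Monte Carlo average of e(tau) under a toy interval-censoring design."""
    rng = np.random.default_rng(seed)
    y = rng.uniform(size=n)                              # latent outcome, F_Y(y) = y
    u, v = np.sort(rng.uniform(size=(n, 2)), axis=1).T   # inspection points, u < v
    # F_Y evaluated at the endpoints of the observed interval (c1, c2] containing y;
    # c1 = -infinity maps to 0 and c2 = +infinity maps to 1.
    F_c1 = np.where(y <= u, 0.0, np.where(y <= v, u, v))
    F_c2 = np.where(y <= u, u, np.where(y <= v, v, 1.0))
    # F_Y is the identity here, so comparisons with the quantile q = tau
    # can be made directly on the F_Y scale.
    below = F_c2 <= tau                                   # I(c2 <= q)
    straddle = (F_c1 < tau) & (tau < F_c2)                # I(c1 < q < c2)
    e = below + straddle * (tau - F_c1) / (F_c2 - F_c1)
    return e.mean()

# The average should be close to tau, up to Monte Carlo error.
for tau in (0.25, 0.5, 0.8):
    print(tau, simulate_mean_e(tau))
```

A simulation of this kind only illustrates the result for one particular design; the general statement is the one established by (7.1)–(7.3).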

Cite this article

Hsu, CY., Wen, CC. & Chen, YH. Quantile function regression analysis for interval censored data, with application to salary survey data. Jpn J Stat Data Sci 4, 999–1018 (2021). https://doi.org/10.1007/s42081-021-00113-3
