Maximum likelihood estimation for length-biased and interval-censored data with a nonsusceptible fraction

Abstract

Left-truncated data are often encountered in epidemiological cohort studies, where individuals are recruited according to a certain cross-sectional sampling criterion. Length-biased data, a special case of left-truncated data, assume that the incidence of the initial event follows a homogeneous Poisson process. In this article, we consider an analysis of length-biased and interval-censored data with a nonsusceptible fraction. We first point out the importance of a well-defined target population, which depends on prior knowledge of the support of the failure times of susceptible individuals. Given the target population, we proceed with length-biased sampling and draw valid inferences from a length-biased sample. When there is no covariate, we show that, when maximizing the full likelihood function, it suffices to consider a discrete version of the survival function for the susceptible individuals with jump points at the left endpoints of the censoring intervals, and we propose an EM algorithm to obtain the nonparametric maximum likelihood estimates of the nonsusceptible rate and the survival function of the susceptible individuals. We also develop a novel graphical method for assessing the stationarity assumption. When covariates are present, we consider the Cox proportional hazards model for the survival time of the susceptible individuals and the logistic regression model for the probability of being susceptible. We construct the full likelihood function and obtain the nonparametric maximum likelihood estimates of the regression parameters by employing the EM algorithm. The large sample properties of the estimates are established. The performance of the method is assessed by simulations. The proposed model and method are applied to data from an early-onset diabetes mellitus study.

Acknowledgements

This work was supported by a research grant from the Ministry of Science and Technology of Taiwan (MOST 108-2118-M-010-001).

Author information

Corresponding author

Correspondence to Chyong-Mei Chen.

Supplementary Information

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 4409 KB)

Appendix: Determine the possible jump points of G(t)

The main task is to determine the possible jump points of G(t), which in turn determine the possible jump points of F(t). We show that it suffices to consider a discrete distribution function G(t) with jump points at \(l_i, i=1,\ldots ,n\). The argument is as follows. From (2.3), for any pair \((\pi ^*,F^*)\), the likelihood function can be reparameterized as

$$\begin{aligned} {\mathcal {L}}(\pi ^{*}, G^{*}) =\prod _{i=1}^n&\left( {\tilde{\pi }}^{*} \int _{l_i}^{u_i} \frac{1}{t}\, dG^{*}(t) \right) ^{\delta _i} \left( \frac{1-{\tilde{\pi }}^{*}}{\tau } +{\tilde{\pi }}^{*} \int _{l_i}^{\infty } \frac{1}{t}\, dG^{*}(t) \right) ^{(1-\delta _i)I(\omega _i=0)} \\&\times \left( \frac{1-{\tilde{\pi }}^{*}}{\tau }\right) ^{(1-\delta _i)I(\omega _i=1)}, \end{aligned}$$

where \({\tilde{\pi }}^{*}=\pi ^{*} \mu ^{*}/(\tau (1-\pi ^{*})+\pi ^{*}\mu ^{*})\) and \(\mu ^{*}=1/(\int t^{-1} dG^{*}(t))\). If \(G^*\) is a step function that assigns positive probability to points between \(l_i\) and \(u_i\) for some i, then we can obtain a greater value of \(\int _{l_i}^{u_i} t^{-1} dG(t)\) by shifting that mass to \(l_i\), since the integrand 1/t is a decreasing function of t. Similarly, if \(G^*\) puts mass at a point to the left of the smallest of the \(l_i\)’s, a greater likelihood value can be obtained by shifting that mass to the smallest of the \(l_i\)’s, since none of the integrals in (2) contains that point. Let G denote the distribution obtained after these shifts. Although \(\mu =1/(\int t^{-1} dG(t))\) is smaller than \(\mu ^*=1/(\int t^{-1} dG^{*}(t))\), we can choose \(\pi =\tau {{\tilde{\pi }}}^{*}/((1-{{\tilde{\pi }}}^{*})\mu +\tau {{\tilde{\pi }}}^{*})\) such that \({\tilde{\pi }}={\tilde{\pi }}^{*}\). Then, \({\mathcal {L}}(\pi , G)>{\mathcal {L}}(\pi ^*, G^*)\). The proof is complete.
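The argument above can be checked numerically on a toy example. The following is a minimal sketch, not the authors' implementation: it evaluates the reparameterized likelihood for a discrete G, shifts the mass to the left endpoints \(l_i\), and confirms that the stated choice of \(\pi\) recovers \({\tilde{\pi }}^{*}\). The data, the value of \(\tau\), the function names, and the convention that the integrals include their left endpoint are all assumptions made for illustration; \(\delta _i\) and \(\omega _i\) are treated simply as given 0/1 indicators.

```python
import math

def mu_of(points, masses):
    """mu = 1 / integral of t^{-1} dG(t) for a discrete G."""
    return 1.0 / sum(g / t for t, g in zip(points, masses))

def pi_tilde(pi, mu, tau):
    """Reparameterized mixing proportion pi~ = pi*mu / (tau*(1-pi) + pi*mu)."""
    return pi * mu / (tau * (1.0 - pi) + pi * mu)

def pi_from_tilde(pt, mu, tau):
    """Inverse map: the pi that yields a prescribed pi~ for a given mu."""
    return tau * pt / ((1.0 - pt) * mu + tau * pt)

def integral_inv_t(points, masses, lo, hi=math.inf):
    """Integral of t^{-1} dG(t) over [lo, hi] (closed interval: an assumption)."""
    return sum(g / t for t, g in zip(points, masses) if lo <= t <= hi)

def likelihood(pt, points, masses, data, tau):
    """Reparameterized likelihood L(pi~, G) for records (l, u, delta, omega)."""
    value = 1.0
    for l, u, delta, omega in data:
        base = (1.0 - pt) / tau
        if delta == 1:
            value *= pt * integral_inv_t(points, masses, l, u)
        elif omega == 0:
            value *= base + pt * integral_inv_t(points, masses, l)
        else:
            value *= base
    return value

# Hypothetical toy data: (l_i, u_i, delta_i, omega_i), with tau = 10.
tau = 10.0
data = [(1.0, 3.0, 1, 0), (2.0, 5.0, 1, 0),
        (4.0, math.inf, 0, 0), (6.0, math.inf, 0, 1)]

# G* places mass strictly inside the intervals; G moves it to l_1, l_2, l_3.
masses = [0.4, 0.3, 0.3]
points_star = [1.5, 2.5, 4.5]
points_shifted = [1.0, 2.0, 4.0]

pi_star = 0.6
mu_star = mu_of(points_star, masses)
mu_shifted = mu_of(points_shifted, masses)
pt_star = pi_tilde(pi_star, mu_star, tau)

# Choose pi so that the shifted G has the same pi~ as (pi*, G*).
pi_new = pi_from_tilde(pt_star, mu_shifted, tau)
pt_new = pi_tilde(pi_new, mu_shifted, tau)

lik_star = likelihood(pt_star, points_star, masses, data, tau)
lik_shifted = likelihood(pt_new, points_shifted, masses, data, tau)

print("mu* =", mu_star, " mu =", mu_shifted)
print("pi~* =", pt_star, " pi~ =", pt_new)
print("L(pi*, G*) =", lik_star, " L(pi, G) =", lik_shifted)
```

In this toy setting the shifted distribution has a smaller \(\mu\) but, with \({\tilde{\pi }}\) held fixed, every integral term is at least as large as before, so the likelihood increases, in line with the argument in the appendix.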

About this article

Cite this article

Shen, Ps., Peng, Y., Chen, HJ. et al. Maximum likelihood estimation for length-biased and interval-censored data with a nonsusceptible fraction. Lifetime Data Anal 28, 68–88 (2022). https://doi.org/10.1007/s10985-021-09536-2

  • DOI: https://doi.org/10.1007/s10985-021-09536-2

Keywords

  • Interval censoring
  • Left truncation
  • Length-biased sampling
  • Mixture cure model
  • Nonparametric maximum likelihood estimation