Computing the log concave NPMLE for interval censored data

Statistics and Computing

Abstract

In analyzing interval censored data, a non-parametric estimator is often desired because model fit is difficult to assess. For this reason, the non-parametric maximum likelihood estimator (NPMLE) is often the default estimator. However, estimates of quantities of interest derived from the survival function, such as quantiles, have very large standard errors because of the jagged form of the estimator. Constraining the estimator to the class of log concave functions guarantees a smooth survival estimate with much better operating characteristics than the unconstrained NPMLE, without the need to specify a parametric family or smoothing parameter. In this paper, we first prove that, under mild conditions, the likelihood can be maximized over a finite set of parameters, although the log likelihood function is not strictly concave. We then present an efficient algorithm for computing a local maximum of the likelihood function. Using our fast new algorithm, we present evidence from simulated current status data suggesting that the rate of convergence of the log-concave estimator is faster (between \(n^{2/5}\) and \(n^{1/2}\)) than that of the unconstrained NPMLE (between \(n^{1/3}\) and \(n^{1/2}\)).

References

  • Anderson-Bergman, C.: logconPH: CoxPH model with log concave baseline distribution. http://CRAN.R-project.org/package=logconPH (2014)

  • Bagnoli, M., Bergstrom, T.: Log-concave probability and its applications. Econ. Theory 26(2), 445–469 (2005)

  • Braun, J.: ICE: iterated conditional expectation. http://CRAN.R-project.org/package=ICE (2013)

  • Braun, J., Duchesne, T., Stafford, J.: Local likelihood density estimation for interval censored data. Can. J. Stat. 33(1), 39–60 (2005)

  • Bogaerts, K., Lesaffre, E.: A new, fast algorithm to find the regions of possible support for bivariate interval-censored data. J. Comput. Gr. Stat. 13(2), 330–340 (2004)

  • Dümbgen, L., Freitag-Wolf, S., Jongbloed, G.: Estimating a unimodal distribution from interval-censored data. J. Am. Stat. Assoc. 101(475), 1094–1106 (2006)

  • Dümbgen, L., Hüsler, A., Rufibach, K.: Active set and EM algorithms for log-concave densities based on complete and censored data. preprint (2007)

  • Dümbgen, L., Rufibach, K., Schuhmacher, D.: Maximum likelihood estimation of a log-concave density based on censored data. Electron. J. Stat. 8(1), 1405–1437 (2014)

  • Dümbgen, L., Rufibach, K.: Maximum likelihood estimation of a log-concave density and its distribution function: basic properties and uniform consistency. Bernoulli 15(1), 40–68 (2009)

  • Fedorov, V.: Theory of Optimal Experiments. Academic Press, New York (1972)

  • Gaspero, L.: QuadProg++. http://www.diegm.uniud.it/digaspero/index.php/software/ (2010)

  • Gentleman, R., Vandal, A.: Computational algorithms for censored-data problems using intersection graphs. J. Comput. Gr. Stat. 10(3), 403–421 (2001)

  • Gentleman, R., Vandal, A.: Nonparametric estimation of the bivariate CDF for arbitrarily censored data. Can. J. Stat. 30(4), 557–571 (2002)

  • Gomez, G., Calle, M., Oller, R., Langohr, K.: Tutorial on methods for interval-censored data and their implementation in R. Stat. Model. 9(4), 259–297 (2009)

  • Groeneboom, P.: Asymptotics for interval censored observations. Technical Report 87–18. Department of Mathematics, University of Amsterdam (1987)

  • Groeneboom, P.: Nonparametric maximum likelihood estimation for interval censored data. Technical Report, Statistics Department, Stanford University (1991)

  • Huang, J.: Asymptotic properties of nonparametric estimation based on partly interval-censored data. Statistica Sinica 9, 501–519 (1999)

  • Jewell, N., Laan, M., Henneman, T.: Nonparametric estimation from current status data with competing risks. Biometrika 90(1), 183–197 (2003)

  • Jongbloed, G.: The iterative convex minorant algorithm for nonparametric estimation. J. Comput. Gr. Stat. 7(3), 310–321 (1998)

  • Komárek, A.: bayesSurv: Bayesian survival regression with flexible error and random effects distributions. http://cran.r-project.org/web/packages/bayesSurv/index.html (2014)

  • Komárek, A., Lesaffre, E.: Bayesian semi-parametric accelerated failure time model for paired double-interval-censored data. Stat. Model. 6, 3–22 (2006)

  • Komárek, A., Lesaffre, E.: The regression analysis of correlated interval-censored data: illustration using accelerated failure time models with flexible distributional assumptions. Stat. Model. 9(4), 299–319 (2009)

  • Kooperberg, C., Stone, C.: Logspline density estimation for censored data. J. Comput. Gr. Stat. 1, 301–328 (1992)

  • Kooperberg, C.: logspline: logspline density estimation routines. http://CRAN.R-project.org/ (2013)

  • Kuhn, H., Tucker, A.: Nonlinear programming. In: Proceedings of 2nd Berkeley Symposium, pp. 481–492 (1951)

  • Lesaffre, E., Komárek, A., Declerck, D.: An overview of methods for interval-censored data with an emphasis on applications in dentistry. Stat. Methods Med. Res. 14, 539–552 (2005)

  • Maathuis, M.: MLEcens: Computation of the MLE for bivariate (interval) censored data. http://CRAN.R-project.org/package=MLEcens (2013)

  • Pan, W.: Smooth estimation of the survival function for interval censored data. Stat. Med. 19(19), 2611–2624 (2000)

  • R Core Team: R: a language and environment for statistical computing. http://www.R-project.org/ (2014)

  • Rufibach, K.: Log-concave density estimation and bump hunting for i.i.d. observations. Ph.D. dissertation, University of Bern and Göttingen (2006)

  • Rufibach, K.: Computing maximum likelihood estimators of a concave density. J. Stat. Comput. Simul. 77(7), 561–574 (2007)

  • Schuhmacher, D., Rufibach, K., Dümbgen, L.: logconcens: maximum likelihood estimation of a log-concave density based on censored data. http://CRAN.R-project.org/package=logconcens (2013)

  • Turnbull, B.: The empirical distribution function with arbitrarily grouped, censored and truncated data. J. R. Stat. Soc. Ser. B (Methodol.) 38(3), 290–295 (1976)

  • Vanobbergen, J., Lesaffre, E., Declerck, D.: The Signal-Tandmobiel® project: a longitudinal intervention health promotion study in Flanders (Belgium): base and first year results. Eur. J. Pediatr. Dentist. 2, 87–96 (2000)

Author information

Correspondence to Clifford Anderson-Bergman.

Appendix

1.1 Proof of Theorem 2

We state the following theorem about the log likelihood function for the log-concave NPMLE for interval censored data:

The log likelihood function is continuous, coercive and bounded above (and therefore has a maximum) if there exist \(i, j \in \{1, \ldots , n\}\) such that \(L_i > R_j\). To show this, we consider the following three cases.

  1. At least two data points are uncensored, and they are not equal to each other

  2. All the data are censored

  3. There exists one uncensored data point which is not contained in at least one of the censored intervals
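
As a rough illustration, the condition \(L_i > R_j\) and the three cases can be checked mechanically. The sketch below is not taken from the paper: it assumes each observation is stored as a pair `(L, R)` with `L == R` marking an uncensored point, and the function names `has_separated_pair` and `classify_case` are hypothetical.

```python
from typing import List, Optional, Tuple

# Assumed representation: (L_i, R_i), with L_i == R_i for an uncensored point.
Interval = Tuple[float, float]


def has_separated_pair(data: List[Interval]) -> bool:
    """True if some L_i > R_j, which holds exactly when max(L) > min(R)."""
    return max(left for left, _ in data) > min(right for _, right in data)


def classify_case(data: List[Interval]) -> Optional[int]:
    """Return 1, 2 or 3 for the three cases, or None if no L_i > R_j exists."""
    if not has_separated_pair(data):
        return None
    exact_times = {left for left, right in data if left == right}
    if len(exact_times) >= 2:
        return 1  # at least two distinct uncensored values
    if not exact_times:
        return 2  # all observations are censored
    # One unique uncensored time: because some L_i > R_j, at least one
    # censored interval cannot contain that time, which is case 3.
    return 3
```

For example, `classify_case([(1.0, 1.0), (2.0, 4.0)])` returns 3, since the single uncensored time 1.0 lies outside the censored interval.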

Proof

In all three cases, continuity is trivial: the log likelihood function is a sum of continuous functions.

For case 1, coercivity and boundedness follow from the known result that the log likelihood function for uncensored data is strictly concave when at least two unique values are observed (Rufibach 2006). We can consider the log likelihood function as having contributions from the censored and uncensored data. Because each censored contribution is the log of a probability and so must be non-positive, the total log likelihood is bounded from above by the contribution of the uncensored observations. Because the contribution of the uncensored observations has been shown to be strictly concave, and the total log likelihood is bounded from above by it, the total log likelihood is bounded above and coercive.
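
The bound used for case 1 can be written out explicitly. The index sets \(U\) (uncensored observations) and \(C\) (censored observations) and the symbol \(\ell _U\) are notation introduced here for illustration only:

$$\begin{aligned} \ell (\phi ) = \underbrace{\sum _{i \in U} \phi (x_i)}_{\ell _U(\phi )} + \sum _{i \in C} \log \left( \int _{L_i}^{R_i} e^{\phi (x)} {\mathrm {d}}x \right) \le \ell _U(\phi ) \end{aligned}$$

Each integral is a probability under the density \(e^{\phi }\), so it is at most 1 and its logarithm is at most 0. Hence whenever \(\ell _U(\phi ) \rightarrow -\infty \), so does \(\ell (\phi )\).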

For case 2, we again use the fact that the contribution of the censored observations is bounded from above by 0. Therefore, the log likelihood function is bounded above. To show that the log likelihood function is coercive, consider what happens when one of the standardized \(\beta _k\)’s approaches \(\infty \). Because the standardized \(\beta \)’s must describe a proper probability distribution, as \(\beta _k \rightarrow \infty \), \(\beta _l \rightarrow -\infty \) for all \(l \ne k\). Thus, the contribution to the log likelihood function of any observation interval not containing \(\beta _k\) will approach \(-\infty \) as \(\beta _k \rightarrow \infty \). Since \(L_i > R_j\) implies there exists at least one interval such that \(\beta _k\) is not within this interval, the contribution of this interval will approach \(-\infty \) as \(\beta _k \rightarrow \infty \), while the contributions of all other intervals will be bounded from above by 0. Therefore, the log likelihood function is coercive.

For case 3, we first note that the only complication arises when there is only one unique uncensored time. If there is more than one unique uncensored time, the data fall under case 1, which has already been established. It is worth noting that there can be multiple observations all occurring at the same time, so one could have multiple exact observations without qualifying as case 1. If there are \(n_1\) uncensored observations at time \(x_1\) and \(n_2\) censored observations, the log likelihood function can be written as

$$\begin{aligned} \ell (\phi ) = \displaystyle n_1\phi (x_1) + \sum _{i = n_1+ 1} ^{n_1 + n_2} \log \left( \int _{L_i}^{R_i} e^{\phi (x) } {\mathrm {d}}x \right) \end{aligned}$$
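
The expression above can be evaluated numerically for any candidate log density. The following is a minimal sketch, not the algorithm of the paper: it assumes finite censoring intervals, uses a plain composite Simpson rule, and the names `simpson`, `log_likelihood` and `phi_normal` are hypothetical.

```python
import math


def simpson(f, a, b, n=2000):
    """Composite Simpson rule on [a, b] using an even number n of subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * f(a + k * h)
    return total * h / 3


def log_likelihood(phi, x1, n1, intervals):
    """n1 * phi(x1) plus the sum of log integrals of exp(phi) over each (L_i, R_i)."""
    ell = n1 * phi(x1)
    for left, right in intervals:
        mass = simpson(lambda x: math.exp(phi(x)), left, right)
        ell += math.log(mass)
    return ell


def phi_normal(x):
    """Log density of the standard normal, used here as a concave example phi."""
    return -0.5 * x * x - 0.5 * math.log(2 * math.pi)


# Three uncensored observations at 0 and two censored intervals (assumed data).
value = log_likelihood(phi_normal, x1=0.0, n1=3, intervals=[(1.0, 2.0), (-1.0, 1.0)])
```

Because every censored term is the log of a probability, `value` cannot exceed `3 * phi_normal(0.0)`, which mirrors the bounding argument used in the proof.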

Suppose \(L\) and \(R\) are the end points of a censored interval that does not contain the unique uncensored time. Then, because the contribution of each of the other censored observations is at most 0, we have

$$\begin{aligned} \ell (\phi ) \le n_1 \phi (x_1) + \log \left( \displaystyle \int _L^R e^{\phi (x)} {\mathrm {d}}x \right) \end{aligned}$$

Note that the second term on the right side of the above inequality is bounded above by 0. Therefore, if the density approaches \(\infty \) at a point away from \(x_1\), such as inside \([L, R]\), \(\phi (x_1)\) will approach \(-\infty \) (because of the log concave constraint), as will the log likelihood function. We must also show that \(\ell (\phi ) \rightarrow -\infty \) whenever \(\phi (x_1) \rightarrow \infty \).

Without loss of generality, let us assume that \(x_1 = 0\) and \(L > 0\). Then we have that

$$\begin{aligned} \ell (\phi )\le & {} n_1 \phi (0) + \log \left( \displaystyle \int _L^R e^{\phi (x)} {\mathrm {d}}x \right) \\ \ell (\phi )\le & {} n_1 \phi (0) + \log \left( \displaystyle \int _L^\infty e^{\phi (x)} {\mathrm {d}}x \right) \end{aligned}$$

We note that for any choice of \(\phi (0)\), \(\log \left( \displaystyle \int _L^\infty e^{\phi (x)} {\mathrm {d}}x \right) \) is maximized by setting \(e^{\phi (x)}\) to be the density of an exponential distribution with rate \(\lambda = e^{\phi (0)}\). This can be seen readily from the fact that the exponential distribution is the limiting case of the log-concave constraint, since its log density is linear. This means we can use the cdf of the exponential distribution, whose upper tail beyond \(L\) is \(e^{-\lambda L}\), to further bound the log likelihood, i.e., setting \(\phi (0) = \log (\lambda )\), we get

$$\begin{aligned} \ell (\phi ) \le n_1 \log (\lambda ) + \log \left( e^{- \lambda L} \right) = n_1 \log (\lambda ) - \lambda L \end{aligned}$$

Because \(L > 0\), as \(\lambda \rightarrow \infty \) the right side of the above bound approaches \(-\infty \). Therefore, the log likelihood function is bounded above and coercive. \(\square \)
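
As a worked check of the final bound, the exponential tail and the maximizer of the bounding function can be computed directly. The symbols \(g\) and \(\lambda ^*\) are introduced here for illustration and do not appear in the original proof:

$$\begin{aligned} \int _L^\infty \lambda e^{-\lambda x} {\mathrm {d}}x&= e^{-\lambda L}, \\ g(\lambda )&= n_1 \log (\lambda ) - \lambda L, \\ g'(\lambda )&= \frac{n_1}{\lambda } - L = 0 \quad \Longrightarrow \quad \lambda ^* = \frac{n_1}{L}, \\ g(\lambda ^*)&= n_1 \log \left( \frac{n_1}{L} \right) - n_1 \end{aligned}$$

Since \(g''(\lambda ) = -n_1 / \lambda ^2 < 0\), \(g\) is concave and \(g(\lambda ^*)\) is its maximum, so the bound is finite for every \(\lambda \). For instance, with \(n_1 = 2\) and \(L = 1\), \(\lambda ^* = 2\) and \(g(\lambda ^*) = 2 \log (2) - 2 \approx -0.614\).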

Cite this article

Anderson-Bergman, C., Yu, Y. Computing the log concave NPMLE for interval censored data. Stat Comput 26, 813–826 (2016). https://doi.org/10.1007/s11222-015-9571-8

  • DOI: https://doi.org/10.1007/s11222-015-9571-8
