Efficient computation of nonparametric survival functions via a hierarchical mixture formulation

Abstract

We propose a new algorithm for computing the maximum likelihood estimate of a nonparametric survival function for interval-censored data, by extending the recently proposed constrained Newton method in a hierarchical fashion. The new algorithm exploits the fact that a mixture distribution can be written recursively as a mixture of mixtures, and takes a divide-and-conquer approach to break a large-scale constrained optimization problem down into many small-scale ones, each of which can be solved rapidly. During the course of optimization, the new algorithm, which we call the hierarchical constrained Newton method, can efficiently reallocate the probability mass, both locally and globally, among potential support intervals. Its convergence is theoretically established based on an equilibrium analysis. Numerical studies suggest that the new algorithm is the best choice for data sets of any size and for solutions with any number of support intervals.

References

  1. Bogaerts, K., Lesaffre, E.: A new, fast algorithm to find the regions of possible support for bivariate interval-censored data. J. Comput. Graph. Stat. 13, 330–340 (2004)

  2. Böhning, D.: A vertex-exchange-method in D-optimal design theory. Metrika 33, 337–347 (1986)

  3. Böhning, D., Schlattmann, P., Dietz, E.: Interval censored data: A note on the nonparametric maximum likelihood estimator of the distribution function. Biometrika 83, 462–466 (1996)

  4. Chen, L., Jha, P., Stirling, B., Sgaier, S.K., Daid, T., Kaul, R., Nagelkerke, N.: Sexual risk factors for HIV infection in early and advanced HIV epidemics in Sub-Saharan Africa: systematic overview of 68 epidemiological studies. PLoS ONE 2, e1001 (2007)

  5. Dax, A.: The smallest point of a polytope. J. Optim. Theory Appl. 64, 429–432 (1990)

  6. Dempster, A.P., Laird, N.M., Rubin, D.B.: Maximum likelihood from incomplete data via the EM algorithm. J. R. Stat. Soc. B 39, 1–22 (1977)

  7. Dümbgen, L., Freitag-Wolf, S., Jongbloed, G.: Estimating a unimodal distribution from interval-censored data. J. Am. Stat. Assoc. 101, 1094–1106 (2006)

  8. Gentleman, R., Vandal, A.C.: Computational algorithms for censored-data problems using intersection graphs. J. Comput. Graph. Stat. 10, 403–421 (2001)

  9. Gentleman, R., Vandal, A.C.: Icens: NPMLE for censored and truncated data. R package version 1.18.0 (2009)

  10. Groeneboom, P.: Nonparametric maximum likelihood estimators for interval censoring and deconvolution. Technical report 378, Department of Statistics, Stanford University (1991)

  11. Groeneboom, P., Jongbloed, G., Wellner, J.A.: The support reduction algorithm for computing nonparametric function estimates in mixture models. Scand. J. Stat. 35, 385–399 (2008)

  12. Groeneboom, P., Wellner, J.A.: Information Bounds and Nonparametric Maximum Likelihood Estimation. Birkhäuser, Basel (1992)

  13. Jongbloed, G.: The iterative convex minorant algorithm for nonparametric estimation. J. Comput. Graph. Stat. 7, 301–321 (1998)

  14. Kumwenda, N.I., Hoover, D.R., Mofenson, L.M., Thigpen, M.C., Kafulafula, G., Li, Q., Mipando, L., Nkanaunena, K., Mebrahtu, T., Bulterys, M., Fowler, M.G., Taha, T.E.: Extended antiretroviral prophylaxis to reduce breast-milk HIV-1 transmission. N. Engl. J. Med. 359, 119–129 (2008)

  15. Lawson, C.L., Hanson, R.J.: Solving Least Squares Problems. Prentice-Hall, New York (1974)

  16. Lesperance, M.L., Kalbfleisch, J.D.: An algorithm for computing the nonparametric MLE of a mixing distribution. J. Am. Stat. Assoc. 87, 120–126 (1992)

  17. Lindsay, B.G.: Mixture Models: Theory, Geometry and Applications. NSF-CBMS Regional Conference Series in Probability and Statistics, vol. 5. Institute of Mathematical Statistics, Hayward (1995)

  18. Luenberger, D.G., Ye, Y.: Linear and Nonlinear Programming, 3rd edn. Springer, New York (2008)

  19. Maathuis, M.H.: Reduction algorithm for the NPMLE for the distribution of bivariate interval-censored data. J. Comput. Graph. Stat. 14, 352–362 (2005)

  20. Maathuis, M.H.: MLEcens: Computation of the MLE for bivariate (interval) censored data. R package version 0.1-2. (2007)

  21. Peto, R.: Experimental survival curves for interval-censored data. Appl. Stat. 22, 86–91 (1973)

  22. Pilla, R.S., Lindsay, B.G.: Alternative EM methods for nonparametric finite mixture models. Biometrika 88, 535–550 (2001)

  23. Siegfried, N., Clarke, M., Volmink, J.: Randomised controlled trials in Africa of HIV and AIDS: descriptive study and spatial distribution. BMJ 331, 742 (2005)

  24. Sun, J.: The Statistical Analysis of Interval-censored Failure Time Data. Springer, Berlin (2006)

  25. Turnbull, B.W.: Nonparametric estimation of a survivorship function with doubly censored data. J. Am. Stat. Assoc. 69, 169–173 (1974)

  26. Turnbull, B.W.: The empirical distribution function with arbitrarily grouped, censored and truncated data. J. R. Stat. Soc. B 38, 290–295 (1976)

  27. Wang, Y.: On fast computation of the non-parametric maximum likelihood estimate of a mixing distribution. J. R. Stat. Soc. B 69, 185–198 (2007)

  28. Wang, Y.: Dimension-reduced nonparametric maximum likelihood computation for interval-censored data. Comput. Stat. Data Anal. 52, 2388–2402 (2008)

  29. Wellner, J.A., Zhan, Y.: A hybrid algorithm for computation of the nonparametric maximum likelihood estimator from censored data. J. Am. Stat. Assoc. 92, 945–959 (1997)

  30. Wong, G.Y., Yu, Q.: Generalized MLE of a joint distribution function with multivariate interval-censored data. J. Multivar. Anal. 69, 155–166 (1999)

  31. Wu, C.F.: Some algorithmic aspects of the theory of optimal designs. Ann. Stat. 6, 1286–1301 (1978)

Acknowledgements

The authors thank the Editor, the Associate Editor and two reviewers for many constructive comments, and are grateful to Bruce Lindsay for helpful suggestions. This research was supported by a Marsden grant of the Royal Society of New Zealand (9145/3608546).

Author information

Corresponding author

Correspondence to Yong Wang.

Appendix: Linear regression over a simplex

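Consider the constrained least squares problem:

$$\min_{\mathbf{x}} \; \|\mathbf{A}\mathbf{x}-\mathbf{b}\|^{2} \quad \text{subject to} \quad \mathbf{x}^{\top}\mathbf{1}=\delta, \;\; \mathbf{x}\ge \mathbf{0},$$

for \(\delta>0\). It can be solved by the NNLS algorithm of Lawson and Hanson (1974), after a transformation suggested by Dax (1990). Letting \(\mathbf{y}=\mathbf{x}/\delta\) and \(\mathbf{c}=\mathbf{b}/\delta\), the problem is equivalent to

$$\min_{\mathbf{y}} \; \|\mathbf{A}\mathbf{y}-\mathbf{c}\|^{2} \quad \text{subject to} \quad \mathbf{y}^{\top}\mathbf{1}=1, \;\; \mathbf{y}\ge \mathbf{0},$$

which, because \(\mathbf{A}\mathbf{y}-\mathbf{c}=(\mathbf{A}-\mathbf{c}\mathbf{1}^{\top})\mathbf{y}\) whenever \(\mathbf{y}^{\top}\mathbf{1}=1\), is further equivalent to

$$\min_{\mathbf{y}} \; \|\mathbf{P}\mathbf{y}\|^{2} \quad \text{subject to} \quad \mathbf{y}^{\top}\mathbf{1}=1, \;\; \mathbf{y}\ge \mathbf{0}, \tag{22}$$

where \(\mathbf{P}=\mathbf{A}-(\mathbf{c},\ldots,\mathbf{c})\). The solution to problem (22) can be found by solving the following least squares problem with only non-negativity constraints:

$$\min_{\mathbf{y}} \; \|\mathbf{P}\mathbf{y}\|^{2}+|\mathbf{y}^{\top}\mathbf{1}-1|^{2} \quad \text{subject to} \quad \mathbf{y}\ge \mathbf{0}. \tag{23}$$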

By relating the Karush-Kuhn-Tucker conditions for both problems, Dax established that if \({\tilde {\mathbf {y}}}\) solves problem (23), then \({\tilde {\mathbf {y}}}/{\tilde {\mathbf {y}}}^{{\top }}\mathbf {1}\) solves problem (22).

Problem (23) can be solved by the NNLS algorithm of Lawson and Hanson (1974).
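
The transformation above can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation: it assumes that problem (23) has the form "minimize ‖Py‖² + (yᵀ1 − 1)² subject to y ≥ 0", stacks it into a single NNLS problem, and uses `scipy.optimize.nnls` as the Lawson and Hanson solver. The function name `simplex_lstsq` is hypothetical.

```python
import numpy as np
from scipy.optimize import nnls


def simplex_lstsq(A, b, delta=1.0):
    """Minimise ||A x - b||^2 subject to x >= 0 and sum(x) = delta.

    Hypothetical helper illustrating the Dax (1990) transformation;
    the assumed form of problem (23) is stated in the comments below.
    """
    A = np.asarray(A, dtype=float)
    b = np.asarray(b, dtype=float)
    n, m = A.shape

    # Rescale to the unit simplex: y = x / delta, c = b / delta.
    c = b / delta

    # P = A - (c, ..., c): subtract c from every column of A.
    P = A - c[:, None]

    # Assumed problem (23): min ||P y||^2 + (1'y - 1)^2 subject to y >= 0.
    # Stack it as one NNLS problem min ||M y - r||^2, y >= 0.
    M = np.vstack([P, np.ones((1, m))])
    r = np.append(np.zeros(n), 1.0)
    y_tilde, _ = nnls(M, r)

    # Normalise to solve problem (22), then undo the rescaling.
    return delta * y_tilde / y_tilde.sum()


if __name__ == "__main__":
    x = simplex_lstsq(np.eye(2), [2.0, 0.0])
    print(x, x.sum())
```

For the identity matrix with b = (2, 0) and δ = 1, the unconstrained minimiser (2, 0) lies outside the simplex, and the constrained minimiser is the vertex (1, 0). The division by `y_tilde.sum()` is safe because y = 0 cannot be optimal for the stacked problem: the objective there has a strictly negative gradient in every coordinate.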

About this article

Cite this article

Wang, Y., Taylor, S.M. Efficient computation of nonparametric survival functions via a hierarchical mixture formulation. Stat Comput 23, 713–725 (2013). https://doi.org/10.1007/s11222-012-9341-9

Keywords

  • Nonparametric maximum likelihood
  • Survival function
  • Interval censoring
  • Clinical trial
  • Constrained Newton method
  • Disease-free survival