Coarsening at Random: Characterizations, Conjectures, Counter-Examples

  • Richard D. Gill
  • Mark J. van der Laan
  • James M. Robins
Part of the Lecture Notes in Statistics book series (LNS, volume 123)


The notion of coarsening at random (CAR) was introduced by Heitjan and Rubin (1991) to describe the most general form of randomly grouped, censored, or missing data for which the coarsening mechanism can be ignored when making likelihood-based inference about the parameters of the distribution of the variable of interest. The CAR assumption is popular, and applications abound. However, the full implications of the assumption have not been realized. Moreover, a satisfactory theory of CAR for continuously distributed data, which is needed in many applications and particularly in survival analysis, hardly exists as yet. This paper gives a detailed study of CAR. We show that grouped data from a finite sample space always fit a CAR model: a nonparametric model for the variable of interest together with the assumption of an arbitrary CAR mechanism puts no restriction at all on the distribution of the observed data. In a slogan, CAR is everything. We describe what would seem to be the most general way CAR data could occur in practice, a sequential procedure called randomized monotone coarsening. We show that CAR mechanisms exist which are not of this type. Such a coarsening mechanism uses information about the underlying data which is not revealed to the observer, without this affecting the observer’s conclusions. In a second slogan, CAR is more than it seems. This implies that if the analyst can argue from subject-matter considerations that coarsened data is CAR, he or she has knowledge about the structure of the coarsening mechanism which can be put to good use in non-likelihood-based inference procedures. We argue that this is a valuable option in multivariate survival analysis. We give a new definition of CAR in general sample spaces, criticising earlier proposals, and we establish results parallel to those for the discrete case. The new definition focusses on the distribution rather than the density of the data. It allows us to generalise the theory of CAR to the important situation where coarsening variables (e.g., censoring times) are partially observed as well as the variables of interest.
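As a concrete illustration of the discrete case mentioned in the abstract, the following is a minimal sketch, not taken from the paper, of how one might check the usual finite-sample-space CAR condition: the probability of reporting a set A, given the underlying value x, is the same for every x in A. The function name `is_car`, the dictionary layout of the mechanism, and both example mechanisms are assumptions made here for illustration only.

```python
from fractions import Fraction


def is_car(mechanism):
    """Check the discrete CAR condition for a coarsening mechanism.

    `mechanism` maps each point x of a finite sample space to a dict that
    maps observable sets A (frozensets containing x) to P(Y = A | X = x).
    This layout is a hypothetical choice made for this sketch.
    """
    observable_sets = {A for dist in mechanism.values() for A in dist}
    for A in observable_sets:
        # A coarsening must report a set that contains the true value.
        for x, dist in mechanism.items():
            if dist.get(A, 0) > 0 and x not in A:
                raise ValueError(f"set {set(A)} reported for x={x} outside it")
        # CAR: P(Y = A | X = x) takes a single value over all x in A.
        probs = {mechanism[x].get(A, Fraction(0)) for x in A}
        if len(probs) > 1:
            return False
    return True


half = Fraction(1, 2)
full = frozenset({1, 2, 3})

# Each x is reported exactly or replaced by the whole space, with a
# probability that does not depend on x.
car_mechanism = {x: {frozenset({x}): half, full: half} for x in (1, 2, 3)}

# The set {1, 2} is reported with probability 1 from x=1 but 1/2 from x=2,
# so the chance of seeing {1, 2} depends on the hidden value.
non_car_mechanism = {
    1: {frozenset({1, 2}): Fraction(1)},
    2: {frozenset({1, 2}): half, frozenset({2}): half},
    3: {frozenset({3}): Fraction(1)},
}

print(is_car(car_mechanism))
print(is_car(non_car_mechanism))
```

Exact fractions are used so that the equality test on conditional probabilities is not affected by floating-point rounding.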


Keywords: Conditional Distribution; Marginal Distribution; Sample Space; Discrete Case; Semiparametric Model




References

  1. P.J. Bickel, C.A.J. Klaassen, Y. Ritov and J.A. Wellner (1993), Efficient and Adaptive Inference in Semi-parametric Models, Johns Hopkins University Press, Baltimore.
  2. J.T. Chang and D. Pollard (1997), Conditioning as disintegration, Statistica Neerlandica 51 (to appear).
  3. D.M. Dabrowska (1988), Kaplan-Meier estimation on the plane, Ann. Statist. 16, 1475–1489.
  4. R.D. Gill (1989), Non- and semi-parametric maximum likelihood estimators and the von Mises method, Part 1, Scand. J. Statist. 16, 97–128.
  5. R.D. Gill and J.M. Robins (1997), Sequential models for coarsening and missingness, Proc. First Seattle Symposium on Biostatistics: Survival Analysis, ed. D.Y. Lin, Springer-Verlag.
  6. D.F. Heitjan (1993), Ignorability and coarse data: some biomedical examples, Biometrics 49, 1099–1109.
  7. D.F. Heitjan (1994), Ignorability in general incomplete-data models, Biometrika 81, 701–708.
  8. D.F. Heitjan and D.B. Rubin (1991), Ignorability and coarse data, Ann. Statist. 19, 2244–2253.
  9. M. Jacobsen and N. Keiding (1995), Coarsening at random in general sample spaces and random censoring in continuous time, Ann. Statist. 23, 774–786.
  10. R. Kress (1989), Linear Integral Equations, Springer-Verlag, Berlin.
  11. M.J. van der Laan (1993), Efficient and Inefficient Estimation in Semiparametric Models, Ph.D. Thesis, Dept. Mathematics, University Utrecht; reprinted (1995) as CWI tract 114, Centre for Mathematics and Computer Science, Amsterdam.
  12. M.J. van der Laan (1996), Efficient estimation in the bivariate censoring model and repairing NPMLE, Ann. Statist. 24, 596–627.
  13. R.J.A. Little and D.B. Rubin (1987), Statistical Analysis with Missing Data, Wiley, New York.
  14. S.F. Nielsen (1996), Incomplete Observations and Coarsening at Random, preprint, Institute of Mathematical Statistics, Univ. of Copenhagen.
  15. R.L. Prentice and J. Cai (1992), Covariance and survivor function estimation using censored multivariate failure time data, Biometrika 79, 495–512.
  16. J.M. Robins (1996a), Locally efficient median regression with random censoring and surrogate markers, pp. 263–274 in: Lifetime Data: Models in Reliability and Survival Analysis, N.P. Jewell, A.C. Kimber, M.L. Ting Lee, G.A. Whitmore (eds), Kluwer, Dordrecht.
  17. J.M. Robins (1996b), Non-response models for the analysis of non-monotone non-ignorable missing data, Statistics in Medicine, Special Issue, to appear.
  18. J.M. Robins and R.D. Gill (1996), Non-response models for the analysis of non-monotone ignorable missing data, Statistics in Medicine, to appear.
  19. J.M. Robins and Y. Ritov (1996), Towards a curse of dimensionality appropriate (CODA) asymptotic theory for semiparametric models, Statistics in Medicine, to appear.
  20. J.M. Robins and A. Rotnitzky (1992), Recovery of information and adjustment for dependent censoring using surrogate markers, pp. 297–331 in: AIDS Epidemiology: Methodological Issues, N. Jewell, K. Dietz, V. Farewell (eds), Birkhäuser, Boston.
  21. J.M. Robins, A. Rotnitzky and L.P. Zhao (1994), Estimation of regression coefficients when some regressors are not always observed, J. Amer. Statist. Assoc. 89, 846–866.
  22. D.B. Rubin (1976), Inference and missing data, Biometrika 63, 581–592.
  23. D.B. Rubin, H.S. Stern and V. Vehovar (1995), Handling “Don’t Know” survey responses: the case of the Slovenian plebiscite, J. Amer. Statist. Assoc. 90, 822–828.
  24. A.W. van der Vaart (1991), On differentiable functionals, Ann. Statist. 19, 178–204.
  25. P. Whittle (1971), Optimization under Constraints, Wiley, New York.

Copyright information

© Springer-Verlag New York, Inc. 1997

Authors and Affiliations

  • Richard D. Gill (1)
  • Mark J. van der Laan (2)
  • James M. Robins (3)

  1. Mathematical Institute, University Utrecht, Utrecht, Netherlands
  2. Dept. of Biostatistics, University of California, Berkeley, USA
  3. Depts of Epidemiology and Biostatistics, Harvard School of Public Health, Boston, USA
