# Coarsening at Random: Characterizations, Conjectures, Counter-Examples

## Abstract

The notion of *coarsening at random* (CAR) was introduced by Heitjan and Rubin (1991) to describe the most general form of randomly grouped, censored, or missing data, for which the coarsening mechanism can be ignored when making likelihood-based inference about the parameters of the distribution of the variable of interest. The CAR assumption is popular, and applications abound. However, the full implications of the assumption have not been realized. Moreover, a satisfactory theory of CAR for continuously distributed data (which is needed in many applications, particularly in survival analysis) hardly exists as yet. This paper gives a detailed study of CAR. We show that grouped data from a finite sample space *always* fit a CAR model: a nonparametric model for the variable of interest, together with the assumption of an arbitrary CAR mechanism, puts no restriction at all on the distribution of the observed data. In a slogan, CAR is everything. We describe what would seem to be the most general way CAR data could occur in practice, a sequential procedure called *randomized monotone coarsening*. We show that CAR mechanisms exist which are not of this type. Such a coarsening mechanism uses information about the underlying data which is not revealed to the observer, without this affecting the observer’s conclusions. In a second slogan, CAR is more than it seems. This implies that if the analyst can argue from subject-matter considerations that coarsened data are CAR, he or she has knowledge about the structure of the coarsening mechanism which can be put to good use in non-likelihood-based inference procedures. We argue that this is a valuable option in multivariate survival analysis. We give a new definition of CAR in general sample spaces, criticising earlier proposals, and we establish results parallel to those for the discrete case. The new definition focusses on the distribution rather than the density of the data. It allows us to generalise the theory of CAR to the important situation where coarsening variables (e.g., censoring times) are partially observed as well as the variables of interest.
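A small computation can make the discrete-case claims concrete. The Python sketch below assumes the usual finite-sample-space formulation. The observer sees a random subset A of the sample space that always contains the true value X. The mechanism is CAR when P(observe A | X = x) is the same for every x in A.

Part (1) checks that condition for a toy mechanism. It then shows that the ratio q(A) / P(X ∈ A) is the reporting probability of A and does not involve the distribution of X. This is why such a mechanism can be ignored in the likelihood.

Part (2) illustrates "CAR is everything" for one arbitrary distribution q of observed sets. It fits the distribution of X by the self-consistency (EM) iteration for the nonparametric MLE and sets π(A) = q(A) / P(X ∈ A). When the fitted distribution has full support, these π(A) sum to 1 over the sets containing each x. They therefore define a CAR mechanism that reproduces q exactly.

The function names, the toy numbers, and the use of EM are illustrative assumptions. They are not necessarily the construction used in the paper.

```python
"""Toy sketch of discrete coarsening at random (CAR); all names are hypothetical."""


def prob_of_set(p, A):
    """P(X in A) for a distribution p given as {x: probability}."""
    return sum(p.get(x, 0.0) for x in A)


def is_car(mechanism, tol=1e-12):
    """Check the discrete CAR condition.

    `mechanism` maps each value x to {frozenset A: P(observe A | X = x)}.
    CAR holds when, for every reported set A, P(observe A | X = x) takes the
    same value for every x in A (missing entries count as probability 0).
    """
    reported = {A for dist in mechanism.values() for A in dist}
    for A in reported:
        probs = [mechanism[x].get(A, 0.0) for x in A if x in mechanism]
        if max(probs) - min(probs) > tol:
            return False
    return True


def observed_distribution(p, mechanism):
    """Distribution of the observed set: q(A) = sum_{x in A} p(x) P(A | X = x)."""
    q = {}
    for x, dist in mechanism.items():
        for A, prob in dist.items():
            q[A] = q.get(A, 0.0) + p.get(x, 0.0) * prob
    return q


def car_representation(q, support, max_iter=10_000, tol=1e-13):
    """Try to write q as a CAR model: q(A) = pi(A) * P_p(X in A).

    `support` must contain every point of every set in q. p is fitted by the
    self-consistency (EM) iteration for the nonparametric MLE,
        p(x) <- sum_{A containing x} q(A) p(x) / P_p(X in A),
    started from the uniform distribution; then pi(A) = q(A) / P_p(X in A).
    At a fixed point with p(x) > 0 the pi(A) over sets containing x sum to 1,
    so P(A | X = x) = pi(A) is a CAR mechanism reproducing q.
    """
    p = {x: 1.0 / len(support) for x in support}
    for _ in range(max_iter):
        new = {x: 0.0 for x in support}
        for A, qa in q.items():
            pa = prob_of_set(p, A)
            for x in A:
                new[x] += qa * p[x] / pa
        converged = max(abs(new[x] - p[x]) for x in support) < tol
        p = new
        if converged:
            break
    pi = {A: qa / prob_of_set(p, A) for A, qa in q.items()}
    return p, pi


if __name__ == "__main__":
    # (1) A CAR mechanism on {1, 2, 3}: {1, 2} is reported with probability
    # 0.4 whether X = 1 or X = 2, and {2, 3} with probability 0.3 whether
    # X = 2 or X = 3.
    car = {
        1: {frozenset({1}): 0.6, frozenset({1, 2}): 0.4},
        2: {frozenset({1, 2}): 0.4, frozenset({2, 3}): 0.3, frozenset({2}): 0.3},
        3: {frozenset({2, 3}): 0.3, frozenset({3}): 0.7},
    }
    print(is_car(car))  # True
    p = {1: 0.2, 2: 0.5, 3: 0.3}
    q = observed_distribution(p, car)
    # Under CAR, q(A) / P(X in A) is the reporting probability of A and does
    # not involve p, so the mechanism contributes only a constant factor to
    # the likelihood for p.
    for A in sorted(q, key=sorted):
        print(sorted(A), round(q[A] / prob_of_set(p, A), 6))

    # (2) "CAR is everything": an arbitrary distribution of observed sets on
    # {0, 1, 2}, not generated from any mechanism above.
    q_obs = {
        frozenset({0}): 0.2, frozenset({0, 1}): 0.3, frozenset({1, 2}): 0.3,
        frozenset({2}): 0.1, frozenset({0, 1, 2}): 0.1,
    }
    p_hat, pi = car_representation(q_obs, support=[0, 1, 2])
    print({x: round(v, 6) for x, v in p_hat.items()})  # {0: 0.4, 1: 0.35, 2: 0.25}
    for x in p_hat:
        print(x, round(sum(v for A, v in pi.items() if x in A), 6))  # 1.0 each
```

EM is used here only because its fixed-point equations coincide with the CAR normalisation condition. Convergence can be slow when the observed sets are very coarse.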

## Keywords

Conditional Distribution, Marginal Distribution, Sample Space, Discrete Case, Semiparametric Model

## Bibliography

- P.J. Bickel, C.A.J. Klaassen, Y. Ritov and J.A. Wellner (1993), *Efficient and Adaptive Inference in Semi-parametric Models*, Johns Hopkins University Press, Baltimore.
- J.T. Chang and D. Pollard (1997), Conditioning as disintegration, *Statistica Neerlandica* **51** (to appear).
- D.M. Dabrowska (1988), Kaplan-Meier estimation on the plane, *Ann. Statist.* **16**, 1475–1489.
- R.D. Gill (1989), Non- and semi-parametric maximum likelihood estimators and the von Mises method, Part 1, *Scand. J. Statist.* **16**, 97–128.
- R.D. Gill and J.M. Robins (1997), Sequential models for coarsening and missingness, *Proc. First Seattle Symposium on Biostatistics: Survival Analysis*, ed. D.Y. Lin, Springer-Verlag.
- D.F. Heitjan (1993), Ignorability and coarse data: some biomedical examples, *Biometrics* **49**, 1099–1109.
- D.F. Heitjan (1994), Ignorability in general incomplete-data models, *Biometrika* **81**, 701–708.
- D.F. Heitjan and D.B. Rubin (1991), Ignorability and coarse data, *Ann. Statist.* **19**, 2244–2253.
- M. Jacobsen and N. Keiding (1995), Coarsening at random in general sample spaces and random censoring in continuous time, *Ann. Statist.* **23**, 774–786.
- R. Kress (1989), *Linear Integral Equations*, Springer-Verlag, Berlin.
- M.J. van der Laan (1993), *Efficient and Inefficient Estimation in Semiparametric Models*, Ph.D. Thesis, Dept. of Mathematics, University of Utrecht; reprinted (1995) as CWI Tract 114, Centre for Mathematics and Computer Science, Amsterdam.
- M.J. van der Laan (1996), Efficient estimation in the bivariate censoring model and repairing NPMLE, *Ann. Statist.* **24**, 596–627.
- R.J.A. Little and D.B. Rubin (1987), *Statistical Analysis with Missing Data*, Wiley, New York.
- S.F. Nielsen (1996), *Incomplete Observations and Coarsening at Random*, preprint, Institute of Mathematical Statistics, Univ. of Copenhagen.
- R.L. Prentice and J. Cai (1992), Covariance and survivor function estimation using censored multivariate failure time data, *Biometrika* **79**, 495–512.
- J.M. Robins (1996a), Locally efficient median regression with random censoring and surrogate markers, pp. 263–274 in: *Lifetime Data: Models in Reliability and Survival Analysis*, N.P. Jewell, A.C. Kimber, M.L. Ting Lee, G.A. Whitmore (eds), Kluwer, Dordrecht.
- J.M. Robins (1996b), Non-response models for the analysis of non-monotone non-ignorable missing data, *Statistics in Medicine*, Special Issue, to appear.
- J.M. Robins and R.D. Gill (1996), Non-response models for the analysis of non-monotone ignorable missing data, *Statistics in Medicine*, to appear.
- J.M. Robins and Y. Ritov (1996), Towards a curse of dimensionality appropriate (CODA) asymptotic theory for semiparametric models, *Statistics in Medicine*, to appear.
- J.M. Robins and A. Rotnitzky (1992), Recovery of information and adjustment for dependent censoring using surrogate markers, pp. 297–331 in: *AIDS Epidemiology: Methodological Issues*, N. Jewell, K. Dietz, V. Farewell (eds), Birkhäuser, Boston.
- J.M. Robins, A. Rotnitzky and L.P. Zhao (1994), Estimation of regression coefficients when some regressors are not always observed, *J. Amer. Statist. Assoc.* **89**, 846–866.
- D.B. Rubin (1976), Inference and missing data, *Biometrika* **63**, 581–592.
- D.B. Rubin, H.S. Stern and V. Vehovar (1995), Handling “Don’t Know” survey responses: the case of the Slovenian plebiscite, *J. Amer. Statist. Assoc.* **90**, 822–828.
- A.W. van der Vaart (1991), On differentiable functionals, *Ann. Statist.* **19**, 178–204.
- P. Whittle (1971), *Optimization under Constraints*, Wiley, New York.