Abstract
The notion of coarsening at random (CAR) was introduced by Heitjan and Rubin (1991) to describe the most general form of randomly grouped, censored, or missing data, for which the coarsening mechanism can be ignored when making likelihood-based inference about the parameters of the distribution of the variable of interest. The CAR assumption is popular, and applications abound. However, the full implications of the assumption have not been realized. Moreover, a satisfactory theory of CAR for continuously distributed data, which is needed in many applications, particularly in survival analysis, hardly exists as yet. This paper gives a detailed study of CAR. We show that grouped data from a finite sample space always fit a CAR model: a nonparametric model for the variable of interest together with the assumption of an arbitrary CAR mechanism puts no restriction at all on the distribution of the observed data. In a slogan, CAR is everything. We describe what would seem to be the most general way CAR data could occur in practice, a sequential procedure called randomized monotone coarsening. We show that CAR mechanisms exist which are not of this type. Such a coarsening mechanism uses information about the underlying data which is not revealed to the observer, without this affecting the observer’s conclusions. In a second slogan, CAR is more than it seems. This implies that if the analyst can argue from subject-matter considerations that coarsened data is CAR, he or she has knowledge about the structure of the coarsening mechanism which can be put to good use in non-likelihood-based inference procedures. We argue that this is a valuable option in multivariate survival analysis. We give a new definition of CAR in general sample spaces, criticising earlier proposals, and we establish parallel results to the discrete case. The new definition focusses on the distribution rather than the density of the data. It allows us to generalise the theory of CAR to the important situation where coarsening variables (e.g., censoring times) are partially observed as well as the variables of interest.
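The abstract does not spell out the CAR condition itself. As a minimal sketch, assuming the usual finite-sample-space formulation (the probability of observing a set A, given that the underlying value is x, is the same for every x in A), the check below tests whether a given coarsening mechanism satisfies it. The function names and the two example mechanisms are hypothetical and chosen only for illustration; they are not taken from the paper.

```python
from fractions import Fraction


def is_valid(mechanism):
    """Each reported set must contain the true value, and the
    conditional probabilities for each x must sum to one."""
    return all(
        sum(probs.values()) == 1 and all(x in A for A in probs)
        for x, probs in mechanism.items()
    )


def is_car(mechanism):
    """mechanism maps x -> {A: P(observe A | X = x)}, with A a frozenset.

    Assumed CAR condition: for every observable set A, the probability
    of observing A is the same for all x in A.
    """
    observed_sets = {A for probs in mechanism.values() for A in probs}
    for A in observed_sets:
        probs_on_A = {mechanism[x].get(A, Fraction(0)) for x in A}
        if len(probs_on_A) > 1:
            return False
    return True


half = Fraction(1, 2)
everything = frozenset({1, 2, 3})

# Hypothetical example 1: the value is either seen exactly or is
# completely missing, each with probability 1/2, regardless of x.
mcar = {x: {frozenset({x}): half, everything: half} for x in (1, 2, 3)}

# Hypothetical example 2: the set {1, 2} is reported with probability 1
# when x = 1 but only 1/2 when x = 2, so the condition fails.
not_car = {
    1: {frozenset({1, 2}): Fraction(1)},
    2: {frozenset({1, 2}): half, frozenset({2}): half},
    3: {frozenset({3}): Fraction(1)},
}

print(is_car(mcar), is_car(not_car))
```

Exact rational arithmetic (`Fraction`) is used so that the equality comparison between conditional probabilities is not affected by floating-point rounding.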
Bibliography
P.J. Bickel, C.A.J. Klaassen, Y. Ritov and J.A. Wellner (1993), Efficient and Adaptive Inference in Semi-parametric Models, Johns Hopkins University Press, Baltimore.
J.T. Chang and D. Pollard (1997), Conditioning as disintegration, Statistica Neerlandica 51 (to appear).
D.M. Dabrowska (1988), Kaplan-Meier estimation on the plane, Ann. Statist. 16, 1475–1489.
R.D. Gill (1989), Non- and semi-parametric maximum likelihood estimators and the von Mises method, Part 1, Scand. J. Statist. 16, 97–128.
R.D. Gill and J.M. Robins (1997), Sequential models for coarsening and missingness, Proc. First Seattle Symposium in Biostatistics: Survival Analysis, ed. D.Y. Lin, Springer-Verlag.
D.F. Heitjan (1993), Ignorability and coarse data: some biomedical examples, Biometrics 49, 1099–1109.
D.F. Heitjan (1994), Ignorability in general incomplete-data models, Biometrika 81, 701–708.
D.F. Heitjan and D.B. Rubin (1991), Ignorability and coarse data, Ann. Statist. 19, 2244–2253.
M. Jacobsen and N. Keiding (1995), Coarsening at random in general sample spaces and random censoring in continuous time, Ann. Statist. 23, 774–786.
R. Kress (1989), Linear Integral Equations, Springer-Verlag, Berlin.
M.J. van der Laan (1993), Efficient and Inefficient Estimation in Semiparametric Models, Ph.D. Thesis, Dept. Mathematics, University Utrecht; reprinted (1995) as CWI tract 114, Centre for Mathematics and Computer Science, Amsterdam.
M.J. van der Laan (1996), Efficient estimation in the bivariate censoring model and repairing NPMLE, Ann. Statist. 24, 596–627.
R.J.A. Little and D.B. Rubin (1987), Statistical Analysis with Missing Data, Wiley, New York.
S.F. Nielsen (1996), Incomplete Observations and Coarsening at Random, preprint, Institute of Mathematical Statistics, Univ. of Copenhagen.
R.L. Prentice and J. Cai (1992), Covariance and survivor function estimation using censored multivariate failure time data, Biometrika 79, 495–512.
J.M. Robins (1996a), Locally efficient median regression with random censoring and surrogate markers, pp. 263–274 in: Lifetime Data: Models in Reliability and Survival Analysis, N.P. Jewell, A.C. Kimber, M.L. Ting Lee, G.A. Whitmore (eds), Kluwer, Dordrecht.
J.M. Robins (1996b), Non-response models for the analysis of non-monotone non-ignorable missing data, Statistics in Medicine, Special Issue, to appear.
J.M. Robins and R.D. Gill (1996), Non-response models for the analysis of non-monotone ignorable missing data, Statistics in Medicine, to appear.
J.M. Robins and Y. Ritov (1996), Towards a curse of dimensionality appropriate (CODA) asymptotic theory for semiparametric models, Statistics in Medicine, to appear.
J.M. Robins and A. Rotnitzky (1992), Recovery of information and adjustment for dependent censoring using surrogate markers, pp. 297–331 in: AIDS Epidemiology—Methodological Issues, N. Jewell, K. Dietz, V. Farewell (eds), Birkhäuser, Boston.
J.M. Robins, A. Rotnitzky and L.P. Zhao (1994), Estimation of regression coefficients when some regressors are not always observed, J. Amer. Statist. Assoc. 89, 846–866.
D.B. Rubin (1976), Inference and missing data, Biometrika 63, 581–592.
D.B. Rubin, H.S. Stern and V. Vehovar (1995), Handling “Don’t Know” survey responses: the case of the Slovenian plebiscite, J. Amer. Statist. Assoc. 90, 822–828.
A.W. van der Vaart (1991), On differentiable functionals, Ann. Statist. 19, 178–204.
P. Whittle (1971), Optimization under Constraints, Wiley, New York.
© 1997 Springer-Verlag New York, Inc.
About this paper
Cite this paper
Gill, R.D., van der Laan, M.J., Robins, J.M. (1997). Coarsening at Random: Characterizations, Conjectures, Counter-Examples. In: Lin, D.Y., Fleming, T.R. (eds) Proceedings of the First Seattle Symposium in Biostatistics. Lecture Notes in Statistics, vol 123. Springer, New York, NY. https://doi.org/10.1007/978-1-4684-6316-3_14
DOI: https://doi.org/10.1007/978-1-4684-6316-3_14
Publisher Name: Springer, New York, NY
Print ISBN: 978-0-387-94992-5
Online ISBN: 978-1-4684-6316-3
eBook Packages: Springer Book Archive