Abstract
A method based on pseudo-observations has been proposed for direct regression modeling of functionals of interest with right-censored data, including the survival function, the restricted mean and the cumulative incidence function in competing risks. Once the pseudo-observations have been computed, the models can be fitted using standard generalized estimating equation software. Regression models can, however, yield problematic results if the number of covariates is large in relation to the number of events observed. Guidelines on the number of events per variable are often used in practice. These rules of thumb have primarily been established through simulation studies of the logistic regression model and the Cox regression model. In this paper we conduct a simulation study to examine the small-sample behavior of the pseudo-observation method when estimating risk differences and relative risks for right-censored data. We investigate how the coverage probabilities and relative bias of the pseudo-observation estimator vary with sample size, number of variables and average number of events per variable.
Appendix
1.1 The risk difference model
Let \(\tau >0\) be a given time point, and let us introduce two functions \(f\) and \(g\) by
Let \(T, C\) and \(X\) be given as in Sect. 3 and let \(q_x=P(X=x)\). The model in (2) can equivalently be formulated as \(T\mid X=x\sim \exp (\lambda _x)\), where \(\lambda _x=-\log (1-p-dx)/\tau \) and \(\tau =1\). Under this model, the probability of observing an event before time \(\tau \) is
and the probability of observing an event after time \(\tau \) is
If we denote the probability of observing an event before time \(\tau \) by \(p_e(\lambda _c)\), where \(\lambda _c\) is the censoring rate parameter, then
since the function inside the summation tends to \(0\). Furthermore we have that
where \(q=q_1\) is the mean of \(X\).
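The two limiting statements above can be checked numerically under one added assumption that is not stated in this appendix: that the censoring time \(C\) is exponential with rate \(\lambda _c\) and independent of \(T\) given \(X\). Under that assumption, \(P(T\le C,T\le \tau \mid X=x)=\frac{\lambda _x}{\lambda _x+\lambda _c}\left( 1-e^{-(\lambda _x+\lambda _c)\tau }\right) \), and averaging over \(x\) with weights \(q_x\) gives a candidate for \(p_e(\lambda _c)\). The sketch below is illustrative only; the function names are hypothetical and the closed forms are a reconstruction, not the authors' displayed formulas.

```python
import math


def event_prob_before_tau(p, d, q, lam_c, tau=1.0):
    """P(T <= C, T <= tau) for a binary X with P(X = 1) = q.

    ASSUMPTION (not from the source): C ~ Exp(lam_c), independent of T given X.
    Requires 0 < p and p + d < 1 so that both event rates are positive.
    """
    total = 0.0
    for x, q_x in ((0, 1.0 - q), (1, q)):
        # Event rate chosen so that P(T <= tau | X = x) = p + d * x.
        lam_x = -math.log(1.0 - p - d * x) / tau
        rate = lam_x + lam_c
        total += q_x * lam_x / rate * (1.0 - math.exp(-rate * tau))
    return total


def event_prob_after_tau(p, d, q, lam_c, tau=1.0):
    """P(T <= C, T > tau) under the same assumed censoring model."""
    total = 0.0
    for x, q_x in ((0, 1.0 - q), (1, q)):
        lam_x = -math.log(1.0 - p - d * x) / tau
        rate = lam_x + lam_c
        total += q_x * lam_x / rate * math.exp(-rate * tau)
    return total


if __name__ == "__main__":
    p, d, q = 0.2, 0.1, 0.5
    # No censoring: the value should agree with p + d * q up to rounding.
    print(event_prob_before_tau(p, d, q, lam_c=0.0), p + d * q)
    # Heavy censoring: the value should be close to zero.
    print(event_prob_before_tau(p, d, q, lam_c=1e6))
```

With \(\lambda _c=0\) each term reduces to \(q_x(p+dx)\), so the sum is \(p+dq\), which matches the role of \(q\) as the mean of \(X\); as \(\lambda _c\) grows, each term in the sum tends to \(0\).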
Let \(X_1\) be a binary covariate with mean \(q\) and assume that \(X_2,\ldots ,X_k\) are i.i.d. normally distributed with mean \(0\) and variance \(\sigma ^2>0\). The model (3) can now be formulated as \(T\mid X_1=x_1,\tilde{X}=\tilde{x}\sim \exp (\lambda _{x_1,\tilde{x}})\), where \(\tilde{X}=(X_2,\ldots ,X_k)\) and
Under this model, the probability, \(P(T\le C,T\le \tau )\), of observing an event before \(\tau \) is
where \(A\) is the event that \(p+dX_1+X_2+\cdots +X_k\in (0,1)\). The probability of observing an event after \(\tau \) is thus
These probabilities are simulated in our study since explicit formulas for them do not exist.
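A Monte Carlo estimate of this kind might look like the following minimal sketch. It rests on several assumptions that are not confirmed by the text above: censoring is taken to be exponential with rate \(\lambda _c\) and independent of \(T\) given the covariates, the event rate on \(A\) is taken to be \(-\log (1-p-dx_1-x_2-\cdots -x_k)/\tau \), and draws outside \(A\) are simply counted as contributing no event. The function name, parameter names and default values are hypothetical, and this is not the authors' simulation code.

```python
import math
import random


def simulate_event_probs(p, d, q, sigma, k, lam_c, tau=1.0, n=100_000, seed=1):
    """Monte Carlo estimates of P(A, T <= C, T <= tau), P(A, T <= C, T > tau), P(A).

    ASSUMPTIONS (not from the source): exponential censoring with rate lam_c,
    event rate -log(1 - risk) / tau on A, and draws outside A yield no event.
    """
    rng = random.Random(seed)
    before = after = in_a = 0
    for _ in range(n):
        x1 = 1 if rng.random() < q else 0
        # X_2, ..., X_k are i.i.d. N(0, sigma^2); there are k - 1 of them.
        risk = p + d * x1 + sum(rng.gauss(0.0, sigma) for _ in range(k - 1))
        if not 0.0 < risk < 1.0:
            continue  # outside the event A
        in_a += 1
        lam = -math.log1p(-risk) / tau  # equals -log(1 - risk) / tau
        t = rng.expovariate(lam)
        c = rng.expovariate(lam_c) if lam_c > 0 else math.inf
        if t <= c:
            if t <= tau:
                before += 1
            else:
                after += 1
    return before / n, after / n, in_a / n


if __name__ == "__main__":
    est = simulate_event_probs(p=0.2, d=0.1, q=0.5, sigma=0.05, k=3, lam_c=0.5)
    print("before tau, after tau, P(A):", est)
```

The third returned value estimates \(P(A)\), which gives a direct check on whether a chosen \(\sigma ^2\) keeps \(P(A)\) close to one.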
To obtain a risk difference model such that \(P(A)\) is close to one, we choose the variance \(\sigma ^2\) of \(X_2,\ldots ,X_k\) such that
since then
Cite this article
Hansen, S.N., Andersen, P.K. & Parner, E.T. Events per variable for risk differences and relative risks using pseudo-observations. Lifetime Data Anal 20, 584–598 (2014). https://doi.org/10.1007/s10985-013-9290-4
DOI: https://doi.org/10.1007/s10985-013-9290-4