Events per variable for risk differences and relative risks using pseudo-observations

Lifetime Data Analysis

Abstract

A method based on pseudo-observations has been proposed for direct regression modeling of functionals of interest with right-censored data, including the survival function, the restricted mean and the cumulative incidence function in competing risks. Once the pseudo-observations have been computed, the models can be fitted using standard generalized estimating equation software. However, regression models can yield problematic results if the number of covariates is large relative to the number of observed events. Guidelines on the number of events per variable are often used in practice. These rules of thumb have been established primarily through simulation studies of the logistic regression model and the Cox regression model. In this paper we conduct a simulation study to examine the small-sample behavior of the pseudo-observation method for estimating risk differences and relative risks with right-censored data. We investigate how the coverage probabilities and relative bias of the pseudo-observation estimator depend on sample size, number of variables and average number of events per variable.
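To make the pseudo-observation idea concrete, here is a minimal sketch in pure Python. The helper names `km_survival`, `pseudo_observations` and `risk_difference` are our own and hypothetical. It computes jackknife pseudo-observations for the risk \(F(\tau )=1-S(\tau )\) from the Kaplan–Meier estimator as \(n\hat{\theta }-(n-1)\hat{\theta }^{(-i)}\). For a single binary covariate with an identity link and an independence working correlation, the GEE estimating equations reduce to least squares, so the slope is simply the difference in mean pseudo-observations between the two groups. A real analysis would use GEE software with a robust (sandwich) variance estimate.

```python
def km_survival(times, events, tau):
    """Kaplan-Meier estimate of S(tau) (hypothetical helper).

    times:  observed times min(T, C)
    events: 1 if the event was observed, 0 if censored
    """
    order = sorted(range(len(times)), key=lambda i: times[i])
    at_risk = len(times)
    surv = 1.0
    i = 0
    while i < len(order) and times[order[i]] <= tau:
        t = times[order[i]]
        deaths = leaving = 0
        # Group all subjects (events and censorings) observed at time t.
        while i < len(order) and times[order[i]] == t:
            deaths += events[order[i]]
            leaving += 1
            i += 1
        surv *= 1.0 - deaths / at_risk
        at_risk -= leaving
    return surv


def pseudo_observations(times, events, tau):
    """Jackknife pseudo-observations for the risk F(tau) = 1 - S(tau)."""
    n = len(times)
    full = 1.0 - km_survival(times, events, tau)
    pseudo = []
    for i in range(n):
        # Leave subject i out and re-estimate the risk.
        loo = 1.0 - km_survival(times[:i] + times[i + 1:],
                                events[:i] + events[i + 1:], tau)
        pseudo.append(n * full - (n - 1) * loo)
    return pseudo


def risk_difference(pseudo, x):
    """Identity link, one binary covariate, independence working correlation:
    the slope equals the difference in mean pseudo-observations (x=1 minus x=0)."""
    ones = [v for v, xi in zip(pseudo, x) if xi == 1]
    zeros = [v for v, xi in zip(pseudo, x) if xi == 0]
    return sum(ones) / len(ones) - sum(zeros) / len(zeros)


times = [0.5, 1.5, 0.2, 2.0, 0.8, 3.0]
events = [1, 0, 1, 1, 1, 0]
x = [1, 1, 0, 0, 1, 0]
po = pseudo_observations(times, events, tau=1.0)
print(po, risk_difference(po, x))
```

Without censoring, the Kaplan–Meier estimate is the empirical distribution, and each pseudo-observation reduces to the indicator \(I(T_i\le \tau )\). This makes a convenient check.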


Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5



Correspondence to Stefan Nygaard Hansen.

Electronic supplementary material

Supplementary material 1 (pdf 210 KB)

Appendix

1.1 The risk difference model

Let \(\tau >0\) be a given time point, and let us introduce two functions \(f\) and \(g\) by

$$\begin{aligned} f(y,z)=\frac{y}{y+z},\quad g(y,z)=e^{-(y+z)\tau },\quad y,z>0. \end{aligned}$$

Let \(T, C\) and \(X\) be given as in Sect. 3, and let \(q_x=P(X=x)\). Model (2) can equivalently be formulated as \(T\mid X=x\sim \exp (\lambda _x)\), where \(\lambda _x=-\log (1-p-dx)/\tau \) with \(\tau =1\). Under this model, the probability of observing an event before time \(\tau \) is

$$\begin{aligned} P(T\le C,T\le \tau )= \sum \limits _{x=0,1}\big (1+f(\lambda _c,\lambda _x)g(\lambda _c,\lambda _x)-f(\lambda _c,\lambda _x)-g(\lambda _c,\lambda _x)\big )q_x, \end{aligned}\tag{5}$$

and the probability of observing an event after time \(\tau \) is

$$\begin{aligned} P(T\le C,T>\tau )=\sum \limits _{x=0,1}\big (g(\lambda _c,\lambda _x)-f(\lambda _c,\lambda _x)g(\lambda _c,\lambda _x)\big )q_x. \end{aligned}\tag{6}$$
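Equations (5) and (6) follow from a short calculation. The derivation below assumes, as our reading consistent with the role of the censoring rate \(\lambda _c\), that \(C\) is exponentially distributed with rate \(\lambda _c\) and independent of \(T\) given \(X\). Conditionally on \(X=x\),

$$\begin{aligned} P(T\le C,T\le \tau \mid X=x)&=\int _0^\tau \lambda _x e^{-\lambda _x t}e^{-\lambda _c t}\,dt=\frac{\lambda _x}{\lambda _c+\lambda _x}\big (1-e^{-(\lambda _c+\lambda _x)\tau }\big )=\big (1-f(\lambda _c,\lambda _x)\big )\big (1-g(\lambda _c,\lambda _x)\big ),\\ P(T\le C,T>\tau \mid X=x)&=\int _\tau ^\infty \lambda _x e^{-(\lambda _c+\lambda _x)t}\,dt=\big (1-f(\lambda _c,\lambda _x)\big )g(\lambda _c,\lambda _x), \end{aligned}$$

since \(1-f(\lambda _c,\lambda _x)=\lambda _x/(\lambda _c+\lambda _x)\). Expanding the products and averaging over \(X\) with weights \(q_x\) gives (5) and (6).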

If we denote the probability of observing an event before time \(\tau \) by \(p_e(\lambda _c)\), where \(\lambda _c\) is the censoring rate parameter, then

$$\begin{aligned} \lim _{\lambda _c\rightarrow \infty }p_e(\lambda _c)=0 \end{aligned}\tag{7}$$

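since \(f(\lambda _c,\lambda _x)\rightarrow 1\) and \(g(\lambda _c,\lambda _x)\rightarrow 0\) as \(\lambda _c\rightarrow \infty \), so each summand in (5) tends to \(0\). Furthermore,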

$$\begin{aligned} \lim _{\lambda _c\downarrow 0}p_e(\lambda _c)=\sum _{x=0,1}\left( 1-g(0,\lambda _x)\right) q_x&= \sum _{x=0,1}\left( 1-(1-p-dx)\right) q_x\\&= p+dq , \end{aligned}\tag{8}$$

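where \(q=q_1\) is the mean of \(X\). Here we have used that \(f(\lambda _c,\lambda _x)\rightarrow 0\) and \(g(\lambda _c,\lambda _x)\rightarrow g(0,\lambda _x)=e^{-\lambda _x\tau }=1-p-dx\) as \(\lambda _c\downarrow 0\), together with \(q_0+q_1=1\).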
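As a numerical sanity check, the following Python sketch evaluates (5) and (6) for a binary covariate, compares them with a Monte Carlo simulation, and illustrates the limits (7) and (8). It uses only the standard library, and the function names are our own and hypothetical. It assumes \(X\sim \text{Bernoulli}(q)\) and exponential censoring with rate \(\lambda _c\), independent of \(T\) given \(X\). That censoring model is our reading, not something stated explicitly here.

```python
import math
import random

# Hypothetical helper names; X ~ Bernoulli(q) and C ~ Exp(lam_c),
# independent of T given X, are assumptions of this sketch.


def f(y, z):
    """f(y, z) = y / (y + z)."""
    return y / (y + z)


def g(y, z, tau):
    """g(y, z) = exp(-(y + z) * tau)."""
    return math.exp(-(y + z) * tau)


def hazard(p, d, x, tau):
    """Exponential rate with P(T <= tau | X = x) = p + d * x."""
    return -math.log(1.0 - p - d * x) / tau


def p_before(p, d, q, lam_c, tau=1.0):
    """Equation (5): P(T <= C, T <= tau)."""
    total = 0.0
    for x, q_x in ((0, 1.0 - q), (1, q)):
        lam_x = hazard(p, d, x, tau)
        fx, gx = f(lam_c, lam_x), g(lam_c, lam_x, tau)
        total += (1.0 + fx * gx - fx - gx) * q_x
    return total


def p_after(p, d, q, lam_c, tau=1.0):
    """Equation (6): P(T <= C, T > tau)."""
    total = 0.0
    for x, q_x in ((0, 1.0 - q), (1, q)):
        lam_x = hazard(p, d, x, tau)
        fx, gx = f(lam_c, lam_x), g(lam_c, lam_x, tau)
        total += (gx - fx * gx) * q_x
    return total


def simulate(p, d, q, lam_c, tau=1.0, n=100_000, seed=2014):
    """Monte Carlo estimates of the two probabilities above."""
    rng = random.Random(seed)
    before = after = 0
    for _ in range(n):
        x = 1 if rng.random() < q else 0
        t = rng.expovariate(hazard(p, d, x, tau))  # event time
        c = rng.expovariate(lam_c)                  # censoring time (assumed exponential)
        if t <= c:  # event observed
            if t <= tau:
                before += 1
            else:
                after += 1
    return before / n, after / n


p, d, q = 0.2, 0.1, 0.5
print(p_before(p, d, q, 0.5), p_after(p, d, q, 0.5))
print(simulate(p, d, q, 0.5))   # agrees with the line above up to Monte Carlo error
print(p_before(p, d, q, 1e-9))  # limit (8): close to p + d*q = 0.25
print(p_before(p, d, q, 1e6))   # limit (7): close to 0
```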

Let \(X_1\) be a binary covariate with mean \(q\) and assume that \(X_2,\ldots ,X_k\) are i.i.d. normally distributed with mean \(0\) and variance \(\sigma ^2>0\). The model (3) can now be formulated as \(T\mid X_1=x_1,\tilde{X}=\tilde{x}\sim \exp (\lambda _{x_1,\tilde{x}})\), where \(\tilde{X}=(X_2,\ldots ,X_k)\) and

$$\begin{aligned} \lambda _{x_1,\tilde{x}}=-\log (1-p-dx_1-x_2-\cdots -x_k)/\tau . \end{aligned}$$

Under this model, the probability of observing an event before \(\tau \) is

$$\begin{aligned} P(T\le C,T\le \tau )=\mathrm{E}\left[ 1+f(\lambda _c,\lambda _{X_1,\tilde{X}})g(\lambda _c,\lambda _{X_1,\tilde{X}})-f(\lambda _c,\lambda _{X_1,\tilde{X}})-g(\lambda _c,\lambda _{X_1,\tilde{X}})\mid A\right] , \end{aligned}$$

where \(A\) is the event that \(p+dX_1+X_2+\cdots +X_k\in (0,1)\), which is needed for \(\lambda _{X_1,\tilde{X}}\) to be a well-defined positive rate. Similarly, the probability of observing an event after \(\tau \) is

$$\begin{aligned} P(T\le C,T>\tau )=\mathrm{E}\left[ g(\lambda _c,\lambda _{X_1,\tilde{X}})-f(\lambda _c,\lambda _{X_1,\tilde{X}})g(\lambda _c,\lambda _{X_1,\tilde{X}})\mid A\right] . \end{aligned}$$

Since explicit formulas for these probabilities are not available, we evaluate them by simulation in our study.
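One way to carry out such a simulation is to draw covariates, reject draws outside \(A\), and average the conditional probabilities given the covariates. The Python sketch below does this. It illustrates the idea under our own assumptions (\(X_1\) independent of \(X_2,\ldots ,X_k\), exponential censoring with rate \(\lambda _c\), rejection sampling). It is not necessarily the procedure used in the study, and the function name is hypothetical.

```python
import math
import random

# Hypothetical sketch: X1 ~ Bernoulli(q) independent of X2..Xk ~ N(0, sigma^2),
# and C ~ Exp(lam_c) independent of T given the covariates, are assumptions.


def f(y, z):
    return y / (y + z)


def g(y, z, tau):
    return math.exp(-(y + z) * tau)


def conditional_probs(p, d, q, k, sigma, lam_c, tau=1.0, n=100_000, seed=7):
    """Estimate P(T <= C, T <= tau) and P(T <= C, T > tau) under model (3).

    Covariate draws outside A are rejected, so the averages are taken over
    the conditional distribution of (X1, X2, ..., Xk) given A.
    """
    rng = random.Random(seed)
    sum_before = sum_after = 0.0
    accepted = 0
    while accepted < n:
        x1 = 1 if rng.random() < q else 0
        eta = p + d * x1 + sum(rng.gauss(0.0, sigma) for _ in range(k - 1))
        if not 0.0 < eta < 1.0:
            continue  # outside A: the rate below would be undefined
        lam = -math.log(1.0 - eta) / tau
        fx, gx = f(lam_c, lam), g(lam_c, lam, tau)
        sum_before += 1.0 + fx * gx - fx - gx
        sum_after += gx - fx * gx
        accepted += 1
    return sum_before / n, sum_after / n


print(conditional_probs(p=0.2, d=0.1, q=0.5, k=5, sigma=0.2 / 6, lam_c=0.5))
```

Here \(\sigma =0.2/6\) is the value given by (9) for \(p=0.2\), \(d=0.1\), \(k=5\). With \(k=1\) there are no normal covariates, and the estimates reduce to (5) and (6) up to sampling error.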

To obtain a risk difference model for which \(P(A)\) is close to one, we choose the common standard deviation \(\sigma \) of \(X_2,\ldots ,X_k\) as

$$\begin{aligned} \sigma = \frac{\min (p,1-p-d)}{3\sqrt{k-1}}, \end{aligned}\tag{9}$$

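since then \(X_2+\cdots +X_k\) is normally distributed with mean \(0\) and standard deviation \(\sqrt{k-1}\,\sigma =\min (p,1-p-d)/3\), which gives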

$$\begin{aligned} P(A)=P(p+dX_1+X_2+\cdots +X_k\in (0,1))\ge 0.9973. \end{aligned}$$
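The choice (9) and the bound on \(P(A)\) can be checked numerically with the sketch below, which uses hypothetical helper names. The bound rests on the three-sigma probability \(P(|Z|<3)\approx 0.9973\) for a standard normal \(Z\). We add one step of our own reasoning: when \(|X_2+\cdots +X_k|<\min (p,1-p-d)\), the linear predictor stays inside \((0,1)\). That step requires \(d\ge 0\), and the condition is an assumption of this check.

```python
import math
import random

# Hypothetical helper names; X1 ~ Bernoulli(q) independent of X2..Xk is assumed.


def sigma_for(p, d, k):
    """Equation (9); requires k >= 2."""
    return min(p, 1.0 - p - d) / (3.0 * math.sqrt(k - 1))


def prob_A(p, d, q, k, n=100_000, seed=11):
    """Monte Carlo estimate of P(p + d*X1 + X2 + ... + Xk in (0, 1))."""
    rng = random.Random(seed)
    sigma = sigma_for(p, d, k)
    hits = 0
    for _ in range(n):
        x1 = 1 if rng.random() < q else 0
        eta = p + d * x1 + sum(rng.gauss(0.0, sigma) for _ in range(k - 1))
        hits += 0.0 < eta < 1.0  # bool counts as 0 or 1
    return hits / n


# P(|Z| < 3) for a standard normal Z: the three-sigma probability.
three_sd = math.erf(3.0 / math.sqrt(2.0))

print(round(three_sd, 4))  # 0.9973
print(prob_A(p=0.2, d=0.1, q=0.5, k=5))
```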

About this article

Cite this article

Hansen, S.N., Andersen, P.K. & Parner, E.T. Events per variable for risk differences and relative risks using pseudo-observations. Lifetime Data Anal 20, 584–598 (2014). https://doi.org/10.1007/s10985-013-9290-4
