Events per variable for risk differences and relative risks using pseudo-observations

Hansen, Stefan Nygaard; Andersen, Per Kragh; Parner, Erik Thorlund

doi:10.1007/s10985-013-9290-4

Published: 14 January 2014

Volume 20, pages 584–598, (2014)
Lifetime Data Analysis

Abstract

A method based on pseudo-observations has been proposed for direct regression modeling of functionals of interest with right-censored data, including the survival function, the restricted mean and the cumulative incidence function in competing risks. Once the pseudo-observations have been computed, the models can be fitted using standard generalized estimating equation software. Regression models can, however, yield problematic results if the number of covariates is large in relation to the number of events observed. Guidelines on the number of events per variable are often used in practice. These rules of thumb have primarily been established through simulation studies of the logistic regression model and the Cox regression model. In this paper we conduct a simulation study to examine the small-sample behavior of the pseudo-observation method when estimating risk differences and relative risks for right-censored data. We investigate how coverage probabilities and relative bias of the pseudo-observation estimator depend on sample size, number of variables and average number of events per variable.

Author information

Authors and Affiliations

Section for Biostatistics, University of Aarhus, Bartholins Allé 2, 8000 Aarhus C, Denmark
Stefan Nygaard Hansen & Erik Thorlund Parner
Department of Biostatistics, University of Copenhagen, Øster Farimagsgade 5, 1014 Copenhagen K, Denmark
Per Kragh Andersen

Corresponding author

Correspondence to Stefan Nygaard Hansen.

Electronic supplementary material

Supplementary material 1 (pdf 210 KB)

Appendix

1.1 The risk difference model

Let $\tau >0$ be a given time point, and let us introduce two functions $f$ and $g$ by

$$\begin{aligned} f(y,z)=\frac{y}{y+z},\quad g(y,z)=e^{-(y+z)\tau },\quad y,z>0. \end{aligned}$$

Let $T, C$ and $X$ be given as in Sect. 3 and let $q_x=P(X=x)$. The model in (2) can equivalently be formulated as $T\mid X=x\sim \exp (\lambda _x)$, where $\lambda _x=-\log (1-p-dx)/\tau $ and $\tau =1$. Under this model, the probability of observing an event before time $\tau $ is

$$\begin{aligned} P(T\le C,T\le \tau )= \sum \limits _{x=0,1}\big (1+f(\lambda _c,\lambda _x)g(\lambda _c,\lambda _x)-f(\lambda _c,\lambda _x)-g(\lambda _c,\lambda _x)\big )q_x, \end{aligned} \tag{5}$$

and the probability of observing an event after time $\tau $ is

$$\begin{aligned} P(T\le C,T>\tau )=\sum \limits _{x=0,1}\big (g(\lambda _c,\lambda _x)-f(\lambda _c,\lambda _x)g(\lambda _c,\lambda _x)\big )q_x. \end{aligned} \tag{6}$$
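Equations (5) and (6) can be checked numerically. The following is a minimal sketch, not the authors' code: it assumes that the censoring time $C$ is exponential with rate $\lambda _c$ and independent of $(T,X)$, which is consistent with the formulas above but is not stated in this appendix. All function names and the parameter values in the usage note are hypothetical.

```python
import math
import random


def f(y, z):
    """f(y, z) = y / (y + z), as defined at the start of the appendix."""
    return y / (y + z)


def g(y, z, tau=1.0):
    """g(y, z) = exp(-(y + z) * tau)."""
    return math.exp(-(y + z) * tau)


def event_rate(x, p, d, tau=1.0):
    """Exponential rate lambda_x = -log(1 - p - d*x) / tau."""
    return -math.log(1.0 - p - d * x) / tau


def prob_event_before(lam_c, p, d, q, tau=1.0):
    """Equation (5): P(T <= C, T <= tau) for a binary X with mean q."""
    total = 0.0
    for x, q_x in ((0, 1.0 - q), (1, q)):
        lam_x = event_rate(x, p, d, tau)
        fv, gv = f(lam_c, lam_x), g(lam_c, lam_x, tau)
        total += (1.0 + fv * gv - fv - gv) * q_x
    return total


def prob_event_after(lam_c, p, d, q, tau=1.0):
    """Equation (6): P(T <= C, T > tau) for a binary X with mean q."""
    total = 0.0
    for x, q_x in ((0, 1.0 - q), (1, q)):
        lam_x = event_rate(x, p, d, tau)
        fv, gv = f(lam_c, lam_x), g(lam_c, lam_x, tau)
        total += (gv - fv * gv) * q_x
    return total


def monte_carlo_check(lam_c, p, d, q, tau=1.0, n=200_000, seed=1):
    """Estimate both probabilities by simulating (X, T, C) directly.

    Assumption: C ~ exp(lam_c), independent of T and X.
    """
    rng = random.Random(seed)
    before = after = 0
    for _ in range(n):
        x = 1 if rng.random() < q else 0
        t = rng.expovariate(event_rate(x, p, d, tau))
        c = rng.expovariate(lam_c)
        if t <= c:  # the event, not the censoring, is observed
            if t <= tau:
                before += 1
            else:
                after += 1
    return before / n, after / n
```

Calling, for example, `prob_event_before(0.5, 0.2, 0.1, 0.5)` and `monte_carlo_check(0.5, 0.2, 0.1, 0.5)` should give values that agree up to Monte Carlo error.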

If we denote the probability of observing an event before time $\tau $ by $p_e(\lambda _c)$, where $\lambda _c$ is the censoring rate parameter, then

$$\begin{aligned} \lim _{\lambda _c\rightarrow \infty }p_e(\lambda _c)=0 \end{aligned} \tag{7}$$

since the function inside the summation tends to $0$. Furthermore we have that

$$\begin{aligned} \lim _{\lambda _c\downarrow 0}p_e(\lambda _c)&=\sum _{x=0,1}\left( 1-g(0,\lambda _x)\right) q_x \\&= \sum _{x=0,1}\left( 1-(1-p-dx)\right) q_x \\&= p+dq , \end{aligned} \tag{8}$$

where $q=q_1$ is the mean of $X$.
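The limits (7) and (8) say that $p_e(\lambda _c)$ ranges between $0$ and $p+dq$. Each summand equals $\lambda _x\,(1-e^{-(\lambda _x+\lambda _c)\tau })/(\lambda _x+\lambda _c)$, which is strictly decreasing in $\lambda _c$, so any target event probability in that range corresponds to exactly one censoring rate. The sketch below illustrates this with a simple bisection. It is an illustration under these assumptions and not necessarily the procedure used in the paper; the function names and the target value in the usage note are hypothetical.

```python
import math


def p_e(lam_c, p, d, q, tau=1.0):
    """P(T <= C, T <= tau) from the closed form in the appendix."""
    total = 0.0
    for x, q_x in ((0, 1.0 - q), (1, q)):
        lam_x = -math.log(1.0 - p - d * x) / tau
        f_val = lam_c / (lam_c + lam_x)          # f(lambda_c, lambda_x)
        g_val = math.exp(-(lam_c + lam_x) * tau)  # g(lambda_c, lambda_x)
        total += (1.0 + f_val * g_val - f_val - g_val) * q_x
    return total


def solve_censoring_rate(target, p, d, q, tau=1.0, tol=1e-12):
    """Find lambda_c with p_e(lambda_c) = target by bisection.

    Relies on p_e being strictly decreasing in lambda_c, with limits
    p + d*q (lambda_c -> 0) and 0 (lambda_c -> infinity).
    """
    upper_limit = p + d * q
    if not 0.0 < target < upper_limit:
        raise ValueError("target must lie strictly between 0 and p + d*q")
    lo, hi = 0.0, 1.0
    while p_e(hi, p, d, q, tau) > target:  # expand until the root is bracketed
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if p_e(mid, p, d, q, tau) > target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

For instance, `solve_censoring_rate(0.15, p=0.2, d=0.1, q=0.5)` returns a rate `lam` for which `p_e(lam, 0.2, 0.1, 0.5)` is approximately `0.15`.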

Let $X_1$ be a binary covariate with mean $q$ and assume that $X_2,\ldots ,X_k$ are i.i.d. normally distributed with mean $0$ and variance $\sigma ^2>0$. The model (3) can now be formulated as $T\mid X_1=x_1,\tilde{X}=\tilde{x}\sim \exp (\lambda _{x_1,\tilde{x}})$, where $\tilde{X}=(X_2,\ldots ,X_k)$ and

$$\begin{aligned} \lambda _{x_1,\tilde{x}}=-\log (1-p-dx_1-x_2-\cdots -x_k)/\tau . \end{aligned}$$

Under this model, the probability, $P(T\le C,T\le \tau )$, of observing an event before $\tau $ is

$$\begin{aligned} \mathrm{E}\left[ 1+f(\lambda _c,\lambda _{X_1,\tilde{X}})g(\lambda _c,\lambda _{X_1,\tilde{X}})-f(\lambda _c,\lambda _{X_1,\tilde{X}})-g(\lambda _c,\lambda _{X_1,\tilde{X}})\mid A\right] , \end{aligned}$$

where $A$ is the event that $p+dX_1+X_2+\cdots +X_k\in (0,1)$. The probability of observing an event after $\tau $ is thus

$$\begin{aligned} P(T\le C,T>\tau )=\mathrm{E}\left[ g(\lambda _c,\lambda _{X_1,\tilde{X}})-f(\lambda _c,\lambda _{X_1,\tilde{X}})g(\lambda _c,\lambda _{X_1,\tilde{X}})\mid A\right] . \end{aligned}$$

These probabilities are approximated by simulation in our study, since no closed-form expressions are available for them.
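One way such a simulation could be set up is sketched below. This is an assumption about the approach, not the authors' implementation: covariates are drawn repeatedly, draws outside the event $A$ are rejected, and the two integrands are averaged over the accepted draws. The function name, argument names and default values are hypothetical.

```python
import math
import random


def simulate_event_probs(lam_c, p, d, q, k, sigma, tau=1.0, n=50_000, seed=1):
    """Monte Carlo estimates of P(T <= C, T <= tau) and P(T <= C, T > tau).

    X_1 is Bernoulli(q); X_2, ..., X_k are i.i.d. N(0, sigma^2).
    Draws with p + d*X_1 + X_2 + ... + X_k outside (0, 1) are rejected,
    which implements the conditioning on the event A.
    """
    rng = random.Random(seed)
    sum_before = sum_after = 0.0
    accepted = 0
    while accepted < n:
        x1 = 1 if rng.random() < q else 0
        risk = p + d * x1 + sum(rng.gauss(0.0, sigma) for _ in range(k - 1))
        if not 0.0 < risk < 1.0:
            continue  # outside A: reject and redraw
        accepted += 1
        lam = -math.log(1.0 - risk) / tau       # lambda_{x_1, x~}
        f_val = lam_c / (lam_c + lam)           # f(lambda_c, lambda)
        g_val = math.exp(-(lam_c + lam) * tau)  # g(lambda_c, lambda)
        sum_before += 1.0 + f_val * g_val - f_val - g_val
        sum_after += g_val - f_val * g_val
    return sum_before / n, sum_after / n
```

Averaging the integrands rather than simulating event and censoring times directly is a design choice that reduces the Monte Carlo variance; either approach targets the same conditional expectations.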

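To obtain a risk difference model such that $P(A)$ is close to one, we choose the standard deviation $\sigma $ of $X_2,\ldots ,X_k$ as

$$\begin{aligned} \sigma = \frac{\min (p,1-p-d)}{3\sqrt{k-1}}. \end{aligned} \tag{9}$$

The sum $X_2+\cdots +X_k$ is then normally distributed with mean $0$ and standard deviation $\sqrt{k-1}\,\sigma =\min (p,1-p-d)/3$. For $d\ge 0$, the event $A$ occurs whenever this sum lies within three standard deviations of zero, so that

$$\begin{aligned} P(A)=P(p+dX_1+X_2+\cdots +X_k\in (0,1))\ge 0.9973. \end{aligned}$$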
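The bound on $P(A)$ can be verified exactly with the normal distribution function, because $X_2+\cdots +X_k$ is normal with mean $0$ and standard deviation $\sigma \sqrt{k-1}$. The following sketch assumes $k\ge 2$ and $d\ge 0$; the function names are hypothetical.

```python
import math


def normal_cdf(z):
    """Standard normal distribution function, via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def sigma_from_eq9(p, d, k):
    """Standard deviation of X_2, ..., X_k as chosen in equation (9)."""
    return min(p, 1.0 - p - d) / (3.0 * math.sqrt(k - 1))


def prob_A(p, d, q, k):
    """P(p + d*X_1 + X_2 + ... + X_k in (0, 1)) with sigma from (9)."""
    s = sigma_from_eq9(p, d, k) * math.sqrt(k - 1)  # sd of X_2 + ... + X_k
    total = 0.0
    for x1, q_x in ((0, 1.0 - q), (1, q)):
        centre = p + d * x1
        # The sum must fall in (-centre, 1 - centre) for A to occur.
        total += (normal_cdf((1.0 - centre) / s) - normal_cdf(-centre / s)) * q_x
    return total
```

For any admissible choice such as `prob_A(0.2, 0.1, 0.5, 5)`, the returned value should be at least `0.9973`, the probability that a normal variable lies within three standard deviations of its mean.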

About this article

Cite this article

Hansen, S.N., Andersen, P.K. & Parner, E.T. Events per variable for risk differences and relative risks using pseudo-observations. Lifetime Data Anal 20, 584–598 (2014). https://doi.org/10.1007/s10985-013-9290-4

Received: 25 April 2013
Accepted: 30 December 2013
Published: 14 January 2014
Issue Date: October 2014
DOI: https://doi.org/10.1007/s10985-013-9290-4
