Reducing bias in parameter estimates from stepwise regression in proportional hazards regression with right-censored data

Lifetime Data Analysis

Abstract

When variable selection by stepwise regression and model fitting are conducted on the same data set, competition for inclusion in the model induces a selection bias that pushes coefficient estimators away from zero. In proportional hazards regression with right-censored data, selection bias inflates the absolute values of the parameter estimates for the selected variables, while the omission of other variables may shrink coefficients toward zero. This paper explores the extent of the bias in parameter estimates from stepwise proportional hazards regression and proposes a bootstrap method, similar to those proposed by Miller (Subset Selection in Regression, 2nd edn. Chapman & Hall/CRC, 2002) for linear regression, to correct for selection bias. We also use bootstrap methods to estimate the standard errors of the adjusted estimators. Simulation results show that substantial biases could be present in uncorrected stepwise estimators and, for binary covariates, could exceed 250% of the true parameter value. The simulations also show that the conditional mean of the proposed bootstrap bias-corrected parameter estimator, given that a variable is selected, moves closer both to the unconditional mean of the standard partial likelihood estimator in the chosen model and to the population value of the parameter. We also explore the effect of the adjustment on estimates of log relative risk, given the values of the covariates in a selected model. The proposed method is illustrated with data sets on primary biliary cirrhosis and on multiple myeloma from the Eastern Cooperative Oncology Group.
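The selection effect described in the abstract can be illustrated with a small Monte Carlo sketch. This is a toy example under stated assumptions, not the authors' bootstrap procedure and not a proportional hazards fit: a single coefficient estimate is assumed to be normally distributed around its true value, and a variable counts as "selected" when its Wald-type statistic exceeds a threshold. The function name `simulate_selection_bias` and all parameter values (`beta`, `se`, `z_crit`) are hypothetical choices made only for illustration.

```python
import random
import statistics


def simulate_selection_bias(beta=0.2, se=0.15, z_crit=1.96, n_rep=20000, seed=0):
    """Toy model (assumption): an estimate b ~ Normal(beta, se^2) is kept
    only when |b / se| > z_crit, mimicking a significance-based entry rule."""
    rng = random.Random(seed)
    all_estimates = []
    selected_estimates = []
    for _ in range(n_rep):
        b = rng.gauss(beta, se)          # one simulated coefficient estimate
        all_estimates.append(b)
        if abs(b) / se > z_crit:         # variable "enters the model"
            selected_estimates.append(b)
    return {
        # Mean over all replicates: close to the true beta.
        "unconditional_mean": statistics.fmean(all_estimates),
        # Mean over replicates in which the variable was selected.
        "conditional_mean": statistics.fmean(selected_estimates),
        # Fraction of replicates in which the variable was selected.
        "selection_rate": len(selected_estimates) / n_rep,
    }


result = simulate_selection_bias()
print(result)
```

In this toy setting the unconditional mean stays near the true value, while the mean conditional on selection lies further from zero, which is the kind of bias a correction method aims to reduce. The size of the gap depends entirely on the assumed `beta`, `se`, and threshold, so it should not be read as a reproduction of the paper's simulation results.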

References

  • Akaike H (1973). Information theory and the extension of the maximum likelihood principle. In: Petrov BN and Csáki F (eds) 2nd international symposium on information theory, pp 267–281. Akademiai Kiado, Budapest

  • Altman DG and Andersen PK (1989). Bootstrap investigation of the stability of a Cox regression model. Stat Med 8: 771–783

  • Chen C-H and George SL (1985). The bootstrap and identification of prognostic factors via Cox’s proportional hazards regression model. Stat Med 4: 39–46

  • Cox DR (1972). Regression models and life tables (with discussion). J R Stat Soc [Ser B] 34: 187–220

  • Davison A and Hinkley D (1997). Bootstrap methods and their application. Cambridge University Press, Cambridge UK

  • Faraggi D and Simon R (1998). Bayesian variable selection method for censored survival data. Biometrics 54: 1475–1485

  • Fleming TR and Harrington DP (1991). Counting processes and survival analysis. Wiley, New York

  • Frank IE and Friedman JH (1993). A statistical view of some chemometrics regression tools. Technometrics 35: 109–135

  • Harrell FE, Lee KL and Mark DB (1996). Tutorial in biostatistics. Multivariable prognostic models: Issues in developing models, evaluating assumptions and adequacy, and measuring and reducing errors. Stat Med 15: 361–387

  • Harrell FE, Lee KL, Califf RM, Pryor DB and Rosati RA (1984). Regression modelling strategies for improved prognostic prediction. Stat Med 3: 143–152

  • Huang J and Harrington D (2002). Penalized partial likelihood regression for right-censored data with bootstrap selection of the penalty parameter. Biometrics 58: 781–791

  • Johnson ME, Tolley HD, Bryson MC and Goldman AS (1982). Covariate analysis of survival data: a small-sample study of Cox’s model. Biometrics 38: 685–698

  • Lagakos SW and Schoenfeld DA (1984). Properties of proportional-hazards score tests under misspecified regression models. Biometrics 40: 1037–1048

  • Miller AJ (1984). Selection of subsets of regression variables. J R Stat Soc [Ser A] 147: 389–425

  • Miller AJ (2002). Subset selection in regression, 2nd edn. Chapman & Hall/CRC

  • Oken M, Leong T, Lenhard R, Greipp P, Kay N, Ness BV and Kyle R (1999). The addition of interferon or high dose cyclophosphamide to standard chemotherapy in the treatment of patients with multiple myeloma. Cancer 86: 957–968

  • Raftery A, Madigan D and Volinsky C (1996). Accounting for model uncertainty in survival analysis improves predictive performance. In: Bernardo J, Berger JO, Dawid AP and Smith AFM (eds) Bayesian statistics 5, pp 323–349. Oxford University Press

  • Sauerbrei W and Schumacher M (1992). A bootstrap resampling procedure for model building: application to the Cox regression model. Stat Med 11: 2093–2109

  • Struthers CA and Kalbfleisch JD (1986). Misspecified proportional hazards models. Biometrika 73: 363–369

  • Tibshirani R (1997). The LASSO method for variable selection in the Cox model. Stat Med 16: 385–395

  • van Houwelingen JC and le Cessie S (1990). Predictive value of statistical models. Stat Med 9: 1303–1325

Author information

Corresponding author

Correspondence to Chang-Heok Soh.

About this article

Cite this article

Soh, CH., Harrington, D.P. & Zaslavsky, A.M. Reducing bias in parameter estimates from stepwise regression in proportional hazards regression with right-censored data. Lifetime Data Anal 14, 65–85 (2008). https://doi.org/10.1007/s10985-007-9078-5

  • DOI: https://doi.org/10.1007/s10985-007-9078-5
