Abstract
When variable selection by stepwise regression and model fitting are conducted on the same data set, competition for inclusion in the model induces a selection bias in coefficient estimators away from zero. In proportional hazards regression with right-censored data, selection bias inflates the absolute values of the parameter estimates for selected variables, while the omission of other variables may shrink coefficients toward zero. This paper explores the extent of the bias in parameter estimates from stepwise proportional hazards regression and proposes a bootstrap method, similar to those proposed by Miller (Subset Selection in Regression, 2nd edn. Chapman & Hall/CRC, 2002) for linear regression, to correct for selection bias. We also use bootstrap methods to estimate the standard error of the adjusted estimators. Simulation results show that substantial biases can be present in uncorrected stepwise estimators and, for binary covariates, can exceed 250% of the true parameter value. The simulations also show that the conditional mean of the proposed bootstrap bias-corrected parameter estimator, given that a variable is selected, is moved closer both to the unconditional mean of the standard partial likelihood estimator in the chosen model and to the population value of the parameter. We also explore the effect of the adjustment on estimates of log relative risk, given the values of the covariates in a selected model. The proposed method is illustrated with data sets in primary biliary cirrhosis and in multiple myeloma from the Eastern Cooperative Oncology Group.
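The general idea of a bootstrap correction for selection bias can be illustrated in a much simpler setting than the one studied here. The following is a minimal sketch and not the authors' procedure: it assumes a toy problem in which the "model" is a single sample mean that is reported only when its z-statistic exceeds a cutoff, standing in for stepwise selection in proportional hazards regression. The function names (`estimate_and_select`, `bootstrap_corrected`, `simulate`), the cutoff of 1.96, and all simulation settings are hypothetical choices made for illustration. The bias is estimated as the mean of the bootstrap estimates that also pass selection minus the original estimate, and is then subtracted from the original estimate.

```python
import random


def estimate_and_select(sample, z_crit=1.96):
    """Return (mean, selected); selected is True when |mean / se| > z_crit."""
    n = len(sample)
    mean = sum(sample) / n
    var = sum((x - mean) ** 2 for x in sample) / (n - 1)
    se = (var / n) ** 0.5
    if se == 0.0:
        return mean, False  # degenerate resample: treat as not selected
    return mean, abs(mean / se) > z_crit


def bootstrap_corrected(sample, rng, n_boot=200, z_crit=1.96):
    """Bias-correct a selected estimate using bootstrap replicates that are also selected.

    Returns None when the original sample does not pass selection.
    """
    est, selected = estimate_and_select(sample, z_crit)
    if not selected:
        return None
    kept = []
    for _ in range(n_boot):
        # Resample observations with replacement and repeat the selection step.
        resample = rng.choices(sample, k=len(sample))
        b_est, b_sel = estimate_and_select(resample, z_crit)
        if b_sel:
            kept.append(b_est)
    if not kept:
        return est  # no replicate was selected, so no bias estimate is available
    bias = sum(kept) / len(kept) - est  # estimated selection bias
    return est - bias


def simulate(true_effect=0.3, n=25, n_data=300, seed=0):
    """Return the mean uncorrected and mean corrected estimate among selected data sets."""
    rng = random.Random(seed)
    raw, corrected = [], []
    for _ in range(n_data):
        sample = [rng.gauss(true_effect, 1.0) for _ in range(n)]
        est, selected = estimate_and_select(sample)
        if selected:
            raw.append(est)
            corrected.append(bootstrap_corrected(sample, rng))
    return sum(raw) / len(raw), sum(corrected) / len(corrected)
```

Calling `simulate()` returns the conditional means of the uncorrected and corrected estimates among the simulated data sets that pass selection. In this toy setting the uncorrected conditional mean tends to overshoot `true_effect`, and the corrected one tends to lie closer to it, which mirrors the qualitative behaviour described above without reproducing the paper's method or results.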
References
Akaike H (1973). Information theory and the extension of the maximum likelihood principle. In: Petrov BN and Csaki F (eds) 2nd international symposium on information theory, pp 267–281. Akademiai Kiado, Budapest
Altman DG and Andersen PK (1989). Bootstrap investigation of the stability of a Cox regression model. Stat Med 8: 771–783
Chen C-H and George SL (1985). The bootstrap and identification of prognostic factors via Cox’s proportional hazards regression model. Stat Med 4: 39–46
Cox DR (1972). Regression models and life tables (with discussion). J R Stat Soc [Ser B] 34: 187–220
Davison A and Hinkley D (1997). Bootstrap methods and their application. Cambridge University Press, Cambridge UK
Faraggi D and Simon R (1998). Bayesian variable selection method for censored survival data. Biometrics 54: 1475–1485
Fleming TR and Harrington DP (1991). Counting processes and survival analysis. Wiley, New York
Frank IE and Friedman JH (1993). A statistical view of some chemometrics regression tools. Technometrics 35: 109–135
Harrell FE, Lee KL and Mark DB (1996). Tutorial in biostatistics. Multivariable prognostic models: Issues in developing models, evaluating assumptions and adequacy, and measuring and reducing errors. Stat Med 15: 361–387
Harrell FE, Lee KL, Califf RM, Pryor DB and Rosati RA (1984). Regression modelling strategies for improved prognostic prediction. Stat Med 3: 143–152
Huang J and Harrington D (2002). Penalized partial likelihood regression for right-censored data with bootstrap selection of the penalty parameter. Biometrics 58: 781–791
Johnson ME, Tolley HD, Bryson MC and Goldman AS (1982). Covariate analysis of survival data: a small-sample study of Cox’s model. Biometrics 38: 685–698
Lagakos SW and Schoenfeld DA (1984). Properties of proportional-hazards score tests under misspecified regression models. Biometrics 40: 1037–1048
Miller AJ (1984). Selection of subsets of regression variables. J R Stat Soc [Ser A] 147: 389–425
Miller AJ (2002). Subset selection in regression, 2nd edn. Chapman & Hall/CRC
Oken M, Leong T, Lenhard R, Greipp P, Kay N, Ness BV and Kyle R (1999). The addition of interferon or high dose cyclophosphamide to standard chemotherapy in the treatment of patients with multiple myeloma. Cancer 86: 957–968
Raftery A, Madigan D and Volinsky C (1996). Accounting for model uncertainty in survival analysis improves predictive performance. In: Bernardo J, Berger JO, Dawid AP and Smith AFM (eds) Bayesian statistics 5, pp 323–349. Oxford University Press
Sauerbrei W and Schumacher M (1992). A bootstrap resampling procedure for model building: application to the Cox regression model. Stat Med 11: 2093–2109
Struthers CA and Kalbfleisch JD (1986). Misspecified proportional hazards models. Biometrika 73: 363–369
Tibshirani R (1997). The LASSO method for variable selection in the Cox model. Stat Med 16: 385–395
van Houwelingen JC and le Cessie S (1990). Predictive value of statistical models. Stat Med 9: 1303–1325
Cite this article
Soh, CH., Harrington, D.P. & Zaslavsky, A.M. Reducing bias in parameter estimates from stepwise regression in proportional hazards regression with right-censored data. Lifetime Data Anal 14, 65–85 (2008). https://doi.org/10.1007/s10985-007-9078-5
DOI: https://doi.org/10.1007/s10985-007-9078-5