Properties of the Estimators of the Cox Regression Model with Imputed Data

Chiapella, Luciana Carla; Quaglino, Marta Beatriz; Mamprin, María Eugenia

doi:10.1007/s12561-022-09361-7

Published: 30 December 2022

Volume 15, pages 330–352, (2023)

Statistics in Biosciences

Abstract

Cox regression is one of the most commonly used methods in biomedical research for studying the relationship between a set of covariates and the time until an event of interest occurs. Missing data are common in research studies and may compromise the well-known asymptotic properties of the estimators, leading to incorrect inferences. In this paper, we present the results of an extensive simulation study on how different methods for handling missing data affect the estimation of the parameters of a Cox model with mixed covariates. The study considers different missing-data mechanisms, proportions of missing data, and sample sizes. Five methods for handling missing data are applied and compared in terms of the distributional properties of the estimators of the model parameters, predictive capacity, and the precision of the imputations. Publications that compare imputation techniques in the context of Cox models generally consider only complete case analysis or multiple imputation. This paper additionally considers some flexible imputation methods. These methods provided acceptable results, so we recommend considering them in settings similar to those examined in this study. Finally, a real motivating case is introduced and analyzed following the recommendations derived from the simulation study.

Acknowledgements

The results presented in this work were obtained using the facilities of the CCT-Rosario Computational Centre, a member of the High Performance Computing National System (SNCAD, MincyT-Argentina). We appreciate the willingness of its staff to advise on and assist with the use of the system. We also thank Federico Daray and his research team for providing the data used in the real example presented in this study.

Author information

Marta Beatriz Quaglino and María Eugenia Mamprin are considered as joint senior co-authors.

Authors and Affiliations

Facultad de Ciencias Bioquímicas y Farmacéuticas (Área Farmacología), Universidad Nacional de Rosario, CONICET, Rosario (Santa Fe), Argentina
Luciana Carla Chiapella & María Eugenia Mamprin
Facultad de Ciencias Económicas y Estadística, Instituto de Investigaciones Teóricas y Aplicadas (Escuela de Estadística), Universidad Nacional de Rosario, Rosario (Santa Fe), Argentina
Marta Beatriz Quaglino

Corresponding author

Correspondence to Marta Beatriz Quaglino.

Ethics declarations

Conflict of interest

The authors have no relevant financial or non-financial interests to disclose.

Supplementary Information

Below is the link to the electronic supplementary material.

Supplementary file1 (DOCX 186 kb)

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Chiapella, L.C., Quaglino, M.B. & Mamprin, M.E. Properties of the Estimators of the Cox Regression Model with Imputed Data. Stat Biosci 15, 330–352 (2023). https://doi.org/10.1007/s12561-022-09361-7

Received: 04 May 2022
Revised: 10 December 2022
Accepted: 23 December 2022
Published: 30 December 2022
Issue Date: July 2023
DOI: https://doi.org/10.1007/s12561-022-09361-7
