Abstract
We extend the method of Helenowski and Demirtas (2013) for imputing mixed continuous and binary data to data that include categorical variables with three or more levels. In the bivariate case, the median of the continuous variable is computed within each level of the categorical variable, and the categorical variable is then recoded as an ordinal variable whose levels follow the rank order of these medians. In the multivariate case, the categorical variables are ordered with respect to the continuous variable for which the range among the medians is largest. Here, ‘bivariate’ means that the data set includes two variables, while ‘multivariate’ means that it includes three or more. The pairwise correlation between the continuous and ordinal variables is then computed. The data are transformed to normally distributed values, imputed via joint modeling, and back-transformed to the original scale, using the Barton and Schruben (1993) technique for the continuous variable and quantiles based on the original probabilities for the categorical variable. The algorithm is iterated until the absolute difference between the pairwise correlations in the original and imputed data is less than some constant c, chosen to maximize the coverage rate and minimize the standardized bias. Results from simulations on artificial data and from an application to real data involving 74 colorectal patients indicate that our technique is promising.
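The median-based ordering and the quantile back-transformation described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the toy data, the tie-breaking rule for equal medians, and the use of a standard normal CDF to map a score onto cumulative category proportions are all assumptions made for illustration. The imputation step itself (joint modeling) and the Barton and Schruben (1993) back-transformation for the continuous variable are not shown.

```python
from bisect import bisect_left
from collections import Counter
from statistics import NormalDist, median


def rank_levels_by_median(categories, values):
    """Map each categorical level to an ordinal rank (1 = lowest median)."""
    groups = {}
    for level, value in zip(categories, values):
        groups.setdefault(level, []).append(value)
    medians = {level: median(vals) for level, vals in groups.items()}
    # Assumption: ties between medians are broken by level name;
    # the abstract does not say how ties are handled.
    ordered = sorted(medians, key=lambda lv: (medians[lv], str(lv)))
    return {level: rank for rank, level in enumerate(ordered, start=1)}


def pearson(x, y):
    """Plain Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def normal_score_to_level(z, observed_ordinal):
    """Back-transform a normal score to an ordinal level.

    The score is converted to a probability with the standard normal CDF
    and assigned to the level whose cumulative observed proportion first
    reaches that probability (an assumed reading of "quantiles based on
    the original probabilities").
    """
    counts = Counter(observed_ordinal)
    levels = sorted(counts)
    total = len(observed_ordinal)
    cumulative, running = [], 0
    for level in levels:
        running += counts[level]
        cumulative.append(running / total)
    u = NormalDist().cdf(z)
    index = min(bisect_left(cumulative, u), len(levels) - 1)
    return levels[index]


# Made-up toy data: a three-level categorical variable and a continuous one.
site = ["colon", "rectum", "cecum"] * 3
marker = [5.0, 9.0, 2.0, 6.0, 8.0, 1.0, 4.0, 10.0, 3.0]

ranks = rank_levels_by_median(site, marker)
ordinal = [ranks[level] for level in site]
r = pearson(marker, ordinal)
print(ranks, round(r, 3))
print([normal_score_to_level(z, ordinal) for z in (-2.0, 0.0, 2.0)])
```

In an iterative scheme like the one the abstract outlines, a correlation such as `r` would be recomputed on the imputed data and compared with the original value until the absolute difference falls below the chosen constant c.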
References
Barton, R.R., Schruben, L.W.: Uniform and bootstrap resampling of empirical distributions. In: Proceedings of the 25th Conference on Winter Simulation, pp. 503–508 (1993)
Carraro, P.G., Segala, M., Cesana, B.M., Tiberio, G.: Obstructing colonic cancer: failure and survival patterns over a ten-year follow-up after one-stage curative surgery. Dis. Colon Rectum 44(2), 243–250 (2001)
Demirtas, H.: Simulation driven inferences for multiply imputed longitudinal data sets. Stat. Neerl. 58(4), 466–482 (2004)
Demirtas, H.: Practical advice on how to impute continuous data when the ultimate interest centers on dichotomized outcomes through pre-specified thresholds. Commun. Stat. Simul. Comput. 36(4), 871–889 (2007)
Demirtas, H., Doganay, B.: Simultaneous generation of binary and normal data with specified marginal and association structures. J. Biopharm. Stat. 22(2), 223–236 (2012)
Demirtas, H., Yavuz, Y.: Concurrent generation of ordinal and normal data. J. Biopharm. Stat. (in press) (2015)
Demirtas, H., Freels, S.A., Yucel, R.M.: Plausibility of multivariate normality assumption when imputing non-Gaussian continuous outcomes: a simulation assessment. J. Stat. Comput. Simul. 78(1), 69–84 (2008)
Fortier, J., Chung, F., Su, J.: Unanticipated admission after ambulatory surgery—a prospective study. Can. J. Anaesth. 45(7), 612–619 (1998)
Helenowski, I.B., Demirtas, H.: A semi-parametric approach for imputing mixed data. Stat. Interface. 6(3), 399–412 (2013)
Helenowski, I.B., Demirtas, H.: Multiple imputation for continuous data via a semi-parametric probability integral transformation. J. Biopharm. Stat. 24(2), 359–377 (2014)
Helenowski, I.B., Demirtas, H., Erdogan, B.D.: On imputing binary data via pairwise associations and corresponding conditional probabilities. Turk. Clin. J. Biostat. 4(1), 1–9 (2012)
Madersbacher, S., Hochreiter, W., Burkhard, F., Thalmann, G.N., Danuser, H., Markwalder, R., Studer, U.E.: Radical cystectomy for bladder cancer today–a homogeneous series without neoadjuvant therapy. J. Clin. Oncol. 21(4), 690–696 (2003)
Schafer, J.L.: Analysis of Incomplete Multivariate Data. Chapman and Hall, London (1997)
Tanner, M.A., Wong, W.H.: The calculation of posterior distributions by data augmentation. J. Am. Stat. Assoc. 82(398), 528–540 (1987)
Cite this article
Helenowski, I.B., Demirtas, H. & McGee, M.F. A semi-parametric approach to impute mixed continuous and categorical data. Health Serv Outcomes Res Method 14, 183–193 (2014). https://doi.org/10.1007/s10742-014-0127-8
DOI: https://doi.org/10.1007/s10742-014-0127-8