A semi-parametric approach to impute mixed continuous and categorical data

Health Services and Outcomes Research Methodology

Abstract

We extend the method of Helenowski and Demirtas (2013) for imputing mixed continuous and binary data to data that include categorical variables with three or more levels. Here, ‘bivariate’ means that the data set includes two variables, while ‘multivariate’ means that it includes three or more. In the bivariate case, the median of the continuous variable is computed within each level of the categorical variable, and the categorical variable is recoded as an ordinal variable whose levels follow the rank order of these medians. In the multivariate case, the categorical variables are ordered with respect to the continuous variable whose medians have the largest range. The pairwise correlation between the continuous and ordinal variables is then computed. The data are transformed to normally distributed values, imputed via joint modeling, and back-transformed to the original scale, using the Barton and Schruben (1993) technique for the continuous variable and quantiles based on the original probabilities for the categorical variable. The algorithm is iterated until the absolute difference between the pairwise correlations of the original and imputed data is less than some constant c, chosen to maximize the coverage rate and minimize the standardized bias. Results from simulations on artificial data and on real data from 74 colorectal patients indicate that our technique is promising.
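The ordering step and the stopping rule described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, the abstract does not say which correlation coefficient is used (Pearson is assumed here), and the sketch assumes complete cases. It does not cover the normal transformation, the joint-model imputation, or the back-transformation.

```python
import numpy as np


def rank_levels_by_median(continuous, categorical):
    """Map each categorical level to an ordinal rank (1 = lowest median).

    Hypothetical helper: ranks the levels by the median of the continuous
    variable within each level. Assumes no missing values.
    """
    continuous = np.asarray(continuous, dtype=float)
    categorical = np.asarray(categorical)
    levels = np.unique(categorical).tolist()
    medians = {lvl: np.median(continuous[categorical == lvl]) for lvl in levels}
    # Stable sort: levels with tied medians keep their label order.
    ordered = sorted(levels, key=lambda lvl: medians[lvl])
    return {lvl: rank for rank, lvl in enumerate(ordered, start=1)}


def ordinal_correlation(continuous, categorical):
    """Correlation between the continuous variable and the ranked levels.

    Assumption: Pearson correlation; the abstract does not name the coefficient.
    """
    ranks = rank_levels_by_median(continuous, categorical)
    ordinal = np.array([ranks[lvl] for lvl in np.asarray(categorical).tolist()],
                       dtype=float)
    return float(np.corrcoef(np.asarray(continuous, dtype=float), ordinal)[0, 1])


def converged(r_original, r_imputed, c):
    """Stopping rule: absolute difference of the correlations is below c."""
    return abs(r_original - r_imputed) < c


# Usage with made-up values (not data from the paper).
y = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0, 5.0, 6.0, 7.0]
g = ["a", "a", "a", "b", "b", "b", "c", "c", "c"]
print(rank_levels_by_median(y, g))
print(ordinal_correlation(y, g))
```

In the full procedure, `converged` would be evaluated after each imputation pass, comparing the correlation in the original data with the correlation in the imputed data; the value of c is a tuning constant that the abstract says is chosen with respect to coverage rate and standardized bias.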

References

  • Barton, R.R., Schruben, L.W.: Uniform and bootstrap resampling of empirical distributions. In: Proceedings of the 25th Conference on Winter Simulation, pp. 503–508 (1993)

  • Carraro, P.G., Segala, M., Cesana, B.M., Tiberio, G.: Obstructing colonic cancer: failure and survival patterns over a ten-year follow-up after one-stage curative surgery. Diseases of the Colon and Rectum. 4(2), 243–250 (2001)

  • Demirtas, H.: Simulation driven inferences for multiply imputed longitudinal data sets. Stat. Neerl. 58(4), 466–482 (2004)

  • Demirtas, H.: Practical advice on how to impute continuous data when the ultimate interest centers on dichotomized outcomes through pre-specified thresholds. Commun. Stat. Simul. Comput. 36(4), 871–889 (2007)

  • Demirtas, H., Doganay, B.: Simultaneous generation of binary and normal data with specified marginal and association structures. J. Biopharm. Stat. 22(2), 223–236 (2012)

  • Demirtas, H., Yavuz, Y.: Concurrent generation of ordinal and normal data. J. Biopharm. Stat. (in press) (2015)

  • Demirtas, H., Freels, S.A., Yucel, R.M.: Plausibility of multivariate normality assumption when imputing non-Gaussian continuous outcomes: a simulation assessment. J. Stat. Comput. Simul. 78(1), 69–84 (2008)

  • Fortier, J., Chung, F., Su, J.: Unanticipated admission after ambulatory surgery—a prospective study. Can. J. Anaesth. 45(7), 612–619 (1998)

  • Helenowski, I.B., Demirtas, H.: A semi-parametric approach for imputing mixed data. Stat. Interface. 6(3), 399–412 (2013)

  • Helenowski, I.B., Demirtas, H.: Multiple imputation for continuous data via a semi-parametric probability integral transformation. J. Biopharm. Stat. 24(2), 359–377 (2014)

  • Helenowski, I.B., Demirtas, H., Erdogan, B.D.: On imputing binary data via pairwise associations and corresponding conditional probabilities. Turk. Clin. J. Biostat. 4(1), 1–9 (2012)

  • Madersbacher, S., Hochreiter, W., Burkhard, F., Thalmann, G.N., Danuser, H., Markwalder, R., Studer, U.E.: Radical cystectomy for bladder cancer today–a homogeneous series without neoadjuvant therapy. J. Clin. Oncol. 21(4), 690–696 (2003)

  • Schafer, J.L.: Analysis of Incomplete Multivariate Data. Chapman and Hall, London (1997)

  • Tanner, M.A., Wong, W.H.: The calculation of posterior distributions by data augmentation. JASA 82(398), 528–540 (1987)

Author information

Corresponding author

Correspondence to Irene B. Helenowski.

About this article

Cite this article

Helenowski, I.B., Demirtas, H. & McGee, M.F. A semi-parametric approach to impute mixed continuous and categorical data. Health Serv Outcomes Res Method 14, 183–193 (2014). https://doi.org/10.1007/s10742-014-0127-8

  • DOI: https://doi.org/10.1007/s10742-014-0127-8
