Abstract
Missing data are a common issue in statistical analyses. Multiple imputation is a technique that has been applied in countless research studies and has a strong theoretical basis. Most of the statistical literature on multiple imputation has focused on unbounded continuous variables, with mostly ad hoc remedies for variables with bounded support. These approaches can be unsatisfactory when applied to bounded variables as they can produce misleading inferences. In this paper, we propose a flexible quantile-based imputation model suitable for distributions defined over singly or doubly bounded intervals. Proper support of the imputed values is ensured by applying a family of transformations with singly or doubly bounded range. Simulation studies demonstrate that our method is able to deal with skewness, bimodality, and heteroscedasticity and has superior properties as compared to competing approaches, such as log-normal imputation and predictive mean matching. We demonstrate the application of the proposed imputation procedure by analysing data on mathematical development scores in children from the Millennium Cohort Study, UK. We also show a specific advantage of our methods using a small psychiatric dataset. Our methods are relevant in a number of fields, including education and psychology.
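The core idea described above, drawing imputations from estimated conditional quantiles on a transformed scale and mapping them back into the bounded range, can be illustrated with a minimal sketch. It is not the authors' implementation. The function names are hypothetical, a plain logit-type map stands in for the paper's family of transformations, the quantile line is fitted by brute force for a single covariate, and the trimmed range of quantile levels is an arbitrary simplification. The sketch also ignores uncertainty in the fitted quantile coefficients, so it should not be read as a proper multiple imputation procedure.

```python
import math
import random


def to_real_line(y, lower, upper):
    """Logit-type map from the open interval (lower, upper) to the real line."""
    return math.log((y - lower) / (upper - y))


def to_bounded(z, lower, upper):
    """Inverse map; the result always lies within [lower, upper]."""
    if z >= 0:
        p = 1.0 / (1.0 + math.exp(-z))
    else:
        e = math.exp(z)
        p = e / (1.0 + e)
    return lower + (upper - lower) * p


def check_loss(r, tau):
    """Quantile regression check function applied to a residual r."""
    return r * (tau - (1.0 if r < 0 else 0.0))


def fit_quantile_line(x, z, tau):
    """Exact one-covariate quantile regression (illustrative brute force).

    An optimal line interpolates two data points, so every pair is tried.
    Assumes at least two distinct x values.
    """
    best = None
    n = len(x)
    for i in range(n):
        for j in range(i + 1, n):
            if x[i] == x[j]:
                continue
            slope = (z[j] - z[i]) / (x[j] - x[i])
            intercept = z[i] - slope * x[i]
            loss = sum(check_loss(zk - intercept - slope * xk, tau)
                       for xk, zk in zip(x, z))
            if best is None or loss < best[0]:
                best = (loss, intercept, slope)
    return best[1], best[2]


def impute_once(x_obs, y_obs, x_mis, lower, upper, rng):
    """One imputed value per missing case, guaranteed to respect the bounds."""
    z_obs = [to_real_line(y, lower, upper) for y in y_obs]
    imputed = []
    for x0 in x_mis:
        # Random quantile level; trimming away from 0 and 1 is a simplification.
        tau = rng.uniform(0.05, 0.95)
        intercept, slope = fit_quantile_line(x_obs, z_obs, tau)
        imputed.append(to_bounded(intercept + slope * x0, lower, upper))
    return imputed


# Simulated scores on a 0-100 scale (made-up data, for illustration only).
rng = random.Random(2018)
x_obs = [rng.random() for _ in range(30)]
y_obs = [to_bounded(-1.0 + 3.0 * x + rng.gauss(0.0, 0.5), 0.0, 100.0)
         for x in x_obs]
x_mis = [0.1, 0.5, 0.9, 1.5]  # the last value extrapolates beyond the observed x
imputed = impute_once(x_obs, y_obs, x_mis, 0.0, 100.0, rng)
print([round(v, 1) for v in imputed])
```

A linear model fitted directly on the 0 to 100 scale could extrapolate past 100 at x = 1.5. Here the imputed value cannot leave the range because the back-transformation maps every real number into it.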
Acknowledgements
Marco Geraci was funded by an ASPIRE grant from the Office of the Vice President for Research at the University of South Carolina and by the National Institutes of Health–National Institute of Child Health and Human Development (Grant Number: 1R03HD084807-01A1). The authors wish to thank four anonymous referees for helpful comments and suggestions that substantially improved the paper.
Cite this article
Geraci, M., McLain, A. Multiple Imputation for Bounded Variables. Psychometrika 83, 919–940 (2018). https://doi.org/10.1007/s11336-018-9616-y
DOI: https://doi.org/10.1007/s11336-018-9616-y