Abstract
Missing values arise in clinical studies from many causes, including loss to follow-up, critical illness, severe adverse drug reactions, death, other accidental causes, and termination of the clinical trial during the study period. In such situations, the researcher cannot base drug decision-making on the negative predicted values. In the medical research domain, missing value imputation is a necessary step for extracting accurate scientific evidence from clinical data sets and for supporting knowledge of drug distribution, vaccine trial development, automatic disease diagnosis, drug formulation, and the induction of clinical trials. Missing values may distort the final results, and the underlying missing-data mechanism may bias the statistical analysis and reduce its precision. Appropriate advanced statistical imputation models therefore need to be formulated. Such a model is needed to predict the true values with greater accuracy before the null hypothesis (H0) is tested at the population or sample level. In this vein, the present chapter discusses several advanced imputation models for predicting missing values (the least variance method, Gaussian mixture, KNN, and ARIMA models). The models are demonstrated on real data sets and checked diagnostically with suitable test statistics.
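As an illustration of one of the methods named above, the following is a minimal sketch of KNN imputation, not the chapter's own implementation. It assumes a purely numeric data matrix with missing entries coded as NaN, a root-mean-square distance over the features two rows both observe, and an unweighted mean of the donors; the function name `knn_impute`, the parameter `k`, and the toy data are hypothetical.

```python
import numpy as np

def knn_impute(X, k=2):
    """Fill NaNs with the mean of the k nearest rows (hypothetical helper)."""
    X = np.asarray(X, dtype=float)
    filled = X.copy()
    missing = np.isnan(X)
    # Visit only the rows that contain at least one missing value
    for i in np.where(missing.any(axis=1))[0]:
        dists = []
        for j in range(X.shape[0]):
            if j == i:
                continue
            # Compare rows only on the features that both of them observe
            shared = ~missing[i] & ~missing[j]
            if not shared.any():
                continue
            d = np.sqrt(np.mean((X[i, shared] - X[j, shared]) ** 2))
            dists.append((d, j))
        dists.sort()
        for col in np.where(missing[i])[0]:
            # Take the k closest rows that actually observe this column
            donors = [X[j, col] for _, j in dists if not missing[j, col]][:k]
            if donors:
                filled[i, col] = np.mean(donors)
    return filled

# Toy data (assumed for illustration): the second row lacks its third value
data = np.array([
    [1.0, 2.0, 3.0],
    [1.1, 2.1, np.nan],
    [0.9, 1.9, 2.9],
    [8.0, 9.0, 10.0],
])
imputed = knn_impute(data, k=2)
```

Here the missing entry is filled from the two rows closest to the incomplete one, so the distant fourth row does not influence it. In practice the choice of `k` and of the distance measure would need to be validated against the data at hand.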
Copyright information
© 2020 The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this chapter
Cite this chapter
D. M., B., Narasimha Murthy, B. (2020). Imputation Methods Approach to Clinical and Life Science Research Data Sets. In: Design of Experiments and Advanced Statistical Techniques in Clinical Research. Springer, Singapore. https://doi.org/10.1007/978-981-15-8210-3_11
DOI: https://doi.org/10.1007/978-981-15-8210-3_11
Published:
Publisher Name: Springer, Singapore
Print ISBN: 978-981-15-8209-7
Online ISBN: 978-981-15-8210-3
eBook Packages: Biomedical and Life Sciences, Biomedical and Life Sciences (R0)