Imputation Methods Approach to Clinical and Life Science Research Data Sets

D. M., Basavarajaiah; Narasimha Murthy, Bhamidipati

doi:10.1007/978-981-15-8210-3_11

Abstract

Missing values or items arise from various factors, such as loss to follow-up, critical illness, severe adverse drug reactions, death and other accidental causes, and termination of a clinical trial during the study period. In such situations, the researcher cannot base drug decisions on the negative predicted values. In the medical research domain, missing value imputation is a necessary step for obtaining accurate scientific evidence from clinical data sets and for knowledge of drug distribution, the development of vaccine trials, automatic disease diagnosis, drug formulation, and the initiation of clinical trials. Missing values may distort the final results, and the underlying missing-data mechanism may bias the statistical analysis and reduce its precision. Therefore, greater attention must be given to formulating appropriate advanced statistical imputation models, which are needed to predict the missing values accurately before the null hypothesis (H₀) is tested at the population or sample level. In this vein, the present chapter discusses several advanced imputation models for predicting missing values: the least variance method, Gaussian mixture models, k-nearest neighbours (KNN), and ARIMA models. All of these imputation models are demonstrated on real data sets, and the models are checked diagnostically with suitable test statistics.
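KNN is one of the imputation models named above. The chapter's own KNN procedure is not shown in this preview, so the following is only a generic sketch of the idea under stated assumptions: each missing entry (represented as `None`) is replaced by the mean of that variable over the k most similar rows that observe it, with similarity measured by Euclidean distance over the variables both rows share, rescaled to the full number of variables. The function names `knn_impute` and `_distance`, the default `k = 2`, the rescaling rule, and the toy data are all hypothetical choices made for illustration.

```python
import math
from typing import List, Optional

Row = List[Optional[float]]


def _distance(a: Row, b: Row) -> float:
    """Hypothetical helper: Euclidean distance over columns observed in both
    rows, rescaled to the full column count (an assumed convention)."""
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        return math.inf  # no overlap: treat the rows as maximally dissimilar
    squared = sum((x - y) ** 2 for x, y in shared)
    return math.sqrt(squared * len(a) / len(shared))


def knn_impute(data: List[Row], k: int = 2) -> List[List[float]]:
    """Hypothetical helper: replace each None with the mean of that column
    among the k nearest other rows in which the column is observed."""
    imputed: List[List[float]] = []
    for i, row in enumerate(data):
        new_row: List[float] = []
        for j, value in enumerate(row):
            if value is not None:
                new_row.append(value)  # observed values are kept unchanged
                continue
            # Candidate donors: every other row that observes column j.
            donors = [
                (_distance(row, other), other[j])
                for m, other in enumerate(data)
                if m != i and other[j] is not None
            ]
            if not donors:
                raise ValueError(f"column {j} has no observed values")
            donors.sort(key=lambda d: d[0])
            nearest = donors[:k]
            new_row.append(sum(v for _, v in nearest) / len(nearest))
        imputed.append(new_row)
    return imputed


if __name__ == "__main__":
    # Toy data, invented for illustration only (not from the chapter).
    data = [
        [1.0, 2.0, 3.0],
        [1.1, None, 3.1],
        [5.0, 6.0, 7.0],
        [1.2, 2.2, 3.2],
    ]
    print(knn_impute(data, k=2)[1])  # prints [1.1, 2.1, 3.1]
```

Here the incomplete second row is closest to the first and fourth rows, so its missing value becomes the mean of 2.0 and 2.2. Averaging the k donors (rather than copying a single nearest value) trades a little bias for lower variance; in practice k and the distance metric should be chosen and checked against the data at hand.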

Author information

Authors and Affiliations

Basavarajaiah D. M.: Karnataka Veterinary, Animal and Fisheries Sciences University, Karnataka, India
Bhamidipati Narasimha Murthy: National Health Mission, Govt. of India, National Institute of Epidemiology, Chennai, India

About this chapter

Cite this chapter

D. M., B., Narasimha Murthy, B. (2020). Imputation Methods Approach to Clinical and Life Science Research Data Sets. In: Design of Experiments and Advanced Statistical Techniques in Clinical Research. Springer, Singapore. https://doi.org/10.1007/978-981-15-8210-3_11

DOI: https://doi.org/10.1007/978-981-15-8210-3_11
Published: 05 November 2020
Publisher Name: Springer, Singapore
Print ISBN: 978-981-15-8209-7
Online ISBN: 978-981-15-8210-3
eBook Packages: Biomedical and Life Sciences; Biomedical and Life Sciences (R0)
