Statistical Modeling for the Heart Disease Diagnosis via Multiple Imputation

  • Lian Li
  • Yichuan Zhao
Part of the ICSA Book Series in Statistics (ICSABSS)


Missing data are a common challenge in the statistical analysis of clinical data. Incomplete datasets can arise in many ways, such as mishandled samples, a low signal-to-noise ratio, measurement error, non-response to questions, or deletion of aberrant values. Missing data can cause severe problems in statistical analysis and lead to invalid conclusions. Multiple imputation is a useful strategy for handling missing data, and inference based on multiple imputation is widely accepted as giving less biased and more valid results. In this chapter, we apply multiple imputation to a publicly accessible heart disease dataset with a high missing rate and build a prediction model for heart disease diagnosis.
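The abstract refers to the statistical inference of multiple imputation only in general terms. As a minimal sketch, assuming each of m imputed copies of the dataset has already been analysed with the same model (for example, a logistic regression for diagnosis) and that one scalar coefficient is to be combined, the standard pooling step known as Rubin's rules can be written as follows. The function name `pool_rubin` and the numbers in the usage example are hypothetical illustrations, not the authors' implementation or results from the heart disease dataset.

```python
import math
from statistics import mean, variance

# Illustrative sketch of Rubin's rules; not the chapter authors' code.


def pool_rubin(estimates, variances):
    """Pool one scalar parameter across m imputed datasets using Rubin's rules.

    estimates: per-imputation point estimates (e.g. one regression coefficient)
    variances: matching per-imputation squared standard errors
    """
    m = len(estimates)
    if m < 2 or len(variances) != m:
        raise ValueError("need at least two imputations and one variance per estimate")

    q_bar = mean(estimates)          # pooled point estimate
    u_bar = mean(variances)          # average within-imputation variance
    b = variance(estimates)          # between-imputation variance (divisor m - 1)
    t = u_bar + (1 + 1 / m) * b      # total variance of the pooled estimate

    # Rubin's degrees of freedom for t-based intervals; infinite when the
    # imputations agree exactly (no extra uncertainty from missingness).
    if b == 0:
        df = math.inf
    else:
        df = (m - 1) * (1 + u_bar / ((1 + 1 / m) * b)) ** 2

    return {"estimate": q_bar, "se": math.sqrt(t), "total_var": t,
            "within": u_bar, "between": b, "df": df}


if __name__ == "__main__":
    # Hypothetical log-odds estimates and variances from m = 5 imputed datasets.
    res = pool_rubin([0.52, 0.48, 0.55, 0.50, 0.45],
                     [0.010, 0.011, 0.009, 0.010, 0.012])
    print(f"estimate={res['estimate']:.3f} se={res['se']:.3f} df={res['df']:.0f}")
    # prints: estimate=0.500 se=0.110 df=195
```

The between-imputation term B is what separates this from single imputation: analysing one imputed dataset as if it were complete would drop the (1 + 1/m)B component and understate the standard error.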


Keywords: Missing data; Multiple imputation; Heart disease dataset



The authors are grateful to the two reviewers for their helpful comments, which improved the manuscript significantly. The authors would like to thank Lisa Elon for invaluable advice and Dr. Eric Dammer for critical reading of the manuscript.


Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  1. Department of Epidemiology, Rollins School of Public Health, Emory University, Atlanta, USA
  2. Department of Mathematics and Statistics, Georgia State University, Atlanta, USA
