Leveraging electronic health record data to inform hospital resource management

A systematic data mining approach

Health Care Management Science

Abstract

Early identification of resource needs is instrumental in promoting efficient hospital resource management. Hospital information systems, and electronic health records (EHR) in particular, collect valuable demographic and clinical patient data from the moment patients are admitted; these data can help predict expected resource needs in the early stages of patient episodes. To this end, this article proposes a data mining methodology that systematically obtains predictions for relevant managerial variables by leveraging structured EHR data. Specifically, these managerial variables are: i) diagnosis categories, ii) procedure codes, iii) diagnosis-related groups (DRGs), iv) outlier episodes, and v) length of stay (LOS). The proposed methodology approaches the problem in four stages: feature set construction, feature selection, prediction model development, and model performance evaluation. We tested this approach on an EHR dataset of 5,089 inpatient episodes and compared different classification and regression models (for categorical and continuous variables, respectively), performed a temporal analysis of model performance, analyzed the impact of training set homogeneity on performance, and assessed the contribution of different EHR data elements to model predictive power. Overall, our results indicate that inpatient EHR data can be effectively leveraged to inform resource management from multiple perspectives. Logistic regression (combined with minimal redundancy maximum relevance (mRMR) feature selection) and bagged decision trees yielded the best results for predicting categorical and numerical managerial variables, respectively. Furthermore, our temporal analysis indicated that, while DRG classes are more difficult to predict, several diagnosis categories, procedure codes, and LOS among shorter-stay patients can be predicted with higher confidence in the early stages of patient stay. Lastly, a value of information analysis indicated that diagnoses, medication, and structured assessment forms were the most valuable EHR data elements for predicting the managerial variables of interest through a data mining approach.
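The abstract pairs logistic regression with mRMR feature selection for the categorical variables. As a rough illustration only, the sketch below implements the greedy mRMR idea on discrete (here binary) EHR indicator features: at each step it adds the feature whose mutual information with the target, minus its mean mutual information with the features already chosen, is largest. The function names, feature names, and toy data are hypothetical, and this is not the authors' implementation or configuration.

```python
from collections import Counter
from math import log


def mutual_information(x, y):
    """Mutual information (in nats) between two equal-length discrete sequences."""
    n = len(x)
    count_x, count_y = Counter(x), Counter(y)
    count_xy = Counter(zip(x, y))
    # sum over observed pairs of p(a, b) * log(p(a, b) / (p(a) * p(b))), written with raw counts
    return sum(
        (c / n) * log(c * n / (count_x[a] * count_y[b]))
        for (a, b), c in count_xy.items()
    )


def mrmr(features, target, k):
    """Greedy mRMR: relevance to the target minus mean redundancy with already-selected features."""
    relevance = {name: mutual_information(col, target) for name, col in features.items()}
    selected = []
    remaining = sorted(features)  # sorted so that ties are broken deterministically by name

    def score(name):
        if not selected:
            return relevance[name]
        redundancy = sum(mutual_information(features[name], features[s]) for s in selected)
        return relevance[name] - redundancy / len(selected)

    while remaining and len(selected) < k:
        best = max(remaining, key=score)  # max() keeps the first of equally scored names
        selected.append(best)
        remaining.remove(best)
    return selected


# Hypothetical toy data: 8 episodes, binary indicators, target = "diagnosis category coded"
target = [1, 1, 1, 1, 1, 0, 0, 0]
features = {
    "med_insulin":         [1, 1, 1, 1, 0, 0, 0, 0],
    "lab_glucose_high":    [1, 1, 1, 0, 0, 0, 0, 0],  # strongly overlaps med_insulin
    "form_nutrition_risk": [1, 0, 1, 0, 1, 0, 1, 0],  # weakly relevant, independent of med_insulin
}
print(mrmr(features, target, k=2))  # ['med_insulin', 'form_nutrition_risk']
```

On this toy data, ranking by relevance alone would pick `lab_glucose_high` second; mRMR skips it because it largely duplicates `med_insulin`, and instead adds a weaker but non-redundant feature.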

Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5
Fig. 6
Fig. 7
Fig. 8

Acknowledgements

The authors are grateful for the close collaboration with colleagues at Hospital Prof. Doutor Fernando Fonseca and their availability throughout this research study. The authors sincerely thank the associate editor and the anonymous referees for their careful review and excellent suggestions for improving this paper.

Author information

Corresponding author

Correspondence to José Carlos Ferrão.

Ethics declarations

Ethics approval

Formal ethics approval was waived; the study was reviewed and signed off by the hospital board and the chief information officer. This was a fully retrospective study using anonymized, routinely collected EHR data extracted by a hospital-designated data handler.

Conflict of interest

The authors have no conflicts of interest to declare.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Appendix A: Most frequent ICD-9-CM diagnosis codes

Table 14 Top 75 most frequent ICD-9-CM diagnosis codes (category level), in decreasing order of frequency
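Table 14 reports diagnoses at category level. In ICD-9-CM, the category is the first three characters of a numeric or V code and the first four characters of an E (external cause) code. A minimal sketch of that truncation and of a frequency ranking, assuming codes are zero-padded strings with or without the decimal point and that each category is counted at most once per episode (the article may count differently); the helper names and example codes are illustrative.

```python
from collections import Counter


def icd9_diagnosis_category(code):
    """Category of an ICD-9-CM diagnosis code: 4 characters for E codes, otherwise 3.

    Assumes numeric codes are zero-padded (e.g. "008.45", not "8.45").
    """
    compact = code.strip().upper().replace(".", "")
    if compact.startswith("E"):
        return compact[:4]  # external-cause codes: "E" followed by three digits
    return compact[:3]      # numeric codes (001-999) and V codes ("V" plus two digits)


def top_categories(episodes, n):
    """The n categories present in most episodes, each counted at most once per episode."""
    counts = Counter()
    for codes in episodes:
        counts.update({icd9_diagnosis_category(c) for c in codes})
    return counts.most_common(n)


# Illustrative episodes, each a list of coded diagnoses
episodes = [
    ["428.0", "401.9", "250.00"],
    ["428.22", "428.0"],
    ["V58.11", "401.1"],
    ["E849.7", "428.9"],
]
print(top_categories(episodes, 2))  # [('428', 3), ('401', 2)]
```

The second episode carries two codes in category 428 but contributes only one count, so the ranking reflects how many episodes involve a category rather than how many codes were assigned.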

Appendix B: Most frequent ICD-9-CM procedure codes

Table 15 Top 75 most frequent ICD-9-CM procedure codes, in decreasing order of frequency
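Because a single episode can receive several procedure codes, predicting them is naturally a multi-label problem. A minimal sketch, assuming for illustration that the prediction targets are restricted to the k most frequent codes and encoded as one binary indicator per code; the helper name and example codes are hypothetical, and the article may construct its targets differently.

```python
from collections import Counter


def multilabel_targets(episodes, k):
    """Restrict targets to the k codes found in most episodes and build 0/1 indicator rows."""
    freq = Counter()
    for codes in episodes:
        freq.update(set(codes))  # a code counts once per episode
    # descending frequency, ties broken alphabetically so the label order is deterministic
    ranked = sorted(freq.items(), key=lambda item: (-item[1], item[0]))
    labels = [code for code, _ in ranked[:k]]
    code_sets = [set(codes) for codes in episodes]
    matrix = [[int(label in s) for label in labels] for s in code_sets]
    return labels, matrix


# Illustrative episodes, each a list of ICD-9-CM procedure codes (an empty list = no procedure)
episodes = [["99.04", "88.72"], ["88.72"], ["39.95", "88.72", "99.04"], []]
labels, matrix = multilabel_targets(episodes, k=2)
print(labels)  # ['88.72', '99.04']
print(matrix)  # [[1, 1], [1, 0], [1, 1], [0, 0]]
```

Each column of `matrix` could then feed one binary classifier per code (binary relevance), which is one common way to handle multi-label targets, though not necessarily the one used in the article. Episodes with none of the retained codes get an all-zero row.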

Appendix C: Most frequent DRG codes

Table 16 Top 25 most frequent DRG codes, in decreasing order of frequency

About this article

Cite this article

Ferrão, J.C., Oliveira, M.D., Gartner, D. et al. Leveraging electronic health record data to inform hospital resource management. Health Care Manag Sci 24, 716–741 (2021). https://doi.org/10.1007/s10729-021-09554-4

  • DOI: https://doi.org/10.1007/s10729-021-09554-4
