Abstract
Early identification of resource needs is instrumental in promoting efficient hospital resource management. Hospital information systems, and electronic health records (EHR) in particular, collect valuable demographic and clinical patient data from the moment patients are admitted, which can help predict expected resource needs in the early stages of patient episodes. To this end, this article proposes a data mining methodology to systematically obtain predictions for relevant managerial variables by leveraging structured EHR data. Specifically, these managerial variables are: i) diagnosis categories, ii) procedure codes, iii) diagnosis-related groups (DRGs), iv) outlier episodes, and v) length of stay (LOS). The proposed methodology approaches the problem in four stages: feature set construction, feature selection, prediction model development, and model performance evaluation. We tested this approach with an EHR dataset of 5,089 inpatient episodes and compared different classification and regression models (for categorical and continuous variables, respectively), performed a temporal analysis of model performance, analyzed the impact of training set homogeneity on performance, and assessed the contribution of different EHR data elements to model predictive power. Overall, our results indicate that inpatient EHR data can effectively be leveraged to inform resource management from multiple perspectives. Logistic regression (combined with minimum-redundancy maximum-relevance (mRMR) feature selection) and bagged decision trees yielded the best results for predicting categorical and numerical managerial variables, respectively. Furthermore, our temporal analysis indicated that, while DRG classes are more difficult to predict, several diagnosis categories, procedure codes, and LOS among shorter-stay patients can be predicted with higher confidence in the early stages of the patient stay.
Lastly, a value-of-information analysis indicated that diagnoses, medication, and structured assessment forms were the most valuable EHR data elements for predicting the managerial variables of interest through a data mining approach.
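The feature selection step pairs logistic regression with minimum-redundancy maximum-relevance (mRMR) selection. As an illustration only, the following is a minimal sketch of the greedy mRMR criterion of Peng, Long and Ding (2005) in its difference form: the first pick is the feature with the highest mutual information with the target; each later pick maximizes that relevance minus the mean mutual information with the features already selected. The sketch assumes discrete (e.g., binary presence/absence) EHR features and estimates mutual information from empirical counts. The function names, feature names and toy data are hypothetical, and this is not the authors' implementation.

```python
import math
from collections import Counter
from collections.abc import Hashable, Sequence


def mutual_information(x: Sequence[Hashable], y: Sequence[Hashable]) -> float:
    """Empirical mutual information I(X;Y), in nats, of two discrete variables."""
    n = len(x)
    joint = Counter(zip(x, y))
    count_x = Counter(x)
    count_y = Counter(y)
    mi = 0.0
    for (a, b), n_ab in joint.items():
        # p(a,b) * log(p(a,b) / (p(a) * p(b))), written with raw counts
        mi += (n_ab / n) * math.log(n_ab * n / (count_x[a] * count_y[b]))
    return mi


def mrmr_select(features: dict[str, Sequence[Hashable]],
                target: Sequence[Hashable], k: int) -> list[str]:
    """Greedy mRMR (difference form): relevance minus mean redundancy."""
    relevance = {name: mutual_information(col, target)
                 for name, col in features.items()}
    selected: list[str] = []
    remaining = set(features)

    def score(name: str) -> float:
        if not selected:  # first pick: relevance only
            return relevance[name]
        redundancy = sum(mutual_information(features[name], features[s])
                         for s in selected) / len(selected)
        return relevance[name] - redundancy

    while remaining and len(selected) < k:
        # sorted() makes tie-breaking deterministic (alphabetical order)
        best = max(sorted(remaining), key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Hypothetical binary indicators for 8 episodes and a binary target
# (e.g., "episode is assigned to a given DRG")
target = [1, 1, 1, 0, 0, 0, 0, 0]
features = {
    "diag_X":     [1, 1, 1, 1, 0, 0, 0, 0],  # strongly relevant
    "diag_X_dup": [1, 1, 1, 1, 0, 0, 0, 0],  # exact duplicate of diag_X
    "med_Y":      [1, 1, 0, 0, 1, 1, 0, 0],  # weakly relevant, independent of diag_X
}
print(mrmr_select(features, target, k=2))  # ['diag_X', 'med_Y']
```

In this toy example, ranking by relevance alone would keep the duplicated indicator `diag_X_dup` second. mRMR instead penalizes it for being fully redundant with `diag_X` and selects the weaker but independent `med_Y`. The selected columns would then be passed to a classifier such as logistic regression.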
Acknowledgements
The authors are grateful for the close collaboration with colleagues at Hospital Prof. Doutor Fernando Fonseca and their availability throughout this research study. The authors sincerely thank the associate editor and the anonymous referees for their careful review and excellent suggestions for improving this paper.
Ethics declarations
Ethics approval
Ethics approval was waived; the study was reviewed and signed off by the hospital board and the chief information officer. This was a fully retrospective study using anonymized, routinely collected EHR data extracted by a hospital-designated data handler.
Conflict of interest
No conflicts of interest to declare.
Appendices
Appendix A: Most frequent ICD-9 diagnosis codes
Appendix B: Most frequent ICD-9 procedure codes
Appendix C: Most frequent DRG codes
Cite this article
Ferrão, J.C., Oliveira, M.D., Gartner, D. et al. Leveraging electronic health record data to inform hospital resource management. Health Care Manag Sci 24, 716–741 (2021). https://doi.org/10.1007/s10729-021-09554-4