Leveraging electronic health record data to inform hospital resource management

Ferrão, José Carlos; Oliveira, Mónica Duarte; Gartner, Daniel; Janela, Filipe; Martins, Henrique M. G.

doi:10.1007/s10729-021-09554-4

A systematic data mining approach

Published: 24 May 2021

Volume 24, pages 716–741 (2021)

Health Care Management Science

José Carlos Ferrão¹,² (ORCID: orcid.org/0000-0003-1001-1451),
Mónica Duarte Oliveira²,
Daniel Gartner³,
Filipe Janela¹ &
Henrique M. G. Martins⁴

Abstract

Early identification of resource needs is instrumental in promoting efficient hospital resource management. Hospital information systems, and electronic health records (EHR) in particular, collect valuable demographic and clinical patient data from the moment patients are admitted, which can help predict expected resource needs in the early stages of patient episodes. To this end, this article proposes a data mining methodology to systematically obtain predictions for relevant managerial variables by leveraging structured EHR data. Specifically, these managerial variables are: (i) diagnosis categories, (ii) procedure codes, (iii) diagnosis-related groups (DRGs), (iv) outlier episodes and (v) length of stay (LOS). The proposed methodology approaches the problem in four stages: feature set construction, feature selection, prediction model development, and model performance evaluation. We tested this approach with an EHR dataset of 5,089 inpatient episodes. We compared different classification and regression models (for categorical and continuous variables, respectively), performed a temporal analysis of model performance, analyzed the impact of training set homogeneity on performance, and assessed the contribution of different EHR data elements to model predictive power. Overall, our results indicate that inpatient EHR data can effectively be leveraged to inform resource management from multiple perspectives. Logistic regression (combined with minimal redundancy maximum relevance feature selection) and bagged decision trees yielded the best results for predicting categorical and numerical managerial variables, respectively. Furthermore, our temporal analysis indicated that, while DRG classes are more difficult to predict, several diagnosis categories, procedure codes and LOS amongst shorter-stay patients can be predicted with higher confidence in the early stages of patient stay. Lastly, value of information analysis indicated that diagnoses, medication and structured assessment forms were the most valuable EHR data elements in predicting managerial variables of interest through a data mining approach.
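To make the feature selection stage more concrete, the following is a minimal sketch of greedy minimal redundancy maximum relevance (mRMR) selection on discrete features, written in plain Python. It is an illustration under stated assumptions, not the authors' implementation: the feature names (diag_flag, diag_flag_copy, med_flag), the toy target and the "relevance minus mean redundancy" scoring variant are all hypothetical choices made for this example, and the article's actual features, estimator and settings are not given in this abstract.

```python
from collections import Counter
from math import log


def mutual_information(xs, ys):
    """Empirical mutual information (in nats) between two discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(
        (c / n) * log((c / n) / ((px[x] / n) * (py[y] / n)))
        for (x, y), c in pxy.items()
    )


def mrmr_select(features, target, k):
    """Greedily pick up to k features, each time maximising
    relevance I(feature; target) minus the mean mutual information
    with the features already selected (the redundancy term)."""
    relevance = {
        name: mutual_information(col, target) for name, col in features.items()
    }
    selected = []
    remaining = sorted(features)  # sorted so that ties break alphabetically

    def score(name):
        if not selected:
            return relevance[name]
        redundancy = sum(
            mutual_information(features[name], features[s]) for s in selected
        ) / len(selected)
        return relevance[name] - redundancy

    while remaining and len(selected) < k:
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Hypothetical toy data: eight episodes, two independent binary EHR flags
# and an exact duplicate of the first flag (a purely redundant feature).
features = {
    "diag_flag":      [0, 0, 0, 0, 1, 1, 1, 1],
    "diag_flag_copy": [0, 0, 0, 0, 1, 1, 1, 1],
    "med_flag":       [0, 0, 1, 1, 0, 0, 1, 1],
}
# Hypothetical four-class target determined jointly by the two flags.
target = [2 * d + m for d, m in zip(features["diag_flag"], features["med_flag"])]

selected = mrmr_select(features, target, k=2)
print(selected)
```

On this toy data all three features are equally relevant to the target, so a relevance-only ranking could keep both copies of the same flag. The redundancy term is what steers the second pick towards med_flag, which carries information the first pick does not.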

Acknowledgements

The authors are grateful for the close collaboration with colleagues at Hospital Prof. Doutor Fernando Fonseca and their availability throughout this research study. The authors sincerely thank the associate editor and the anonymous referees for their careful review and excellent suggestions for improving this paper.

Author information

Authors and Affiliations

SIEMENS Healthineers, Rua Irmãos Siemens 1, 2720-093, Amadora, Portugal
José Carlos Ferrão & Filipe Janela
CEG-IST, Centre for Management Studies of Instituto Superior Técnico, Universidade de Lisboa, Av. Rovisco Pais 1, 1049-001, Lisbon, Portugal
José Carlos Ferrão & Mónica Duarte Oliveira
Cardiff University, School of Mathematics, Cardiff, United Kingdom
Daniel Gartner
Centre for Research and Creativity in Informatics (CI2), Hospital Prof. Doutor Fernando Fonseca, IC-19 Venteira, 2720-276, Amadora, Portugal
Henrique M. G. Martins

Corresponding author

Correspondence to José Carlos Ferrão.

Ethics declarations

Ethics approval

Ethics approval was waived; the study was reviewed and signed off by the hospital board and chief information officer. This was a fully retrospective study with anonymized, routinely collected EHR data extracted by a hospital-designated data handler.

Conflict of Interests

No conflicts of interest to declare.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Appendix A: Most frequent ICD 9 diagnosis codes

Table 14 Top 75 most frequent ICD-9-CM diagnosis codes (category level), ordered by decreasing frequency

Appendix B: Most frequent ICD 9 procedure codes

Table 15 Top 75 most frequent ICD-9-CM procedure codes, ordered by decreasing frequency

Appendix C: Most frequent DRG codes

Table 16 Top 25 most frequent DRG codes, ordered by decreasing frequency

About this article

Cite this article

Ferrão, J.C., Oliveira, M.D., Gartner, D. et al. Leveraging electronic health record data to inform hospital resource management. Health Care Manag Sci 24, 716–741 (2021). https://doi.org/10.1007/s10729-021-09554-4

Received: 26 May 2020
Accepted: 02 February 2021
Published: 24 May 2021
Issue Date: December 2021
DOI: https://doi.org/10.1007/s10729-021-09554-4
