
A study of factors related to patients’ length of stay using data mining techniques in a general hospital in southern Iran




The length of stay (LOS) in hospitals is a widely used indicator for purposes such as health care management, quality control, utilization of hospital services and resources, and determining the degree of efficiency. Various methods have been used to identify the factors influencing LOS. This study compares data mining techniques to investigate the influencing factors and predict the length of stay in Shahid-Mohammadi Hospital, Bandar Abbas, Iran.


Using a dataset consisting of 526 patient records from Shahid-Mohammadi Hospital, collected from March 2016 to March 2017, factors affecting the LOS were ranked using information gain and correlation indices. In addition, classification models for LOS prediction were built using nine data mining classifiers, each applied with and without a feature selection technique. Finally, the models were compared.
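As a rough illustration of ranking factors by information gain, the following minimal sketch computes the gain of each categorical feature with respect to a binary LOS class. The records, the feature names (`ward`, `sex`) and the `short`/`long` labels are hypothetical toy values chosen for the example; they are not taken from the study's dataset, and the study's own tooling is not specified here.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(feature_values, labels):
    """Reduction in label entropy after splitting on a categorical feature."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    # Weighted average entropy of the labels within each feature value.
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def rank_features(records, label_key):
    """Return (feature, gain) pairs sorted from most to least informative."""
    labels = [r[label_key] for r in records]
    gains = {
        name: information_gain([r[name] for r in records], labels)
        for name in records[0]
        if name != label_key
    }
    return sorted(gains.items(), key=lambda item: item[1], reverse=True)


# Hypothetical toy records: 'ward' separates the classes, 'sex' does not.
records = [
    {"ward": "surgery", "sex": "F", "los": "long"},
    {"ward": "surgery", "sex": "M", "los": "long"},
    {"ward": "internal", "sex": "F", "los": "short"},
    {"ward": "internal", "sex": "M", "los": "short"},
]

ranking = rank_features(records, label_key="los")
for name, gain in ranking:
    print(f"{name}: {gain:.3f}")
```

In this toy data the `ward` feature removes all uncertainty about the class, so it ranks first, while `sex` carries no information. A feature selection step could then keep only the top-ranked features before training a classifier.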


The most important factors affecting LOS were the number of para-clinical services, counseling frequency, clinical ward, the doctor’s specialty and degree, and the cause of hospitalization. Among the classifiers built on the dataset, Logistic Regression achieved the best accuracy (83.91%) and Naïve Bayes the best sensitivity (80.36%). The best AUC (0.896) was achieved by both the Random Forest and Generalized Linear classifiers.
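For readers unfamiliar with the three reported measures, the sketch below shows how accuracy, sensitivity, and AUC can be computed for a binary classifier. The labels and scores are made-up toy values, the 0.5 decision threshold is an assumption, and the AUC is computed with the pairwise ranking definition (the fraction of positive-negative pairs in which the positive case scores higher); the study's own evaluation procedure may differ.

```python
def accuracy(y_true, y_pred):
    """Fraction of cases whose predicted class matches the true class."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def sensitivity(y_true, y_pred, positive=1):
    """True positive rate: share of actual positives predicted as positive."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    return tp / (tp + fn)


def auc(y_true, scores, positive=1):
    """Area under the ROC curve via pairwise comparison of scores.

    Each positive-negative pair counts 1 if the positive case has the
    higher score, 0.5 on a tie, and 0 otherwise.
    """
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = long stay, 0 = short stay.
y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.1]
# Assumed decision threshold of 0.5 for turning scores into classes.
y_pred = [1 if s >= 0.5 else 0 for s in scores]

print(f"accuracy:    {accuracy(y_true, y_pred):.3f}")
print(f"sensitivity: {sensitivity(y_true, y_pred):.3f}")
print(f"AUC:         {auc(y_true, scores):.3f}")
```

Accuracy and sensitivity depend on the chosen threshold, whereas AUC summarizes ranking quality across all thresholds, which is why different classifiers can lead on different measures.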


The results showed that most of the proposed models are suitable for classifying length of stay, although Logistic Regression might perform slightly better than the others in terms of accuracy and can be used to determine patients’ length of stay. In general, continuous monitoring of the factors influencing each performance indicator, based on proper and accurate models, is important for supporting management decisions in hospitals.





This study is part of a registered research project with grant number 9520 and ethical code HUMS.REC.1395.56 from the Deputy of Research and Technology of Hormozgan University of Medical Sciences. We wish to thank the university’s Deputy of Research and Technology for its support. We are also sincerely thankful to our counselors at the Clinical Research Development Center of Shahid Mohammadi Hospital.

Author information

Correspondence to Tayebeh Baniasadi.

Ethics declarations

Conflict of interest

The authors report no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Ayyoubzadeh, S.M., Ghazisaeedi, M., Rostam Niakan Kalhori, S. et al. A study of factors related to patients’ length of stay using data mining techniques in a general hospital in southern Iran. Health Inf Sci Syst 8, 9 (2020). https://doi.org/10.1007/s13755-020-0099-8


  • Data mining
  • Classification
  • Length of stay (LOS)
  • Hospital