Miscoding Alerts Within Hospital Datasets: An Unsupervised Machine Learning Approach

  • Julio Souza
  • João Vasco Santos
  • Fernando Lopes
  • João Viana
  • Alberto Freitas
Conference paper
Part of the Advances in Intelligent Systems and Computing book series (AISC, volume 746)


The appropriate funding of hospital services may depend on grouping hospital episodes into Diagnosis Related Groups (DRGs). DRGs rely on the quality of the clinical data held in administrative healthcare databases, particularly accurate diagnosis and procedure codes. This work proposes a methodology based on unsupervised machine learning and statistical methods to generate alerts for suspected cases of upcoding and under-coding in healthcare administrative databases. The administrative database, with a DRG assigned to each hospital episode, was split into homogeneous patient subgroups by applying decision tree-based algorithms. The proportions of specific diagnosis and procedure codes were then compared within targeted subgroups to identify hospitals with abnormal distributions. Preliminary results indicate that the proposed methodology has the potential to automatically identify suspected cases of upcoding and under-coding, as well as other relevant types of discrepancies in coding practices. Nevertheless, additional evaluation from a medical perspective needs to be incorporated into the methodology.
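The comparison step described above can be illustrated with a minimal sketch. The abstract does not state which statistical test was used, so the two-proportion z-test below is an assumption, as are the function name `flag_hospitals`, the input layout, and the threshold of 3.0. The input is taken to be the episodes of a single homogeneous subgroup (for example, one leaf of a decision tree).

```python
import math
from collections import defaultdict


def flag_hospitals(episodes, code, z_threshold=3.0):
    """Flag hospitals whose share of episodes carrying `code` differs
    from the share observed in all other hospitals of the same subgroup.

    episodes: iterable of (hospital_id, set_of_codes) tuples, all taken
    from ONE homogeneous subgroup (hypothetical input layout).
    Returns {hospital_id: z_score} where |z_score| >= z_threshold.
    The z-test and the threshold are illustrative assumptions, not the
    method reported in the paper.
    """
    totals = defaultdict(int)  # number of episodes per hospital
    hits = defaultdict(int)    # episodes per hospital that carry `code`
    for hospital, codes in episodes:
        totals[hospital] += 1
        if code in codes:
            hits[hospital] += 1

    n_all = sum(totals.values())
    x_all = sum(hits.values())
    alerts = {}
    for hospital, n1 in totals.items():
        n2 = n_all - n1  # episodes in every other hospital
        if n2 == 0:
            continue     # nothing to compare against
        x1 = hits[hospital]
        x2 = x_all - x1
        p1, p2 = x1 / n1, x2 / n2
        pooled = x_all / n_all
        # Standard error of the difference under the pooled proportion
        se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
        if se == 0:
            continue     # code present in all or none of the episodes
        z = (p1 - p2) / se
        if abs(z) >= z_threshold:
            alerts[hospital] = z
    return alerts
```

A positive score marks a hospital that records the code more often than its peers (a possible upcoding alert), and a negative score marks one that records it less often (a possible under-coding alert). Either way, the output is only a prompt for review by clinical coders, not evidence of miscoding.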


Diagnosis related groups, Clinical terminology, Decision tree, Machine learning, Upcoding, Under-coding, Miscoding, Hospital administrative data



Project NORTE-01-0145-FEDER-000016 (NanoSTIMA) is financed by the North Portugal Regional Operational Programme (NORTE 2020), under the PORTUGAL 2020 Partnership Agreement, and through the European Regional Development Fund (ERDF). The authors would also like to thank the Central Authority for Health Services, I.P. (ACSS) for providing access to the data.



Copyright information

© Springer International Publishing AG, part of Springer Nature 2018

Authors and Affiliations

  1. CINTESIS – Center for Health Technology and Services Research, Porto, Portugal
  2. MEDCIDS – Department of Community Medicine, Information and Health Decision Sciences, Faculty of Medicine, University of Porto, Porto, Portugal
