A survey on the state of healthcare upcoding fraud analysis and detection

Health Services and Outcomes Research Methodology


Since its infancy in the 1910s, group healthcare insurance has continued to grow, placing a steadily rising burden on the government and taxpayers. The growing number of people enrolled in healthcare programs such as Medicare, along with the enormous volume of money in the healthcare industry, increases both the appeal and the risk of fraudulent activity. One such fraud, known as upcoding, lets a provider obtain additional reimbursement by coding a service as a more expensive one than was actually performed. With the proliferation of data mining techniques and the continued availability of public healthcare data, applying these techniques to this growing cache of data has the potential to greatly reduce healthcare costs through more robust detection of upcoding fraud. Although a sizable body of healthcare fraud detection research exists, studies of upcoding fraud are limited. Audit data can be difficult to obtain, which limits the usefulness of supervised learning; other data mining techniques, such as unsupervised learning, must therefore be explored on mostly unlabeled records in order to detect upcoding fraud. This paper reviews upcoding fraud analysis and detection research, providing an overview of healthcare and upcoding and a review of the data mining techniques currently used for this purpose.
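To make the idea of unsupervised detection on unlabeled records concrete, the following is a minimal sketch and not a method taken from the surveyed papers. It assumes each provider's claims are summarized as counts per billing code, and it flags providers whose share of the most expensive codes is unusually high relative to their peers, using a robust z-score based on the median and the median absolute deviation. The code labels (`L3`, `L4`, `L5`), the function names, and the threshold of 3.5 are all hypothetical choices made for illustration.

```python
from statistics import median


def high_code_share(counts, high_codes):
    """Fraction of a provider's claims billed under the expensive codes."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return sum(n for code, n in counts.items() if code in high_codes) / total


def flag_outliers(provider_counts, high_codes, threshold=3.5):
    """Return providers whose high-code share is far above the peer median.

    Uses a robust z-score: 0.6745 * (x - median) / MAD.
    The threshold is an illustrative assumption, not a validated cutoff.
    """
    shares = {p: high_code_share(c, high_codes)
              for p, c in provider_counts.items()}
    med = median(shares.values())
    mad = median(abs(s - med) for s in shares.values())
    if mad == 0:
        # No spread among peers, so the score is undefined; flag nothing.
        return []
    flagged = [p for p, s in shares.items()
               if 0.6745 * (s - med) / mad > threshold]
    return sorted(flagged)


# Hypothetical claim counts per provider, keyed by made-up code labels.
providers = {
    "A": {"L3": 70, "L4": 25, "L5": 5},
    "B": {"L3": 65, "L4": 29, "L5": 6},
    "C": {"L3": 72, "L4": 24, "L5": 4},
    "D": {"L3": 60, "L4": 33, "L5": 7},
    "E": {"L3": 10, "L4": 30, "L5": 60},
}

print(flag_outliers(providers, high_codes={"L5"}))
```

A flag from a sketch like this is only a statistical signal that a provider's billing mix differs from that of its peers; it is not evidence of fraud on its own, since a legitimately different patient population could produce the same pattern.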


The authors would like to thank the editor and the anonymous reviewers for their insightful comments. They would also like to thank various members of the Data Mining and Machine Learning Laboratory, Florida Atlantic University, Boca Raton, for their assistance reviewing this manuscript.

Author information

Corresponding author

Correspondence to Richard Bauder.

Ethics declarations

Conflict of interest

The authors declare that they have no conflicts of interest.

Ethical approval

This article does not contain any studies with human participants or animals performed by any of the authors.

About this article

Cite this article

Bauder, R., Khoshgoftaar, T.M. & Seliya, N. A survey on the state of healthcare upcoding fraud analysis and detection. Health Serv Outcomes Res Method 17, 31–55 (2017).
