Provider profiling and labeling of fraudulent health insurance claims using Weighted MultiTree

  • Original Research
  • Published in: Journal of Ambient Intelligence and Humanized Computing

Abstract

Healthcare organizations are increasingly engaged in digitizing the health insurance system. Despite its undeniable benefits, digitization increases the risk that providers exaggerate a claim or fabricate one entirely. Provider profiling helps identify false claims as outliers by measuring the performance of providers and the outcomes of healthcare, and has therefore become an active research topic in health insurance. However, most existing provider profiling approaches produce intermediate results because of class overlap. Another problem encountered in developing or validating an automated fraud detection model is the availability of labeled data: manual labeling of huge volumes of claims data by medical experts is not always feasible. Hence, it is essential to automate this part of the fraud detection process, which researchers developing healthcare fraud detection models have not focused on. One existing approach automates the labeling of health insurance claims by using the provider's unique identification number as the reference for one-to-one mapping with real-world fraudulent claims. However, that approach suffers from missing values in providers' identification numbers, which degrades the performance of healthcare fraud detection models. In this study, we propose a Weighted MultiTree approach to mitigate these problems of provider profiling and labeling. A MultiTree is a directed acyclic graph (DAG) in which any node reachable from another node is reachable along exactly one path, so reachability is unambiguous. Consequently, the proposed approach performs provider profiling without intermediate results and at a lower construction cost. Labeling claims using the set of unique provider details obtained from the MultiTree improved the detection accuracy of fraudulent claims.
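
As a rough illustration of the two ideas in the abstract, the sketch below checks the multitree property of a DAG (at most one directed path between any ordered pair of nodes) and labels claims by falling back to a set of provider details when the identification number is missing. It is not the authors' implementation: the weighting scheme is not described in the abstract and is omitted, and the function names and the record fields (`npi`, `last_name`, `first_name`, `state`) are hypothetical choices made for this example.

```python
from collections import defaultdict


def is_multitree(edges):
    """Return True if the directed graph is a multitree: it has no cycle
    and at most one directed path between any ordered pair of nodes."""
    children = defaultdict(set)  # a set ignores duplicate edges
    nodes = set()
    for parent, child in edges:
        children[parent].add(child)
        nodes.update((parent, child))

    for start in nodes:
        seen = set()
        stack = [start]
        while stack:
            node = stack.pop()
            for nxt in children[node]:
                # Reaching the start again means a cycle; reaching any
                # node a second time means two distinct paths to it.
                if nxt == start or nxt in seen:
                    return False
                seen.add(nxt)
                stack.append(nxt)
    return True


def _details(record):
    """Hypothetical 'details set' for a provider: normalized name and state."""
    return tuple(
        str(record.get(field, "")).strip().lower()
        for field in ("last_name", "first_name", "state")
    )


def label_claims(claims, excluded_providers):
    """Label a claim True (fraudulent) if its provider matches an excluded
    provider by identifier, or by the details set when the identifier is
    missing on either side. Field names are assumptions for illustration."""
    excluded_ids = {p["npi"] for p in excluded_providers if p.get("npi")}
    excluded_details = {_details(p) for p in excluded_providers}
    labels = []
    for claim in claims:
        npi = claim.get("npi")
        by_id = bool(npi) and npi in excluded_ids
        by_details = _details(claim) in excluded_details
        labels.append(by_id or by_details)
    return labels
```

A node with two parents (`a -> c`, `b -> c`) is still a multitree, whereas a diamond (`a -> b -> d` and `a -> c -> d`) is not, because `d` is reachable from `a` along two paths. In the labeling function, an excluded provider whose identifier is blank can still be matched through its name and state, which is the situation the abstract describes for identifier-only mapping.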


Availability of data and material

The data used for this paper are available in the public domain.

Code availability

Custom code was developed for this study.

Funding

This work is partially supported by the Scheme for Promotion of Academic and Research Collaboration (SPARC), sponsored by the Ministry of Human Resource Development, Government of India, under the project titled Digital Health Records Storage and Analysis for Healthcare Provisioning of Global Patients: An India-Australia Initiative (1406).

Author information

Corresponding author

Correspondence to G. R. Gangadharan.

Ethics declarations

Conflict of interest

On behalf of all authors, the corresponding author states that there is no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Settipalli, L., Gangadharan, G.R. Provider profiling and labeling of fraudulent health insurance claims using Weighted MultiTree. J Ambient Intell Human Comput 14, 3487–3508 (2023). https://doi.org/10.1007/s12652-021-03481-6

  • DOI: https://doi.org/10.1007/s12652-021-03481-6
