Design and development of big data-based model for detecting fraud in healthcare insurance industry

  • Application of soft computing
Soft Computing

Abstract

Healthcare services have expanded widely in both range and quality. One of the important issues affecting the effective use of public funds is the detection of health insurance fraud. Previous fraud detection techniques focus on the characteristics of a single visit rather than on patterns across many patient visits. As a result, high false positive rates and poor profile construction limit their detection performance. This paper introduces a novel and intelligent Provider Fraud_Anomaly Detection System (PF_ADS) that combines big data and deep learning approaches for the healthcare insurance industry. The proposed framework improves the preprocessing and classification phases to detect provider fraud at an early stage. Initially, the collected datasets are preprocessed using a Relative Risk-based MapReduce framework that builds an organized set of relationships between diseases, patients, and claim variables. The classification phase is improved using a proposed Recurrent Neural Network (RNN), which selects the significant attributes through hyperparameter optimization. A key strength of RNNs is their ability to recall, which relates the past and present states of the network. Therefore, the network's state-prediction ability and the tuning of its parameters are studied using an improved Decisional Score-based Bayesian Optimization (DS_BO). Finally, the best attributes, together with the selected hyperparameters, are fed into the input layer of the RNN to classify anomalies on the provider's side. The proposed PF_ADS framework is evaluated and validated on public repositories. The experimental results indicate that the proposed framework outperforms the other methods in terms of accuracy (88.09%), precision (14.15%), recall (32.80%), and computational time (92.30 s).
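
The abstract does not spell out how the Relative Risk-based MapReduce preprocessing is defined, so the following is only a minimal sketch under stated assumptions: it uses the standard epidemiological relative risk, RR = P(flagged | diagnosis) / P(flagged | other diagnoses), and emulates the map and reduce steps in plain Python rather than on a real cluster. The record layout, the function names, and the toy claims are all hypothetical and are not taken from the paper.

```python
from collections import defaultdict
from functools import reduce

# Hypothetical claim records: (provider_id, diagnosis_code, flagged_as_suspicious)
claims = [
    ("P1", "D10", True),
    ("P1", "D10", False),
    ("P2", "D10", True),
    ("P2", "D20", False),
    ("P3", "D20", False),
    ("P3", "D20", True),
    ("P3", "D10", True),
    ("P4", "D20", False),
]

def map_claim(record):
    """Map step: emit (diagnosis, (flagged_count, total_count)) for one claim."""
    _provider, diagnosis, flagged = record
    return diagnosis, (1 if flagged else 0, 1)

def reduce_counts(acc, pair):
    """Reduce step: sum flagged and total counts per diagnosis key."""
    key, (flagged, total) = pair
    prev_flagged, prev_total = acc[key]
    acc[key] = (prev_flagged + flagged, prev_total + total)
    return acc

def relative_risk(counts, exposure):
    """Standard relative risk (an assumption, not the paper's stated formula)."""
    exp_flagged, exp_total = counts.get(exposure, (0, 0))
    other_flagged = sum(f for k, (f, _) in counts.items() if k != exposure)
    other_total = sum(t for k, (_, t) in counts.items() if k != exposure)
    if exp_total == 0 or other_total == 0 or other_flagged == 0:
        return float("nan")  # undefined when a group is empty or has no flagged claims
    return (exp_flagged / exp_total) / (other_flagged / other_total)

# Run the emulated map and reduce phases, then score one diagnosis code.
counts = reduce(reduce_counts, map(map_claim, claims), defaultdict(lambda: (0, 0)))
rr_d10 = relative_risk(counts, "D10")
print(dict(counts), rr_d10)
```

In this toy data, claims with diagnosis D10 are flagged three times as often as the remaining claims, so a score of this kind could serve as one preprocessed feature relating diseases to claim variables before classification.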

Data availability

Enquiries about data availability should be directed to the authors.

Acknowledgements

The authors are grateful to all who supported us in producing this article and to those who contributed to this study.

Funding

The authors received no specific funding for this study.

Author information

Corresponding author

Correspondence to S. P. Angelin Claret.

Ethics declarations

Conflict of interest

The authors declare that they have no conflicts of interest to report regarding the present study.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Mary, A.J., Claret, S.P.A. Design and development of big data-based model for detecting fraud in healthcare insurance industry. Soft Comput 27, 8357–8369 (2023). https://doi.org/10.1007/s00500-023-08296-5

  • DOI: https://doi.org/10.1007/s00500-023-08296-5
