Abstract
Healthcare services continue to expand in both scope and quality. One of the important issues affecting the effective use of public funds is the detection of health insurance fraud. Existing fraud detection techniques focus on the characteristics of a single visit rather than on patterns across many patient visits, and their detection performance suffers from high false positive rates and poor profile construction. This paper introduces a novel and intelligent Provider Fraud_Anomaly Detection System (PF_ADS) that combines big data and deep learning approaches for the healthcare insurance industry. The proposed framework improves the preprocessing and classification phases in order to detect provider fraud at an early stage. Initially, the collected datasets are preprocessed using a Relative Risk-based MapReduce framework that builds an organized set of relationships between diseases, patients, and claiming variables. The classification phase is improved using a proposed Recurrent Neural Network (RNN), which selects the significant attributes through hyperparameter optimization. A key strength of RNNs is their ability to retain information about past and present network states. Therefore, the network's state-prediction ability and the tuning of its parameters are studied using an improved Decisional Score-based Bayesian Optimization (DS_BO). Finally, the best attributes, together with the selected hyperparameters, are fed into the input layer of the RNN to classify anomalies from the provider's end. The proposed PF_ADS framework is evaluated and validated on public repositories. The experimental results indicate that the proposed framework outperforms the compared methods, with an accuracy of 88.09%, precision of 14.15%, recall of 32.80%, and a computational time of 92.30 s.
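The abstract does not spell out how the Relative Risk-based MapReduce preprocessing is computed. As a rough illustration only, the sketch below applies the standard epidemiological relative risk, RR = [a/(a+b)] / [c/(c+d)], to a toy set of claims using a map step and a reduce step in plain Python. The claim records, field layout, and function names are all hypothetical, and this is not the authors' implementation.

```python
from collections import Counter
from functools import reduce

# Hypothetical toy claims: (provider_id, diagnosis_code, flagged_as_fraud).
# These records are invented for illustration and come from no real dataset.
claims = [
    ("P1", "D10", True),
    ("P1", "D10", True),
    ("P2", "D10", False),
    ("P2", "D20", False),
    ("P3", "D20", True),
    ("P3", "D20", False),
    ("P4", "D20", False),
    ("P4", "D10", True),
]


def map_claim(claim):
    """Map step: emit a count of 1 keyed by (diagnosis_code, flagged)."""
    _, diagnosis, flagged = claim
    return Counter({(diagnosis, flagged): 1})


def reduce_counts(left, right):
    """Reduce step: merge two partial count tables."""
    return left + right


def relative_risk(counts, diagnosis):
    """Standard relative risk of a claim being flagged, given the diagnosis.

    a, b: flagged / unflagged claims with the diagnosis
    c, d: flagged / unflagged claims without it
    Returns None when the ratio is undefined.
    """
    a = sum(n for (code, flag), n in counts.items() if code == diagnosis and flag)
    b = sum(n for (code, flag), n in counts.items() if code == diagnosis and not flag)
    c = sum(n for (code, flag), n in counts.items() if code != diagnosis and flag)
    d = sum(n for (code, flag), n in counts.items() if code != diagnosis and not flag)
    if a + b == 0 or c + d == 0 or c == 0:
        return None
    return (a / (a + b)) / (c / (c + d))


# Run the map and reduce steps sequentially; a real MapReduce job would
# distribute them across workers.
counts = reduce(reduce_counts, map(map_claim, claims), Counter())
rr_d10 = relative_risk(counts, "D10")
print(rr_d10)
```

A relative risk above 1 means that claims carrying the diagnosis are flagged more often than claims without it. In a preprocessing stage, a ratio of this kind could serve as a feature linking diseases to claiming variables, although the exact formulation used in PF_ADS is not given in this text.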
Data availability
Enquiries about data availability should be directed to the authors.
Acknowledgements
The authors are grateful to all who supported us in producing this article and to those who contributed to this study.
Funding
The authors received no specific funding for this study.
Ethics declarations
Conflict of interest
The authors declare that they have no conflicts of interest to report regarding the present study.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Mary, A.J., Claret, S.P.A. Design and development of big data-based model for detecting fraud in healthcare insurance industry. Soft Comput 27, 8357–8369 (2023). https://doi.org/10.1007/s00500-023-08296-5
DOI: https://doi.org/10.1007/s00500-023-08296-5