Network analytics for insurance fraud detection: a critical case study

  • Case Study
  • European Actuarial Journal

Abstract

Interest in fraud detection methods has been increasing, driven by new regulations and by the financial losses linked to fraud. One of the state-of-the-art methods to fight fraud is network analytics, which leverages the interactions between different entities to detect complex patterns that are indicative of fraud. However, network analytics has only recently been applied to fraud detection in the actuarial literature. Although it shows much potential, many network methods have not yet been applied in this context. This paper extends the literature in two main ways. First, we review and apply multiple methods in the context of insurance fraud and compare their predictive power. Second, we analyse the added value of network features over intrinsic features for detecting fraud. We conclude that (1) complex methods do not necessarily outperform basic network features, and that (2) network analytics helps to detect fraud patterns different from those found by models trained on claim-specific features alone.

Data Availability

The healthcare provider data is available on Kaggle (https://www.kaggle.com/datasets/rohitrox/healthcare-provider-fraud-detection-analysis). The motor insurance data set is proprietary.

Notes

  1. https://www.kaggle.com/datasets/rohitrox/healthcare-provider-fraud-detection-analysis.

  2. https://github.com/B-Deprez/NetworkFraud_BiRank_M2V_SAGE.

  3. Among an insurer’s (independent) companies over state lines.

  4. Among subsidiaries of an insurance company.

  5. Among different agents involved, i.e., hospitals, patients and pharmacies.

  6. Since claim and fraud data are highly sensitive, we only give a rough approximation of the numbers, which can either be rounded up or down. The total number is the sum of these semi-random numbers.

  7. https://www.kaggle.com/datasets/rohitrox/healthcare-provider-fraud-detection-analysis.

Funding

This work was supported by the Research Foundation—Flanders (FWO research project 1SHEN24N).

Author information

Corresponding author

Correspondence to Bruno Deprez.

Ethics declarations

Conflict of interest

No potential conflict of interest was reported by the author(s).

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Deprez, B., Vandervorst, F., Verbeke, W. et al. Network analytics for insurance fraud detection: a critical case study. Eur. Actuar. J. (2024). https://doi.org/10.1007/s13385-024-00384-6

  • DOI: https://doi.org/10.1007/s13385-024-00384-6
