Using a Data Mining Approach to Detect Automobile Insurance Fraud

Salmi, Mabrouka; Atif, Dalia

doi:10.1007/978-3-030-96302-6_5

Part of the book series: Lecture Notes in Networks and Systems (LNNS, volume 417)

Included in the following conference series:

International Conference on Soft Computing and Pattern Recognition

Abstract

The number of policyholders involved in fraudulent activities has increased dramatically in recent years. Policyholders who intentionally mislead insurers by omitting facts when filing claims have caused massive losses for insurers, and a robust system for rigorously addressing insurance fraud is still lacking. In this paper, we present a data mining approach to detect fraudulent claims. After applying two sampling methods (SMOTE and ROSE) to correct the class imbalance, and experimenting with two feature subsets (the first composed of 23 predictors, the second of 5 predictors), we trained Random Forest and Logistic Regression models. For validation, we used a 75:25 split of a real dataset of automobile insurance claims to test the performance of the proposed models. The results show that the models built on the second feature subset perform slightly better, with a higher rate of correctly classified fraudulent claims (Random Forest recall = 95.24%); further, the difference between SMOTE and ROSE is not statistically significant. Finally, the study shows that Random Forest outperforms Logistic Regression.
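To make the two central ideas of the abstract concrete, the following is a minimal sketch of SMOTE-style oversampling and of the recall metric, written in plain Python. It is not the authors' implementation: the function names `smote` and `recall`, the parameter defaults, and the toy data are assumptions made for illustration, and the classifiers themselves (Random Forest, Logistic Regression) and ROSE are left out.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic minority rows by interpolating between a
    minority row and one of its k nearest minority neighbours."""
    if len(minority) < 2:
        raise ValueError("need at least two minority rows to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority rows by Euclidean distance to the base row.
        others = [row for row in minority if row is not base]
        others.sort(key=lambda row: math.dist(base, row))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position on the segment from base to neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


def recall(y_true, y_pred, positive=1):
    """Share of actual positives (here: fraudulent claims) that were flagged."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    return tp / (tp + fn) if tp + fn else 0.0


# Hypothetical toy data: 3 fraudulent rows against 9 legitimate ones.
fraud = [[1.0, 2.0], [1.5, 1.8], [1.2, 2.4]]
legit = [[float(i), float(i % 3)] for i in range(5, 14)]

# Oversample the fraud class until both classes have the same size.
balanced_fraud = fraud + smote(fraud, n_new=len(legit) - len(fraud), k=2)
```

Each synthetic row lies on a line segment between two real minority rows, which is what distinguishes SMOTE from simply duplicating fraudulent claims. In a real evaluation, oversampling would normally be applied to the training portion only, so that the held-out 25% still reflects the original class balance when recall is measured.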

Author information

Authors and Affiliations

National School of Statistics and Applied Economics, Tipaza, Algeria
Mabrouka Salmi
University of Tipaza, Tipaza, Algeria
Dalia Atif

Editor information

Editors and Affiliations

Scientific Network for Innovation and Research Excellence, Machine Intelligence Research Labs (MIR Labs), Auburn, WA, USA
Ajith Abraham
Department of Industrial Engineering and Computer Science, Stellenbosch University, Matieland, South Africa
Andries Engelbrecht
Department of Computer Science, Università degli Studi di Milano, Milan, Italy
Fabio Scotti
Scientific Network for Innovation and Research Excellence, Machine Intelligence Research Labs (MIR Labs), Auburn, WA, USA
Niketa Gandhi
University of Mumbai, Mumbai, Maharashtra, India
Pooja Manghirmalani Mishra
University of Calabria (Unical), Rende, Italy
Giancarlo Fortino
Department of Informatics, Vilnius University, Kaunas, Lithuania
Virgilijus Sakalauskas
Center for Smart Computing Continuum, Forschung Burgenland, Eisenstadt, Austria
Sabri Pllana

About this paper

Cite this paper

Salmi, M., Atif, D. (2022). Using a Data Mining Approach to Detect Automobile Insurance Fraud. In: Abraham, A., et al. Proceedings of the 13th International Conference on Soft Computing and Pattern Recognition (SoCPaR 2021). SoCPaR 2021. Lecture Notes in Networks and Systems, vol 417. Springer, Cham. https://doi.org/10.1007/978-3-030-96302-6_5

DOI: https://doi.org/10.1007/978-3-030-96302-6_5
Published: 22 February 2022
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-96301-9
Online ISBN: 978-3-030-96302-6
eBook Packages: Intelligent Technologies and Robotics (R0)
