Abstract
The number of policyholders involved in fraudulent activities has increased dramatically in recent years. Policyholders who intentionally mislead insurers by omitting facts when filing claims cause massive losses for insurers, yet a robust system for rigorously addressing insurance fraud is still lacking. In this paper, we present a data mining approach to detect fraudulent claims. After applying two resampling methods (SMOTE and ROSE) to address class imbalance, and experimenting with two feature subsets (the first composed of 23 predictors, the second of 5 predictors), we trained Random Forest and Logistic Regression models. For validation, we used a 75:25 split of a real dataset of automobile insurance claims to test the performance of the proposed models. The results show that the models built on the second feature subset perform slightly better, with a higher rate of correctly classified fraudulent claims (Random Forest recall = 95.24%). Furthermore, the difference between SMOTE and ROSE is not statistically significant. Finally, the study demonstrates that Random Forest outperforms Logistic Regression.
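To make the two central ingredients of the abstract concrete, the following is a minimal sketch, not the authors' implementation, of SMOTE-style oversampling (a synthetic minority point is interpolated between a minority sample and one of its nearest minority neighbours) and of recall, the rate of correctly classified fraudulent claims. The function names `smote_like` and `recall`, the toy feature tuples, and the parameter `k=2` are all assumptions made for illustration; the paper's dataset and tooling are not reproduced here.

```python
import random

def smote_like(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    point and one of its k nearest minority neighbours (the core SMOTE idea)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority points by squared Euclidean distance to base.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position on the segment from base to neighbour
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(base, neighbour)))
    return synthetic

def recall(y_true, y_pred, positive=1):
    """Share of truly positive (fraudulent) cases that were predicted positive."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    return tp / (tp + fn) if (tp + fn) else 0.0

# Hypothetical toy claims with two made-up numeric features each.
legit = [(float(i % 4), float(i // 4)) for i in range(12)]
fraud = [(8.0, 1.0), (9.0, 2.0), (8.5, 0.5), (9.5, 1.5)]

# Oversample the fraud class until both classes have the same size.
synthetic = smote_like(fraud, n_new=len(legit) - len(fraud))
balanced_fraud = fraud + synthetic
print(len(legit), len(balanced_fraud))

# Hypothetical labels: 1 = fraudulent, 0 = legitimate.
y_true = [1, 1, 1, 0, 0]
y_pred = [1, 1, 0, 0, 1]
print(round(recall(y_true, y_pred), 4))  # 2 of the 3 fraudulent claims are found
```

ROSE differs from this sketch in that it draws synthetic points from a smoothed density around existing samples rather than interpolating between neighbours. In practice an established implementation would normally be preferred over a hand-written one.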
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Salmi, M., Atif, D. (2022). Using a Data Mining Approach to Detect Automobile Insurance Fraud. In: Abraham, A., et al. Proceedings of the 13th International Conference on Soft Computing and Pattern Recognition (SoCPaR 2021). SoCPaR 2021. Lecture Notes in Networks and Systems, vol 417. Springer, Cham. https://doi.org/10.1007/978-3-030-96302-6_5
DOI: https://doi.org/10.1007/978-3-030-96302-6_5
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-96301-9
Online ISBN: 978-3-030-96302-6
eBook Packages: Intelligent Technologies and Robotics, Intelligent Technologies and Robotics (R0)