
Using a Data Mining Approach to Detect Automobile Insurance Fraud

  • Conference paper
Proceedings of the 13th International Conference on Soft Computing and Pattern Recognition (SoCPaR 2021)

Part of the book series: Lecture Notes in Networks and Systems (LNNS, volume 417)


Abstract

The number of policyholders involved in fraudulent activities has increased dramatically in recent years. Intentionally misleading insurers by omitting facts when filing claims has resulted in massive losses for insurers. A robust system for rigorously addressing insurance fraud is still lacking. In this paper, we present a data mining approach to detect fraudulent claims. After applying two sampling methods (SMOTE and ROSE) to address the class imbalance and experimenting with two feature subsets (the first composed of 23 predictors, the second of 5 predictors), we employed Random Forest and Logistic Regression classifiers. For validation, we used a 75:25 split ratio on a real dataset of automobile insurance claims to test the performance of the proposed models. The results show that the models built on the second feature subset perform slightly better, with a higher rate of correctly classified fraudulent claims (Random Forest recall = 95.24%); further, the difference between SMOTE and ROSE is not statistically significant. Finally, the study demonstrates that Random Forest outperforms Logistic Regression.
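The pipeline summarized in the abstract can be illustrated with a minimal, standard-library Python sketch. It is not the authors' implementation. It shows three of the ingredients named above: a 75:25 split, SMOTE-style oversampling (synthetic minority points interpolated between a sample and one of its nearest minority neighbours), and recall on the fraud class. The function names, the toy data, the parameter `k`, and the choice to oversample only the training portion are all assumptions made for illustration; ROSE and the two classifiers are left out.

```python
import random

def split_75_25(rows, seed=0):
    """Shuffle and split rows into 75% / 25% parts (hypothetical helper)."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    cut = int(0.75 * len(shuffled))
    return shuffled[:cut], shuffled[cut:]

def smote(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points, each on the segment between a minority
    sample and one of its k nearest minority neighbours (the SMOTE idea)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority points by squared Euclidean distance.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(base, neighbour)))
    return synthetic

def recall(y_true, y_pred, positive=1):
    """Share of actual positives (fraudulent claims) that were flagged."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    return tp / (tp + fn) if (tp + fn) else 0.0

# Toy data invented for illustration: two numeric features per claim.
fraud = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (1.1, 1.3)]
legit = [(float(i % 5) + 3.0, float(i % 7) + 3.0) for i in range(20)]
rows = [(x, 1) for x in fraud] + [(x, 0) for x in legit]

train, test = split_75_25(rows)
train_fraud = [x for x, y in train if y == 1]
train_legit = [x for x, y in train if y == 0]

# Assumption: balance the training part only, so the held-out part stays untouched.
if len(train_fraud) >= 2:
    extra = smote(train_fraud, len(train_legit) - len(train_fraud))
    train += [(x, 1) for x in extra]
```

A classifier trained on `train` would then be scored with `recall` on the labels in `test`. Recall is the metric the abstract reports because it measures how many fraudulent claims are caught, which accuracy can hide when fraud is rare.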




Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG


Cite this paper

Salmi, M., Atif, D. (2022). Using a Data Mining Approach to Detect Automobile Insurance Fraud. In: Abraham, A., et al. Proceedings of the 13th International Conference on Soft Computing and Pattern Recognition (SoCPaR 2021). SoCPaR 2021. Lecture Notes in Networks and Systems, vol 417. Springer, Cham. https://doi.org/10.1007/978-3-030-96302-6_5
