Modeling Insurance Fraud Detection Using Imbalanced Data Classification

  • Amira Kamil Ibrahim Hassan
  • Ajith Abraham
Conference paper
Part of the Advances in Intelligent Systems and Computing book series (AISC, volume 419)


This paper proposes an innovative insurance fraud detection method designed to deal with imbalanced data distributions. The idea is to build insurance fraud detection models using a decision tree (DT), a support vector machine (SVM), and an artificial neural network (ANN) on data partitions derived by under-sampling the majority class (with and without replacement) and merging each sample with the minority class. Ten-fold cross-validation is used for testing throughout the paper. The originality of the method lies in evaluating several partitioning under-sampling approaches and choosing the best one. Results on a publicly available automobile insurance fraud detection data set show that DT performs slightly better than the other algorithms, so the DT model was used to compare the different partitioning under-sampling approaches. Empirical results indicate that the proposed model gave better results.
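The partitioning under-sampling idea described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `undersample_partitions`, the equal partition size, the toy data, and the reading of "with replacement" as independent draws per partition are all hypothetical choices made here for clarity.

```python
import random


def undersample_partitions(majority, minority, n_partitions,
                           with_replacement=False, seed=0):
    """Return n_partitions training sets, each a majority-class subset
    merged with ALL minority-class records (illustrative sketch only)."""
    if n_partitions < 1 or n_partitions > len(majority):
        raise ValueError("n_partitions must be between 1 and len(majority)")
    rng = random.Random(seed)
    size = len(majority) // n_partitions  # majority records per partition
    partitions = []
    if with_replacement:
        # Assumption: each partition is drawn independently, so a majority
        # record may appear in several partitions (or twice in one).
        for _ in range(n_partitions):
            subset = rng.choices(majority, k=size)
            partitions.append(subset + list(minority))
    else:
        # Shuffle once and cut into disjoint chunks: no record is reused.
        shuffled = list(majority)
        rng.shuffle(shuffled)
        for i in range(n_partitions):
            subset = shuffled[i * size:(i + 1) * size]
            partitions.append(subset + list(minority))
    return partitions


# Toy data (hypothetical): 90 legitimate claims (label 0), 10 fraudulent (label 1).
legit = [(i, 0) for i in range(90)]
fraud = [(i, 1) for i in range(90, 100)]

parts = undersample_partitions(legit, fraud, n_partitions=3)
parts_wr = undersample_partitions(legit, fraud, n_partitions=3,
                                  with_replacement=True)
```

In this toy setting each partition holds 30 majority records and all 10 minority records, so the class ratio moves from 9:1 to 3:1. A classifier such as a decision tree would then be trained and cross-validated on each partition, and the partitions compared.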


Keywords: Insurance fraud detection; Imbalanced data; Decision tree; Support vector machine and artificial neural network



Copyright information

© Springer International Publishing Switzerland 2016

Authors and Affiliations

  1. Department of Computer Science, Sudan University of Science and Technology, Khartoum, Sudan
  2. Machine Intelligence Research Labs (MIR Labs), Auburn, USA
  3. IT4Innovations, VSB - Technical University of Ostrava, Ostrava, Czech Republic
