Streaming active learning strategies for real-life credit card fraud detection: assessment and visualization

International Journal of Data Science and Analytics

A Correction to this article was published on 02 May 2018

This article has been updated

Abstract

Credit card fraud detection is a very challenging problem because of the specific nature of transaction data and the labeling process. The transaction data are peculiar because they are obtained in a streaming fashion, and they are strongly imbalanced and prone to non-stationarity. The labeling is the outcome of an active learning process, as every day human investigators contact only a small number of cardholders (associated with the riskiest transactions) and obtain the class (fraud or genuine) of the related transactions. An adequate selection of the set of cardholders is therefore crucial for an efficient fraud detection process. In this paper, we present a number of active learning strategies and we investigate their fraud detection accuracies. We compare different criteria (supervised, semi-supervised and unsupervised) to query unlabeled transactions. Finally, we highlight the existence of an exploitation/exploration trade-off for active learning in the context of fraud detection, which has so far been overlooked in the literature.
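The exploitation/exploration trade-off mentioned above can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `select_queries`, the `explore_fraction` parameter, and the use of uniform random sampling for exploration are all assumptions made for illustration, and the sketch ranks individual transactions rather than cardholders. Given a risk score per unlabeled transaction and a daily labeling budget, it spends part of the budget on the highest-risk transactions (exploitation) and the remainder on randomly chosen ones (exploration).

```python
import random


def select_queries(scores, budget, explore_fraction=0.0, rng=None):
    """Return indices of transactions to send to investigators (hypothetical helper).

    scores: estimated fraud risk for each unlabeled transaction.
    budget: number of transactions that can be labeled today.
    explore_fraction: share of the budget spent on random picks (assumed knob).
    """
    rng = rng or random.Random(0)
    budget = min(budget, len(scores))
    n_explore = int(round(budget * explore_fraction))
    n_exploit = budget - n_explore

    # Exploitation: rank transactions by decreasing risk and take the top ones.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    exploit = ranked[:n_exploit]

    # Exploration: sample uniformly, without replacement, from the remainder.
    explore = rng.sample(ranked[n_exploit:], n_explore)
    return exploit + explore


scores = [0.1, 0.9, 0.3, 0.8, 0.05, 0.6]
pure_exploitation = select_queries(scores, budget=2)
mixed = select_queries(scores, budget=4, explore_fraction=0.5)
```

With `explore_fraction=0.0` the selection reduces to querying only the riskiest transactions; raising it trades some immediate detections for labels on transactions the current model would not have flagged.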

Change history

  • 02 May 2018

    A line was misplaced in Table 1. The correct Table 1 is given below.

Notes

  1. Though some papers on fraud detection present datasets with even lower rates (0.01% in [15], 0.005% in [2], 0.02% in [51] and 0.004% in [36]), our dataset is in line with other recent works on fraud detection ([22, 47] and [39] have a class imbalance rate of 0.8, 0.5 and 0.4%, respectively).

  2. The use of two different learning strategies is justified by the need to assess the robustness of the AL strategies with respect to different learning methods and different detection tasks (transaction based and card based).

  3. https://github.com/fabriziocarcillo/StreamingActiveLearningStrategies.

  4. We made the Streaming Active Learning Strategies repository available at http://github.com/fabriziocarcillo/.

References

  1. Aggarwal, C.C.: Outlier analysis. In: Data Mining, pp. 237–263. Springer, New York (2015)

  2. Bhattacharyya, S., Jha, S., Tharakunnel, K., Westland, J.C.: Data mining for credit card fraud: a comparative study. Decis. Support Syst. 50(3), 602–613 (2011)

  3. Bolton, R.J., Hand, D.J., et al.: Unsupervised profiling methods for fraud detection. In: Credit Scoring and Credit Control, vol. VII, pp. 235–255 (2001)

  4. Breunig, M.M., Kriegel, H.P., Ng, R.T., Sander, J.: Lof: identifying density-based local outliers. In: ACM Sigmod Record, vol. 29, pp. 93–104. ACM (2000)

  5. Carcillo, F., Dal Pozzolo, A., Le Borgne, Y.A., Caelen, O., Mazzer, Y., Bontempi, G.: Scarff: a scalable framework for streaming credit card fraud detection with spark. Inf. Fus. 41, 182–194 (2018)

  6. Carcillo, F., Le Borgne, Y.A., Caelen, O., Bontempi, G.: An assessment of streaming active learning strategies for real-life credit card fraud detection. In: DSAA-The 4th IEEE International Conference on Data Science and Advanced Analytics, vol. 7, pp. 783–790 (2017)

  7. Chandola, V., Banerjee, A., Kumar, V.: Anomaly detection: a survey. ACM Comput. Surv. (CSUR) 41(3), 15 (2009)

  8. Chapelle, O., Schölkopf, B., Zien, A.: Semi-supervised learning (Chapelle, O. et al., eds.; 2006) [book reviews]. IEEE Trans. Neural Netw. 20(3), 542 (2009)

  9. Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: Smote: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002)

  10. Chen, C., Liaw, A., Breiman, L.: Using Random Forest to Learn Imbalanced Data, vol. 110. University of California, Berkeley (2004)

  11. Cohn, D., Atlas, L., Ladner, R.: Improving generalization with active learning. Mach. Learn. 15(2), 201–221 (1994)

  12. Dal Pozzolo, A., Caelen, O., Le Borgne, Y.A., Waterschoot, S., Bontempi, G.: Learned lessons in credit card fraud detection from a practitioner perspective. Expert Syst. Appl. 41(10), 4915–4928 (2014)

  13. Dal Pozzolo, A., Boracchi, G., Caelen, O., Alippi, C., Bontempi, G.: Credit card fraud detection: a realistic modeling and a novel learning strategy. IEEE Trans. Neural Netw. Learn. Syst. PP(99), 1–14 (2017). https://doi.org/10.1109/TNNLS.2017.2736643

  14. Dasgupta, S.: Two faces of active learning. Theoret. Comput. Sci. 412(19), 1767–1781 (2011)

  15. Dorronsoro, J.R., Ginel, F., Sánchez, C., Cruz, C.: Neural fraud detection in credit card operations. IEEE Trans. Neural Netw. 8(4), 827–834 (1997)

  16. Drews, P., Núñez, P., Rocha, R.P., Campos, M., Dias, J.: Novelty detection and segmentation based on gaussian mixture models: a case study in 3d robotic laser mapping. Robot. Auton. Syst. 61(12), 1696–1709 (2013)

  17. Ertekin, S., Huang, J., Bottou, L., Giles, L.: Learning on the border: active learning in imbalanced data classification. In: Proceedings of the Sixteenth ACM Conference on Information and Knowledge Management, pp. 127–136. ACM (2007)

  18. Fan, W., Huang, Y.A., Wang, H., Yu, P.S.: Active mining of data streams. In: Proceedings of the 2004 SIAM International Conference on Data Mining, pp. 457–461. SIAM (2004)

  19. Ilonen, J., Paalanen, P., Kamarainen, J.K., Kalviainen, H.: Gaussian mixture pdf in one-class classification: computing and utilizing confidence values. In: 18th International Conference on Pattern Recognition, ICPR 2006, vol. 2, pp. 577–580. IEEE (2006)

  20. Jacobusse, G., Veenman, C.: On selection bias with imbalanced classes. In: International Conference on Discovery Science, pp. 325–340. Springer, New York (2016)

  21. Japkowicz, N., Stephen, S.: The class imbalance problem: a systematic study. Intell. Data Anal. 6(5), 429–449 (2002)

  22. Jurgovsky, J., Granitzer, M., Ziegler, K., Calabretto, S., Portier, P.E., He-Guelton, L., Caelen, O.: Sequence classification for credit-card fraud detection. Expert Syst. Appl. (2018)

  23. Kriegel, H.P., Kröger, P., Schubert, E., Zimek, A.: Loop: local outlier probabilities. In: Proceedings of the 18th ACM Conference on Information and Knowledge Management, pp. 1649–1652. ACM, New York, NY, USA (2009). https://doi.org/10.1145/1645953.1646195

  24. Lewis, D.D., Catlett, J.: Heterogeneous uncertainty sampling for supervised learning. In: Proceedings of the Eleventh International Conference on Machine Learning, pp. 148–156. Morgan Kaufmann (1994)

  25. Lewis, D.D., Gale, W.A.: A sequential algorithm for training text classifiers. In: Proceedings of the 17th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 3–12. Springer, New York (1994)

  26. Li, L., Hansman, R.J., Palacios, R., Welsch, R.: Anomaly detection via a Gaussian mixture model for flight operation and safety monitoring. Transp. Res. Part C Emerg. Technol. 64, 45–57 (2016)

  27. Liu, F.T., Ting, K.M., Zhou, Z.H.: Isolation forest. In: Eighth IEEE International Conference on Data Mining. ICDM’08, pp. 413–422. IEEE (2008)

  28. Palau, C., Arregui, F., Carlos, M.: Burst detection in water networks using principal component analysis. J. Water Resour. Plan. Manag. 138(1), 47–54 (2011)

  29. Pang, G., Cao, L., Chen, L., Liu, H.: Learning homophily couplings from non-iid data for joint feature selection and noise-resilient outlier detection. In: Proceedings of the 26th International Joint Conference on Artificial Intelligence, pp. 2585–2591. AAAI Press (2017)

  30. Pang, G., Cao, L., Chen, L., Liu, H.: Unsupervised feature selection for outlier detection by modelling hierarchical value-feature couplings. In: 2016 IEEE 16th International Conference on Data Mining (ICDM), pp. 410–419. IEEE (2016)

  31. Pichara, K., Soto, A., Araneda, A.: Detection of anomalies in large datasets using an active learning scheme based on dirichlet distributions. In: Ibero-American Conference on Artificial Intelligence, pp. 163–172. Springer (2008)

  32. Pimentel, M.A., Clifton, D.A., Clifton, L., Tarassenko, L.: A review of novelty detection. Signal Process. 99, 215–249 (2014)

  33. Pinto da Costa, J.F., Alonso, H., Roque, L.: A weighted principal component analysis and its application to gene expression data. IEEE/ACM Trans. Comput. Biol. Bioinform. 8(1), 246–252 (2011)

  34. Ren, D., Wang, B., Perrizo, W.: Rdf: a density-based outlier detection method using vertical data representation. In: Fourth IEEE International Conference on Data Mining, ICDM’04, pp. 503–506. IEEE (2004)

  35. Rokach, L.: Decision forest: twenty years of research. Inf. Fus. 27, 111–125 (2016)

  36. Sahin, Y., Bulkan, S., Duman, E.: A cost-sensitive decision tree approach for fraud detection. Expert Syst. Appl. 40(15), 5916–5923 (2013)

  37. Schohn, G., Cohn, D.: Less is more: active learning with support vector machines. In: ICML, pp. 839–846. Citeseer (2000)

  38. Schölkopf, B., Williamson, R.C., Smola, A.J., Shawe-Taylor, J., Platt, J.C.: Support vector method for novelty detection. In: Advances in Neural Information Processing Systems, pp. 582–588 (2000)

  39. Seeja, KR., Zareapoor, M.: Fraudminer: a novel credit card fraud detection model based on frequent itemset mining. Sci. World J. 2014, 1–10 (2014)

  40. Sethi, N., Gera, A.: A revived survey of various credit card fraud detection techniques. Int. J. Comput. Sci. Mobile Comput. 3(4), 780–791 (2014)

  41. Settles, B., Craven, M., Ray, S.: Multiple-instance active learning. In: Advances in Neural Information Processing Systems, pp. 1289–1296 (2008)

  42. Settles, B.: Active learning literature survey. Univ. Wis. Madison 52(55–66), 11 (2010)

  43. Shimpi, P.R., Kadroli, V.: Survey on credit card fraud detection techniques. Int. J. Eng. Comput. Sci. 4(11), 15010–15015 (2015)

  44. Shyu, M.L., Chen, S.C., Sarinnapakorn, K., Chang, L.: A novel anomaly detection scheme based on principal component classifier. Technical report, Miami Univ Coral Gables FL Dept of Electrical and Computer Engineering (2003)

  45. Srivastava, A., Kundu, A., Sural, S., Majumdar, A.: Credit card fraud detection using hidden Markov model. IEEE Trans. Dependable Secur. Comput. 5(1), 37–48 (2008)

  46. Tang, J., Chen, Z., Fu, A., Cheung, D.: Enhancing effectiveness of outlier detections for low density patterns. Adv. Knowl. Discov. Data Min. 535–548 (2002)

  47. Van Vlasselaer, V., Bravo, C., Caelen, O., Eliassi-Rad, T., Akoglu, L., Snoeck, M., Baesens, B.: Apate: a novel approach for automated credit card transaction fraud detection using network-based extensions. Decis. Support Syst. 75, 38–48 (2015)

  48. Van Vlasselaer, V., Eliassi-Rad, T., Akoglu, L., Snoeck, M., Baesens, B.: Afraid: fraud detection via active inference in time-evolving social networks. In: IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM), pp. 659–666. IEEE (2015)

  49. Vijayanarasimhan, S., Jain, P., Grauman, K.: Far-sighted active learning on a budget for image and video recognition. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 3035–3042. IEEE (2010)

  50. Wang, W., Guan, X., Zhang, X.: A novel intrusion detection method based on principle component analysis in computer security. Adv. Neural Netw. ISNN 2004, 88–89 (2004)

  51. Wei, W., Li, J., Cao, L., Ou, Y., Chen, J.: Effective detection of sophisticated online banking fraud on extremely imbalanced data. World Wide Web 16(4), 449–475 (2013)

  52. Xie, J., Xiong, T.: Stochastic semi-supervised learning on partially labeled imbalanced data. In: Active Learning and Experimental Design workshop In conjunction with AISTATS 2010, pp. 85–98 (2011)

  53. Zareapoor, M., Shamsolmoali, P.: Application of credit card fraud detection: Based on bagging ensemble classifier. Proc. Comput. Sci. 48, 679–685 (2015)

  54. Zhang, Y., Bingham, C., Martínez-García, M., Cox, D.: Detection of emerging faults on industrial gas turbines using extended Gaussian mixture models. Int. J. Rotating Mach. 2017, 1–9 (2017)

  55. Zhang, K., Hutter, M., Jin, H.: A new local distance-based outlier detection approach for scattered real-world data. Adv. Knowl. Discov. Data Min. 5476, 813–822 (2009)

  56. Zhu, J., Hovy, E.H.: Active learning for word sense disambiguation with methods for addressing the class imbalance problem. EMNLP-CoNLL 7, 783–790 (2007)

  57. Žliobaite, I., Bifet, A., Pfahringer, B., Holmes, G.: Active learning with evolving streaming data. In: Joint European Conference on Machine Learning and Knowledge Discovery in Databases, pp. 597–612. Springer (2011)

Acknowledgements

The authors FC, YLB and GB acknowledge the funding of the Brufence project (Scalable machine learning for automating defense system) supported by INNOVIRIS (Brussels Institute for the encouragement of scientific research and innovation).

Funding

Computational resources have been provided by the Consortium des Équipements de Calcul Intensif (CÉCI), funded by the Fonds de la Recherche Scientifique de Belgique (F.R.S.-FNRS) under Grant No. 2.5020.11.

Author information

Corresponding author

Correspondence to Fabrizio Carcillo.

Ethics declarations

Conflicts of interest

On behalf of all authors, the corresponding author states that there is no conflict of interest.

Additional information

The original version of this article was revised: A line was misplaced in Table 1.

This paper is an extended version of the DSAA’2017 Application Track paper titled: “An Assessment of Streaming Active Learning Strategies for Real-Life Credit Card Fraud Detection” [6].

About this article

Cite this article

Carcillo, F., Le Borgne, YA., Caelen, O. et al. Streaming active learning strategies for real-life credit card fraud detection: assessment and visualization. Int J Data Sci Anal 5, 285–300 (2018). https://doi.org/10.1007/s41060-018-0116-z

  • DOI: https://doi.org/10.1007/s41060-018-0116-z
