Streaming active learning strategies for real-life credit card fraud detection: assessment and visualization

Carcillo, Fabrizio; Le Borgne, Yann-Aël; Caelen, Olivier; Bontempi, Gianluca

doi:10.1007/s41060-018-0116-z

Applications
Published: 04 April 2018

International Journal of Data Science and Analytics, volume 5, pages 285–300 (2018)

A Correction to this article was published on 02 May 2018

This article has been updated

Abstract

Credit card fraud detection is a very challenging problem because of the specific nature of transaction data and the labeling process. The transaction data are peculiar because they are obtained in a streaming fashion, and they are strongly imbalanced and prone to non-stationarity. The labeling is the outcome of an active learning process, as every day human investigators contact only a small number of cardholders (associated with the riskiest transactions) and obtain the class (fraud or genuine) of the related transactions. An adequate selection of the set of cardholders is therefore crucial for an efficient fraud detection process. In this paper, we present a number of active learning strategies and we investigate their fraud detection accuracies. We compare different criteria (supervised, semi-supervised and unsupervised) to query unlabeled transactions. Finally, we highlight the existence of an exploitation/exploration trade-off for active learning in the context of fraud detection, which has so far been overlooked in the literature.
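The exploitation/exploration trade-off mentioned above can be illustrated with a minimal sketch. It assumes a daily investigation budget and a risk score per unlabeled transaction produced by some classifier. The function name `select_queries`, its parameters, and the rule of splitting the budget between top-scored and randomly drawn transactions are illustrative assumptions, not the strategies evaluated in the paper.

```python
import random
from typing import List, Optional, Sequence


def select_queries(scores: Sequence[float], budget: int,
                   explore_fraction: float = 0.0,
                   rng: Optional[random.Random] = None) -> List[int]:
    """Return indices of the transactions to send to investigators.

    scores: estimated fraud risk per unlabeled transaction (higher = riskier).
    budget: number of transactions investigators can check in one day.
    explore_fraction: share of the budget (0 to 1) spent on random exploration.
    All names here are hypothetical and chosen for illustration only.
    """
    rng = rng or random.Random(0)
    budget = min(budget, len(scores))
    n_explore = int(round(budget * explore_fraction))
    n_exploit = budget - n_explore

    # Exploitation: query the transactions the current model deems riskiest.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    exploit = ranked[:n_exploit]

    # Exploration: query a random sample drawn from the remaining transactions.
    explore = rng.sample(ranked[n_exploit:], n_explore)
    return exploit + explore


scores = [0.1, 0.9, 0.3, 0.8, 0.05, 0.6]
pure_exploitation = select_queries(scores, budget=2)
mixed = select_queries(scores, budget=4, explore_fraction=0.5,
                       rng=random.Random(1))
```

With `explore_fraction=0.0` the whole budget goes to the highest-risk transactions. Raising it diverts part of the budget to transactions the model currently considers safe, which may reveal fraud patterns the model has not yet learned.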

Change history

02 May 2018
A line was misplaced in Table 1. The correct Table 1 is given below.

Notes

Though some papers on fraud detection present datasets with even lower fraud rates (0.01% in [15], 0.005% in [2], 0.02% in [51] and 0.004% in [36]), our dataset is in line with other recent works on fraud detection ([22], [47] and [39] have a class imbalance rate of 0.8%, 0.5% and 0.4%, respectively).
The use of two different learning strategies is justified by the need to assess the robustness of the AL strategies with respect to different learning methods and different detection tasks (transaction based and card based).
https://github.com/fabriziocarcillo/StreamingActiveLearningStrategies.
We made the Streaming Active Learning Strategies repository available at http://github.com/fabriziocarcillo/.

References

Aggarwal, C.C.: Outlier analysis. In: Data Mining, pp. 237–263. Springer, New York (2015)
Bhattacharyya, S., Jha, S., Tharakunnel, K., Westland, J.C.: Data mining for credit card fraud: a comparative study. Decis. Support Syst. 50(3), 602–613 (2011)
Bolton, R.J., Hand, D.J., et al.: Unsupervised profiling methods for fraud detection. In: Credit Scoring and Credit Control, vol. VII, pp. 235–255 (2001)
Breunig, M.M., Kriegel, H.P., Ng, R.T., Sander, J.: Lof: identifying density-based local outliers. In: ACM Sigmod Record, vol. 29, pp. 93–104. ACM (2000)
Carcillo, F., Dal Pozzolo, A., Le Borgne, Y.A., Caelen, O., Mazzer, Y., Bontempi, G.: Scarff: a scalable framework for streaming credit card fraud detection with spark. Inf. Fus. 41, 182–194 (2018)
Carcillo, F., Le Borgne, Y.A., Caelen, O., Bontempi, G.: An assessment of streaming active learning strategies for real-life credit card fraud detection. In: DSAA-The 4th IEEE International Conference on Data Science and Advanced Analytics, vol. 7, pp. 783–790 (2017)
Chandola, V., Banerjee, A., Kumar, V.: Anomaly detection: a survey. ACM Comput. Surv. (CSUR) 41(3), 15 (2009)
Chapelle, O., Schölkopf, B., Zien, A.: Semi-supervised learning (Chapelle, O. et al., eds.; 2006) [book reviews]. IEEE Trans. Neural Netw. 20(3), 542 (2009)
Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: Smote: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002)
Chen, C., Liaw, A., Breiman, L.: Using Random Forest to Learn Imbalanced Data, vol. 110. University of California, Berkeley (2004)
Cohn, D., Atlas, L., Ladner, R.: Improving generalization with active learning. Mach. Learn. 15(2), 201–221 (1994)
Dal Pozzolo, A., Caelen, O., Le Borgne, Y.A., Waterschoot, S., Bontempi, G.: Learned lessons in credit card fraud detection from a practitioner perspective. Expert Syst. Appl. 41(10), 4915–4928 (2014)
Dal Pozzolo, A., Boracchi, G., Caelen, O., Alippi, C., Bontempi, G.: Credit card fraud detection: a realistic modeling and a novel learning strategy. IEEE Trans. Neural Netw. Learn. Syst. PP(99), 1–14 (2017). https://doi.org/10.1109/TNNLS.2017.2736643
Dasgupta, S.: Two faces of active learning. Theoret. Comput. Sci. 412(19), 1767–1781 (2011)
Dorronsoro, J.R., Ginel, F., Sánchez, C., Cruz, C.: Neural fraud detection in credit card operations. IEEE Trans. Neural Netw. 8(4), 827–834 (1997)
Drews, P., Núñez, P., Rocha, R.P., Campos, M., Dias, J.: Novelty detection and segmentation based on gaussian mixture models: a case study in 3d robotic laser mapping. Robot. Auton. Syst. 61(12), 1696–1709 (2013)
Ertekin, S., Huang, J., Bottou, L., Giles, L.: Learning on the border: active learning in imbalanced data classification. In: Proceedings of the Sixteenth ACM Conference on Information and Knowledge Management, pp. 127–136. ACM (2007)
Fan, W., Huang, Y.A., Wang, H., Yu, P.S.: Active mining of data streams. In: Proceedings of the 2004 SIAM International Conference on Data Mining, pp. 457–461. SIAM (2004)
Ilonen, J., Paalanen, P., Kamarainen, J.K., Kalviainen, H.: Gaussian mixture pdf in one-class classification: computing and utilizing confidence values. In: 18th International Conference on Pattern Recognition, ICPR 2006, vol. 2, pp. 577–580. IEEE (2006)
Jacobusse, G., Veenman, C.: On selection bias with imbalanced classes. In: International Conference on Discovery Science, pp. 325–340. Springer, New York (2016)
Japkowicz, N., Stephen, S.: The class imbalance problem: a systematic study. Intell. Data Anal. 6(5), 429–449 (2002)
Jurgovsky, J., Granitzer, M., Ziegler, K., Calabretto, S., Portier, P.E., He-Guelton, L., Caelen, O.: Sequence classification for credit-card fraud detection. Expert Syst. Appl. (2018)
Kriegel, H.P., Kröger, P., Schubert, E., Zimek, A.: Loop: local outlier probabilities. In: Proceedings of the 18th ACM Conference on Information and Knowledge Management, pp. 1649–1652. ACM, New York, NY, USA (2009). https://doi.org/10.1145/1645953.1646195
Lewis, D.D., Catlett, J.: Heterogeneous uncertainty sampling for supervised learning. In: Proceedings of the Eleventh International Conference on Machine Learning, pp. 148–156. Morgan Kaufmann (1994)
Lewis, D.D., Gale, W.A.: A sequential algorithm for training text classifiers. In: Proceedings of the 17th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 3–12. Springer, New York (1994)
Li, L., Hansman, R.J., Palacios, R., Welsch, R.: Anomaly detection via a Gaussian mixture model for flight operation and safety monitoring. Transp. Res. Part C Emerg. Technol. 64, 45–57 (2016)
Liu, F.T., Ting, K.M., Zhou, Z.H.: Isolation forest. In: Eighth IEEE International Conference on Data Mining. ICDM’08, pp. 413–422. IEEE (2008)
Palau, C., Arregui, F., Carlos, M.: Burst detection in water networks using principal component analysis. J. Water Resour. Plan. Manag. 138(1), 47–54 (2011)
Pang, G., Cao, L., Chen, L., Liu, H.: Learning homophily couplings from non-iid data for joint feature selection and noise-resilient outlier detection. In: Proceedings of the 26th International Joint Conference on Artificial Intelligence, pp. 2585–2591. AAAI Press (2017)
Pang, G., Cao, L., Chen, L., Liu, H.: Unsupervised feature selection for outlier detection by modelling hierarchical value-feature couplings. In: 2016 IEEE 16th International Conference on Data Mining (ICDM), pp. 410–419. IEEE (2016)
Pichara, K., Soto, A., Araneda, A.: Detection of anomalies in large datasets using an active learning scheme based on dirichlet distributions. In: Ibero-American Conference on Artificial Intelligence, pp. 163–172. Springer (2008)
Pimentel, M.A., Clifton, D.A., Clifton, L., Tarassenko, L.: A review of novelty detection. Signal Process. 99, 215–249 (2014)
Pinto da Costa, J.F., Alonso, H., Roque, L.: A weighted principal component analysis and its application to gene expression data. IEEE/ACM Trans. Comput. Biol. Bioinform. 8(1), 246–252 (2011)
Ren, D., Wang, B., Perrizo, W.: Rdf: a density-based outlier detection method using vertical data representation. In: Fourth IEEE International Conference on Data Mining, ICDM’04, pp. 503–506. IEEE (2004)
Rokach, L.: Decision forest: twenty years of research. Inf. Fus. 27, 111–125 (2016)
Sahin, Y., Bulkan, S., Duman, E.: A cost-sensitive decision tree approach for fraud detection. Expert Syst. Appl. 40(15), 5916–5923 (2013)
Schohn, G., Cohn, D.: Less is more: active learning with support vector machines. In: ICML, pp. 839–846. Citeseer (2000)
Schölkopf, B., Williamson, R.C., Smola, A.J., Shawe-Taylor, J., Platt, J.C.: Support vector method for novelty detection. In: Advances in Neural Information Processing Systems, pp. 582–588 (2000)
Seeja, K.R., Zareapoor, M.: Fraudminer: a novel credit card fraud detection model based on frequent itemset mining. Sci. World J. 2014, 1–10 (2014)
Sethi, N., Gera, A.: A revived survey of various credit card fraud detection techniques. Int. J. Comput. Sci. Mobile Comput. 3(4), 780–791 (2014)
Settles, B., Craven, M., Ray, S.: Multiple-instance active learning. In: Advances in Neural Information Processing Systems, pp. 1289–1296 (2008)
Settles, B.: Active learning literature survey. Univ. Wis. Madison 52(55–66), 11 (2010)
Shimpi, P.R., Kadroli, V.: Survey on credit card fraud detection techniques. Int. J. Eng. Comput. Sci. 4(11), 15010–15015 (2015)
Shyu, M.L., Chen, S.C., Sarinnapakorn, K., Chang, L.: A novel anomaly detection scheme based on principal component classifier. Technical report, Miami Univ Coral Gables FL Dept of Electrical and Computer Engineering (2003)
Srivastava, A., Kundu, A., Sural, S., Majumdar, A.: Credit card fraud detection using hidden Markov model. IEEE Trans. Dependable Secur. Comput. 5(1), 37–48 (2008)
Tang, J., Chen, Z., Fu, A., Cheung, D.: Enhancing effectiveness of outlier detections for low density patterns. Adv. Knowl. Discov. Data Min. 535–548 (2002)
Van Vlasselaer, V., Bravo, C., Caelen, O., Eliassi-Rad, T., Akoglu, L., Snoeck, M., Baesens, B.: Apate: a novel approach for automated credit card transaction fraud detection using network-based extensions. Decis. Support Syst. 75, 38–48 (2015)
Van Vlasselaer, V., Eliassi-Rad, T., Akoglu, L., Snoeck, M., Baesens, B.: Afraid: fraud detection via active inference in time-evolving social networks. In: IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM), pp. 659–666. IEEE (2015)
Vijayanarasimhan, S., Jain, P., Grauman, K.: Far-sighted active learning on a budget for image and video recognition. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 3035–3042. IEEE (2010)
Wang, W., Guan, X., Zhang, X.: A novel intrusion detection method based on principle component analysis in computer security. Adv. Neural Netw. ISNN 2004, 88–89 (2004)
Wei, W., Li, J., Cao, L., Ou, Y., Chen, J.: Effective detection of sophisticated online banking fraud on extremely imbalanced data. World Wide Web 16(4), 449–475 (2013)
Xie, J., Xiong, T.: Stochastic semi-supervised learning on partially labeled imbalanced data. In: Active Learning and Experimental Design workshop In conjunction with AISTATS 2010, pp. 85–98 (2011)
Zareapoor, M., Shamsolmoali, P.: Application of credit card fraud detection: Based on bagging ensemble classifier. Proc. Comput. Sci. 48, 679–685 (2015)
Zhang, Y., Bingham, C., Martínez-García, M., Cox, D.: Detection of emerging faults on industrial gas turbines using extended Gaussian mixture models. Int. J. Rotating Mach. 2017, 1–9 (2017)
Zhang, K., Hutter, M., Jin, H.: A new local distance-based outlier detection approach for scattered real-world data. Adv. Knowl. Discov. Data Min. 5476, 813–822 (2009)
Zhu, J., Hovy, E.H.: Active learning for word sense disambiguation with methods for addressing the class imbalance problem. EMNLP-CoNLL 7, 783–790 (2007)
Žliobaite, I., Bifet, A., Pfahringer, B., Holmes, G.: Active learning with evolving streaming data. In: Joint European Conference on Machine Learning and Knowledge Discovery in Databases, pp. 597–612. Springer (2011)

Acknowledgements

The authors FC, YLB and GB acknowledge the funding of the Brufence project (Scalable machine learning for automating defense system) supported by INNOVIRIS (Brussels Institute for the encouragement of scientific research and innovation).

Funding

Computational resources have been provided by the Consortium des Équipements de Calcul Intensif (CÉCI), funded by the Fonds de la Recherche Scientifique de Belgique (F.R.S.-FNRS) under Grant No. 2.5020.11.

Author information

Authors and Affiliations

Machine Learning Group, Computer Science Department, Faculty of Sciences, Université Libre de Bruxelles (ULB), Brussels, Belgium
Fabrizio Carcillo, Yann-Aël Le Borgne & Gianluca Bontempi
R&D, Worldline, Brussels, Belgium
Olivier Caelen

Corresponding author

Correspondence to Fabrizio Carcillo.

Ethics declarations

Conflicts of interest

On behalf of all authors, the corresponding author states that there is no conflict of interest.

Additional information

The original version of this article was revised: A line was misplaced in Table 1.

This paper is an extended version of the DSAA’2017 Application Track paper titled “An Assessment of Streaming Active Learning Strategies for Real-Life Credit Card Fraud Detection” [6].

About this article

Cite this article

Carcillo, F., Le Borgne, YA., Caelen, O. et al. Streaming active learning strategies for real-life credit card fraud detection: assessment and visualization. Int J Data Sci Anal 5, 285–300 (2018). https://doi.org/10.1007/s41060-018-0116-z

Received: 13 November 2017
Accepted: 22 March 2018
Published: 04 April 2018
Issue Date: June 2018
DOI: https://doi.org/10.1007/s41060-018-0116-z
