Parallel and incremental credit card fraud detection model to handle concept drift and data imbalance
Real-time fraud detection in credit card transactions is challenging due to the intrinsic properties of transaction data, namely data imbalance, noise, borderline entities and concept drift. The advent of mobile payment systems has further complicated the fraud detection process. This paper proposes a transaction window bagging (TWB) model, a parallel and incremental learning ensemble, as a solution to these issues in credit card transaction data. The TWB model uses a parallelized bagging approach, combined with an incremental learning model, a cost-sensitive base learner and a weighted voting-based combiner, to handle concept drift and data imbalance effectively. Experiments were performed with Brazilian Bank data and University of California, San Diego (UCSD) data, and results were compared with state-of-the-art models. Comparisons on Brazilian Bank data indicate an increase in fraud detection levels of 18–38% and cost levels that are 1.3–2 times lower, demonstrating the enhanced performance of TWB. Comparisons on UCSD data indicate precision improvements ranging from 8 to 25%, indicating the robustness of the TWB model. Future extensions of the proposed model will incorporate feature engineering to improve performance.
Keywords: Fraud detection, Cost-sensitive bagging, Concept drift, Incremental learning, Data imbalance, Bagging
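The abstract names a weighted voting-based combiner but does not say how the weights are obtained. The following is a minimal Python sketch of one plausible reading, not the authors' implementation: each base learner is weighted by its accuracy on the most recent labelled transaction window (an assumption), and a transaction is flagged as fraud when the fraud votes carry more than half of the total weight. The names `window_weights` and `weighted_vote`, and the toy threshold learners, are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical type: a base learner maps a feature vector to 0 (legitimate) or 1 (fraud).
Learner = Callable[[Sequence[float]], int]


def window_weights(
    learners: Sequence[Learner],
    window: Sequence[Tuple[Sequence[float], int]],
) -> List[float]:
    """Weight each learner by its accuracy on a labelled window (assumed scheme)."""
    if not window:
        raise ValueError("window must contain at least one labelled transaction")
    weights = []
    for clf in learners:
        correct = sum(1 for x, y in window if clf(x) == y)
        weights.append(correct / len(window))
    return weights


def weighted_vote(
    learners: Sequence[Learner],
    weights: Sequence[float],
    x: Sequence[float],
) -> int:
    """Return 1 (fraud) if fraud votes hold more than half of the total weight."""
    if not learners or len(learners) != len(weights):
        raise ValueError("learners and weights must be non-empty and equal in length")
    total = sum(weights)
    if total <= 0:
        raise ValueError("at least one weight must be positive")
    fraud_weight = sum(w for clf, w in zip(learners, weights) if clf(x) == 1)
    return 1 if fraud_weight > total / 2 else 0


# Toy learners that threshold a single feature (e.g. transaction amount).
toy_learners: List[Learner] = [
    lambda x: 1 if x[0] > 100 else 0,
    lambda x: 1 if x[0] > 500 else 0,
    lambda x: 0,  # always predicts "legitimate"
]
# Most recent labelled window: (features, label) pairs.
recent_window = [([50.0], 0), ([200.0], 1), ([600.0], 1), ([300.0], 1)]

toy_weights = window_weights(toy_learners, recent_window)
print(toy_weights)
print(weighted_vote(toy_learners, toy_weights, [250.0]))
```

In this toy setup only the first learner flags a transaction of 250, so an unweighted majority vote would label it legitimate. Because that learner was the most accurate on the recent window, its weight outweighs the other two combined. Recomputing the weights on each new window is one simple way a combiner could follow concept drift.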
The authors would like to thank DEITY for the financial support extended under Visvesvaraya Ph.D. scheme (NITT/RO/DEITY-Ph.D. Cont. grant/2015-16). The authors would like to acknowledge the infrastructure support provided by the Massively Parallel Programming Laboratory (CUDA Teaching Centre), Machine Learning and Data Analytics Laboratory and Big Data Laboratory, Dept. of Computer Applications, NIT, Trichy. The authors would also like to thank Dr. Manoel Fernando Gadi, Univ. of Sao Paulo, Brazil and Dr. Neda Solatani, Amirkabir University of Technology, Iran, for providing the Brazilian Bank Dataset.
Compliance with ethical standards
Conflict of interest
The authors have no conflict of interest to declare.