Neural Computing and Applications, Volume 31, Supplement 1, pp 3–14

Parallel and incremental credit card fraud detection model to handle concept drift and data imbalance

  • Akila Somasundaram
  • Srinivasulu Reddy
S.I. : Machine Learning Applications for Self-Organized Wireless Networks

Abstract

Real-time fraud detection in credit card transactions is challenging due to the intrinsic properties of transaction data, namely data imbalance, noise, borderline entities and concept drift. The advent of mobile payment systems has further complicated the fraud detection process. This paper proposes a transaction window bagging (TWB) model, a parallel and incremental learning ensemble, as a solution to these issues in credit card transaction data. The TWB model uses a parallelized bagging approach, incorporating an incremental learning model, a cost-sensitive base learner and a weighted voting-based combiner to effectively handle concept drift and data imbalance. Experiments were performed on Brazilian Bank data and University of California, San Diego (UCSD) data, and results were compared with state-of-the-art models. On the Brazilian Bank data, TWB increases fraud detection levels by 18–38% and lowers cost levels by a factor of 1.3–2, demonstrating its enhanced performance. On the UCSD data, precision improves by 8–25%, indicating the robustness of the TWB model. Future extensions of the proposed model will incorporate feature engineering to further improve performance.
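The abstract's core ideas — per-window base learners, cost-sensitive training, recency-weighted voting, and discarding stale learners to track drift — can be illustrated with a minimal sketch. This is not the authors' implementation: the `CostSensitiveCentroid` base learner, the `fraud_cost` parameter and the linear recency weights are all illustrative assumptions standing in for the paper's actual components.

```python
import numpy as np

class CostSensitiveCentroid:
    """Toy cost-sensitive base learner: nearest class centroid, with the
    distance to the fraud centroid shrunk by a cost ratio so that missing
    a fraud (false negative) is penalised more than a false alarm."""
    def __init__(self, fraud_cost=5.0):
        self.fraud_cost = fraud_cost

    def fit(self, X, y):
        self.c0 = X[y == 0].mean(axis=0)  # legitimate-class centroid
        self.c1 = X[y == 1].mean(axis=0)  # fraud-class centroid
        return self

    def predict(self, X):
        d0 = np.linalg.norm(X - self.c0, axis=1)
        d1 = np.linalg.norm(X - self.c1, axis=1) / self.fraud_cost
        return (d1 < d0).astype(int)

class TransactionWindowBagging:
    """Sketch of a TWB-style ensemble: each incoming window of transactions
    trains one base learner; old learners are dropped once the ensemble is
    full (a simple concept-drift response); predictions are combined by
    weighted voting that favours recent windows."""
    def __init__(self, max_windows=5, fraud_cost=5.0):
        self.max_windows = max_windows
        self.fraud_cost = fraud_cost
        self.learners = []

    def partial_fit(self, X, y):
        # One new learner per transaction window (incremental learning).
        self.learners.append(CostSensitiveCentroid(self.fraud_cost).fit(X, y))
        if len(self.learners) > self.max_windows:
            self.learners.pop(0)  # forget the oldest window
        return self

    def predict(self, X):
        # Recency-weighted vote: newer windows get larger weights.
        weights = np.arange(1, len(self.learners) + 1)
        votes = np.stack([m.predict(X) for m in self.learners])
        score = (weights[:, None] * votes).sum(axis=0) / weights.sum()
        return (score >= 0.5).astype(int)
```

In practice the per-window fits are independent and could run in parallel, which is the motivation for the bagging structure described in the abstract.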

Keywords

Fraud detection · Cost-sensitive bagging · Concept drift · Incremental learning · Data imbalance · Bagging

Notes

Acknowledgements

The authors would like to thank DEITY for the financial support extended under Visvesvaraya Ph.D. scheme (NITT/RO/DEITY-Ph.D. Cont. grant/2015-16). The authors would like to acknowledge the infrastructure support provided by the Massively Parallel Programming Laboratory (CUDA Teaching Centre), Machine Learning and Data Analytics Laboratory and Big Data Laboratory, Dept. of Computer Applications, NIT, Trichy. The authors would also like to thank Dr. Manoel Fernando Gadi, Univ. of Sao Paulo, Brazil and Dr. Neda Solatani, Amirkabir University of Technology, Iran, for providing the Brazilian Bank Dataset.

Compliance with ethical standards

Conflict of interest

The authors have no conflict of interest to declare.

Copyright information

© The Natural Computing Applications Forum 2018

Authors and Affiliations

  1. Department of Computer Applications, National Institute of Technology, Trichy, India
