Abstract
Credit card fraud has adversely affected the economic order of the market and has undermined the trust and interests of stakeholders, financial entities, and consumers. Losses from card fraud are increasing annually, and billions of dollars are being lost. This work therefore provides a framework for detecting credit card fraud efficiently. Datasets of card transactions are imbalanced because the number of ordinary transactions is far greater than the number of fraudulent ones. Before addressing the fraud problem, we first have to address the imbalanced data problem, which occurs when the examples of one class considerably outnumber those of the other class. Under such imbalance, classifying fraud becomes very difficult, because the results may be biased towards the majority class. Thus, this paper aims firstly to use hybrid sampling and oversampling preprocessing techniques to solve the imbalanced data problem, and secondly to detect the fraud. The performance of the proposed framework is evaluated using several metrics (accuracy, precision, and recall) in comparison with existing algorithms such as KNN, LR, LDA, NB, and CART. The obtained results reveal that when the data is highly imbalanced, the model struggles to detect fraudulent transactions. With the proposed framework, prediction of the positive class improved significantly, reaching an accuracy of 99.9.
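As a rough illustration of why imbalance biases results towards the majority class, the following minimal Python sketch builds a toy dataset and scores a model that always predicts "legitimate". It then balances the classes with plain random oversampling. The toy data, the function names `random_oversample` and `scores`, and the choice of random oversampling are all assumptions made for illustration; they are not necessarily the sampling methods or data used in the paper.

```python
import random
from collections import Counter


def random_oversample(X, y, minority_label=1, seed=0):
    """Duplicate randomly chosen minority rows until both classes are equal in size.

    Hypothetical helper for illustration; assumes a binary problem with at
    least one minority example.
    """
    rng = random.Random(seed)
    minority = [i for i, label in enumerate(y) if label == minority_label]
    majority = [i for i, label in enumerate(y) if label != minority_label]
    n_extra = max(0, len(majority) - len(minority))
    extra = [rng.choice(minority) for _ in range(n_extra)]
    idx = list(range(len(y))) + extra
    return [X[i] for i in idx], [y[i] for i in idx]


def scores(y_true, y_pred, positive=1):
    """Return (accuracy, precision, recall) for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, precision, recall


# Toy data (assumed): 998 legitimate (0) and 2 fraudulent (1) transactions.
y = [0] * 998 + [1] * 2
X = [[float(i)] for i in range(len(y))]

# A trivial model that always predicts the majority class.
baseline_pred = [0] * len(y)
acc, prec, rec = scores(y, baseline_pred)

# Balance the classes by duplicating minority rows.
X_bal, y_bal = random_oversample(X, y)
counts = Counter(y_bal)

print(f"baseline accuracy={acc:.3f} precision={prec:.3f} recall={rec:.3f}")
print(f"class counts after oversampling: {dict(counts)}")
```

The baseline scores high on accuracy while detecting no fraud at all, which is why precision and recall are reported alongside accuracy. In practice, resampling would be applied only to the training split so that duplicated rows do not leak into the evaluation data.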
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
About this article
Cite this article
Abd El-Naby, A., Hemdan, E.ED. & El-Sayed, A. An efficient fraud detection framework with credit card imbalanced data in financial services. Multimed Tools Appl 82, 4139–4160 (2023). https://doi.org/10.1007/s11042-022-13434-6
DOI: https://doi.org/10.1007/s11042-022-13434-6