Abstract
Imbalanced data has attracted considerable research attention, as shown by the growing number of publications on the topic. The hardest challenge is that learning algorithms fail to generalize inductive rules: it is difficult to form a good decision boundary when there are many features but few samples, and resampling carries a risk of overfitting. Many solutions have been applied to these problems. In this article, we propose a novel method called MASI (Moving to Adaptive Samples in Imbalanced), which relabels majority class samples as minority class samples based on the data distribution. The method rebalances the classes before a model is trained in order to improve classification performance on imbalanced data. We tested it on several imbalanced datasets from the UCI repository. The empirical results showed that our method achieves significantly higher Sensitivity and G-mean values than other rebalancing approaches, such as Random Over-sampling, Random Under-sampling, SMOTE, and Borderline SMOTE, across different machine learning approaches, including SVM, C5.0, and RF.
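For readers unfamiliar with the two reported metrics, the following is a minimal sketch of how Sensitivity and G-mean are commonly computed from binary predictions. It uses the standard definitions (Sensitivity as the true positive rate, G-mean as the geometric mean of Sensitivity and Specificity) and is not taken from the chapter itself. The function names, the `positive` parameter, and the toy labels are illustrative assumptions, with the minority class assumed to be encoded as 1.

```python
import math


def sensitivity_specificity(y_true, y_pred, positive=1):
    """Return (sensitivity, specificity) for binary labels.

    Assumption: `positive` marks the minority class of interest.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    # Guard against empty classes so the ratios are always defined.
    sens = tp / (tp + fn) if (tp + fn) else 0.0
    spec = tn / (tn + fp) if (tn + fp) else 0.0
    return sens, spec


def g_mean(y_true, y_pred, positive=1):
    """Geometric mean of sensitivity and specificity."""
    sens, spec = sensitivity_specificity(y_true, y_pred, positive)
    return math.sqrt(sens * spec)


# Toy data (hypothetical): 2 minority samples, 8 majority samples.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
y_always_majority = [0] * 10                      # never predicts the minority
y_mixed = [1, 0, 0, 0, 0, 0, 0, 0, 0, 1]          # finds one minority sample

accuracy_always = sum(t == p for t, p in zip(y_true, y_always_majority)) / len(y_true)
gmean_always = g_mean(y_true, y_always_majority)
gmean_mixed = g_mean(y_true, y_mixed)
```

On this toy data the classifier that always predicts the majority class still scores 80% accuracy while its G-mean is zero, which illustrates why Sensitivity and G-mean, rather than accuracy, are the usual yardsticks on imbalanced data.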
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Nghiem, TL., Nghiem, TT. (2020). Applying MASI Algorithm to Improve the Classification Performance of Imbalanced Data in Fraud Detection. In: Le Thi, H., Le, H., Pham Dinh, T., Nguyen, N. (eds) Advanced Computational Methods for Knowledge Engineering. ICCSAMA 2019. Advances in Intelligent Systems and Computing, vol 1121. Springer, Cham. https://doi.org/10.1007/978-3-030-38364-0_14
DOI: https://doi.org/10.1007/978-3-030-38364-0_14
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-38363-3
Online ISBN: 978-3-030-38364-0
eBook Packages: Intelligent Technologies and Robotics (R0)