Abstract
This study develops a methodology for building fraudulent transaction detection models. An XGBoost classifier, combined with the SMOTE sampling method and Bayesian hyperparameter optimization, is proposed to separate fraudulent transactions from non-fraudulent ones. Experimental results on a public financial statement fraud data set from the Kaggle website show that the proposed model outperforms commonly used binary classification methods, namely Logistic Regression, SVM, KNN, Random Forest, Multilayer Perceptron, and XGBoost without hyperparameter tuning. The methodology helps people who lack machine learning expertise in modeling and parameter tuning to build fraud detection models. It can help the banking industry detect abnormal transactions as early as possible and carry out risk management.
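The abstract names SMOTE as the resampling step but does not show it. As a rough illustration only, the core idea of SMOTE (creating synthetic minority-class points by interpolating between a minority sample and one of its nearest minority neighbours) can be sketched in plain Python. The function name `smote_oversample`, the toy data, and the parameter values are assumptions made for this sketch; they are not the authors' implementation, which presumably relies on an existing library.

```python
import math
import random


def smote_oversample(minority, n_new, k=5, seed=0):
    """Create n_new synthetic points by interpolating between minority samples.

    Hypothetical helper for illustration; not the authors' implementation.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest minority neighbours of the base point (excluding itself)
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        # the new point lies on the segment between base and neighbour
        gap = rng.random()
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Toy data (assumed): 3 fraud rows (minority) against 9 non-fraud rows.
fraud = [(1.0, 2.0), (1.5, 1.8), (1.2, 2.4)]
non_fraud = [(float(x), float(y)) for x in range(5, 8) for y in range(5, 8)]

# Generate enough synthetic fraud rows to balance the two classes.
new_fraud = smote_oversample(fraud, n_new=len(non_fraud) - len(fraud), k=2)
balanced_fraud = fraud + new_fraud
```

In a setup like the one described, oversampling of this kind would normally be applied to the training split only, so that synthetic points do not leak into the data used for evaluation.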
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Hwang, TK., Chen, WC., Chiang, WC., Li, YM. (2022). Machine Learning Detection for Financial Statement Fraud. In: Rocha, A., Adeli, H., Dzemyda, G., Moreira, F. (eds) Information Systems and Technologies. WorldCIST 2022. Lecture Notes in Networks and Systems, vol 469. Springer, Cham. https://doi.org/10.1007/978-3-031-04819-7_16
DOI: https://doi.org/10.1007/978-3-031-04819-7_16
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-04818-0
Online ISBN: 978-3-031-04819-7
eBook Packages: Intelligent Technologies and Robotics (R0)