Machine Learning Detection for Financial Statement Fraud

Hwang, Ting-Kai; Chen, Wei-Chun; Chiang, Wan-Chi; Li, Yung-Ming

doi:10.1007/978-3-031-04819-7_16


Part of the book series: Lecture Notes in Networks and Systems (LNNS, volume 469)

Included in the following conference series:

World Conference on Information Systems and Technologies


Abstract

This study develops a methodology for building a fraudulent transaction detection model. The proposed approach combines the XGBoost algorithm with the SMOTE sampling method and Bayesian hyperparameter optimization to separate fraudulent transactions from non-fraudulent ones. Experimental results on a public financial statement fraud data set from the Kaggle website show that the proposed model performs better than commonly used binary classification methods, such as Logistic Regression, SVM, KNN, Random Forest, XGBoost without hyperparameter tuning, and Multilayer Perceptron. The method for establishing fraud detection models can assist people who lack machine learning expertise in modeling and parameter tuning. It can help detect abnormal transactions as early as possible and support risk management in the banking industry.
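The abstract names SMOTE as the sampling step used to handle class imbalance. As a rough illustration only, the following is a minimal pure-Python sketch of SMOTE-style oversampling: each synthetic fraud sample is an interpolation between a minority sample and one of its nearest minority neighbours. It is not the authors' implementation; the function names `smote_oversample` and `balance`, the default `k=5`, and the choice to oversample until the two classes are equal in size are assumptions made for this example. The XGBoost classifier and the Bayesian hyperparameter search are not shown.

```python
import random
from typing import List, Sequence, Tuple


def _sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def smote_oversample(minority: List[Sequence[float]], n_new: int,
                     k: int = 5, seed: int = 0) -> List[List[float]]:
    """Create n_new synthetic samples by interpolating between a randomly
    chosen minority sample and one of its k nearest minority neighbours."""
    if n_new <= 0:
        return []
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbours than samples
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Indices of the k nearest minority neighbours, excluding the base itself.
        neighbours = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: _sq_dist(base, minority[j]),
        )[:k]
        neighbour = minority[rng.choice(neighbours)]
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


def balance(X: List[Sequence[float]], y: List[int], minority_label: int = 1,
            k: int = 5, seed: int = 0) -> Tuple[list, list]:
    """Append synthetic minority samples until both classes have equal size
    (assumption for this sketch: binary labels, minority is the smaller class)."""
    minority = [x for x, label in zip(X, y) if label == minority_label]
    n_new = max(0, (len(y) - len(minority)) - len(minority))
    new_points = smote_oversample(minority, n_new, k=k, seed=seed)
    return list(X) + new_points, list(y) + [minority_label] * n_new
```

In practice, oversampling of this kind is normally applied to the training split only, so that synthetic points do not leak into the evaluation data.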



Author information

Authors and Affiliations

Department of Journalism, Ming Chuan University, Taoyuan, Taiwan
Ting-Kai Hwang
Institute of Information Management, National Yang Ming Chiao Tung University, Hsinchu, Taiwan
Wei-Chun Chen, Wan-Chi Chiang & Yung-Ming Li


Corresponding author

Correspondence to Ting-Kai Hwang.

Editor information

Editors and Affiliations

ISEG, Universidade de Lisboa, Lisboa, Portugal
Alvaro Rocha
College of Engineering, The Ohio State University, Columbus, OH, USA
Hojjat Adeli
Institute of Data Science and Digital Te, Vilnius University, Vilnius, Lithuania
Gintautas Dzemyda
DCT, Universidade Portucalense, PORTO, Portugal
Fernando Moreira


About this paper

Cite this paper

Hwang, TK., Chen, WC., Chiang, WC., Li, YM. (2022). Machine Learning Detection for Financial Statement Fraud. In: Rocha, A., Adeli, H., Dzemyda, G., Moreira, F. (eds) Information Systems and Technologies. WorldCIST 2022. Lecture Notes in Networks and Systems, vol 469. Springer, Cham. https://doi.org/10.1007/978-3-031-04819-7_16


DOI: https://doi.org/10.1007/978-3-031-04819-7_16
Published: 17 May 2022
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-04818-0
Online ISBN: 978-3-031-04819-7
eBook Packages: Intelligent Technologies and Robotics (R0)
