Using Active Learning Methods for Predicting Fraudulent Financial Statements

  • Stamatis Karlos
  • Georgios Kostopoulos
  • Sotiris Kotsiantis
  • Vassilis Tampakas
Conference paper
Part of the Communications in Computer and Information Science book series (CCIS, volume 744)


Detection of Fraudulent Financial Statements (FFS), or more simply the fraud detection problem, concerns identifying financial statements that have been falsified either to inflate positive figures, such as assets and profits, or to conceal negative ones, such as expenses and losses. As contemporary markets and multinational trade expand, firms generate large volumes of data in the course of their operations. Anti-fraud mechanisms should therefore be upgraded accordingly, which calls for the introduction of Machine Learning tools in the field. However, because trustworthy datasets describing the financial ratios of firms that have committed fraud are difficult to collect, strategies that exploit a few labeled instances to discover useful patterns in a pool of unlabeled data could prove highly effective. In this work, algorithms operating under the Active Learning framework are compared against their supervised variants, using data extracted from Greek firms. To the best of our knowledge, this is the first study that uses Active Learning for predicting FFS. The obtained results demonstrate the superior performance of the active learners.
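The pool-based setting described above (a few labeled instances plus a pool of unlabeled ones) can be sketched as follows. This is a minimal illustration, not the authors' experimental setup: the abstract does not name the base learners or query strategy, so the sketch assumes synthetic two-feature data standing in for financial ratios, a hand-rolled logistic regression as the base learner, and entropy-based uncertainty sampling as the query strategy (one common choice). All function names (`train_logreg`, `entropy`, `make_data`, `active_learning`) are hypothetical. The supervised variant is trained on the same number of labels, drawn at random.

```python
import math
import random


def predict_proba(w, x):
    """P(class 1 | x) under a logistic model; w[-1] is the bias term."""
    z = sum(wj * v for wj, v in zip(w, x)) + w[-1]
    z = max(-30.0, min(30.0, z))  # clamp to avoid math.exp overflow
    return 1.0 / (1.0 + math.exp(-z))


def train_logreg(X, y, epochs=200, lr=0.5):
    """Fit logistic regression with plain batch gradient descent."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = predict_proba(w, xi) - yi
            for j, v in enumerate(xi):
                grad[j] += err * v
            grad[-1] += err
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad)]
    return w


def entropy(p):
    """Shannon entropy (bits) of a binary prediction; maximal at p = 0.5."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


def accuracy(w, X, y):
    return sum((predict_proba(w, x) >= 0.5) == bool(t) for x, t in zip(X, y)) / len(X)


def make_data(n, rng):
    """Hypothetical stand-in for two financial ratios; class 1 ('fraud') is shifted."""
    X, y = [], []
    for _ in range(n):
        label = rng.random() < 0.5
        mu = 1.0 if label else -1.0
        X.append([rng.gauss(mu, 1.0), rng.gauss(mu, 1.0)])
        y.append(int(label))
    return X, y


def active_learning(X_pool, y_pool, n_init, budget, rng):
    """Pool-based uncertainty sampling: start with n_init random labels, then
    query the oracle (here simply y_pool) for the `budget` most uncertain
    instances, retraining after each query. Requires n_init + budget <= pool size."""
    def fit(ids):
        return train_logreg([X_pool[i] for i in ids], [y_pool[i] for i in ids])

    idx = list(range(len(X_pool)))
    rng.shuffle(idx)
    labeled, unlabeled = idx[:n_init], idx[n_init:]
    for _ in range(budget):
        w = fit(labeled)
        query = max(unlabeled, key=lambda i: entropy(predict_proba(w, X_pool[i])))
        unlabeled.remove(query)
        labeled.append(query)
    return fit(labeled), labeled


if __name__ == "__main__":
    rng = random.Random(0)
    X_pool, y_pool = make_data(300, rng)
    X_test, y_test = make_data(200, rng)
    w_al, labeled = active_learning(X_pool, y_pool, n_init=10, budget=20, rng=rng)
    # Supervised baseline: same number of labels, chosen at random.
    sample = rng.sample(range(len(X_pool)), len(labeled))
    w_sl = train_logreg([X_pool[i] for i in sample], [y_pool[i] for i in sample])
    print("active learner:", accuracy(w_al, X_test, y_test))
    print("supervised    :", accuracy(w_sl, X_test, y_test))
```

On toy data like this the two accuracies are often close; the sketch is meant to show the query loop (train, score the unlabeled pool by uncertainty, ask for one label, retrain), not to reproduce the results reported for the Greek-firm data.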


Keywords: Active learning theory, Machine learning, Fraud detection, Financial ratios, Classification accuracy


Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  • Stamatis Karlos (1)
  • Georgios Kostopoulos (2)
  • Sotiris Kotsiantis (2)
  • Vassilis Tampakas (1)
  1. Department of Computer Engineering Informatics, Technical Educational Institute of Western Greece, Antirrion, Greece
  2. Educational Software Development Laboratory (ESDLab), Department of Mathematics, University of Patras, Patras, Greece
