Word Categorization of Corporate Annual Reports for Bankruptcy Prediction by Machine Learning Methods

  • Petr Hájek
  • Vladimír Olej
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9302)


The language of company-related documents is recognized as an important indicator of future financial performance. This study extracts various word categories from corporate annual reports and examines their effect on bankruptcy prediction. We show that the language used by bankrupt companies is characterized by stronger tenacity, accomplishment, familiarity, present concern, exclusion and denial. Bankrupt companies also use more modal, positive, uncertain and negative language. We used neural networks, support vector machines, decision trees and ensembles of decision trees to predict corporate bankruptcy. The prediction models used both financial indicators and word categorizations as input variables. We show that both general dictionary and financial dictionary categories can significantly improve the accuracy of the prediction models.
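As a rough illustration of what using word categorizations as input variables alongside financial indicators could look like, the following minimal sketch computes the share of tokens falling into each word category and appends those shares to a vector of financial ratios. The word lists, category names, function names and sample values are hypothetical and chosen only for illustration; they are not taken from the paper or from any published dictionary.

```python
import re
from collections import Counter

# Hypothetical toy word lists (assumption); real dictionaries are far larger.
CATEGORIES = {
    "negative": {"loss", "decline", "default", "impairment"},
    "uncertain": {"may", "might", "uncertain", "approximately"},
    "modal": {"may", "might", "could", "should", "must"},
}


def category_frequencies(text, categories=CATEGORIES):
    """Return the share of tokens in `text` that belong to each category."""
    tokens = re.findall(r"[a-z]+", text.lower())
    total = len(tokens)
    counts = Counter(tokens)
    return {
        name: (sum(counts[w] for w in words) / total if total else 0.0)
        for name, words in categories.items()
    }


def build_feature_vector(financial_indicators, text):
    """Append word-category shares (ordered by category name) to the ratios."""
    freqs = category_frequencies(text)
    return list(financial_indicators) + [freqs[k] for k in sorted(freqs)]


# Hypothetical financial ratios and report excerpt (assumptions).
sample = "The company may default on its debt and could report a loss."
features = build_feature_vector([0.42, 1.7], sample)
```

A vector built this way could then be passed to any of the classifier families named in the abstract. Normalizing by document length keeps long and short reports comparable, although other weighting schemes are equally plausible.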


Keywords: Bankruptcy prediction; Word categorization; Sentiment analysis; Machine learning; Meta-learning





Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  1. Institute of System Engineering and Informatics, Faculty of Economics and Administration, University of Pardubice, Pardubice, Czech Republic
