Word Categorization of Corporate Annual Reports for Bankruptcy Prediction by Machine Learning Methods
The language of company-related documents is recognized as an important indicator of future financial performance. This study extracts various word categories from corporate annual reports and examines their effect on bankruptcy prediction. We show that the language used by bankrupt companies is characterized by stronger tenacity, accomplishment, familiarity, present concern, exclusion and denial. Bankrupt companies also use more modal, positive, uncertain and negative language. We used neural networks, support vector machines, decision trees and ensembles of decision trees to predict corporate bankruptcy. The prediction models used both financial indicators and word categorizations as input variables. We show that both general dictionary and financial dictionary categories can significantly improve the accuracy of the prediction models.
Keywords: Bankruptcy prediction, Word categorization, Sentiment analysis, Machine learning, Meta-learning
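To make the idea of combining word categorizations with financial indicators concrete, the following is a minimal sketch, not the study's actual pipeline. The word lists in `CATEGORIES`, the function names, and the sample indicator values are all hypothetical and chosen only for illustration. The sketch computes, for each category, the share of tokens in a report that belong to it, and appends those shares to a list of financial indicators to form one input vector for a classifier.

```python
import re
from collections import Counter

# Hypothetical toy word lists (assumption): the dictionaries used in the
# study are not reproduced here.
CATEGORIES = {
    "negative": {"loss", "decline", "default", "impairment"},
    "uncertain": {"may", "might", "uncertain", "risk"},
    "modal": {"could", "should", "would", "must"},
}


def category_frequencies(text, categories=CATEGORIES):
    """Return the share of tokens in `text` that fall into each category."""
    tokens = re.findall(r"[a-z']+", text.lower())
    total = len(tokens)
    counts = Counter()
    for token in tokens:
        for name, words in categories.items():
            if token in words:
                counts[name] += 1
    # An empty document yields 0.0 for every category instead of dividing by zero.
    return {name: (counts[name] / total if total else 0.0) for name in categories}


def build_feature_vector(financial_indicators, text, categories=CATEGORIES):
    """Concatenate financial indicators with category shares.

    Category shares are appended in alphabetical order of category name
    so that the column order is stable across documents.
    """
    freqs = category_frequencies(text, categories)
    return list(financial_indicators) + [freqs[name] for name in sorted(categories)]


if __name__ == "__main__":
    report = "The company may default and could report a loss"
    # The two indicator values are placeholders, not real financial ratios.
    print(build_feature_vector([0.5, 1.2], report))
```

Vectors built this way could then be passed to any of the classifier families named in the abstract. Normalizing counts by document length is one common choice, assumed here so that long and short reports are comparable.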