Predicting Corporate Credit Ratings Using Content Analysis of Annual Reports – A Naïve Bayesian Network Approach

  • Petr Hajek
  • Vladimir Olej
  • Ondrej Prochazka
Conference paper
Part of the Lecture Notes in Business Information Processing book series (LNBIP, volume 276)


Corporate credit ratings are based on a variety of information, including financial statements, annual reports, and management interviews. Financial indicators are critical for evaluating corporate creditworthiness. However, little is known about how qualitative information hidden in firm-related documents manifests in the credit rating process. To address this issue, this study aims to develop a methodology for extracting topical content from firm-related documents using latent semantic analysis. This information is integrated with traditional financial indicators into a multi-class corporate credit rating prediction model. Informative indicators are obtained using a correlation-based filter in the process of feature selection. We demonstrate that Naïve Bayesian networks perform statistically equivalently to other machine learning methods in terms of classification performance. We further show that the “red flag” values obtained using Naïve Bayesian networks may indicate low credit quality (non-investment rating classes) of firms. These findings can be particularly important for investors, banks and market regulators.
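The classification step described above can be illustrated with a minimal sketch of a naïve Bayes classifier applied to a feature vector that mixes financial indicators with a text-derived topic score. This is not the authors' implementation: the Gaussian likelihood, the two rating classes, the feature names, and all data values are assumptions made for illustration, and the latent semantic analysis and correlation-based feature selection steps are omitted (the topic score is simply supplied as a number).

```python
import math
from collections import defaultdict


def fit_gaussian_nb(rows, labels, var_floor=1e-6):
    """Estimate class log-priors and per-feature Gaussian parameters."""
    by_class = defaultdict(list)
    for x, y in zip(rows, labels):
        by_class[y].append(x)
    n_total = len(rows)
    model = {}
    for y, xs in by_class.items():
        n = len(xs)
        columns = list(zip(*xs))
        means = [sum(col) / n for col in columns]
        # Floor the variance so a constant feature cannot cause division by zero.
        variances = [
            max(sum((v - m) ** 2 for v in col) / n, var_floor)
            for col, m in zip(columns, means)
        ]
        model[y] = (math.log(n / n_total), means, variances)
    return model


def log_posteriors(model, x):
    """Unnormalised log P(class | x), treating features as conditionally independent."""
    scores = {}
    for y, (log_prior, means, variances) in model.items():
        log_lik = 0.0
        for v, m, s2 in zip(x, means, variances):
            log_lik += -0.5 * math.log(2 * math.pi * s2) - (v - m) ** 2 / (2 * s2)
        scores[y] = log_prior + log_lik
    return scores


def predict(model, x):
    """Return the class with the highest posterior score."""
    scores = log_posteriors(model, x)
    return max(scores, key=scores.get)


# Invented toy data (hypothetical features):
# [leverage ratio, return on assets, weight of a "risk" topic in the annual report]
X = [
    [0.30, 0.12, 0.10],
    [0.35, 0.10, 0.15],
    [0.25, 0.15, 0.05],
    [0.80, -0.02, 0.70],
    [0.75, 0.01, 0.65],
    [0.85, -0.05, 0.80],
]
y = ["investment"] * 3 + ["non-investment"] * 3

model = fit_gaussian_nb(X, y)
print(predict(model, [0.78, 0.00, 0.72]))
```

Because each feature contributes its own additive term to the log-posterior, the per-feature terms can be inspected individually, which is one plausible way to read off which indicator values pull a firm toward the non-investment class.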


Credit rating, Firms, Prediction, Concept extraction, Naïve Bayesian network



This work was supported by the scientific research project of the Czech Sciences Foundation, Grant No. GA16-19590S, and by Grant No. SGS_2016_023 of the Student Grant Competition.



Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  1. Institute of System Engineering and Informatics, Faculty of Economics and Administration, University of Pardubice, Pardubice, Czech Republic
