Improving financial distress prediction using textual sentiment of annual reports

  • Original Research
Annals of Operations Research

Abstract

Accurate prediction of financial distress benefits investors and allows banks and other financial institutions to build early warning systems that help avoid risk contagion. This study investigates financial distress prediction using textual sentiment extracted from listed firms’ annual reports in the Chinese market. The sentiment conveyed by the management discussion and analysis (MD&A) sections and by the audit reports was extracted separately using deep learning algorithms. We found that the sentiment scores extracted from MD&A sections were more optimistic than those extracted from audit reports. Moreover, the experimental results demonstrated that modeling performance improved significantly when textual sentiment scores were incorporated, and that the inclusion of sentiment from audit reports led to a more significant incremental improvement than the inclusion of sentiment from the MD&A sections. However, when both sentiment scores were included as model inputs, the improvement in predictive accuracy was insignificant compared with the model using audit report scores only. Our study highlights the predictive power of textual information in annual reports and shows that the textual sentiment of annual reports should be applied in distress modeling. The results have implications for the use of soft information in credit risk modeling in the context of the Chinese market, and such applications can be further explored in other areas of operational research.

Notes

  1. The Python code is available at:

    https://github.com/wu-maud/textual-analysis/tree/master/MDAFDPre

Acknowledgements

The authors gratefully acknowledge funding support from the National Natural Science Foundation of China (Grant Nos. 71703162 and 71901230).

Author information

Corresponding author

Correspondence to Xiao Yao.

Ethics declarations

Conflict of interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Appendix A: Example sentences with sentiment labels in MD&A sections and audit reports

| Segment | Sentiment | Example sentences |
|---|---|---|
| MD&A sections | Positive | It is planned to actively extend the intelligent government services business in Shanghai and Guangxi Province |
| MD&A sections | Positive | Sales of social security card chip has experienced substantial growing in 2017 due to the policy adjustment in the short term, and more efforts should be made for the competition in the market of the third-generation social security card chip |
| MD&A sections | Neutral | We will make an overall plan to connect all sections together hierarchically and horizontally |
| MD&A sections | Neutral | We aim to strengthen the cash management and explore more financing opportunities |
| MD&A sections | Negative | Deferred income has experienced an increase of 15,057,000 yuan due to the delay of the settlement of national scientific funding |
| MD&A sections | Negative | Because the corporate has initiated the procedure of material asset restructuring since 2013, and thus high service fee is incurred |
| MD&A sections | Negative | Iron ore is the main raw materials of steel products, of which the price has strong correlation with national economic structure, economic cycle and industrialisation level, and the demand of iron ore is highly dependent with importation without possession of pricing power which is also impacted by ocean freight |
| Audit reports | Positive | In our opinion, the financial statements give a true and fair view of the financial position of the corporation at 31 December 2012, and the corporate operations and cash flows in 2012 in accordance with Chinese Financial Reporting Standards |
| Audit reports | Neutral | We believe that the audit evidence we have obtained is sufficient and appropriate to provide a basis for our audit opinion |
| Audit reports | Neutral | In making those risk assessments, the auditor considers internal control relevant to the entity’s preparation and true and fair presentation of the financial statements in order to design audit procedures that are appropriate in the circumstances, but not for the purpose of expressing an opinion on the effectiveness of the entity’s internal control |
| Audit reports | Neutral | The property management incomes are the main sources of the firm profit in 2013 |
| Audit reports | Negative | Until 31 December 2017, the receivables of Shanghai Putian accumulates to 748,956,944 yuan, in that its obligor (Zhejiang Dawei Co., Ltd.) has filed bankruptcy since July 2017 |
| Audit reports | Negative | The corporate has filed an accusation to Liyang Industrial Development Co.,Ltd. due to the dispute in the contract of Tianmu Lake Hotel decoration |
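
Sentence-level labels such as those in Appendix A have to be combined into a single score per document before they can enter a distress model. The Python sketch below shows one simple way this could be done. The net-tone formula (positive count minus negative count, divided by the number of labelled sentences) and the name `net_sentiment_score` are illustrative assumptions; this page does not state how the authors aggregate their scores.

```python
from collections import Counter
from typing import Iterable

VALID_LABELS = {"positive", "neutral", "negative"}


def net_sentiment_score(labels: Iterable[str]) -> float:
    """Hypothetical aggregation: (positives - negatives) / total sentences.

    Returns a value in [-1, 1], or 0.0 for a document with no sentences.
    This is an assumed formula, not the one used in the paper.
    """
    counts = Counter(labels)
    unknown = set(counts) - VALID_LABELS
    if unknown:
        raise ValueError(f"unexpected labels: {sorted(unknown)}")
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return (counts["positive"] - counts["negative"]) / total


# Toy inputs whose label counts mirror the Appendix A examples only;
# they are far too small to say anything about real reports.
mda_labels = ["positive"] * 2 + ["neutral"] * 2 + ["negative"] * 3
audit_labels = ["positive"] * 1 + ["neutral"] * 3 + ["negative"] * 2

mda_score = net_sentiment_score(mda_labels)
audit_score = net_sentiment_score(audit_labels)
```

Under this assumed scheme, the two resulting numbers would play the role of the "MDA score" and "Audit score" inputs. A probability-weighted average of classifier outputs would be an equally plausible alternative.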

Appendix B: Full list of candidate variables

| No. | Variables | No. | Variables |
|---|---|---|---|
| 1 | MDA score | 28 | Current assets to turnover |
| 2 | Audit score | 29 | Long-term assets to turnover |
| 3 | Working capital to total assets | 30 | Net assets to turnover |
| 4 | Retained earnings to total assets | 31 | Account payable to turnover |
| 5 | Earnings before interest and taxes to total assets | 32 | Working capital to turnover |
| 6 | Market equity to total liabilities | 33 | Cash to turnover |
| 7 | Total sales to total assets | 34 | Operating revenue per share |
| 8 | Net income to total assets | 35 | Earnings per share |
| 9 | Total liabilities to total assets | 36 | Retention ratio |
| 10 | Current ratio | 37 | Rate of capital accumulation |
| 11 | Quick ratio | 38 | Price to earnings |
| 12 | Cash ratio | 39 | Dividend to price |
| 13 | Owner’s equity to total assets | 40 | Market value to cash flow |
| 14 | Fixed assets to total assets | 41 | Market value to sales |
| 15 | Owner’s equity to fixed assets | 42 | Net cash flow per share |
| 16 | Current liability to total liabilities | 43 | Net cash flow from operating activities to net profit ratio |
| 17 | Long-term liability to total liabilities | 44 | Net assets per share |
| 18 | Owner’s equity to liability | 45 | Surplus reserves per share |
| 19 | Debt to tangible assets ratio | 46 | Undistributed profits per share |
| 20 | Gross operating profit ratio | 47 | Net cash flow from operating activities per share |
| 21 | Net profit to operating revenue | 48 | Net cash flow from investing activities per share |
| 22 | Net profit to current assets | 49 | Net cash flow from financing activities per share |
| 23 | Net profit to fixed assets | 50 | Current assets to owner’s equity |
| 24 | Net profit to owner’s equity | 51 | Price to book value |
| 25 | Return on invested capital | 52 | Relative size |
| 26 | Account receivables to turnover | 53 | Excess return |
| 27 | Inventory to turnover | 54 | Market volatility |
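
Several of the candidate variables in Appendix B are standard accounting ratios. As a rough illustration, the Python sketch below computes five of them (variables 3, 9, 10, 11 and 12) from balance-sheet items using common textbook definitions. The `BalanceSheet` fields, the function name and the sample figures are hypothetical, and the paper's exact variable definitions may differ.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class BalanceSheet:
    """Hypothetical container for the balance-sheet items used below."""
    cash: float                 # cash and cash equivalents
    inventory: float
    current_assets: float
    total_assets: float
    current_liabilities: float
    total_liabilities: float


def liquidity_and_leverage_ratios(bs: BalanceSheet) -> Dict[str, float]:
    """Compute a few Appendix B ratios using textbook definitions."""
    if bs.current_liabilities <= 0 or bs.total_assets <= 0:
        raise ValueError("current liabilities and total assets must be positive")
    working_capital = bs.current_assets - bs.current_liabilities
    return {
        "working_capital_to_total_assets": working_capital / bs.total_assets,
        "total_liabilities_to_total_assets": bs.total_liabilities / bs.total_assets,
        "current_ratio": bs.current_assets / bs.current_liabilities,
        # Quick ratio excludes inventory, the least liquid current asset.
        "quick_ratio": (bs.current_assets - bs.inventory) / bs.current_liabilities,
        "cash_ratio": bs.cash / bs.current_liabilities,
    }


# Made-up figures for illustration only.
sample = BalanceSheet(
    cash=50.0,
    inventory=30.0,
    current_assets=200.0,
    total_assets=500.0,
    current_liabilities=100.0,
    total_liabilities=250.0,
)
ratios = liquidity_and_leverage_ratios(sample)
```

In a modeling pipeline, a dictionary like `ratios` would form part of one firm-year feature row alongside the two sentiment scores.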

About this article

Cite this article

Huang, B., Yao, X., Luo, Y. et al. Improving financial distress prediction using textual sentiment of annual reports. Ann Oper Res 330, 457–484 (2023). https://doi.org/10.1007/s10479-022-04633-3
