Predicting Firm Financial Performance from SEC Filing Changes Using Automatically Generated Dictionary

Computational Economics

Abstract

Textual data are increasingly used to predict firm performance. However, extracting useful signals from a continuously growing repository of financial reports and documents remains challenging, even with state-of-the-art machine learning and natural language processing (NLP) techniques. We propose a novel approach that automatically creates a word list from SEC filings (10-K and 8-K reports) using deep learning and NLP techniques, and we compare its performance against the widely used Loughran–McDonald sentiment dictionaries. We additionally analyze a corpus of 8-K and 10-K documents to evaluate their relative informativeness for firm performance prediction. Since 8-K filings provide corporate updates throughout a fiscal year, we compare their content against changes in 10-Ks between consecutive years to assess the incremental value of the information provided in these regulatory filings. Information effectiveness is examined by predicting six key financial indicators for a set of US banks using ridge regression. Our results support expanding sentiment dictionaries by automatically extracting meaning from text and highlight the benefits of using update filings.
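As a rough illustration of the evaluation setup described in the abstract, the following minimal sketch builds dictionary-based features from the year-over-year change between two filings and fits a ridge regression to a financial indicator. It is not the authors' pipeline: the word list, the function names (`added_tokens`, `dictionary_features`, `ridge_fit`, `ridge_predict`), the whitespace tokenizer, and the toy data are all assumptions made for illustration, and the paper's word list is generated with deep learning techniques that are not reproduced here.

```python
from collections import Counter

import numpy as np

# Hypothetical toy word list; the paper's generated dictionary is not reproduced here.
WORD_LIST = ["loss", "risk", "growth"]


def added_tokens(prev_text, curr_text):
    """Count tokens that appear more often in this year's filing than in last year's."""
    prev = Counter(prev_text.lower().split())
    curr = Counter(curr_text.lower().split())
    return curr - prev  # Counter subtraction drops zero and negative counts


def dictionary_features(change_counts, word_list=WORD_LIST):
    """Share of the changed tokens that falls on each dictionary word."""
    total = sum(change_counts.values()) or 1  # avoid dividing by zero when nothing changed
    return [change_counts[w] / total for w in word_list]


def ridge_fit(X, y, alpha=1.0):
    """Closed-form ridge regression with an unpenalized intercept."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc = X - x_mean  # center features so the intercept is not shrunk
    A = Xc.T @ Xc + alpha * np.eye(X.shape[1])
    w = np.linalg.solve(A, Xc.T @ (y - y_mean))
    b = y_mean - x_mean @ w
    return w, b


def ridge_predict(X, w, b):
    """Apply the fitted linear model to new feature rows."""
    return np.asarray(X, dtype=float) @ w + b


# Made-up (previous filing, current filing, indicator) triples, for illustration only.
toy_filings = [
    ("steady growth", "steady growth growth", 0.8),
    ("low risk", "low risk risk loss", -0.5),
    ("stable", "stable loss loss", -0.9),
]
X = [dictionary_features(added_tokens(p, c)) for p, c, _ in toy_filings]
y = [indicator for _, _, indicator in toy_filings]
w, b = ridge_fit(X, y, alpha=1.0)
predictions = ridge_predict(X, w, b)
```

The penalty `alpha` shrinks the word weights toward zero, which matters when the word list is large relative to the number of firm-year observations. In practice one would likely use a library implementation such as scikit-learn's `Ridge` and choose `alpha` by cross-validation.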

Notes

  1. http://www.wjh.harvard.edu/inquirer.

  2. https://github.com/google-research/bert.

  3. https://github.com/hanxiao/bert-as-service.

  4. https://www.nltk.org/

  5. https://scikit-learn.org/

  6. https://pypi.org/project/beautifulsoup4/.

  7. https://sraf.nd.edu/textual-analysis/resources/.

  8. Not to be confused with Altman’s Z-score (Altman, 1968), which is built by a regression analysis to measure the creditworthiness of a firm from a set of financial ratios.

References

  • Akhtar, M. S., Kumar, A., Ghosal, D., Ekbal, A., & Bhattacharyya, P. (2017). A multilayer perceptron based ensemble technique for fine-grained financial sentiment analysis. In Proceedings of the 2017 conference on empirical methods in natural language processing (pp. 540–546).

  • Altman, E. I. (1968). Financial ratios, discriminant analysis and the prediction of corporate bankruptcy. The Journal of Finance, 23(4), 589–609.

  • Araci, D. (2019). FinBERT: Financial sentiment analysis with pre-trained language models. arXiv preprint arXiv:1908.10063.

  • Brown, S. V., & Tucker, J. W. (2011). Large-sample evidence on firms’ year-over-year MD&A modifications. Journal of Accounting Research, 49(2), 309–346.

  • Chang, C. Y., Zhang, Y., Teng, Z., Bozanic, Z., & Ke, B. (2016). Measuring the information content of financial news. In Proceedings of COLING 2016, the 26th international conference on computational linguistics: Technical papers (pp. 3216–3225).

  • Chen, Y., Rabbani, R. M., Gupta, A., & Zaki, M. J. (2017). Comparative text analytics via topic modeling in banking. In 2017 IEEE symposium series on computational intelligence (SSCI) (pp. 1–8). IEEE.

  • Das, S. R., et al. (2014). Text and context: Language analytics in finance. Foundations and Trends® in Finance, 8(3), 145–261.

  • Dereli, N. & Saraçlar, M. (2019). Convolutional neural networks for financial text regression. In Proceedings of the 57th annual meeting of the association for computational linguistics: Student research workshop (pp. 331–337).

  • DeSola, V., Hanna, K., & Nonis, P. (2019). FinBERT: Pre-trained model on SEC filings for financial natural language tasks. University of California.

  • Devlin, J., Chang, M.-W., Lee, K., & Toutanova, K. (2018). BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805.

  • Duarte, J. J., Montenegro González, S., & Cruz, J. C. (2021). Predicting stock price falls using news data: Evidence from the Brazilian market. Computational Economics, 57(1), 311–340.

  • Emerson, S., Kennedy, R., O’Shea, L., & O’Brien, J. (2019). Trends and applications of machine learning in quantitative finance. In 8th international conference on economics and finance research (ICEFR 2019).

  • Gupta, A. & Owusu, A. (2019). Identifying the risk culture of banks using machine learning. Available at SSRN 3441861.

  • Huang, K.-W. (2010). Exploring the information contents of risk factors in SEC Form 10-K: A multi-label text classification application. Available at SSRN 1784527.

  • Khaidem, L., Saha, S., & Dey, S. R. (2016). Predicting the direction of stock market prices using random forest. arXiv preprint arXiv:1605.00003.

  • Khalil, F., & Pipa, G. (2022). Is deep-learning and natural language processing transcending the financial forecasting? Investigation through lens of news analytic process. Computational Economics, 60(1), 147–171.

  • Kogan, S., Levin, D., Routledge, B. R., Sagi, J. S., & Smith, N. A. (2009). Predicting risk from financial reports with regression. In Proceedings of human language technologies: The 2009 annual conference of the North American chapter of the association for computational linguistics (pp. 272–280).

  • Lee, J. & Lee, H. (2008). Predicting corporate 8-K content using machine learning techniques. Graduate School of Business Stanford University.

  • Leidner, J. L. & Schilder, F. (2010). Hunting for the black swan: risk mining from text. In Proceedings of the ACL 2010 system demonstrations (pp. 54–59). Association for Computational Linguistics.

  • Loughran, T., & McDonald, B. (2011). When is a liability not a liability? Textual analysis, dictionaries, and 10-Ks. The Journal of Finance, 66(1), 35–65.

  • Loughran, T., & McDonald, B. (2016). Textual analysis in accounting and finance: A survey. Journal of Accounting Research, 54(4), 1187–1230.

  • Mikolov, T., Chen, K., Corrado, G., & Dean, J. (2013). Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781.

  • Nopp, C. & Hanbury, A. (2015). Detecting risks in the banking system by sentiment analysis. In Proceedings of the 2015 conference on empirical methods in natural language processing (pp. 591–600).

  • Nousi, P., Tsantekidis, A., Passalis, N., Ntakaris, A., Kanniainen, J., Tefas, A., Gabbouj, M., & Iosifidis, A. (2019). Machine learning for forecasting mid-price movements using limit order book data. IEEE Access, 7, 64722–64736.

  • Peters, M. E., Neumann, M., Iyyer, M., Gardner, M., Clark, C., Lee, K., & Zettlemoyer, L. (2018). Deep contextualized word representations. arXiv preprint arXiv:1802.05365.

  • Radford, A., Narasimhan, K., Salimans, T., & Sutskever, I. (2018). Improving language understanding by generative pre-training. https://s3-us-west-2.amazonaws.com/openai-assets/research-covers/language-unsupervised/language_understanding_paper.pdf.

  • Rawte, V., Gupta, A., & Zaki, M. J. (2018). Analysis of year-over-year changes in risk factors disclosure in 10-K filings. In Proceedings of the fourth international workshop on data science for macro-modeling with financial and economic datasets (pp. 1–4).

  • Roy, A. D. (1952). Safety first and the holding of assets. Econometrica: Journal of the Econometric Society (pp. 431–449).

  • Sardelich, M. & Manandhar, S. (2018). Multimodal deep learning for short-term stock volatility prediction. arXiv preprint arXiv:1812.10479.

  • Sedinkina, M., Breitkopf, N., & Schütze, H. (2019). Automatic domain adaptation outperforms manual domain adaptation for predicting financial outcomes. In Proceedings of the 57th annual meeting of the association for computational linguistics (pp. 346–359).

  • Sennrich, R., Haddow, B., & Birch, A. (2015). Neural machine translation of rare words with subword units. arXiv preprint arXiv:1508.07909.

  • Tetlock, P. C., Saar-Tsechansky, M., & Macskassy, S. (2008). More than words: Quantifying language to measure firms’ fundamentals. The Journal of Finance, 63(3), 1437–1467.

  • Theil, C. K., Štajner, S., & Stuckenschmidt, H. (2018). Word embeddings-based uncertainty detection in financial disclosures. In Proceedings of the first workshop on economics and natural language processing (pp. 32–37).

  • Theil, C. K., Štajner, S., & Stuckenschmidt, H. (2020). Explaining financial uncertainty through specialized word embeddings. ACM Transactions on Data Science, 1(1), 1–19.

  • Tsai, M.-F. & Wang, C.-J. (2014). Financial keyword expansion via continuous word vector representations. In Proceedings of the 2014 conference on empirical methods in natural language processing (EMNLP) (pp. 1453–1458).

  • Tsai, M.-F., & Wang, C.-J. (2017). On the risk prediction and analysis of soft information in finance reports. European Journal of Operational Research, 257(1), 243–250.

  • Tsai, M.-F., Wang, C.-J., & Chien, P.-C. (2016). Discovering finance keywords via continuous-space language models. ACM Transactions on Management Information Systems (TMIS), 7(3), 1–17.

  • Tsantekidis, A., Passalis, N., Tefas, A., Kanniainen, J., Gabbouj, M., & Iosifidis, A. (2017). Forecasting stock prices from the limit order book using convolutional neural networks. In 2017 IEEE 19th conference on business informatics (CBI) (Vol. 1, pp. 7–12). IEEE.

  • Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, Ł., & Polosukhin, I. (2017). Attention is all you need. In Advances in neural information processing systems (pp. 5998–6008).

  • Wang, Y. & Ni, X. S. (2019). A XGBoost risk model via feature selection and Bayesian hyper-parameter optimization. arXiv preprint arXiv:1901.08433.

  • Yang, L., Zhang, Z., Xiong, S., Wei, L., Ng, J., Xu, L., & Dong, R. (2018). Explainable text-driven neural network for stock prediction. In 2018 5th IEEE international conference on cloud computing and intelligence systems (CCIS) (pp. 441–445). IEEE.

  • Zaki, M. J., & Meira, W., Jr. (2020). Data mining and machine learning: Fundamental concepts and algorithms. Cambridge University Press.

  • Zhai, S. S. & Zhang, Z. D. (2019). Forecasting firm material events from 8-K reports. In Proceedings of the second workshop on economics and natural language processing (pp. 22–30).

Acknowledgements

This work was supported in part by NSF Award III-1738895.

Funding

This study was funded by NSF Award III-1738895.

Author information

Corresponding author

Correspondence to Aparna Gupta.

Ethics declarations

Conflict of interest

All authors certify that they have no affiliations with or involvement in any organization or entity with any financial interest or non-financial interest in the subject matter or materials discussed in this manuscript. The authors have no competing interests.

Research involving Human Participants and/or Animals

Not applicable.

Informed consent

Not applicable.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Gupta, A., Rawte, V. & Zaki, M.J. Predicting Firm Financial Performance from SEC Filing Changes Using Automatically Generated Dictionary. Comput Econ (2023). https://doi.org/10.1007/s10614-023-10443-x

  • DOI: https://doi.org/10.1007/s10614-023-10443-x
