Predicting Firm Financial Performance from SEC Filing Changes Using Automatically Generated Dictionary

Gupta, Aparna; Rawte, Vipula; Zaki, Mohammed J.

doi:10.1007/s10614-023-10443-x

Published: 12 August 2023

Computational Economics (2023)

Abstract

Textual data are increasingly used to predict firm performance; however, extracting useful signals from a continuously growing repository of financial reports and documents is challenging, even for state-of-the-art machine learning and natural language processing (NLP) techniques. We propose a novel approach that automatically creates a word list from SEC filings (10-K and 8-K reports) using deep learning and NLP techniques, and we compare its performance against the widely used Loughran–McDonald sentiment dictionaries. We additionally analyze a corpus of 8-K and 10-K documents to evaluate their relative informativeness for firm performance prediction. Since 8-K filings provide corporate updates throughout a fiscal year, we compare their content against changes in 10-Ks between consecutive years to assess the incremental value of the information provided in these regulatory filings. Information effectiveness is examined by predicting six key financial indicators for a set of US banks using ridge regression. Our results support expanding sentiment dictionaries by automatically extracting meaning from text and highlight the benefits of using update filings.
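
As a rough illustration of the kind of pipeline the abstract describes, the sketch below scores each filing by the relative frequency of dictionary words and fits a ridge regression on those features. It is not the authors' implementation: the word list, filing snippets, target values, and function names (`term_frequencies`, `ridge_fit`, `ridge_predict`) are all hypothetical, and the small closed-form solver stands in for a library routine such as the ridge regression in scikit-learn.

```python
import re
from collections import Counter


def term_frequencies(text, word_list):
    """Return, for each dictionary word, its share of the tokens in `text`."""
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter(tokens)
    total = max(len(tokens), 1)  # avoid division by zero on empty text
    return [counts[w] / total for w in word_list]


def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def ridge_fit(X, y, alpha):
    """Closed-form ridge weights (X^T X + alpha I)^-1 X^T y, no intercept."""
    d = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) + (alpha if i == j else 0.0)
            for j in range(d)] for i in range(d)]
    xty = [sum(row[i] * t for row, t in zip(X, y)) for i in range(d)]
    return solve(xtx, xty)


def ridge_predict(X, w):
    """Linear prediction for each feature row."""
    return [sum(xi * wi for xi, wi in zip(row, w)) for row in X]


# Hypothetical toy data: a two-word dictionary, three filing snippets,
# and made-up values of one financial indicator per filing.
WORD_LIST = ["loss", "growth"]
FILINGS = [
    "Net loss widened and impairment loss increased.",
    "Loan growth and deposit growth remained strong.",
    "Modest growth offset a small loss.",
]
TARGETS = [-0.5, 1.2, 0.3]

X = [term_frequencies(text, WORD_LIST) for text in FILINGS]
weights = ridge_fit(X, TARGETS, alpha=0.1)
print(weights)
print(ridge_predict(X, weights))
```

The penalty `alpha` shrinks the weights toward zero, which matters when the dictionary is large relative to the number of filings. Swapping `WORD_LIST` for an automatically generated list versus a fixed sentiment dictionary is one way such a comparison could be set up.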

Notes

http://www.wjh.harvard.edu/inquirer.
https://github.com/google-research/bert.
https://github.com/hanxiao/bert-as-service.
https://www.nltk.org/.
https://scikit-learn.org/.
https://pypi.org/project/beautifulsoup4/.
https://sraf.nd.edu/textual-analysis/resources/.
Not to be confused with Altman’s Z-score (Altman, 1968) built by a regression analysis to measure creditworthiness of any firm in terms of a set of financial ratios.

References

Akhtar, M. S., Kumar, A., Ghosal, D., Ekbal, A., & Bhattacharyya, P. (2017). A multilayer perceptron based ensemble technique for fine-grained financial sentiment analysis. In Proceedings of the 2017 conference on empirical methods in natural language processing (pp. 540–546).
Altman, E. I. (1968). Financial ratios, discriminant analysis and the prediction of corporate bankruptcy. The Journal of Finance, 23(4), 589–609.
Araci, D. (2019). FinBERT: Financial sentiment analysis with pre-trained language models. arXiv preprint arXiv:1908.10063.
Brown, S. V., & Tucker, J. W. (2011). Large-sample evidence on firms’ year-over-year MD&A modifications. Journal of Accounting Research, 49(2), 309–346.
Chang, C. Y., Zhang, Y., Teng, Z., Bozanic, Z., & Ke, B. (2016). Measuring the information content of financial news. In Proceedings of COLING 2016, the 26th international conference on computational linguistics: Technical papers (pp. 3216–3225).
Chen, Y., Rabbani, R. M., Gupta, A., & Zaki, M. J. (2017). Comparative text analytics via topic modeling in banking. In 2017 IEEE symposium series on computational intelligence (SSCI) (pp. 1–8). IEEE.
Das, S. R., et al. (2014). Text and context: Language analytics in finance. Foundations and Trends® in Finance, 8(3), 145–261.
Dereli, N. & Saraçlar, M. (2019). Convolutional neural networks for financial text regression. In Proceedings of the 57th annual meeting of the association for computational linguistics: Student research workshop (pp. 331–337).
DeSola, V., Hanna, K., & Nonis, P. (2019). FinBERT: Pre-trained model on SEC filings for financial natural language tasks. University of California.
Devlin, J., Chang, M.-W., Lee, K., & Toutanova, K. (2018). BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805.
Duarte, J. J., Montenegro González, S., & Cruz, J. C. (2021). Predicting stock price falls using news data: Evidence from the Brazilian market. Computational Economics, 57(1), 311–340.
Emerson, S., Kennedy, R., O’Shea, L., & O’Brien, J. (2019). Trends and applications of machine learning in quantitative finance. In 8th international conference on economics and finance research (ICEFR 2019).
Gupta, A. & Owusu, A. (2019). Identifying the risk culture of banks using machine learning. Available at SSRN 3441861.
Huang, K.-W. (2010). Exploring the information contents of risk factors in SEC Form 10-K: A multi-label text classification application. Available at SSRN 1784527.
Khaidem, L., Saha, S., & Dey, S. R. (2016). Predicting the direction of stock market prices using random forest. arXiv preprint arXiv:1605.00003.
Khalil, F., & Pipa, G. (2022). Is deep-learning and natural language processing transcending the financial forecasting? Investigation through lens of news analytic process. Computational Economics, 60(1), 147–171.
Kogan, S., Levin, D., Routledge, B. R., Sagi, J. S., & Smith, N. A. (2009). Predicting risk from financial reports with regression. In Proceedings of human language technologies: The 2009 annual conference of the North American chapter of the association for computational linguistics (pp. 272–280).
Lee, J. & Lee, H. (2008). Predicting corporate 8-K content using machine learning techniques. Graduate School of Business, Stanford University.
Leidner, J. L. & Schilder, F. (2010). Hunting for the black swan: risk mining from text. In Proceedings of the ACL 2010 system demonstrations (pp. 54–59). Association for Computational Linguistics.
Loughran, T., & McDonald, B. (2011). When is a liability not a liability? Textual analysis, dictionaries, and 10-Ks. The Journal of Finance, 66(1), 35–65.
Loughran, T., & McDonald, B. (2016). Textual analysis in accounting and finance: A survey. Journal of Accounting Research, 54(4), 1187–1230.
Mikolov, T., Chen, K., Corrado, G., & Dean, J. (2013). Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781.
Nopp, C. & Hanbury, A. (2015). Detecting risks in the banking system by sentiment analysis. In Proceedings of the 2015 conference on empirical methods in natural language processing (pp. 591–600).
Nousi, P., Tsantekidis, A., Passalis, N., Ntakaris, A., Kanniainen, J., Tefas, A., Gabbouj, M., & Iosifidis, A. (2019). Machine learning for forecasting mid-price movements using limit order book data. IEEE Access, 7, 64722–64736.
Peters, M. E., Neumann, M., Iyyer, M., Gardner, M., Clark, C., Lee, K., & Zettlemoyer, L. (2018). Deep contextualized word representations. arXiv preprint arXiv:1802.05365.
Radford, A., Narasimhan, K., Salimans, T., & Sutskever, I. (2018). Improving language understanding by generative pre-training. https://s3-us-west-2.amazonaws.com/openai-assets/research-covers/language-unsupervised/language_understanding_paper.pdf.
Rawte, V., Gupta, A., & Zaki, M. J. (2018). Analysis of year-over-year changes in risk factors disclosure in 10-K filings. In Proceedings of the fourth international workshop on data science for macro-modeling with financial and economic datasets (pp. 1–4).
Roy, A. D. (1952). Safety first and the holding of assets. Econometrica: Journal of the Econometric Society (pp. 431–449).
Sardelich, M. & Manandhar, S. (2018). Multimodal deep learning for short-term stock volatility prediction. arXiv preprint arXiv:1812.10479.
Sedinkina, M., Breitkopf, N., & Schütze, H. (2019). Automatic domain adaptation outperforms manual domain adaptation for predicting financial outcomes. In Proceedings of the 57th annual meeting of the association for computational linguistics (pp. 346–359).
Sennrich, R., Haddow, B., & Birch, A. (2015). Neural machine translation of rare words with subword units. arXiv preprint arXiv:1508.07909.
Tetlock, P. C., Saar-Tsechansky, M., & Macskassy, S. (2008). More than words: Quantifying language to measure firms’ fundamentals. The Journal of Finance, 63(3), 1437–1467.
Theil, C. K., Štajner, S., & Stuckenschmidt, H. (2018). Word embeddings-based uncertainty detection in financial disclosures. In Proceedings of the first workshop on economics and natural language processing (pp. 32–37).
Theil, C. K., Štajner, S., & Stuckenschmidt, H. (2020). Explaining financial uncertainty through specialized word embeddings. ACM Transactions on Data Science, 1(1), 1–19.
Tsai, M.-F. & Wang, C.-J. (2014). Financial keyword expansion via continuous word vector representations. In Proceedings of the 2014 conference on empirical methods in natural language processing (EMNLP) (pp. 1453–1458).
Tsai, M.-F., & Wang, C.-J. (2017). On the risk prediction and analysis of soft information in finance reports. European Journal of Operational Research, 257(1), 243–250.
Tsai, M.-F., Wang, C.-J., & Chien, P.-C. (2016). Discovering finance keywords via continuous-space language models. ACM Transactions on Management Information Systems (TMIS), 7(3), 1–17.
Tsantekidis, A., Passalis, N., Tefas, A., Kanniainen, J., Gabbouj, M., & Iosifidis, A. (2017). Forecasting stock prices from the limit order book using convolutional neural networks. In 2017 IEEE 19th conference on business informatics (CBI) (Vol. 1, pp. 7–12). IEEE.
Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, Ł., & Polosukhin, I. (2017). Attention is all you need. In Advances in neural information processing systems (pp. 5998–6008).
Wang, Y. & Ni, X. S. (2019). A XGBoost risk model via feature selection and Bayesian hyper-parameter optimization. arXiv preprint arXiv:1901.08433.
Yang, L., Zhang, Z., Xiong, S., Wei, L., Ng, J., Xu, L., & Dong, R. (2018). Explainable text-driven neural network for stock prediction. In 2018 5th IEEE international conference on cloud computing and intelligence systems (CCIS) (pp. 441–445). IEEE.
Zaki, M. J., & Meira, W., Jr. (2020). Data mining and machine learning: Fundamental concepts and algorithms. Cambridge University Press.
Zhai, S. S. & Zhang, Z. D. (2019). Forecasting firm material events from 8-K reports. In Proceedings of the second workshop on economics and natural language processing (pp. 22–30).

Acknowledgements

This work was supported in part by NSF Award III-1738895.

Funding

This study was funded by NSF Award III-1738895.

Author information

Authors and Affiliations

Lally School of Management, Rensselaer Polytechnic Institute, Troy, NY, 12180, US
Aparna Gupta
Computer Science Department, Rensselaer Polytechnic Institute, Troy, NY, 12180, US
Vipula Rawte & Mohammed J. Zaki

Corresponding author

Correspondence to Aparna Gupta.

Ethics declarations

Conflict of interest

All authors certify that they have no affiliations with or involvement in any organization or entity with any financial interest or non-financial interest in the subject matter or materials discussed in this manuscript. The authors have no competing interests.

Research involving Human Participants and/or Animals

Not applicable.

Informed consent

Not applicable.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Gupta, A., Rawte, V. & Zaki, M.J. Predicting Firm Financial Performance from SEC Filing Changes Using Automatically Generated Dictionary. Comput Econ (2023). https://doi.org/10.1007/s10614-023-10443-x

Accepted: 26 July 2023
Published: 12 August 2023
DOI: https://doi.org/10.1007/s10614-023-10443-x
