Abstract
Textual data are increasingly used to predict firm performance. However, extracting useful signals from a continuously growing repository of financial reports and documents remains challenging, even for state-of-the-art machine learning and natural language processing (NLP) techniques. We propose a novel approach that automatically creates a word list from SEC filings (10-K and 8-K reports) using deep learning and NLP techniques, and we compare its performance against the widely used Loughran–McDonald sentiment dictionaries. We additionally analyze a corpus of 8-K and 10-K documents to evaluate their relative informativeness for predicting firm performance. Since 8-K filings provide corporate updates throughout a fiscal year, we compare their content against changes in 10-Ks between consecutive years to assess the incremental value of the information provided in these regulatory filings. Information effectiveness is examined by predicting six key financial indicators for a set of US banks using ridge regression. Our results support expanding sentiment dictionaries by automatically extracting meaning from text and highlight the benefits of utilizing update filings.
Notes
Not to be confused with Altman’s Z-score (Altman, 1968), which was built by a regression analysis to measure the creditworthiness of a firm in terms of a set of financial ratios.
References
Akhtar, M. S., Kumar, A., Ghosal, D., Ekbal, A., & Bhattacharyya, P. (2017). A multilayer perceptron based ensemble technique for fine-grained financial sentiment analysis. In Proceedings of the 2017 conference on empirical methods in natural language processing (pp. 540–546).
Altman, E. I. (1968). Financial ratios, discriminant analysis and the prediction of corporate bankruptcy. The Journal of Finance, 23(4), 589–609.
Araci, D. (2019). FinBERT: Financial sentiment analysis with pre-trained language models. arXiv preprint arXiv:1908.10063.
Brown, S. V., & Tucker, J. W. (2011). Large-sample evidence on firms’ year-over-year MD&A modifications. Journal of Accounting Research, 49(2), 309–346.
Chang, C. Y., Zhang, Y., Teng, Z., Bozanic, Z., & Ke, B. (2016). Measuring the information content of financial news. In Proceedings of COLING 2016, the 26th international conference on computational linguistics: Technical papers (pp. 3216–3225).
Chen, Y., Rabbani, R. M., Gupta, A., & Zaki, M. J. (2017). Comparative text analytics via topic modeling in banking. In 2017 IEEE symposium series on computational intelligence (SSCI) (pp. 1–8). IEEE.
Das, S. R., et al. (2014). Text and context: Language analytics in finance. Foundations and Trends® in Finance, 8(3), 145–261.
Dereli, N. & Saraçlar, M. (2019). Convolutional neural networks for financial text regression. In Proceedings of the 57th annual meeting of the association for computational linguistics: Student research workshop (pp. 331–337).
DeSola, V., Hanna, K., & Nonis, P. (2019). FinBERT: Pre-trained model on SEC filings for financial natural language tasks. University of California.
Devlin, J., Chang, M.-W., Lee, K., & Toutanova, K. (2018). BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805.
Duarte, J. J., Montenegro González, S., & Cruz, J. C. (2021). Predicting stock price falls using news data: Evidence from the Brazilian market. Computational Economics, 57(1), 311–340.
Emerson, S., Kennedy, R., O’Shea, L., & O’Brien, J. (2019). Trends and applications of machine learning in quantitative finance. In 8th international conference on economics and finance research (ICEFR 2019).
Gupta, A. & Owusu, A. (2019). Identifying the risk culture of banks using machine learning. Available at SSRN 3441861.
Huang, K.-W. (2010). Exploring the information contents of risk factors in SEC Form 10-K: A multi-label text classification application. Available at SSRN 1784527.
Khaidem, L., Saha, S., & Dey, S. R. (2016). Predicting the direction of stock market prices using random forest. arXiv preprint arXiv:1605.00003.
Khalil, F., & Pipa, G. (2022). Is deep-learning and natural language processing transcending the financial forecasting? Investigation through lens of news analytic process. Computational Economics, 60(1), 147–171.
Kogan, S., Levin, D., Routledge, B. R., Sagi, J. S., & Smith, N. A. (2009). Predicting risk from financial reports with regression. In Proceedings of human language technologies: The 2009 annual conference of the North American chapter of the association for computational linguistics (pp. 272–280).
Lee, J. & Lee, H. (2008). Predicting corporate 8-K content using machine learning techniques. Graduate School of Business Stanford University.
Leidner, J. L. & Schilder, F. (2010). Hunting for the black swan: risk mining from text. In Proceedings of the ACL 2010 system demonstrations (pp. 54–59). Association for Computational Linguistics.
Loughran, T., & McDonald, B. (2011). When is a liability not a liability? Textual analysis, dictionaries, and 10-Ks. The Journal of Finance, 66(1), 35–65.
Loughran, T., & McDonald, B. (2016). Textual analysis in accounting and finance: A survey. Journal of Accounting Research, 54(4), 1187–1230.
Mikolov, T., Chen, K., Corrado, G., & Dean, J. (2013). Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781.
Nopp, C. & Hanbury, A. (2015). Detecting risks in the banking system by sentiment analysis. In Proceedings of the 2015 conference on empirical methods in natural language processing (pp. 591–600).
Nousi, P., Tsantekidis, A., Passalis, N., Ntakaris, A., Kanniainen, J., Tefas, A., Gabbouj, M., & Iosifidis, A. (2019). Machine learning for forecasting mid-price movements using limit order book data. IEEE Access, 7, 64722–64736.
Peters, M. E., Neumann, M., Iyyer, M., Gardner, M., Clark, C., Lee, K., & Zettlemoyer, L. (2018). Deep contextualized word representations. arXiv preprint arXiv:1802.05365.
Radford, A., Narasimhan, K., Salimans, T., & Sutskever, I. (2018). Improving language understanding by generative pre-training. https://s3-us-west-2.amazonaws.com/openai-assets/research-covers/language-unsupervised/language_understanding_paper.pdf.
Rawte, V., Gupta, A., & Zaki, M. J. (2018). Analysis of year-over-year changes in risk factors disclosure in 10-K filings. In Proceedings of the fourth international workshop on data science for macro-modeling with financial and economic datasets (pp. 1–4).
Roy, A. D. (1952). Safety first and the holding of assets. Econometrica, 20(3), 431–449.
Sardelich, M. & Manandhar, S. (2018). Multimodal deep learning for short-term stock volatility prediction. arXiv preprint arXiv:1812.10479.
Sedinkina, M., Breitkopf, N., & Schütze, H. (2019). Automatic domain adaptation outperforms manual domain adaptation for predicting financial outcomes. In Proceedings of the 57th annual meeting of the association for computational linguistics (pp. 346–359).
Sennrich, R., Haddow, B., & Birch, A. (2015). Neural machine translation of rare words with subword units. arXiv preprint arXiv:1508.07909.
Tetlock, P. C., Saar-Tsechansky, M., & Macskassy, S. (2008). More than words: Quantifying language to measure firms’ fundamentals. The Journal of Finance, 63(3), 1437–1467.
Theil, C. K., Štajner, S., & Stuckenschmidt, H. (2018). Word embeddings-based uncertainty detection in financial disclosures. In Proceedings of the first workshop on economics and natural language processing (pp. 32–37).
Theil, C. K., Štajner, S., & Stuckenschmidt, H. (2020). Explaining financial uncertainty through specialized word embeddings. ACM Transactions on Data Science, 1(1), 1–19.
Tsai, M.-F. & Wang, C.-J. (2014). Financial keyword expansion via continuous word vector representations. In Proceedings of the 2014 conference on empirical methods in natural language processing (EMNLP) (pp. 1453–1458).
Tsai, M.-F., & Wang, C.-J. (2017). On the risk prediction and analysis of soft information in finance reports. European Journal of Operational Research, 257(1), 243–250.
Tsai, M.-F., Wang, C.-J., & Chien, P.-C. (2016). Discovering finance keywords via continuous-space language models. ACM Transactions on Management Information Systems (TMIS), 7(3), 1–17.
Tsantekidis, A., Passalis, N., Tefas, A., Kanniainen, J., Gabbouj, M., & Iosifidis, A. (2017). Forecasting stock prices from the limit order book using convolutional neural networks. In 2017 IEEE 19th conference on business informatics (CBI) (Vol. 1, pp. 7–12). IEEE.
Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, Ł., & Polosukhin, I. (2017). Attention is all you need. In Advances in neural information processing systems (pp. 5998–6008).
Wang, Y. & Ni, X. S. (2019). A XGBoost risk model via feature selection and Bayesian hyper-parameter optimization. arXiv preprint arXiv:1901.08433.
Yang, L., Zhang, Z., Xiong, S., Wei, L., Ng, J., Xu, L., & Dong, R. (2018). Explainable text-driven neural network for stock prediction. In 2018 5th IEEE international conference on cloud computing and intelligence systems (CCIS) (pp. 441–445). IEEE.
Zaki, M. J., & Meira, W., Jr. (2020). Data mining and machine learning: Fundamental concepts and algorithms. Cambridge University Press.
Zhai, S. S. & Zhang, Z. D. (2019). Forecasting firm material events from 8-K reports. In Proceedings of the second workshop on economics and natural language processing (pp. 22–30).
Acknowledgements
This work was supported in part by NSF Award III-1738895.
Funding
This study was funded by NSF Award III-1738895.
Ethics declarations
Conflict of interest
All authors certify that they have no affiliations with or involvement in any organization or entity with any financial interest or non-financial interest in the subject matter or materials discussed in this manuscript. The authors have no competing interests.
Research involving Human Participants and/or Animals
Not applicable.
Informed consent
Not applicable.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Gupta, A., Rawte, V. & Zaki, M.J. Predicting Firm Financial Performance from SEC Filing Changes Using Automatically Generated Dictionary. Comput Econ (2023). https://doi.org/10.1007/s10614-023-10443-x
DOI: https://doi.org/10.1007/s10614-023-10443-x