Leveraging statistical information in fine-grained financial sentiment analysis

Abstract

The recent development of deep learning-based natural language processing (NLP) methods has fostered many downstream applications in various fields. One such application, fine-grained financial sentiment analysis (FSA), aims to determine the sentiment orientation of a financial text, i.e., bullish or bearish, by predicting its polarity score, and it has been widely applied to stock-related opinion mining in the financial industry. FSA is challenging because large-scale labeled datasets are lacking and the task is domain-dependent. Previous works mainly focus on constructing and exploiting handcrafted lexicons that encode expert knowledge to enhance the semantic features used in decision making; such lexicons yield improvements but are expensive to acquire. This paper proposes a lightweight regression model that incorporates the statistical distribution of a term over the polarity range, say, between −1 and 1, to address the fine-grained FSA task. More concretely, we first count each word’s appearances in different polarity intervals and produce a statistic-based representation for each text, which an autoencoder then encodes as a corpus-level statistical feature vector. The resulting feature vector is integrated with the semantic feature vector in the regression model. Our experiments show that such a model produces significant improvements over the baseline models on two FSA subsets, i.e., news headlines and microblogs, without additional computational overhead. Furthermore, we observe that signs, which lexicon-based approaches have neglected, can play an important role in FSA.
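The counting step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the number of intervals (`n_bins=4`), the equal-width binning, the whitespace tokenization, the averaging of per-word distributions, the function names, and the toy data are all assumptions made for illustration. The autoencoder and the regression model are omitted.

```python
from collections import defaultdict


def bin_index(score, n_bins, lo=-1.0, hi=1.0):
    """Map a polarity score in [lo, hi] to one of n_bins equal-width intervals."""
    if not lo <= score <= hi:
        raise ValueError(f"score {score} outside [{lo}, {hi}]")
    width = (hi - lo) / n_bins
    # The upper bound hi is assigned to the last interval.
    return min(int((score - lo) / width), n_bins - 1)


def term_polarity_counts(texts, scores, n_bins=4):
    """Count, for every token, how often it appears in each polarity interval."""
    counts = defaultdict(lambda: [0] * n_bins)
    for text, score in zip(texts, scores):
        b = bin_index(score, n_bins)
        for token in text.lower().split():  # assumed: naive whitespace tokens
            counts[token][b] += 1
    return dict(counts)


def text_statistics(text, counts, n_bins=4):
    """Average the normalized interval distributions of the known tokens."""
    total = [0.0] * n_bins
    known = 0
    for token in text.lower().split():
        row = counts.get(token)
        if row is None:
            continue  # unseen token contributes nothing
        row_sum = sum(row)
        for i, c in enumerate(row):
            total[i] += c / row_sum
        known += 1
    return [v / known for v in total] if known else total


# Toy data, invented for illustration only.
texts = ["profit surges", "profit falls", "shares surge on profit"]
scores = [0.8, -0.6, 0.5]
counts = term_polarity_counts(texts, scores)
vector = text_statistics("profit surges", counts)
```

On this toy data, most of the mass of `vector` falls in the most bullish interval, because both words occur mainly in positively scored texts. In the approach the abstract describes, a statistic-based representation of this kind would then be compressed by an autoencoder before being combined with the semantic features.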

Notes

  1. https://stocktwits.com/

  2. https://twitter.com/

  3. http://finance.yahoo.com/

  4. https://fasttext.cc/docs/en/english-vectors.html

Acknowledgements

The work of this paper has been supported by the Hong Kong Research Grants Council under the General Research Fund (Project No. 15200021), the Lam Woo Research Fund (Project No. LWI20011) and the Faculty Research Grant (Project No. DB21B6) of Lingnan University, Hong Kong, the One-off Special Fund from Central and Faculty Fund in Support of Research from 2019/20 to 2021/22 (Project No. MIT02/19-20) and the Research Cluster Fund (Project No. RG 78/2019-2020R) of The Education University of Hong Kong. Lau’s work was supported by a grant from the Research Grants Council of the Hong Kong Special Administrative Region, China (Project No. CityU 11507219). Dian Zhang’s work was supported by NSFC (Project No. 61872247).

Author information

Corresponding author

Correspondence to Zongxi Li.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This article belongs to the Topical Collection: Special Issue on Web Intelligence = Artificial Intelligence in the Connected World
Guest Editors: Yuefeng Li, Amit Sheth, Athena Vakali, and Xiaohui Tao

About this article

Cite this article

Zhang, H., Li, Z., Xie, H. et al. Leveraging statistical information in fine-grained financial sentiment analysis. World Wide Web 25, 513–531 (2022). https://doi.org/10.1007/s11280-021-00993-1

  • DOI: https://doi.org/10.1007/s11280-021-00993-1

Keywords

  • Financial sentiment analysis
  • Sentiment analysis
  • Natural language processing
  • Information retrieval