
Financial causal sentence recognition based on BERT-CNN text classification


Studying the causality expressed in financial texts can reveal underlying regularities of economic activity, such as “factors promoting stable and healthy economic development,” “The central bank's use of the loan window to issue money will increase the probability of inflation,” “The consequence of overcapacity is a decline in product prices,” and so on. Causal sentence recognition usually comprises two sub-tasks: designing rules or templates to find candidate causal sentences, and designing a classifier that decides which candidates are truly causal. This article first examines the complex sentence patterns found in financial review texts (multiple causes with one effect, one cause with multiple effects, and multiple causes with multiple effects) and provides a relatively complete set of candidate causal sentence identification rules that covers both simple and complex causal sentences. A BERT-CNN (Bidirectional Encoder Representations from Transformers-Convolutional Neural Networks) combination model is then proposed for classifying the candidate causal sentences. On the one hand, a CNN structure is added to the task-specific layer of the BERT model to capture important local information in the text. On the other hand, to make better use of the self-attention mechanism, the local text representation and the output of BERT are fed together into the multi-layer transformer encoder. A complete representation of the text is finally obtained through a single-layer transformer encoder. Experimental results show that the model significantly outperforms the most advanced baseline model, with an F1 improvement of 5.31 points over previous analyzers.
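The first sub-task, finding candidate causal sentences with rules, can be illustrated with a minimal sketch. The cue phrases, the `CAUSAL_CUES` tuple, and the `find_candidates` function below are hypothetical and chosen only for illustration; the article's actual rule set and prompt words are not reproduced in this excerpt, and the classification step (the BERT-CNN model) is not shown.

```python
import re

# Hypothetical English cue phrases, for illustration only.
# The article's own identification rules are not given in this excerpt.
CAUSAL_CUES = (
    "because",
    "due to",
    "leads to",
    "lead to",
    "results in",
    "result in",
    "consequence of",
    "will increase",
)

# One case-insensitive pattern that matches any cue as a whole phrase.
CUE_PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(cue) for cue in CAUSAL_CUES) + r")\b",
    re.IGNORECASE,
)


def find_candidates(sentences):
    """Return (sentence, matched_cues) pairs for sentences with at least one cue."""
    candidates = []
    for sentence in sentences:
        cues = [match.group(1).lower() for match in CUE_PATTERN.finditer(sentence)]
        if cues:
            candidates.append((sentence, cues))
    return candidates


if __name__ == "__main__":
    examples = [
        "The consequence of overcapacity is a decline in product prices.",
        "The central bank met on Tuesday.",
    ]
    for sentence, cues in find_candidates(examples):
        print(cues, "->", sentence)
```

A filter of this kind only proposes candidates. A sentence that contains a cue phrase is not necessarily causal, which is why the second sub-task passes the candidates to a classifier.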








This work was sponsored by the National Natural Science Foundation of China [grant numbers 61972184, 61562032, 61662027, 61762042], the Postgraduate Innovation Special Foundation of Jiangxi [grant number YC2017-B065], the Modern Agricultural Research Collaborative Innovation Project of Jiangxi [grant number JXXTCXQN201906], and the Key R&D Project of the Jiangxi Science and Technology Department [grant number 2019ACB60016].

Author information



Corresponding author

Correspondence to Chang-Xuan Wan.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.



See Table

Table 8 Verbs causal prompt words



About this article


Cite this article

Wan, CX., Li, B. Financial causal sentence recognition based on BERT-CNN text classification. J Supercomput (2021).



  • Text classification
  • Recognition of causality
  • BERT model