Analyzing DistilBERT for Sentiment Classification of Banking Financial News

  • Conference paper
Intelligent Computing and Innovation on Data Science

Part of the book series: Lecture Notes in Networks and Systems (LNNS, volume 248)

Abstract

This paper introduces sentiment classification approaches for Indian banking, governmental, and global news. The study assesses a state-of-the-art deep contextual language representation, DistilBERT, and a traditional context-independent representation, TF-IDF, on multiclass (positive, negative, and neutral) sentiment classification of news events. The DistilBERT model is fine-tuned and its output is fed into four supervised machine learning classifiers: Random Forest, Decision Tree, Logistic Regression, and Linear SVC. The same classifiers are also trained on the baseline TF-IDF features. The findings indicate that DistilBERT can transfer basic semantic understanding to further domains and leads to greater accuracy than the baseline TF-IDF. The results also suggest that Random Forest with DistilBERT achieves higher accuracy than the other ML classifiers. Random Forest with DistilBERT achieves 78% accuracy, which is 7% more than with TF-IDF.
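To make the "context-independent" label on the TF-IDF baseline concrete, the following is a minimal sketch in plain Python, not the authors' implementation. It assumes whitespace tokenization, raw term counts for TF, and a smoothed IDF, which is one common variant; the abstract does not state the exact weighting used. The function name `tfidf_vectors` and the three headlines are hypothetical and invented for illustration only.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return (vocabulary, TF-IDF vectors) for whitespace-tokenized documents."""
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    df = Counter(tok for toks in tokenized for tok in set(toks))
    # Smoothed IDF (an assumed variant; the paper's weighting is not specified).
    idf = {t: math.log((1 + n_docs) / (1 + df[t])) + 1.0 for t in vocab}
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)  # raw term counts serve as TF
        vectors.append([counts[t] * idf[t] for t in vocab])
    return vocab, vectors


# Hypothetical headlines, not taken from the paper's dataset.
headlines = [
    "bank shares rise after strong quarterly profit",
    "bank shares fall after fraud probe",
    "central bank keeps policy rate unchanged",
]
vocab, vectors = tfidf_vectors(headlines)

# "shares" receives the same weight in the first two headlines even though
# the surrounding words point to opposite sentiment.
i = vocab.index("shares")
print(vectors[0][i] == vectors[1][i])
```

In this sketch a term's weight depends only on its counts, never on neighboring words, which is the limitation a contextual representation such as DistilBERT is meant to address. Either kind of feature vector could then be passed to a classifier such as Random Forest; that step is not shown here.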

Author information

Corresponding author

Correspondence to Sahil Verma.

Copyright information

© 2021 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

Cite this paper

Dogra, V., Singh, A., Verma, S., Kavita, Jhanjhi, N.Z., Talib, M.N. (2021). Analyzing DistilBERT for Sentiment Classification of Banking Financial News. In: Peng, SL., Hsieh, SY., Gopalakrishnan, S., Duraisamy, B. (eds) Intelligent Computing and Innovation on Data Science. Lecture Notes in Networks and Systems, vol 248. Springer, Singapore. https://doi.org/10.1007/978-981-16-3153-5_53
