Sentiment analysis and machine learning in finance: a comparison of methods and models on one million messages

Renault, Thomas

doi:10.1007/s42521-019-00014-x

Original Article
Published: 18 September 2019

Volume 2, pages 1–13 (2020)
Digital Finance

Thomas Renault ORCID: orcid.org/0000-0003-4838-4755¹

Abstract

We use a large dataset of one million messages sent on the microblogging platform StockTwits to evaluate the performance of a wide range of preprocessing methods and machine learning algorithms for sentiment analysis in finance. We find that adding bigrams and emojis significantly improves sentiment classification performance. However, more complex and time-consuming machine learning methods, such as random forests or neural networks, do not improve the accuracy of the classification. We also provide empirical evidence that the preprocessing method and the size of the dataset have a strong impact on the correlation between investor sentiment and stock returns. While investor sentiment and stock returns are highly correlated, we do not find that investor sentiment derived from messages sent on social media helps in predicting the returns of large-capitalization stocks at a daily frequency.
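
To make the preprocessing choice concrete, the following is a minimal sketch of a feature extractor that keeps emojis as tokens and adds bigrams to the unigram features. It is not the article's code: the function names, the regular expression, and the emoji ranges (which are not exhaustive) are assumptions chosen for illustration.

```python
import re

# Hypothetical token pattern: plain words, cashtags such as $AAPL,
# and single emoji characters from a few common Unicode ranges.
TOKEN_RE = re.compile(
    r"\$?[a-z][a-z0-9']*"
    r"|[\U0001F300-\U0001FAFF\u2600-\u27BF]"
)


def tokenize(message):
    """Lowercase the message and split it into word, cashtag, and emoji tokens."""
    return TOKEN_RE.findall(message.lower())


def extract_features(message, use_bigrams=True):
    """Return unigram features, optionally followed by adjacent-token bigrams."""
    tokens = tokenize(message)
    features = list(tokens)
    if use_bigrams:
        features += [f"{a} {b}" for a, b in zip(tokens, tokens[1:])]
    return features


if __name__ == "__main__":
    print(extract_features("Buying $AAPL 🚀"))
    print(extract_features("not good"))
```

With bigrams enabled, a message such as "not good" yields the feature "not good" in addition to "not" and "good", which lets a bag-of-words classifier see the negation as a single unit.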

Notes

We do not explore the impact of word embeddings (GloVe, Word2Vec) in this article. We leave this for future research.
The accuracy only increases by 0.31 percentage points when the size of the dataset increases from 500,000 messages to 1 million messages.
Natural Language Toolkit: https://www.nltk.org/.
https://scikit-learn.org/stable/.
https://scikit-learn.org/stable/modules/generated/sklearn.naive_bayes.MultinomialNB.html.
We also find that the accuracy reaches a plateau around 250,000–500,000 messages when a Logistic Regression algorithm is used.
Ranco et al. (2015) state that “to achieve the performance of human experts, a large enough set of tweets has to be manually annotated, in our case, over 100,000”.
The time will differ depending on the computing power and the optimization of the script.
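
The notes above point to scikit-learn's MultinomialNB. As a rough illustration of what a classifier of that family computes, here is a minimal pure-Python sketch of multinomial Naive Bayes with Laplace smoothing. It is an assumption-laden toy, not the article's implementation: the class name `TinyMultinomialNB` is hypothetical, and the four training messages are invented for illustration only.

```python
import math
from collections import Counter, defaultdict


class TinyMultinomialNB:
    """Hypothetical toy multinomial Naive Bayes over lists of tokens."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # Laplace (add-alpha) smoothing constant

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.feature_counts = defaultdict(Counter)
        for tokens, label in zip(docs, labels):
            self.feature_counts[label].update(tokens)
        self.vocab = {t for c in self.feature_counts.values() for t in c}
        return self

    def predict(self, tokens):
        n_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n in self.class_counts.items():
            counts = self.feature_counts[label]
            total = sum(counts.values()) + self.alpha * len(self.vocab)
            score = math.log(n / n_docs)  # log prior of the class
            for t in tokens:
                if t in self.vocab:  # tokens unseen in training are ignored
                    score += math.log((counts[t] + self.alpha) / total)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Invented toy messages, already tokenized; labels mimic bullish/bearish tags.
train_docs = [
    ["buy", "strong", "🚀"],
    ["great", "earnings", "buy"],
    ["sell", "weak", "📉"],
    ["bad", "earnings", "sell"],
]
train_labels = ["bullish", "bullish", "bearish", "bearish"]

model = TinyMultinomialNB().fit(train_docs, train_labels)

if __name__ == "__main__":
    print(model.predict(["buy", "🚀"]))
    print(model.predict(["sell", "📉"]))
```

The model only counts token occurrences per class, which is one reason training such a classifier is fast compared with the more complex methods mentioned in the abstract.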

References

Ahmad, K., Han, J., Hutson, E., Kearney, C., & Liu, S. (2016). Media-expressed negative tone and firm-level stock returns. Journal of Corporate Finance, 37, 152–172.
Antweiler, W., & Frank, M. Z. (2004). Is all that talk just noise? The information content of internet stock message boards. The Journal of Finance, 59(3), 1259–1294.
Avery, C. N., Chevalier, J. A., & Zeckhauser, R. J. (2015). The “CAPS” prediction system and stock market returns. Review of Finance, 20(4), 1363–1381.
Bartov, E., Faurel, L., & Mohanram, P. S. (2017). Can Twitter help predict firm-level earnings and stock returns? The Accounting Review, 93(3), 25–57.
Chen, C. Y.-H., Despres, R., Guo, L., & Renault, T. (2019). What makes cryptocurrencies special? Investor sentiment and return predictability during the bubble. Working Paper.
Das, S. R., & Chen, M. Y. (2007). Yahoo! for Amazon: Sentiment extraction from small talk on the web. Management Science, 53(9), 1375–1388.
Garcia, D. (2013). Sentiment during recessions. The Journal of Finance, 68(3), 1267–1300.
Leung, H., & Ton, T. (2015). The impact of internet stock message boards on cross-sectional returns of small-capitalization stocks. Journal of Banking & Finance, 55, 37–55.
Li, F. (2010). The information content of forward-looking statements in corporate filings—A naïve bayesian machine learning approach. Journal of Accounting Research, 48(5), 1049–1102.
Loughran, T., & McDonald, B. (2011). When is a liability not a liability? Textual analysis, dictionaries, and 10-Ks. The Journal of Finance, 66(1), 35–65.
Mahmoudi, N., Docherty, P., & Moscato, P. (2018). Deep neural networks understand investors better. Decision Support Systems.
Oliveira, N., Cortez, P., & Areal, N. (2016). Stock market sentiment lexicon acquisition using microblogging data and statistical measures. Decision Support Systems, 85, 62–73.
Price, S. M., Doran, J. S., Peterson, D. R., & Bliss, B. A. (2012). Earnings conference calls and stock returns: The incremental informativeness of textual tone. Journal of Banking & Finance, 36(4), 992–1011.
Ranco, G., Aleksovski, D., Caldarelli, G., Grčar, M., & Mozetič, I. (2015). The effects of Twitter sentiment on stock price returns. PloS One, 10(9), e0138441.
Renault, T. (2017). Intraday online investor sentiment and return patterns in the US stock market. Journal of Banking & Finance, 84, 25–40.
Saif, H., Fernández, M., He, Y., & Alani, H. (2014). On stopwords, filtering and data sparsity for sentiment analysis of Twitter.
Sprenger, T. O., Sandner, P. G., Tumasjan, A., & Welpe, I. M. (2014). News or noise? Using Twitter to identify and understand company-specific news flow. Journal of Business Finance & Accounting, 41(7–8), 791–830.
Tetlock, P. C. (2007). Giving content to investor sentiment: The role of media in the stock market. The Journal of Finance, 62(3), 1139–1168.
Tetlock, P. C., Saar-Tsechansky, M., & Macskassy, S. (2008). More than words: Quantifying language to measure firms’ fundamentals. The Journal of Finance, 63(3), 1437–1467.
Wang, S., & Manning, C. D. (2012). Baselines and bigrams: Simple, good sentiment and topic classification. In Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics: Short Papers, Volume 2 (pp. 90–94). Association for Computational Linguistics.

Author information

Authors and Affiliations

CES Sorbonne, Université Paris 1 Panthéon-Sorbonne, CES & LabEx RéFi, Maison des Sciences Économiques, 106-112, boulevard de l’Hôpital, 75013, Paris, France
Thomas Renault

Corresponding author

Correspondence to Thomas Renault.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Renault, T. Sentiment analysis and machine learning in finance: a comparison of methods and models on one million messages. Digit Finance 2, 1–13 (2020). https://doi.org/10.1007/s42521-019-00014-x

Received: 20 March 2019
Accepted: 11 September 2019
Published: 18 September 2019
Issue Date: September 2020
DOI: https://doi.org/10.1007/s42521-019-00014-x
