Enhanced news sentiment analysis using deep learning methods
We explore the predictive power of historical news sentiments based on financial market performance to forecast financial news sentiments. We define news sentiments based on stock price returns averaged over one minute right after a news article has been released. If the stock price exhibits a positive (negative) return, we classify the news article released just prior to the observed stock return as positive (negative). We use Wikipedia 2014 and Gigaword 5 corpus articles, and we apply the global vectors for word representation (GloVe) method to this corpus to create word vectors to use as inputs into the deep learning TensorFlow network. We analyze the high-frequency (intraday) Thomson Reuters News Archive as well as the high-frequency price tick history of the individual stocks in the Dow Jones Industrial Average (DJIA 30) Index for the period between 1/1/2003 and 12/30/2013. We apply a recurrent neural network with long short-term memory units, training it on Thomson Reuters News Archive data from 2003 to 2012, and we test the forecasting power of our method on 2013 News Archive data. We find that the forecasting accuracy of our methodology improves when, to create our training data set, we switch from random selection of positive and negative news to selecting the news with the highest positive scores as positive news and the news with the highest negative scores as negative news.
Keywords: Sentiment analysis, Deep learning, Forecasting
With the latest technological developments and advances in data analytics, financial professionals and economists have increasingly explored new artificial intelligence and machine learning approaches to enhance financial market forecasting results. Qualitative inputs such as the news, corporate earnings reports, corporate press releases, and regulatory announcements play an important role in shaping the decisions of central bankers, economic strategists, investment professionals, securities traders, and portfolio managers regarding global investment decisions, portfolio re-balancing, and the exploration of new investment products and opportunities. The volume of streaming news and information that financial professionals and market participants need to read on a daily basis exceeds the human capacity to process and use such information in real-time decision making. To overcome human limitations, the application of deep learning approaches to finance research has received a great deal of attention from both practitioners and academics.
Forecasting financial time series is probably one of the most challenging problems in financial market analysis. Researchers have analyzed vast amounts of financial market transactions to detect repeated patterns of price movements using statistical and econometric models. The noisy and stochastic nature of markets, however, adversely affects the forecasting accuracy of these models. Hence, the promising results obtained using artificial intelligence and deep learning have attracted the attention of finance and economics researchers seeking to improve economic forecasting results.
In a study using deep learning for portfolio construction, Lee and Yoo (2018) use a recurrent neural network (RNN) with long short-term memory (LSTM) units to predict the potential returns of a collection of investments. They construct diversified portfolios by setting thresholds for the potential returns and examine the return and risk levels of the portfolios. Their results show that it is possible to build a portfolio with a desired degree of return and risk by adjusting the thresholds, which is promising for asset allocations that reflect investors' risk preferences. In another study, Bao and Rao (2017) present a novel deep learning framework where wavelet transforms (WT), stacked autoencoders (SAEs) and LSTM are combined for stock price forecasting. They introduce SAEs, which extract deep features hierarchically, into stock price forecasting. Results show that the proposed model outperforms other similar models in both predictive accuracy and profitability performance.
Deep learning has successfully been used for choosing and pricing securities, constructing investment portfolios, and active risk management. In parallel, natural language processing (NLP), or computational linguistics, has become increasingly powerful due to increased data availability. Recently developed NLP techniques capture sentiments more accurately and extract text semantics more effectively. Articles that use NLP techniques to predict financial markets are establishing a research field of natural language based financial forecasting (NLFF). Xing et al. (2018) offer a summary of NLFF methodologies in a review study, ordering and structuring techniques and applications from related work.
RNNs have also been applied to stock return prediction and portfolio re-balancing by adjusting the potential return thresholds used to classify assets, based on risk-return trade-offs. Deep learning based on convolutional neural networks (CNNs) and high-frequency time series extracted from limit order books, for example, has predicted future stock price movements better than multilayer neural networks and support vector machines (SVM) [21, 22].
The ability of deep learning to extract features from a large set of raw data without relying on prior knowledge of predictors makes these methodologies very attractive for stock market prediction at high frequencies. The algorithms vary considerably in the choice of network structure, activation function, and other model parameters, and their performance strongly depends on the method of data representation. Researchers have explored both the advantages and drawbacks of deep learning algorithms for stock market analysis and prediction. Using high-frequency intraday stock returns as input data, one study examines the effects of three unsupervised feature extraction methods, (1) principal component analysis, (2) the autoencoder, and (3) the restricted Boltzmann machine, on the ability to predict future market behavior. Results show that deep neural networks can extract additional information from the residuals of the autoregressive model and can improve model prediction performance.
One of the major advantages of using deep learning for finance is the ability to embed a large collection of information into investment decisions and portfolio construction. This can be accomplished by compressing information into a smaller feature space. Studies have reported that non-linear feature reduction performed by deep learning tools is effective in price trend prediction. Deep learning could also offer remedies for the complexity and ambiguity of natural language, which traditional text mining methods do not address. For instance, an RNN with LSTM units employs hierarchical structures, including a large number of hidden layers, to automatically extract features from ordered sequences of words and capture non-linear relationships or context-dependent meanings of words. Kraus and Feuerriegel (2017) study the use of deep neural networks for financial decision making and report higher accuracy in predicting stock price movements based on financial disclosures, compared to traditional machine learning techniques.
We explore the use of deep learning hierarchical models for financial prediction and classification. Our hypothesis is that applying deep learning methods to financial forecasting can enhance the results by complementing standard methods in finance. In particular, deep learning can detect and exploit interactions in the data that might be invisible to economic models. We build our research on multi-agent simulation models of sentiments that include the influences of market trends, neighbouring agents, and the propensity of the market.
Stock trader models in the multi-agent simulation have mainly modeled agents monitoring the stock price only. However, in the real world, traders trade stocks based on both the price change of stocks and news. In this paper, we combine the NLP machine learning approach, extracting news with positive and negative sentiments, with the use of deep learning hierarchical models to explore financial prediction and classification. We consider whether agents can predict the change of stock prices, arising from sentiments of financial and economic news, by training the model (agent) using RNN with LSTM units. We find that the approach of choosing training data plays a significant role in the performance of the deep learning algorithms. In other words, if we choose the training data randomly from a large corpus of news, the results are inferior to the ones where we select the positive and negative classes of news hierarchically.
The rest of this paper is organized as follows: in Sect. 2, we describe the data we use and present basic statistics. Section 3 is on the methods we use. In Sect. 4, we present our results, and in Sect. 5 we offer our discussion and concluding remarks. In particular, we discuss the application of deep learning in NLP to multi-agent simulation in relation to high performance computing.
We use two data sets in this paper. One is the Thomson Reuters News Archive (TRNA) and the other is the Thomson Reuters Tick History (TRTH) for the DJIA 30 Index for the period between 2003 and 2013. Since DowDuPont Inc. (DWDP) was added to the DJIA 30 on September 1st, 2017, it is not included in our data or in the analysis in this paper. Hence, we hereafter refer to the pricing data as “DJ29” instead of DJ30.
TRNA is a news archive provided by Thomson Reuters. It is a collection of third-party news stories, ordered by the time at which each story was published. The time stamps of the news documents have millisecond precision, and each news item contains all sequencing and control data. The archive contains many types of news (e.g., financial markets or sports news) in 128 languages.
There are three major approaches to natural language processing (NLP): (1) the thesaurus-based approach, (2) the count-based approach, and (3) the inference-based approach. In this study, we adopt the inference-based approach because it involves deep learning, which previous studies report as one of the most promising methods. This approach includes the following steps: (1) word embedding, (2) definition of polarity, (3) preparation of training and testing data, and (4) the deep learning process, which we briefly describe below.
The first step of NLP is to convert words to vectors. Several methods exist for this task, such as word2vec [14, 15], the global vectors for word representation method (GloVe), fastText, and others. Among the pre-trained word vectors provided by GloVe, we choose the set created from the Wikipedia 2014 and Gigaword 5 corpus, which contains 400K words represented by vectors in 200 dimensions. Since we use the TRNA from 2003 to 2013, the pre-trained GloVe word vectors based on Wikipedia 2014 contain words that were in common use during the same period.
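The word-embedding step can be illustrated with a minimal sketch. It assumes the plain-text layout used by the publicly distributed GloVe files (one token per line followed by its space-separated components); the three-dimensional `SAMPLE` data, the function names `load_glove` and `embed`, the lowercasing of tokens, and the zero vector for out-of-vocabulary words are illustrative assumptions, not details taken from the paper.

```python
import io


def load_glove(handle):
    """Parse GloVe-style text: each line is a token followed by its floats."""
    vectors = {}
    for line in handle:
        parts = line.rstrip().split(" ")
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vectors


def embed(tokens, vectors, dim):
    """Map tokens to vectors; unknown tokens get a zero vector (assumption)."""
    rows = []
    for tok in tokens:
        vec = vectors.get(tok.lower())
        rows.append(list(vec) if vec is not None else [0.0] * dim)
    return rows


# Tiny in-memory stand-in for the 400K-word, 200-dimensional file.
SAMPLE = "stock 0.1 -0.2 0.3\nprice 0.0 0.5 -0.1\n"
vectors = load_glove(io.StringIO(SAMPLE))
matrix = embed(["Stock", "price", "zzz"], vectors, dim=3)
```

With the real pre-trained file, the same functions would be called on an opened file handle with `dim=200`.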
The distribution of the length of the initial set of news articles, \(l_i(t)\), is shown in the left panel of Fig. 4. The peak near the origin in this figure consists mostly of news headlines. On reviewing the positive and negative article classes while constructing the natural language processing model, we find that longer articles are classified more appropriately than short articles, including headlines. Moreover, short documents are problematic when training the RNN with LSTM units, because they introduce many zero elements into the matrix of word vectors. Thus, in our analysis, we use documents with a minimum length of 50 words. As we explain in Sect. 3.4, we also set the maximum article length that we analyze at 550 words.
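The length limits above can be sketched as a filter followed by padding to a fixed length. This is a hedged illustration: the text states only the 50-word minimum and the 550-word maximum, so truncating longer articles (rather than discarding them), the padding id `0`, and the names `keep_article` and `pad_or_truncate` are assumptions made here.

```python
MIN_WORDS = 50   # minimum article length stated in the text
MAX_WORDS = 550  # maximum analyzed length stated in the text
PAD_ID = 0       # hypothetical index reserved for padding


def keep_article(tokens, min_words=MIN_WORDS):
    """Drop headlines and other very short documents."""
    return len(tokens) >= min_words


def pad_or_truncate(token_ids, max_words=MAX_WORDS, pad_id=PAD_ID):
    """Return exactly max_words ids: clip long inputs, zero-pad short ones."""
    clipped = list(token_ids[:max_words])
    return clipped + [pad_id] * (max_words - len(clipped))


headline = list(range(1, 9))      # 8 tokens: filtered out
article = list(range(1, 61))      # 60 tokens: kept, then padded
long_text = list(range(1, 601))   # 600 tokens: kept, then clipped

padded = pad_or_truncate(article)
clipped = pad_or_truncate(long_text)
```

The padded 60-word article carries 490 padding entries, which shows why very short documents contribute mostly zeros to the input matrix.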
In our analysis, we consider two approaches of selecting the training data: (1) random, and (2) hierarchical. The random approach consists of randomly selecting 12,500 positive news articles from the positive article population and 12,500 negative articles from the negative article population to construct the training news article sets. The hierarchical approach selects the 12,500 news articles with highest positive polarity (top) and the 12,500 news articles with the highest negative polarity (bottom), selected from a list ordered by \(r_i(t)\) as explained in Fig. 3.
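The two selection schemes can be contrasted in a short sketch. Each article is represented here as a hypothetical `(article_id, r)` pair, where `r` stands for \(r_i(t)\); the function names, the fixed random seed, and the six-article toy data are assumptions for illustration, with `n_per_class` playing the role of the 12,500 articles per class.

```python
import random


def select_random(articles, n_per_class, seed=0):
    """Sample n articles uniformly from each sign class."""
    rng = random.Random(seed)
    positives = [a for a in articles if a[1] > 0]
    negatives = [a for a in articles if a[1] < 0]
    return rng.sample(positives, n_per_class), rng.sample(negatives, n_per_class)


def select_hierarchical(articles, n_per_class):
    """Take the n largest and n smallest returns from the ranked list."""
    ranked = sorted(articles, key=lambda a: a[1], reverse=True)
    top = [a for a in ranked[:n_per_class] if a[1] > 0]
    bottom = [a for a in ranked[-n_per_class:] if a[1] < 0]
    return top, bottom


# Toy data: (article_id, r) pairs.
toy = [(0, 0.03), (1, -0.02), (2, 0.001), (3, -0.0005), (4, 0.01), (5, -0.04)]

rand_pos, rand_neg = select_random(toy, n_per_class=2)
top, bottom = select_hierarchical(toy, n_per_class=2)
```

In the toy data the hierarchical scheme skips the near-zero returns of articles 2 and 3, whereas random selection may include them.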
As explained in Sect. 3.3, we consider a news article to be positive if \(r_i(t)>0\) and negative if \(r_i(t)<0\). Applying this definition to the 2013 news articles of at least 50 words that pertain to stocks included in DJ29, we obtain 16,856 positive and 17,213 negative articles. We apply our model, trained on the dataset from 2003 to 2012, to this test data from 2013.
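The sign rule can be made concrete with a small sketch. The exact formula of Sect. 3.3 is not reproduced in this text, so the code assumes that \(r_i(t)\) is the logarithm of the ratio between the average price in the minute after release and the average price in the minute before it, following the description in the concluding section; the function names and the tick prices are hypothetical.

```python
import math


def log_return(prices_before, prices_after):
    """Assumed form of r_i(t): log of (mean price after / mean price before)."""
    mean_before = sum(prices_before) / len(prices_before)
    mean_after = sum(prices_after) / len(prices_after)
    return math.log(mean_after / mean_before)


def polarity(r):
    """Positive if r > 0, negative if r < 0, otherwise unlabeled."""
    if r > 0:
        return "positive"
    if r < 0:
        return "negative"
    return None


# Hypothetical tick prices around one news release.
r_up = log_return([100.0, 100.0], [101.0, 101.0])
r_down = log_return([100.0, 102.0], [99.0, 100.0])
r_flat = log_return([50.0], [50.0])
labels = [polarity(r_up), polarity(r_down), polarity(r_flat)]
```

An article whose return is exactly zero receives no label under this rule, which is one reason a neutral class is hard to define from the sign alone.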
Training: convergence of accuracy and loss
As we can see in Fig. 5, both the random selection (blue line) and the hierarchical selection (orange line) converge at around 200 k iterations. Beyond 200 k iterations, the accuracy of the random selection case fluctuates around 95% and the accuracy of the hierarchical selection case fluctuates around 97.5%. These results suggest that the hierarchical selection case performs better than the random selection case.
In Fig. 6, we observe the change of the loss during training, where, as in Fig. 5, the blue line corresponds to the random selection of the training data and the orange line corresponds to the hierarchical selection. Both curves converge at almost 200 k iterations. After 200 k iterations, the loss fluctuates around 10% in the case of random selection of training data, while it fluctuates around 5% in the case of hierarchical selection. These results also imply that the training data selected by the hierarchical method are better than those obtained by random selection.
Test: predictability of the model
Applying the model trained on randomly selected training data to the positive test data yields P at every 10 k iterations, depicted by blue dots in Fig. 9. The orange dots in this figure represent the application of the same model to the negative test data. We connect the two dots at the same iteration with a green line when the prediction of the positive test data is better than the prediction of the negative test data, and with a red line when it is worse. This figure shows that the model trained using randomly selected training data predicts the test data as positive on average. When the iteration is \(n=340\) k, the blue dots are located in the range \(P>0\) and the orange dots are located in the range \(P<0\). Thus, we expect, although the percentage is small, that the model will predict the positive news as positive and the negative news as negative, on average. In general, however, we consider that the iteration range around \(n=340\) k corresponds to the over-fitting training range, and the results might not be very significant.
Discussion and conclusion
In this paper, we explore a new direction of sentiment analysis using deep learning. We define the polarity (i.e., positive or negative sentiment) of the news by observing the log return of the ratio between the average stock (entity) price for one minute before the news pertaining to the relevant stock is published and for one minute after the news has been released. This definition of polarity of news is novel and differs from previously used approaches to sentiment analysis. Although the definition of polarity is different, by training an RNN with LSTM units, we show that the model predicts the positive news as positive and the negative news as negative, on average. This means that we can predict the increase or decrease of stock prices by observing and investigating news sentiments. To increase the level of predictability, however, future work needs to include different approaches and combinations of methodologies. For this purpose, we consider the following changes to our methodologies:
(1) Train the model with different values of the hyperparameters to obtain better results. To explore this possibility and use a significant number of parameter combinations to improve the forecasting results, we consider utilizing high performance computing.
(2) The second improvement is to consider approaches to word embedding other than GloVe, which we used in this study. To accomplish this, we consider using existing pre-trained word vectors distributed by Google and fastText. Moreover, we can also create word vectors ourselves by applying the word2vec or fastText algorithms to TRNA. Applying different word vectors to the deep learning approach for news sentiment forecasting may allow us to identify a promising avenue for improving the forecasting power of our model.
(3) The third improvement concerns the definition of polarity. We defined the polarity of news based on the change of stock price, i.e., \(r_i(t)\). Thus, it was difficult to classify news into three polarity groups, i.e., positive, neutral, and negative. One solution to this problem is to introduce a threshold on \(r_i(t)\) so that we analyze truly positive and truly negative news, reducing the noise from positive or negative news that is close to neutral. We also consider using a hybrid approach to the definition of polarity that combines our current methodology with information from another dataset that already assigns polarity to news, such as Thomson Reuters News Analytics or Thomson Reuters MarketPsych.
(4) The last improvement concerns the use of different deep learning methodologies. Although we used the RNN with LSTM units in this paper, the application of CNNs to sentiment analysis has also been extensively studied (for example, see [10, 19]). Moreover, the application of seq2seq or Attention to sentiment analysis is also worth considering.
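The threshold proposed in item (3) can be sketched as a three-way labeling rule. This is an illustration under stated assumptions: the paper does not specify a threshold value or whether it is symmetric, so the symmetric band, the value `0.001`, and the name `polarity_with_threshold` are hypothetical.

```python
def polarity_with_threshold(r, threshold):
    """Three-class label: returns inside [-threshold, threshold] are neutral."""
    if threshold < 0:
        raise ValueError("threshold must be non-negative")
    if r > threshold:
        return "positive"
    if r < -threshold:
        return "negative"
    return "neutral"


THRESHOLD = 0.001  # hypothetical value, not taken from the paper
returns = [0.03, 0.0004, -0.0002, -0.02]
three_class = [polarity_with_threshold(r, THRESHOLD) for r in returns]
```

Setting the threshold to zero recovers the two-class sign rule, except that an exactly zero return is labeled neutral.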
Stock trader models in a multi-agent simulation framework have mainly modeled agents monitoring the stock price only. However, in the real world, market participants make decisions based on both the price change of stocks and news about the stocks. Hence, in this study, we introduce the possibility that agents interpret the polarity of news and predict the price change before taking action. We consider that applying sentiment analysis to agent simulation will provide a new direction for the field of agent-based modeling and simulation.
We would like to thank Hiroshi Iyetomi, Yuichi Ikeda, and Yoshi Fujiwara. This work was supported by MEXT as Exploratory Challenges on Post-K computer (Studies of Multi-level Spatiotemporal Simulation of Socioeconomic Phenomena, Macroeconomic Simulations) and JSPS KAKENHI Grant Number 17H02041.
- 2. Bojanowski, P., Grave, E., Joulin, A., & Mikolov, T. (2016). Enriching word vectors with subword information. CoRR, abs/1607.04606. arXiv:1607.04606
- 4. Gigaword5: https://catalog.ldc.upenn.edu/LDC2011T07
- 6. Händschke, S. G., Buechel, S., Goldenstein, J., Poschmann, P., Duan, T., Walgenbach, P., et al. (2018). A corpus of corporate annual and social responsibility reports: 280 million tokens of balanced organizational writing. ACL, 2018, 20.
- 10. Kim, Y. (2014). Convolutional neural networks for sentence classification. arXiv preprint arXiv:1408.5882
- 12. Lee, S. I., & Yoo, S. J. (2017). A deep efficient frontier method for optimal investments. arXiv preprint arXiv:1709.09822
- 13. Lee, S. I., & Yoo, S. J. (2018). A new method for portfolio construction using a deep predictive model. In: Proceedings of the 7th International Conference on Emerging Databases (pp. 260–266). Springer.
- 14. Mikolov, T., Chen, K., Corrado, G., & Dean, J. (2013). Efficient estimation of word representations in vector space. CoRR, abs/1301.3781. arXiv:1301.3781
- 15. Mikolov, T., Sutskever, I., Chen, K., Corrado, G. S., & Dean, J. (2013). Distributed representations of words and phrases and their compositionality. In: C. J. C. Burges, L. Bottou, M. Welling, Z. Ghahramani, & K. Q. Weinberger (Eds.), Advances in neural information processing systems (vol. 26, pp. 3111–3119). Curran Associates, Inc. http://papers.nips.cc/paper/5021-distributed-representations-of-words-and-phrases-and-their-compositionality.pdf
- 17. Pennington, J., Socher, R., & Manning, C. D. (2014). Glove: Global vectors for word representation. In: Proceedings of the 2014 conference on empirical methods in natural language processing (EMNLP) (pp. 1532–1543).
- 20. Troiano, L., Mejuto, E., & Kriplani, P. (2017). On feature reduction using deep learning for trend prediction in finance. arXiv preprint arXiv:1704.03205
- 21. Tsantekidis, A., Passalis, N., Tefas, A., Kanniainen, J., Gabbouj, M., & Iosifidis, A. (2017). Forecasting stock prices from the limit order book using convolutional neural networks. In: Business informatics (CBI), 2017 IEEE 19th conference on. vol. 1, pp. 7–12. IEEE.
- 22. Tsantekidis, A., Passalis, N., Tefas, A., Kanniainen, J., Gabbouj, M., & Iosifidis, A. (2017). Using deep learning to detect price change indications in financial markets. In: Signal processing conference (EUSIPCO), 2017 25th European. pp. 2511–2515. IEEE.
Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.