1 Introduction

The capital market includes money and the market itself, namely the financial market. Financing refers to the process in economic operation by which both the supply and demand sides of funds use various financial instruments to adjust capital surpluses and shortages. Financial markets trade financial instruments such as bonds, savings certificates, and stocks. To seek greater benefits from them, generations of scholars and investors have continued to explore their secrets and have developed many prediction methods [1, 2]. Stocks have received particular attention because stock price fluctuations are influenced by many factors: macro factors such as economic trends, economic cycles, and economic structure; industry development, the financial quality of listed companies, and similar factors; and, to a great extent, micro-factors such as the psychological games played among investors. Researchers and investors are trying to find opportunities to profit in stocks, so stock forecasting methods are widespread [3,4,5,6,7]. Traditional methods include linear discriminant analysis, statistical methods, random forests [8], quadratic discriminant analysis, evolutionary computation algorithms [9, 10], logistic regression, and genetic algorithms [11,12,13]. Among them, the random forest algorithm builds a stock model from historical price information [8] for trend prediction in the process of stock investment. Genetic approaches first take a stock portfolio as the research object and use genetic programming to derive a more accurate prediction function; they then apply the genetic algorithm to possible stock permutations and combinations, performing random number generation, selection, crossover, and mutation. The survival probability of chromosomes is determined by a profit fitness evaluation, and a better combination is found [11,12,13]. Such methods establish a model of the relationship between future price trends and historical behavior, and use samples of historical market trends to predict future prices [5]. However, the key part of many forecasting methods is feature extraction, and they all assume that the future price trend is the result of historical behavior. It is worth noting that the features are designed subjectively, and models based on technical analysis generally rest on assumptions about the market framework; a model's success depends mainly on the validity of these assumptions.

With the continuous development of information technology, machine learning is being applied more and more extensively. In particular, the artificial neural network (ANN) has become very important in social life, for example in signal processing, pattern recognition, speech recognition [14], expert system construction [15], and robotics. An ANN is a method of simulating human thinking: a non-linear dynamic system whose characteristic is parallel, distributed, and synergistic information processing and storage [16,17,18]. Although a single artificial neuron has a very simple structure and limited function, a network connecting many neurons with adjustable connection weights is characterized by large-scale parallel processing, distributed information memory, and good self-organization and self-learning capabilities. Therefore, rapid judgment, decision-making, and processing can be achieved on many problems. All such systems learn and train with the help of the adaptive and self-organizing abilities of neural networks, which gives them good predictive ability. For example, one can construct a neural network framework for the financial market and predict the short-term closing price of a stock through a combination of technical analysis, financial and economic theory, time series analysis, and fundamental analysis [19]. With the deepening of neural networks, deep learning has attracted more and more attention [20,21,22,23]. The difference between a shallow ANN and deep learning lies in the number of network layers: deep learning trains neural networks with many more layers, and a series of new structures and methods have evolved to make this work. New structures include the CNN, LSTM, and ResNet. CNN and LSTM contain different units: CNN mainly has convolutional units [24] and pooling units, while LSTM mainly has recurrent units [25] and long short-term memory units [26] [27, 28]. In addition, there are algorithms such as the Restricted Boltzmann Machine (RBM) [29, 30], the deep multilayer perceptron (MLP) [22], and the autoencoder (AE) [31], among others.

The original procedure of a multilayer neural network maps hand-selected features to values; its hallmark is manual feature selection. Deep learning instead takes the raw signal as input, extracts features itself, and outputs the expected value at the end; its most important characteristic is that the network chooses the features itself. Among deep networks, CNN and LSTM are the most widely used. CNN solves the problem that traditional deep networks have too many parameters and are hard to train: it uses the concepts of local receptive fields and shared weights, greatly reducing the number of network parameters. The local receptive field means that when the input to the network is a multi-dimensional vector, each neuron of the next layer is connected only to the input neurons within part of a window. Weight sharing means that the \(N \times N\) hidden-layer neurons connected to the input layer use identical parameters; that is, different windows and their corresponding hidden-layer neurons share one set of parameters. Given these characteristics, many researchers apply CNN to stock prediction [3, 32,33,34,35]. LSTM is a special type of recurrent neural network (RNN), mainly applied to solve gradient vanishing and explosion during long-sequence training. An RNN is a neural network for processing sequence data, i.e., data that changes in order; for example, a word's meaning depends on the preceding content, and an RNN handles such problems well. In simple terms, LSTM performs better than an ordinary RNN on longer sequences. Accordingly, many scholars relate LSTM to time series and believe that applying it to the financial market will achieve good results [36,37,38]. The work in [34] uses stock candlestick charts as input images fed directly to the input layer. Another study [33] seeks a framework that maps the market's historical data to its future volatility. To reduce overfitting, [32] restricts the CNN framework to a one-dimensional input for prediction; however, it is based only on the closing price history and ignores other possible variables, such as technical indicators. Addressing these shortcomings, [3] proposed another CNN-based model that uses technical indicators for each sample and can take the possible interrelationships between stock markets into account as another probable source of information. In [32], the closing price history of the S&P 500 index is used as input to LSTM, MLP, and CNN, and the results show that CNN and LSTM outperform MLP. The study in [37] proposed a model to predict stock rankings by future returns; [38] analyzed LSTM as a special RNN structure suitable for learning from experience to classify, process, and predict time series with delays of unknown size; and [36] proposed a framework to analyze and forecast a company's future growth using an LSTM model and the company's net growth algorithm.

At present, for financial time series and especially stocks, the feature vectors are usually the closing price, opening price, lowest price, highest price, and trading volume of historical data, which are fed into the relevant algorithms for prediction. In this paper, both the historical data and the leading indicators of stocks are used for prediction. A leading indicator is an index that changes before the overall economy recedes or grows; it can predict turning points in the economic cycle, estimate the fluctuation range of economic activity, and suggest the trend of economic fluctuation. From an economic perspective, such an indicator has a clear, positive leading relationship with its benchmark cycle; examples include the inflation rate, futures, and options. This article proposes a new neural network framework called stock sequence array convolutional LSTM (SACLSTM). Its input method mainly imitates the image input of a CNN, converting the available data into image form. The network weights are trained on these images to extract useful features, and the extracted features are input into the LSTM network to identify extreme markets and generate appropriate signals. The historical data and leading indicators of the Taiwanese and American stock markets are used as input data, and the input vector of the proposed network is obtained by preprocessing. In the experiments, this paper combines different types of data and different indicators to test the network and obtain appropriate signals. On this basis, a simple commercial strategy based on the predictions is tested to evaluate the profitability and stability of the network. The contributions of this article are as follows:

1.

Because stocks are susceptible to various factors, it is necessary to collect more reference indicators. Based on the operating principles of CNN and LSTM, the initial input variables cover both the historical data and the leading indicators of the stock, so the model can reach further aspects of the stock and then realize data prediction.

2.

This paper first builds a two-dimensional array that simulates the image input style and then feeds it into the proposed network to predict stocks. This input framework makes stock price prediction possible.

3.

In this article, the proposed algorithm is compared with previous algorithms on collected data, and experiments verify its usefulness.

The rest of this article is arranged as follows. The second part introduces related work on stock forecasting and summarizes the methods and applications of related algorithms. The third part briefly introduces the background knowledge of the relevant techniques. The fourth part introduces the proposed method. The fifth part describes the data preprocessing and the experimental environment for classification and market simulation. The sixth part shows the experimental results, and finally, the last part summarizes them and puts forward suggestions and shortcomings for further improvement.

2 Related work

Financial time series are a type of time series data [39]. They have strong temporality: there are strong dependencies between earlier and later data, and the order cannot be rearranged. Generally, they are two-dimensional data. Since time series have a strong sequential structure, with dependencies and periodicity between earlier and later data, future values can be predicted from existing data through statistical knowledge. Rani and Sikka [1] regard time series clustering as one of the key concepts of data mining, applied to understand the mechanism generating a time series and to predict its future values. There are usually two different approaches to financial series prediction: the first tries to improve predictive capability by improving the model, while the second focuses on improving the predictive features.

In the first category of algorithms, focused on predictive models, a variety of methods have been applied, including ANN, Naive Bayes, support vector machines (SVM), and random forests. The literature reviewed in [40] covers techniques used to predict stock market trends in the areas of artificial intelligence and machine learning, with ANN considered the main machine learning technology in the field. That work explores the possibility of using the non-linearity of daily returns to improve short-term and long-term stock forecasts. Five competing models are compared: the linear AR model, the LSTAR and ESTAR smooth transition autoregressive models, and the JCN and MLP neural networks. The results show that nonlinear neural network models may be better predictors: artificial neural network models predict stock returns more accurately, and neural network techniques offer clear improvements over the AR and STAR models.

Feedforward neural networks are currently a popular type of neural network, usually trained with the back-propagation algorithm [41]. Zhang et al. [42] proposed a PSO-based selective neural network ensemble (PSOSEN) algorithm, applied to the analysis of the Nasdaq 100 index and the S&P 300 index. The algorithm first trains each neural network with the PSO algorithm and then combines the networks according to a preset threshold. Experimental results show that the improved algorithm is valid for the stock index prediction problem, and its performance is stronger than that of a selective ensemble based on a genetic algorithm.

The article [2] presents an efficient and complete data mining method. Three dimensionality-reduction techniques, fuzzy robust principal component analysis (FRPCA), principal component analysis (PCA), and kernel-based principal component analysis (KPCA), are applied to the entire data set to rearrange and simplify the original data structure. Artificial neural networks (ANN) are then used to classify the transformed data sets and predict the daily direction of future market returns. The results show that the risk-adjusted profit of the trading strategy based on the combined PCA-and-ANN classification and mining process is significantly higher than the comparison benchmark and higher than the trading strategies based on the KPCA and FRPCA models.

The simplicity of shallow models prevents them from achieving an effective mapping from the input space to successful predictions. Therefore, taking advantage of the availability of large data sets and emerging effective learning methods to train deep models, investigators have turned to deep models for market prediction. A significant aspect of deep models is that they can often extract rich, predictive features from the raw data. From this perspective, a deep model usually combines the feature extraction stage and the prediction stage in a single model.

The deep ANN, a neural network with multiple hidden layers, was one of the earliest deep methods in this field. Long et al. [7] proposed a new model, the multi-filter neural network (MFNN), for the tasks of price movement prediction and feature extraction on financial time series samples. By fusing convolutional and recurrent neurons, a multi-filter structure is constructed to obtain information from different market views and feature spaces. The network is used for signal-based trading simulation and extreme market prediction on the CSI 300 index.

An RNN is a specially designed neural network with internal memory, which can extract historical features and make predictions based on them. RNNs therefore seem well suited for market forecasting, and LSTM is one of the most popular RNN types. Graves et al. [14] and Pan et al. [43] proposed LSTM methods to obtain useful information from financial time series and predict immature stock markets. In [44], technical indicators are provided to an LSTM to forecast the direction of stock prices on the Brazilian stock market; the results show that LSTM outperforms MLP.

CNN is another deep learning algorithm applied to stock market prediction after MLP and LSTM, and its effective feature extraction ability has been verified in many other fields. In [3, 32, 34], CNN and other algorithms are evaluated on the same data sets, and the experiments show that CNN's prediction results are ideal.

According to some reported experiments, the way CNN input data are processed has a significant effect on the quality of the final prediction and of the extracted feature set. For example, [33] can be applied to data sets from different sources, including different markets, and extracts features to predict the future of these markets. The evaluation results showed that, compared with state-of-the-art baseline algorithms, prediction performance is significantly improved.

Among these models, some specific methods are based on feature extraction and selection. Due to the high uncertainty and volatility of stocks, the traditional feature extraction approaches are technical analysis and statistical methods [4]. Technical analysis is the most direct and basic method in stock forecasting: in [45], certain historical patterns are considered related to future behavior, so a large number of technical indicators have been defined to describe these patterns for use in investment expert systems.

The above summarizes the literature in terms of initial variable sets, feature extraction algorithms, and prediction methods. Because deep learning models can automatically extract features from raw data, recent publications show a trend toward them. However, most researchers use only one market's technical indicators or historical price data to make predictions, whereas multiple variables can improve the accuracy of stock market forecasts. This article introduces a new framework based on CNN and LSTM that aggregates multiple variables (historical data and leading indicators), automatically extracts features through the CNN, and then feeds them into the LSTM to predict the direction of the stock market.

3 Background

Before introducing the method proposed in this article, in this section, we will review the CNN and the LSTM as the main elements of the framework of the proposed algorithm.

3.1 Convolutional neural network

With the development of deep neural networks, the convolutional neural network was proposed [3, 32,33,34] and is currently one of the most famous algorithms. It has been applied successfully in various fields such as detection [46] and segmentation [47], where it outperforms previous traditional machine learning algorithms. A CNN mainly comprises several convolutional layers, pooling layers, and fully connected layers; the details of each component are introduced below.

3.1.1 Convolutional layer

The function of the convolutional layer is to extract features from the input data; it contains several convolution kernels. Each cell of a convolution kernel has a weight coefficient and a bias vector, similar to a neuron of a feedforward network. Each neuron in the convolutional layer is connected to multiple neurons in a nearby area of the previous layer; the size of this area depends on the size of the convolution kernel and is called the "receptive field" in the literature, by analogy with the receptive field of visual cortical cells. As the convolution kernel works, it scans the input features regularly, multiplying the input elements by the kernel weights within the receptive field, summing the products, and adding the bias:

$$\begin{aligned} Y^{l+1}(c,d)&=[Y^{l}\otimes w^{l+1}](c,d)+b\\&=\sum _{k=1}^{K_{l}}\sum _{e=1}^{f}\sum _{y=1}^{f}\left[ Y_{k}^{l}(s_{0}c+e,s_{0}d+y)w_{k}^{l+1}(e,y)\right] +b \end{aligned}$$
(1)
$$\begin{aligned} (c,d)\in \left\{ 0, 1, \ldots , Z_{l+1}\right\} , \quad Z_{l+1}=\frac{Z_{l}+2q-f}{s_{0}}+1 \end{aligned}$$
(2)

The summation part of Eq. (1) is equivalent to computing a cross-correlation. Here b is the bias, and \(Y^{l}\) and \(Y^{l+1}\) are the input and output of the convolution at layer \(l+1\), also known as feature maps. In Eq. (2), \(Z_{l+1}\) is the size of \(Y^{l+1}\); the feature maps are assumed to have equal length and width. \(Y(c,d)\) refers to the pixels of the feature map, and \(K_{l}\) is the number of channels of the feature map. The parameters of the convolutional layer are \(s_{0}\), f, and q, corresponding to the convolution step size (stride), the convolution kernel size, and the number of padding layers.
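To make Eqs. (1) and (2) concrete, the following is a minimal NumPy sketch of the cross-correlation computed by a convolutional layer; the variable names mirror the notation above, and the loop-based form is chosen for clarity rather than efficiency.

```python
import numpy as np

def conv2d(Y, w, b=0.0, s0=1, q=0):
    """Naive cross-correlation of Eq. (1): Y is a (K_l, Z_l, Z_l) stack of
    input feature maps, w is a (K_l, f, f) kernel, b a scalar bias.
    The output size Z_{l+1} = (Z_l + 2q - f) / s0 + 1 follows Eq. (2)."""
    K_l, Z_l, _ = Y.shape
    f = w.shape[-1]
    Yp = np.pad(Y, ((0, 0), (q, q), (q, q)))          # zero padding
    Z_out = (Z_l + 2 * q - f) // s0 + 1
    out = np.zeros((Z_out, Z_out))
    for c in range(Z_out):
        for d in range(Z_out):
            # multiply-and-sum over all channels and the f x f window
            window = Yp[:, c * s0:c * s0 + f, d * s0:d * s0 + f]
            out[c, d] = np.sum(window * w) + b
    return out

# a 3-channel 5x5 input and a 3x3 kernel give a 3x3 feature map
rng = np.random.default_rng(0)
print(conv2d(rng.standard_normal((3, 5, 5)), rng.standard_normal((3, 3, 3))).shape)
```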

Fig. 1
figure 1

Example of the convolution

Figure 1 takes a two-dimensional convolution kernel as an example; one-dimensional and three-dimensional kernels work similarly. In particular, when the convolution kernel has size \(f = 1\) and step size \(s_{0} = 1\) with no padding, the cross-correlation computed in the convolutional layer is equivalent to matrix multiplication, and a fully connected network is constructed between the convolutional layers. In this case, the output of layer \(l+1\) is given by Eq. (3):

$$\begin{aligned} Y^{l+1}=\sum _{k=1}^{K_{l}}\left( Y_{c,d,k}^{l}w_{k}^{l+1}\right) +b=\left( w^{l+1}\right) ^{T}Y^{l}+b \end{aligned}$$
(3)

The parameters of the convolutional layer are the kernel size, the step size, and the padding; together they determine the size of the layer's output feature map and are hyperparameters of the CNN. The kernel size can be any value smaller than the size of the input image; the larger the convolution kernel, the more complex the extracted input features.

The convolution step size describes the distance between the positions of the convolution kernel in two consecutive scans of the feature map. When the step size is 1, the kernel sweeps the elements of the feature map one by one; when the step size is n, it skips \(n-1\) pixels in the next scan.

The convolutional layer contains an activation function that helps express complex features; its representation is given in Eq. (4).

$$\begin{aligned} A_{c,d,k}^{l}=f\left( Y_{c,d,k}^{l}\right) \end{aligned}$$
(4)

Among them, the most common activation function is the ReLU, a ramp function (the term usually covers its variants as well). It is defined as Eq. (5):

$$\begin{aligned} f\left( e\right) = {\text {max}} \left( 0,e\right) \end{aligned}$$
(5)

As the activation function of a neuron, it defines the nonlinear output of the neuron after the linear transformation \(\mathbf {w^{T}e+b}\). For an input vector \(\mathbf {e}\) arriving at the neuron from the previous layer, a neuron using the linear rectification activation function outputs \(\mathbf {max(0,w^{T}e+b)}\), which is passed on as input to the next layer of neurons.

3.1.2 Pooling layer

After feature extraction in the convolutional layer, the output feature map is passed to the pooling layer for information filtering and feature selection. The pooling layer contains a preset pooling function that replaces the value at a single point of the feature map with a statistic of its neighboring area. The selection of the pooling area follows the same scheme as the convolution kernel scanning the feature map, controlled by the pooling size, step size, and padding. It is generally expressed as Eq. (6):

$$\begin{aligned} A_{k}^{l}(c,d)=\left[ \sum _{x=1}^{f}\sum _{y=1}^{f}A_{k}^{l}\left( s_{0}c+x,s_{0}d+y \right) ^{p}\right] ^{\frac{1}{p}} \end{aligned}$$
(6)

In Eq. (6), the meanings of the step size \(s_{0}\) and the pixel (c, d) are the same as for the convolutional layer, and p is a pre-specified parameter. When \(p = 1\), pooling takes the average value in the pooling area, which is called average pooling; when \(p\rightarrow \infty\), pooling takes the maximum value in the area, which is called max pooling. Mean pooling and max pooling are long-established pooling methods in CNN design; both retain the background and texture information of the image at the expense of part of the information as the feature map shrinks. For example, Fig. 2 shows a max-pooling process with a filter size of 2 \(\times\) 2 and a stride of 2. After pooling, the original 4 \(\times\) 4 matrix is compressed into a 2 \(\times\) 2 matrix.
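The following is a minimal sketch of the max-pooling case of Eq. (6) (\(p\rightarrow \infty\)), reproducing the 2 \(\times\) 2, stride-2 setting of Fig. 2 on a hypothetical input matrix:

```python
import numpy as np

def max_pool(A, size=2, stride=2):
    """Max pooling, i.e. Eq. (6) with p -> infinity, on a 2-D feature map."""
    out = np.zeros((A.shape[0] // stride, A.shape[1] // stride))
    for c in range(out.shape[0]):
        for d in range(out.shape[1]):
            out[c, d] = A[c * stride:c * stride + size,
                          d * stride:d * stride + size].max()
    return out

A = np.array([[1, 3, 2, 4],
              [5, 6, 7, 8],
              [3, 2, 1, 0],
              [1, 2, 3, 4]])
print(max_pool(A))   # the 4x4 map is compressed to 2x2: [[6. 8.], [3. 4.]]
```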

Fig. 2
figure 2

Example of max pooling

As convolutional layers are stacked, the size of the feature map gradually shrinks under the kernel's cross-correlation computation. For example, a 16 \(\times\) 16 input image passed through a 5 \(\times\) 5 kernel with unit stride and no padding yields a 12 \(\times\) 12 feature map. Padding is therefore a method of enlarging the feature map to offset this shrinkage; common methods are zero padding and replication padding.

3.1.3 Fully connected layer

The fully connected layer in a CNN is equivalent to the hidden layer of a traditional feedforward neural network. It is located in the last part of the CNN's hidden layers and transmits signals only to other fully connected layers. In the fully connected layer, the feature map loses its spatial topology: it is flattened into a vector and passed through the activation function. While the convolutional and pooling layers perform feature extraction on the input data, the role of the fully connected layer is to combine the extracted features nonlinearly to obtain an output. Fully connected layers play the role of a "classifier" in the whole convolutional neural network: the convolutional, pooling, and activation layers map the original data to a hidden feature space, and the fully connected layer maps the learned "distributed feature representation" to the sample label space. The fully connected layer is not expected to extract features, but rather tries to use the existing high-order features to complete the learning goal.

3.2 Long–short-term memory networks

LSTM is a type of recurrent neural network for time series. It is specially designed to solve the long-term dependence problem of general RNNs [36,37,38, 48, 49] and has been used successfully in various fields such as machine translation [50], speech recognition [51], image description generation [52], video tagging [53], and financial time series [54]. All RNNs have the form of a chain of repeating neural network modules; the LSTM module mainly includes a forget gate, an input gate, and an output gate.

3.2.1 Forget gate

The forget gate, which determines what information is kept, is:

$$\begin{aligned} z_{t}=\delta \left( E_{f}\cdot \left[ h_{t-1},x_{t}\right] +b_{f}\right) \end{aligned}$$
(7)

From the input at the current time and the output at the previous time, the sigmoid function produces a value by which the cell state is multiplied. If the sigmoid outputs 0, that part of the information is forgotten; otherwise, it continues to be transmitted in the cell state. In Eq. (7), \(z_{t}\) is the current output value, \(E_{f}\) is the weight of the current output, \(b_{f}\) is its bias, and \(h_{t-1}\) is the output value of the previous step.

3.2.2 Input gate

The input gate, which confirms the information to be updated, is

$$\begin{aligned} j_{t}&=\delta \left( E_{i}\cdot \left[ h_{t-1},x_{t}\right] +b_{i}\right) \end{aligned}$$
(8)
$$\begin{aligned} \widetilde{B}_{t}&={\text {tan}}h \left( E_{B}\cdot \left[ h_{t-1}, x_{t}\right] +b_{B}\right) \end{aligned}$$
(9)

This gate's function is to update the old unit state. The preceding forget gate determines what information is forgotten; what is added is decided by this gate layer, composed of a sigmoid and a tanh. It has two parts: a sigmoid layer decides which values will be updated, just as in the forget gate, and a tanh layer creates the new information to be added to the state, for example replacing the subject of the previous sentence. See Eqs. (8) and (9) for the specific formulas. \(j_{t}\) and \(\widetilde{B}_{t}\) are the current output values of the two parts of the input gate. The inputs \(h_{t-1}\) and \(x_{t}\) are passed through a linear transformation and a sigmoid (the input gate) to output \(j_{t}\); at the same time, \(h_{t-1}\) and \(x_{t}\) are passed through another linear transformation and tanh to produce \(\widetilde{B}_{t}\). Multiplying \(\widetilde{B}_{t}\) by \(j_{t}\) gives an intermediate result, which is added to the retained portion of the previous cell state to obtain the new state \(B_{t}\).

3.2.3 Output gate

The output gate, which produces the output information, is

$$\begin{aligned} p_{t}= & {} \delta \left( E_{p}\left[ h_{t-1},x_{t}\right] +b_{p}\right) \end{aligned}$$
(10)
$$\begin{aligned} h_{t}= & {} p_{t}*{\text {tan}}h(B_{t}) \end{aligned}$$
(11)

The first two gates are mainly used to update the state along the cell line; the third gate computes the module's output from the information on the cell line and the current input. This gating device controls how much of the state value is visible to the outside at time t. As shown in Eqs. (10) and (11), the inputs \(h_{t-1}\) and \(x_{t}\) are passed through another linear transformation and a sigmoid (the output gate) to produce \(p_{t}\); \(p_{t}\) is then multiplied by \({\text {tan}}h(B_{t})\) to obtain the output \(h_{t}\).
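Putting Eqs. (7)-(11) together, one LSTM time step can be sketched in NumPy as follows. The cell-state update line \(B_{t} = z_{t}B_{t-1} + j_{t}\widetilde{B}_{t}\) is the standard LSTM combination of the gates described in the text, and the weight shapes assume each gate acts on the concatenation \([h_{t-1}, x_{t}]\).

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x_t, h_prev, B_prev, E_f, b_f, E_i, b_i, E_B, b_B, E_p, b_p):
    """One LSTM step following Eqs. (7)-(11)."""
    z = np.concatenate([h_prev, x_t])     # [h_{t-1}, x_t]
    z_t = sigmoid(E_f @ z + b_f)          # forget gate, Eq. (7)
    j_t = sigmoid(E_i @ z + b_i)          # input gate, Eq. (8)
    B_tilde = np.tanh(E_B @ z + b_B)      # candidate state, Eq. (9)
    B_t = z_t * B_prev + j_t * B_tilde    # cell-state update
    p_t = sigmoid(E_p @ z + b_p)          # output gate, Eq. (10)
    h_t = p_t * np.tanh(B_t)              # hidden output, Eq. (11)
    return h_t, B_t

n_in, n_h = 4, 8                          # illustrative sizes
rng = np.random.default_rng(0)
W = lambda: 0.1 * rng.standard_normal((n_h, n_h + n_in))
h, B = lstm_step(rng.standard_normal(n_in), np.zeros(n_h), np.zeros(n_h),
                 W(), np.zeros(n_h), W(), np.zeros(n_h),
                 W(), np.zeros(n_h), W(), np.zeros(n_h))
print(h.shape, B.shape)                   # (8,) (8,)
```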

4 Proposed SACLSTM framework

CNN and LSTM have many parameters, including the number of layers, the number of filters in each layer, and the initial representation of the input data, all of which must be chosen to obtain the desired result. The size of the filters in each CNN layer is very important; since 5 \(\times\) 5 and 3 \(\times\) 3 filters are very common in image processing, the filter sizes in this paper follow previous work in that area. This section introduces the architecture of SACLSTM, a general stock market prediction framework based on CNN and LSTM. The framework has three main steps: input data representation, extraction of continuous features, and final prediction.

Input data representation SACLSTM obtains information from different markets and uses it to predict the future of those markets. Its goal is to find a common model that maps historical market data to future fluctuations; the general model discussed in this paper is one that applies to multiple markets. It is assumed that a true mapping function from market history to the future holds across many markets. To achieve this goal, a single model is designed that can predict a market's future from that market's history; to extract the required mapping function, the framework is trained on samples from different markets. In addition to the market's history and various other variables (futures, options) as input data, it uses the ten data series closest to the day's data variables. In this algorithm, all of these messages are aggregated and provided to the designed framework in the form of a two-dimensional tensor.

The extraction of the continuous feature The historical data of each day is represented by a series of variables, for example the closing price, the opening price, the lowest price, the highest price, and volume. Traditional market forecasting analyzes these variables, for instance in the form of candlesticks [34], and may predict future market trends by constructing high-level features on top of them. The idea behind the first layer of SACLSTM comes from how CNNs recognize images: in the first step, a convolutional layer merges the daily variables into higher-level features representing each day in the history. Useful information in the market's trend over time may also have an effect on predicting the market's future behavior; such information may reveal patterns in market behavior that can be used to predict future trends. It is therefore important to combine 30 consecutive days of data variables into one "picture" so as to collect high-level features that represent trends or reflect market behavior within a specific time interval. The convolutional and pooling layers of the CNN generate more complex features within such intervals to summarize the data.

Final prediction The high-level features extracted by the convolutional and pooling layers are input into the LSTM and further processed by the LSTM units; finally, a flattening operation converts the features generated by the previous layer into a one-dimensional vector, which is provided to fully connected layers that map the features to predictions. The next section explains the overall design of SACLSTM and how it is applied to the datasets used in the specific experiments of this article. In our experiments, data from 10 stocks in 2 regions are used; in addition to its historical data, options, and futures, each stock also collects the ten data series closest to its own to better predict the results.

4.1 The process of the proposed SACLSTM

The detailed progress of the developed SACLSTM is described as follows.

Expression of input data As mentioned earlier, the input of SACLSTM is a two-dimensional matrix whose size depends on the number of variables and the number of days of history used for each prediction. If the input for a prediction spans g days and each day is represented by f variables, then the size of the input tensor is g \(\times\) f.

The extraction of the continuous feature In SACLSTM, an initial variable filter is used to extract the 30-day change features. The 3 \(\times\) 3 filters most commonly used by CNNs on images are adopted; these filters combine the inputs into matrices of higher-level features, and this layer can construct different combinations of the host variables. The network can also discard worthless variables by setting the corresponding filter weights to 0, so the layer serves as a feature selection module. The subsequent convolution and pooling operations build higher-level features, aggregating the information available over a certain period of time and combining lower-level features from their inputs into higher-level ones. The second layer applies 64 filters, each filtering three consecutive days. This design is inspired by the observation that the most famous candlestick patterns [34] try to find distinctive patterns across three subsequent days; this paper uses the same setting to extract potentially useful information from a time window of three subsequent time units in the collected data. In the third layer, the pooling layer executes 2 \(\times\) 2 max pooling. To build more complex features and aggregate information over longer time intervals, SACLSTM uses another convolutional layer with 128 filters, followed by another pooling layer similar to the first. To obtain more accurate information, two further convolution-and-pooling stages are used, each with 256 filters and the same pooling size as before.

Final prediction The features produced by the last pooling layer are input into the LSTM units to extract deeper features and are then flattened into the final feature vector to realize the final prediction. SACLSTM's prediction for a market can be interpreted as the probability of a price rise on the market's next day: it is sensible to invest more money in stocks that are more likely to rise, while stocks with low upside are good candidates for short selling. In our experiments, however, the output is discretized to 0, 1, or \(-1\), whichever is closest to the predicted value.

Example configuration of SACLSTM As mentioned earlier, each prediction takes 30 days as input, and each of the 30 days is represented by several variables, so the input of SACLSTM is a 30 \(\times\) (number of variables) matrix. The first convolutional layer uses 3 \(\times\) 3 filters and is followed by three further convolutional layers with 128, 256, and 256 filters; each convolutional layer is followed by a 2 \(\times\) 2 max-pooling layer. The result is then input into the LSTM units to generate the final output. Figure 3 visualizes the graphical process.
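As an illustration, the configuration above can be sketched in TensorFlow/Keras as follows. The filter counts (64, 128, 256, 256) follow Sect. 5.4, while the LSTM width, the dense-layer size, the 'same' padding, and the variable count are our assumptions, since the text does not fix them.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def build_saclstm(days=30, n_vars=110, n_classes=3):
    """Hedged sketch of SACLSTM: stacked 3x3 convolutions with 2x2 max
    pooling, then an LSTM over the remaining time axis, then fully
    connected layers ending in a softmax over the labels {-1, 0, +1}."""
    inp = layers.Input(shape=(days, n_vars, 1))
    x = inp
    for n_filters in (64, 128, 256, 256):
        x = layers.Conv2D(n_filters, (3, 3), padding="same",
                          activation="relu")(x)
        x = layers.MaxPooling2D((2, 2), padding="same")(x)
    # collapse the pooled (time, variable) grid into a sequence for the LSTM
    x = layers.Reshape((int(x.shape[1]), -1))(x)
    x = layers.LSTM(128)(x)                      # width is an assumption
    x = layers.Dense(64, activation="relu")(x)
    out = layers.Dense(n_classes, activation="softmax")(x)
    model = models.Model(inp, out)
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model

build_saclstm().summary()
```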

Fig. 3
figure 3

Visualization of graphical process about the SACLSTM

5 Data preprocessing and environment setting

In this section, the paper describes the settings used to assess the model, including the data sets, evaluation methods, network parameters, and baseline algorithms, and then reports the optimization framework.

5.1 Dataset

The data set used in this work comprises ten stocks from two markets: AAPL, IBM, MSFT, FB, and AMZN from the American market, and CDA, CFO, DJO, DVO, and IJO from the Taiwanese market. Each sample has several variables (mainly historical data, options, and futures) plus the 10 most similar data series. Tables 1, 2, 3, 4 and 5 show the relevant information for the five Taiwanese and five American stocks. The attributes include the historical data of each stock and the attributes of its futures and options, in three main categories: historical data, futures, and options.

Table 1 Historical data of the five stocks in Taiwan
Table 2 Future of the five stocks in Taiwan
Table 3 Option data of the five stocks in Taiwan
Table 4 Historical data of the five stocks in America
Table 5 Option data of the five stocks in America

In Tables 1 and 2, \({m_{i}}\) denotes the five Taiwanese stocks DVO, CFO, CDA, DJO, and IJO. In Tables 4 and 5, \({t_{i}}\) denotes the five American stocks MSFT, IBM, FB, AMZN, and AAPL. \({n_{i}}\) denotes the present price, the highest price, the opening price, the lowest price, the volume, and the ups and downs. In Tables 3 and 5, since there are two kinds of options (calls and puts), \({z_{i}}\) denotes the settlement price (the base price at which the margin and the profit and loss of uncleared contracts are settled after the transaction is completed), the ups and downs (the difference between the closing price and the spot price of the day), the volume, the closing price, and the open interest (the number of contracts held by long or short parties in a particular market at the end of a trading day). Furthermore, the algorithm selects the 20 options (10 call options and 10 put options) whose contract prices are closest to the current stock price to generate the array of option data.
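For illustration, the selection of the 20 closest option contracts can be sketched as follows; applied once to the calls and once to the puts, it yields the 10 + 10 contracts used to build the option array. The strike list is hypothetical.

```python
import numpy as np

def nearest_options(contract_prices, spot, k=10):
    """Return the indices of the k contracts whose prices are closest
    to the current stock price."""
    prices = np.asarray(contract_prices, dtype=float)
    return np.sort(np.argsort(np.abs(prices - spot))[:k])

calls = [90, 95, 100, 105, 110, 115, 120, 125, 130, 135, 140, 145]
print(nearest_options(calls, spot=112.0))   # the 10 strikes nearest to 112
```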

5.2 Normalization function

Because the values of the data currently used are relatively large, it is good practice to scale the data so that they fall into a small, specific interval; this method is called data standardization. Standardization not only speeds up the convergence of gradient descent to the optimal solution but also improves accuracy. The function is shown in Eq. (12):

$$\begin{aligned} {Y_t} = \frac{Z_t - {\text {mean}}}{{\text {max}} - {\text {min}}}, \end{aligned}$$
(12)

where \(Z_t\) is the index vector at time t (the highest price, the opening price, the closing price, the lowest price, ...), and \(Y_t\) is the index vector after normalization. min, max, and mean are the minimum, maximum, and average values of the index vector over a given period. Here, data are collected over 120 days to establish the input array. Take the value 246.5 of \({m_{1}}\) in Table 1 as an example: the mean, highest, and lowest values of the same attribute over the preceding 120 days are substituted into Eq. (12), and the result is 0.390278. In the same way, all the data are normalized and input into the algorithm; Table 6 shows a sample of the data after normalization. These data run from October 2018 to October 2019. The first 60% of the data is used to train the model, the next 20% forms the test data, and the final 20% is the validation data.
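A minimal sketch of this normalization, applying Eq. (12) per attribute with a 120-day backward window as described above (the handling of the first 120 days, which lack a full history, is our assumption):

```python
import numpy as np

def normalize(series, window=120):
    """Eq. (12): scale each value by the mean, max, and min of the same
    attribute over the preceding `window` days."""
    series = np.asarray(series, dtype=float)
    out = np.full(series.shape, np.nan)     # no full history -> NaN
    for t in range(window, len(series)):
        hist = series[t - window:t]
        out[t] = (series[t] - hist.mean()) / (hist.max() - hist.min())
    return out
```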

Table 6 Normalization of historical data of the five stocks in Taiwan

5.3 Assess methods

The quality of the results is usually assessed by evaluation indicators that compare the proposed algorithm with other algorithms. Accuracy is the most common indicator in this field, but on an unbalanced data set it can be biased toward models that tend to predict the more frequent classes. To address this, this paper defines a formula that divides the data into three categories, shown in Eq. (13):

$$\begin{aligned} {C_{t}}=\left\{ \begin{array}{rcl} +1, &{} &{} {A_{t}} \ge 0.05 \\ -1, &{} &{} {A_{t}} < -0.05 \\ 0, &{} &{} {\text {otherwise}} \\ \end{array} \right. \end{aligned}$$
(13)

Here, \(C_t\) is the label of the sample and \(A_t\) is the percentage change of the current stock's price on the next date. When \(A_t\) is greater than or equal to 0.05, the label is + 1 (price increasing); when \(A_t\) is below \(-0.05\), the label is \(-1\) (price decreasing); otherwise the label is 0, meaning the price neither rises nor falls beyond this range.
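In code, the labeling rule of Eq. (13) is simply:

```python
def label(A_t):
    """Eq. (13): map the next-day percentage change A_t to a class."""
    if A_t >= 0.05:
        return 1       # price increasing
    if A_t < -0.05:
        return -1      # price decreasing
    return 0           # within the band

print([label(a) for a in (0.08, 0.01, -0.02, -0.07)])   # [1, 0, 0, -1]
```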

5.3.1 Detail architectures of input images

In the proposed SACLSTM algorithm, the architecture of the input image is fairly involved, so this section details the input images used in the experiments. The images generated from historical prices, options, and futures are explained separately below. Note that the descriptions here focus on the relevant raw data; the values are standardized by the method above.

Historical Price Image According to the above definition, for a specific stock the picture contains the lowest price, opening price, highest price, closing price, and volume over 30 days. The values with the same attribute are placed in one column, one row per day, and expanded over 30 rows for 30 days to build the image. An example is shown in Fig. 4.

Fig. 4
figure 4

Example of the image established by historical prices (\(L_n\), \(O_n\), \(H_n\), \(C_n\) and \(V_n\) are the lowest price, opening price, highest price, closing price and volume at n-th day)

Futures Image This is similar to the historical price image and also includes the lowest price, opening price, highest price, closing price, and volume. However, a stock can have several futures products with different expiration dates; in this paper, we select the five futures with expiry dates closest to the current date. The relevant attributes of each future for one day are listed in a row and extended over 30 days into a matrix. An example is shown in Fig. 5.

Fig. 5
figure 5

Example of the image established by futures (\(L_n\), \(O_n\), \(H_n\), \(C_n\) and \(V_n\) are the lowest price, opening price, highest price, closing price and volume at n-th day)

Options Image The attributes of options are more complicated than the two above, because the options market has two types of contracts (call options and put options). Here, we only select data from the nearest-month options for a specific stock. The proposed method selects the twenty options (10 calls and 10 puts) whose settlement prices are closest to the current stock price and obtains their attributes to construct the image. These attributes include the closing price, settlement price, open interest, and transaction volume. As with the previous two images, 30 days of data are extended into a matrix. An example is shown in Fig. 6.

Fig. 6
figure 6

Example of the image established by options (\(S_n\), \(C_n\), \(V_n\) and \(O_n\) are the settlement price, closing price, volume and open interest at n-th day)

Combination Image The combined image is the final input form of the proposed framework. It combines the information on historical prices, futures, and options by simply binding the first three images together to generate a new image. An example is shown in Fig. 7.
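A sketch of how the three per-day matrices can be bound side by side into the combined image; the column widths (5 historical attributes, 5 futures with 5 attributes each, 20 options with 4 attributes each) follow the descriptions above but are illustrative.

```python
import numpy as np

def combination_image(hist, futures, options):
    """Bind the historical, futures, and option images into one, as in
    Fig. 7. All three matrices share the same 30-day row axis."""
    assert hist.shape[0] == futures.shape[0] == options.shape[0] == 30
    return np.hstack([hist, futures, options])

img = combination_image(np.zeros((30, 5)),     # 30 days x 5 attributes
                        np.zeros((30, 25)),    # 5 futures x 5 attributes
                        np.zeros((30, 80)))    # 20 options x 4 attributes
print(img.shape)                               # (30, 110)
```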

Fig. 7
figure 7

Example of the image established by combination data

5.4 Network parameters

With the continuous development of deep learning packages and software, TensorFlow is used to implement the CNN and LSTM. The activation function of each layer in the CNN part is ReLU, and the convolutional layers contain 64, 128, 256, and 256 filters, respectively. Moreover, Adam [55] was applied to train the combined CNN-LSTM network.

5.5 Baseline algorithms

This paper compares the capability of the proposed method with algorithms used in related research.

1.

Siripurapu proposed the CNN-corr algorithm [34], which uses stock candlestick charts as input images fed directly to the input layer.

2.

Hoseinzade and Haratizadeh [33] use the CNNpred algorithm to seek a common framework that maps the market's historical data to its future fluctuations.

3.

Zhong [2] applies a support vector machine (SVM) to build a stock selection model that classifies stocks non-linearly.

4.

    The indexes are applied to train a simple ANN for prediction.

5.6 Optimization framework

This article collects the stock index vector information within 30 days to generate an input image. The x-axis of the input image indicates the dates of the continuous period, and the y-axis indicates the indexes of the stock's historical data set on those dates. An example is described in Fig. 8.

Fig. 8
figure 8

Example of the input image

In the experiment, a sliding window with a predetermined width of 30 days is moved over the stock index sequence. Each window generates an input image, and moving the window by one day produces the next image from the present window. In this way, the method obtains a series of input images; two adjacent images correspond to sliding windows shifted by one day.
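A minimal sketch of this sliding-window image generation, with a one-day shift between consecutive windows:

```python
import numpy as np

def sliding_windows(data, width=30, step=1):
    """Generate one input image per `width`-day window of the feature
    matrix; consecutive windows are shifted by `step` days."""
    return [data[i:i + width] for i in range(0, len(data) - width + 1, step)]

data = np.random.randn(250, 110)        # about a year of combined features
images = sliding_windows(data)
print(len(images), images[0].shape)     # 221 (30, 110)
```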

The algorithm is based on CNN and LSTM. First, following the convolutional neural network approach of Gunduz, Siripurapu, and others [3, 34], the data are converted into images. In addition to pooling and LSTM operations, this article also uses other techniques common in deep neural networks, including dropout and normalization. Dropout is used to avoid overfitting the training data: during training, it randomly samples the parameters of a weight layer according to a given probability and uses the resulting subnetwork as the target network for that update. If the entire network has n such units, the number of available subnetworks is \(2^{n}\); when n is large, the subnetworks used in successive iterations will essentially never repeat, which prevents any single subnetwork from overfitting the training set. This paper thus proposes a method for converting stock index values into a series of images, each collecting 30 days of stock index vector information, and the stock forecast is then realized through the designed framework. The specific framework is shown in Fig. 9.
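As a small illustration of the dropout behavior described above (the rate of 0.5 is an assumption), a Keras dropout layer zeroes a random subset of units during training and is the identity at inference time:

```python
import tensorflow as tf

drop = tf.keras.layers.Dropout(0.5)      # each unit kept with probability 0.5
x = tf.ones((1, 4))
print(drop(x, training=True))            # random units zeroed (rest rescaled)
print(drop(x, training=False))           # unchanged at inference
```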

Fig. 9
figure 9

Framework for improving the accuracy of stock trading forecasts

Figure 10 is the flowchart of the developed algorithm: the stock dataset is first divided into training and testing datasets, and the optimized SACLSTM is used to generate a trading strategy based on the formed stock data set. Algorithm 1 gives the pseudo-code of the proposed algorithm.

Fig. 10
figure 10

Flowchart of proposed SACLSTM

figure a

6 Experimental results

As mentioned earlier, this paper proposes a different classification prediction framework. Because the framework has many parameters, different parameter settings were designed; they are displayed in Table 7.

Table 7 Numbers of levels tested in different parameter settings for algorithm

To further demonstrate the algorithm's performance in stock market prediction, the experiments simulate whether trading based on SACLSTM's predictions can generate profits. The experimental design mainly covers market classification and attribute selection (options and futures). Additionally, this paper compares predictions across multiple stocks: 10 financial markets, three attribute sets per stock (options, futures, and historical data), and five different classification algorithms (SVM, CNN-corr, CNNpred, NN, and the proposed algorithm).

The proposed framework uses the best prediction configuration: four convolutional layers and three fully connected layers. Five classification algorithms are compared: CNNpred, CNN-corr, NN, SVM, and the proposed SACLSTM. The first part of the experiment sets historical prices as input data for all compared algorithms. The second and third parts use futures and options, respectively, as input data instead of historical prices. The last part combines historical prices, futures, and options as input data for all algorithms. Note that, owing to a limitation of CNN-corr, it appears only in the first part: CNN-corr uses the original candlestick chart as its input, but futures and options involve multiple target prices (options) or different periods (futures), so those data cannot be transferred to a single candlestick chart.

First, the paper uses the historical data sets to compare with the other methods (SVM, CNNpred, CNN-corr, NN). Prediction experiments are carried out on the Taiwanese and American stocks, and the prediction results of the two markets are compared. Figure 11 shows the prediction results of this set of experiments. The proposed algorithm clearly performs relatively well; on individual historical data, the traditional neural network predicts relatively better than the rest, because CNN-corr and CNNpred easily generate large noise, and the accuracy of SVM is relatively low owing to its sensitivity to the quality of the training set.


Fig. 11
figure 11

Bar chart of prediction accuracy for all four model specifications, using the data of history

Because futures and options are leading indicators of stocks and can anticipate their future development, options (or futures) alone are then used as input data, and the results are compared with the other algorithms. Figures 12 and 13 show the prediction results of this group of experiments; the proposed framework achieves the best results (because there is no futures information for the US stock market, the futures experiments are performed on the Taiwanese stock market only). The proposed algorithm is superior to the other prediction methods (SVM, CNNpred, NN), and the overall accuracy increases compared with predicting from historical data alone. The results show that using leading indicators as experimental data is better than using only historical data and, comparing futures and options used alone, the accuracy with options is higher than with futures.

Fig. 12
figure 12

Bar chart of prediction accuracy for all four model specifications, using the data of future

Fig. 13
figure 13

Bar chart of prediction accuracy for all four model specifications, using the data of option

This paper then combines the historical data, options, and futures data sets and compares with the other methods (SVM, CNNpred, CNN-corr, NN). This paper believes that the more indexes related to a stock, the closer the relationship with the stock's daily movement, so combining all indicators should make the forecast more accurate. Prediction experiments are carried out on the Taiwanese and American stocks, and the prediction results of the two classes of stocks are compared. Figure 14 shows the prediction results of this set of experiments. Combining historical data with futures and options clearly achieves better prediction accuracy, and the accuracy of all algorithms improves obviously, further indicating that the more fundamental information used, the higher the accuracy. Moreover, as can be seen from Figs. 11, 12, 13 and 14, whether the proposed algorithm predicts from historical data, futures, and options separately or combines the three, its prediction accuracy is higher than that of the other algorithms.


Fig. 14
figure 14

Bar chart of prediction accuracy for all four model specifications, using the data of option, future and history

To show that the framework combining CNN and LSTM is better, it is compared with frameworks using CNN [56] or LSTM alone. In addition, this paper uses three different time windows (1 day, 3 days, and 7 days) for the prediction experiments and analyzes how the results change with the prediction horizon. The results are described in Tables 8, 9, 10 and 11. According to these tables, all three frameworks predict the next day with relatively high accuracy, and the predictions of the combined CNN-LSTM are better than those using CNN or LSTM alone. Thus, combining CNN and LSTM achieves better performance.

Table 8 Prediction accuracy of historical data in different time windows
Table 9 Prediction accuracy of future in different time windows
Table 10 Prediction accuracy of option in different time windows
Table 11 Prediction accuracy of all data in different time windows

When forecasting fluctuations further into the future, the error value is found to fluctuate least for next-day prediction. The change in accuracy is inversely proportional to the error value: the bigger the error, the lower the accuracy. The errors are shown in Tables 12, 13, 14 and 15.

Table 12 Loss of accuracy with historical data in Taiwan and America
Table 13 Loss of accuracy with the data of future in Taiwan
Table 14 Loss of accuracy with the data of option in Taiwan and America
Table 15 Loss of accuracy with historical data and the data of future and option in Taiwan and America

7 Conclusion

The noise and nonlinear behavior of prices in financial markets show that forecasting market trends is not trivial and that it pays to consider the proper variables for stock prediction. The designed SACLSTM therefore uses a variety of information sources, including options, historical data, and futures, and applies the stock sequence array convolutional LSTM algorithm to stock prediction. In SACLSTM, the convolutional layers extract financial features, and the classification task of predicting the stocks is handled by a long short-term memory network. It is verified that a neural network framework combining convolutional and long short-term memory units outperforms statistical methods and traditional CNN and LSTM on the prediction task. To keep the data from being too scattered and to reduce useless information, SACLSTM first integrates the data directly into a matrix and uses convolution to extract high-quality features. In addition, SACLSTM draws on several leading indicators to improve the prediction of stock trends. Overall, the framework effectively improves the effectiveness of stock price prediction.

Since the main purpose of this paper is to predict the rise and fall of the stock market, and it has been clearly shown that the method can be used successfully in a trading system and obtain results, the next step will be to use the proposed algorithm to indicate rises or falls at specific points, and further to establish an expert system for investment.