1 Introduction

Financial time series forecasting, especially stock price forecasting, has long been one of the most difficult problems for researchers and practitioners. The difficulty stems mainly from the uncertainty and noise in the samples: a sample is not merely a consequence of the historical behavior it records, but is also influenced by information beyond the historical data, such as the macro economy and investor sentiment. Traditional statistical methods were long preferred for fitting financial time series because of their robustness to noise and good interpretability, but their poor fitting capability makes them mostly unsatisfactory for generating trading signals. Machine learning methods have been applied to this problem with considerable progress, but their sensitivity to parameters and tendency to overfit remain bottlenecks.

In recent years, deep learning methods have shown remarkable progress on many tasks, such as computer vision [9, 15], natural language processing [7], and speech recognition [5]. Deep architectures have demonstrated powerful capabilities of feature extraction and fitting, and auxiliary techniques such as dropout [14] and batch normalization [6], together with optimizers such as RMSprop, Adam [8], and Nadam, were designed to improve training efficiency and to address the overfitting, vanishing-gradient, and exploding-gradient problems that deep architectures and non-linear mappings incur during training. Applied to financial time series prediction, numerous studies have shown that neural networks are a very effective forecasting tool [2, 13, 16]. Weigend et al. [12, 17, 18] compared the performance of neural networks with that of traditional statistical methods in predicting financial time series, and the neural networks showed superior forecasting ability. NN models were first applied to problems in the financial domain in White's research [19], where five different exchange rates were predicted by feedforward and recurrent networks, and the findings showed that applying NNs can improve prediction performance. Other work shows that neural networks are efficient and profitable in forecasting financial time series [4]. Combinations of multiple neural networks, or of NNs with other methods, have also been proposed for financial time series forecasting; for example, a hybrid method based on a neural network and a genetic algorithm was used to model daily exchange rates [11].

In this paper, we extend the recurrent neural network into a deep architecture used as a classifier to predict the movement trend of stock prices. The models are evaluated on the CSI 300 stock index, and the classification results are used as trading signals to evaluate profitability.

2 Recurrent Neural Networks with Deep Architecture

2.1 RNN

RNNs [20] are sequence learners that have achieved much success in applications such as natural language understanding, language generation, video processing, and many other tasks [1, 3, 10]. A simple RNN is formed by repeated application of a function \(F_h\) to an input sequence \(\mathcal {X}=(X_1,\ldots ,X_T)\). For each time step \(t=1,\ldots ,T\), the function generates a hidden state \(h_t\):

$$\begin{aligned} h_t=F_h(X_t,h_{t-1})=\sigma (W_hX_t+U_hh_{t-1}+b_h) \end{aligned}$$
(1)

for some non-linear activation function \(\sigma (x)\), where \(X_t\) denotes the input at time t, \(W_h\) denotes the weight of connection between input and hidden state, \(U_h\) denotes the weight of connection between the hidden states \(h_t\) and \(h_{t-1}\), and \(b_h\) denotes the bias of activation.
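To make the recurrence concrete, the following is a minimal NumPy sketch of Eq. (1), assuming tanh as the non-linearity \(\sigma\) and a zero initial hidden state (neither choice is fixed by the text):

```python
import numpy as np

def rnn_step(x_t, h_prev, W_h, U_h, b_h):
    # One application of Eq. (1): h_t = sigma(W_h x_t + U_h h_{t-1} + b_h),
    # with tanh standing in for the non-linearity sigma.
    return np.tanh(W_h @ x_t + U_h @ h_prev + b_h)

def rnn_forward(X, W_h, U_h, b_h):
    # Unroll the recurrence over a sequence X of shape (T, input_dim),
    # starting from a zero hidden state (an assumed initialization),
    # and return all hidden states h_1, ..., h_T.
    h = np.zeros_like(b_h)
    states = []
    for x_t in X:
        h = rnn_step(x_t, h, W_h, U_h, b_h)
        states.append(h)
    return np.stack(states)
```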

2.2 Batch Normalization

As the depth of a network grows, problems such as exploding and vanishing gradients may be incurred, and several approaches have been proposed to alleviate them; one of these is batch normalization [6]. The main idea of batch normalization is to normalize the output of each layer over each mini-batch, reducing the internal covariate shift of each layer's activations; the mean and variance of the distribution are parameterized and learned during training. A batch normalization layer can be formulated as:

$$\begin{aligned} \hat{x}^k=\frac{x^k-E[x^k]}{\sqrt{Var[x^k]}} \end{aligned}$$
(2)
$$\begin{aligned} y^k=\gamma ^k\hat{x}^k+\beta ^k \end{aligned}$$
(3)

where \(x^k\) is the kth activation of the layer, \(y^k\) is the output after batch normalization, and \(\gamma ^k\) and \(\beta ^k\) are parameters of batch normalization to be learned.
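As an illustration, a minimal NumPy sketch of the training-time forward pass of Eqs. (2) and (3) follows; the small constant eps is a standard numerical safeguard that the equations omit:

```python
import numpy as np

def batch_norm(x, gamma, beta, eps=1e-5):
    # Training-time forward pass of Eqs. (2)-(3): normalize each feature k
    # over the mini-batch to zero mean and unit variance (Eq. 2), then
    # rescale and shift with the learned gamma^k and beta^k (Eq. 3).
    # x has shape (batch_size, num_features); eps is a numerical safeguard
    # not shown in Eq. (2).
    x_hat = (x - x.mean(axis=0)) / np.sqrt(x.var(axis=0) + eps)
    return gamma * x_hat + beta
```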

2.3 Deep Recurrent Architecture

To address the problem of stock price prediction, we extend recurrent neural networks into a deep architecture. The input of the model is a multivariate time series of high-frequency market data. At each frame, the hidden outputs \(h_t\) of a recurrent layer are fully connected to the next recurrent layer, so that the recurrent units are stacked into a deeper architecture. Between stacked recurrent layers, batch normalization is performed along the time axis, so that the output of each recurrent unit is normalized to avoid the problems that the scale of activations may cause when training on mini-batches. At the last recurrent layer, the last normalized frame is connected to a fully connected perceptron, and the output is produced by a softmax layer. The details of our deep architecture are presented in Fig. 1.
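For concreteness, the following Keras sketch shows one possible instantiation of this architecture; the number of stacked layers and the layer widths are illustrative assumptions, since the text does not fix them:

```python
from tensorflow import keras
from tensorflow.keras import layers

# One possible instantiation of the stacked architecture described above.
# The number of stacked recurrent layers (2) and the layer widths (64, 32)
# are illustrative assumptions; the input is a window of 120 one-minute
# frames with 6 features (open, close, high, low, amount, volume).
model = keras.Sequential([
    keras.Input(shape=(120, 6)),
    layers.SimpleRNN(64, return_sequences=True, dropout=0.25),
    layers.BatchNormalization(),                 # normalization between stacked layers
    layers.SimpleRNN(64, dropout=0.25),          # last layer keeps only the final frame
    layers.BatchNormalization(),                 # the "last normalized frame"
    layers.Dense(32, activation='relu'),         # fully connected perceptron
    layers.Dropout(0.5),
    layers.Dense(3, activation='softmax'),       # three classes: rise, flat, fall
])
```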

3 Data and Preprocessing Methodology

3.1 Sampling

To extract trading signals from historical market behavior (open, close, high, low, amount, volume), 1-minute market data of the CSI 300 from January 2016 to December 2016 were segmented into short sequences by constant windows of length 120, and normalization was performed on each univariate time series of each segmented sequence.
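A minimal sketch of this segmentation is shown below; the window stride and the exact normalization are not specified in the text, so a stride of one and per-window z-scoring are assumptions:

```python
import numpy as np

def make_windows(bars, window=120, stride=1):
    # Slice a (num_minutes, 6) array of 1-minute bars into fixed-length
    # windows and z-score each feature within each window. Per-window
    # z-scoring is one plausible reading of the normalization described
    # above; the stride is likewise an assumption.
    samples = []
    for start in range(0, len(bars) - window + 1, stride):
        seg = bars[start:start + window].astype(float)
        seg = (seg - seg.mean(axis=0)) / (seg.std(axis=0) + 1e-8)
        samples.append(seg)
    return np.stack(samples)
```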

3.2 Labeling Methodology

Profitability depends not only on the correctness of the predicted direction of price movement, but also on the margin of the price movement captured by the trading signal. We therefore label the samples by assigning those whose future prices rise or fall sharply to two separate classes, and all the others to a third class, defined as:

$$\begin{aligned} L_t = \begin{cases} 1 &{} r_t>r_{\theta }\\ 0 &{} \text {otherwise}\\ -1 &{} r_t<r_{1-\theta } \end{cases} \end{aligned}$$

where \(L_t\) denotes the label of sample \(X_t\), \(r_t=\ln \frac{close_{t+t_{forward}}}{close_t}\) denotes the logarithmic return of the stock index \(t_{forward}\) minutes after t, and \(\theta \) denotes the labeling threshold, with \(p(r_t>r_{\theta })=\theta \) and \(p(r_t<r_{1-\theta })=\theta \). A further motivation for this labeling methodology is that samples contain more noise when the price fluctuates in a narrow range, where the dependency between historical behavior and future trend tends to be weaker than in the other two situations. Detailed statistics of the training and test sets are shown in Table 1.
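A minimal sketch of this labeling rule follows, assuming that the thresholds \(r_{\theta }\) and \(r_{1-\theta }\) are taken as empirical quantiles of the observed returns, which matches the stated probabilities:

```python
import numpy as np

def label_samples(close, t_forward, theta):
    # Log return t_forward minutes ahead: r_t = ln(close_{t+t_forward} / close_t).
    r = np.log(close[t_forward:] / close[:-t_forward])
    # Take r_theta and r_{1-theta} as empirical quantiles, so that
    # p(r_t > r_theta) = theta and p(r_t < r_{1-theta}) = theta.
    r_hi = np.quantile(r, 1 - theta)
    r_lo = np.quantile(r, theta)
    labels = np.zeros(len(r), dtype=int)
    labels[r > r_hi] = 1     # sharp rise
    labels[r < r_lo] = -1    # sharp fall
    return labels
```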

Fig. 1. RNN architecture for financial time series prediction.

Table 1. Statistics of the data sets

4 Experiment

4.1 Experiment Setting

We generate data sets with 5 different thresholds \(\theta \) and 6 prediction windows \(t_{forward}\), and train 30 RNNs on them. While training the models, back propagation and stochastic gradient descent (SGD) are used to update the weights; the dropout rate is 0.25 in the recurrent layers and 0.5 in the fully connected layers, and the batch size is 320. The learning rate of the optimizer is 0.5 at the start of training and is decayed by a factor of 0.5 whenever the accuracy on the validation set has not improved for 20 epochs. An early stopping condition is set: training stops when the accuracy on the validation set has not improved for 150 epochs.
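In Keras terms, this schedule could be sketched as follows; the variable names (x_train, y_train, x_val, y_val) and the epoch cap are placeholders, and model is assumed to be the stacked RNN of Sect. 2.3:

```python
from tensorflow.keras.optimizers import SGD
from tensorflow.keras.callbacks import ReduceLROnPlateau, EarlyStopping

# model is assumed to be the stacked RNN of Sect. 2.3, with dropout
# already applied (0.25 in recurrent layers, 0.5 in fully connected ones).
model.compile(optimizer=SGD(learning_rate=0.5),
              loss='categorical_crossentropy',
              metrics=['accuracy'])

callbacks = [
    # decay the learning rate by 0.5 after 20 epochs without improvement
    ReduceLROnPlateau(monitor='val_accuracy', factor=0.5, patience=20),
    # stop training after 150 epochs without improvement
    EarlyStopping(monitor='val_accuracy', patience=150),
]

# x_train, y_train, x_val, y_val are placeholders for the labeled windows
# of Sect. 3; the epoch cap is arbitrary since early stopping decides.
model.fit(x_train, y_train, batch_size=320, epochs=10000,
          validation_data=(x_val, y_val), callbacks=callbacks)
```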

4.2 Results Discussion

The performance of each model on the test set is shown in Fig. 2. We find that the prediction accuracy increases as the threshold decreases, likely because samples corresponding to larger margins of rise or fall show a stronger dependency between features and labels. The prediction window, by contrast, has no obvious effect on model performance. The model with \(\theta =0.1\), \(t_{forward}=10\) reaches the best performance, with an accuracy of 48.31%, which is remarkable for three-class financial time series prediction and can give strong support to market practice.

We further test our 30 data sets on SVM, random forest, and logistic regression, as well as on the traditional statistical model linear regression, to compare the results with the RNN; the best five results of each model on the 30 data sets are shown in Table 2. The RNN performs far better than any of the three traditional machine learning models or linear regression, and the accuracy of SVM, the best of the other four models, is outperformed by that of the RNN by about 4%.

4.3 Market Simulation

We simulate real stock trading based on the predictions of the RNN to evaluate its market performance. We follow a strategy proposed by Lavrenko et al.: if the model predicts a new sample as the positive class, our system purchases 100,000 CNY worth of stock at the next minute's open price. We assume 1,000,000 CNY is available at the start, and a trading signal is not executed when the cash balance is less than 100,000 CNY. After a purchase, the system holds the stock for \(t_{forward}\) minutes, corresponding to the prediction window of the model. If during that period we can sell the stock at a profit of \(r_{\theta }\) (the threshold profit rate of the labeling) or more, we sell immediately; otherwise, at the end of the \(t_{forward}\)-minute period, the system sells the stock at the close price. If the model predicts a new sample as the negative class, the system takes a short position of 100,000 CNY worth of stock. Similarly, the system holds the position for \(t_{forward}\) minutes. If during that period the system can buy the stock back at a price \(r_{1-\theta }\) lower than the shorted price, it closes the short position by buying the stock to cover; otherwise, at the end of the period, the system closes the position in the same way at the close price of the end of the period.
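A minimal sketch of the long side of this strategy is given below; the short side is symmetric. The cash-balance constraint, overlapping positions, and transaction costs are simplified away, and the prices structure is an assumption:

```python
import math

def simulate_long(signal_minutes, prices, t_forward, r_theta, lot=100_000):
    # Long side only; the short side mirrors it with r_{1-theta}. For each
    # minute flagged positive, buy `lot` CNY of stock at the next minute's
    # open; sell as soon as the log return reaches r_theta within the
    # t_forward-minute window, otherwise sell at the window's close.
    # `prices` is an assumed dict of per-minute 'open'/'high'/'close'
    # arrays; the cash-balance rule and overlapping positions are omitted.
    profit = 0.0
    for t in signal_minutes:
        entry = prices['open'][t + 1]
        target = entry * math.exp(r_theta)           # price hitting the target return
        exit_price = prices['close'][t + t_forward]  # default: close of the window
        for k in range(t + 1, t + 1 + t_forward):
            if prices['high'][k] >= target:          # target touched: sell immediately
                exit_price = target
                break
        profit += lot * (exit_price / entry - 1.0)
    return profit
```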

Fig. 2. Performance of each model on 30 datasets.

Table 2. Best 5 results of each model on 30 data sets

To simulate this strategy, we use the models trained on the training sets to predict the future trend of the stock in each minute from April 18th, 2016 to January 30th, 2017, and send trading signals according to the predictions. The profits of each model in the market simulation are presented in Table 3. The results show that all simulations based on the trading signals of the prediction models are significantly more profitable than a random buy-and-sell strategy, which implies that the prediction models can catch suitable trading points by predicting future trends. Among the prediction models, all simulations based on machine learning models result in higher profit than linear regression, which indicates that the non-linear fitting of machine learning models is more effective for learning extreme market signals than traditional statistical models. In particular, the RNN achieves 18.13% more profit than the statistical model, and even the second-best model earns 11.13% less profit than the RNN.

Table 3. Market simulation results

5 Conclusion

In this paper we extend the RNN into a deep structure to learn extreme market movements from sequential samples of historical behavior. High-frequency market data of the CSI 300 are used to train the deep RNN, and the deep structure does improve prediction accuracy compared with traditional machine learning and statistical methods. From a practical perspective, this paper demonstrates the applicability of deep non-linear mappings to financial time series, and 48.31% accuracy for three-class classification is meaningful for market practice. We further show, in market simulation, that the deep RNN is more profitable than any of the traditional machine learning or statistical models.