1 Introduction

In the stock market, different investors have different opinions on whether the stock price can be predicted. Cootner [1] believes that in the efficient market, the stock price has reflected all public information at any time; therefore, the stock price is unpredictable. Some scholars [2] believe that the best strategy for stock investment is to buy and hold for a long time to earn dividend income. The methods of financial commodity analysis can be divided into fundamental analysis and technical analysis [3, 4]. The argument of the former is that the value of financial products is determined by the overall economic environment, the industrial environment to which it belongs, and individual companies and operating performance. On the other hand, there is a lot of information in the stock market for investors to use and refer to, including daily stock price changes, trading volume, stock price fluctuations, changes in chips, average costs, margin trading, etc. The information with reference value has been aggregated and analyzed, and expressed by certain data special formulas, the so-called technical indicators (TIs) [5]. Technical indicators are sufficient to simplify market information and reflect on values or graphs for investors to refer to and formulate trading strategies.

In recent years, deep neural network [6] and various deep learning algorithms have shined in major pattern recognition and machine learning competitions. The vigorous development of deep neural networks has not only created a new field of machine learning research, but its various applications have gradually appeared in people’s lives, such as speech recognition, emotion recognition, natural language processing, and image recognition [7,8,9,10,11,12]. In the field of deep learning, recurrent neural network (RNN) [13], long short-term memory (LSTM) [14], and attention mechanism [15, 16] are particularly good at processing time series data, such as natural language processing, machine translation, speech recognition, and financial index prediction [17,18,19].

The application of time series neural networks to financial commodity trends and price forecasts has gradually become the main trend of financial technology (FinTech) recently [20]. Nelson et al. [21] first proposed the concept of using LSTM neural networks to predict the future trend of stocks. The model label is a binary classification: compared with the closing price of the previous day, up or down. The experiment target was selected by four stocks from the Brazilian stock exchange, and the result reached a maximum of 55.9% forecast accuracy.

Chen et al. [22] proposed an RNN-boost method and used latent Dirichlet allocation (LDA) to select data features. The model was applied to Shanghai-Shenzhen 300 Stock Index (HS300) index prediction. The results show that the time series model is better than other traditional methods, such as support vector regression (SVR)-based approach [23].

Li et al. [24] first proposed the MI-LSTM (Multi-Input) prediction model based on attention mechanism. In the proposed model, attention layer is followed by a LSTM layer, and the prediction results are obtained through Softmax. The experimental target is Shanghai-Shenzhen CSI 300 comprehensive index, and the results show that sequence model with the attention mechanism is better than general sequence models.

Li et al. [25] proposed a Markov random fields combined with a multitask RNN model and applied it to the stock price movement prediction. The model reached 67.97, 66.80, and 68.95% prediction accuracy in the Chinese Securities Index (CSI) 200/300/500.

Livieris et al. [26] proposed a CNN-LSTM-based model to predict the international gold price trend. In their proposed model, the input sequence is first converted to a 32 × 64 convolution operation and then input to the sequence LSTM to obtain the final prediction result. The experimental results are slightly better than SVR kernel method.

Hu [27] proposed the concept of using complete ensemble empirical mode decomposition to remove noise, weighted oil news, and financial news with TF-IDF as input attributes, combined with attention-LSTM to predict oil prices. The experimental results show that the proposed method is significantly better than SVR, AdaBoost, random forest, and other methods in the RMSE evaluation.

Based on the above literature, most studies have not conducted in-depth discussions on the application of technical indicators to sequential deep networks. In our previous research [28], we explored the effectiveness and practicality of various financial analysis technical indicators in time series deep learning networks. In financial commodity trading, it is common to analyze and discuss prices at various levels, such as fundamental, technical, and chip analysis. Technical analysis is the most frequently cited as the primary tool for smart stock selection by various trading software. This research explores the possibility of using technical analysis as the feature selection of deep networks. This study designed feature selection methods for famous technical indicators, such as 9 K-9D, 16 K-16D, RSI, MACD, and Williams%R, and measured it with a four-layer LSTM. The main contribution of [28] is to define the regularization of mainstream technical indicators applied to sequential neural networks.

In view of the above literature, the recent related research of deep learning applied to FinTech mainly focuses on the rise and fall forecast of financial products and model design. Research on the introduction of TAs into DNN models is very rare. The subsequent in-depth discussion on trading strategy research is even more lacking. At present, the research of financial technical indicators and deep learning are two independent research directions. How to effectively combine the two research fields and apply them to the design of trading strategies, and even automated trading based on deep learning, is a major challenge at the practical application level. This research proposes an attention-based BiLSTM neural network combined with trading strategy. In this study we analyze the performance of widely used TIs, including KD, RSI, BIAS, MACD, and Williams%R on the time series neural network. This research also demonstrates the effectiveness of using time series neural networks to construct financial commodity trading strategies. The remainder of this paper is as follows: Sect. 2 is the literature and techniques review; Sect. 3 is the methodology of this research; Sect. 4 is the experimental design, results and discussion, and the last section is conclusion and future work.

2 Related techniques and literature review

2.1 Machine learning and deep learning

Machine learning is a main branch of artificial intelligence (AI). AI generally refers to the reasoning, induction, and knowledge gained through programs. Machine learning is the mathematical rules and means to realize artificial intelligence [29, 30].

Machine learning and deep learning are widely known to the public after AlphaGo developed by DeepMind (acquired by Google in 2014) defeated the world champion of Go [31]. In 2018, the "Turing Award" was awarded to well-known deep learning scholars, which pushed machine learning/deep learning to the peak in one fell swoop and left an extremely important milestone in history [32]. Deep learning technologies have been widely used in different fields in recent years, such as autonomous driving systems [33], voice recognition systems [34, 35], AIoT [36, 37], face recognition [38], smart home and mart city [39,40,41,42], smart campus [43, 44], machine translation [45,46,47], image processing and retrieval [48,49,50], natural language processing [51, 52], etc.

The learning rule of the neural network is mainly to give an error evaluation function \({\mathbb{E}}\) with N training samples and obtain the updated weight by the way of gradient-derived transmission by delta rules [53,54,55]. Given the weight wt and bias bt of t-layer, the gradient update method is as follows, where the learning rate \(\gamma\) is a positive number.

$$w_{t + 1} = w_{t} - \gamma_{t} \nabla_{w} {\mathbb{E}}$$
(1)
$$b_{t + 1} = b_{t} - \gamma_{t} \nabla_{b} {\mathbb{E}}$$
(2)

The error term of the (l + 1)th layer can be obtained from the weight of the previous layer, and weight and bias update are calculated as follows:

$${\varvec{\delta}}^{l} = \left( {{\mathbf{W}}^{l + 1} } \right)^{{\text{T}}} {\varvec{\delta}}^{l + 1} { } \circ f{^{\prime}}\left( {{\mathbf{u}}^{l} } \right)$$
(3)
$$\nabla_{{{\mathbf{W}}^{\left( l \right)} }} {\mathbb{E}} = {\mathbf{x}}^{l - 1} \left( {{\varvec{\delta}}^{l} } \right)^{{\text{T}}}$$
(4)
$$\nabla_{{{\mathbf{b}}^{\left( l \right)} }} {\mathbb{E}} = {\varvec{\delta}}^{l}$$
(5)

where \(\circ\) represents element-wise multiplication, f(•) is the nonlinear transformation equation in a neuron, \({\mathbf{W}}^{l+1}\) denotes the weight matrix from l-layer to (l + 1)-layer.

2.2 Recurrent Neural Network (RNN)

In the fundamental neural network, the training and discrimination of individual data often operate independently, which is relatively unsuitable for serial data input such as audio or natural language. For time series data with extremely high correlation between the context, it is relatively suitable to use recurrent neural network (RNN) [13]. If a fully connected neural network or a convolutional neural network is used to process the same data, the correlation between the data may be ignored and the prediction may be inaccurate.

The structure of the RNN is shown in Fig. 1. When the training data is input, the hidden layer will be synchronously transferred to the next hidden layer, to maintain the dependence between data. In the above architecture, U, V, and W are shared weights. There will be a nonlinear transformation from U to the hidden layer, usually using hyperbolic tangent function (tanh). In addition, the multi-category output is converted through Softmax Function before V to output. The RNN architecture generally uses the cross-entropy error function.

Fig. 1
figure 1

Recurrent neural network architecture

2.3 Long Short-Term Memory (LSTM)

The structure of long short-term memory (LSTM) cell is shown in Fig. 2 [14]. The difference between LSTM and the basic type of RNN is that the memory of RNN will decrease as the data sequence increases. From a theoretical point of view, the gradient feedback of the hidden layer will decrease layer by layer as the sequence data increases, and LSTM can effectively improve this problem. With its advantages, LSTM has been widely used in various time series data and applications, such as pedestrian trajectory prediction [56], chemistry and material science [57], and COVID-19 trend prediction [58].

Fig. 2
figure 2

A long short-term memory cell

Fig. 3
figure 3

Attention mechanism with RNN-LSTM

LSTM cells are composed of input gates, forget gates, output gates, and unit states. The computation of hidden state \(h\_t\) in a specific LSTM cell is as follows:

$$\tilde{C}_{t} = {\text{tan}}h\left( {W_{c} x_{t} + U_{c} h_{t - 1} + b_{c} } \right)$$
(6)
$$i_{t} = \sigma \left( {W_{i} x_{t} + U_{i} h_{t - 1} + b_{i} } \right)$$
(7)
$$f_{t} = \sigma \left( {W_{f} x_{{\text{t}}} + U_{f} h_{t - 1} + b_{f} } \right)$$
(8)
$$O_{t} = \sigma \left( {W_{o} x_{t} + U_{o} h_{t - 1} + b_{o} } \right)$$
(9)
$$C_{t} = i_{t} \circ \tilde{C}_{t} + f_{t} \circ C_{t - 1}$$
(10)
$$h_{t} \user2{ } = \user2{ }O_{t} \circ \tanh \left( {C_{t} } \right)$$
(11)

where \(\sigma \left(\cdot \right)\) is the sigmoid activation function; \(\circ\) represents element-wise product; \({i}_{t}\), \({f}_{t}\), and \({O}_{t}\), indicate the input, forget, and output gates, respectively. \({C}_{t}\) is the activation vectors, and \({\tilde{C }}_{t}\) represents the intermediate vector of cell state. In LSTM model, each LSTM neural unit will go through a process of \({C}_{t-1}\to {C}_{t}\), \({C}_{t}\) is the current memory content, and \({C}_{t-1}\) is the memory content of the previous moment. The content of time t − 1 will be updated or deleted at time t. The forget gate uses the important measurement of past memory and the comprehensive calculation of \({C}_{t-1}\) to play the role of memory screening. The input importance factor is adjusted by the tanh function and then added to \({C}_{t-1}\) to expand the memory capacity.

2.4 Attention mechanism

Attention mechanism was first proposed in the field of computer vision, and then the Google Mind team applied the attention mechanism to the RNN model for image classification and achieved extremely high accuracy [59]. Subsequently, Bahdanau and Bengio [60] applied attention mechanism to synchronous machine translation tasks, which is a successful application on NLP. In 2017, the Google Brain team published < Attention Is All You Need > [16] to direct the attention mechanism to an independent structure instead of being attached to the existing CNN/RNN architecture.

The attention mechanism can simulate that when humans are watching a certain image or understanding things, they do not look at the whole picture in a full field of view, but instead focus their attention on a specific area. Attention mechanism is an improved version of the sequence-to-sequence (Seq2Seq) model in RNN [61]. In the sequence-to-sequence encoding stage, the hidden state of each time state is passed forward, and finally a set of vectors of uniform dimensions are generated before decoding. Seq2Seq causes each training data to correspond to the same set of context vectors. The attention mechanism (Fig. 3) is aimed at improving this situation. It passes the hidden state of each training to the back-end decoder, so that the important state in the timing will be learned by the network (i.e., the focus of attention).

3 The proposed attention-based BiLSTM stock trading strategy

3.1 Technical Indicator preprocessing

The technical indicators imported in this research include stochastic oscillator (KD), relative strength index (RSI), bias ratio (BIAS), William’s oscillator (W%R), and moving average convergence/divergence (MACD). Since the range of each technical indicator value is different from the meaning of stock market momentum, this study adopts the concept of Lee et al. [28] and normalizes the technical indicators as the following:

  1. i.

    Stochastic Oscillator (KD):

    $$\left\{ {\begin{array}{*{20}l} {A : {\text{K}}_{t - 1} < {\text{D}}_{t - 1}\; AND\quad {\text{D}}_{t} < {\text{K}}_{t} , } \\ {AND \left( {one\;of\;{\text{K}}_{t} ,{\text{K}}_{t - 1} ,\;and \quad {\text{K}}_{t - 2} \,is\,20\sim 80} \right)} \\ {B : K_{t - 1} > D_{t - 1} \;AND\quad {\text{D}}_{t} >{\text{K}}_{t} , } \\ {AND \left( {one\; of\;{\text{K}}_{t} ,{\text{K}}_{t - 1} ,\;and\quad{\text{ K}}_{t - 2}\,is\,20\sim 80} \right)} \\ {C : others} \\ \end{array} } \right.$$
    (12)

    where \({\mathrm{K}}_{t}\), \({D}_{t}\) are the K-D values on the trading day t [2]. When the K line is higher than the D line but the K falls below the D value in the next day, it is called a death cross. When the K line is lower than the D line but the K breaks through the D value in the next day, it is a golden cross.

  2. ii.

    Relative Strength Index (RSI)

    $$\left\{ {\begin{array}{*{20}l} {A: \quad {\text{RSI}}_{t} \le 20 } \\ {B: \quad 20 < {\text{RSI}}_{t} < 80} \\ {C:\quad {\text{RSI}}_{t} \ge 80} \\ \end{array} } \right.$$
    (13)

    where RSI value is close to 80 and 20 and is the time point for the trend reversal [62]. A value close to or greater than 80 indicates a downtrend, and a value close to or less than 20 implies a high possibility of rising.

  3. iii.

    Bias Ratio (BIAS)

    BIAS [63] is the deviation rate of the moving average, and its value may be positive or negative. The theoretical cutoff point is between + 3 and -3%. A positive deviation means that the upward momentum is large, and a negative deviation is the opposite. The normalization of BIAS input into the deep network in this study is as follows:

    $$\left\{ {\begin{array}{*{20}l} {A: \quad {\text{BIAS}}_{t} \le - 0.03 } \\ {B: \quad - 0.03 < {\text{BIAS}}_{t} < 0.03} \\ {C: \quad {\text{BIAS}}_{t} \ge 0.03} \\ \end{array} } \right.$$
    (14)
  4. iv.

    William’s Oscillator (W%R)

    William’s oscillator [64] is an oscillatory indicator that measures the overbought and oversold conditions of stocks. Its range and reversal signal are similar to the concept of the RSI indicator. The normalization of W%R input into the deep network in this study is as follows:

    $$\left\{ {\begin{array}{*{20}l} {A: \quad {\text{W\% R}}_{t} \ge 80 } \\ {B: \quad 20 < {\text{W\% R}}_{t} < 80} \\ {C:\quad {\text{W\% R}}_{t} \le 20} \\ \end{array} } \right.$$
    (15)
  5. v.

    Moving Average Convergence/Divergence (MACD)

MACD [65] is used to evaluate the short-term and long-term cross-state of the moving average. Its value can also be positive or negative. When the negative value turns positive, it means that the short-term upward trend breaks through the long-term stalemate state, and vice versa. The normalization of MACD input into the deep network in this study is as follows, where DIF is the difference between the short-term trend and the long-term trend:

$$\left\{ {\begin{array}{*{20}l} {A:{\text{DIF}}_{t - 1} < 9{\text{MACD}}_{t - 1} \;and \quad 9{\text{MACD}}_{t} < {\text{DIF}}_{t} } \\ {B: {\text{DIF}}_{t - 1} > 9{\text{MACD}}_{t - 1} \;and \quad 9{\text{MACD}}_{t} > {\text{DIF}}_{t} } \\ {C: \quad others} \\ \end{array} } \right.$$
(16)

3.2 System architecture

The overall framework of the proposed AttBiLSTM model and trading strategy design is shown in Fig. 4. The system first normalizes KD, RSI, BIAS, W%R, and MACD with the methods mentioned in Sect. 3. The model uses stock daily trading information, including opening price, closing price, highest price, lowest price, and trading volume plus normalized technical indicators to form a vector (i.e., StockVec) for training. After preprocessing, StockVec will be input into the well-trained model. The model uses the LSTM + attention architecture to predict and output the probability distribution of the predicted categories. The final predicted distribution will be provided to users as a reference for buying and selling decisions.

Fig. 4
figure 4

The overall framework of the proposed AttBiLSTM prediction model applied to trading strategy design

The operation procedure of the proposed deep model is shown in Fig. 5. The LSTM architecture used in this study is BiLSTM, which is more sensitive to the dependence between stock prices and technical indicators in a time interval. The proposed BiLSTM algorithm is shown in Algorithm 1. The model input is a 3D tensor, including StockVec, sequence length and batch size. The daily changing technical indicators and basic data form a set of TI embedding and then form a BiLSTM sequence input according to the time series determined by the experiment. As shown in Fig. 3 of Sect. 3, BiLSTM is used as the encoder stage of the model, the operation is to learn the weight dependence of the StockVec of the input sequence in both forward and backward directions. While the BiLSTM operation is completed, different time series data will produce corresponding context vector via the attention mechanism in the given batch.

Fig. 5
figure 5

The proposed AttBiLSTM operation procedure

After the decoding vector is generated through the previous module, the probability of rising, falling or flat vector is obtained through the dense layer and the Softmax function [30], which predicts the probability for the jth class from input vector X:

$$P\left( {{\text{output}} = j{|}{\mathbf{X}}} \right) = \frac{{e^{{{\mathbf{X}}^{{\text{T}}} w_{j} }} }}{{\mathop \sum \nolimits_{s = 1}^{S} e^{{{\mathbf{X}}^{{\text{T}}} w_{s} }} }}$$
(17)
figure a

3.3 Label design

The estimated labels designed by this research include two types: (A) the short-term future trend and (B) the estimate of the stock price rise and fall next day, as follows:

  1. (A)

    Short-term Future Trend:

    This type of label is divided into three categories, including (1) the future trend is bullish, (2) the future trend is flat, and (3) the future trend is bearish, which are defined as follows:

    $$\left\{ {\begin{array}{*{20}l} {\left( 1 \right): \quad MA\left( n \right)_{t}^{ + } - MA\left( n \right)_{t}^{ - } > MA\left( n \right)_{t}^{ - } \times \alpha } \\ {\left( 2 \right): \quad |(MA\left( n \right)_{t}^{ - } \times \alpha )\left| \ge \right|MA\left( n \right)_{t}^{ - } - MA\left( n \right)_{t}^{ - } |} \\ {\left( 3 \right): \quad MA\left( n \right)_{t}^{ + } - MA\left( n \right)_{t}^{ - } < - MA\left( n \right)_{t}^{ - } \times \alpha } \\ \end{array} } \right.$$
    (18)

    where \({MA(n)}_{t}^{-}\) denotes the moving average of stock prices in the past \(n\) days at day t, \({MA(n)}_{t}^{+}\) represents the moving average of stock prices in the next \(n\) days at day t, and \(\alpha\) is the threshold for judging whether to rise or fall. By default, \(n=\) 5 and \(\alpha =\) 0.5%.

  2. (B)

    The next day price rises and falls:

This type of label compares the prices of the next day and the previous day within a certain percentage and is defined as follows, where \({\mathrm{Close}}_{t}\) is the closing price at day t, and \(\alpha =\) 0.5% by default:

$$\left\{ {\begin{array}{*{20}l} {\left( 1 \right):\quad {\text{Close}}_{t + 1} - {\text{Close}}_{t} > {\text{Close}}_{t} \times \alpha } \\ {\left( 2 \right): \quad |{\text{Close}}_{t} \times \alpha )\left| { \ge } \right|{\text{Close}}_{t + 1} - {\text{Close}}_{t} |} \\ {\left( 3 \right): \quad {\text{Close}}_{t + 1} - {\text{Close}}_{t} < - {\text{Close}}_{t} \times \alpha } \\ \end{array} } \right.$$
(19)

3.4 Trading strategy design

This research uses the output probability of Softmax to implement the trading strategy. The concept is as follows:

$${\text{Softmax}}\_{\text{up }} > x$$
(20)
$${\text{Softmax}}\_{\text{down }} > y$$
(21)

If the predicted increase probability of the output classification is greater than a threshold, it is a possible timing to buy. Similarly, if the predicted decline probability in the model is greater than a threshold, it is a possible timing to sell. Considering that real market transactions will include stop loss, profit stop and consumer cash flow, etc., to simplify the problem and focus on the effectiveness of the deep network model in the design of trading strategies, this study will analyze the following two trading strategies:

Strategy_A: The market entry condition is fixed (while the probability \(\ge\) 35%), and the exit timing is set by the experiment.

Strategy_B: The exit timing is fixed (while the probability \(\ge\) 35%), and the entry timing is set by the experiment.

4 Experimental results and discussion

4.1 Experimental setting

This study adopts Yuanta/P-shares Taiwan Top 50 ETF (TPE0050) as the research subject. TPE0050 is the earliest securities ETF issued in Taiwan and the largest ETF in the Taiwan stock market. The investment subject of TPE0050 is the top 50 large-cap stocks of the Taiwan Capitalization Weighted Stock Index.

The experiment data are collected from the Taiwan Stock Exchange (TWSE), and the transaction date used in this experiment is from 2014/01/04 to 2021/05/31, with a total of 1807 trading days. The data are divided into 60% training set, 20% for validation, and the last 20% for testing. The number of data is 1084, 361, and 362, respectively. In addition, the time interval for the backtest of the trading strategy is 2017/01/16 ~ 2019/10/14, a total of 677 trading days, which is a section where stock prices fluctuate more intensely.

The experimental hardware uses Asus Esc8000 G3 host, Xeon E5-2600 v3 processor, 4 GB memory, and NVIDIA GeForce GTX 1080ti \(\times\) 8 GPUs with Ubuntu 18.04 LTS OS. The batch size is fixed at 32, epochs are between 50 and 200, and the parameter with the highest validation accuracy is selected as the prediction model. As shown in Table 1,  the experimental settings are (a) fundamental LSTM and attention-based BiLSTM, (b) five different technical indicators with (c) two predicted labels cross-tests. Finally, a combination model with higher accuracy is selected for trading strategy experiments and performs the backtest to calculate the return on investment. The measures of this study are recall, precision, F1 measure, and accuracy, which are defined as follows:

$${\text{Precision}} = { }\frac{{{\text{TP}}}}{{{\text{TP }} + {\text{ FP}}}}$$
(22)
$${\text{Recall }} = { }\frac{{{\text{TP}}}}{{{\text{TP }} + {\text{ FN}}}}$$
(23)
$${\text{F}}1{ } = { }2 \times \frac{{{\text{Recall}} \times {\text{Precision}}}}{{{\text{Recall}} + {\text{ Precision}}}}$$
(24)
$${\text{Accuracy }} = { }\frac{{{\text{TP}} + {\text{TN}}}}{{{\text{TP}} + {\text{TN }} + {\text{ FP }} + {\text{ FN}}}}$$
(25)

where TP, TN, FP, and FN represent the true positive, true negative, false positive, and false negative, respectively. F1 measure is the harmonic mean of recall and precision.

Table 1 Experimental combination of models, technical indicators and predicted labels, where AttBiLSTM denotes the proposed Attention-based BiLSTM model, STFT denotes the Short-term Future Trend, and NDP represents the Next Day Price prediction

4.2 Experimental results of four-layer LSTM in labels STFT/NDP

The experimental results of fundamental LSTM for STFT are shown in Table 2 and Fig. 6. The accuracy of the experiment with is between 57.40 and 66.80%. The flat prediction results are generally higher in recall, precision, and F1 measure. The downward trend prediction is relatively low.

Table 2 Experimental results of STFT in fundamental LSTMs
Fig. 6
figure 6

Confusion Matrix of fundamental LSTM for STFT, (a) ~ (f) represent the results of features (I) to (VI), respectively

In these six experiments, the KD indicator performs better in forecasting, which achieved 66.80% prediction accuracy. Among various technical indicators, the prediction performance of RSI is relatively low.

The experimental results of fundamental LSTM for NDP are shown in Table 3 and Fig. 7. The experimental accuracy is between 52.73 and 63.80%. The result shows that in the NDP problem, KD index also obtains better performance. The overall accuracy is still greater than 60%.

Table 3 Experimental results of NDP in fundamental LSTMs
Fig. 7
figure 7

Confusion Matrix of fundamental LSTM for NDP, (a)–(f) represent the results of features (I) to (VI), respectively

The experimental results of AttBiLSTM for STFT/NDP are shown in Tables 4, 5 and Figs. 8, 9. In STFT case, the experimental accuracy is between 62.00 and 68.83%, and the average accuracy is 64.70%, and 60.07−4.76% in NDP test. Compared with the fundamental LSTM model, there is a significant improvement in classification accuracy. The experimental results show that the KD indicator performs better than other types of technical indicators in both cases, followed by BIAS and MACD. In AttBiLSTM model, the improvements in average accuracy are 3.35 and 4.52 percentage points in STFT and NDP cases.

Table 4 Experimental results of STFT in AttBiLSTM
Table 5 Experimental results of NDP in AttBiLSTM
Fig. 8
figure 8

Confusion Matrix of fundamental AttBiLSTM for STFT, (a)–(f) represent the results of features (I) to (VI), respectively

Fig. 9
figure 9

Confusion Matrix of fundamental AttBiLSTM for NDP, (a)–(f) represent the results of features (I) to (VI), respectively

4.3 Experiments on trading strategy and return on investment (ROI)

This section discusses the return on investment (ROI) for different entry and exit conditions. According to the trading strategy settings described in Sect. 3, this study will backtest the proposed model on TPE0050 and evaluate the corresponding ROI. The backtest condition is a linear backtest with a single buy and sell. The backtest date is 2017/01/16–2019/10/14, which is the range of large price fluctuations. The experimental results of Sect. 4.2show that the AttBiLSTM model has a better predictive effect on STFT problem. In this section, the combination of AttBiLSTM + STFT is used to analyze the trading strategy. The ROI in this study is set as follows:

$${\text{ROI}} = \frac{{{\text{Selling price}} - {\text{Buying price}}}}{{\text{Buying price}}}$$
(26)

4.3.1 Strategy_A & Strategy_B

In the proposed trading strategy design, the approach of Strategy_A is to enter the market when the predicted price_up probability >  = 35%. Strategy A fixes the buying timing to the rising probability of 0.35, while the selling probability is divided into 0.2, 0.25, 0.3, and 0.35 for comparison (as shown in Table 6 and Fig. 10), the results show that the probability of 0.3–0.35 has a higher ROI, and AttBiLSTM + KD indicator obtains the highest ROI of 42.57%. Strategy_B has similar experimental result with Strategy_A, the results are shown in Table 7 and Fig. 11. When the falling probability is fixed at 35% and the rising probability is 35%, the model with KD indicator reaches the highest ROI of 42.74%. Figure 12 shows the backtest of buying/selling simulation from 2018/10/31 to 2019/05/16.

Table 6 Strategy_A, ROI of different probabilities of falling
Fig. 10
figure 10

The change trend of ROI in different falling probability with fixed price_up probability >  = 35%

Table 7 Strategy_B, ROI of different probabilities of rising
Fig. 11
figure 11

The change trend of ROI in different rising probability with fixed price_down probability >  = 35%

Fig. 12
figure 12

Backtest simulation of buying/selling from 2018/10/31 to  2019/05/16

5 Conclusions and future work

This research explores the effectiveness of attention-based time series neural networks applied to stock forecasting and financial commodity trading strategy design. This research proposes the concept of attention-based BiLSTM combined with trading strategy design and verifies the effectiveness of different technical indicators in the model. This study uses stochastic oscillator (KD indicator), RSI, BIAS, W%R, and MACD, which are commonly used in stock technical analysis, to perform the short-term trend and next-day rise/fall predictions. This research concluded the following conclusions:

  1. 1.

    Taking TPE:0050 as the research target, the AttBiLSTM model performs better than the fundamental LSTM model.

  2. 2.

    Among the several technical indicators based on the moving average concept, the KD indicator has a better overall effect, including classification forecasting and measurement of return on investment.

  3. 3.

    The results of the backtest experiment show that the proposed deep trading strategy can achieve a maximum ROI of 42.74%

  4. 4.

    The probability output of the deep network model can be directly applied to the trading strategy of financial commodities.

Future works have the following extensions:

  1. 1.

    The technical indicators used in this study are mainly moving averages. In the future, more different types of technical indicator analysis can be included.

  2. 2.

    More time series or non-time series networks can be considered for trading strategy design, such as CNN, attention-based CNN, GAN, and transformer.

  3. 3.

    In the future, the concepts proposed in this study can be used in advanced financial commodity transaction types, such as stretchable financial commodities, day-trading, futures trading, and option transaction.