Event prediction within directional change framework using a CNN-LSTM model

Financial forecasting has always been an intriguing research area in the field of finance. The widely accepted approach to forecasting financial data is to perform predictions using time series data. In time series analysis, sampling the financial data at a predefined frequency (e.g. hourly, daily) leads to an uneven and discontinuous data flow. Directional Change is a recently proposed approach that replaces physical time within the financial data and establishes an event-driven framework. With the emergence of machine learning and deep learning methods, researchers have applied them to financial time series, and these techniques have been shown to outperform conventional approaches. This paper employs the CNN-LSTM model and investigates its predictive competence within the Directional Change (DC) framework for predicting DC event prices. To achieve this objective, we first create the tick bars/candles of the GBPUSD, EURUSD, USDCHF, and USDCAD tick prices from January to August 2019. Then, the DC-based summaries of the selected tick bar/candle for each currency pair are generated and fed to the CNN-LSTM model. The CNN-LSTM network architecture combines the robustness of the Convolutional Neural Network (CNN) in feature extraction with that of Long Short-Term Memory (LSTM) in predicting sequential data. The results suggest that the performance of the CNN-LSTM model improves significantly within the DC framework.


Introduction
Although predicting a financial asset price has been an intriguing area of research, it has proven to be a highly complex task due to the inherent complexity, volatility, and nonlinearity of financial markets. The widely accepted approach to analysing financial data is time-series analysis. Conventionally, in order to analyse financial time series, prices are recorded by sampling data points at fixed time intervals (daily, weekly, monthly). In this method, researchers first decide how often to sample the data, and then they take snapshots at the chosen frequency.
Consequently, financial time series are unevenly spaced and discontinuous with respect to the flow of physical time [8]. As a result, the interval-based summary of the price may miss key events and lose profitable trade opportunities.
To tackle the aforementioned shortcoming of the traditional approach to time series analysis, Guillaume et al. [10] proposed a new method for scaling time. Directional Change (DC) is an alternative approach that replaces the notion of a "physical time scale", looks beyond the physical time constraints within financial data, and constitutes an event-driven approach. Hence, market data are observed from an event-based rather than an interval-based perspective. With the recent success of machine learning and deep learning approaches, many researchers have applied various algorithms and architectures to financial time series to predict the price and movement of financial assets [27]. Mehtab and Sen [19] presented a suite of CNN-based regression models with a high level of accuracy and robustness in forecasting multivariate financial time series. This study proposes a deep learning-based regression model to predict the price of directional change events for currency pairs in the foreign exchange (FX) market and evaluates its performance with and without the Directional Change framework.
The remainder of this research paper is organised as follows. Section 2 presents a brief overview of the related work in the field of financial forecasting. Section 3 presents the methodology of this study, which includes discussing the directional change framework, Long Short-Term Memory, Convolutional Neural Networks, Support Vector and Random Forest regression, data, experiment and results. Finally, in Sect. 4, we conclude the paper.

Related work
Financial forecasting has always been an exciting research area in the financial industry. Numerous studies have been published on machine learning models that perform better than classical time series forecasting techniques [17,29,30,34,37]. Researchers have endeavoured to use nonlinear models for prediction. With the advent of machine learning methods such as neural networks and support vector machines (SVM), researchers have applied them to time series prediction [16]. Zbikowski [38] employed Volume-Weighted SVM feature selection techniques to enhance classifier accuracy and create a stock trading strategy. Choudhury et al. [4] utilised k-means and SVR to predict market volatility and prices for two days in the Indian stock market. Artificial neural networks (ANNs), a sub-class of machine learning models, are widely used for predictive data-mining tasks. The applicability of artificial neural networks to stock market prediction was first hypothesised by White [36], with some indications of success reported by Saad et al. [25]. Artificial neural networks, in essence, mimic the structure of biological neural networks, in which neurons are interconnected and learn from experience.
In 2003, Zhang used a neural network and an auto-regressive integrated moving average (ARIMA) model to forecast stocks. The experimental results demonstrated the advantage of neural networks in nonlinear data forecasting [39]. Abu Hammad et al. [1] investigated the Jordanian stock market with a multi-layer back propagation (BP) network but did not discuss the tendency of BP to fall into local minima. Zhang et al. [40] proposed a stock forecasting model based on the LM-BP neural network, which improves on the traditional BP neural network. Wang et al. [35] proposed a wavelet neural network to forecast stock prices. Persio and Honchar [6] compared the performance of three variants of RNNs in predicting Google's stock price. Their results favoured the LSTM over the basic RNN and the Gated Recurrent Unit (GRU), with an accuracy of 72% within a five-day period. They shuffled the train and test data to prevent the network from over-fitting.
Roondiwala et al. [24] predicted Nifty Index movements from the open, high, low, and close prices with an LSTM RNN architecture. Their work reached a root mean squared error of 0.0086 after training for 500 epochs. Karmiani et al. [13] compared the performance of LSTM to SVM, backpropagation, and the Kalman filter with between 10 and 100 epochs and found that LSTM has high accuracy and low variance. Fischer and Krauss [5] performed a large-scale prediction of the S&P 500 from December 1992 to October 2015 and showed that the LSTM model outperforms the machine learning methods and deep networks. Nelson et al. [20] proposed an LSTM-based model in combination with 175 technical indicators to predict stock market movement. Salis et al. [26] presented a thorough investigation of the application of LSTM models and artificial neural networks to predicting the fluctuation of daily gold prices. Zhuge et al. [41] predicted opening stock prices using their proposed LSTM model, combining the classification results with a naive Bayesian-based analysis of emotions. In 2018, Hu [12] used a CNN to predict time series. The results showed that a CNN can predict time series; however, the forecasting accuracy is relatively low. Sezer and Ozbayoglu [28] utilised a CNN model to classify the daily prices of Dow 30 stocks and Exchange-Traded Funds (ETFs).

Methodology
The methodology is structured as follows. In Sect. 3.1, the directional change framework is introduced. Sections 3.2 and 3.3 explain Long Short-Term Memory (LSTM) and Convolutional Neural Networks (CNNs). Section 3.4 briefly introduces Support Vector and Random Forest regression. Sections 3.5 and 3.6 describe the data and the Average True Range. Finally, in Sect. 3.7, the experiment is presented in detail.

Directional change framework
Directional Change (DC) is an approach that summarises price movement by transforming a time series price curve into an intrinsic time curve [32]. Under the DC framework, a DC event is identified by a substantial change in the price of an asset, defined as a price change greater than a predefined threshold value h. Following a DC event, an overshoot (OS) event continues until the next DC event in the opposite direction. Figure 1 illustrates a time series and the corresponding intrinsic time series for h = 0.01%. Under the DC approach, the market is broken down into alternating uptrends and downtrends. An upturn event indicates that the relative price change between the current market price $p_t$ and the last low price $p_l$ reaches the threshold h:

$$\frac{p_t - p_l}{p_l} \geq h$$

As illustrated in Fig. 1, the move from point A to B is an upturn DC event. By the same token, a downturn event is defined as an event where the relative difference between the current price $p_t$ and the last high price $p_h$ falls to $-h$ or below [32]:

$$\frac{p_t - p_h}{p_h} \leq -h$$

A trend ends whenever a price change of the same threshold h is observed in the opposite direction, see [2]. It should be noted that different thresholds generate different series of events. The rationale for using different thresholds is that each threshold might be considered significant by a different trader. Smaller thresholds create more directional changes than larger ones. As mentioned above, the value of the threshold needs to be predetermined when summarising price movements using DC. It represents how large a price change the observer considers significant. Tsang and Chen [31], Bakhach et al. [2], and Golub et al. [9] have explored classical machine learning techniques such as the Hidden Markov Model and the Naïve Bayes classifier to predict the behaviour of tick prices within an event-driven approach in the directional change framework. In our work, we extend their work into a deep neural network paradigm.
Since different thresholds generate different market summaries, we also proposed incorporating the Average True Range indicator to determine the DC thresholds dynamically. For the interested reader, a more detailed discussion on Directional Change may be found in [3].
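To make the event definitions concrete, the following is a minimal sketch of DC event detection on a plain list of prices. It is not the implementation used in this study: the function name `directional_change_events`, the tuple layout, and the handling of the very first event (both directions are monitored until one is confirmed) are illustrative assumptions.

```python
def directional_change_events(prices, h):
    """Return DC confirmation points as (index, price, kind) tuples.

    `prices` is a sequence of positive prices and `h` the relative
    threshold (0.02 means 2%). `kind` is "upturn" or "downturn".
    Assumption: before the first event, both directions are monitored.
    """
    events = []
    trend = None                      # unknown until the first event
    p_low = p_high = prices[0]        # running extremes since the last event
    for i in range(1, len(prices)):
        p = prices[i]
        if trend != "up" and (p - p_low) / p_low >= h:
            events.append((i, p, "upturn"))
            trend, p_high = "up", p   # start tracking the high of the uptrend
        elif trend != "down" and (p - p_high) / p_high <= -h:
            events.append((i, p, "downturn"))
            trend, p_low = "down", p  # start tracking the low of the downtrend
        else:
            # No confirmation: update the extreme(s) still being tracked.
            if trend != "down":
                p_high = max(p_high, p)
            if trend != "up":
                p_low = min(p_low, p)
    return events


sample = [100, 101, 102.5, 101.8, 100.2, 99.9, 101.0, 102.2]
events = directional_change_events(sample, h=0.02)
event_prices = [price for _, price, _ in events]  # a simple DC-based summary
```

With a larger `h`, the same series yields fewer events, which is the threshold sensitivity discussed above.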

Long short-term memory (LSTM)
Recurrent Neural Networks (RNNs) are a robust type of artificial neural network that process sequences by iterating through the sequence elements and maintaining a state containing information about previous elements. Unlike feed-forward neural networks, RNNs can leverage the sequential information of previous inputs through their recurrent state. This memory, called the recurrent hidden state, enables the network to predict the next item in the input sequence. In practice, however, the usable length of the sequential information is limited to only a few steps back: although RNNs should theoretically retain information from previous time steps, long-term dependencies are very difficult to learn. A common problem among RNNs is the vanishing gradient, in which the gradient information vanishes while passing back through a deep layered network. The gradient is the partial derivative of a function's output with respect to its inputs. This problem prevents the network from learning long-term dependencies and causes the learning process to slow down or stop altogether. Conversely, in the exploding gradient problem, the gradient information accumulates and results in a very large gradient. In the "vanishing gradient" problem, the updates to the weight matrix become very small, and in the "exploding gradient" problem, the opposite is true. As mentioned earlier, RNNs struggle to learn long-term dependencies [11]. LSTM models are an extension of RNNs designed to address the vanishing gradient problem. Generally, the LSTM model consists of three gates: the forget, input, and output gates, as shown in Fig. 2. The forget gate is responsible for deciding whether to preserve or remove the existing information. The input gate determines the extent to which new information is added to the memory, and the output gate controls whether the current value in the cell contributes to the output [11].
• Forget Gate: In the forget gate block of the LSTM layer, the information from the current input $x_t$ and the previous hidden state $h_{t-1}$ is passed through an activation function (e.g. sigmoid). The gate output $f_t$ is a value between 0 and 1, where zero implies removing the learned value and one means preserving it. The output is computed as:

$$f_t = \sigma(W_f \cdot [h_{t-1}, x_t] + b_f)$$

where $W_f$ is the weight matrix and $b_f$ is the bias value.
• Input Gate: This gate, which determines the addition of new information to the LSTM memory, has two layers. A sigmoid layer decides which values need to be updated, and a hyperbolic tangent layer generates a vector of new candidate values to be added to the memory. The output value of the input gate is computed through the following formulas:

$$i_t = \sigma(W_i \cdot [h_{t-1}, x_t] + b_i)$$
$$\tilde{C}_t = \tanh(W_C \cdot [h_{t-1}, x_t] + b_C)$$

Together, these two layers update the LSTM memory: part of the old value $C_{t-1}$ is forgotten by multiplying it by $f_t$, and the new value $i_t * \tilde{C}_t$ is added. The following represents its equation:

$$C_t = f_t * C_{t-1} + i_t * \tilde{C}_t$$

• Output Gate: Here the gate first uses a sigmoid function to determine which part of the LSTM memory contributes to the output. Subsequently, through the nonlinear tanh function, it maps the values between -1 and 1:

$$o_t = \sigma(W_o \cdot [h_{t-1}, x_t] + b_o)$$
$$h_t = o_t * \tanh(C_t)$$

Figure 2 depicts the LSTM architecture.
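The gate descriptions above can be traced step by step in code. The sketch below performs a single LSTM update for a scalar input and a scalar state (a hidden size of one), which keeps the arithmetic visible; real layers use weight matrices and vectors. The function name `lstm_step` and the `params` layout are illustrative assumptions, not part of any library.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def lstm_step(x_t, h_prev, c_prev, params):
    """One LSTM update with scalar input, hidden state and cell state.

    `params` maps each of "f", "i", "c", "o" to a tuple (w_h, w_x, b):
    the weight on h_prev, the weight on x_t, and the bias (hypothetical layout).
    """
    def pre_activation(name):
        w_h, w_x, b = params[name]
        return w_h * h_prev + w_x * x_t + b

    f_t = sigmoid(pre_activation("f"))        # forget gate, in (0, 1)
    i_t = sigmoid(pre_activation("i"))        # input gate, in (0, 1)
    c_tilde = math.tanh(pre_activation("c"))  # candidate value, in (-1, 1)
    c_t = f_t * c_prev + i_t * c_tilde        # updated cell memory
    o_t = sigmoid(pre_activation("o"))        # output gate, in (0, 1)
    h_t = o_t * math.tanh(c_t)                # new hidden state
    return h_t, c_t


# With all weights and biases at zero, every gate equals 0.5 and the
# candidate is 0, so the cell simply keeps half of its previous memory.
zero_params = {name: (0.0, 0.0, 0.0) for name in ("f", "i", "c", "o")}
h_t, c_t = lstm_step(x_t=0.3, h_prev=0.1, c_prev=1.0, params=zero_params)
```

Setting the forget-gate bias to a large positive value drives $f_t$ towards one, which is how the cell can carry information across many steps.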

Convolutional neural networks (CNN)
The Convolutional Neural Network (CNN), designed by LeCun et al. [15], is a special type of feed-forward network with high performance in image processing and natural language processing [14]. The main parts of the CNN are the convolution and pooling layers. Each convolution layer contains several kernels. Following the convolutional operations, the high-dimensional extracted features pass through a pooling layer that reduces their dimensionality. The output of a convolution layer can be written as:

$$l_t = \phi(x_t * k_t + b_t)$$

In the above equation, $l_t$ represents the convolution's output, $x_t$ is the input vector, $k_t$ is the convolution kernel weights, $b_t$ is the bias, and $\phi$ is the activation function (e.g. tanh). Although the Convolutional Neural Network was initially designed for image processing, it can be utilised for time series forecasting. The reduced number of parameters in the CNN improves the efficiency of the model [23].
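As a small illustration of the convolution and pooling steps on a one-dimensional series, the sketch below slides a single kernel over the input (without padding) and then applies non-overlapping max pooling. The helper names `conv1d` and `max_pool1d` are hypothetical, and the tanh default for the activation is an assumption rather than a setting reported in this paper.

```python
import math


def conv1d(x, kernel, bias=0.0, activation=math.tanh):
    """Slide `kernel` over `x` without padding and apply `activation`.

    As is usual in deep learning, the kernel is not flipped
    (i.e. this is a cross-correlation).
    """
    k = len(kernel)
    return [
        activation(sum(x[t + j] * kernel[j] for j in range(k)) + bias)
        for t in range(len(x) - k + 1)
    ]


def max_pool1d(x, size=2):
    """Keep the maximum of each non-overlapping block of `size` values."""
    return [max(x[i:i + size]) for i in range(0, len(x) - size + 1, size)]


# A difference kernel responds to the change between neighbouring values.
features = conv1d([1.0, 2.0, 4.0, 7.0, 11.0], kernel=[-1.0, 1.0])
pooled = max_pool1d(features, size=2)
```

The convolution turns five inputs into four features, and pooling halves that again, which is the dimensionality reduction described above.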

Support vector and random forest regression
Support Vector Machines, proposed by Vapnik [33], formulate the binary classification problem as a convex optimisation problem, which entails finding the maximum-margin separating hyperplane. The support vectors define the optimal hyperplane. Introducing an ε-insensitive region around the function forms an ε-tube, generalising the Support Vector Machine to Support Vector Regression. The so-called ε-[...]

Data
Financial data comes in a variety of shapes and forms. The four essential financial data types are fundamental data, market data, analytics, and alternative data. To apply machine learning algorithms to unstructured financial data, we need to parse it, extract valuable information, and store those extractions in a regularised format. The tabular representations of data used in ML algorithms (i.e. table rows) equate to what finance practitioners refer to as bars in bar charts [7]. Time bars, which are perhaps the most popular among market practitioners and academics, are generated by sampling price information at fixed time intervals. The information usually includes the timestamp, volume-weighted average price, open, high, low, close, and traded volume. Time bars unrealistically process information at fixed time intervals, which leads to poor statistical properties [7]. In financial jargon, a tick refers to a change in the price of a security from one trade to the next. To create tick bars, the variables mentioned earlier are extracted each time a predefined number of transactions occurs, which synchronises sampling with a proxy of information arrival. For instance, to generate 100-tick bars, we collect 100 consecutive tick prices and then extract the open, high, low, and close values from those observations. Mandelbrot and Taylor [18] found that sampling as a function of the number of transactions exhibits Gaussian distribution properties. In contrast, sampling over a fixed time interval may follow a stable Paretian distribution, whose variance is infinite [7]. Throughout this paper, tick bars and tick candles are used interchangeably. The sole difference between the two is that tick candles are colour-coded to reflect any increase or decrease in price.
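The construction of tick bars can be sketched in a few lines. The example below groups a list of tick prices into bars of a fixed number of ticks and extracts the open, high, low, and close of each group. The function name `tick_bars`, the dictionary layout, and the choice to drop an incomplete final group are assumptions made for illustration.

```python
def tick_bars(prices, ticks_per_bar):
    """Aggregate consecutive tick prices into OHLC bars.

    Each bar covers exactly `ticks_per_bar` ticks; a trailing group with
    fewer ticks is dropped (an assumption, not a rule from the paper).
    """
    bars = []
    for start in range(0, len(prices) - ticks_per_bar + 1, ticks_per_bar):
        chunk = prices[start:start + ticks_per_bar]
        bars.append({
            "open": chunk[0],    # first tick in the group
            "high": max(chunk),  # highest tick in the group
            "low": min(chunk),   # lowest tick in the group
            "close": chunk[-1],  # last tick in the group
        })
    return bars


ticks = [1.2501, 1.2504, 1.2499, 1.2503, 1.2507, 1.2505, 1.2510]
bars = tick_bars(ticks, ticks_per_bar=3)
```

Here seven ticks produce two complete 3-tick bars, and the seventh tick is left for the next bar.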

Average true range
The average true range (ATR) is a technical analysis indicator that measures market volatility. It decomposes the whole range of an asset price for a specific period. It is typically derived from a moving average of length 14 of a series of true range values and can be calculated on an intra-day, daily, weekly, or monthly basis. If the current high is above the prior period's high and the current low is below the prior period's low (i.e. an outside day), the high less the low is used as the true range. In the case of a gap, when the previous close is greater than the current high or lower than the current low, or of an inside day (i.e. when the current high is below the previous high and the current low is above the previous low), the current high less the previous close or the current low less the previous close is used, whichever is larger in absolute value. The following equations represent the calculation of the true range and the ATR:

$$\mathrm{TR}_i = \max\left(H_i - L_i,\ |H_i - C_{i-1}|,\ |L_i - C_{i-1}|\right)$$
$$\mathrm{ATR} = \frac{1}{n}\sum_{i=1}^{n} \mathrm{TR}_i$$
$$\mathrm{ATR\%} = \frac{\mathrm{ATR}}{p_t} \tag{12}$$

where $\mathrm{TR}_i$ is the true range, $H_i$ and $L_i$ are the current high and low, $C_{i-1}$ is the previous close, and $n$ is the time period. In Eq. 12, ATR% is the ATR divided by the current price $p_t$ of the asset. Table 1 illustrates a sample of raw tick prices transformed into tick bars, sampled every one thousand observations. The open, high, low, and close are the first, highest, lowest, and last tick prices within a sequence of a thousand tick prices. The last column is the price at which the directional change occurs. The change in direction is confirmed if the price move exceeds the threshold in either direction. The remaining values in the directional change column are excluded since no more ATR%-defined changes in direction happened in the sample.
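A short sketch of the ATR calculation on a list of bars follows. It uses a simple (unweighted) moving average of the most recent true ranges, and it takes the close of the latest bar as the "current price" in ATR%; both the helper names and that choice of current price are assumptions for illustration.

```python
def true_ranges(bars):
    """True range of every bar that has a predecessor."""
    ranges = []
    for prev, cur in zip(bars, bars[1:]):
        ranges.append(max(
            cur["high"] - cur["low"],          # plain high-low range
            abs(cur["high"] - prev["close"]),  # gap relative to the prior close
            abs(cur["low"] - prev["close"]),
        ))
    return ranges


def atr(bars, n=14):
    """Simple moving average of the most recent `n` true ranges."""
    ranges = true_ranges(bars)
    if len(ranges) < n:
        raise ValueError("need at least n + 1 bars to compute an n-period ATR")
    return sum(ranges[-n:]) / n


def atr_percent(bars, n=14):
    """ATR divided by the latest close (assumed to be the current price)."""
    return atr(bars, n) / bars[-1]["close"]


example_bars = [
    {"high": 10.0, "low": 8.0, "close": 9.0},
    {"high": 11.0, "low": 9.0, "close": 10.0},   # TR = 2
    {"high": 14.0, "low": 12.0, "close": 13.0},  # gap up, TR = 4
    {"high": 13.0, "low": 12.0, "close": 12.5},  # TR = 1
]
threshold = atr_percent(example_bars, n=2)  # could serve as a DC threshold h
```

Because ATR% is a ratio, it can be compared directly with the relative price changes used to confirm DC events.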

Experiment
This paper's objective is to apply the CNN-LSTM network to the generated DC-based summaries of GBPUSD, EURUSD, USDCHF, and USDCAD tick prices to predict the price of the next directional change event.
The initial dataset comprises the currency pairs' tick prices from January to August 2019, in comma-separated values (CSV) format. As mentioned earlier, a tick price refers to a change in an asset price from one trade to the next. Our model aims to predict the immediate step-ahead movement of the financial asset's tick prices rather than of prices sampled at fixed time intervals. Note that predictions are short-term and sensitive to the threshold values, i.e., different user-defined thresholds produce different summaries of the price movements.
To generate the tick bars, we aggregate 50, 100, 200, 500, and 1000 data points from the original tick prices of the GBPUSD, EURUSD, USDCHF, and USDCAD currency pairs. Every tick bar has an open, high, low, and close price. The open and close prices correspond to the prices of the first and last trades. The high and low prices are the maximum and minimum prices within the predefined number of ticks. Figure 3 depicts the tick bars/candles generated from the GBPUSD tick prices with the predefined numbers of ticks. The tick bar with the [...] The Durbin-Watson (DW) test reports a value from 0 to 4, where:
• DW = 2 is no auto-correlation.
• 0 < DW < 2 is positive auto-correlation.
• 2 < DW < 4 is negative auto-correlation.
Table 2 presents the Durbin-Watson results for the tick bars. As the results imply, the 1000-tick bar has the lowest DW value for GBPUSD, EURUSD, and USDCHF, and the 200-tick bar for the USDCAD pair. The Average True Range is calculated for the tick bars with the smallest DW and is then used as the Directional Change threshold h. As previously mentioned, the Average True Range (ATR) is a market volatility measure and is typically calculated from the 14-day simple moving average of true range values. With the derived h, DC-based summaries are generated and used within a sliding window of length 5 to predict the next event value.
The CNN-LSTM model, as its name implies, consists of a convolutional neural network layer and a long short-term memory layer. Figure 4 illustrates the employed model. As demonstrated in Fig. 4, the convolutional layer outputs are passed into a max-pooling layer. To prevent the model from over-fitting, a dropout layer is placed after the LSTM layer. The number of convolutional filters, the LSTM units and activation function, the dropout percentage, and the optimiser learning rate were determined through hyper-parameter tuning with KerasTuner [21]. Table 3 presents the parameter settings for the CNN-LSTM model. The DC summaries of the currency pairs were divided into training, validation, and test sets, where 80% of the data points constitute the training set and the remaining 20% the test set. Moreover, 20% of the training set was used as the validation set to prevent data leakage. The training process was performed with the Adam optimiser and the mean squared error as the loss function. To evaluate the predictive performance of the model, the mean absolute error (MAE), root mean squared error (RMSE), and coefficient of determination ($R^2$) are used. The equations for the MAE, RMSE, and $R^2$ are as follows:

$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n} |y_i - \hat{y}_i|$$
$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}$$
$$R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}$$

where $y_i$ is the actual value, $\hat{y}_i$ is the predicted value, $\bar{y}$ is the mean of the actual values, and $n$ is the number of observations.
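The data preparation and evaluation steps described above can be sketched with plain Python. The helpers below compute the Durbin-Watson statistic of a residual series, build sliding windows of length 5 with the next value as the target, split samples 80/20 with a further 20% of the training block held out for validation, and compute the three metrics. All function names are hypothetical, the split is assumed to be chronological (the paper does not state whether the data were shuffled), and how residuals were obtained from the tick bars is not specified in the text.

```python
import math


def durbin_watson(residuals):
    """Sum of squared successive differences over the sum of squares."""
    num = sum((b - a) ** 2 for a, b in zip(residuals, residuals[1:]))
    return num / sum(e ** 2 for e in residuals)


def sliding_windows(values, window=5):
    """Pair each run of `window` values with the value that follows it."""
    n = len(values) - window
    inputs = [values[i:i + window] for i in range(n)]
    targets = [values[i + window] for i in range(n)]
    return inputs, targets


def chronological_split(samples, test_frac=0.2, val_frac=0.2):
    """Last `test_frac` for testing, then the last `val_frac` of the
    remaining block for validation. No shuffling (an assumption)."""
    n_fit = len(samples) - int(len(samples) * test_frac)
    n_val = int(n_fit * val_frac)
    return (samples[:n_fit - n_val],
            samples[n_fit - n_val:n_fit],
            samples[n_fit:])


def mae(y, y_hat):
    return sum(abs(a - b) for a, b in zip(y, y_hat)) / len(y)


def rmse(y, y_hat):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y, y_hat)) / len(y))


def r2(y, y_hat):
    mean_y = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    ss_tot = sum((a - mean_y) ** 2 for a in y)
    return 1.0 - ss_res / ss_tot


inputs, targets = sliding_windows([1, 2, 3, 4, 5, 6, 7], window=5)
train, val, test = chronological_split(list(range(100)))
```

Note that $R^2$ becomes negative whenever a model predicts worse than the mean of the actual values, which is how the negative values reported for the baseline models should be read.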
The CNN-LSTM model is trained and validated with the DC summaries of GBPUSD, EURUSD, USDCHF, and USDCAD, using the EarlyStopping callback of the Keras callbacks API. Initially, the DC summaries of the GBPUSD pair are used to train and validate the model on training and validation sets of 4,567 and 1,138 data points, respectively. Prediction on the test set, which is considered the out-of-sample set, resulted in a mean absolute error of 0.0142 and a root mean squared error of 0.0179. According to Table 4b, in the absence of the DC framework, the coefficient of determination plummeted from 0.985 to 0.359. Figure 5b portrays this noticeable decline in the prediction accuracy of the model. The same steps were applied to the EURUSD, USDCHF, and USDCAD currency pairs. As Table 4 and the comparison of Fig. 6a and b suggest, the MAE and RMSE increase from 0.0188 to 0.0294 and from 0.0248 to 0.0368, respectively. Furthermore, the coefficient of determination ($R^2$) for EURUSD has decreased [...] prediction in all performance metrics. It is concluded from the results that applying the CNN-LSTM architecture within the directional change framework improves the accuracy of prediction for high-frequency FX data. Support Vector and Random Forest regression, two widely used machine learning techniques in financial forecasting, were also used for comparison with the CNN-LSTM model. Both models' hyper-parameters were tuned with RandomizedSearchCV [22], and the models were used in the same fashion as the CNN-LSTM, with and without the DC framework. Table 4 shows that Support Vector and Random Forest regression failed to produce acceptable predictions, with significantly high errors and negative coefficients of determination ($R^2$). In summary, the tick bars were created from raw tick prices, and the least auto-correlated ones were determined using the Durbin-Watson statistic.
Next, the least auto-correlated tick bars were used to calculate the ATR value, which then was used as the Directional Change threshold h. Then, the DC summaries of the tick bars were generated. Finally, the proposed model was applied to the mentioned DC summaries of all the currency pairs as well as their raw tick

Conclusions and future work
This paper has investigated applying the CNN-LSTM model within the Directional Change (DC) framework, an approach that summarises price movement by transforming a time series price curve into an intrinsic time curve, to predict the subsequent event price. An event is identified by a significant change in the price of an asset, defined as a price change greater than a predefined threshold value h. The threshold h is determined with the Average True Range (ATR) indicator. The CNN-LSTM takes as input the DC summaries of the tick bars with the lowest Durbin-Watson statistic for the GBPUSD, EURUSD, USDCHF, and USDCAD currency pairs. The same model was applied to the closing prices of the currency pairs' tick bars without the DC framework to compare the model's performance. The experimental results suggest that the CNN-LSTM performance improves significantly within the directional change framework with respect to the MAE, RMSE, and $R^2$ metrics for all the currency pairs.
In future research, we intend to apply our model to longer prediction horizons and to experiment with more complex GRU and BiLSTM architectures on different currency pairs and financial assets. Because thresholds are determined by the practitioner's preferences, it would be of interest to explore further ways of determining the Directional Change threshold dynamically to address the sensitivity of the model to thresholds.

Data availability statement
The datasets generated during and/or analysed during the current study are available from the corresponding author on reasonable request.

Declarations
Conflict of Interest The authors declare that they have no conflict of interest.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.