1 Introduction

Predicting foreign exchange (Forex) rates is an important step in understanding the relationship among global currencies and in evaluating the benefits and risks attached to cross-border trading. The prediction of Forex rates was relatively straightforward up to the early 1970s (Chang and Huang 2014): it was mostly determined by countries' balance of payments and their levels of importation and exportation of goods and services. In 1973, floating exchange rates were adopted by the world's major currencies, and in recent times Forex trading has been conducted primarily electronically (Cheung and Chinn 2001). These changes in currency policy and trading location opened the market to more participants, which led to an increase in market activity. The higher number of participants, coupled with local and international supply–demand factors (economic, political and psychological), makes Forex forecasting a challenging task (Spero and Hart 2009; Nassirtoussi et al 2011; Frieden 2014; Bilgin et al 2020; Pascual-Ezama et al 2014; Petropoulos et al 2017). These challenges include: (i) pronounced short-term price fluctuations; (ii) a high trading volume of over 6.6 trillion USD per day as of 2019 (Wooldridge 2019) in a market open 24 h/day from Sunday 20:15 GMT to Friday 22:00 GMT across the globe (Sobol and Szmelter 2020); (iii) low profit margins in comparison to fixed-income trading (Petropoulos et al 2017); and (iv) noisy and chaotic signals, making the separation of uninteresting features from trends difficult (Abu-Mostafa and Atiya 1996; Kamruzzaman et al 2003).

The majority of approaches to predicting Forex rates use historic market data captured on a physical time scale (Brabazon et al 2020). A drawback of a physical time scale is that it renders the flow of time discontinuous, exposing market participants to a degree of risk because market activity between discrete time points is ignored. An alternative approach is to use an intrinsic time scale, which summarises data by capturing significant activity in the market. In this work, we use directional changes (DCs), a form of intrinsic time scale, to summarise significant market movements. DCs present an alternative way of sampling data: instead of taking snapshots of historical data at constant intervals, snapshots are taken whenever the price changes by a predetermined threshold \(\theta \). The threshold value is decided in advance by a trader, according to their belief of what constitutes a significant price change, either upwards or downwards. Price summaries are thus divided into alternating upward and downward trends. Each of these trends consists of a DC event, which is usually followed by an overshoot (OS) event. Using different threshold values allows the detection of different events and, as a consequence, the creation of different trend summaries. Therefore, the DC framework focuses on the size of a price change as time varies, whereas under physical time the time interval is fixed (e.g. daily closing prices). This concept provides traders with new perspectives for price movement analysis and allows them to focus on key price movements, blurring out other price details that could be considered irrelevant. Furthermore, DCs have enabled researchers to discover new regularities in markets which could have been missed by interval-based summaries (Glattfelder et al 2011). These new regularities thus give rise to opportunities for traders and open up a whole new area for research.

As a result, an increasing number of works have been using the DC concept for trading purposes (e.g., Aloud 2020, 2021). Furthermore, Gypteau et al (2015) proposed a genetic programming (GP) based multi-threshold DC (MTDC) strategy, where the terminal nodes of the GP trees were composed of the trading actions (buy/sell/hold) recommended by each DC threshold, and the inner nodes were logical operators that combined those recommendations. Bakhach et al (2016) proposed a classification algorithm that uses information from an event series sampled with a smaller threshold to forecast DC events in a DC summary sampled with a larger threshold. Ye et al (2017) proposed a mathematical equation for anticipating the magnitude of OS events. Their results show that trading based on DC trend (DCT) reversal forecasting techniques yields positive returns at comparatively low risk. Alkhamees and Fasli (2017a) highlighted a problem with summarising price movements based on a single fixed threshold over a long physical time period. They argued that if a threshold of 0.01% is used in summarising events and over time the significant-event level drops to 0.009%, the new types of events will not be captured. Based on this finding, they recommended trading with event summaries sampled over shorter physical time frames and recalibrating the threshold size at intervals to identify the most significant events. They proposed generating event series daily, with threshold sizes dynamically adjusted according to the current and previous day's price movements. Comparison results showed that trading on event series generated over shorter time periods with a dynamic threshold was more profitable than trading on event series generated with a fixed threshold over longer periods. A similar conclusion was reached by Alkhamees and Fasli (2017b), who explored the same idea of generating event series with dynamically adjusted thresholds in a data stream. Salman et al (2022) proposed several new DC-based trading strategies and optimised their recommendations using a genetic algorithm (GA). Lastly, Long et al (2022) were the first to combine different DC indicators under a GP algorithm.

A particular branch of DC trading research has been to identify the reversal point of a trend. Kampouridis and Otero (2017) and Kampouridis et al (2017) attempted to do this by estimating the length of the DCT. To achieve this, they calculated the average length of DCTs for each dataset in the training set, and then used this value to predict when a trend would end in the test set. Adegboye et al (2017) extended Kampouridis and Otero (2017) by using a symbolic regression GP (SRGP) algorithm to evolve equations that calculated the average DC–OS event length ratio, which was then used to predict the duration of a trend. They identified both linear and non-linear relationships between DC and OS events, which they embedded into a trading strategy that yielded higher returns. Adegboye and Kampouridis (2021) further extended the above work, observing that trends in DC datasets did not consistently contain both DC and OS events: in some summaries as few as 14.77% of DC events had a corresponding OS event. Although the number of OS events in a DC summary is threshold dependent, the maximum proportion of DC–OS event pairs observed was 52.46%. To address this issue, they proposed a DCT reversal forecasting algorithm that combined classification with symbolic regression. A tailored classifier distinguished between DCTs composed of both DC and OS events and those composed of only a DC event. A tailored SRGP was then used to estimate the OS event length of DCTs classified as having an OS event in the training set. Their results showed that this approach significantly improved DCT reversal forecasting. The model was embedded in a single threshold-based strategy and tested on 1000 DC datasets created from 10-min physical time series of 20 major Forex markets. The trading strategy outperformed other DC and technical-analysis-based strategies, including buy-and-hold (BandH). Similar results were obtained in Adegboye et al (2021), which applied the above classification technique to a number of different DC algorithms.

While the above classification and regression GPs were novel and effective algorithms, they only used a single DC threshold, i.e., each trading strategy was based on a single DC summary. The strategy is therefore constrained by the information provided by that specific DC threshold, which is a major limitation; moreover, it is not easy to know which DC threshold returns the more informative DC summary. To overcome this drawback, in this paper we propose a MTDC trading algorithm. Many thresholds are used simultaneously, so at each point in time there are multiple buy/sell/hold recommendations. To resolve conflicting recommendations, we use a GA to optimise the weight of each DC threshold. The rationale for combining predictions stems from the fact that different kinds of events drive price volatility, and a single threshold can summarise only one type of event. Using multiple thresholds enables us to summarise concurrent events, increasing the total number of DC events over the profiling period and consequently providing more opportunities to trade profitably.

We will run experiments on 200 datasets from 20 different Forex currency pairs. The proposed MTDC strategy will be compared against a total of nine benchmarks: five single-threshold DC (STDC) strategies; three technical analysis indicators under physical time; and a BandH strategy. The rest of this paper is organised as follows. Section 2 presents a brief overview of the concept of DCs, and Sect. 3 presents our proposed GA-based multi-threshold trading strategy. Section 4 presents the experimental setup, and Sect. 5 presents and discusses the results. Lastly, Sect. 6 concludes this article and discusses future work.

2 DC background

A DC event is identified by a price change defined by a user-specified threshold value. DC events are divided into upturn and downturn events. Once a DC event is confirmed, the price series usually continues moving in the same direction (upwards or downwards, depending on the current DCT), forming an OS event. An OS event finishes once a DC event in the opposite direction is confirmed. A DCT, upward or downward, consists of the combination of a DC and an OS event. Different thresholds generate different event series: smaller thresholds create a higher number of DC events, while larger thresholds produce fewer events.

Let us now look at Fig. 1, where we present how we can summarise a physical-time price series into DC and OS events. In this example, we summarise price movements with two different thresholds, namely \(\theta =0.01\%\) (lines in red) and \(\theta =0.018\%\) (lines in blue). Price changes below \(\theta \) are not considered a significant event. Price changes above \(\theta \) are considered significant events, and divide the market into uptrends and downtrends. Solid lines represent DC events, and dashed lines represent OS events. For example, under \(\theta =0.01\%\), between Points A and B we have a downturn DC event followed by a downward OS event from Point B to C; when a trend reversal occurs, an upturn DC event starts from Point C to D. Lastly, between Points D and E there is an upward OS event. The price point where a DCT begins or ends is called a DC extreme point (DCE); under \(\theta =0.01\%\), Points A, C, and E are DCE points.

Fig. 1
figure 1

Directional changes for the GBP/JPY FX currency pair. The red lines represent events created by a threshold \(\theta = 0.01\%,\) and the blue lines events created by a threshold \(\theta = 0.018\%.\) DC events are denoted by solid lines, and OS events by dashed lines. Under \(\theta = 0.01\%\), we summarise data as follows: Downturn DC event: Point \(A \mapsto B\); Downward OS event: Point \(B \mapsto C\); Upturn DC event: Point \(C \mapsto D\); Upward OS event: Point \(D \mapsto E\); Downturn DC event: Point \(E \mapsto F\). Under \(\theta = 0.018\%,\) we summarise data as follows: Downturn DC event: Point \(A \mapsto B^{\prime }\); Downward OS event: Point \(B^{\prime } \mapsto C\); Upturn DC event: Point \(C \mapsto E\); Upward OS event: Point \(E \mapsto E^{\prime }\). DC extreme points (DCE): Points A, C, E, and \(E^{\prime }\). DC confirmation points (DCC): Points B, \(B^{\prime }\), D, E, and F

Under \(\theta =0.018\%\) (lines in blue), we obtain a different set of events: from A to B\('\): a downturn event; from B\('\) to C: a downward OS; from C to E: an upturn DC event; lastly, from Point E to E\('\) we have an upward OS trend.

Note that we can only confirm a DC event in hindsight, i.e., after there has been a price change of \(\theta \). For instance, under \(\theta =0.01\%\) we would not know we are in an upward trend until we have reached Point D. This point is called a DC confirmation point (DCC). Before Point D, one would consider that the market has been in a downward trend since Point A. Similarly, we would not know the trend has reversed from upward to downward until we have reached the DCC Point F. It is therefore crucial to be able to accurately predict when a trend reversal will take place. Algorithm 1 presents the pseudocode for the transformation of a physical time series to an event-based (DC) series.

figure a
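To make the transformation concrete, the following is a minimal Python sketch of a DC summary in the spirit of Algorithm 1, rather than the paper's exact pseudocode. It assumes the price series is a plain sequence of floats and that the initial trend direction is upward; the function and variable names are illustrative.

```python
def dc_summary(prices, theta):
    """Summarise a price series into directional-change (DC) events.

    prices : sequence of floats (e.g., 10-min mid prices)
    theta  : DC threshold expressed as a fraction (0.0001 corresponds to 0.01%)

    Returns a list of confirmed DC events as (direction, extreme_index,
    confirmation_index) tuples. An OS event, if present, runs from one
    confirmation point to the extreme point of the next opposite event.
    """
    events = []
    uptrend = True                 # assumed initial direction (illustrative)
    extreme, extreme_idx = prices[0], 0
    for i, p in enumerate(prices[1:], start=1):
        if uptrend:
            if p > extreme:                        # extend the upward run
                extreme, extreme_idx = p, i
            elif p <= extreme * (1 - theta):       # downturn DC confirmed (DCC)
                events.append(("down", extreme_idx, i))
                uptrend, extreme, extreme_idx = False, p, i
        else:
            if p < extreme:                        # extend the downward run
                extreme, extreme_idx = p, i
            elif p >= extreme * (1 + theta):       # upturn DC confirmed (DCC)
                events.append(("up", extreme_idx, i))
                uptrend, extreme, extreme_idx = True, p, i
    return events
```

For example, calling `dc_summary(prices, 0.0001)` would produce a summary analogous to the 0.01% one in Fig. 1, up to the choice of the initial trend direction.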

3 Methodology

Our proposed method, which we refer to as multi-threshold DC (MTDC), is a new algorithm for trading under the DC paradigm. It overcomes the limitation of Adegboye and Kampouridis (2021) and Adegboye et al (2021), which were constrained to a single threshold when generating DC summaries. MTDC allows multiple DC thresholds, and hence multiple DC summaries, to be used, providing multiple views of the data. It thus combines different threshold values in an attempt to take advantage of the different characteristics of the smaller and larger events that cause price movements. Furthermore, since there are multiple thresholds, there can be multiple recommendations (buy/sell/hold) at any point in time. To resolve conflicting recommendations, we use a GA to decide how much weight should be assigned to each DC threshold.

Our trading algorithm can be broken into two main parts: a STDC algorithm and MTDC, which essentially optimises the recommendations from the multiple STDC algorithms. STDC was first presented in Adegboye and Kampouridis (2021), and its main components are summarised in Sect. 3.1. Afterwards, in Sect. 3.2, we discuss the main contribution of this article, namely the MTDC algorithm.

3.1 Single threshold-based DC strategy

The aim of the single threshold-based DC strategy (STDC) is to predict when the current trend will reverse (the trend reversal point) and subsequently use this information during trading. The trend reversal point can be predicted via regression algorithms, where the relationship between the DC and OS lengths is estimated. However, early work in Adegboye et al (2017) showed that simply regressing this relationship has a major drawback: the resulting function f does not take into account that many DC events are not followed by an OS event, as they can often be followed by another DC event in the opposite direction. Thus, any regression algorithm would learn a DC–OS length relationship from inaccurate data. To overcome this issue, we first use a classification step, which predicts whether a DC event is followed by an OS event. Introducing this classification step allows us to perform regression only on data containing consecutive DC and OS events, thus creating more accurate regression models and enabling us to predict the end of a trend. This information is then used by a trading strategy.

Thus, STDC has three main steps: a classification step; a regression step; and a trading step. The process flowchart is illustrated in Fig. 2. Next we briefly present each step. For a more detailed description, the reader is referred to Adegboye and Kampouridis (2021).

Fig. 2
figure 2

Predicting trend reversal in DC. A DC trend classified as composed of only a DC event is expected to reverse at the DCC point, while a DC trend classified as composed of DC and OS events is expected to reverse at the estimated DCE, which is the sum of the DC event length (known at the DCC point) and the OS event length predicted by the SRGP. Once the trend reversal point has been determined, we embed it into a trading strategy and perform trading

3.1.1 The classification step

As there are numerous classification algorithms that can be used for the classification task, STDC uses Auto-WEKA (Thornton et al 2013), an automated machine learning (AutoML) framework.Footnote 1 Using Auto-WEKA allows a tailored classification algorithm and tailored hyperparameters to be selected for each dataset. The model classifies a DCT as either composed of both DC and OS events (\(\alpha DC\)) or of only a DC event (\(\beta DC\)). If a trend is classified as \(\beta DC\), the trading action will be taken at the DCC point (more about this in Sect. 3.1.3). Conversely, if a trend is classified as \(\alpha DC\), the trend is expected to reverse at the end of the sum of the DC event length, known at the DCC point, and the OS event length, estimated with a regression model, which is presented in the next section (Sect. 3.1.2).

Lastly, the attributes used for classification are the following DC-related features (a sketch of how they could be computed is given after the list):

  • DC event price, which is the price difference between the upturn/downturn point and the DCC point.

  • DC event time, which is the time difference between the upturn/downturn point and the DCC point.

  • Speed, which is the speed at which the price changes from the start of a trend to the DCC point.

  • Previous DC event price, which is the price at the previous confirmation point.

  • Previous OS, which is a boolean variable that indicates whether the immediately previous DCT has an OS event.

  • Flash event, which is a boolean variable indicating whether the DC event start and end times are equal.
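As an illustration of how these attributes could be derived from a confirmed DC event, the sketch below builds the six features from the extreme and confirmation points of the current and previous trends. The `DCEvent` fields and function names are assumptions made for illustration rather than the paper's implementation; the resulting feature dictionary would then be passed to the Auto-WEKA-selected classifier.

```python
from dataclasses import dataclass

@dataclass
class DCEvent:
    start_time: float     # time of the upturn/downturn (extreme) point
    start_price: float    # price at the extreme point
    dcc_time: float       # time of the DC confirmation (DCC) point
    dcc_price: float      # price at the DCC point

def classification_features(event, prev_event, prev_had_os):
    """Build the six DC-related attributes for one confirmed DC event."""
    dc_price = event.dcc_price - event.start_price
    dc_time = event.dcc_time - event.start_time
    return {
        "dc_event_price": dc_price,
        "dc_event_time": dc_time,
        # speed of the price change from the trend start to the DCC point
        "speed": dc_price / dc_time if dc_time else 0.0,
        "previous_dc_event_price": prev_event.dcc_price if prev_event else 0.0,
        "previous_os": prev_had_os,          # did the previous DCT have an OS event?
        "flash_event": event.start_time == event.dcc_time,
    }
```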

3.1.2 The regression step

Once the classification step is complete, STDC aims to learn the relationship between the DC and OS lengths for each dataset. This is a symbolic regression step, which aims to find both the shape of the solution/equation and the values of its parameters. The target solution can be expressed by the generic Eq. 1, which essentially tells us that the length of an OS event is a function of the length of the DC event.

$$\begin{aligned} OS_l = f(DC_l), \end{aligned}$$
(1)

where \(OS_{l}\) is the length of an OS event and \(DC_{l}\) is the length of a DC event.

To find the form of the equation, a tree-based symbolic regression GP (SRGP) algorithmFootnote 2 proposed by Adegboye et al (2017) is used (Algorithm 2). SRGP is able to evolve scale-variant linear and non-linear equations that best express the relationship between DC and OS event lengths in a DC event summary.

figure b

To evolve the equation, SRGP is configured to use the 2-arity functions {addition, subtraction, division, multiplication, power} and the 1-arity functions {sine, cosine, power, logarithm, exponential} as its function set. The terminal nodes are composed of an attribute representing the DC event length and ephemeral random constants (ERCs).Footnote 3 SRGP is initialised using the ramped half-and-half method. During evolution, the fittest 10% of the tree population are copied to the next generation, and the rest are evolved with subtree crossover and mutation operators. The fitness of the trees in the population is measured using Eq. 2, which calculates the regression error \(\varepsilon \) between the actual OS length (\({OS_l}\)) and the SRGP-estimated OS length (\(\hat{OS_l}\)). After evolution, the tree with the lowest regression error in the final generation is selected as the regression model.

$$\begin{aligned} \varepsilon \ = \sqrt{\frac{\sum _{i=1}^{N} ({OS_l} - \hat{{OS_l}})^2}{N}}, \end{aligned}$$
(2)

where N is the sample size, \(\hat{OS_l}\) is the SRGP-estimated OS length, and \(\varepsilon \) is the root mean squared error.

A step concurrent with the creation of the regression model is the selection of appropriate thresholds. A pool of 100 thresholds is created with real values starting from 0.005 and increasing in steps of 0.0025. The SRGP model described above is created for each threshold in the pool, and the regression errors of the 100 thresholds' best SRGPs are then measured. The SRGP with the lowest error, together with its associated threshold, is selected for backtesting.Footnote 4
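A minimal sketch of this threshold-selection loop is shown below, assuming a helper that evolves an SRGP model for a given threshold (here the placeholder `fit_srgp`) and a helper that returns the training DC and OS lengths for that threshold; both names are illustrative. The regression error follows Eq. 2.

```python
import math

def rmse(actual, predicted):
    """Regression error of Eq. 2: root mean squared error of the OS lengths."""
    n = len(actual)
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n)

def select_best_threshold(training_lengths, fit_srgp):
    """Train one SRGP per threshold in the pool and keep the least-error pair.

    training_lengths : callable mapping a threshold to (dc_lengths, os_lengths)
                       taken from that threshold's training DC summary
    fit_srgp         : placeholder for the GP described above; returns a model
                       such that os_hat = model(dc_length)
    """
    pool = [0.005 + 0.0025 * k for k in range(100)]   # 100 thresholds
    best = None                                       # (error, threshold, model)
    for theta in pool:
        dc_lengths, os_lengths = training_lengths(theta)
        model = fit_srgp(dc_lengths, os_lengths)
        error = rmse(os_lengths, [model(d) for d in dc_lengths])
        if best is None or error < best[0]:
            best = (error, theta, model)
    return best   # the threshold/SRGP pair selected for backtesting
```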

For backtesting, Adegboye et al (2021) selected the best five thresholds and their SRGPs and traded with them independently. A characteristic of a single threshold-based trading strategy is its obliviousness to price movements slightly smaller than the specified threshold, even though such changes could also be relevant. For example, if a trader considers a price change of 0.1 to be significant, price changes of 0.0999 are ignored. This inherent limitation of single threshold-based strategies can be addressed by combining information from multiple thresholds (Fernald et al 2021). In Sect. 3.2, we present a multi-threshold trading strategy that combines and optimises information from multiple thresholds using a GA (Holland 1992), a well-known technique for solving optimisation problems.

3.1.3 The trading step

As explained earlier, the classification step predicts whether a trend is composed of DC and OS events (\(\alpha \)DC) or of only a DC event (\(\beta \)DC). In the former case, the trend reversal point is predicted to be the end of the OS event as estimated by the SRGP algorithm, while in the latter case it is the DCC point.

In order to decide how to trade, we differentiate between opening a position (selling the base currency and buying the quoted currency) and closing a position (buying the base currency and selling the quoted currency). To open a position, two requirements must hold: (i) there is not an already open position, and (ii) the return from opening the position would be positive after accounting for transaction costs. If both requirements hold, we open a position at the extreme point of an upward DCT. Similarly, to close a position, two requirements must hold: (i) there is an existing open position, and (ii) the return from closing the position would be positive after accounting for transaction costs. If these conditions hold, we close the position at the extreme point of a downward DCT.

The extreme point in both of the above cases can be at either an \(\alpha \)DC or a \(\beta \)DC trend, depending on the prediction of the classification model. When the above requirements are not met, no trading takes place. All transactions are made using the entire capital, and the transaction cost is 0.025% per transaction.
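The following sketch condenses the above trading rules into a single decision function. The expected-return estimate, the handling of the 0.025% cost and the function name are illustrative assumptions rather than the paper's exact implementation.

```python
TRANSACTION_COST = 0.025 / 100   # 0.025% of the traded quantity

def action_at_reversal(trend_direction, expected_return, position_open):
    """Decide the trading action at a predicted trend reversal (extreme) point.

    trend_direction : "up" or "down", the DC trend whose reversal is predicted
    expected_return : estimated return of acting now, before transaction costs
    position_open   : whether a position is currently held

    Returns "open", "close" or "hold"; the entire capital is used per trade.
    """
    net = expected_return - TRANSACTION_COST
    if trend_direction == "up" and not position_open and net > 0:
        return "open"    # sell base / buy quoted at the upward-trend extreme
    if trend_direction == "down" and position_open and net > 0:
        return "close"   # buy base / sell quoted at the downward-trend extreme
    return "hold"
```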

3.2 Multi-threshold DC strategies using a genetic algorithm

3.2.1 Overview

This strategy builds on the single-threshold strategy by combining market trends and predictions from multiple thresholds. As discussed in Sect. 2, a DC event is identified by a change in the price by a given threshold value. Each DC threshold summarises the data in a unique way: smaller thresholds allow the detection of more events and, as a result, actions can be taken promptly; larger thresholds detect fewer events, but provide the opportunity of taking actions when bigger price variations are observed. This proposed trading strategy combines the use of different threshold values in an attempt to take advantage of the different characteristics of smaller and larger thresholds.

Thus, at any one point in time the trading strategy under one threshold could be recommending a buy action, while a different threshold recommends a sell action. In addition, even if all strategies recommend the same trading action, there might not be consensus on where the trend reversal point is, as each DC summary uses its own SRGP algorithm and thus has its own predicted reversal points.

To deal with the above issues, we assign a weight to each DC threshold. Thus, if there are \(N_\theta \) thresholds, there will be \(N_\theta \) DC summaries and, as a result, \(N_\theta \) recommendations. Each threshold makes the following two recommendations: (i) what action to take, and (ii) where the trend reversal point is, i.e. when to take the recommended action.

What action to take. A majority vote is performed, based on the thresholds’ weights: the weights of the same actions (e.g., buy, buy, ...) are summed up and the action with the largest weight is followed. For example, if \(N_\theta =5\) and the sum of weights for the buy actions is 0.65, while the sum of weights for the sell action is 0.35, the action to be taken will be buy. It is worth noting here that the deciding factor is the sum of weights, rather than the number of thresholds recommending an action.

Overall, there are three possible actions that can be recommended by each threshold: buy, sell, and hold. If the action to be taken is a buy, we buy all available base currency in exchange for the quoted currency. If the action to be taken is a sell, we sell all available base currency in exchange for the quoted currency. Therefore, there is no situation where we have both base currency and quoted currency in our portfolio. With regards to a hold action, this can happen in three specific situations: (1) when the action is “sell” and there is not enough base currency available to sell; (2) when the action is “buy” and there is not enough quoted currency available to buy with; and (3) when the return is negative after deducting transaction costs.

When to act. As each DC summary (each derived by a different DC threshold) can predict a different trend reversal point, there is no consensus as to what point to take a buy/sell/hold action. To alleviate this, we act at the weighted average of the predicted reversal points of the recommended action. To better understand this, let us go back to the previous example of \(N_\theta =5\), where the sum of weights for buy was greater than the sum of weights for sell, and as a result the action to be taken is buy. Let us assume that it was only two thresholds recommending the buy action, with weights \(w_1 = 0.3\) and \(w_2 = 0.35\), respectively. Let us also assume that the first threshold predicts that the trend will reverse at point \(t=10\), and that the second threshold predicts point \(t=20\). Thus, the buy action will be taken according to Eq. 3:

$$\begin{aligned} W \ = \frac{\sum _{i=1}^{n} w_{i}t_{i} }{\sum _{i=1}^{n} w_{i}}, \end{aligned}$$
(3)

where \(w_i\) is each weight value, and \(t_i\) is each predicted trend reversal point. Plugging in the above values would give \(\frac{(0.3 \times 10 ) + (0.35 \times 20)}{(0.3 + 0.35)} \approx 15\). Thus the buy trading action will take place at point \(t=15\). By following the above method we take into account the threshold weights both in terms of what action to take and when to act.
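A minimal sketch of this weight-based resolution is given below: a weighted majority vote decides the action, and Eq. 3 gives the point at which to act. The third recommendation in the usage example (a sell predicted at \(t=12\)) is invented purely to complete the illustration; the rest mirrors the worked example above.

```python
def combine_recommendations(recs):
    """Resolve conflicting per-threshold recommendations.

    recs : list of (weight, action, reversal_point) triples, one per threshold,
           where action is "buy", "sell" or "hold".

    Returns (winning_action, combined_reversal_point) following Eq. 3.
    """
    # Weighted majority vote on the action
    totals = {}
    for w, action, _ in recs:
        totals[action] = totals.get(action, 0.0) + w
    winner = max(totals, key=totals.get)

    # Weighted average of the reversal points of the winning action (Eq. 3)
    votes = [(w, t) for w, action, t in recs if action == winner]
    numerator = sum(w * t for w, t in votes)
    denominator = sum(w for w, _ in votes)
    return winner, numerator / denominator

# Worked example from the text: two "buy" thresholds with weights 0.3 and 0.35
# predicting reversals at t=10 and t=20 combine to a buy at roughly t=15.
action, when = combine_recommendations(
    [(0.3, "buy", 10), (0.35, "buy", 20), (0.35, "sell", 12)])
```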

The above is a brief introduction to the multi-threshold strategy. Apart from what has been discussed above, everything else is similar to the single-threshold strategy. Thus, the MTDC framework consists of four steps: (1) a classification step; (2) a symbolic regression step; (3) strategy optimisation by a GA; and (4) a trading step. Steps (1), (2), and (4) follow the same approach as in STDC, described in Sect. 3.1. The differentiating step in MTDC is step (3), which is what we have discussed above and is necessary for dealing with the multiple trend reversal points returned by each individual threshold's process. Thus, after predicting the trend reversal point per threshold, we assign a weight to each threshold. Figure 3 presents a high-level overview of the MTDC strategy.

The only part we have not discussed yet is how the weights are decided. This is an important step, because we do not know how much weight we should give to each threshold. Simply assigning an equal weight of 1 to all of the thresholds might be a naive approach. Some thresholds might be more useful than others, hence we should give them more weight. Thus we use a GA to evolve real values for the weight of each DC threshold. We present the GA next.

Fig. 3
figure 3

Our proposed multi-threshold strategy framework. It uses a majority vote system to sum similar trade action recommendations from thresholds and it follows the action with the highest sum. The action is taken after calculating the weighted average of the forecasted trend reversal point of thresholds that recommended the winning action

3.2.2 Genetic algorithm

GAs are well-known evolutionary algorithms for finding solutions to hard optimisation problems (Goldberg 1989; Michalewicz 2002; Mitchell 1996). GAs use a population of individuals (candidate solutions) and subject them to an evolutionary process: individuals are evaluated according to how well they solve the problem and are combined via genetic operators to generate new individuals. In this process, individuals are selected based on their quality, with higher-quality individuals having a higher chance of being selected and of contributing their genetic material to the next population.

Representation. Each GA chromosome consists of \(N_\theta \) genes, where \(N_\theta \) is the number of thresholds used in the multi-threshold strategy. Each gene is assigned a weight value during population initialisation. The weight is a measure of the importance of a threshold's recommendation in the trading decisions. The weights are real values, with a maximum of 1 and a minimum of 0. For the first \(N_\theta \) chromosomes in the population, the i-th chromosome has its i-th gene initialised with the maximum weight value and all remaining genes with the minimum weight value. The idea is to ensure that, in the worst-case scenario, the trading result of our strategy is as good as that of the best-performing single threshold. The genes of the remaining chromosomes in the GA population are randomly assigned real values between the minimum and maximum weights inclusive. The pseudocode presented in Algorithm 3 summarises this procedure and Fig. 4 illustrates the initialisation step. The GA then evolves real-valued weights for each threshold over a number of generations; at the end of the evolution process our optimisation model is created.

Fig. 4
figure 4

Illustration of GA population initialisation for \(N_\theta =5\) thresholds

figure c
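A minimal sketch of this initialisation, in the spirit of Algorithm 3 and Fig. 4, could look as follows; the function name and the uniform sampling of the random chromosomes are assumptions made for illustration.

```python
import random

def init_population(pop_size, n_thresholds, w_min=0.0, w_max=1.0):
    """Initialise the GA chromosomes of threshold weights.

    The first n_thresholds chromosomes each give the maximum weight to exactly
    one threshold and the minimum to the rest, so the worst case of the evolved
    strategy matches the best single-threshold strategy. The remaining
    chromosomes receive uniformly random weights in [w_min, w_max].
    """
    population = []
    for i in range(pop_size):
        if i < n_thresholds:
            chrom = [w_max if g == i else w_min for g in range(n_thresholds)]
        else:
            chrom = [random.uniform(w_min, w_max) for _ in range(n_thresholds)]
        population.append(chrom)
    return population
```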

Genetic operators. We use three operators, namely elitism, uniform crossover and uniform mutation. For elitism, we copy the chromosome with the best fitness value into the next generation. For uniform crossover and uniform mutation, individuals from the population are selected into a mating pool; from this pool, parents are chosen through tournament selection, where the individual with the highest fitness in the tournament is selected. In uniform crossover, both parents contribute their genes, with each gene having a fixed probability of 0.5 of being swapped. In uniform mutation, each gene of the selected parent likewise has a fixed probability of 0.5 of being altered. Figures 5 and 6 illustrate uniform crossover and uniform mutation, respectively.

Fig. 5
figure 5

A sample uniform crossover operation by our GA. Either of the children is randomly selected for the next generation

Fig. 6
figure 6

A sample uniform mutation operation by the GA
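The two variation operators could be sketched as follows. The crossover mirrors Fig. 5, where one of the two children is randomly kept; for mutation we assume the common reading of uniform mutation in which a gene is replaced by a fresh random weight, whereas the paper's exact operation is the one shown in Fig. 6.

```python
import random

def uniform_crossover(parent_a, parent_b, swap_prob=0.5):
    """Swap corresponding genes between two parents with probability 0.5;
    one of the two resulting children is randomly kept (cf. Fig. 5)."""
    child_a, child_b = list(parent_a), list(parent_b)
    for g in range(len(child_a)):
        if random.random() < swap_prob:
            child_a[g], child_b[g] = child_b[g], child_a[g]
    return random.choice([child_a, child_b])

def uniform_mutation(parent, mut_prob=0.5, w_min=0.0, w_max=1.0):
    """Replace each gene of the parent with a random weight with probability 0.5."""
    return [random.uniform(w_min, w_max) if random.random() < mut_prob else g
            for g in parent]
```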

GA model evaluation. We measure the quality of a GA individual using the Sharpe ratio presented in Eq. 6, where R is the return, Q is the quantity traded, TC is the transaction cost deducted from a transaction, FXrate is the FX rate of the relevant currency pair, RFR is the risk-free rate (which we assume to be zero for Forex trading), and \(\sigma _{R}\) is the standard deviation of the return over the trading period. We choose the Sharpe ratio because it is an aggregate metric of risk-adjusted return: it measures how well the return compensates an investor for the risk of following the trading strategy.

$$\begin{aligned}&TC=Q \,*\, \frac{0.025}{100}, \end{aligned}$$
(4)
$$\begin{aligned}&R=(Q - TC)\, *\, FX rate, \end{aligned}$$
(5)
$$\begin{aligned}&Sharpe\,Ratio=\frac{R - RFR}{\sigma _{R}}. \end{aligned}$$
(6)
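A minimal sketch of this fitness evaluation is shown below, following Eqs. 4–6. Since Eq. 6 leaves the aggregation of the per-transaction returns into R implicit, the sketch assumes their mean; the risk-free rate is set to zero as in the paper, and the function names are illustrative.

```python
import statistics

def transaction_return(quantity, fx_rate, cost_rate=0.025 / 100):
    """Eqs. 4-5: net return of one transaction after transaction costs."""
    tc = quantity * cost_rate          # Eq. 4
    return (quantity - tc) * fx_rate   # Eq. 5

def sharpe_ratio(returns, risk_free_rate=0.0):
    """Eq. 6: risk-adjusted return used as the GA fitness.

    `returns` are the per-transaction returns over the trading period; they
    are aggregated here by their mean (an assumption), and sigma is their
    standard deviation.
    """
    sigma = statistics.stdev(returns)
    return (statistics.mean(returns) - risk_free_rate) / sigma
```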

Algorithm 4 summarises the procedure for optimising trading actions and trend reversal points. Algorithms 5 and 6 summarise the trading rules applied at the optimised trend reversal point for the optimal buy and sell actions, respectively.

figure d
figure e
figure f

4 Experimental setup

4.1 Data

We use 10-min interval high frequency data from March 2016 to February 2017 for 16 currency pairs and from June 2013 to May 2014 for 4 currency pairs.Footnote 5 These pairs are presented in Table 1.

Table 1 FX currency pairs used in our experiments

We consider each month as a separate physical-time dataset. In the tuning phase, we use 40 physical-time datasets (i.e., 20 currency pairs \(\times \) the first 2 months of our physical-time data) to create 200 DC datasets (40 physical-time datasets \(\times \) 5 thresholds). In the non-tuning phase, 200 physical-time datasets (i.e., 20 currency pairs \(\times \) the last 10 months of our physical-time data) are used to create 1000 DC datasets (i.e., 5 thresholdsFootnote 6 \(\times \) 20 currency pairs \(\times \) the remaining 10 months of our physical-time data). In both the tuning and non-tuning phases, the DC datasets are split in a 70:30 ratio for training and testing, respectively.
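The construction of the DC datasets could be sketched as follows, reusing the `dc_summary` function from the sketch in Sect. 2. For simplicity it assumes a fixed set of five thresholds per dataset, whereas in our experiments the thresholds are selected dynamically (Sect. 3.1.2); the mapping structure is an illustrative assumption.

```python
def build_dc_datasets(monthly_series, thresholds, train_ratio=0.7):
    """Create one DC dataset per (currency pair, month, threshold), split 70:30.

    monthly_series : dict mapping (pair, month) to a 10-min price series
    thresholds     : the DC thresholds used for that dataset (assumed fixed
                     here; in our experiments they are chosen dynamically)
    Reuses dc_summary() from the sketch in Sect. 2.
    """
    datasets = {}
    for (pair, month), prices in monthly_series.items():
        for theta in thresholds:
            events = dc_summary(prices, theta)
            cut = int(len(events) * train_ratio)
            datasets[(pair, month, theta)] = (events[:cut], events[cut:])
    return datasets
```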

4.2 Parameter tuning

For the classification step, the only parameter of Auto-WEKA that required tuning was its execution time. This is because Auto-WEKA needs to be given enough time to search its algorithm and hyperparameter space for the classification model that best predicts our two class labels (\(\alpha \)DC, \(\beta \)DC). We experimented with different runtime configurations, namely 15, 30, 45, 60 and 75 min. We chose a runtime of 60 min based on the average \(f{\text {-}}measure\), which we observed to diminish at a runtime of 75 min. Depending on the number of CPU cores available, it is possible to execute Auto-WEKA in serial or parallel mode; for our experiments, we executed Auto-WEKA in serial mode, using 1 CPU core.

With regards to the regression step, we tuned the GP parameters using the I/F-Race package (López-Ibánez et al 2011). I/F-Race is based on an iterated racing procedure, an extension of the Iterated F-race procedure. It implements racing methods for selecting the best configuration of an optimisation algorithm by empirically choosing the most appropriate settings from a set of instances of an optimisation problem. Table 2 presents the GP configuration used to evolve the five symbolic regression models for estimating the OS event length.

Table 2 Regression GP experimental parameters for detecting DC–OS relationship, determined using I/F-Race

With regards to the optimisation of the DC weights, we again used the I/F-Race package to determine the optimal GA parameters. Table 3 presents the values of the tuned parameters.

Table 3 GA experimental parameters for multi-threshold trading strategy determined using I/F-Race

4.3 Trading experimental setup

4.3.1 DC-related benchmarks

As a reminder, the aim of this study is to demonstrate that, by optimising recommendations from multiple thresholds using machine learning techniques, we can improve profitability and risk, statistically outperforming single threshold-based strategies. To do this, we compare the trading performance of a strategy that draws recommendations from a five-threshold DC setup against five individual strategies, each of which draws recommendations from a single DC threshold. Each single-threshold DC strategy will from now on be denoted as STDC; since we are testing five individual thresholds, we thus have STDC1 (single-threshold DC trading strategy 1), STDC2, STDC3, STDC4, and STDC5. It is worth reiterating that the five individual thresholds can differ per dataset, as they are dynamically chosen. The multi-threshold strategy will be referred to as MTDC and consists of the same five individual thresholds in a given dataset.

4.3.2 Financial (non-DC) benchmarks

Technical analysis trading strategy Technical analysis is a very popular approach in trading. It uses technical indicators to provide insight into when to make trading decisions. We experiment with three trading strategies that utilise the relative strength index (RSI), the exponential moving average (EMA), and the moving average convergence divergence (MACD) indicators.

Buy and hold Buy and hold (BandH) is a well-known benchmark for trading algorithms. Under this trading strategy, we buy the quoted currency in the first month of the non-tuning data and sell it in exchange for the base currency after the 10-month period.
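For reference, the three indicators can be computed as in the following sketch. The periods shown (12/26 for MACD, 14 for RSI) are the conventional defaults and not necessarily the settings used in our benchmark strategies, whose exact trading rules are omitted here.

```python
def ema(prices, span):
    """Exponential moving average with smoothing factor 2 / (span + 1)."""
    alpha = 2.0 / (span + 1)
    out = [prices[0]]
    for p in prices[1:]:
        out.append(alpha * p + (1 - alpha) * out[-1])
    return out

def macd(prices, fast=12, slow=26):
    """MACD line: difference between a fast and a slow EMA."""
    return [f - s for f, s in zip(ema(prices, fast), ema(prices, slow))]

def rsi(prices, period=14):
    """Relative strength index over the last `period` price changes."""
    deltas = [b - a for a, b in zip(prices, prices[1:])][-period:]
    gains = sum(d for d in deltas if d > 0)
    losses = -sum(d for d in deltas if d < 0)
    if losses == 0:
        return 100.0
    return 100.0 - 100.0 / (1.0 + gains / losses)
```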

5 Trading results

This section presents experimental results for our proposed MTDC algorithm.Footnote 7 We first compare MTDC’s performance against five STDC strategies (Sect. 5.1), and afterwards to financial benchmarks (Sect. 5.2). Then, in Sect. 5.3 we discuss computational times. Lastly, we summarise the main findings of our results in Sect. 5.4.

5.1 Summary statistics

Table 4 presents the returns of the single-threshold and multi-threshold trading strategies, calculated monthly. In this table, a reported return of 0.00 indicates that the strategy was passive (i.e., hold action). The trading return results show that the multi-threshold strategy has the highest return (1.15%), which is more than 100% better than the best single threshold-based strategy, which recorded a return of 0.53%. The multi-threshold strategy was also the best per currency pair. Table 5 presents the non-parametric Friedman test with the Hommel post hoc test, used to determine whether the differences in performance are statistically significant. The null hypothesis is that the strategies come from the same continuous distribution. As we can observe, the best-ranking strategy is the multi-threshold strategy, and it statistically outranks the five single-threshold strategies at the 5% significance level in all pairwise comparisons.

Table 4 Average return result (%) for trading strategies of individual single-threshold strategies and multi-threshold strategy
Table 5 Statistical test results for average returns according to the non-parametric Friedman test with the Hommel post hoc test of multi-threshold (c) vs. other single-threshold based trading strategies

We also evaluated the risk-adjusted return (Sharpe ratio) over the transactions that occurred in the 10-min monthly datasets. Table 6 presents the results, showing that the multi-threshold strategy outperformed the single threshold-based strategies in all 20 currency pairs. Its Sharpe ratio of 0.78 is more than 200% better than that of the best single threshold-based strategy. We also tested the statistical significance of the Sharpe ratio results using the non-parametric Friedman test, with the null hypothesis that the strategies come from the same continuous distribution. We reject the null hypothesis, as the statistical test results presented in Table 7 show that the multi-threshold strategy outperformed the five single-threshold strategies.

We also performed a risk analysis, measuring the maximum drawdown and the standard deviation of our daily returns. Table 8 presents the maximum drawdown results, where a lower drawdown is better. Our multi-threshold strategy recorded the lowest overall average maximum drawdown (0.02); on average, the risk was 10 times lower than when trading with single-threshold strategies. We also performed the Friedman test, and Table 9 shows that the multi-threshold strategy statistically outperforms all single-threshold strategies at the 5% significance level.

Finally, Table 10 presents the standard deviation results. The results are not as homogeneous as in the previous tables, where the multi-threshold strategy ranked first across all datasets. Nevertheless, the MTDC strategy ranked highest for the largest number of currency pairs (7 currency pairs) and had the lowest average standard deviation (0.1638). We also performed the Friedman statistical test, presented in Table 11: the multi-threshold strategy ranks first overall, although its performance was not statistically significant against any of the single-threshold strategies. Nonetheless, the volatility risk was slightly lower when trading with MTDC, which is noteworthy considering that the average return (see Table 4) more than doubled.

Table 6 Average Sharpe ratio result for trading strategies of individual single-threshold strategies and multi-threshold strategy
Table 7 Statistical test results for average Sharpe ratio according to the non-parametric Friedman test with the Hommel post hoc test of multi-threshold (c) vs. other single-threshold based trading strategies
Table 8 Average maximum drawdown (%) result for trading strategies of individual single-threshold strategies and multi-threshold strategy
Table 9 Statistical test results for average maximum drawdown according to the non-parametric Friedman test with the Hommel post hoc test of multi-threshold (c) vs. other single-threshold based trading strategies
Table 10 % Average standard deviation (SD) result for trading strategies of individual single-threshold strategies and multi-threshold strategy
Table 11 Statistical test results for average standard deviation according to the non-parametric Friedman test with the Hommel post hoc test of MTDC (c) vs. other single-threshold based trading strategies

5.2 Financial benchmarks

Since MTDC was found to be the best algorithm in the above comparisons with other DC-based strategies, we now proceed to compare it to financial benchmarks, namely the RSI, EMA and MACD technical indicators, as well as the well-known BandH benchmark. Under BandH, we buy on the first day of the first month and sell on the last day of the tenth month.

Table 12 compares the mean returns of MTDC and the other strategies. MTDC outperforms the four benchmarks in 13 currency pairs, with an overall average return of 1.1577% against \(-0.128\%\) under BandH, \(-0.0378\%\) for RSI, 0.1117% for EMA, and \(-0.1879\%\) for MACD. In addition, MTDC's variance is 0.76, BandH's is 6.91, RSI's is 0.09, EMA's is 0.14, and MACD's is 0.16. This indicates that MTDC is not only more profitable but also less risky than BandH. It is riskier than the technical indicators, but given the significantly higher returns, it can be argued that MTDC's performance is worth the increased risk. These results were also confirmed by a Friedman statistical test (Table 13), where MTDC ranks first with an average rank of 1.35 and statistically outperforms all four benchmarks at the 5% level.

Table 12 Comparison of MTDC to RSI, EMA, MACD and buy-and-hold in terms of average return (%)
Table 13 Statistical test results for average return according to the non-parametric Friedman test with the Hommel post hoc test of MTDC (c) vs. RSI, EMA, MACD, and BandH

5.3 Computational times

Table 14 presents the average computational time of the multi-threshold strategy in comparison to the single-threshold strategies. The results show an increase in the computation time taken by the multi-threshold strategy. This is expected, since it includes the time required to train multiple classification models; additional time is also used in training the GA-based strategy. The computation time was measured on a non-dedicatedFootnote 8 Red Hat Enterprise Linux (Maipo) machine with a 24-core 2.53 GHz processor and 24 GB of memory. Although Auto-WEKA, the tool used in our classification step, can be executed using multiple threads of concurrent execution, we chose to run it in serial mode using a single CPU core, due to limitations on hardware resources. Besides the classification step, we acknowledge that improvements in computation time could be made through parallelisation of the different steps that make up the trading strategy framework (Brookhouse et al 2014; Ong and Schroder 2020). We do not consider the additional time to be a significant drawback, as the framework is used offline: training would not happen at the same time as trading. Instead, one would train the algorithm separately from the live trading process and, once a best model is chosen, use it during live trading. Therefore, we believe that the significant improvements observed in the trading results would likely outweigh the extra computational time, and the final say rests with the user/trader in judging the impact of the MTDC algorithm's computational times.

Table 14 Average computational times per trend for single threshold-based strategy and multi-threshold strategy

5.4 Summary

We can summarise our findings as follows.

DC-based trading strategies embedded with an optimised multi-threshold trend reversal forecasting algorithm improve profit at reduced risk. As we observed in Tables 4 and 6, the profit obtained when trading with the optimised multi-threshold strategy outperformed the single threshold-based strategies roughly two-fold and four-fold in terms of return and Sharpe ratio, respectively. The statistical tests performed show that the increase in profit is statistically significant. In addition, having better insight into price movements enables traders to make better decisions without increasing risk, and we were able to achieve the aforementioned profit without increasing risk. As discussed for Table 11, the multi-threshold strategy was unable to statistically outperform the single-threshold strategies on the standard deviation risk measure; however, it ranked first, and we consider this a positive result.

Optimising the trading recommendations of individual thresholds is beneficial. Optimising trade-action recommendations from multiple thresholds using a GA is an effective way of determining the best action to take.

The paradigm of directional changes has a lot of potential. Although this is not a paper that thoroughly compares DC to physical-time trading strategies, we can make two general comments about DC-based trading strategies: (i) they can be profitable and low risk, and (ii) they can outperform technical analysis indicators. In this paper we only compared the DC performance to three popular indicators, so in future work we could benchmark MTDC against further technical indicators.

6 Conclusion

To conclude, this paper presented an investigation of 200 monthly datasets from 20 different Forex currency pairs, demonstrating that trading with a strategy that combines information from multiple DC thresholds leads to significant improvements in profit and risk. We benchmarked the results against single-threshold DC strategies, as well as financial benchmarks such as technical analysis indicators and BandH. Our results confirmed that the optimal combination of recommendations from multiple thresholds leads to a very strong performance across the majority of metrics, further supported by strong statistical significance results. These are significant results, because they indicate that the DC paradigm is able to compete with the physical-time paradigm. Lastly, the fact that we ran extensive experiments over 200 datasets leads us to believe that our results are not only significant, but also widely applicable.

It remains to be confirmed whether similar performance can be achieved in other markets (e.g., commodities, bonds, indices, stocks, and cryptocurrencies); it will therefore be relevant to test our approach in such markets. In this work, we experimented with data sampled at 10-min intervals; it would also be worth experimenting with higher-frequency data, such as 1-min physical-time data and tick data, to further evaluate the robustness of the approach. Furthermore, it would be interesting to experiment with the number of thresholds in the MTDC algorithm. Currently, we use five thresholds, but it would be worth allowing the algorithm to dynamically select the number of thresholds for each dataset, similarly to what already happens with the threshold values, which are dynamically selected from a given pool. Finally, it would be valuable to perform a feature analysis for the classification step, to better understand the contribution of each feature to the classification task.