To learn or not to learn? Evaluating autonomous, adaptive, automated traders in cryptocurrencies financial bubbles

Guarino, Alfonso; Grilli, Luca; Santoro, Domenico; Messina, Francesco; Zaccagnino, Rocco

doi:10.1007/s00521-022-07543-4

Original Article
Open access
Published: 27 July 2022

Volume 34, pages 20715–20756, (2022)

Neural Computing and Applications

Alfonso Guarino¹,
Luca Grilli²,
Domenico Santoro³,
Francesco Messina⁴ &
…
Rocco Zaccagnino ORCID: orcid.org/0000-0002-9089-5957⁵

Abstract

Financial bubbles represent a severe problem for investors. In particular, the cryptocurrency market has witnessed the bursting of different bubbles in the last decade, which in turn have had spillovers on all the markets and real economies of countries. These kinds of markets and their unique characteristics are of great interest to researchers. Generally, investors and financial operators study market trends with technical analysis tools to understand when bubbles might occur. Such tools, which have long been in use, have proved to be valuable building blocks for more advanced systems. In this regard, different autonomous, adaptive and automated trading agents have been introduced in the literature to study several kinds of markets. Among these, we can distinguish between agents with Zero/Minimal Intelligence (ZI/MI) and Computational Intelligence (CI)-based agents. The former typically trade on the market without resorting to complex learning strategies; the latter usually use (deep) reinforcement learning mechanisms. However, these trading agents have never been tested on the cryptocurrency market and related financial bubbles, which are still mostly overlooked in the literature. It is unclear what profits or losses these agents make before, during, and after a bubble, and hence how to adjust their strategy and avoid critical situations. This paper compares a broad set of trading agents (both ZI/MI and CI ones) and evaluates them with well-known financial indicators (e.g., volatility, returns, Sharpe ratio, drawdown, Sortino and Omega ratio). Among the experiments’ outcomes, ZI/MI agents were more explainable than CI ones. Based on these results, we introduce GGSMZ, a trading agent relying on a neuro-fuzzy mechanism. The neuro-fuzzy system is able to learn from the trades performed by the agents adopted in the previous stage. GGSMZ outperforms the other tested agents. We argue that GGSMZ could be used by investors as a decision support tool.

1 Introduction

Price prediction in financial and real markets is a problem that industry experts and scholars have always studied. Forecasting has become an increasingly complex process, especially today, when markets are fully connected and information circulates more easily and quickly. However, in parallel with the increase in forecasting complexity, different tools have been developed to carry out machine-assisted forecasting. For example, various studies successfully forecast stock prices [8, 9, 11, 57, 78] (or more specifically the daily close price of stocks [49]), stock market index performance [71], carbon emissions futures prices [6], the price of gold [45], the price of oil [10] and the price of various commodities [5] like coffee, cocoa, etc.

In recent years, Bitcoin has attracted considerable attention from investors, policy makers, and the media. This is not surprising since its price increased from a value of nearly zero in 2009 to almost $20,000 in December 2017. This was accompanied by a tremendous increase both in the number of Bitcoins in circulation and the Bitcoin market capitalization, being around 16.8 million Bitcoins and $300 billion, respectively. Policymakers around the world have raised concerns because Bitcoin is anonymous, decentralized and unregulated, and it could be a bubble, threatening the stability of the financial system [16, 18, 50]. Nonetheless, investors appear to be attracted by the potential to earn high returns, the introduction of Bitcoin derivatives, and the potential diversification benefits. Thanks to these features, the focus has shifted to this market, and it is possible to find a lot of studies that develop and test prediction models for the Bitcoin market. For example, Shah and Zhang [67] propose a trading strategy based on a Bayesian regression model that allows them to earn substantial returns when tested on real data. In a similar context, Madan, Saluja, and Zhao [58] propose the use of binomial regressions, support vector machines and random forest algorithms to predict the sign of the Bitcoin price change. Using machine learning optimization, Greaves and Au [39] obtain an up-down Bitcoin price movement classification accuracy of roughly 55%. In more recent times, Atsalakis et al. [7] developed a neuro-fuzzy system for Bitcoin price prediction with root mean squared error (RMSE) of 0.0376. Lastly, Mudassir et al. [61] proposed a machine learning approach exploiting joint regressors forecasting for Bitcoin price prediction, which has proven to be effective also in medium-term predictions.

In step with the introduction of prediction methods and systems, a set of tools to study the financial market has been proposed. There is a long tradition of research to automatically discover, implement, and fine-tune strategies for autonomous adaptive automated trading in financial markets, with a sequence of research papers on this topic published at major artificial intelligence (AI) conferences and in prestigious journals. Among these, we can distinguish between two broader sets of automatic traders: Zero/Minimal Intelligence (ZI/MI) traders [19], and Computational Intelligence (CI) traders [55]. The first set concerns agents that trade on the market without resorting to complex learning strategies. The second set includes traders that usually exploit (deep) reinforcement learning mechanisms.

1.1 Overview

Objective In this paper, our goal is to study and evaluate the behavior of different agents in the Bitcoin market during financial bubbles (see a visual abstract in Fig. 1). We use two different types of trading agents to analyze their ability to identify the particular market phases (before/during/after the bubble) and their behavior in the investment phase. We want to analyze ZI/MI and CI agents in different scenarios, such as various stages of a financial bubble and compare these agents to understand which ones make more profit in anomalous situations. (Over the years, these bubbles in crypto seem to be more frequent.) Finally, based on what we have observed experimenting with such trading agents, we aim to develop our own automatic trader that operates during the bursting of a bubble. The ultimate goal is to define and introduce a trader outperforming state-of-the-art ones.

Motivation Bitcoin has particular attributes that introduce additional challenges when building a model to forecast its price movements. For example, its volatility is considerably higher than that of gold, the US dollar or stock markets [13], and it is particularly susceptible to regulatory and market events [31]. Additionally, prices may be manipulated through suspicious trading activity [34]. Our interest is driven by the absence in the literature of a comparison between two types of agents, i.e., ZI/MI and CI. Moreover, since the crypto market is subject to financial bubbles more and more frequently, it is interesting to study how these agents behave in the different market phases [18]. Furthermore, since the capitalization of these markets keeps growing, studying which agent behaves best could allow human traders to benefit from its strategies (an added value not to be underestimated from an economic point of view). Finally, the analyses made on this market could be transferred to newer markets that have the same characteristics (e.g., high volatility, high frequency of bubbles).

The proposed approach In order to compare the ZI/MI and CI traders, we considered the ones in [19], a broad collection of ZI/MI agents including ZIC, ZIP, GDX, AA, and GVWY (see Sect. 2.1), and the ones in [55], a collection of CI traders including A2C, DDPG, TD3, PPO and SAC (see Sect. 2.2). We compared such agents on the Bitcoin market from 2015 to 2018 and from 2019 to 2021, and on the Ethereum market from 2019 to 2021, showing that ZI/MI agents were more explainable than CI ones. Building upon these results, we introduce a neuro-fuzzy system that is trained on the experience of the best agents found in the previous phase and whose aim is to suggest the best operation to perform on the market in a specific period. Neuro-fuzzy systems are hybrid models that combine the functionality of fuzzy systems with the learning abilities of neural networks [59]. Consequently, one of the main advantages of a neuro-fuzzy system is its ability to learn and use linguistic variables to model the input–output relationships of a given system. In addition, using neural network learning algorithms, the fuzzy subsystem can automatically adjust the parameters of the fuzzy rules, thereby producing data-driven rules for more accurate forecasting.

The adaptive network-based fuzzy inference system (ANFIS) used in the present study was proposed by [47]. It consists of a five-layer adaptive network with several inputs and one output. The resulting fuzzy system is placed at the core of a new trading agent, namely GGSMZ, which was tested on the Bitcoin and Ethereum markets, including during the bubbles of 2018 and 2021. The results show that GGSMZ outperforms the other agents on many indicators in various market situations, and that it can be a valuable trading support tool.
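To make the five-layer structure concrete, the following is a minimal sketch of a single forward pass through a generic first-order Sugeno ANFIS. It is not the configuration used for GGSMZ: the Gaussian membership functions, the function names (`gaussian`, `anfis_forward`) and all parameter values are assumptions chosen only for illustration, and the training step that tunes the parameters is omitted.

```python
import math
from itertools import product


def gaussian(x, center, sigma):
    """Gaussian membership degree of x (assumed membership shape)."""
    return math.exp(-0.5 * ((x - center) / sigma) ** 2)


def anfis_forward(inputs, mf_params, consequents):
    """One forward pass of a generic first-order Sugeno ANFIS.

    inputs      : list of n crisp input values
    mf_params   : mf_params[i] is a list of (center, sigma) pairs for input i
    consequents : one list [p_1, ..., p_n, bias] per rule; rules follow the
                  order of itertools.product over the membership functions
                  (the first input varies slowest)
    """
    # Layer 1: fuzzification, one membership degree per (input, fuzzy set)
    memberships = [
        [gaussian(x, c, s) for (c, s) in params]
        for x, params in zip(inputs, mf_params)
    ]
    # Layer 2: firing strength of each rule (product of its memberships)
    firing = [math.prod(combo) for combo in product(*memberships)]
    if len(firing) != len(consequents):
        raise ValueError("one consequent vector per rule is required")
    # Layer 3: normalisation of the firing strengths
    total = sum(firing)
    normalised = [w / total for w in firing]
    # Layer 4: each rule's linear consequent, weighted by its normalised strength
    weighted = [
        w * (sum(p * x for p, x in zip(coef[:-1], inputs)) + coef[-1])
        for w, coef in zip(normalised, consequents)
    ]
    # Layer 5: the single output is the sum of the weighted rule outputs
    return sum(weighted)


if __name__ == "__main__":
    # Hypothetical setup: two inputs, two fuzzy sets each, hence four rules.
    mf = [[(0.0, 1.0), (1.0, 1.0)], [(0.0, 1.0), (1.0, 1.0)]]
    rules = [[0.5, -0.2, 0.1], [0.1, 0.3, 0.0], [-0.4, 0.2, 0.2], [0.0, 0.0, 1.0]]
    print(anfis_forward([0.3, -0.1], mf, rules))
```

In a trained system, the membership parameters (layer 1) and the consequent coefficients (layer 4) are the quantities adjusted by the learning algorithm; here they are fixed by hand.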

Key points The main contributions of this paper are:

The study of the efficacy of Zero/Minimal Intelligence and Computational Intelligence based agents in terms of economic return in trading cryptocurrencies. In particular, we have analyzed the behavior of such trading agents also during financial bubbles. To the best of our knowledge, the current study is one of the first that compares such a broad range of trading agents (ZI/MI and CI) in similar scenarios.
The analyses show that some ZI agents can identify the phase of a bubble based on volatility, whereas the behavior of CI agents is always excellent at any stage of the market. However, despite their optimal behavior in the investment phase, CI agents lose explainability due to their depth of training.
In light of the above, we have built a novel learning-based trading agent, namely GGSMZ, that is based on an adaptive neuro-fuzzy inference system (ANFIS) approach. We have tested this agent and compared its performance against the most promising ones found in the previous step of the project. The results indicate that GGSMZ was able to learn from the best choices of the CI agents and to use them to put itself in a profitable position. Investors could use our neuro-fuzzy model as a decision support tool. In the literature, few agents base their actions on those of RL trading agents, placing us among the first to develop models of this type (particularly with neuro-fuzzy rules).

Paper’s organization This article is structured as follows: Sect. 2 presents the literature review, highlighting initiatives and studies that share points of contact with this paper; Sect. 3 elucidates the set of methods used and the dataset adopted; Sect. 4 provides details and a step-by-step description of the experiments carried out, ending with the discussion of the obtained results; Sect. 5 presents GGSMZ and the neuro-fuzzy system at its core, and illustrates the methodology adopted to build the neuro-fuzzy system and the experiments made; Sect. 6 concludes the paper with final remarks and an overview of the work done, and it traces the path for future works. Finally, Appendix 1 repeats the experiments previously carried out with the same agents on a different market, that of currencies (FOREX).

2 Literature review

This section focuses on providing the key literature referring to ZI/MI trading agents (Sect. 2.1), CI agents (Sect. 2.2), and dwells on the works that compared different agents (Sect. 2.3).

2.1 Zero-intelligence and minimal intelligence trading agents

As many of the world’s major financial markets have shifted from physical stock exchanges to electronic markets, software agents with various degrees of artificial intelligence have started to replace human traders. One relevant example of these software agents is represented by the Zero/Minimal Intelligence trading agents, which we briefly sketch in the following.

The birth of the ZI is due to Becker [14], who developed a model showing that the shape of the supply and demand curves is associated with the behavior of agents (traders) without any individual rationality. These behaviors are due to a market mechanism. Building on this idea, Gode and Sunder [38] were the first to consider a market mechanism in continuous double auctions. In particular, they consider two types of markets, each consisting of twelve agents divided into two groups: buyers and sellers. Traders can submit shouts at any time for one unit at a time. The key feature is that buyers and sellers can modify their offer after submitting a price, e.g., buyers may submit a higher price and sellers a lower price than the one previously submitted. The subjects operating in this market are human agents, who can shout prices at any time and whose price choice is governed by strategy, and ZI agents, which do not learn strategies. In particular, the ZI agents are classified as ZI Unconstrained, which can shout prices at a loss compared to their reservation prices, and ZI Constrained (ZIC), for which this mechanism is not allowed and the shouted price cannot lead to losses. As a result, Gode and Sunder found that in markets populated by human traders and in those populated by ZI Constrained agents there is a rapid convergence toward the equilibrium price, while in markets populated by ZI Unconstrained agents this convergence did not occur (with a higher dispersion of profits). According to Gode, Spear and Sunder [37], the result of this analysis highlights that the dominant factor in auctions is not the strategy chosen by the trader, but the market mechanism. This mechanism produces rational market behavior even in the presence of irrational agents, going against the classic economic theories according to which it is the perfect rationality of the agents that allows an optimal allocation in the markets.

There have been several extensions of this model. Friedman [32] and Wilson [77] introduced two behavioral models for ZI agents: (i) Bayesian Game Against Nature (BGAN) with bounded rationality, to explain the bid-ask spread; (ii) Waiting Game/Dutch Auction (WGDA) with complete rationality, to check what happens in markets with an unequal number of traders. Jamal and Sunder [46] used ZI traders whose price limits use heuristic and Bayesian rules, demonstrating the achievement of Bayesian equilibrium.

Gjerstad and Dickhaut [35] developed a trading strategy called GD to achieve competitive equilibrium outcomes (prices and allocations) in a market where individual choices are made myopically using heuristic beliefs. Their model aims to strike a balance between the approach taken by Wilson and the one by Gode and Sunder, while it also boasts the merit of avoiding the positive autocorrelation of price changes found in Friedman’s model. In addition, Gode and Sunder [36] examined the effect of unconstrained price controls, showing that traders do not adjust the strategy in the case of price controls.

Among the various criticisms that have been made of the model of Gode and Sunder, the most notable is that of Cliff and Bruten [24], who showed that the accuracy with which the model captures the behavior of real markets depends on the supply and demand functions. The condition demonstrated by Gode and Sunder only occurs when these functions are symmetrical (a situation that does not occur in reality), making the ZI model weak in representing the results.

Cliff and Bruten [24] developed an agent called Zero Intelligence Plus (ZIP) with a learning mechanism, through which the agent maintains a profit margin that reflects its belief about the profit obtainable from a successful transaction, which is therefore a function of the trader’s reservation price. The authors demonstrated that ZIP achieves better performance than ZI. One of the main features of the marketplace they used is that, at any given time, only one agent can announce a bid/offer. This agent is chosen at random by the market institution.

Inspired by this work on ZIP agents, but considering the marketplace bid/offer procedure unrealistic, Priest and van Tol [62] developed a new agent called PS-agent. The performance of ZIP agents and PS-agents was compared in a marketplace characterized by a persistent shout double auction mechanism, where a trader’s current bid or offer persists until the trader makes another. As a result, PS-agents converge to equilibrium more rapidly than ZIP agents. Then, ZIP and a modified version of GD, renamed MGD, were tested by Das et al. [25] in continuous double auction (CDA) markets, in order to study the interactions between human and artificial traders. Another extension of the GD model, GDX, was developed by Tesauro and Bredin [72]. GDX not only involves a belief function that an agent builds to indicate whether a particular shout is likely to be accepted in the market, but it also considers the time left before the auction closes.

Inspired by Das et al. [25], Grossklags and Schmidt [40, 41] studied the effect of knowledge/ignorance of the presence of trader-agents on the behavior of human traders, highlighting a “knowledge effect” capable of altering market dynamics. The ZIP trader was modified by Cliff [19] through genetic algorithms to study the evolution of strategies, and extended from 8 to 60 parameters, yielding ZIP60 [20]. In the latter paper, it was observed that a simple search/optimization process makes it possible to find ZIP60 parameter vectors that outperform ZIP8.

The introduction of ZI and ZIP agents marked an important step in trading strategies [51].

A further step forward has been made by Vytelingum et al. [74] with the presentation of a dominating strategy called Adaptive Aggressive (AA), which has been widely considered to be the best performing strategy in the public domain. The crucial peculiarity of AA is having both a short and a long-term learning mechanism to adapt its behavior to changing market conditions. Later on, AA’s supposed dominance has been tested against two novel algorithms known as GVWY and SHVR [21], which involve no AI or machine learning at all. The result is surprising: GVWY and SHVR can outperform AA and many of the other AI/ML-based trader-agent strategies.

2.2 Computational Intelligence traders

The increasingly strong use of neural networks, also in the financial field, has made it possible to combine their high ability to represent features with reinforcement learning (RL). For example, Deng et al. [29], starting from the idea that computers can beat experienced traders, proposed a recurrent neural network (RNN) that senses the dynamic market condition for feature learning and combined it with an RL framework that makes trading decisions. Almahdi and Yang [2] proposed a recurrent reinforcement learning (RRL) method for portfolio allocation, with a risk-adjusted performance objective function (Calmar ratio) to obtain signals and asset weights, showing that this method outperforms hedge fund benchmarks. Jiang, Xu and Liang [48] proposed an RL framework for asset allocation, consisting of a convolutional neural network (CNN), an RNN and a long short-term memory (LSTM) in a particular scheme with deep deterministic policy gradient, showing that, on a crypto market, this framework monopolizes the top positions in various experiments. Liu et al. [56] proposed an adaptive trading model, namely iRDPG, to develop trading strategies that balance exploration and exploitation by combining RL techniques with GRU-based networks. Again in the study of financial signals, Ye et al. [79] built a new RL framework, the State-Augmented RL framework (SARL), that augments asset information with price movement predictions as additional states, to incorporate the data heterogeneity and environment uncertainty of the market; they tested it on the Bitcoin and stock markets, demonstrating the importance of state augmentation. Wang et al. [75] proposed AlphaStock, a new type of strategy based on the attention mechanism to model price relations for a buying and selling strategy, testing it on the USA and Chinese markets and highlighting the robustness of their model. Wang et al. [76], considering the market conditions, proposed a deep RL method to optimize the investment policy (DeepTrader), a model that considers macro-market conditions as an indicator and is able to capture the spatial and temporal dependencies between assets.

Recently, given the difficulty of programming RL models, Yang, Gao and Wang [55] created a new open-source framework (FinRL) to help quantitative traders. Several works have been built on this framework, such as Guan and Liu [42], who used it to explain the trading strategies of a DRL agent for portfolio management in three steps, or Bau and Liu [12], who proposed a FinRL-based multi-agent DRL approach that captures high-level complexity to optimize the process of selling a large number of stocks (called liquidation). Thanks to its ease of implementation and the number of agents included, and in line with the previous authors, we also used FinRL for the subsequent analyses.

2.3 Comparison and evaluation of different trading agents

Since Gode and Sunder developed the ZIC agent, several papers have addressed the topic of comparing bidding strategies and agents’ behaviors. First, Cason and Friedman [17] evaluated the performance of Wilson’s waiting game/Dutch auction (WGDA) model, Friedman’s Bayesian game against nature (BGAN) and Gode and Sunder’s ZIC agent in price formation in double auction markets. The results suggested that models which rely most heavily on trader rationality, such as WGDA and BGAN, are less able to describe market behavior than ZIC agents, which require very little trader rationality. Nevertheless, the authors suggested new experiments, since the conditions of their experiment did not give a fair chance to the WGDA model. In 2001, in their already mentioned work, Das et al. [25] applied the laboratory methods of experimental economics to compare the Extended-GD (EGD) agent and the ZIP agent against human traders in a continuous double auction (CDA) mechanism. Ten years later, De Luca and Cliff [26] recreated the same experiment in a trading system called OpEx, obtaining the same results as Das et al. [25] in terms of comparison between robot traders and human traders, as ZIP and GDX agents consistently outperformed human traders, and observing that GDX outperformed ZIP. Meanwhile, in 2002, Tesauro and Bredin [72] had pointed out that ZIP slightly outperformed EGD. In addition, another work by De Luca and Cliff [27] confirmed that the “Adaptive Aggressive” (AA) algorithmic traders of Vytelingum [74] outperformed ZIP, GD, and GDX in agent vs agent experiments in CDA markets, as Vytelingum himself claimed. A few years later, Vach [73] questioned the dominance of AA over ZIP and GDX agents by designing symmetric agent–agent experiments with a variable composition of the agent population. Surprisingly, and in contrast to the previous literature, GDX was a dominant strategy over AA in many experiments in this work. In 2019, Cliff [23] reached a similar result: in markets with dynamically varying supply and demand, i.e., market environments that are in various ways more realistic and closer to real-world financial markets, AA can be routinely outperformed by more straightforward trading strategies. AA remains dominant only in highly simplified market scenarios, possibly because AA was designed with exactly those simplified experimental markets in mind. In the same year, Snashall and Cliff [69] made another step forward by exhaustively testing AA against GDX across a sufficiently wide range of market scenarios. The outcome was that AA is outperformed by GDX not only in more realistic market environments, but also in the simple experimental conditions that were used in the original AA papers. So, the various results achieved in the previous years and well known in the literature could no longer be fully trusted. On this path, one year later, Rollins and Cliff [64], employing a new version of BSE called Threaded-BSE (TBSE), questioned the original benchmark dominance hierarchy AA > GDX > ZIP > ZIC obtained in the BSE, and got a different result: the dominance hierarchy is instead ZIP > AA > ZIC > GDX. The authors also conjecture that this new result is probably due to the previous use of simplistic simulation methodologies.

Thus, several experiments have been conducted with autonomous, adaptive, automated traders, but to the best of our knowledge the following aspects have been overlooked:

There is a lack of thorough comparisons in the cryptocurrency market and, in particular, in the Bitcoin and Ethereum markets;
There is a lack of experiments on how the trading agents behave during financial bubbles, except for the study by Duffy and Unver [30], which successfully verified whether ZIC traders can generate asset price bubbles and crashes of the type observed in a series of laboratory asset market experiments^{Footnote 1};
There is a lack of comparison between ZI/MI traders and other traders adopting a higher degree of AI techniques, such as CI ones.

We remark that this work aims to fill the gaps mentioned above offering a comparison between ZI/MI and CI trading agents on the crypto market over different phases, including during the bursting of a financial bubble. Furthermore, building upon the experiments carried out, we propose GGSMZ, a trading agent relying on a neuro-fuzzy system which outperforms other analyzed traders.

3 Methods and materials

In this section, we first present and provide details about the traders adopted, from ZI/MI ones (Sect. 3.1) to the CI ones (Sect. 3.2). Then, we sketch technical details on adaptive neuro-fuzzy systems at the basis of the proposed GGSMZ trader (Sect. 3.3). Lastly, we show the dataset employed for running the experiments (Sect. 3.4).

3.1 Zero/Minimal Intelligence traders

We used the following ZI/MI traders:

Zero Intelligence Constrained (ZIC), the ZIC trader generates random bids or offers (depending on whether it is a buyer or a seller) distributed independently, identically and uniformly over the entire feasible range of trading prices from 1 to 200. The trader has no memory of past market activity, and each trader has an equal probability of being the next trader to make a bid or an ask. Gode and Sunder [38] assume that: (i) each ask, bid, and transaction is valid for a single unit; (ii) a transaction cancels any unaccepted bids and offers; (iii) when a bid and an ask cross, the transaction price is equal to the earlier of the two. The buyer’s profit from buying the ith unit is given by the difference between the redemption value of unit i, $v_i$, and its price $p_i$: $\pi _i^B= v_i - p_i$. The seller’s profit from selling the ith unit is given by the difference between the price of unit i, $p_i$, and its cost to the seller $c_i$: $\pi _i^S= p_i - c_i$. Every trader has to sell unit i before selling unit $i+1$. The agents are subject to budget constraints: if they generate a bid (to buy) above their redemption value or an offer (to sell) below their cost, such actions are considered invalid and are ignored by the market. So, the market forbids traders to buy or sell at a loss. Therefore, the support of the distribution that generates the uniform random bids is restricted to the range between 1 and the redemption value of the bidder, while the uniform distribution of random asks is restricted to the range between the seller’s cost and 200.
Zero Intelligence Plus (ZIP), it is an evolution of ZIC. Individual traders adjust their profit margins using market price information thanks to simple adaptive mechanisms. More specifically, they adjust the profit margins up or down based on the prices of bids and offers made by other traders and on whether these shouts are accepted (leading to deals) or ignored. As a result, the performance of these agents increases noticeably. The adjustments depend on four factors. The first is whether the trader is active or inactive, in other words, whether it is still able to make transactions or not. The other three factors are connected to the most recent shout: its price q, whether it was a bid or an offer, and whether it was accepted or rejected. At a given time t, an individual ZIP trader i calculates the shout price $s_i(t)$ for a unit j by applying the trader’s real-valued profit margin $\mu _i(t)$ to the limit price $\lambda _{i,j}$ of the unit: $s_i(t)=\lambda _{i,j}[1+\mu _i(t)]$. Sellers: $\mu _i(t) \in [0,\infty ) \forall t$, so that $s_i$ is raised by increasing $\mu _i$ or lowered by decreasing $\mu _i$; Buyers: $\mu _i(t) \in [-1,0] \forall t$, so that $s_i$ is raised by decreasing $\mu _i$ or lowered by increasing $\mu _i$. In principle, a ZIP buyer will buy from any seller that makes an offer less than the buyer’s current bid shout price; similarly, a ZIP seller sells to any buyer making a bid greater than the seller’s current offer shout price. The aim is that the value of $\mu _i$ for each trader should alter dynamically, in response to the actions of other traders in the market, increasing or decreasing to maintain a competitive match between that trader’s shout price and the shouts of the other traders.
Gjerstad-Dickhaut (GDX), the GDX agent is the result of an improvement process that begins with Gjerstad and Dickhaut [35] and their GD trader and ends with Tesauro and Bredin [72]. Like the ZIP trader, the GD agent can trade profitably by adapting its behavior over time, in response to market events. In contrast to the ZIP work, Gjerstad’s trading algorithm uses frequentist statistics, gradually constructing and refining a belief function that estimates the likelihood of a bid or offer being accepted in the market at any particular time, mapping the price of the order to its probability of success. The original GD agent was developed for a market where there was no queue, so old bids or asks were erased as soon as there was a more favorable bid/ask or a trade. In the version of the CDA market by Das et al. [25], unmatched orders can be retained in a queue, and therefore the notion of an unaccepted bid or ask becomes ill-defined. In their version of the GD agent, called Modified-GD (MGD), they overcome this problem by introducing into the GD algorithm a “grace period” $t_g$. Another modification to GD addressed the need to handle a vector of limit prices, since the original algorithm assumed a single tradeable unit. Finally, an extension of MGD was reported by IBM researchers Tesauro and Bredin in 2002 and took the name of GDX [72]. In their work, Tesauro and Bredin combine the belief function with a forecast of how it changes over time. The result is an optimization of cumulative long-term discounted profitability, whereas GD traders merely optimize immediate profits.
Adaptive Aggressiveness (AA), the AA agent has both a short-term and a long-term learning mechanism to adapt its behavior to changing market conditions. In particular, in the static case, the agent can be effective by assuming that the competitive equilibrium does not change significantly, whereas in the dynamic case, it can make no such assumption and must learn, assuming that this competitive equilibrium may change. The focus is on the bidding aggressiveness shown by the agent because it describes how the agent manages the trade-off between profit and probability of transaction. Whenever the agent submits a bid or an ask, a short-term learning mechanism is employed to adjust the agent’s level of aggressiveness $r\in [-1,1]$. For $r<0$, the agent adopts an aggressive strategy, which trades off profit to improve its probability of transacting in the market. For $r>0$, the agent adopts a passive strategy, waiting for more profitable transactions than at the competitive equilibrium price and willing to trade off its chance of transacting for a higher expected profit. If $r=0$, the agent is neutral and submits offers at what it believes is the competitive equilibrium price, which is the expected transaction price. How the level of aggressiveness influences an agent’s choice of which bids or asks to submit in the market depends on a long-term learning strategy, based on market information observed after every transaction. In short, an AA agent has two principal decision-making components: (i) a bidding layer that specifies what bid or ask should be submitted based on its current degree of aggressiveness; (ii) an adaptive layer that updates its behavior according to the prevailing market conditions. Given a target price $\tau$ and a set of bidding rules, the first layer determines which bids or asks to submit. The aggressiveness model gives a mapping function to $\tau$ employing the agent’s current degree of aggressiveness, its limit price $\hat{p}^*$ and an intrinsic parameter $\theta$.
Giveaway (GVWY), the GVWY agent simply sets its quote price equal to its limit price, giving away any chance of surplus. For a GVWY seller, $P_{sq(GVWY)}(t)=\lambda ^S$; for a GVWY buyer, $P_{bq(GVWY)}(t)=\lambda ^B$, where $\lambda ^S$ and $\lambda ^B$ are, respectively, the seller’s limit price and the buyer’s limit price. Nevertheless, the GVWY trader can still enter into a surplus-generating transaction: if a GVWY buyer quotes its limit price $\lambda ^B$ and the current best ask is $p_{ask}^*<\lambda ^B$, the GVWY buyer is matched with whichever seller issued that best ask and the transaction goes through at price $p_{ask}^*$, yielding a surplus of $\lambda ^B-p_{ask}^*$ for the GVWY buyer.
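
As a minimal sketch of two of the pricing rules above, the following Python snippet computes a ZIP shout price from a limit price and a margin, and the surplus a GVWY buyer obtains when the best ask lies below its limit price. The function names and the numeric values are illustrative assumptions, not taken from the paper or from any trading library, and the ZIP margin update rule itself is not modeled.

```python
from typing import Optional


def zip_shout_price(limit_price: float, margin: float, is_seller: bool) -> float:
    """Return s = lambda * (1 + mu) after checking the admissible range of mu."""
    if is_seller and margin < 0.0:
        raise ValueError("seller margin must lie in [0, inf)")
    if not is_seller and not -1.0 <= margin <= 0.0:
        raise ValueError("buyer margin must lie in [-1, 0]")
    return limit_price * (1.0 + margin)


def gvwy_buyer_surplus(limit_price: float, best_ask: Optional[float]) -> Optional[float]:
    """Surplus of a GVWY buyer quoting its limit price.

    Returns None when there is no best ask strictly below the limit price.
    """
    if best_ask is not None and best_ask < limit_price:
        # The trade executes at the best ask, not at the buyer's own quote.
        return limit_price - best_ask
    return None


# Illustrative values only (hypothetical, not from the paper).
seller_quote = zip_shout_price(100.0, 0.25, is_seller=True)    # asks above its limit
buyer_quote = zip_shout_price(100.0, -0.25, is_seller=False)   # bids below its limit
surplus = gvwy_buyer_surplus(100.0, best_ask=90.0)
print(seller_quote, buyer_quote, surplus)
```

With these numbers the seller asks 125, the buyer bids 75, and the GVWY buyer keeps a surplus of 10 even though it quoted its full limit price.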

3.2 Computational Intelligence-based traders

On the other side, as previously said, we used FinRL [55] as the reinforcement learning (RL) framework. This framework, consisting of three layers, encapsulates historical trading data in training environments and provides demonstrative trading tasks that users can build on to develop their strategies. The first layer, Application, is used to cast the trading strategy as a deep reinforcement learning (DRL) problem by defining: the state space $\mathcal {S}$ (which describes how the agent perceives the environment), the action space $\mathcal {A}$ (which describes the actions allowed for an agent) and the reward function (an incentive for the agent to learn a better policy; the Sharpe ratio in this case). The second layer, Agent, allows the user to work with standard DRL algorithm libraries such as Stable Baselines3 [63], RLlib [52] and ElegantRL [54]. Finally, the third layer, Environment, simulates real-world markets in which a new strategy is learned. Here the agent updates iteratively and obtains a trading strategy that maximizes the expected return. The methods used in the FinRL framework to train the agents are:

Advantage Actor Critic (A2C) [60], a policy optimization method that performs gradient ascent to maximize performance. Defining a state ${\textbf {s}}_t$ in which an actor selects an action ${\textbf {a}}_t$ according to the policy $\pi$, $r_t$ the scalar reward such that $R_t = \sum _{k=0}^{+\infty } \gamma ^k r_{t+k}$ is the total accumulated return from time t with discount factor $\gamma$, $Q^{\pi }({\textbf {s}}, {\textbf {a}}) = \mathbb {E}[R_t|{\textbf {s}}_t = {\textbf {s}}, {\textbf {a}}]$ the action value following the policy $\pi$, $Q^*({\textbf {s}}, {\textbf {a}}) = \max _{\pi }Q^{\pi }({\textbf {s}}, {\textbf {a}})$ the maximum action value for the state s and action a, $V^{\pi }({\textbf {s}}) = \mathbb {E}[R_t|{\textbf {s}}_t = {\textbf {s}}]$ the value of state s under the policy $\pi$ and $Q({\textbf {s}}, {\textbf {a}};\theta )$ an approximate action-value function, Mnih et al. [60], starting from the one-step Q-Learning loss
$$\begin{aligned} J(\theta _i) = \mathbb {E}\bigl [\bigl (r + \gamma \max _{{\textbf {a}}^{'}}Q({\textbf {s}}^{'}, {\textbf {a}}^{'}; \theta _{i-1}) - Q({\textbf {s}}, {\textbf {a}};\theta _i)\bigr )^2\bigr ] \end{aligned}$$
have designed methods for training RL agents with neural networks without excessive use of resources. In this vein, the authors have introduced asynchronous variants of standard algorithms: asynchronous one-step Q-Learning (in which each thread interacts with its own copy of the environment and computes a gradient of the loss), asynchronous n-step Q-Learning and asynchronous advantage actor-critic (called A3C), which maintains a policy $\pi ({\textbf {a}}_t|{\textbf {s}}_t;\theta )$ and an estimate of the value function $V({\textbf {s}}_t; \theta _v)$.
Deep Deterministic Policy Gradient (DDPG) [53], a first type of mixed method between Q-Learning and Policy Optimization that uses each to improve the other. Since Q-Learning cannot be applied directly to continuous action spaces, Lillicrap et al. [53] use an approach based on the deterministic policy gradient (DPG). Considering ${\textbf {s}}_t$ the state in which an agent takes an action ${\textbf {a}}_t$ and obtains the reward $r_t$, $\rho ^\pi$ the discounted state visitation distribution for a policy $\pi$, Q the action-value function, which is learned off-policy, $\mu ({\textbf {s}}) = \arg \max _{{\textbf {a}}} Q({\textbf {s}}, {\textbf {a}})$ a greedy policy, $\gamma$ the discount factor and $\beta$ a stochastic behavior policy, it is possible to start from the Q-Learning loss
$$\begin{aligned} J(\theta ^Q) = \mathbb {E}_{{\textbf {s}}_t\sim \rho ^\beta , {\textbf {a}}_t \sim \beta , r_t \sim E}[(Q({\textbf {s}}_t, {\textbf {a}}_t|\theta ^Q) - y_t)^2], \end{aligned}$$
where $y_t = r({\textbf {s}}_t, {\textbf {a}}_t) + \gamma Q({\textbf {s}}_{t+1}, \mu ({\textbf {s}}_{t+1})|\theta ^Q)$. To make DPG deep and implement it through neural networks, the authors made several changes, e.g., enlarging the replay buffer, improving the learning algorithm to avoid divergence, using batch normalization and adopting a new exploration policy $\mu ^{'}({\textbf {s}}_t) = \mu ({\textbf {s}}_t|\theta ^{\mu }_t) + \mathcal {N}$ built by adding a noise process $\mathcal {N}$.
Twin-Delayed DDPG (TD3) [33], an evolution of the DDPG method that reduces overestimation bias by introducing a novel clipped variant of Double Q-Learning and reduces the variance of the estimates by minimizing the error at each update. In this case, Fujimoto et al. [33] keep the loss of the DDPG model but change how the pair of critics is updated for the actions selected by the target policy, defining
$$\begin{aligned} y = r({\textbf {s}}_t, {\textbf {a}}_t) + \gamma \min _{i=1, 2} Q_{\theta ^{'}_{i}}({\textbf {s}}^{'}, \pi _{\phi ^{'}}({\textbf {s}}^{'}) + \epsilon ) \end{aligned}$$
with $\epsilon \sim clip(\mathcal {N}(0, \sigma ), -c, c)$, where c is a constant and the $clip$ operator restricts the sampled noise to the interval $[-c, c]$. These changes made it possible to increase stability and performance while accounting for function approximation error.
Proximal Policy Optimization (PPO) [66], another policy optimization method that maximizes a surrogate objective function, which indicates the variation of the $J(\pi _{\theta })$ function at each update. In particular, Schulman et al. [66] develop a loss function that combines the policy surrogate and a value function error term. Starting from the clipped loss
$$\begin{aligned} J^{CLIP}(\theta ) = \mathbb {E}[\min (r_t(\theta )\hat{A}_t, clip(r_t(\theta ), 1-\epsilon , 1+\epsilon )\hat{A}_t)] \end{aligned}$$
where $\pi _{\theta }$ is the stochastic policy, $\hat{A}_t$ is an estimator of the advantage function at time t, $r_t(\theta ) = \frac{\pi _{\theta }({\textbf {a}}_t| {\textbf {s}}_t)}{\pi _{\theta _\mathrm{old}}({\textbf {a}}_t|{\textbf {s}}_t)}$ is the probability ratio, $\epsilon$ is a hyperparameter and $clip(r_t(\theta ), 1-\epsilon , 1+\epsilon )\hat{A}_t$ modifies the surrogate objective by clipping the probability ratio; the authors combined it with the value function error term and an entropy bonus, obtaining the following objective
$$J_t^{CLIP+VF+S}(\theta ) = \hat{\mathbb {E}}_t[J_t^{CLIP}(\theta ) - c_1 L_t^{VF}(\theta ) + c_2 S[\pi _{\theta }]({\textbf {s}}_t)]$$
where $c_1$ and $c_2$ are coefficients, S is the entropy bonus and $L_t^{VF}$ is a squared-error loss $(V_{\theta }({\textbf {s}}_t) - V_t^{targ})^2$ between state-value functions.
Soft Actor-Critic (SAC) [44], another mixed method between Q-Learning and Policy Optimization that uses stochastic policies and entropy regularization to make learning more stable than in DDPG. In this case, the Soft Actor-Critic algorithm starts from a maximum entropy variant of the policy iteration method. Following [44], ${\textbf {s}}_t \in \mathcal {S}$ is the current state, ${\textbf {a}}_t \in \mathcal {A}$ is an action, $V_{\psi }({\textbf {s}}_t)$ is the parameterized state value function, $Q_{\theta }({\textbf {s}}_t, {\textbf {a}}_t)$ is the soft Q-function and $\pi _{\phi }({\textbf {a}}_t|{\textbf {s}}_t)$ is the tractable policy. The parameters are: $\psi$ learned by minimizing the squared residual error
$$\begin{aligned} J_V(\psi )& = {} \mathbb {E}_{{\textbf {s}}_t \sim \mathcal {D}}\biggl [\frac{1}{2}(V_{\psi }({\textbf {s}}_t) - \mathbb {E}_{{\textbf {a}}_t \sim \pi _{\phi }}[Q_{\theta }({\textbf {s}}_t, {\textbf {a}}_t)\\&\quad - \log \pi _{\phi }({\textbf {a}}_t|{\textbf {s}}_t)])^2\biggr ], \end{aligned}$$
where $\mathcal {D}$ is the distribution of previously sampled states and actions; $\theta$ learned by minimizing the soft Bellman residual
$$\begin{aligned} J_Q(\theta ) = \mathbb {E}_{({\textbf {s}}_t, {\textbf {a}}_t) \sim \mathcal {D}}\biggl [\frac{1}{2}(Q_{\theta }({\textbf {s}}_t, {\textbf {a}}_t) - \hat{Q}({\textbf {s}}_t, {\textbf {a}}_t))^2\biggr ], \end{aligned}$$
with $\hat{Q}({\textbf {s}}_t, {\textbf {a}}_t) = r({\textbf {s}}_t, {\textbf {a}}_t) + \gamma \mathbb {E}_{{\textbf {s}}_{t+1} \sim p}[V_{\bar{\psi }}({\textbf {s}}_{t+1})]$ and $\bar{\psi }$ the exponentially moving average of the value network weights; and finally $\phi$ learned by minimizing the expected KL-divergence
$$\begin{aligned} J_{\pi }(\phi ) = \mathbb {E}_{{\textbf {s}}_t \sim \mathcal {D}}\biggl [D_{KL}\biggl (\pi _{\phi }(\cdot |{\textbf {s}}_t)\biggl |\biggl |\frac{\exp (Q_{\theta }({\textbf {s}}_t, \cdot ))}{Z_{\theta }({\textbf {s}}_t)}\biggr )\biggr ]. \end{aligned}$$
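
To make some of the quantities above concrete, the following Python sketch evaluates, for a single transition, the one-step Q-Learning loss, the TD3 target with its clipped noise, and the per-sample term of the PPO clipped objective. It is a scalar illustration under simplifying assumptions: the critic values are passed in as plain numbers rather than produced by neural networks, $\sigma$ is treated as a standard deviation, and all function names and numeric values are hypothetical. It is not how FinRL or its underlying libraries implement these algorithms.

```python
import random
from typing import Sequence


def q_learning_loss(r: float, gamma: float, q_next: Sequence[float], q_sa: float) -> float:
    """Squared one-step TD error (r + gamma * max_a' Q(s', a') - Q(s, a))^2."""
    target = r + gamma * max(q_next)
    return (target - q_sa) ** 2


def clipped_noise(sigma: float, c: float, rng: random.Random) -> float:
    """Draw eps ~ N(0, sigma) and clip it to [-c, c], as in the TD3 target."""
    return max(-c, min(c, rng.gauss(0.0, sigma)))


def td3_target(r: float, gamma: float, q1_next: float, q2_next: float) -> float:
    """TD3 target y = r + gamma * min(Q1', Q2').

    q1_next and q2_next are assumed to be the two target-critic values already
    evaluated at the next state and the noise-perturbed target action.
    """
    return r + gamma * min(q1_next, q2_next)


def ppo_clip_term(ratio: float, advantage: float, eps: float) -> float:
    """Per-sample term min(r * A, clip(r, 1 - eps, 1 + eps) * A) of J^CLIP."""
    clipped_ratio = max(1.0 - eps, min(1.0 + eps, ratio))
    return min(ratio * advantage, clipped_ratio * advantage)


# Illustrative values only (hypothetical, not from the paper).
rng = random.Random(0)
print(q_learning_loss(r=1.0, gamma=0.5, q_next=[2.0, 4.0], q_sa=2.0))
print(clipped_noise(sigma=0.2, c=0.5, rng=rng))
print(td3_target(r=1.0, gamma=0.5, q1_next=4.0, q2_next=2.0))
print(ppo_clip_term(ratio=1.5, advantage=2.0, eps=0.2))
print(ppo_clip_term(ratio=1.5, advantage=-2.0, eps=0.2))
```

The last two calls show the asymmetry of the clipped objective: with a positive advantage the ratio is capped at $1+\epsilon$, whereas with a negative advantage the unclipped (more pessimistic) term is kept.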

3.3 Neuro-fuzzy systems: ANFIS technical details

In this section, we provide basic technical details on the adaptive neuro-fuzzy inference system (ANFIS), which is at the basis of our GGSMZ trader (see Sect. 5).

ANFIS was first proposed by Jang [47]. ANFIS constructs a fuzzy inference system (FIS) whose membership function parameters are derived from training examples. As an example, we assume a FIS with two inputs x and y, each with two associated membership functions (MFs), and one output z. For a typical first-order Takagi–Sugeno fuzzy model [70], a common rule set, with two fuzzy if-then rules, is presented as follows:

Rule 1: if x is $A_1$ and y is $B_1$, then $f_1$ = $p_1x + q_1y + r_1$,
Rule 2: if x is $A_2$ and y is $B_2$, then $f_2$ = $p_2x + q_2y + r_2$,

where $A_1$, $A_2$ and $B_1$, $B_2$ are the linguistic labels of the inputs x and y, respectively, and $p_i, q_i, r_i$ $(i = 1, 2)$ are the consequent parameters [47]. Figure 2a, b illustrate the reasoning mechanism and the corresponding ANFIS architecture, respectively [47].

As shown in Fig. 2b, ANFIS is a multilayer network. The operation of the ANFIS model from layer 1 to layer 5 is briefly presented below [47].

Layer 1: all the nodes in this layer are adaptive nodes, meaning that the shape of the membership function can be modified through training. Taking Gaussian MFs as an example, the MFs are defined as follows:
$$\begin{aligned} O_i^1=\mu _{A_i}(x)=e^{-\frac{(x-c_i)^2}{2\sigma _i^2}} \end{aligned}$$
where x is the crisp input to node i and $A_i$ is the linguistic label, such as low, medium and high. $O_i^1$ is the membership grade of fuzzy set $A_i$; the MF can be trapezoidal, Gaussian, bell-shaped, sigmoidal or of another shape. The variables $(\sigma _i , c_i)$ are the width and center parameters of the Gaussian MF.
Layer 2: the nodes in this layer are gray circle nodes labeled $\Pi$, indicating that they act as simple multipliers. Each node output represents the firing strength of a rule.
$$\begin{aligned} O_i^2=w_i=\mu _{A_i}(x)\times \mu _{B_i}(y),\,\, i=1, 2 \end{aligned}$$
Layer 3: the nodes in this layer are also gray circle nodes labeled N. The ith node is the ratio of the ith rule’s firing strength to the sum of all rules’ firing strengths. The outputs of this layer can be given by
$$\begin{aligned} O_i^3=\bar{w}_i=\frac{w_i}{w_1+w_2},\,\, i=1,2 \end{aligned}$$
Layer 4: each node i in this layer is adaptive. Parameters in this layer are considered as consequent parameters. The outputs of this layer can be represented as
$$\begin{aligned} O_i^4=\bar{w}_if_i=\bar{w}_i(p_ix+q_iy+r_i),\,\, i=1,2 \end{aligned}$$
Layer 5: the node in the last layer computes the overall output as the summation of all incoming signals. The overall output is given as
$$\begin{aligned} O^{5} = z = \sum _{i=1}^{2} \bar{w}_{i} f_{i} = \frac{w_{1} (p_{1} x + q_{1} y + r_{1} ) + w_{2} (p_{2} x + q_{2} y + r_{2} )}{w_{1} + w_{2}} \end{aligned}$$
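
The five layers can be traced in a few lines of Python. The sketch below is a forward pass only, assuming Gaussian MFs and the two-rule, two-input first-order Takagi–Sugeno setup described above; the function names and all parameter values are illustrative assumptions, and no training (hybrid learning) is performed.

```python
import math
from typing import Sequence, Tuple


def gaussian_mf(x: float, center: float, sigma: float) -> float:
    """Layer 1: Gaussian membership grade exp(-(x - c)^2 / (2 sigma^2))."""
    return math.exp(-((x - center) ** 2) / (2.0 * sigma ** 2))


def anfis_forward(
    x: float,
    y: float,
    mf_a: Sequence[Tuple[float, float]],                  # (c_i, sigma_i) of A_i
    mf_b: Sequence[Tuple[float, float]],                  # (c_i, sigma_i) of B_i
    consequents: Sequence[Tuple[float, float, float]],    # (p_i, q_i, r_i)
) -> float:
    """Return the crisp output z of a first-order Takagi-Sugeno ANFIS."""
    # Layer 1: membership grades of the two inputs.
    mu_a = [gaussian_mf(x, c, s) for c, s in mf_a]
    mu_b = [gaussian_mf(y, c, s) for c, s in mf_b]
    # Layer 2: firing strength of each rule, w_i = mu_Ai(x) * mu_Bi(y).
    w = [a * b for a, b in zip(mu_a, mu_b)]
    total = sum(w)
    if total == 0.0:
        raise ValueError("all firing strengths are zero; cannot normalize")
    # Layer 3: normalized firing strengths.
    w_bar = [wi / total for wi in w]
    # Layer 4: rule outputs f_i = p_i x + q_i y + r_i, weighted by w_bar_i.
    weighted = [wb * (p * x + q * y + r) for wb, (p, q, r) in zip(w_bar, consequents)]
    # Layer 5: overall output as the sum of all incoming signals.
    return sum(weighted)


# Illustrative parameters only (hypothetical, not from the paper).
z = anfis_forward(
    x=0.5,
    y=0.5,
    mf_a=[(0.0, 1.0), (1.0, 1.0)],
    mf_b=[(0.0, 1.0), (1.0, 1.0)],
    consequents=[(1.0, 1.0, 0.0), (0.0, 0.0, 2.0)],
)
print(z)
```

In this symmetric example both rules fire with equal strength, so the output is the plain average of $f_1 = 1$ and $f_2 = 2$.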

In the ANFIS architecture, the major task of the training process is to make the ANFIS output fit the training data by optimizing the fuzzy rules and the parameters of the membership functions. A hybrid learning algorithm combining the gradient method and least-squares estimation is used in ANFIS to estimate the initial parameters and quantify the mathematical relationship between input and output. Further details are in [47, 70].

3.4 Crypto datasets

This work uses datasets that describe the evolution of the price of two of the most famous cryptocurrencies, Bitcoin and Ethereum, in different time frames.

BTC-USD 2018 dataset Regarding the Bitcoin price and its time division, we chose as ticker the BTC-USD price recorded by CoinMarketCap through Yahoo! Finance (see Footnote 2) and split it into three time frames. The entire dataset contains the prices from 01/01/2015 to 12/31/2018 (covering the first big bubble of this cryptocurrency), for a total of 1,460 days, and consists of the classic OHLCV features used in the financial sector: Open, High, Low, Close, and Volume. Table 1 shows an extract of the dataset.

Table 1 Extract of the BTC-USD price dataset

Reference values BTC2018 Before
In the first time period, the reference values of the benchmark during the test period were:
− Annual returns: 1574.78%;
− Volatility: 76.12%;
− Sharpe ratio: 3.33;
− Max Drawdown: \(-35.50\)%;
− Sortino ratio: 5.89;
− Omega ratio: 1.82.

Reference values BTC2018 During
− Annual returns: \(-61.25\)%;
− Volatility: 87.40%;
− Sharpe ratio: \(-1.23\);
− Max Drawdown: \(-65.28\)%;
− Sortino ratio: \(-1.64\);
− Omega ratio: 0.82.

Reference values BTC2018 After
− Annual returns: \(-47.96\%\);
− Volatility: \(54.23\%\);
− Sharpe ratio: \(-1.17\);
− Max Drawdown: \(-61.57\)%;
− Sortino ratio: \(-1.50\);
− Omega ratio: 0.80.

Reference values BTC2021 Before
− Annual returns: 287.10%;
− Volatility: 51.56%;
− Sharpe ratio: 2.68;
− Max Drawdown: \(-25.40\)%;
− Sortino ratio: 4.37;
− Omega ratio: 1.63.
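
The six indicators listed above can be computed from a series of periodic returns. The sketch below uses common textbook conventions, which are assumptions here because this section does not state the exact formulas: a zero risk-free rate, a zero threshold for the Sortino and Omega ratios, compounding for the annual return, and 365 periods per year (daily data on a market that trades every day). The function name and the sample returns are hypothetical, and the output is not meant to reproduce the reference values.

```python
import math
from statistics import mean, stdev
from typing import Dict, Sequence


def performance_summary(returns: Sequence[float], periods_per_year: int = 365) -> Dict[str, float]:
    """Compute basic risk/return indicators from simple periodic returns."""
    if len(returns) < 2:
        raise ValueError("need at least two returns")
    # Cumulative wealth of 1 unit invested at the start.
    wealth, level = [], 1.0
    for r in returns:
        level *= 1.0 + r
        wealth.append(level)
    scale = math.sqrt(periods_per_year)
    avg, sd = mean(returns), stdev(returns)
    # Max drawdown: worst relative drop from a running peak (a value <= 0).
    peak, max_dd = 1.0, 0.0
    for w in wealth:
        peak = max(peak, w)
        max_dd = min(max_dd, w / peak - 1.0)
    # Downside deviation uses only the negative part of each return.
    downside = math.sqrt(mean([min(r, 0.0) ** 2 for r in returns]))
    gains = sum(r for r in returns if r > 0.0)
    losses = -sum(r for r in returns if r < 0.0)
    return {
        "annual_return": wealth[-1] ** (periods_per_year / len(returns)) - 1.0,
        "volatility": sd * scale,
        "sharpe": avg / sd * scale if sd > 0.0 else math.nan,
        "max_drawdown": max_dd,
        "sortino": avg / downside * scale if downside > 0.0 else math.inf,
        "omega": gains / losses if losses > 0.0 else math.inf,
    }


# Illustrative returns only (hypothetical, not from the datasets).
summary = performance_summary([0.10, -0.05, 0.02, -0.10, 0.05])
for name, value in summary.items():
    print(f"{name}: {value:.4f}")
```

An Omega ratio above 1 means that gains outweigh losses relative to the threshold, which matches the pattern in the reference values: above 1 before the bubble bursts and below 1 during and after it.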
