1 Introduction

Cryptoassets provide an interesting laboratory for financial market research. They are traded on anonymous exchanges; several exchanges are available for each market; there are no best-execution obligations; few traders actively operate on several exchanges; and price discovery and information flow are not as smooth as in regulated markets. These features make the efficiency and depth of these markets an interesting and largely unexplored research topic.

Unlike many papers that focus only on markets where cryptoassets are exchanged with the US dollar, see for example Baur et al. (2018), Brandvold et al. (2015), Grobys (2021), Lintilhac & Tourin (2017), Petukhina et al. (2021), we consider both markets where cryptoassets are exchanged with fiat money and markets where cryptoassets are exchanged with each other (including Tether). We stress the importance of the latter markets, whose role is becoming increasingly relevant: for example, Ciaian et al. (2018) show that 47% of Ether is exchanged against Bitcoin and only 37% against the US dollar or the euro.

We investigate market impact and market efficiency (return predictability and arbitrage opportunities) at high frequency, centering the analysis on the market order flow (signed volume). We show that markets where cryptoassets are exchanged with each other play a central role in price formation and are more efficient than markets where cryptocurrencies are exchanged with the US dollar, where herding predominates.

In more detail, analysing the order flow, we show that traders act as trend-followers at high frequency in all markets, and behave as contrarians at daily frequency (as in stock exchanges) only in markets where cryptoassets are exchanged with each other. There is some evidence that sophisticated traders operate in these markets, exploiting regularities in financial time series, whereas in markets where cryptoassets are exchanged with the US dollar traders mostly herd at high frequency and the order flow shows low persistence at lower frequencies.

Market impact refers to the effect of the order flow on the contemporaneous price change (return). The topic has been addressed by a large literature on stock and foreign exchange markets, see Berger et al. (2008), Chordia et al. (2002, 2005), Cont et al. (2013), Evans & Lyons (2002), and on cryptocurrency markets, see Donier & Bonart (2015), Lyons & Viswanath-Natraj (2019), Makarov & Schoar (2020), Silantyev (2019). In agreement with the stock market literature, we observe a positive relation between the order flow and the contemporaneous asset return in all markets. However, the market impact of the order flow is negligible in markets where cryptocurrencies are exchanged with the US dollar and high in markets where cryptocurrencies are exchanged with Tether. Moreover, in the latter markets there is some evidence that sophisticated traders operate with an inventory target, whereas the order flow does not seem to contain relevant information in markets where cryptoassets are exchanged with the US dollar.

We investigate market efficiency along two directions: return predictability and arbitrage opportunities.

Market efficiency of cryptoassets has been investigated in many papers through statistical tests, see, e.g., Bariviera (2017); Brauneis and Mestel (2018); Nadarajah and Chu (2017); Sensoy (2019); Tiwari et al. (2018); Urquhart (2016). In our analysis we take a different perspective. Given that the order flow affects contemporaneous price movements (returns), we follow Chordia et al. (2002) and Chordia et al. (2005) and test the random walk hypothesis (past returns do not predict future returns) at different frequencies, adding the lagged order flow as an explanatory variable. In the stock market, these authors find that the order flow affects future market returns over a short horizon, after which sophisticated traders react to order imbalances within the trading day by undertaking countervailing trades that exploit the serial correlation of the order flow. As a result, the order flow affects market returns over short horizons (5 min–1 h) but not over the day. We find evidence that markets where cryptocurrencies are exchanged with the US dollar are strongly inefficient, while the other markets are more efficient.

We build on the fact that cryptoassets are traded on several exchanges to investigate whether arbitrage opportunities arise by trading the same pair of cryptoassets on different exchanges or by trading different pairs of cryptoassets.Footnote 1 The issue has already been investigated by Makarov and Schoar (2020), who analyze the existence of arbitrage opportunities in the Bitcoin–US dollar market. They show that there are large deviations in Bitcoin prices across exchanges that often persist for a long time; however, arbitrage opportunities are limited unless different currencies are involved. We show that, in the very short term, trading activity does not close an arbitrage opportunity: at high frequency there is a continuation (amplification) effect. Then traders discover the arbitrage opportunities and the market moves in the direction of closing them. Crypto-markets are homogeneous in this respect. However, arbitrage opportunities are closed more quickly in markets where Tether is involved than in markets where cryptocurrencies are exchanged with the US dollar.

These results suggest that markets where cryptoassets are exchanged with each other play a central role in price formation. In these markets there are sophisticated traders who ease the aggregation of opinions and technology shocks. Herding, instead, predominates in markets where cryptoassets are exchanged with the US dollar. The result is confirmed by the observation that the order flow in the latter markets does not seem to contain relevant information and that these markets are strongly inefficient, whereas the other markets are more efficient.

The paper is organized as follows. In Sect. 2 we describe the dataset. In Sect. 3 we provide a statistical analysis of the order flow time series. In Sect. 4 we investigate the market impact of the order flow. In Sect. 5 we investigate the efficiency of crypto-markets. In Sect. 6 we analyze the profitability of arbitrage strategies. In Appendix 1 we list the exchanges considered for each pair of cryptoassets, describe in detail how arbitrage opportunities are constructed, and provide a table with the autocorrelations of the time series.

2 The dataset

We start by defining the main quantities considered in this work. Following the European Central Bank’s definition,Footnote 2 a cryptoasset is an asset recorded in digital form and enabled by the use of cryptography that is not and does not represent a financial claim on, or a liability of, any identifiable entity. We define a cryptocurrency as a cryptoasset that is native to a blockchain. For example, Ether is a cryptocurrency since it is the native digital asset of Ethereum (fees to use Ethereum must be paid in Ether), while Tether, a stablecoin, is a cryptoasset but not a cryptocurrency, since it is a digital token exchanged on several blockchains, such as Ethereum or Tron, none of which uses Tether as its native digital asset. We also define a crypto-market as a market where a pair of cryptoassets, or a cryptoasset and a currency, are traded.

We focus our analysis on the period April 1, 2019–October 31, 2020; see Fig. 1 for the US dollar price of Bitcoin over this period. The main reason for choosing this period is that it immediately precedes the huge surge of 2021: since crypto-markets were relatively stable, we can investigate their functioning, concentrating on market impact and efficiency and abstracting from the speculative and herding phenomena that are likely to characterize crypto-markets afterwards.

We refer to a pair (market) by the currency and asset involved; e.g., BTC-USD stands for the Bitcoin–US dollar pair. The BTC-USD price is the amount of USD needed to buy or sell one BTC. Each pair is traded on several exchanges. We consider markets involving fiat currencies, cryptocurrencies, and stablecoins. The only fiat currency we consider is the US dollar (USD). We consider the two most relevant cryptocurrencies by market capitalization, Bitcoin (BTC) and Ether (ETH), and restrict our attention to the stablecoin with the largest market capitalization, Tether (USDT), which is pegged to the US dollar and collateralized by the US dollar itself (a fiat-backed stablecoin). The choice of these cryptoassets is motivated by their relevance. Bitcoin and Ether have been the two cryptocurrencies with the largest capitalization in US dollars from February 2016 (a few months after the launch of Ethereum) to the time of writing. Trading volumes of markets where Tether is exchanged against major cryptocurrencies have grown steadily, and Tether has been the cryptoasset with the largest trading volume since the second quarter of 2019. Its market capitalization also increased significantly: during the period covered by our analysis, it went from 2 billion to 16 billion US dollars. We restrict attention to the US dollar because markets where cryptoassets are traded against it are the most liquid and therefore the most significant for our analysis. Moreover, considering several fiat currencies would require accounting for the exchange rates between them, which would complicate the picture significantly; for an analysis of arbitrage opportunities involving different currencies we refer to Makarov and Schoar (2020).

Fig. 1 Bitcoin value in USD over the period April 1, 2019–October 31, 2020

We consider both markets where cryptoassets are exchanged with each other and markets where cryptoassets are exchanged with the US dollar. We end up with three sets of pairs: BTC and ETH against USDT (two pairs), ETH against BTC (one pair), and BTC, ETH, and USDT against USD (three pairs); see Fig. 2. The six pairs are traded on twenty-one exchanges; see Appendix 1 for the list of exchanges for each pair. Exchanges have been selected so as to cover at least 70% of each market according to coinmarketcap.com trading volume data as of September 2020.Footnote 3 In Sects. 3–5 we aggregate information from all the exchanges of each market, thus treating all the exchanges as a single market. In Sect. 6 we work with data at the exchange level.

Fig. 2 Pairs considered in the analysis

Unlike other papers on cryptocurrencies, we deal with both cryptocurrency and stablecoin markets. The reason is that stablecoins are becoming increasingly important and play a relevant role in trading cryptoassets; see also the discussion in Barucci et al. (2022). To better understand the nexus among fiat currency, cryptocurrencies, and stablecoins, consider the following example. A person holding BTC in a wallet sells them against USD on Exchange A and uses the USD to acquire Ether (ETH) on Exchange B. To carry out these trades, the steps could be as follows: the person

  1. sends BTC to Exchange A and sells them against USD, then asks Exchange A to transfer the acquired USD to a bank account;
  2. transfers the USD to Exchange B (via bank transfer or credit card);
  3. acquires ETH against USD and asks Exchange B to move the ETH to a wallet.

The bank transfer from Exchange A to the person can take a significant amount of time and usually involves high fees. An alternative would be to keep a certain amount of USD deposited on different exchanges for trading, but this approach is inefficient, as it requires a significant amount of USD to be tied up in the exchanges.

A shortcut is provided by stablecoins, tokens that have been introduced to combine the benefits of cryptocurrencies with price stability. Using stablecoins, the transactions in the above example can be carried out as follows: the person

  1. sends BTC to Exchange A and sells them against a stablecoin, then asks Exchange A to move the stablecoins to a wallet;
  2. transfers the stablecoins from the wallet to Exchange B;
  3. acquires ETH against the stablecoins and asks Exchange B to move the ETH to the wallet.

In this case no bank transfer is necessary.

This example highlights how stablecoins facilitate transactions in cryptoasset markets without involving USD, i.e., when trades only occur within the cryptoasset domain. Because of these features, we claim that markets involving stablecoins play a relevant role for sophisticated traders who want to hold cryptocurrencies for technology or liquidity reasons.

The dataset consists of tick-by-tick trading information obtained from Kaiko.Footnote 4 We emphasize that our dataset records the trading activity registered on the different exchanges, with synchronous trade and price observations. The dataset captures actual trading activity on exchanges and does not look at blockchain activity. On the importance of choosing the right data provider and of using tick-by-tick data in cryptocurrency markets, we refer to Alexander and Dakos (2020); Manahov (2021); Vidal-Tomás (2021).

For each transaction, the trade information includes the following items: Exchange, Currency/asset pair, Date (timestamp in milliseconds), Price of the transaction in the reference currency, Amount (quantity of the asset), and Sell (True or False, referring to the trade direction; a trade marked as ‘true’ means that a price taker placed a market sell order).

We deal with outliers by applying a variation of the methodology proposed in Brownlees and Gallo (2006). For each observation of the raw high-frequency time series, we consider the 60 s interval centered on that observation and exclude it if its price is more than three standard deviations away from the mean price of the interval. The observation is discarded for all time series. For all pairs except USDT-USD, less than 0.01% of the original sample was discarded; for USDT-USD the fraction was around 5%.
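The 60 s filter described above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the authors' code: the function name `filter_outliers` is hypothetical, timestamps are assumed to be sorted and expressed in milliseconds, the window includes the observation being tested (as the text literally states, whereas Brownlees and Gallo's original rule works on a trimmed neighbourhood), and the sample standard deviation is used.

```python
import bisect
import statistics


def filter_outliers(timestamps_ms, prices, half_window_ms=30_000, k=3.0):
    """Return a list of booleans: True if the trade is kept.

    Hypothetical helper. Assumes timestamps_ms is sorted in ascending order.
    Each trade is compared with the mean and sample standard deviation of
    all prices within +/- half_window_ms (a 60 s centered window by default).
    """
    keep = []
    for t, p in zip(timestamps_ms, prices):
        # Indices of trades falling in the centered window [t - 30 s, t + 30 s].
        lo = bisect.bisect_left(timestamps_ms, t - half_window_ms)
        hi = bisect.bisect_right(timestamps_ms, t + half_window_ms)
        window = prices[lo:hi]
        if len(window) < 2:
            keep.append(True)  # not enough neighbours to estimate a spread
            continue
        mu = statistics.fmean(window)
        sd = statistics.stdev(window)
        # Keep the trade unless it lies more than k standard deviations away.
        keep.append(sd == 0 or abs(p - mu) <= k * sd)
    return keep
```

A design caveat: because the tested point is part of its own window, its deviation from the window mean can be at most (n-1)/sqrt(n) sample standard deviations (Samuelson's inequality), so with a 3 standard deviation threshold a window needs at least 11 trades before any observation can be flagged.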

Prices are sampled at 1 s frequency: for each second, the price is the average of the prices of the trades executed during that second, weighted by the volume of the corresponding trades. From the prices sampled at 1 s frequency we compute one-minute log-returns. For each minute t with at least one executed trade, we identify the first second within that minute with a transaction. We denote the price in that second by \(p_t\), and the log-return \(r_t\) for minute t is computed as \(r_t= \log \left( \frac{p_{t+1}}{p_t}\right)\). If no trade is executed during minute t, then \(r_t=0\).
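The two sampling steps can be illustrated as follows: a volume-weighted price for each second, then the price of the first traded second of each minute, then the one-minute log-return. This is a hedged sketch; the function names are hypothetical, and setting \(r_t=0\) when minute t+1 has no trade is our own assumption, since the text only specifies the case in which minute t has no trade.

```python
import math
from collections import defaultdict


def vwap_per_second(trades):
    """Volume-weighted average price for each second with at least one trade.

    trades: iterable of (timestamp_ms, price, amount) tuples.
    Returns {epoch_second: vwap}.
    """
    notional = defaultdict(float)
    volume = defaultdict(float)
    for ts_ms, price, amount in trades:
        sec = ts_ms // 1000
        notional[sec] += price * amount
        volume[sec] += amount
    return {s: notional[s] / volume[s] for s in notional if volume[s] > 0}


def minute_log_returns(vwap, minutes):
    """One-minute log-returns r_t = log(p_{t+1} / p_t).

    p_t is the 1 s price of the first second with a trade in minute t.
    Assumption (not from the source): r_t = 0 if minute t or t+1 has no trade.
    """
    first_price = {}
    for sec in sorted(vwap):
        # setdefault keeps the earliest traded second of each minute.
        first_price.setdefault(sec // 60, vwap[sec])
    returns = {}
    for m in minutes:
        if m in first_price and m + 1 in first_price:
            returns[m] = math.log(first_price[m + 1] / first_price[m])
        else:
            returns[m] = 0.0
    return returns
```

For example, two trades of one unit at 100 and 102 in second 0 give a 1 s price of 101, which becomes \(p_0\) for minute 0.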

In Table 1 we report some basic statistics on returns. Notice that the average log-return is positive when cryptoassets are exchanged with other cryptoassets and negative when they are exchanged with the USD. Skewness is limited with the exception of USDT-USD and BTC-USD markets. In the latter market we observe a negative value showing the relevance of abrupt negative returns. Kurtosis is high in all the markets. Notice that the standard deviation of returns in markets involving USD is much higher than in the three markets involving only cryptoassets.

Table 1 Statistics for 1 min log-returns

3 Order flow

Our analysis is centered on the market Order Flow (OF), or market imbalance. Chordia et al. (2005) consider three specifications of OF: the number of buyer-initiated trades minus the number of seller-initiated trades; the number of shares purchased in buyer-initiated trades minus the number of shares sold in seller-initiated trades; and the dollars paid by buyer-initiators minus the dollars received by seller-initiators. In what follows, we measure OF as signed volume (the second specification), as in other papers on cryptocurrencies, see Silantyev (2019), Lyons & Viswanath-Natraj (2019); the third specification is used in Makarov and Schoar (2020).Footnote 5

The order flow is computed at one-minute frequency and is defined as the buyer-initiated volume minus the seller-initiated volume:

$$\begin{aligned} OF = \sum _{i=1}^n V_i \cdot S_i, \end{aligned}$$

where \(V_i\) is the volume of the i-th trade, \(S_i\) denotes the market side initiating the trade (1 for the buyer, \(-1\) for the seller), and n is the number of trades in the minute. Some basic statistics of OF are reported in Table 2.
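A minimal sketch of the per-minute order flow, computed from the trade fields listed in Sect. 2 by mapping the Sell flag to the sign \(S_i\). The function name and the `kind` argument are hypothetical: `"volume"` is the signed-volume OF used in this paper, while `"count"` and `"notional"` reproduce, for comparison only, the other two Chordia et al. (2005) specifications (with the quote currency playing the role of dollars).

```python
from collections import defaultdict


def order_flow(trades, kind="volume"):
    """Per-minute order flow from tick data.

    trades: iterable of (timestamp_ms, price, amount, sell) tuples, where
    sell=True marks a seller-initiated trade (a market sell order).
    kind: "volume" (signed volume), "count" (signed number of trades) or
    "notional" (signed price * amount, in the quote currency).
    Minutes without trades are absent from the result, i.e., OF = 0.
    """
    of = defaultdict(float)
    for ts_ms, price, amount, sell in trades:
        side = -1.0 if sell else 1.0  # S_i: +1 buyer-initiated, -1 seller-initiated
        if kind == "volume":
            size = amount
        elif kind == "count":
            size = 1.0
        elif kind == "notional":
            size = price * amount
        else:
            raise ValueError(f"unknown kind: {kind}")
        of[ts_ms // 60_000] += side * size
    return dict(of)
```

For instance, a buy of 2 units followed by a sell of 0.5 units in the same minute gives OF = 1.5 for that minute.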

To investigate how OF affects market activity, in Table 3 we first report the results of the Ordinary Least Squares (OLS) regressions of \(OF_t\) on lagged order flow (\(OF_{t-1}\)), on lagged return (\(r_{t-1}\)), and on both lagged variables (\(OF_{t-1}, \ r_{t-1}\)), that is

$$\begin{aligned} OF_t & = \beta _0+\beta _1 OF_{t-1}+\epsilon _t, \end{aligned}$$
(1)
$$\begin{aligned} OF_t & = \beta _0+\beta _2 r_{t-1}+\epsilon _t, \end{aligned}$$
(2)
$$\begin{aligned} OF_t & = \beta _0+\beta _1 OF_{t-1}+\beta _2 r_{t-1}+\epsilon _t, \end{aligned}$$
(3)

where \(\epsilon_t\) is the error term.Footnote 6 We run the regressions at the 1, 5, and 10 min, 1 h, and daily frequencies. In Table 11 of Appendix 1 we also report autocorrelations at different frequencies (5 min, 1 h, and 1 day; for the sake of brevity we omit the 1 and 10 min frequencies).
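How the lower-frequency series are built is not spelled out above; one natural assumption is to sum non-overlapping blocks of one-minute values, which is exact for log-returns (they add over time) and for signed volume. The plain-Python sketch below, with hypothetical function names, aggregates the series and fits regression (3) by solving the OLS normal equations. It returns point estimates only; in practice a statistics package would be used, with whatever standard-error correction the analysis adopts.

```python
def aggregate(series, k):
    """Sum non-overlapping blocks of k one-minute values (assumed method).

    A trailing partial block is dropped.
    """
    n_blocks = len(series) // k
    return [sum(series[i * k:(i + 1) * k]) for i in range(n_blocks)]


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def ols(y, columns):
    """OLS of y on an intercept plus the regressor columns (normal equations)."""
    X = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    p = len(X[0])
    xtx = [[sum(row[a] * row[b] for row in X) for b in range(p)] for a in range(p)]
    xty = [sum(row[a] * yi for row, yi in zip(X, y)) for a in range(p)]
    return solve(xtx, xty)


def regression_3(of, r):
    """Regression (3): OF_t on OF_{t-1} and r_{t-1}; returns [beta0, beta1, beta2]."""
    return ols(of[1:], [of[:-1], r[:-1]])
```

Regressions (1) and (2) follow by passing only one of the two lagged columns to `ols`, and a 5 min series is obtained with `aggregate(series, 5)` before fitting.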

Table 2 Statistics for OF at 1 min frequency
Table 3 OLS regression of \(OF_t\) on lagged return (\(r_{t-1}\)), lagged order flow (\(OF_{t-1}\)), and both lagged variables (\(r_{t-1}, \ OF_{t-1})\) at 1, 5 and 10 min, hour, daily frequency

Let us consider the first column for each frequency in Table 3, i.e., regression (1), as well as Table 11 on OF. The autoregressive coefficient of OF is positive in all markets. Its pattern with respect to the sampling frequency differs across markets: as the frequency decreases, the coefficient decreases for the BTC-USD and ETH-USD markets, increases for the BTC-USDT, ETH-USDT, and ETH-BTC markets, and is almost constant for USDT-USD. At the daily frequency it is small and only weakly significant or not significant for BTC-USD and ETH-USD. The coefficient is high, positive, and statistically significant at every frequency for the BTC-USDT, ETH-USDT, ETH-BTC, and USDT-USD markets. As the frequency decreases, the explanatory power of the regression (\(R^2\)) decreases and becomes negligible for the first set of markets (BTC-USD and ETH-USD), and increases for the latter. Results are confirmed when controlling for lagged return in regression (3) (third column).

The relation between OF and lagged return, i.e., regressions (2)–(3), depends on the frequency and on the market. A negative coefficient would suggest that traders are contrarians, for either liquidity or information/arbitrage reasons, whereas a positive coefficient would suggest that traders are trend-followers, either because of the spread of information or because they herd. We concentrate on regression (3) (third column for each frequency), which yields a higher \(R^2\) than the regression involving only \(r_{t-1}\). In the USDT-USD market the coefficient of lagged return is not statistically significant at any frequency. This is likely due to the features of USDT: being pegged to one USD, USDT has a constant fundamental value, with no dissemination of new information about it, and therefore there is no economic rationale for traders to act as contrarians or trend-followers in response to past returns. On trading motivated by arbitrage (with respect to the conversion value) see Sect. 6. In the BTC-USD and ETH-USD markets we observe a positive coefficient for \(r_{t-1}\) up to the 10 min frequency; at lower frequencies the coefficient is not statistically significant and the explanatory power of the regression becomes negligible.

Considering markets where cryptoassets are exchanged with each other, we observe a positive coefficient for \(r_{t-1}\) at high frequency, up to 1 h for BTC-USDT and ETH-USDT and at 1 min for ETH-BTC; at lower frequencies the coefficient becomes negative. Coefficients are statistically significant with only a few exceptions.

These results differ from those obtained for stock exchanges. Chordia et al. (2002) showed that the order flow in stock exchanges is highly persistent at daily frequency and that investors in aggregate are contrarians: they buy after market declines and sell after market rises. Chordia and Subrahmanyam (2004) provided a theoretical model replicating these regularities, with informed traders and discretionary liquidity traders (who can split their orders over different periods). In a cryptoasset market we cannot assume the presence of informed traders, since it is difficult to define its fundamental value: no cash flow is associated with a cryptocurrency (just as for fiat money). The exception is the USDT-USD market: since USDT is pegged to one US dollar, its fundamental value is well defined and does not vary over time. In the other markets we may only assume that there are insiders and outsiders with different opinions on the technology.

The analysis of crypto-markets provides different results. The BTC-USD and ETH-USD markets are characterized by very short-lived effects. The order flow has a strong positive autoregressive component over a short time window (up to 1 h in our analysis), coupled with a positive effect of past returns; over a longer window (1 day), neither the autoregressive component of the order flow nor past returns are statistically significant. These results indicate that these markets are characterized by herding at high frequency, confirming the analysis in Ballis and Drakos (2020); Bouri et al. (2019); King and Koutmos (2021); Manahov (2021), with no evidence of the countervailing contrarian forces over the day observed in stock markets. The BTC-USDT, ETH-USDT, and ETH-BTC markets look different: the serial correlation of OF is positive, statistically significant, and increases as the frequency decreases, and the relation with past returns shows that traders act as trend-followers at high frequency and as contrarians at low frequency.

We interpret this evidence as showing that sophisticated traders seeking to exploit regularities in financial time series are present only in markets where cryptoassets are exchanged with each other, and not in markets where cryptoassets are exchanged with the USD. In the latter markets, traders mostly herd at high frequency, and the order flow shows low persistence at lower frequencies.

These results confirm the analysis of Barucci et al. (2022), showing that markets where a cryptoasset is exchanged against the US dollar play a less significant role than markets where cryptoassets are exchanged with each other. The latter markets seem to be where prices are formed, aggregating preference/technology shocks and heterogeneous opinions. In particular, the BTC-USDT market is a privileged locus for price aggregation, and not only for the manipulation of BTC shown in Griffin and Shams (2020).

4 Market impact

We investigate price pressure in crypto-markets, i.e., the effect of the order flow on market return. The literature on stock exchanges has shown that the order flow affects the contemporaneous market return, see Chordia et al. (2002, 2005), Cont et al. (2013). In Table 4 we provide results on a regression of the log-return at time t (\(r_t\)) on \(OF_t\), that is

$$\begin{aligned} r_t & = \beta _0+\beta _1 OF_{t}+\epsilon _t, \end{aligned}$$

at the 1, 5, and 10 min, 1 h, and 1 day frequencies. Results are also shown in Fig. 3, where we plot the log-return against OF together with the fitted regression line for two representative markets (BTC-USD and BTC-USDT) at the 1, 5, and 10 min, 1 h, and 1 day frequencies; see Appendix 1 for the other markets.
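The contemporaneous market-impact regression has a single regressor, so its slope and \(R^2\) have closed forms. A minimal sketch, assuming aligned lists of log-returns and order flow at the chosen frequency; the function name is hypothetical and no standard errors are computed.

```python
import statistics


def simple_ols(x, y):
    """Fit y = b0 + b1 * x by OLS; return (b0, b1, r_squared).

    Here x would be OF_t and y the log-return r_t at the same frequency.
    """
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b1 = sxy / sxx          # market-impact coefficient beta_1
    b0 = my - b1 * mx       # intercept beta_0
    ss_res = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return b0, b1, 1.0 - ss_res / ss_tot
```

With one regressor, \(R^2\) equals the squared correlation between \(r_t\) and \(OF_t\), so many observations with a small OF and a large absolute return (the pattern described below for BTC-USD) push \(R^2\) towards zero even when the slope is positive.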

Fig. 3 Scatter plot of BTC-USD (left) and BTC-USDT (right): 1 min, 5 min, 10 min, 1 h, 1 day. x-axis: OF, y-axis: log-return. The straight line obtained from the linear regression is reported

Table 4 OLS regression of \(r_t\) on \(OF_t; \ OF_t, OF_{t-1}; \ OF_t, OF^3_{t}\) at 1, 5 and 10 min, hour, daily frequency

We observe a positive, statistically significant effect in all markets, with the exception of USDT-USD (all frequencies) and of ETH-BTC, BTC-USD, and ETH-USD at the daily frequency (positive but not significant).Footnote 7 The results are confirmed by the coefficient of \(OF_t\) in the regressions reported in Table 4 that include \(OF_{t-1}\), that is

$$\begin{aligned} r_t & = \beta _0+\beta _1 OF_{t}+\beta _2 OF_{t-1}+\epsilon _t, \end{aligned}$$
(4)

with the exception of the regression for ETH-BTC at the daily frequency, where the coefficient of \(OF_t\) is positive and statistically significant.

The rationale for the non-significant effect in the USDT-USD market can be traced back to the features of USDT: since Tether is a stablecoin pegged to the US dollar, traders do not attach any informative value to the order flow, interpreting it as pure liquidity. In the BTC-USD and ETH-USD markets we observe a positive, statistically significant coefficient up to 1 h, while at the 1 day frequency the coefficient is not statistically significant. However, the explanatory power of the regression is negligible at all frequencies.Footnote 8 In particular, at the daily frequency we have a very low \(R^2\) and a non-significant relationship. Results are different for the BTC-USDT and ETH-USDT markets, where the regressions show high explanatory power.Footnote 9

The literature on the market impact of OF has investigated the linearity of the relation, see Cont et al. (2013), Silantyev (2019): a large OF should significantly move the price. Fig. 3 provides a visual representation of the regression results. The slope for BTC-USD is lower than for BTC-USDT, and at the daily frequency the relationship for the former market is nearly flat. Moreover, for BTC-USD there are many observations with a small OF and a large (in absolute value) log-return, a phenomenon not observed in the BTC-USDT market. We conclude that in markets where cryptocurrencies are exchanged with the US dollar prices also move without a significant OF, whereas in markets where cryptocurrencies are exchanged with Tether price movements are associated with a large OF. As a robustness check, we ran the regression after eliminating observations with a large return and a small OF; results are similar to those presented above, with only a negligible increase in the explanatory power of OF for returns in the BTC-USD and ETH-USD markets.

Confirming the analysis in Silantyev (2019), the figures show no clear evidence of nonlinearity in the relationship between OF and return. To investigate the point formally, we add a cubic term in OF to the regressions (Table 4, third column)

$$\begin{aligned} r_t & = \beta _0+\beta _1 OF_{t}+\beta _3 OF_{t}^3+\epsilon _t. \end{aligned}$$

It turns out that the cubic term enters with a small, positive, and statistically significant coefficient at low frequencies for BTC-USDT and ETH-USDT (with an increase in explanatory power, especially at the daily frequency). In these markets a large OF therefore significantly affects price movements: the size of the OF amplifies the price variation.

Going back to regression (4) (Table 4, second column), we observe a negative coefficient for lagged OF, as in Chordia et al. (2002), in all markets except USDT-USD. The coefficient is statistically significant for BTC-USD and ETH-USD at high frequency, but the explanatory power is almost negligible. Lagged OF is statistically significant at all frequencies for the BTC-USDT, ETH-USDT, and ETH-BTC markets. As suggested in Chordia et al. (2002) for stock exchanges (see also Chordia and Subrahmanyam (2004) for a model), this result is consistent with the inventory stabilization hypothesis: sophisticated traders (insiders) have an inventory target for BTC, ETH, and USDT, so the lagged imbalance is reversed and exerts a negative effect on the contemporaneous return. Interestingly, the phenomenon is observed only in markets where cryptoassets are exchanged with each other, and not in markets where BTC and ETH are exchanged with the US dollar. It seems that sophisticated traders with an inventory target mostly trade in the former set of markets, whereas in markets where cryptocurrencies are exchanged with the US dollar there are outsiders who trade for other reasons.

These results confirm the heterogeneity of crypto-markets, which split into markets where an asset is exchanged with the US dollar and markets where cryptoassets are exchanged with each other. In the latter set of markets we observe a strong impact of the order flow on market returns, with a nonlinear effect; moreover, in these markets the dynamics of the order flow is consistent with the hypothesis that sophisticated traders operate with an inventory target. In markets where cryptoassets are exchanged with the US dollar, the order flow does not seem to contain relevant information.

We may interpret these results as showing that in markets where only cryptoassets are involved (in particular, a stablecoin), traders interpret the order flow as conveying market sentiment, opinions, or technology shocks, and therefore in these markets the price moves in the direction of the order flow. This evidence corroborates the claim that the BTC-USDT and ETH-USDT markets play a predominant role in aggregating preference/technology shocks and heterogeneous opinions, while markets where cryptoassets are exchanged against the US dollar play a limited role in price discovery, being populated by outsiders who mostly follow the flock.

5 Market efficiency

In this section we deal with market efficiency, investigating the relation between the log-return and both the lagged log-return and the lagged OF, that is

$$\begin{aligned} r_t & = \beta _0+\beta _1 r_{t-1}+\epsilon _t, \end{aligned}$$
(5)
$$\begin{aligned} r_t & = \beta _0+\beta _2 OF_{t-1}+\epsilon _t,\nonumber \\ r_t & = \beta _0+\beta _1 r_{t-1}+\beta _2 OF_{t-1}+\epsilon _t. \end{aligned}$$
(6)
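To make "statistically significant" concrete for regression (5), the following sketch estimates the first-order autoregressive slope of returns and its t-statistic. It is an illustration under a stated assumption: it uses classical homoskedastic OLS standard errors, whereas high-frequency returns are typically heteroskedastic and a heteroskedasticity- and autocorrelation-consistent estimator (e.g., Newey-West) is the usual choice; the estimator behind Table 5 may differ. The function name is hypothetical.

```python
import math
import statistics


def ar1_test(r):
    """Regression (5): r_t = b0 + b1 * r_{t-1} + e_t.

    Returns (b1, t_stat) with homoskedastic OLS standard errors (assumption).
    A negative b1 indicates mean reversion, a positive b1 a trend component.
    """
    x, y = r[:-1], r[1:]
    n = len(y)
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    b1 = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    b0 = my - b1 * mx
    ssr = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
    if ssr == 0:
        # Perfect fit: the t-statistic is unbounded.
        return b1, math.copysign(math.inf, b1)
    se = math.sqrt(ssr / (n - 2) / sxx)
    return b1, b1 / se
```

Under efficiency (the random walk hypothesis) b1 should be indistinguishable from zero; the lagged OF in regression (6) enters as a second regressor in the same way.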

In Table 5 we report the results of the regressions. The evidence is not in favor of market efficiency: the coefficient of the lagged log-return is statistically significant in almost all markets at every frequency.

This result is at odds with the evidence for stock markets, where the lagged log-return is not statistically significant at any frequency and the lagged OF affects market returns intraday over a short horizon, after which sophisticated traders react to order imbalances within the trading day by undertaking countervailing trades that exploit the serial correlation of OF, see Chordia et al. (2002, 2005).

Table 5 OLS regression of \(r_t\) on \(r_{t-1}; \ r_{t-1}, OF_{t-1}\) at 1, 5 min, hour, daily frequency
Fig. 4 Scatter plot of BTC-USD (left) and BTC-USDT (right): 1 min, 5 min, 10 min, 1 h, 1 day. x-axis: lagged log-return, y-axis: log-return. The straight line obtained from the linear regression is reported

We concentrate on all markets except USDT-USD, since it is not really interesting to investigate efficiency in a market where the asset is pegged to 1 US dollar. The markets differ along three dimensions: the sign of the coefficient of past return, the statistical significance of the regression, and the statistical significance of the coefficient of \(OF_{t-1}\). BTC-USD and ETH-USD are characterized by a negative coefficient for past return at the 1 min frequency and by a positive coefficient at lower frequencies. BTC-USDT, ETH-USDT, and ETH-BTC are characterized by a positive (or non-significant) coefficient at the 1 min frequency and mostly by a negative coefficient for past return at lower frequencies. To provide a visual representation of the difference, in Fig. 4 we plot the log-return against the lagged log-return for BTC-USD and BTC-USDT, along with the line corresponding to regression (5); for the other pairs, we refer to Figs. 8, 9 in Appendix 1. This evidence suggests that the first two markets are inefficient, with trend-following traders predominating during the day, whereas the second set of markets is characterized by contrarian traders. This result corroborates the evidence on OF in Sect. 3. The explanatory power of the regressions is negligible at high frequency but becomes very high at the daily frequency in the BTC-USD and ETH-USD markets, whereas in the BTC-USDT, ETH-USDT, and ETH-BTC markets it is negligible and increases only slightly at the daily frequency. As far as lagged OF is concerned, regression (6) shows that its coefficient is statistically significant for BTC-USD and ETH-USD at every frequency (with only one exception), whereas in the BTC-USDT, ETH-USDT, and ETH-BTC markets it is statistically significant only at high frequency. In both cases, its contribution to the explanatory power is limited.

We should be cautious in interpreting these results, as the explanatory power is very low, the only exception being BTC-USD and ETH-USD at the daily frequency. The two sets of markets appear to be inefficient, but in different ways. In markets where cryptocurrencies are exchanged with the US dollar, predictability comes from a trend component in returns, which is likely associated with herding among traders, see Ballis and Drakos (2020), Bouri et al. (2019), King & Koutmos (2021), Manahov (2021); the lagged OF also has predictive power for future returns. In markets where cryptoassets are exchanged with each other, instead, there is evidence of mean reversion and predictability is limited, which may be linked to liquidity effects rather than to traders exploiting predictability. At the daily frequency, however, predictability in markets where cryptocurrencies are exchanged with the US dollar is high, and the evidence points strongly to their inefficiency.

6 Arbitrage profits

The emergence of arbitrage opportunities in bitcoin markets has been investigated by Makarov and Schoar (2020) across different exchanges and different currencies, i.e., an arbitrage is obtained by buying/selling BTC on different exchanges for different currencies. They find that arbitrage opportunities within each market (exchanges for a specific currency) are limited, but significant arbitrage opportunities arise from trading across currencies. In what follows, we apply their methodology, as described in Appendix 1. We define an arbitrage strategy as a pair of trades that buy and sell the cryptoasset in the same market contemporaneously (same second as time stamp), with no inventory risk. For example, an arbitrage strategy in the BTC-USD market is built by buying BTC (with USD) on one exchange and immediately reselling it on another exchange, obtaining a positive net amount of USD. For this strategy to work, the price of the first transaction must be lower than that of the second, yielding an arbitrage spread (\(s_\textsc {arbitrage}\), see Eq. (7) in Appendix 1). The profit of the arbitrage strategy (\(p_\textsc {arbitrage}\), see Eq. (8)) is obtained by multiplying the spread by the minimum of the quantities available for trade on the bid and ask sides.
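Eqs. (7) and (8) are given in Appendix 1 and are not reproduced here. The sketch below is one plausible reading of the description above, under stated assumptions: the spread is the absolute price difference between the best bid on the selling exchange and the best ask on the buying exchange within the same second, and, when several exchange pairs qualify, the one with the largest spread is kept. The function `best_arbitrage` and the quote layout are hypothetical.

```python
from itertools import permutations

def best_arbitrage(quotes):
    """Best cross-exchange arbitrage for one pair within one second.

    quotes maps exchange -> (bid, bid_qty, ask, ask_qty).
    Assumption: buy at the ask on one exchange, sell at the bid on another,
    and keep the exchange pair with the largest spread.
    Returns (spread, profit, buy_exchange, sell_exchange); spread and profit
    are 0.0 when no opportunity exists.
    """
    best = (0.0, 0.0, None, None)
    for buy_ex, sell_ex in permutations(quotes, 2):
        _, _, ask, ask_qty = quotes[buy_ex]
        bid, bid_qty, _, _ = quotes[sell_ex]
        spread = bid - ask                       # positive only if prices cross
        if spread > best[0]:
            profit = spread * min(bid_qty, ask_qty)  # spread x tradable quantity
            best = (spread, profit, buy_ex, sell_ex)
    return best

# Buy 0.5 BTC at 101 on exchange "A" and resell it at 102 on exchange "B".
quotes = {"A": (100.0, 1.0, 101.0, 0.5), "B": (102.0, 2.0, 103.0, 1.0)}
print(best_arbitrage(quotes))  # (1.0, 0.5, 'A', 'B')
```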

Table 6 Arbitrage summary statistics trading in a single pair

Table 6 provides summary statistics for the arbitrage strategy. Confirming the analysis in Makarov and Schoar (2020), the money value of arbitrage opportunities (arbitrage) is rather limited. BTC-USDT, ETH-USDT and BTC-USD rank highest in terms of arbitrage profits, while USDT-USD yields the smallest amount. We decompose arbitrage profits into the average arbitrage size (spread) and the fraction of seconds in the sample with an arbitrage opportunity (opp_perc): the first measure captures the size of the arbitrage when it materializes, the second its frequency. The average arbitrage size is high in markets where BTC, ETH and USDT are exchanged with USD, but these markets have few seconds with arbitrage opportunities. Conversely, arbitrage opportunities are frequent in markets where BTC and ETH are exchanged with USDT, but there the average arbitrage spread is limited. Notice that there is only a weak connection among the size of the arbitrage opportunity, its frequency and the number of exchanges on which the pair is traded.
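The decomposition into size and frequency can be illustrated with a short sketch. It assumes that "arbitrage" is the total profit over the sample and that "spread" is averaged only over the seconds in which an opportunity exists; the exact definitions behind Table 6 may differ, and the function `decompose` and its inputs are hypothetical.

```python
def decompose(spreads, profits):
    """Split per-second arbitrage series into size and frequency.

    spreads, profits: per-second arbitrage spread and profit (0 when none).
    Assumptions: 'arbitrage' is total profit, 'spread' is the mean spread over
    seconds with an opportunity, 'opp_perc' is the share of such seconds.
    """
    n = len(spreads)
    hits = [s for s in spreads if s > 0]
    opp_perc = len(hits) / n if n else 0.0
    avg_spread = sum(hits) / len(hits) if hits else 0.0
    return {"arbitrage": sum(profits), "spread": avg_spread, "opp_perc": opp_perc}

# Five seconds, two of which carry an opportunity.
stats = decompose([0.0, 0.0, 2.0, 0.0, 1.0], [0.0, 0.0, 1.0, 0.0, 0.25])
print(stats)  # {'arbitrage': 1.25, 'spread': 1.5, 'opp_perc': 0.4}
```

Under these definitions, a market can have large profits either through a large average spread with few opportunities (the USD markets) or through frequent opportunities with small spreads (the USDT markets).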

Most exchanges charge a taker fee of 0.10%.Footnote 10 In the last column of Table 6 we report arbitrage profits net of transaction costs. High fees can make arbitrage opportunities unprofitable. This is the case for the pair with the highest arbitrage profits (BTC-USDT). We conclude that arbitrage strategies between USDT and cryptocurrencies are unprofitable, whereas arbitrage strategies centered on USD (BTC-USD, ETH-USD) are profitable net of transaction costs. The reason is that the latter markets are characterized by higher arbitrage spreads, which can cover fees proportional to the traded value, while the small spreads in USDT markets cannot.
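A small worked example shows why the spread size matters once fees are deducted. It assumes that the 0.10% taker fee is charged on the notional value of both legs (buy and sell); how Table 6 applies fees is specified in the paper's appendix and footnote, so this is only a sketch. Under this assumption, the selling price must exceed the buying price by a factor \((1+f)/(1-f) \approx 1.002\), i.e. the spread must be larger than roughly 0.2% of the price. The names `net_profit` and `break_even_ratio` are hypothetical.

```python
TAKER_FEE = 0.001  # 0.10% taker fee; assumed to apply to the notional of each leg

def net_profit(buy_price, sell_price, qty, fee=TAKER_FEE):
    """Arbitrage profit after paying the taker fee on both legs (assumption)."""
    gross = (sell_price - buy_price) * qty
    fees = fee * (buy_price + sell_price) * qty
    return gross - fees

def break_even_ratio(fee=TAKER_FEE):
    """Minimum sell/buy price ratio for a non-negative net profit."""
    return (1.0 + fee) / (1.0 - fee)

small = net_profit(100.0, 100.15, 1.0)  # 0.15% spread: fees exceed the gross gain
large = net_profit(100.0, 100.50, 1.0)  # 0.50% spread: profitable after fees
print(round(small, 5), round(large, 5))  # -0.05015 0.2995
```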

We have extended the analysis to arbitrage opportunities that can arise when three markets are involved, namely a cryptocurrency, Tether and the US dollar; see Fig. 5 in Appendix 1 for a graphical illustration. Table 7 provides summary statistics for the buy and sell arbitrage strategies. In this triangulation, a buy arbitrage strategy consists of buying Tether through the BTC or ETH markets and selling it for USD in the Tether market; a sell arbitrage strategy goes in the opposite direction, see Appendix 1 for details. The size of arbitrage profits is rather limited, and arbitrage profits net of fees are zero in most cases. Because of the limited size of the arbitrage, we omit further analysis.
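The triangulation can be sketched as the value of 1 USD after a round trip through the three markets, using best quotes only and ignoring available quantities. This is a simplified reading of the text, not the exact procedure of Appendix 1; the function `triangular_returns`, the quote values and the per-leg proportional fee are assumptions for illustration.

```python
def triangular_returns(btc_usd, btc_usdt, usdt_usd, fee=0.0):
    """Value of 1 USD after each triangular round trip (best quotes only).

    Each market argument is a hypothetical (bid, ask) tuple.
    Buy strategy:  USD -> BTC (BTC-USD ask) -> USDT (BTC-USDT bid) -> USD (USDT-USD bid).
    Sell strategy: USD -> USDT (USDT-USD ask) -> BTC (BTC-USDT ask) -> USD (BTC-USD bid).
    A proportional fee is deducted on each of the three legs (assumption).
    """
    keep = (1.0 - fee) ** 3
    buy = (1.0 / btc_usd[1]) * btc_usdt[0] * usdt_usd[0] * keep
    sell = (1.0 / usdt_usd[1]) * (1.0 / btc_usdt[1]) * btc_usd[0] * keep
    return buy, sell

quotes = dict(btc_usd=(29990.0, 30000.0), btc_usdt=(30100.0, 30110.0), usdt_usd=(0.999, 1.000))
gross_buy, gross_sell = triangular_returns(**quotes)           # buy leg > 1: gross arbitrage
net_buy, _ = triangular_returns(**quotes, fee=0.001)           # 0.10% per leg removes it
```

With these illustrative numbers the buy strategy returns about 0.23% before fees, but three fees of 0.10% turn it into a small loss, in line with the observation that net profits are zero in most cases.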

Table 7 Arbitrage summary statistics for the buy and sell arbitrage strategies, trading in three pairs

We investigate efficiency in crypto-markets by looking at the relationship between the size of the arbitrage spread and market activity, measured by trading volume and by the absolute value of OF. If markets are efficient, we expect market activity to close arbitrage opportunities. Since the arbitrage spread is non-negative (it is zero whenever there is no opportunity), we estimate censored regressions. For the USDT-USD market, where the arbitrage spread differs from zero in only 0.01% of the seconds (see Table 6), we replace the arbitrage spread with the price parity, i.e., the distance of the USD value of Tether from 1 USD. We control for the lagged level of the arbitrage spread, so our regressions effectively capture variations in the arbitrage spread.
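The market-activity regressors can be built by bucketing trades at the chosen frequency. The sketch assumes that OF is the signed volume (buyer-initiated volume minus seller-initiated volume) without normalization, and that the price parity is the absolute distance from the peg; the paper's exact definitions may differ. The functions `aggregate_trades` and `price_parity` and the trade layout are hypothetical.

```python
from collections import defaultdict

def aggregate_trades(trades, freq_seconds):
    """Bucket trades into intervals of freq_seconds.

    trades: iterable of (timestamp_seconds, quantity, side), with side = +1 for
    a buyer-initiated (market buy) trade and -1 for a seller-initiated one.
    Returns {bucket_start: (V, OF, |OF|)}, OF being signed volume (assumption).
    """
    vol = defaultdict(float)
    of = defaultdict(float)
    for ts, qty, side in trades:
        bucket = int(ts // freq_seconds) * freq_seconds
        vol[bucket] += qty
        of[bucket] += side * qty
    return {b: (vol[b], of[b], abs(of[b])) for b in sorted(vol)}

def price_parity(usdt_usd_price):
    """Distance of Tether's USD price from the 1 USD peg (absolute, assumed)."""
    return abs(usdt_usd_price - 1.0)

trades = [(0, 1.0, +1), (30, 2.0, -1), (61, 0.5, +1), (119, 0.25, +1)]
buckets = aggregate_trades(trades, 60)  # one-minute frequency
print(buckets)  # {0: (3.0, -1.0, 1.0), 60: (0.75, 0.75, 0.75)}
```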

Table 8 Censored regressions of the arbitrage spread on market activity at 1, 5 and 10 min, hourly and daily frequency

Few papers deal with the relation between arbitrage opportunities and market activity in stock markets.Footnote 11 In our analysis we estimate the following censored regressions:

$$\begin{aligned} s_{\textsc {arbitrage}}{\,}_t & = \beta _0+\beta _1 s_{\textsc {arbitrage}}{\,}_{t-1}+\beta _2 |OF_t| + \epsilon _t,\\ s_{\textsc {arbitrage}}{\,}_t & = \beta _0+\beta _1 s_{\textsc {arbitrage}}{\,}_{t-1}+\beta _2 V_t + \epsilon _t, \end{aligned}$$

where V denotes trading volume. Chordia et al. (2002) analyze the relationship between the absolute value of OF and variations in the bid-ask spread (liquidity). They find that bid-ask spreads are higher, and therefore arbitrage spreads lower, when orders are more imbalanced in either direction. Building on this result, we expect |OF| to negatively affect the arbitrage size: order imbalance in either direction should close arbitrage opportunities. The results reported in the left part of Table 8 show that this is not the case in crypto-markets: a significant order imbalance (on either the buy or the sell side) is associated with a larger arbitrage spread. A similar result holds for trading volume, see the right part of Table 8. The only exceptions are USDT-USD at the daily frequency, where the variables are not statistically significant, and USDT-USD at the one-minute frequency for |OF|.
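The section does not state which estimator is used for the censored regressions. A standard choice for a dependent variable left-censored at zero is the Tobit model estimated by maximum likelihood, sketched below with SciPy. This is an illustration of that technique, not necessarily the authors' estimator; the functions `tobit_negloglik` and `fit_tobit` are hypothetical, and the data are synthetic (the lagged spread is simulated as an exogenous regressor purely for simplicity).

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

def tobit_negloglik(params, y, X):
    """Negative log-likelihood of a Tobit model left-censored at zero."""
    beta, log_sigma = params[:-1], params[-1]
    sigma = np.exp(log_sigma)                  # keeps sigma positive
    xb = X @ beta
    censored = y <= 0.0
    # Censored observations: P(y* <= 0) = Phi(-x'b / sigma).
    ll_c = norm.logcdf(-xb[censored] / sigma)
    # Uncensored observations: normal density of the residual.
    z = (y[~censored] - xb[~censored]) / sigma
    ll_u = norm.logpdf(z) - log_sigma
    return -(ll_c.sum() + ll_u.sum())

def fit_tobit(y, X):
    """Maximum-likelihood estimates (beta, sigma)."""
    x0 = np.zeros(X.shape[1] + 1)
    res = minimize(tobit_negloglik, x0, args=(y, X), method="BFGS")
    return res.x[:-1], float(np.exp(res.x[-1]))

# Synthetic data: s_t = max(0, b0 + b1 * s_lag + b2 * |OF_t| + eps).
rng = np.random.default_rng(0)
n = 5000
s_lag = rng.exponential(1.0, n)
abs_of = rng.exponential(1.0, n)
X = np.column_stack([np.ones(n), s_lag, abs_of])
true_beta = np.array([-0.5, 0.6, 0.4])
y = np.maximum(X @ true_beta + rng.normal(0.0, 1.0, n), 0.0)
beta_hat, sigma_hat = fit_tobit(y, X)
```

Replacing `abs_of` with volume gives the second specification. In this framework, a positive estimate on \(|OF_t|\) would correspond to the pattern reported in the left part of Table 8.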

Table 9 Censored regressions of the arbitrage spread on \(OF_t\), and on \(OF_t\) and \(OF_{t-1}\), at 1, 5 and 10 min, hourly and daily frequency

Results change when the arbitrage spread is regressed on OF, that is

$$\begin{aligned} s_{\textsc {arbitrage}}{\,}_t & = \beta _0+\beta _1 s_{\textsc {arbitrage}}{\,}_{t-1}+\beta _2 OF_t+ \epsilon _t, \end{aligned}$$

see the left part of Table 9. If buy-side activity is strong compared to sell-side activity, the arbitrage spread declines. The result suggests that traders aiming to exploit an arbitrage opportunity prefer to buy the cryptoasset and then sell it, rather than the reverse; a possible rationale is the presence of short-sale constraints in these markets.Footnote 12 Depending on the frequency, the result holds for all markets except ETH-BTC, which shows a statistically significant positive coefficient on \(OF_t\) at every frequency. For the BTC-USD, ETH-USD, BTC-USDT and ETH-USDT markets, the coefficient on \(OF_t\) is positive and statistically significant at one minute, then becomes negative and statistically significant, and finally tends to be non-significant at the daily frequency (the coefficient is still positive at 5 min for BTC-USD, while for ETH-USD it is statistically significant also at the daily frequency). This suggests that markets are unable to close arbitrage opportunities in the short term, i.e., there is a continuation/amplification effect; traders then discover the arbitrage opportunities and the market moves toward closing them.Footnote 13

The pattern is similar across markets; however, at high frequency the explanatory power of the regressions for BTC-USDT and ETH-USDT is higher than for BTC-USD and ETH-USD. The first set of markets thus seems to close arbitrage opportunities more quickly than the second. This is confirmed by the coefficient on \(OF_t\) remaining positive at the five-minute frequency for BTC-USD. Once again, these results reinforce the earlier findings: all markets exhibit inefficiencies, but markets where a cryptoasset is exchanged with the US dollar are particularly inefficient. As a robustness check, in the right part of Table 9 we also include the lagged OF in the regression, i.e.,

$$\begin{aligned} s_{\textsc {arbitrage}}{\,}_t & = \beta _0+\beta _1 s_{\textsc {arbitrage}}{\,}_{t-1}+\beta _2 OF_t+\beta _3 OF_{t-1} + \epsilon _t. \end{aligned}$$

Results on the coefficient of \(OF_t\) are confirmed.
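Including lagged terms requires aligning the series so that each row contains \(s_t\), \(s_{t-1}\), \(OF_t\) and \(OF_{t-1}\) for the same \(t\), which costs the first observation. A minimal sketch of that alignment, with a hypothetical function name `design_with_lags`; the resulting \((y, X)\) pair can be passed to any censored-regression estimator.

```python
import numpy as np

def design_with_lags(spread, of):
    """Regressors for s_t on s_{t-1}, OF_t and OF_{t-1}.

    Drops the first observation, which has no lag. Returns (y, X) with
    columns [1, s_{t-1}, OF_t, OF_{t-1}].
    """
    s = np.asarray(spread, dtype=float)
    f = np.asarray(of, dtype=float)
    y = s[1:]                                            # s_t for t = 1..T-1
    X = np.column_stack([np.ones(len(y)), s[:-1], f[1:], f[:-1]])
    return y, X

y, X = design_with_lags([0.0, 1.0, 2.0], [5.0, 6.0, 7.0])
# y = [1, 2]; X = [[1, 0, 6, 5], [1, 1, 7, 6]]
```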

7 Conclusions

This paper investigates the microstructure of cryptoasset markets, building on the observation that markets where cryptocurrencies are exchanged with the US dollar differ from markets where cryptoassets are exchanged with each other. Indeed, stablecoins allow traders to trade cryptocurrencies at lower cost than going through an exchange in US dollars. As a consequence, sophisticated traders are likely to remain within the cryptoasset domain, holding Tether rather than US dollars as a safe asset.

By investigating market impact and efficiency at different frequencies, we have provided evidence that markets where cryptoassets are exchanged with each other play a central role in price formation. In these markets, sophisticated traders behave as contrarians, whereas in markets where cryptoassets are exchanged with the US dollar herding predominates. In markets where cryptoassets are traded against the US dollar, the order flow does not seem to contain relevant information. Moreover, these markets are strongly inefficient, whereas those where cryptoassets are exchanged with each other are inefficient to a lesser extent.

These results highlight that crypto-markets are not homogeneous. To capture sentiment about cryptoassets, we should look not at markets where cryptocurrencies are exchanged with the US dollar, but at Tether, which plays a central role in the cryptoasset environment: Tether–Bitcoin and Tether–Ether are the markets to watch in order to gauge the mood about cryptoassets.