We obtain 1-s intraday data from Thomson Reuters Tick History for the nearest-maturity futures prices of BTC, the 10-year T-Note and E-mini gold on the Chicago Mercantile Exchange (CME), the S&P 500 ETF (SPY) on the New York Stock Exchange, and the spot prices of BTC and gold. The sample runs from 18th December 2017, the earliest date for which BTC futures contracts are available, to 26th March 2021. While SPY trades Monday–Friday from 9:30 to 16:00 EST, the BTC, 10-year T-Note and gold futures contracts trade Sunday–Friday from 17:00 to 16:00 CT, and the spot prices of BTC and gold are available 24 hours a day. We use SPY and the 10-year T-Note futures as proxies for the S&P 500 index and bond spot prices, respectively. Finally, as control variables for the regression analysis, we collect the daily BTC spot market capitalization and the mean USD transaction fee on the BTC blockchain from coinmetrics.io, and Google Trends data for the keyword “Bitcoin” from trends.google.com.
We first examine whether the information share of the BTC futures market, \(IS\), is linked to futures trading activity, using the following regression on daily observations:
$${\text{log}}\left( {BTC \, futures \, trading \, activity} \right)_{t} = \beta_{0} + \beta_{1} {\text{log}}(\% \,spread_{t} ) + \beta_{2} {\text{log}}(realized \, volatility_{t} ) + \beta_{3} IS_{t} + \epsilon_{t} ,$$
(1)
where \(realized \, volatility\) and \(\% \, spread\) are standard determinants of trading activity (see Karpoff (1987) and Chordia et al. (2001), respectively). \(realized \, volatility\) is measured as the square root of the sum of squared 1- or 5-min midquote futures returns over a day, and \(\% \, spread\) is measured as the daily average of (ask price – bid price)/midquote price in the futures market. We also control for weekday fixed effects and compute Newey and West (1987) standard errors with 5 lags. We next analyze the extent to which the BTC futures’ information share explains the correlation between BTC and other assets, using the following regression:
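The two daily controls in Eq. (1) can be computed along the following lines. This is a minimal sketch rather than the exact procedure used here: it assumes log midquote returns and assumes the intraday midquotes have already been sampled on a regular 1- or 5-min grid; the function names are hypothetical.

```python
import numpy as np


def realized_volatility(midquotes: np.ndarray) -> float:
    """Daily realized volatility from one day of midquotes.

    Assumes `midquotes` are already sampled on a regular 1- or 5-min grid
    and uses log returns (an assumption; the text only says "returns").
    """
    log_returns = np.diff(np.log(midquotes))
    return float(np.sqrt(np.sum(log_returns ** 2)))


def percent_spread(bid: np.ndarray, ask: np.ndarray) -> float:
    """Daily average of (ask - bid) / midquote over all quotes of the day."""
    mid = (ask + bid) / 2.0
    return float(np.mean((ask - bid) / mid))


# Toy example with made-up quotes (not sample data).
rv = realized_volatility(np.array([100.0, 101.0, 100.0]))
spread = percent_spread(np.array([99.0, 199.0]), np.array([101.0, 201.0]))
print(round(spread, 4))  # 0.015
```

The same `realized_volatility` function applies to the 1-min and 5-min grids; only the input sampling changes.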
$$\begin{aligned} correlation_{t} & = \beta_{0} + \beta_{1} {\text{log}}(BTC \, futures \, trading \, activity_{t} ) \\ &\quad+ \beta_{2} {\text{log}}(the \,other \, asset \, trading \, activity_{t} ) + \beta_{3} IS_{t - i} + \epsilon_{t} , \end{aligned}$$
(2)
where the daily realized correlation is computed from 1-min spot returns, the daily BTC futures’ information share \(IS_{t - i}\) is lagged relative to the correlation by \(i = 0\) to 5 days, and the daily \(trading \, activity_{t}\) is measured by the trading volume or the number of trades on day \(t\). The trading activity variables control for the positive link between trading activity and asset price volatility, and therefore for any potential relation between trading activity and cross-asset correlation. We also include weekday fixed effects and compute Newey and West (1987) standard errors with 5 lags. As a robustness test, we add the BTC spot market capitalization, the daily mean USD transaction fee on the BTC blockchain, and the Google Trends index for the keyword “Bitcoin” to capture rising interest in BTC, which in turn affects the BTC price and its correlation with other assets; see, e.g., Chuffart (2021) on the predictive power of cryptocurrency attention for the BTC price.
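The estimation of Eq. (2) for a given lag \(i\) can be sketched as below: the information share is shifted by \(i\) observations, weekday dummies are added, and the coefficients are estimated by OLS with Newey–West (Bartlett kernel) standard errors. This is an illustrative sketch, not the authors' code; the helper names are hypothetical, the lag is counted in trading-day observations, and no small-sample degrees-of-freedom correction is applied.

```python
import numpy as np


def newey_west_ols(y: np.ndarray, X: np.ndarray, lags: int = 5):
    """OLS coefficients with Newey-West (Bartlett kernel) standard errors."""
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    resid = y - X @ beta
    scores = X * resid[:, None]              # row t holds x_t * e_t
    S = scores.T @ scores                    # lag-0 term
    for lag in range(1, lags + 1):
        weight = 1.0 - lag / (lags + 1.0)    # Bartlett weight
        gamma = scores[lag:].T @ scores[:-lag]
        S += weight * (gamma + gamma.T)
    cov = XtX_inv @ S @ XtX_inv              # sandwich estimator, no df correction
    return beta, np.sqrt(np.diag(cov))


def build_design(corr, act_btc, act_other, info_share, weekday, lag):
    """Align the variables of Eq. (2) for a given lag i of the information share.

    All inputs are daily arrays on the same calendar of trading days, so
    `lag` counts observations. The first observed weekday is the baseline
    category for the weekday dummies.
    """
    n = len(corr)
    y = corr[lag:]
    columns = [
        np.ones(n - lag),                    # intercept
        np.log(act_btc[lag:]),               # log BTC futures trading activity
        np.log(act_other[lag:]),             # log other-asset trading activity
        info_share[: n - lag],               # IS_{t-i}
    ]
    wd = weekday[lag:]
    for day in np.unique(wd)[1:]:
        columns.append((wd == day).astype(float))
    return y, np.column_stack(columns)
```

A typical call is `y, X = build_design(corr, act_btc, act_other, info_share, weekday, lag=2)` followed by `beta, se = newey_west_ols(y, X, lags=5)`, repeated for \(i = 0, \ldots, 5\).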
Our sample period includes the COVID pandemic, which has been shown to adversely affect financial markets and hence the connectedness and volatility transmission between assets. For example, Fig. 1 shows that the daily correlations of BTC with the S&P 500 and with gold shift into positive territory from 2020 onwards. We address this issue with two approaches. First, we include a time trend and its square in (Eq. 3) to capture any nonlinear trend in correlation. Alternatively, we include a dummy variable, \(COVID\), equal to 1 from the start of 2020 onwards, together with its interaction with \(IS_{t - i}\) in (Eq. 4), to study any change in the main coefficient of interest during the COVID pandemic:
$$\begin{aligned} correlation_{t} & = \beta_{0} + \beta_{1} {\text{log}}(BTC \, futures \, trading \, activity_{t} ) \\ & \quad + \beta_{2} {\text{log}}(the \, other \, asset \, trading \, activity_{t} ) + \beta_{3} IS_{t - i} + \beta_{4} {\text{log}}(BTC \, market \, cap_{t} ) \\ & \quad + \beta_{5} {\text{log}}(daily \, mean \, USD \, transaction \, fee_{t} ) + \beta_{6} {\text{log}}(Google \, trend_{t} ) \\ & \quad + \beta_{7} { }time \, trend_{t} + \beta_{8} { }time \, trend_{t}^{2} + \epsilon_{t} \\ \end{aligned}$$
(3)
$$\begin{aligned} correlation_{t} & = \beta_{0} + \beta_{1} {\text{log}}(BTC \, futures \, trading \, activity_{t} ) \\ &\quad+ \beta_{2} {\text{log}}(the \, other \, asset \, trading \, activity_{t} )\\ &\quad + \beta_{3} IS_{t - i} + \beta_{4} IS_{t - i} \times COVID + \beta_{5} COVID + \epsilon_{t} . \end{aligned}$$
(4)
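The additional regressors in Eqs. (3) and (4) can be constructed as in the sketch below. The trend is assumed to be the observation index \(1, 2, \ldots, n\), which the text does not specify, and the function name is hypothetical.

```python
import numpy as np


def trend_and_covid_terms(dates, info_share_lagged, covid_start="2020-01-01"):
    """Extra regressors for Eq. (3) and Eq. (4).

    The time trend is the observation index 1, 2, ..., n (an assumption;
    rescaling it, e.g. by n, would improve numerical conditioning).
    """
    dates = np.asarray(dates, dtype="datetime64[D]")
    trend = np.arange(1, len(dates) + 1, dtype=float)
    covid = (dates >= np.datetime64(covid_start)).astype(float)
    return {
        "time_trend": trend,
        "time_trend_sq": trend ** 2,
        "covid": covid,
        "is_x_covid": np.asarray(info_share_lagged, dtype=float) * covid,
    }


# Toy input: three trading days around the start of 2020.
terms = trend_and_covid_terms(
    ["2019-12-30", "2019-12-31", "2020-01-02"], [0.2, 0.4, 0.6]
)
```

In Eq. (4), \(\beta_{3}\) then measures the relation between \(IS_{t - i}\) and correlation before 2020, while \(\beta_{3} + \beta_{4}\) measures it during the COVID period.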
To measure the contribution of the futures price to BTC price discovery, we follow Hasbrouck (1995, 2003). Because the futures and spot markets share the same underlying cryptocurrency, all prices share the same random walk component. The variance of the random walk innovation is decomposed into components attributed to innovations in each price, and the relative contribution of a price series to this variance is defined as its information share. Let \(p_{1t}\) and \(p_{2t}\) be the log spot and futures prices, respectively, and assume that the difference \(p_{1t} - p_{2t}\) does not diverge over time ex ante. To measure the information share of the futures or spot price, we estimate the reduced-form vector error correction model (VECM) of order \(M\):
$${\Delta }{\varvec{p}}_{t} = {\varvec{A}}_{1} {\Delta }{\varvec{p}}_{t - 1} + {\varvec{A}}_{2} {\Delta }{\varvec{p}}_{t - 2} + \cdots + {\varvec{A}}_{M} {\Delta }{\varvec{p}}_{t - M} + \gamma \left( {p_{1,t - 1} - p_{2,t - 1} - \mu } \right) + {\varvec{u}}_{t} ,$$
(5)
where \({\varvec{p}}_{t}\) is the column vector of log prices, \({\varvec{A}}_{i}\) is the matrix of autoregressive coefficients, \(\mu = E\left( {p_{1,t} - p_{2,t} } \right)\) is the mean deviation, \(\gamma \left( {p_{1,t - 1} - p_{2,t - 1} - \mu } \right)\) is the error correction term, and \(\gamma\) is the vector of adjustment coefficients. Each price in the VECM contains a latent random walk component, the “efficient price”. This component is unobservable without further identifying restrictions in the reduced-form specification, but its innovations are linear in the disturbances. In other words,
$${\varvec{w}}_{t} = \left( {w_{1,t} ,\; w_{2,t} } \right)^{\prime } = A{\varvec{u}}_{t} = \left[ {\begin{array}{*{20}c} {a_{11} } & {a_{12} } \\ {a_{21} } & {a_{22} } \\ \end{array} } \right]\left( {u_{1t} ,\; u_{2t} } \right)^{\prime } ,$$
(6)
where \(w_{it}\) is the random walk innovation in the \(i\)th price series, and the \(a_{ij}\) are determined from the VECM parameters. Because the futures and spot prices reflect the same efficient price, their random walk innovations are identical. Therefore, the rows of the coefficient matrix are the same, and we can use either row to write the random walk innovation variance as
$$Var\left( {w_{1t} } \right) = \left( {a_{11} ,\; a_{12} } \right) \left[ {\begin{array}{*{20}c} {\sigma_{1}^{2} } & {\sigma_{12} } \\ {\sigma_{12} } & {\sigma_{2}^{2} } \\ \end{array} } \right]\left( {a_{11} ,\; a_{12} } \right)^{\prime } .$$
(7)
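The common row \((a_{11}, a_{12})\) in Eq. (7) can be obtained from the estimated VECM parameters. The sketch below estimates Eq. (5) by OLS for one day of log prices and then applies one standard route to the long-run impact matrix, the Granger/Johansen representation \(\Psi = \beta_{\perp}(\alpha_{\perp}^{\prime} G \beta_{\perp})^{-1}\alpha_{\perp}^{\prime}\) with \(\beta = (1, -1)^{\prime}\), \(\alpha = \gamma\) and \(G = I - \sum_{i} {\varvec{A}}_{i}\). The text does not state which computation is used, so this is an assumption. It also uses unrestricted lags instead of the polynomial distributed lags of Hasbrouck (2003), sets \(\mu\) to the day's sample mean of the basis, and omits an intercept; the function names are hypothetical.

```python
import numpy as np


def fit_vecm(p: np.ndarray, M: int):
    """OLS estimate of Eq. (5) for one day of log prices.

    p: (T, 2) array with columns [spot, futures]. mu is the sample mean of
    the basis; no intercept and no polynomial lag restriction.
    Returns [A_1, ..., A_M], gamma and the residual covariance matrix.
    """
    dp = np.diff(p, axis=0)                  # dp[t] = p[t+1] - p[t]
    basis = p[:, 0] - p[:, 1]
    mu = basis.mean()
    n = len(dp)
    Y = dp[M:]
    lagged = [dp[M - i : n - i] for i in range(1, M + 1)]  # dp_{t-i}
    ect = (basis[M:n] - mu)[:, None]         # p_{1,t-1} - p_{2,t-1} - mu
    X = np.hstack(lagged + [ect])
    coef, *_ = np.linalg.lstsq(X, Y, rcond=None)
    resid = Y - X @ coef
    omega = resid.T @ resid / len(Y)
    A_list = [coef[2 * i : 2 * i + 2].T for i in range(M)]
    gamma = coef[-1]
    return A_list, gamma, omega


def common_row(A_list, gamma):
    """Common row (a_11, a_12) of the long-run impact matrix Psi."""
    G = np.eye(2) - sum(A_list, np.zeros((2, 2)))
    beta_perp = np.array([1.0, 1.0])
    alpha_perp = np.array([gamma[1], -gamma[0]])   # orthogonal to gamma
    psi = alpha_perp / (alpha_perp @ G @ beta_perp)
    return psi, np.outer(beta_perp, psi)           # both rows of Psi equal psi
```

Because \(\beta_{\perp} = (1, 1)^{\prime}\), both rows of the returned matrix coincide, which mirrors the identical-rows argument above.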
If the price innovation covariance matrix is diagonal, \(Var\left( {w_{1t} } \right) = a_{11}^{2} \sigma_{1}^{2} + a_{12}^{2} \sigma_{2}^{2}\), and the information share of the \(i\)th price series is defined as \(I_{i} = a_{1i}^{2} \sigma_{i}^{2} /Var\left( {w_{1t} } \right)\). If the covariance matrix is not diagonal, the information share is not exactly identified; we therefore examine alternative factor rotations (orderings of the Cholesky factorization of the covariance matrix) and take the simple average of the resulting information shares. We estimate the information share on a daily basis, which allows the mean deviation \(\mu\) to vary over time. Moreover, when price innovations are highly correlated, explanatory power cannot be attributed with any precision, so we use a time resolution of 1–5 s to avoid introducing correlation through time aggregation. Because a short sampling interval requires many lags and hence a large number of coefficients, we follow Hasbrouck (2003) and use polynomial distributed lags.
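Given the common row and the innovation covariance matrix, the averaging over factor rotations can be sketched as follows. The sketch assumes the rotations are the two orderings of the Cholesky factorization; in the diagonal case both orderings coincide and reduce to \(I_{i} = a_{1i}^{2} \sigma_{i}^{2} /Var\left( {w_{1t} } \right)\). The function name is hypothetical.

```python
import numpy as np


def information_shares(psi, omega):
    """Hasbrouck information shares for a bivariate system.

    psi: common row (a_11, a_12); omega: 2x2 innovation covariance.
    Returns the simple average over both Cholesky orderings and the
    per-ordering shares (row k = ordering k), in the original variable order.
    """
    psi = np.asarray(psi, dtype=float)
    omega = np.asarray(omega, dtype=float)
    total = psi @ omega @ psi                       # Var(w_1t), Eq. (7)
    per_ordering = []
    for order in ([0, 1], [1, 0]):
        P = np.eye(2)[order]                        # permutation matrix
        F = np.linalg.cholesky(P @ omega @ P.T)     # lower-triangular factor
        contrib = (psi @ P.T @ F) ** 2 / total      # shares in permuted order
        per_ordering.append(P.T @ contrib)          # back to original order
    per_ordering = np.array(per_ordering)
    return per_ordering.mean(axis=0), per_ordering
```

A price series placed first in the ordering receives its largest possible share, so the two rows of `per_ordering` bracket the information share and their average is the point estimate.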
We note that BTC spot returns and futures returns Granger-cause each other at the 1% significance level in a vector autoregressive model, VAR(20), estimated on daily returns.Footnote 2 In other words, futures returns and futures trading activity matter to the spot market, and therefore to the underlying BTC price, even though institutional investors trade only futures contracts in the BTC market. Institutional investorsFootnote 3 are the main participants in the CME futures market because of its restrictive contract specifications (e.g. 5 BTC per contract vs. a minimum trade amount of 0.001 BTC per perpetual futures contract on Binance), and they had limited access to the spot market because of regulatory issues over the sample period.Footnote 4 This further supports using the information share of the futures market as a proxy for the relative importance of institutional trading in the BTC market. Alternatively, Gonzalo and Granger (1995) measure a market’s contribution to price discovery through its weight in the permanent (as opposed to transitory) component of the underlying price, which is determined by how each price adjusts to disequilibrium, i.e. in our context, by the error correction process of the CME futures price. We argue that this error correction process is potentially less efficient than that of other traditional assets, because most institutional investors are unable to exploit arbitrage opportunities in the BTC market by trading both futures and spot contracts. Therefore, we focus on Hasbrouck’s information shares and leave other metrics for future research.
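The Granger causality check can be sketched as an equation-by-equation F-test in a bivariate VAR: the \(y\) equation is estimated by OLS with and without the lags of \(x\), and the F-statistic compares the two residual sums of squares. The text does not state which test statistic is used, so this form is an assumption, and the function name is hypothetical.

```python
import numpy as np


def granger_f_test(y: np.ndarray, x: np.ndarray, q: int):
    """F-statistic for H0: lags 1..q of x do not help predict y.

    Compares the y-equation of a VAR(q) with and without the x lags,
    both estimated by OLS with an intercept.
    """
    n = len(y)
    Y = y[q:]
    lags_y = np.column_stack([y[q - i : n - i] for i in range(1, q + 1)])
    lags_x = np.column_stack([x[q - i : n - i] for i in range(1, q + 1)])
    const = np.ones((n - q, 1))

    def rss(X):
        beta, *_ = np.linalg.lstsq(X, Y, rcond=None)
        return float(np.sum((Y - X @ beta) ** 2))

    X_restricted = np.hstack([const, lags_y])
    X_full = np.hstack([const, lags_y, lags_x])
    df1, df2 = q, len(Y) - X_full.shape[1]
    F = ((rss(X_restricted) - rss(X_full)) / df1) / (rss(X_full) / df2)
    return F, df1, df2
```

With daily spot and futures returns, one would call the function with `q=20` in both directions and obtain p-values from the F distribution, for example with `scipy.stats.f.sf(F, df1, df2)`.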