1 Introduction

It is yet to be definitively understood what drives cryptocurrency prices. One school of thought, inspired by the vast and diverse literature on network theory, posits that digital currencies can be valued on the basis of actual users and their interactions with one another (Alabi 2017; Economides 1993; Metcalfe 2013; Wheatley et al. 2019). Others argue that such prices may be driven by irrationality and resemble the bubbles that have recurred throughout history (Wenker 2014). Regulators also hold conflicting views regarding digital currencies, their legitimacy, and what function they serve. For example, policymakers regularly issue stern warnings arguing that such currencies cannot replace, or fundamentally serve, the function that traditional fiat currencies perform (Lo and Wang 2014). In addition, they may aid and abet criminal activity (Grinberg 2012). Despite such warnings, however, the number of businesses accepting digital currencies such as bitcoin is ever-increasing (footnote 1). Along with this growing interest, central bankers are presently discussing the feasibility of using government-backed, rather than decentralized, digital currencies in society (footnote 2).

The aforementioned gives only a partial glimpse of the divergences in opinion regarding cryptocurrencies. It also shows the diverse and competing stakeholders who are drawn to understanding their possible uses and intrinsic properties. Perhaps most alluring to all such stakeholders is the unprecedented price volatility these digital coins exhibit. For example, in 2015, the average price of bitcoin oscillated around $270 (in USD). Bitcoin's price first peaked on December 17, 2017, when it was valued at about $19,533, an arithmetic return of over 7000% relative to its 2015 price level. After the introduction of bitcoin futures by the Chicago Board Options Exchange (CBOE) on December 10, 2017, and, subsequently, by the Chicago Mercantile Exchange (CME) on December 18, 2017, the price of bitcoin declined precipitously. It reached a low of about $3,200 in mid-December of 2018 before rising rapidly to another peak of over $12,600 in late June of 2019. Since the global outbreak of the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), bitcoin's price has continued to oscillate wildly.

This type of price volatility can render conventional regression modeling approaches unstable, with coefficient estimates that lack robustness as bitcoin's price behavior undergoes regime changes across time. Lahmiri and Bekiros (2019) show strong evidence that cryptocurrency price movements are governed by regime shifts. Others show how bitcoin price regimes can be described using Markovian-type models (Koutmos 2020; Koutmos and Payne 2021; Ma et al. 2020). Yang et al. (2020) show that bitcoin miners themselves may exhibit behaviors that can be modeled using Markov models. The recurring theme in this particular literature is that bitcoin's volatility results in a significant quantity of outliers and thus thicker tails than what is commonly observed from Gaussian processes. In the words of Yermack (2015), “…bitcoin faces a number of obstacles in becoming a useful unit of account…one problem arises from its extreme volatility…the value of a bitcoin compared to other currencies changes greatly on a day-to-day basis…” (p. 38).

From a data analysis perspective, this is problematic because conclusions cannot be reliably drawn from conventional regression techniques. Catania et al. (2019) show that cryptocurrencies, like bitcoin, are subject to instabilities both in the means and at higher moments of their returns. As a result, coefficient estimates can be unstable across time, making it difficult to ascertain what factors are important in driving cryptocurrency prices and when they actually matter. In a similar vein, Atsalakis et al. (2019) show that there are strong nonlinearities inherent in cryptocurrency prices that can weaken the explanatory power of conventional regression models. Li and Wang (2017), whose findings seek to delineate the technological and economic determinants of bitcoin, argue, “…the relationships we identified could be subject to further changes as exchange markets develop…it will be necessary to revisit the model at some future time and consider the possibility of multiple regime changes in exchange rate dynamics…” (p. 59).

In light of the aforementioned, and focusing particularly on bitcoin, this study combines several motives and objectives (footnote 3). First, and as mentioned, there is much debate as to what drives cryptocurrency prices, and there is not yet definitive agreement as to which variables matter and when they matter. Drawing on behavioral research, this study argues that investor sentiment is strongly linked to bitcoin price changes. Using order book data from Coinbase, which is a major bitcoin exchange, we estimate order flow imbalance across time and use this as a proxy for investor sentiment. This approach is motivated by behavioral research, such as the studies by Kumar and Lee (2006) and Chelley-Steeley et al. (2019), and references therein, which show that buy and sell orders reflect investor sentiment and can be linked to price changes. As we discuss in more detail later on, buy and sell order data for bitcoin can serve as a proxy for investor sentiment that is linked to shifts in bitcoin prices.

Second, several studies, such as those mentioned, find instability in coefficient estimates in models seeking to describe cryptocurrency price changes. This arises from thick tails in the distributions of their price changes. These thick tails are the result of outliers and regime changes which their prices frequently undergo, and can be an issue when trying to obtain robust point estimates of regression coefficients. To directly address this, our study utilizes a bootstrapped quantile regression approach, motivated by Buchinsky (1995), Koenker and Bassett (1978), Chakraborty (2003) and Pedersen (2015), among others, to construct robust estimates of the sentiment-return relation.

Third, this study draws on emergent research that seeks to describe bitcoin's price changes on the basis of microstructure and blockchain variables, as well as network theory. As mentioned, a segment of these studies examines whether bitcoin's value is driven by the number of users and the measurable interactions between these users. Drawing on some of these studies, our study employs a broad range of exchange-specific and blockchain-wide variables and uses them as control variables to check the robustness of our estimated sentiment-return relation.

By combining these three motives and objectives, we depict a more complete picture of the entire distribution of bitcoin price changes. While the primary objective of our study is to quantify a robust sentiment-return relation, which we strive to achieve both empirically and theoretically, a secondary contribution that emerges is the observable heterogeneous role which microstructure and network variables play in explaining bitcoin returns across low, mid and high quantiles. This indicates that bitcoin explanatory or forecasting models are likely strongly affected by the regime, or, time sample, which they investigate. This raises questions about the robustness of their estimates and is a likely reason why aforementioned studies detect instability in their models. As mentioned by Li and Wang (2017), bitcoin prices undergo regime changes and thus display distinct statistical properties.

This study uses intraday order book data (buy and sell orders) of individual bitcoin investors to construct a proxy for investor sentiment and to quantify the sentiment-return relation across quantiles. We source our trading records from Coinbase since this is presently one of the largest digital currency exchanges in the world. Our objective, as we discuss further, is to specifically answer the following empirical question: What is the relation between investor sentiment and price changes for bitcoin?

With a unique data set of intraday buy and sell orders from 2015 to 2020, we construct a proxy for daily investor sentiment in order to shed light on this question and to quantify the sentiment-return relation across the conditional distribution of bitcoin price changes. Using our bootstrapped quantile regression approach to account for heteroskedasticity and ensure robustness in our results, we show reliable evidence that rising sentiment is associated with positive price changes while declining sentiment is associated with negative price changes. This relation remains robust across the distribution of bitcoin price changes, from periods of extreme negative returns (lower conditional quantiles) to periods of extreme positive returns (higher conditional quantiles). This relation also remains robust when controlling for a variety of exchange-specific and blockchain-wide variables. It also remains robust when controlling for aggregate momentum across major cryptocurrencies.

The remainder of this study is structured as follows. Section 2 discusses the data; specifically, the bitcoin prices we use from Coinbase as well as their statistical properties before and after the introduction of bitcoin futures. Section 2 also discusses the order book data and how we use it to construct a proxy of daily investor sentiment, as well as the exchange-specific and blockchain-wide variables which later serve as control variables. In Sect. 2, we also introduce a constructed index of aggregate cryptocurrency returns that we later use as an additional control variable to check the robustness of our sentiment-return relation. Section 3 discusses our bootstrapped quantile regression model and how it helps in achieving robust results. Sections 4 and 5 discuss the findings and conclude, respectively.

2 Data

This section discusses the sample data used in this study. Our data are sourced from the Coinbase exchange and Bitcoin's blockchain. In sub-Sects. 2.1 and 2.2 we discuss the bitcoin price data and its statistical properties before and after the introduction of futures by the CBOE. In sub-Sect. 2.3 we show how intraday order book data (buy and sell orders) can be used to construct a daily measure of investor sentiment. This measure serves as our primary regressor variable in quantifying the sentiment-return relation. Sub-Sect. 2.4 discusses the exchange-specific and blockchain-wide variables that are used as control variables. Finally, sub-Sect. 2.5 develops a cryptocurrency market index (CRYIX) in order to capture the effects of market-wide cryptocurrency momentum. CRYIX serves as yet another control variable to gauge the explanatory power of the daily investor sentiment variable.

2.1 Bitcoin prices

Bitcoin's explosive price volatility, relative to other asset classes, is arguably one of its major attractions for investors and speculators alike. It is also one of the reasons regulators caution that it may never serve as a medium of exchange, a unit of account, or a store of value (Lo and Wang 2014). Figure 1 shows a time series plot of bitcoin prices, as well as our measure for investor sentiment, which we discuss in sub-Sect. 2.3, over our sample period (January 26, 2015 through October 23, 2020).

Fig 1

Time series plot of bitcoin prices and investor sentiment. This figure shows time series plots for daily bitcoin prices in USD (top and right axis) and investor sentiment (bottom and left axis), both of which are collected from the Coinbase digital currency exchange. The vertical red line that falls within the 4th quarter of the year 2017 denotes the day that the Chicago Board Options Exchange (CBOE) launched bitcoin futures products, which is December 10, 2017. The entire sample period for this analysis is from January 26, 2015 through October 23, 2020 and the frequency of the data is daily (and includes weekend data).

For this particular sample period, bitcoin prices reached a peak of about $19,533 (in USD) on December 17, 2017. As mentioned, the CBOE and CME introduced bitcoin futures, respectively, on December 10th and 18th of 2017.

The introduction of futures markets may be a contributing factor that led to sharp declines in bitcoin's price. By December 22, 2017 (just four days after the CME introduced futures), bitcoin's price fell to about $13,893 (in USD), an arithmetic price change of almost -30% from its December 17 high. From there, it declined precipitously before bottoming out in mid-December of 2018. In the words of Hale et al. (2018), “…[the] one-sided speculative demand came to an end when the futures for bitcoin started trading…with the introduction of bitcoin futures, pessimists could bet on a bitcoin price decline, buying and selling contracts with a lower delivery price in the future than the spot price…” (p. 2).

2.2 Summary statistics and bitcoin risk-return metrics

Table 1 provides a glimpse into bitcoin's price behavior and its risk-return characteristics before and after the introduction of bitcoin futures on December 10, 2017. A quick inspection of the sampled minimum (min.) and maximum (max.) returns shows the explosive price appreciations and declines that bitcoin has experienced. For the entire sample, the minimum daily return was about -22%, which occurred during the post-futures period, while the maximum daily return was about 19%, which occurred during the pre-futures period. Across the entire sample period, as well as the two subsample periods, bitcoin returns do not follow a Gaussian distribution and are instead heavy-tailed.

Table 1 Risk-return statistics of bitcoin returns

The lower mean returns, higher standard deviation, and heavier tails of the post-futures sample period indicate relatively heightened risks and an overall deterioration in the risk-return tradeoff. This can be observed from the calculated value-at-risk (VaR) and Sharpe ratios (footnote 4). To integrate higher moment risks, which can account for crash risk (skewness) and tail risk (kurtosis), we also compute the modified VaR (MVaR) and modified Sharpe ratio (Gregoriou and Gueyie 2003; Signer and Favre 2002). The MVaR can be expressed as follows (using similar notations as the VaR equation in footnote (4)):

$$\mathrm{MVaR}=W\left[\mu -\{{z}_{c}+\frac{1}{6}\left({z}_{c}^{2}-1\right)S+\frac{1}{24}\left({z}_{c}^{3}-3{z}_{c}\right)K-\frac{1}{36}\left(2{z}_{c}^{3}-5{z}_{c}\right){S}^{2}\}\sigma \right]$$
(1)

whereby \(W\) is the value of the invested portfolio; \({z}_{c}\) is the critical value for the probability \((1-\alpha )\) and is -1.96 for a 95% probability; \(\mu\) is the mean return; \(\sigma\) is the standard deviation of returns; \(S\) is skewness of returns and computed as \(S=(1/T)\sum_{t=1}^{T}{\left([{R}_{t}-\overline{R}]/\sigma \right)}^{3}\); \(K\) is excess kurtosis of returns and computed as \(K=(1/T)\sum_{t=1}^{T}{\left([{R}_{t}-\overline{R}]/\sigma \right)}^{4}-3\). The modified Sharpe ratio, which measures returns per unit of volatility and higher moment risks, can thus be expressed as,

$$\mathrm{Modified\ Sharpe\ Ratio}=({R}_{\mathrm{t}}-{r}_{\mathrm{f}})/\mathrm{MVaR}$$
(2)

whereby \({r}_{f}\) serves as the risk-free rate (see footnote (4)).
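As a minimal sketch of how Eqs. (1) and (2) can be evaluated on a sample of returns, the following Python functions compute the skewness- and kurtosis-adjusted critical value (the bracketed term of Eq. (1), commonly called the Cornish-Fisher expansion), the MVaR, and the modified Sharpe ratio. The function names, the use of the population (1/T) standard deviation, and the use of the mean return in the numerator of Eq. (2) are illustrative assumptions rather than details stated in the text.

```python
import math

def sample_moments(returns):
    """Return (mean, std, skewness S, excess kurtosis K) as defined below Eq. (1)."""
    T = len(returns)
    mean = sum(returns) / T
    # Assumption: population (1/T) standard deviation, matching the 1/T in S and K.
    sigma = math.sqrt(sum((r - mean) ** 2 for r in returns) / T)
    S = sum(((r - mean) / sigma) ** 3 for r in returns) / T
    K = sum(((r - mean) / sigma) ** 4 for r in returns) / T - 3.0
    return mean, sigma, S, K

def cornish_fisher_z(z_c, S, K):
    """Bracketed term of Eq. (1): z_c adjusted for skewness and excess kurtosis."""
    return (z_c
            + (z_c ** 2 - 1) * S / 6
            + (z_c ** 3 - 3 * z_c) * K / 24
            - (2 * z_c ** 3 - 5 * z_c) * S ** 2 / 36)

def modified_var(returns, W=1.0, z_c=-1.96):
    """Eq. (1): MVaR = W * (mu - z_cf * sigma)."""
    mean, sigma, S, K = sample_moments(returns)
    return W * (mean - cornish_fisher_z(z_c, S, K) * sigma)

def modified_sharpe(returns, r_f=0.0, W=1.0, z_c=-1.96):
    """Eq. (2), with the sample mean return in the numerator (an assumption)."""
    mean = sum(returns) / len(returns)
    return (mean - r_f) / modified_var(returns, W, z_c)
```

When \(S=0\) and \(K=0\), the adjusted critical value collapses to \({z}_{c}\), so the MVaR reduces to the Gaussian expression \(W[\mu -{z}_{c}\sigma ]\).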

Comparing the pre- and post-futures sub-sample periods reveals distinct regimes in bitcoin's risk-return characteristics. As mentioned, bitcoin futures provided investors and speculators a vehicle with which they can bet on (or hedge against) bitcoin price declines. These distinct price regimes provide motivation for the robust quantile regression technique that we discuss in Sect. 3, which allows us to get a clear picture of bitcoin's entire return distribution. Figure 2 is a quantile–quantile plot of bitcoin returns for the entire sample period and shows how bitcoin returns depart from a theoretical Gaussian distribution. Such departures can render parameter estimates unstable in traditional regression techniques, which tend to focus on the center of variables' distributions.

Fig 2

Quantile–Quantile plot of bitcoin returns. This figure shows a quantile–quantile (Q-Q) plot for bitcoin returns. The horizontal axis (x-axis) denotes the quantiles theoretically observable in a normal distribution while the vertical axis (y-axis) denotes the quantiles of bitcoin returns. The entire sample period for this analysis is from January 26, 2015 through October 23, 2020 and the frequency of the data is daily (and includes weekend data).

2.3 Measuring investor sentiment

Coinbase is an electronic market where participants can submit buy and sell orders for bitcoin. Such orders may be executed immediately (if they are market orders) or they may be limit orders (with a limit price that may not be in close proximity to the prevailing market price) that are potentially executed at a later point in time. Taken together, these orders constitute Coinbase's order book and consist of outstanding orders that are awaiting eventual execution or cancellation by the participant, subject to the rules of the exchange (footnote 5). The flow of such orders across time intervals is referred to as order flow.

Figure 3 shows a toy example of an order book with bid- and ask-side orders in queues at various price levels around the mid price. Fluctuations in these orders across time, and any resultant imbalances between bid- and ask-side orders, are referred to as order flow imbalance. It is this very imbalance that serves as a measure for investor sentiment in our study.

Fig 3

Example of an order book. This figure illustrates a toy example of an order book (Source: Author's depiction). Similar toy examples of order books for traditional asset classes are shown elsewhere in market microstructure and asset pricing literature (e.g., Bonart and Gould 2017). The green and red bars denote buy- and sell-side limit orders, respectively. The x-axis denotes possible prices and the y-axis corresponds to market depth. While differences in the bid and ask prices result in a bid-ask spread, differences in the quantity of orders on the buy- and sell-side result in an order flow imbalance.

In discussing the incremental information content of order flow, relative to what can be gleaned by observing only trade volumes, Chordia et al. (2002) argue “…volume alone is absolutely guaranteed to conceal some important aspects of trading…consider, for example, a reported volume of one million shares…at one extreme, this might be a million shares sold…at the other extreme it could be a million shares purchased…” (p. 112). As others have noted, order flow and order flow imbalances can signal the possession of private information in the market and, when amplified, can impact liquidity conditions and drive price movements (Chelley-Steeley et al., 2019).

In light of the aforementioned, and using intraday bid- and ask-side orders from Coinbase across the various time intervals, we model daily order flow imbalance. This serves as our proxy for investor sentiment and is calculated as follows (footnote 6):

$${SENT}_{t}={\sum }_{n=1}^{N}({Buys}_{n,t}-{Sells}_{n,t})/{\sum }_{n=1}^{N}({Buys}_{n,t}+{Sells}_{n,t})$$
(3)

whereby, respectively, \(Buys\) and \(Sells\) are bid- and ask-side market and limit orders (in dollar volumes) on day \(t\), calculated from trades \(1\) through \(N\) on trading day \(t\). A given trading day’s \(SENT\) indicates whether, on aggregate, investors are optimistic (net buyers whereby \(SENT>0\)) or pessimistic (net sellers whereby \(SENT<0\)). This ratio serves as a direct and explicit measure of investor sentiment and it is the primary objective of this study to use this measure and quantify its linkages with bitcoin returns.
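To make the construction in Eq. (3) concrete, the sketch below computes the daily sentiment ratio from a list of order records. The record layout (a side label paired with a dollar volume), the function name, and the convention of returning zero on a day with no activity are hypothetical choices made only for illustration.

```python
def daily_sentiment(orders):
    """Order flow imbalance of Eq. (3) for one trading day.

    orders: iterable of (side, dollar_volume) pairs, where side is
    "buy" or "sell" (a hypothetical record layout).
    """
    buys = sum(volume for side, volume in orders if side == "buy")
    sells = sum(volume for side, volume in orders if side == "sell")
    total = buys + sells
    if total == 0:
        return 0.0  # assumption: treat a day without orders as neutral
    return (buys - sells) / total

# Hypothetical day: $300 of buy orders against $100 of sell orders.
example_day = [("buy", 200.0), ("sell", 100.0), ("buy", 100.0)]
print(daily_sentiment(example_day))  # 0.5, i.e. net buying
```

By construction the ratio lies between -1 (all sell orders) and 1 (all buy orders), which makes values comparable across days with very different trading volumes.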

Figure 4 shows a scatter plot between our measure for investor sentiment (in Eq. 3) and bitcoin returns, on the x-axis and y-axis, respectively. The estimated \({R}^{2}\) from a least squares regression is approximately 0.02 while the fitted confidence ellipses at the 0.90, 0.95 and 0.99 levels, respectively, do not show signs of strong dependence patterns. However, a K-nearest neighbor fit line shows sharp nonlinearities in the sentiment-return relation—a feature in the data that cannot be robustly identified or quantified using conventional regression tools. In addition, and as shown in the scatter plot, there are a multitude of outliers, indicative of regime shifts that take place within bitcoin prices across time.

Fig 4

Scatter plot of bitcoin returns and investor sentiment. This figure shows a scatter plot of bitcoin returns in percentages (y-axis) against investor sentiment (x-axis), both of which are collected from the Coinbase digital currency exchange. While the red line is a slope estimate using ordinary least squares, which has an \({R}^{2}\) of 0.0209, the green line is a K-nearest neighbor fit line. The three ellipses in blue are confidence ellipses at the 0.90, 0.95 and 0.99 confidence levels, respectively. The entire sample period for this analysis is from January 26, 2015 through October 23, 2020 and the frequency of the data is daily (and includes weekend data).

2.4 Exchange-specific and blockchain-wide variables

To quantify the sentiment-return relation, and to obtain robust coefficient estimates, our study integrates a variety of exchange-specific and blockchain-wide variables as control variables, all of which are also summarized in Table 2. In addition, Table 2 identifies what transformations, if any, are applied to induce stationarity in their respective time series. All these variables, including our investor sentiment measure in Eq. (3), are observed, or calculated, across each trading day t.

Table 2 Description of variables

The exchange-specific variables, obtained through Coinbase (see footnote (6)), reflect investor activity and market conditions, and are as follows:

1. RV Range volatility, estimated as \({(\mathrm{ln}{H}_{t}-\mathrm{ln}{L}_{t})}^{2}/(4\mathrm{ln}2)\), where H and L denote the intraday high and low price over day t. This is a measure of volatility risk and, in asset pricing literature, impacts risk aversion and liquidity conditions. In empirical tests of the risk-return relation in conventional assets, such as stock portfolios or currencies, there tends to be a negative relation. This phenomenon is often referred to as the 'volatility feedback' effect (Carr and Wu 2017).

2. VOLM Trade volume is the total value of all trading (in USD) over a given day t. This variable measures the flow of information to the market, and has been shown to impact asset prices and volatilities (Andersen 1996).

3. TPM Trades per minute is the average number of trades (on a per minute basis) over a given day t. This measure can potentially capture intensity in the information flow (Eisler and Kertesz 2006).
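The range volatility measure in item 1 can be sketched in a few lines. The function name and the example prices are hypothetical; the formula itself follows the expression given for RV.

```python
import math

def range_volatility(high, low):
    """RV for one day: (ln H - ln L)^2 / (4 ln 2)."""
    if high <= 0 or low <= 0:
        raise ValueError("prices must be positive")
    return (math.log(high) - math.log(low)) ** 2 / (4 * math.log(2))

# Hypothetical intraday high and low prices in USD.
print(range_volatility(10_500.0, 10_000.0))
```

Because only the daily high and low enter the calculation, the measure is zero on a day when the price never moves and grows with the square of the log price range.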

The blockchain-wide variables, which measure network activity and the health of the overall blockchain for bitcoin, are as follows (footnote 7):

4. ADD Addresses is the number of active bitcoin addresses over day t. A bitcoin address is a unique identifier that serves as a destination for a bitcoin payment—in much the same way as an email address is needed to send or receive email messages. Active addresses serve as a proxy for network value and activity. As mentioned earlier, there is a growing body of literature on network theory that seeks to delineate the intrinsic value of bitcoin as a function of actual users and their interactions together.

5. BCONT Block count is the number of blocks discovered (or “mined”) by miners on day t. These blocks record data relating to bitcoin transfers or transactions among all the users. They serve as pieces of the whole ledger and, when confirmed and added to the blockchain, become an immutable record within the entire ledger. Blocks serve as a proxy for network value and activity.

6. BSIZE Block size is the sum of the size (measured in bytes) of all the blocks created on day t. Block sizes serve as a proxy for network value and activity.

7. FEE Mining fees is the median fee per transaction (in USD) that miners earn when validating transactions and discovering new blocks that are added to the blockchain. Mining fees serve as a proxy for network value and activity. The median fee on day t is observed rather than the mean fee (although in untabulated results, available upon request, our findings are robust to this choice).

8. HASH Hash rate represents the actual hash rate (in gigahashes per second) at which the Bitcoin network is performing in order to stay operational. This is a technological unit of measurement as well as a security metric. Greater hashing power reflects greater resistance to attack and enhanced security. On a given day t, the hash rate can be estimated from the number of blocks mined (in the last 24 h, or, 1 trading day), as well as the level of difficulty. Thus, it can be expressed as follows: HASH = \([{2}^{32}][\mathrm{D}/\mathrm{T}]\), whereby \(\mathrm{D}\) denotes network difficulty and \(\mathrm{T}\) denotes the average time between mined blocks. Network difficulty, \(\mathrm{D}\), in technical terms, measures how difficult it is to find a hash below some target, and is adjusted approximately every 2 weeks (or 2,016 blocks).

9. ISSU Issuance count is the sum of new bitcoins (in BTC) issued on a given day t. This is a direct measure of bitcoin's money supply that is in circulation, and is analogous to monetary aggregates such as 'M1' or 'M2' that are used by the Federal Reserve for US dollars. Unlike US dollars, however, bitcoins are issued, or, “minted,” when a miner discovers a new block. Bitcoin's total supply is set to converge to 21 million units (government authorities behind fiat currencies, such as the US government and the dollar, do not declare a constrained supply). The number of newly issued bitcoins began at 50 and is set to decrease geometrically (halving takes place every 210,000 blocks, or, every 4 years approximately). The cumulative supply can thus be expressed as \(\left\{{\sum }_{i=0}^{32}\mathrm{210,000}\left[50*{10}^{8}/{2}^{i}\right]\right\}/{10}^{8}\).
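The hash rate expression in item 8 and the cumulative supply expression in item 9 can be checked numerically with the sketch below. The function and constant names are hypothetical. One assumption is worth flagging: the square bracket in the supply formula is read here as integer (floor) division of the per-block subsidy measured in units of \({10}^{-8}\) BTC, which is how the halving is usually described for Bitcoin.

```python
HALVING_INTERVAL = 210_000       # blocks between subsidy halvings
INITIAL_SUBSIDY = 50 * 10**8     # 50 BTC in units of 1e-8 BTC
UNITS_PER_BTC = 10**8

def hash_rate(difficulty, avg_block_time_seconds):
    """HASH = 2^32 * D / T, in hashes per second (divide by 1e9 for GH/s)."""
    return 2**32 * difficulty / avg_block_time_seconds

def cumulative_supply_units(last_halving=32):
    """Sum over i = 0..last_halving of 210,000 * floor(50e8 / 2^i)."""
    return sum(HALVING_INTERVAL * (INITIAL_SUBSIDY >> i)
               for i in range(last_halving + 1))

total_units = cumulative_supply_units()
print(total_units / UNITS_PER_BTC)   # about 20,999,999.9769 BTC
```

Under the floor-division reading the sum stops just short of 21 million BTC, and terms beyond \(i=32\) contribute nothing because the per-block subsidy rounds down to zero.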

2.5 Cryptocurrency market-wide momentum

To further substantiate the robustness of our sentiment-return relation, we construct a cryptocurrency market index (CRYIX) in order to capture information content, investor behavior, and momentum effects arising from cryptocurrency markets on aggregate and not necessarily idiosyncratic to only bitcoin exchanges. To construct CRYIX, we use nine of the major cryptocurrencies that are presently in circulation. Our index is market-weighted whereby the weight of each of the respective nine constituents is determined by dividing its market capitalization by the total market capitalization of the nine cryptocurrencies. The construction of this market-weighted index is similar to, say, that of the S&P 500 index, and serves to capture aggregate market conditions. The returns for our CRYIX index, \({R}_{m,t}\), are thus constructed as

$${R}_{m,t}={\sum }_{i=1}^{N}{R}_{i,t}\times \left({MktCap}_{i,t} / {TotalMktCap}_{t}\right)$$
(4)

whereby \({R}_{i,t}\) and \({MktCap}_{i,t}\) are the return and market capitalization of cryptocurrency \(i\), respectively, while \({TotalMktCap}_{t}\) denotes the total market capitalization of all nine \((N=9)\) cryptocurrencies. The CRYIX index used here is constructed for each trading day \(t\) in light of the daily changes in market capitalizations of all the nine cryptocurrencies.
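A minimal sketch of the market-weighted return in Eq. (4) for a single day is given below. The function name, the dictionary-based inputs, and the two-constituent example values are hypothetical and serve only to show the weighting.

```python
def index_return(returns, market_caps):
    """Market-capitalization-weighted return for one day, as in Eq. (4).

    returns, market_caps: dicts keyed by ticker for the same day t
    (a hypothetical input layout).
    """
    if returns.keys() != market_caps.keys():
        raise ValueError("returns and market_caps must cover the same tickers")
    total_cap = sum(market_caps.values())
    return sum(returns[k] * market_caps[k] / total_cap for k in returns)

# Hypothetical two-constituent example: weights are 0.75 and 0.25.
r = index_return({"BTC": 0.02, "ETH": -0.01}, {"BTC": 3.0, "ETH": 1.0})
print(r)  # approximately 0.0125
```

Since the weights are recomputed from each day's market capitalizations, a constituent's influence on the index rises and falls with its share of the total.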

These nine cryptocurrencies were selected to balance two considerations: first, choosing major cryptocurrencies that have some of the largest present-day market capitalizations, since they are the ones that predominantly receive attention in the financial press, and, second, ensuring that the selected cryptocurrencies have a sample range that corresponds as closely as possible with that of our present sample. This is worth noting since there are several emergent cryptocurrencies that, although they may be very large in terms of current market capitalization, have only been in existence and circulation for a rather short period of time.

In light of the aforementioned, the nine cryptocurrencies used for our CRYIX index are as follows, along with their abbreviations: bitcoin (BTC), XRP (XRP), ethereum (ETH), stellar (XLM), litecoin (LTC), dash (DASH), monero (XMR), NEM (XEM), and dogecoin (DOGE). In the appendix, we discuss further the various intricacies of these cryptocurrencies, including summary statistics of their price movements and average market capitalizations (in Table 5), summary statistics of their respective contributions to the CRYIX in terms of market capitalization (in Table 6), and pairwise correlations between the nine CRYIX constituent cryptocurrencies (in Table 7).

Tables 5, 6 and 7 reveal that bitcoin remains the dominant cryptocurrency in terms of market capitalization among all the nine constituents, with an average contribution to the CRYIX of approximately 67% over the sample range. Presently, and at the time of this study, even while considering the nearly 10,000 individual cryptocurrencies in circulation, bitcoin constitutes approximately 50% of the total market capitalization of all cryptocurrencies. Thus, our CRYIX index is fairly representative of conditions and investor expectations in aggregate cryptocurrency markets. The average pairwise correlation among all the nine cryptocurrency constituents (shown in Table 7) is 0.4592. The lowest pairwise correlation is shown to be between Dash and XRP (at 0.3385) while the highest pairwise correlation is shown to be between bitcoin and litecoin (at 0.6697).

Table 3 Quantifying the sentiment-return relation

As is discussed in Sect. 4, CRYIX is used as an additional control variable that captures aggregate market momentum and expectations among cryptocurrency traders. Its inclusion as a control variable will test whether our sentiment measure, as well as our estimated sentiment-return relation, still retains explanatory power across the distribution of bitcoin price changes.

3 Robust approach to quantile modeling

This study seeks to map the linkages between investor sentiment, as defined in Eq. (3), and bitcoin price changes across their entire distribution. This serves the purpose of giving a more complete, accurate, and robust picture of the sentiment-return relation since, as shown in Sect. 2, bitcoin price changes are non-Gaussian in nature with thick tails and pronounced degrees of skewness. Because our sample range encapsulates at least two distinct regimes in bitcoin's price behaviors (pre- and post-futures), there are compelling reasons why a full description of the sentiment-return relation across the entire return distribution is fundamental to our understanding of what drives bitcoin prices.

In addition to using the bootstrapped quantile regression procedure to ensure robustness in coefficient estimates of the sentiment-return relation, this study employs a range of exchange-specific and blockchain-wide control variables. While quantifying the sentiment-return relation and ensuring its robustness is an important objective of this study, a secondary contribution is that we identify heterogeneous linkages, across the return distribution, between bitcoin returns and these variables, which capture conditions in the Coinbase exchange and the blockchain network, respectively. In other words, the explanatory power of these variables (and in some instances their signs, as we discuss later on) varies in terms of location shifts, variance, and skewness.

In light of the aforementioned, we can begin by expressing the conditional quantile model as follows (Buchinsky 1994, 1995; Koenker and Bassett 1978):

$$y_{i}=x_{i}^{\prime}\beta_{\theta}+u_{\theta i}\quad\text{with}\quad\mathrm{Quant}_{\theta}\left(y_{i}|x_{i}\right)=x_{i}^{\prime}\beta_{\theta},\quad i=1,\dots,n$$
(5)

whereby \(\beta_{\theta}\) and \(x_{i}\) are \(K\times 1\) vectors and \(x_{i1}\equiv 1\). \(\mathrm{Quant}_{\theta}\left(y|x\right)\) represents the \(\theta\)th quantile of \(y\) given \(x\). The error term, \(u_{\theta}\equiv y-x^{\prime}\beta_{\theta}\), is assumed to have a continuously differentiable cumulative distribution function, \(F_{u_{\theta}}(\cdot|x)\), with density function \(f_{u_{\theta}}(\cdot|x)\). From Eq. (5), it follows that \(\mathrm{Quant}_{\theta}(u_{\theta}|x)=0\); it is further assumed that \(f_{u_{\theta}}\left(0|x\right)>0\). An estimator for \(\beta_{\theta}\) is obtainable as follows:

$$\underset{\beta }{\mathrm{min}}\frac{1}{n}{\sum }_{i=1}^{n}{\rho }_{\theta }({y}_{i}-{x}_{i}^{^{\prime}}\beta )$$
(6)

where the check function is \(\rho_{\theta}\left(\lambda\right)=\left(\theta-I\left(\lambda<0\right)\right)\lambda\) and \(I(A)\) is the indicator function. As Koenker and Bassett (1978) show, Eq. (6) can be solved as a linear programming problem.
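To make the check function concrete, the following minimal Python sketch evaluates \(\rho_{\theta}\) and minimizes the objective in Eq. (6) for the simplest special case, an intercept-only model (\(x_{i}\equiv 1\)), where the minimizer is a sample \(\theta\)th quantile. The function names and the return values are hypothetical and chosen only for illustration; they are not taken from this study's data or code.

```python
def check_loss(residual, theta):
    """Check function rho_theta(lambda) = (theta - I(lambda < 0)) * lambda."""
    indicator = 1.0 if residual < 0 else 0.0
    return (theta - indicator) * residual


def objective(beta0, ys, theta):
    """Sample objective of Eq. (6) for an intercept-only model (x_i = 1)."""
    return sum(check_loss(y - beta0, theta) for y in ys) / len(ys)


def intercept_only_fit(ys, theta):
    """Minimize the objective over the observed values.

    The objective is piecewise linear and convex in beta0, so a minimizer
    always lies at one of the data points.
    """
    return min(sorted(ys), key=lambda b: objective(b, ys, theta))


# Hypothetical daily returns (in percent), for illustration only.
returns = [-6.0, -2.5, -1.0, -0.2, 0.1, 0.4, 0.9, 1.5, 3.0, 8.0, 12.0]

lower_fit = intercept_only_fit(returns, 0.1)
median_fit = intercept_only_fit(returns, 0.5)
upper_fit = intercept_only_fit(returns, 0.9)
print(lower_fit, median_fit, upper_fit)
```

Negative residuals are weighted by \(1-\theta\) and positive residuals by \(\theta\), which is why a low \(\theta\) pulls the fit toward the left tail and a high \(\theta\) toward the right tail.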

As Powell (1986) demonstrates, Eq. (6) methodologically fits into the generalized method of moments (GMM) model for censored quantile regression frameworks and, under the conditions of Huber (1967), it can be shown that

$$\sqrt{N}\left({\widehat{\beta }}_{\theta }-{\beta }_{\theta }\right)\stackrel{\mathcal{L}}{\to }\mathcal{N}(0,{\Lambda }_{\theta })$$
(7)

whereby

$$\Lambda_{\theta}=\theta\left(1-\theta\right)\left(E\left[f_{u_{\theta}}\left(0|x\right)xx^{\prime}\right]\right)^{-1}E\left[xx^{\prime}\right]\left(E\left[f_{u_{\theta}}\left(0|x\right)xx^{\prime}\right]\right)^{-1}$$
(8)

If we have \(f_{u_{\theta}}\left(0|x\right)=f_{u_{\theta}}(0)\) (i.e., the density of \(u_{\theta}\) at 0 is independent of \(x\)), then \(\Lambda_{\theta}\) simplifies as follows:

$$\Lambda_{\theta}=\left(\theta\left(1-\theta\right)/f_{u_{\theta}}^{2}(0)\right)\left(E\left[xx^{\prime}\right]\right)^{-1}=\sigma_{\theta}^{2}\left(E\left[xx^{\prime}\right]\right)^{-1}$$
(9)

where

$$\sigma_{\theta}^{2}=\theta\left(1-\theta\right)/f_{u_{\theta}}^{2}(0)$$
(10)
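As a rough numerical illustration of Eq. (10), the sketch below evaluates \(\sigma_{\theta}^{2}\) under the purely illustrative assumption that the raw errors are standard normal and independent of \(x\). Because \(u_{\theta}\) is the raw error shifted so that its \(\theta\)th quantile is zero, the density of \(u_{\theta}\) at 0 equals the raw error density evaluated at its \(\theta\)th quantile. The normality assumption and the function name are ours; bitcoin returns are, as noted above, far from Gaussian.

```python
from statistics import NormalDist


def sigma2_theta(theta, error_dist=NormalDist()):
    """Scalar variance factor of Eq. (10) when errors are independent of x.

    The density of u_theta at 0 equals the density of the raw error at its
    theta-th quantile, since u_theta is the raw error minus that quantile.
    """
    q = error_dist.inv_cdf(theta)
    density_at_zero = error_dist.pdf(q)
    return theta * (1.0 - theta) / density_at_zero ** 2


# Assumed standard normal errors, for illustration only.
factors = {theta: sigma2_theta(theta) for theta in (0.1, 0.5, 0.9)}
for theta, value in factors.items():
    print(f"theta={theta:.1f}  sigma^2_theta={value:.3f}")
```

Under this assumption the factor at the median equals \(\pi/2\) and grows toward the tails, where the density is thinner, which is one reason tail-quantile coefficients tend to be estimated less precisely.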

Consistency and asymptotic normality are already well-established in the literature for quantile regression modeling (Koenker 2005). Following Buchinsky (1994, 1995) and Pedersen (2015), among others, our study extracts standard errors through an (x, y)-bootstrapping procedure. This method consists of drawing \(B\) samples of \((x_{t}, y_{t})\) pairs, each of size \(m\), with replacement from the \(N-1\) pairs of the original sample pool, each pair having an equal probability of being drawn. This method can enhance robustness since, first, it does not require identically distributed error terms and, second, it can account for heteroskedasticity in a time series. As shown in Fig. 2, bitcoin price changes are heteroskedastic, exhibiting sharp regime changes at various points in time.

Thus, in the case where \(f_{u_{\theta}}\left(0|x\right)\ne f_{u_{\theta}}(0)\), the (x, y)-bootstrapping method is employed and, for each of the \(B\) samples, an estimate of \(\beta_{\theta}\) is computed, yielding \(B\) bootstrap estimates, \(\widehat{\beta}_{\theta}^{1},\dots,\widehat{\beta}_{\theta}^{B}\). An estimate for \(\Lambda_{\theta}\) can then be expressed as

$$\widehat{\Lambda}_{\theta}=N\left(m/N\right)\left(1/B\right)\sum_{b=1}^{B}\left(\widehat{\beta}_{\theta}^{b}-\widehat{\beta}_{\theta}\right)\left(\widehat{\beta}_{\theta}^{b}-\widehat{\beta}_{\theta}\right)^{\prime}$$
(11)

Buchinsky (1995) conducts an extensive Monte Carlo study and finds that this (x, y)-bootstrapping method can enhance robustness of coefficient estimates. In light of this, and consistent with Pedersen (2015), the size of the bootstrap samples used in this study will be equal to the original sample size.
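The (x, y)-bootstrapping steps can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used in this study: it handles only an intercept and a single regressor, it fits each quantile regression by enumerating lines through pairs of observations (an optimal solution of the linear program interpolates \(K=2\) points, so this is exact but practical only for small samples), and the simulated data, seed, and function names are hypothetical.

```python
import random


def check_loss(residual, theta):
    """Check function rho_theta from Eq. (6)."""
    return (theta - (1.0 if residual < 0 else 0.0)) * residual


def fit_line(pairs, theta):
    """Exact quantile-regression fit of y = b0 + b1 * x for small samples."""
    best, best_loss = None, float("inf")
    for i in range(len(pairs)):
        for j in range(i + 1, len(pairs)):
            (x1, y1), (x2, y2) = pairs[i], pairs[j]
            if x1 == x2:
                continue  # no unique line through these two points
            b1 = (y2 - y1) / (x2 - x1)
            b0 = y1 - b1 * x1
            loss = sum(check_loss(y - b0 - b1 * x, theta) for x, y in pairs)
            if loss < best_loss:
                best, best_loss = (b0, b1), loss
    if best is None:
        raise ValueError("need at least two distinct x values")
    return best


def xy_bootstrap(pairs, theta, n_boot, rng):
    """Pairs bootstrap with m = N; returns beta_hat, Lambda_hat (Eq. 11), SEs."""
    n = len(pairs)
    m = n  # bootstrap sample size equal to the original sample size
    beta_hat = fit_line(pairs, theta)
    lam = [[0.0, 0.0], [0.0, 0.0]]
    for _ in range(n_boot):
        # Draw (x, y) pairs jointly, with replacement and equal probability.
        sample = [pairs[rng.randrange(n)] for _ in range(m)]
        beta_b = fit_line(sample, theta)
        dev = [beta_b[k] - beta_hat[k] for k in range(2)]
        for r in range(2):
            for c in range(2):
                lam[r][c] += dev[r] * dev[c]
    scale = n * (m / n) / n_boot
    lam = [[scale * v for v in row] for row in lam]
    # Eq. (7) implies Var(beta_hat) is approximately Lambda / N.
    std_err = [(lam[k][k] / n) ** 0.5 for k in range(2)]
    return beta_hat, lam, std_err


# Simulated heteroskedastic data (hypothetical, for illustration only).
rng = random.Random(42)
xs = [rng.uniform(-1.0, 1.0) for _ in range(25)]
data = [(x, 0.5 + 2.0 * x + rng.gauss(0.0, 0.3 + 0.5 * abs(x))) for x in xs]

beta_hat, lam_hat, std_err = xy_bootstrap(data, theta=0.5, n_boot=100, rng=rng)
print("beta_hat:", beta_hat)
print("bootstrap standard errors:", std_err)
```

Resampling the \((x, y)\) pairs jointly, rather than resampling residuals, is what allows the procedure to remain valid when the error distribution depends on \(x\). For samples of realistic size, a linear programming or interior-point solver would replace the pairwise enumeration.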

4 Discussion of findings

Given the unique investor clientele and microstructure underlying the Bitcoin ecosystem, it is important to quantify the sentiment-return relation and assess whether it contrasts with the types of relations we see in conventional asset classes. As mentioned, our measure for investor sentiment in Eq. (3) is direct and explicit, and is derived from actual buying and selling behavior. To ensure robustness in our sentiment-return relation estimations, and to control for the factors discussed in sub-Sects. 2.4 and 2.5, respectively, we show our findings in Table 3 across three panels. In panel A, we show the sentiment-return relation estimated using a univariable approach (in other words, there are no control variables and only SENT serves as the regressor). In panel B, we employ a multivariable approach where we include exchange-specific controls (RV, VOLM, and TPM). Finally, in panel C, we use a multivariable approach where we include both exchange-specific and blockchain-wide controls (RV, VOLM, TPM, ADD, BCONT, BSIZE, FEE, HASH, and ISSU).Footnote 8

The findings are extracted using the robust quantile regression approach in Eqs. (5) through (11), with the purpose of producing a higher resolution mapping of the linkages between sentiment (as well as the other control variables) and returns across the distribution of bitcoin's price changes. In panel A we show coefficient estimates of the sentiment-return relation using a univariable approach (no control variables). This serves as a baseline model to gauge the nature of the relation and to assess whether it changes qualitatively after we include controls (in panels B and C, respectively). From panel A, the coefficient for SENT is consistently positive and significant at the 5% level across all of the quantiles. Let us now discuss the interpretation of this result. As mentioned in sub-Sect. 2.3, on any given trading day t, SENT indicates whether investors are optimistic (net buyers, whereby SENT > 0) or pessimistic (net sellers, whereby SENT < 0). In other words, SENT is positive when bid-side orders outweigh ask-side orders and negative when ask-side orders outweigh bid-side orders. The positive coefficients across the quantiles therefore indicate that growing optimism (rising SENT) is associated with positive price changes and, conversely, that growing pessimism (falling SENT) is associated with negative price changes. It is important to emphasize that the sign and significance of the coefficients remain robust across bear periods (lower, left-tail quantiles) and bull periods (higher, right-tail quantiles). This finding strongly suggests that investor sentiment is an important determinant of bitcoin price behavior, much in the same way as other studies show how sentiment can impact the prices of traditional asset classes, such as bonds and equities.

When integrating exchange-specific controls in panel B, we see that the significance of our coefficients for SENT remains qualitatively robust across quantiles. This continues to hold when we also include blockchain-wide controls, as shown in panel C. Namely, across all quantiles, we document a significant sentiment-return relation that is not diluted by the inclusion of important explanatory factors.

Inspection of the full multivariable model with all the controls (panel C), as well as the slope coefficients from this model, shown in Fig. 5, yields many interesting insights. First, there appears to be a volatility feedback effect, which posits a negative risk-return relation, in the left-tail region of bitcoin price changes (specifically, in the 0.10 through 0.50 quantiles) while, in the extreme right-tail region (the 0.80 and 0.90 quantiles), there is a positive and significant relation at the 5% level. The remainder of the quantiles show a weak or insignificant relation. This is interesting to note since the volatility feedback effect is a stylized fact in the return series of traditional asset classes, such as equities or currencies.

Fig. 5 Slope coefficient estimates. This figure shows slope coefficient estimates for the multivariable model which includes exchange-specific and blockchain-wide control variables (panel C of Table 3) for each quantile \(\alpha = \{0.10, 0.20,\dots ,0.80, 0.90\}\). The red lines give the upper and lower bounds of a 95% confidence interval using the bootstrapped standard errors, as discussed in Eqs. (5) through (11)

Second, there is partial evidence in support of a liquidity premium hypothesis, in the spirit of Amihud and Mendelson (1986), but possibly only in the left-tail quantiles (0.10, 0.20 and 0.30). One interpretation of this hypothesis, in broad terms, argues that rising illiquidity (low trading activity) should increase expected returns. In the aforementioned left-tail quantiles we see a negative relation between trade volume, which can proxy for (il-)liquidity, and bitcoin returns. However, we do not see such a relation in the other quantiles. These findings are very preliminary and are not the focal point of our present study. Nonetheless, more research into whether there exists a liquidity premium in cryptocurrency markets would be of interest since others also show that asset size, or the hedging demand of investors, can play a role in whether we see a liquidity premium and what its sign is (Llorente et al. 2002).

Third, Bitcoin microstructure variables, while they portray important characteristics underlying the blockchain, do not provide an infallible explanation as to why bitcoin prices move the way they do. For example, the coefficient for BSIZE shows a sign reversal from the left- to the right-tail quantiles, going from positive to negative. Of all the microstructure variables, FEE is most reliably significant across the quantiles. The sign for its coefficient is also positive, which signifies that increases in fees, arguably from a growing demand for miners' services, are linked to positive bitcoin price changes.

In Table 4, we show estimates for our sentiment-return relation when including, in addition to all the exchange-specific and blockchain-wide controls (as shown in Table 3), lagged returns of the CRYIX index, estimated using Eq. (4). These lags capture up to the past five trading days and, as mentioned, are designed to incorporate momentum and expectations that impact cryptocurrency markets on aggregate. As is discussed in King and Koutmos (2021), the buying and selling activities of momentum-type, or sentiment-driven, traders are conditional upon past price changes. This is consistent with many widely-viewed cryptocurrency websites that make buy and sell recommendations using some form of momentum-based strategy built on historical price data. Such websites are publicly available and can amplify momentum-type trading behaviors.Footnote 9
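For readers who wish to build such lagged regressors, the following minimal Python sketch aligns each trading day \(t\) with the returns from the previous five trading days. The return values and the function name are hypothetical; the actual CRYIX returns follow Eq. (4) and are not reproduced here.

```python
def lagged_columns(series, max_lag):
    """Return rows (t, [r_{t-1}, ..., r_{t-max_lag}]) for every t >= max_lag.

    The first max_lag observations are dropped because their lags are
    not fully observed.
    """
    rows = []
    for t in range(max_lag, len(series)):
        rows.append((t, [series[t - k] for k in range(1, max_lag + 1)]))
    return rows


# Hypothetical index returns (in percent), for illustration only.
cryix_returns = [0.8, -1.2, 0.3, 2.1, -0.5, 0.0, 1.4, -0.9]

rows = lagged_columns(cryix_returns, max_lag=5)
for t, lags in rows:
    print(t, lags)
```

Each row would then be appended to the day-\(t\) regressor vector \(x_{t}\) alongside SENT and the other controls, at the cost of losing the first five observations of the sample.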

Table 4 Controlling for aggregate cryptocurrency market index (CRYIX) momentum

Inspection of our coefficients in Table 4 reveals the following. First, inclusion of the lagged \(CRYIX\) does not change the statistical significance or interpretation of our sentiment-return relation. Second, only \({CRYIX}_{t-1}\) is statistically significant throughout the distribution of bitcoin price changes. It is also positive in its sign, indicating that aggregate market returns at time \(t-1\) are positively related to bitcoin returns at time \(t\). This provides some evidence of short-term persistence in bitcoin price changes and is consistent with the findings of King and Koutmos (2021), who show that past prices can motivate momentum-type trading behaviors. Third, there are no major changes or reversals in signs in the remaining exchange-specific and blockchain-wide variables. One notable difference is the coefficient for \(VOLM\), which in Table 3 is negative and significant only in the lower quantiles but in Table 4 is negative and significant in the lower quantiles and positive and significant in the higher quantiles. This finding, although not the focal point of our study, reveals some complexities in the volume-return relation for bitcoin and makes for interesting future research, especially from the perspective of the liquidity premium hypothesis of Amihud and Mendelson (1986).

Taken altogether, these findings provide compelling evidence that bitcoin prices undergo regime shifts across time, which makes it difficult to extract precise coefficient estimates and to answer the fundamental question of what drives bitcoin prices. A broad advantage of the quantile regression approach implemented herein, apart from the bootstrapping procedure which is performed to assure robust estimates, is that quantile regressions do not assume a particular distribution for the regressand, unlike conventional regression modeling, and can accommodate features of the data such as non-normality and heteroskedasticity. As our findings show, providing a more complete description of how sentiment (as well as other factors) is linked to returns across the distribution of bitcoin price changes highlights the challenges that future cryptocurrency research must consider in terms of model building and the interpretation of model output.

5 Concluding remarks

This study uses trading records of individual bitcoin investors to quantify the linkages between investor sentiment and bitcoin returns. Our measure for investor sentiment is estimated from Coinbase's order book and consists of bid- and ask-side limit and market orders. Using this measure, which is a direct and explicit account of investor sentiment and ensuing trading behavior, we show that rising sentiment is linked to price increases while declining sentiment is linked to price decreases.

To ensure robustness in our findings we, first, perform estimations using a bootstrapped quantile regression approach and, second, employ a host of control variables that capture conditions in the Coinbase market exchange, Bitcoin's overall blockchain, and information content pertaining to major cryptocurrencies other than just bitcoin. We show that our sentiment-return relation remains robust across quantiles and when including controls. This is an important finding since, as other studies argue, bitcoin's price changes frequently undergo regime shifts. As we show herein, after the introduction of futures by the CBOE, bitcoin initially experienced large declines and heightened volatility. This type of volatile price behavior results in many outlier observations and is difficult to model using conventional approaches that assume normally distributed data.

Our findings also reconcile the many observations of other studies, which argue that there is intertemporal instability in coefficient estimates in models seeking to explain bitcoin prices. This instability arises because, first, bitcoin prices undergo regime shifts and, second, conventional regression models tend to focus on the mean or the center of the distribution of bitcoin price changes. We show that such an approach, apart from possibly yielding instability in estimates, obscures important information regarding intervariable relationships during periods of sharp price declines (lower quantiles) versus periods of sharp price increases (higher quantiles). For example, many of the exchange-specific and blockchain-wide variables show heterogeneity across quantiles in terms of their signs and sizes in explaining bitcoin price movements. This heterogeneity is a likely reason for the lack of consensus when attempting to identify what factors are linked to cryptocurrency price movements.

In light of our findings, we recommend that future research into cryptocurrency price movements proceed along at least two (interrelated) paths. The first path can address shortcomings in modeling approaches that focus exclusively on the center of the distribution of cryptocurrencies' price changes. Higher resolution mappings of intervariable relations across the distribution are needed in order to better understand the behavior of these digital coins. The second path can focus on deriving more proxies for investor sentiment. While the investor clientele for cryptocurrencies may differ from that of traditional asset classes, it has been shown in studies that psychology and emotions are strong determinants of asset price movements. More proxies for sentiment can thus help in assessing to what extent cryptocurrency prices move due to investment behavior or psychology. This can then answer broader questions, such as whether the prices of these digital assets are irrational or whether they reflect some intrinsic value.