Beyond the hype: examining the relationship between Wikipedia attention and realised skewness for crypto assets

This study investigates the relationship between Wikipedia searches and the next day’s realised skewness for the top four cryptocurrencies between 2020 and 2022, using a time-varying framework. Daily realised skewness was calculated using one-minute data, and Wikipedia queries were used as a proxy for investor attention. The study reports a positive time-varying relationship between today’s Wikipedia attention and the next day’s realised skewness, with increases in Wikipedia attention on a given day associated with higher realised skewness on the following day. However, there was no significant contemporaneous relationship between Wikipedia attention and realised skewness. The study also found that the relationship between Wikipedia attention and realised skewness becomes more stable over time. The findings suggest that Wikipedia attention may be a useful predictor of realised skewness for cryptocurrencies, which could have implications for investors and market participants.


Introduction
Existing literature has extensively discussed how the gambling preferences of investors affect asset pricing and the resulting consequences. Brunnermeier et al. (2007), Barberis and Huang (2008), and Kumar (2009) use theoretical models and empirical data from the US market to demonstrate that investors have a notable inclination towards stocks that resemble lotteries. As a result of their demand for such securities, the prices of these stocks may become overvalued, leading to lower future returns. According to Kumar (2009), individual investors, particularly those with low incomes, tend to favour investments with low probability but high potential returns. As a result, some investors actively rebalance their portfolios to capture positive skewness in search of extreme positive returns.
Cryptocurrency is one financial asset that has been associated with lottery-like features and gambling preference because of the high volatility of returns (Lin et al. 2021) as well as the dominance of naïve retail traders (Ahmed and Al Mafrachi 2021). Johnson et al. (2023) equate the risky trading associated with cryptocurrency with gambling and note that retail traders often end up with mental health problems as they seek positive skewness. Thus, the features of cryptocurrency markets, as well as the types of investors who invest in this asset, make them susceptible to extreme returns. Liu and Tsyvinski (2021) report that cryptocurrencies have higher average skewness than stocks, which makes them more lottery-like. This is particularly important for short or medium-term investors who may not hold their positions long enough for the average to stabilise (Kumar 2009; Liu and Tsyvinski 2021). The cryptocurrency markets' inefficiencies and volatile price behaviour present an opportunity to investigate behavioural factors influencing cryptocurrency prices. Additionally, retail investors dominate the cryptocurrency market, offering an intriguing platform to investigate the effects of attention-driven trading in this particular asset class.
The theory of "limited attention" (Kahneman 1973; Pashler and Johnston 1998) suggests that individuals have limited cognitive resources to devote to processing information and making decisions. This means that people cannot fully analyse and evaluate all available information and must instead rely on simplified mental shortcuts or heuristics to make decisions. The theory of limited attention is particularly relevant in the context of investment decision-making. Given the vast amount of financial information available, investors cannot possibly process and evaluate every piece of information and must instead focus on a limited set of information. This can lead to biases in decision-making, as investors may rely too heavily on easily accessible information or ignore important information that is not at the forefront of their attention. For example, limited attention can lead to herding behaviour, as investors may rely on the attention and opinions of others to make investment decisions rather than conducting their own independent analysis. Barber and Odean (2008) report that, due to an individual's limited cognitive capacity, only stocks that catch the attention of investors are likely to be considered for investment. They also observe that this limited attention has an asymmetrical effect on buying and selling decisions, leading to net buying behaviour for stocks that attract attention. Thus, investor attention can play a crucial role in investors' recognition of stock gambling characteristics and their subsequent trading behaviour. This means that a positive relationship between investor attention and skewness can be hypothesised. In line with the theory of limited attention, the lack of adequate information about cryptocurrencies makes investors resort to readily available information from online sources like Wikipedia, Google search queries and online news platforms.
We therefore formally investigate the time-varying relationship between one of these sources of information, Wikipedia queries, and realised skewness, and whether Wikipedia attention contributes to positive realised skewness in cryptocurrencies.
In the context of cryptocurrencies, several studies have examined how crypto assets are affected by investor attention, but mostly using the first two moments: returns and volatility. Zhang and Wang (2020) examine the underlying connections between the top 20 cryptocurrencies and investor attention as measured by Google Trends between April 2013 and April 2018. The authors show that linear and nonlinear Granger causality tests support bi-directional causality between investor attention and cryptocurrency returns and volatility. Zhu et al. (2021) provide evidence that points to investor attention being the primary driver of changes in the return and realised volatility of the Bitcoin market. Smales (2022b) looks at how the market dynamics for Bitcoin and other cryptocurrencies relate to investor attention and indicators of uncertainty. The study reports that when investor interest grows, cryptocurrency markets see larger returns, more volatility, and increased illiquidity.
The quantile causality approach was used by Subramaniam and Chakraborty (2020) to explore the impact of investor attention on cryptocurrency prices. The findings show that investor attention affects the pricing of emerging cryptocurrencies like Ripple only when they perform well. In the study, the prices of cryptocurrencies during expansionary phases and fear selling during subpar market performance are used as support for the attention-induced price pressure hypothesis. The quantile regression results show that a high level of investor interest is always linked to a profitable outcome. The study reported that investor attention can considerably predict return and volatility based on a regression framework utilising the hash algorithm.
Rather than seeking the predictive power of investor attention in the first two moments, other studies have concentrated on the lower tail of the return distribution by investigating investor attention vis-à-vis crash risk. To ascertain the impact of investor attention on crash risk, Smales (2022a) uses a quantile regression technique. Crash risk is estimated using the down-up volatility and negative coefficient of skewness. When the crash risk is low (below median quantiles), investor attention is positively correlated with crash risk, but this correlation is negative when the crash risk is high (above median). The results hold for a variety of crash risk measures, alternative internet searches and a panel of significant cryptocurrencies besides Bitcoin.
We expect this study to contribute to the body of knowledge in several ways. First, the studies that have looked at the relationship between investor attention and the first two moments (i.e. returns and volatility) have mostly used static linear models that assume that the strength of the relationship does not change over time. We adopt a time-varying regression framework to unveil the evolution of the strength of the relationship between investor attention and realised skewness to capture the effects of external shocks distributed across time. Second, the majority of the studies have rather used other proxies of investor attention like Google Search Volume (e.g. Smales 2022b). Capitalising on recent results showing that Wikipedia can be used as a proxy for the overall attention on the web (ElBahrawy et al. 2019), our analysis relies on data from the popular online encyclopaedia. Empirically, we utilise the concept of realised skewness of Amaya et al. (2015) using high-frequency data gathered at 1-min intervals to construct daily realised skewness measures. We then empirically test whether Wikipedia searches have a time-varying association with realised skewness. Our findings show that the association between realised skewness and investor attention is not static but is time-varying. Notably, increases in investor attention are associated with increases in realised skewness, providing evidence of the potential of investor attention in contributing to the positive realised skewness of cryptocurrency returns.
Our study proceeds as follows: the "Methodology" section presents the methodology, the "Results" section outlines the results and their discussion, and the "Conclusion" section concludes.

Data
Our study is based on a panel of four cryptocurrencies, namely Bitcoin (BTC), Litecoin (LTC), Ethereum (ETH) and XRP (XRP). Our choice of these cryptocurrencies is motivated by their market capitalisation and liquidity and, most importantly, the availability of 1-min interval data. The closing intraday (1-min) crypto prices are sourced from https://www.cryptodatadownload.com/ (Binance Exchange) and Wikipedia search volumes are gathered from Wikipedia. The sample period runs from 11 January 2020 to 22 June 2022. The starting date of the sample period is motivated by the availability of minute-level data for all the sampled cryptocurrencies.

Variables
We used Wikipedia daily page views (Wiki) for each cryptocurrency as a proxy for investor attention. In line with related literature, we utilise specific search terms for each cryptocurrency, like "Bitcoin" and "Ethereum". In our empirical analyses, we also control for market uncertainty using three different measures in line with Smales (2022b). The first variable in that regard is the economic policy uncertainty (EPU) index of Baker et al. (2016). The index is constructed from the occurrence of newspaper articles that contain three terms namely, uncertainty, economy and policy. We also control for the CBOE S&P500 implied volatility index (VIX) as a proxy for financial market uncertainty. A recent proxy for uncertainty, the cryptocurrency uncertainty index (UCRY) specifically directed at cryptocurrency, was created by Lucey et al. (2022) following Baker et al. (2016). We also control for this index which is created from news associated with uncertainty in the cryptocurrency market. We also control for the logarithm of trading volume (vol) and cryptocurrency return (ret), which is the percentage change in the daily closing price of each cryptocurrency.
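The ith intraday log return for each cryptocurrency on day t is defined as:

$$r_{t,i} = \ln P_{t,i} - \ln P_{t,i-1}, \quad i = 1, \dots, N \tag{1}$$

where $P_{t,i}$ is the price of the cryptocurrency at the ith intraday observation on day t and N is the number of return observations in a trading day.
In line with Amaya et al. (2015), we define ex-post realised daily skewness based on intraday returns standardised by realised variance as follows:

$$Rskew_t = \frac{\sqrt{N}\,\sum_{i=1}^{N} r_{t,i}^{3}}{RV_t^{3/2}}, \qquad RV_t = \sum_{i=1}^{N} r_{t,i}^{2} \tag{2}$$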
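As an illustration of how a daily realised skewness value could be computed from one-minute closing prices, the following minimal sketch assumes the Amaya et al. (2015) estimator takes the form sqrt(N) times the sum of cubed intraday log returns, divided by realised variance raised to the power 3/2. The function names and the price series are hypothetical and are not taken from the study's data or code.

```python
import math

def intraday_log_returns(prices):
    """Log returns r_i = ln(P_i) - ln(P_{i-1}) from consecutive intraday prices."""
    return [math.log(p1) - math.log(p0) for p0, p1 in zip(prices, prices[1:])]

def realised_variance(returns):
    """Realised variance: the sum of squared intraday returns."""
    return sum(r * r for r in returns)

def realised_skewness(returns):
    """Realised skewness: sqrt(N) * sum(r^3) / RV^(3/2)."""
    n = len(returns)
    rv = realised_variance(returns)
    if rv == 0.0:
        raise ValueError("realised variance is zero; skewness is undefined")
    return math.sqrt(n) * sum(r ** 3 for r in returns) / rv ** 1.5

# Hypothetical one-minute closing prices with a single large upward move.
prices = [100.0, 100.1, 100.0, 100.1, 100.0, 101.0]
rets = intraday_log_returns(prices)
rskew = realised_skewness(rets)
print(f"realised skewness: {rskew:.4f}")
```

A day dominated by one large upward move yields a positive value, while mirroring every return flips the sign, which is the lottery-like asymmetry the measure is meant to capture.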

Empirical method
The main thrust of this study is to reveal the dynamic relationship between investor attention and realised skewness. However, before running the time-varying model, for comparison, we first attempt to find a static relationship between the variables. To guard against multicollinearity and overfitting, and to ensure a robust predictive model, we use Principal Component Regression (PCR) (Jolliffe 2002). In the PCR, the predictor variables are standardised to have a mean of zero and a standard deviation of one. This ensures that all variables are on the same scale. A principal component analysis (PCA) is then performed on the standardised predictor variables to generate a set of orthogonal principal components, $Z_1, Z_2, \dots, Z_q$, where $q$ is the number of principal components selected for inclusion in the PCR model. A linear regression model is then constructed using the selected principal components as predictors, as specified in Eq. (3), where $Rskew_{i,t+1}$ is the realised skewness of cryptocurrency $i$ at time $t+1$; $Wiki_{i,t}$ is the logarithm of investor attention (Wiki) of cryptocurrency $i$ at time $t$; $Rskew_{i,t}$ is the realised skewness of cryptocurrency $i$ at time $t$; $Vol_{i,t}$ is the logarithm of cryptocurrency $i$'s traded volume at time $t$; and $UTY_{j,t}$ is the logarithm of uncertainty measure $j$ (EPU, VIX and UCRY).
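The PCR steps described above (standardise the predictors, extract orthogonal principal components, regress the response on the retained components) can be sketched as follows. This is an illustrative sketch on simulated data, not the study's implementation: the function `pcr_fit`, the variable names and the simulated series are all assumptions, and the number of retained components is chosen arbitrarily.

```python
import numpy as np

def pcr_fit(X, y, n_components):
    """Principal component regression: standardise, PCA, then OLS on the scores."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Standardise each predictor to mean 0 and standard deviation 1.
    Xs = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # PCA via SVD: the rows of Vt are the principal directions (loadings).
    _, _, Vt = np.linalg.svd(Xs, full_matrices=False)
    V = Vt[:n_components].T          # loadings of the retained components
    Z = Xs @ V                       # orthogonal component scores Z_1..Z_q
    # OLS of y on an intercept and the retained scores.
    A = np.column_stack([np.ones(len(y)), Z])
    gamma, *_ = np.linalg.lstsq(A, y, rcond=None)
    # Map component coefficients back to the standardised predictors.
    beta_std = V @ gamma[1:]
    return gamma[0], beta_std, Z

# Simulated (hypothetical) predictors; two of them are deliberately correlated.
rng = np.random.default_rng(0)
n = 500
wiki = rng.normal(size=n)
vol = 0.8 * wiki + 0.6 * rng.normal(size=n)
uty = rng.normal(size=n)
X = np.column_stack([wiki, vol, uty])
y = 0.3 * wiki + 0.1 * uty + rng.normal(scale=0.5, size=n)

intercept_2, beta_2, Z_2 = pcr_fit(X, y, n_components=2)
intercept_full, beta_full, Z_full = pcr_fit(X, y, n_components=3)

# Reference: plain OLS on the standardised predictors.
Xs = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
beta_ols = np.linalg.lstsq(np.column_stack([np.ones(n), Xs]), y, rcond=None)[0][1:]
print(beta_2, beta_full, beta_ols)
```

When every component is retained, PCR reproduces ordinary least squares on the standardised predictors; dropping low-variance components is what trades a little bias for stability when predictors are collinear.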
The choice of the control variables is motivated by existing literature (e.g. Smales 2022a, b), which has detailed the determinants of cryptocurrency prices. In further analysis, we examine the dynamics of the coefficients of the relationship depicted in Eq. (3) using the time-varying panel linear model (tv-PLM) of Casas and Fernandez-Casal (2019). The time-varying coefficients are obtained by combining ordinary least squares with the local polynomial kernel estimator (Fan and Gijbels 1996). We use a model based on the Nadaraya-Watson estimator as described in Casas and Fernandez-Casal (2019). A time-varying coefficient linear model is given as follows:

$$y_t = x_t^{\top}\beta(z_t) + u_t, \quad t = 1, \dots, T$$

where the error term $u_t$ is as defined for classical ordinary least squares, but the coefficients vary and are functions of the smoothing variable $z_t$. This means that $\beta(z_t) = (\beta_0(z_t), \beta_1(z_t), \dots, \beta_d(z_t))^{\top}$ varies over time. Assuming that $\beta(\cdot)$ is twice differentiable, an approximation of $\beta(z_t)$ around $z$ is given by the Taylor rule, $\beta(z_t) \approx \beta(z) + \beta^{(1)}(z)(z_t - z)$, where $\beta^{(1)}(z) = d\beta(z)/dz$ is its first derivative. The estimates solve the following minimisation:

$$\left(\hat{\beta}(z), \hat{\beta}^{(1)}(z)\right) = \arg\min_{\theta_0, \theta_1} \sum_{t=1}^{T} \left[y_t - x_t^{\top}\theta_0 - (z_t - z)\,x_t^{\top}\theta_1\right]^2 K_b(z_t - z)$$

which fits a set of weighted local regressions with an optimally chosen window size given by the bandwidth $b$, and the weights are given by $K_b(z_t - z) = b^{-1}K\left((z_t - z)/b\right)$ for a kernel function $K(\cdot)$. This further analysis will provide evidence of whether the relationship between investor attention and realised skewness is constant or heterogeneous over time.
Table 1 shows the summary statistics of the raw variables used in the study, before they are converted to their principal components. The mean of the realised skewness is negative, an indicator that on average the return distribution is negatively skewed. The Jarque-Bera statistics are all statistically significant at the 1% level, showing that the null hypothesis of a normal distribution can be rejected, and it can be concluded that the variables are not normally distributed. This justifies methods like the time-varying ordinary least squares used in this study, which is semi-parametric.
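To make the kernel-weighted local regression behind the time-varying model more concrete, the sketch below estimates coefficients that drift over time using the local-constant (Nadaraya-Watson type) variant, with scaled time t/T as the smoothing variable. It is a simplified single-series illustration and not the tv-PLM of Casas and Fernandez-Casal (2019): the Epanechnikov kernel, the fixed bandwidth, the function names and the simulated data are all assumptions made for the example.

```python
import numpy as np

def epanechnikov(u):
    """Epanechnikov kernel: 0.75 * (1 - u^2) on [-1, 1], zero elsewhere."""
    return np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u ** 2), 0.0)

def tv_ols(X, y, bandwidth):
    """Local-constant kernel estimate of beta(z) at each z_t = t / T.

    For every time point, run a weighted least squares regression in which
    observations close in time receive larger kernel weights.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    T = len(y)
    z = np.arange(1, T + 1) / T
    betas = np.empty((T, X.shape[1]))
    for s in range(T):
        w = epanechnikov((z - z[s]) / bandwidth) / bandwidth
        XtW = X.T * w                              # weight each observation
        betas[s] = np.linalg.solve(XtW @ X, XtW @ y)
    return betas

# Simulated (hypothetical) data: the slope drifts linearly from 1.0 to 1.5.
rng = np.random.default_rng(1)
T = 400
z = np.arange(1, T + 1) / T
x = rng.normal(size=T)
X = np.column_stack([np.ones(T), x])
beta_true = 1.0 + 0.5 * z
y = 0.5 + beta_true * x + rng.normal(scale=0.05, size=T)

betas = tv_ols(X, y, bandwidth=0.1)
print("estimated slope at start and end:", betas[0, 1], betas[-1, 1])

# Sanity check: constant coefficients are recovered when nothing varies.
y_const = 2.0 + 3.0 * x
betas_const = tv_ols(X, y_const, bandwidth=0.2)
```

The bandwidth controls the trade-off: a smaller value tracks changes in the coefficients more closely but uses fewer observations per local regression, which widens the confidence bands.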
The ADF statistics are all statistically significant at the 1% level, showing that the null hypothesis of a unit root can be rejected, and it can therefore be concluded that the variables are all stationary. In Fig. 1, we present the pairwise relationships among the raw variables. The strength and direction of the relationships between the variables are visually represented in the correlogram. Upon examination of the pairwise relationships, it is evident that most of the variables display a significant and moderate correlation with one another. However, some variables exhibit a stronger correlation than others. In particular, the strongest pairwise correlation observed in this study is between Wikipedia searches and trading volume, with a correlation coefficient of 0.767. Regarding the focus of this study on the correlation between investor attention and realised skewness, it is worth noting that the relationship between these two variables is positive and statistically significant at the 5% level, with a correlation coefficient of 0.067.
Fig. 1 Correlogram. Notes: The figure shows a visualisation of the relationships among the variables. The variables Vol, RSkew, UCRY, VIX, Wiki and return represent, respectively, the natural logarithm of the volume of cryptocurrency traded in US dollars, realised skewness, the cryptocurrency uncertainty index, the VIX, Wikipedia searches and cryptocurrency returns. The lower triangle shows the pairwise scatter plots, the diagonal shows the frequency density and the upper triangle shows the Pearson correlation coefficients. ***, ** and * indicate statistical significance at the 1%, 5% and 10% levels, respectively.

Empirical results
In this section, we report the empirical findings on the relationship between realised skewness and investor attention. First, we report the findings from the baseline model using PCR to establish the static relationship between the variables in Table 2. Model 1 in Table 2 relates today's investor attention to the realised skewness of the following day and shows a statistically significant positive relationship between these two variables. On the other hand, Model 2 shows a statistically insignificant contemporaneous relationship between investor attention and realised skewness. This implies that although investor attention has predictive power for the next day's realised skewness, there is no evidence of the two variables comoving in the same period. It is important to note that although there is a statistically significant predictive relationship between investor attention and realised skewness, this relationship may be dynamic and may vary over time. To address this, the study utilises a time-varying panel linear model (tv-PLM) to examine the changing relationship between investor attention and realised skewness. The results of this model are presented in Fig. 2, which provides a more comprehensive understanding of the dynamics of the relationship between these two variables over time.
Table 2 Notes: In this table, the dependent variable is the next day's realised skewness (Model 1) and today's realised skewness (Model 2). ***, ** and * represent statistical significance at the 1%, 5% and 10% levels, respectively. t statistics are shown in brackets.
Figure 2 demonstrates the time-varying relationship between realised skewness and investor attention. The first panel of Fig. 2 shows the relationship between today's investor attention and the next day's realised skewness, while the second panel shows the contemporaneous relationship between the two variables.
In the first panel, Wiki is statistically significant and entirely positive throughout the period. This could be interpreted as evidence that increases in Wikipedia attention may lead to an increase in the probability of observing more extreme positive values in the next day's data, which would result in positive skewness. This is in line with Yao et al. (2019), who report that trading behaviour after attention may be the main functional channel of individual investor gambling behaviour. The Wiki coefficients are high at the beginning of the sample period, reach a minimum at the beginning of 2021 and rise again thereafter, before stabilising at the end of the sample period. We can therefore infer that the relationship between investor attention and realised skewness is not static but is time-varying and seems to respond to external shocks at specific periods (Ben El Hadj Said and Slim 2022).
Second, we observe wider 95% confidence bands at the beginning of the sample period, which narrow as we approach the end of the sample period. The width of the confidence bands is an indication of the uncertainty or variability of the estimated relationship (Caldara et al. 2016) between Wikipedia attention and realised skewness. A wider confidence interval suggests a greater degree of uncertainty in the estimated relationship, while a narrower confidence interval indicates more confidence in the estimated relationship. The fact that the confidence bands are wider at the beginning of the sample period may indicate that the relationship between Wikipedia attention and realised skewness was less well defined or more variable during that time. Since wider confidence bands coincide with the early stages of the COVID-19 pandemic, the associated uncertainty during this period could explain the variability in the coefficients.
The second panel of Fig. 2 shows the time-varying contemporaneous relationship between Wikipedia attention and realised skewness. The relationship is positive at the beginning of the period before it turns negative towards the end of the sample period. However, the time-varying Wikipedia coefficients are statistically insignificant at the 5% level of significance across the sample period. The lack of a significant contemporaneous relationship between Wikipedia attention and realised skewness on the same day may indicate that there is a lag or delay between changes in Wikipedia attention and the corresponding effects on realised skewness. Alternatively, it could suggest that other factors have a more immediate effect on the distribution of the data, such as changes in market conditions or other external events.
The time-varying association between investor attention, as measured by Wikipedia searches, and realised skewness shows that the relationship is not static but varies over time. However, one significant drawback of utilising Wikipedia to gauge interest is the difficulty in differentiating between interest stemming from positive or negative events (Kristoufek 2013). Thus, during a period of heightened uncertainty, increased attention on financial assets may stem from a combination of positive and negative feedback that may lead to a neutral effect on financial assets.

Conclusion
The study investigated the relationship between Wikipedia attention and realised skewness for four cryptocurrencies using daily data. The study calculated realised skewness daily using one-minute data and measured Wikipedia attention as a proxy for investor interest or attention. The study found a positive time-varying relationship between Wikipedia attention and realised skewness for the four cryptocurrencies analysed. Specifically, the results indicate that increases in Wikipedia attention on a given day are associated with higher realised skewness on the following day.
However, there was no significant contemporaneous relationship between Wikipedia attention and realised skewness. The fact that the 95% confidence bands are wider at the beginning of the sample period and narrow over time may suggest that the relationship between Wikipedia attention and realised skewness becomes more stable or well defined as more data become available. Overall, the study's findings could provide useful insights for policymakers and regulators who are interested in monitoring and regulating the rapidly evolving cryptocurrency market. However, it is important to note that the results are based on a specific dataset and may not be generalisable to other cryptocurrencies or periods. Further research would be needed to confirm and extend these findings.
Funding
Open access funding provided by University of the Witwatersrand.

Conflict of interest
The authors whose names are listed immediately below certify that they have NO affiliations with or involvement in any organisation or entity with any financial interest (such as honoraria; educational grants; participation in speakers' bureaus; membership, employment, consultancies, stock ownership or other equity interest; and expert testimony or patent-licencing arrangements) or non-financial interest (such as personal or professional relationships, affiliations, knowledge or beliefs) in the subject matter or materials discussed in this manuscript.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.