1 Introduction

Retail trading has soared since the beginning of 2020, stimulated by stay-at-home-advisory mandates and lack of other entertainment options on the one hand and new, accessible app-based brokers without order fees on the other hand. Citing data from web data provider SimilarWeb, Ozik et al. (2021) report a 115% increase in traffic in Q2/2020 for Robinhood, the most popular app-based retail broker, along with large increases for other retail brokers. Investment firm Citadel estimates that retail traders more than doubled their share of stock-market activity in H1 2020 compared to 2019, accounting for more than 20% of trade volume, with an even sharper rise of retail activity in the option market.Footnote 1 At the same time, sharing investment opinions in online communities has become a common phenomenon in recent years. While earlier social-media platforms with a focus on investments or stock markets mostly appealed to relatively few, sophisticated users and investment-specific data from general social-media platforms like Twitter or Facebook is sparse and difficult to filter, the WallStreetBets community on Reddit rose to almost 10 million users in early 2021 and contains more than 40 million user posts in our sample from the start of 2020 to March 2021, enabling us to analyze the relationship between posts and trading activity in 30-minute windows.

Among these posts, more than 4 million can be attributed to a single company whose share price rose more than 20-fold in January 2021 without any fundamental news release or discernible economic reason: GameStop. While several other factors, which are partially discussed in Sect. 3.1, may have contributed to the surge or could have ignited the initial interest in this specific company, media reports and online discussion suggest that WallStreetBets played a crucial role in this situation. The chair of the SEC, Gary Gensler, acknowledged that “this winter’s events also highlighted the rapidly changing face of social media and its intersection with our capital markets.”Footnote 2 However, the nature of the relationship between social media posts and trading activity is less clear than it seems and data availability and noisiness problems are complicating the scientific and forensic examination of events. As high-frequency empirical data about how modern social media platforms affect trading activity is still scarce, the recent situation around GameStop is uniquely suited to analyze whether prior results can be transferred to this new high-volume social media platform in a stock market environment with increased volatility and retail participation.

While there is a breadth of literature showing that online activity can have an impact on the stock market (e.g., Antweiler and Frank 2004, with one of the first studies of a large social media dataset, showing a positive relationship between daily Yahoo message board comments and trading volume), the mechanism behind this relationship is still a much-discussed topic. Alternative explanations about how Internet postings affect individuals’ decision-making include i) effects of general sentiment and disagreement (e.g., Tetlock 2007; García 2013; Cookson and Niessner 2020; Guégan and Renault 2021), ii) more specific social interaction and social transmission bias (e.g., Hirshleifer 2020; Cookson et al. 2021) or iii) the effect of investor attention itself, independent of the impact on beliefs and sentiment (e.g., Barber and Odean 2008; Da et al. 2011). Multiple recent working papers try to shed light on different aspects of the GameStop surge and disentangle possible effects (e.g., Long et al. (2021) and Umar et al. (2021) with a focus on sentiment; Hasso et al. (2021) and Pedersen (2021) on investor types and social interaction or Vasileiou et al. (2021) on web searches and attention); however, empirical evidence of an intraday effect of Reddit comments on trading activity is still lacking.

Our contribution to the existing literature is twofold: First, our study centers on the question of whether social media posts are indeed a short-term driver of (retail) stock trading activity and we can confirm previously observed relationships in a dynamic, highly unusual market environment. We also extend the existing evidence by adding option-based volume measures to our analyses, which, to the best of our knowledge, has not been done before in this context as retail option trading is a fairly recent phenomenon. Second, we try to disentangle the effect on retail and on other trades and examine the informativeness of Reddit posts for future price changes. The informativeness of online posts and retail trading is still a controversial topic among researchers with conflicting results in different settings. Using the raw number of comments without regard for their content or tone, we add to the attention-based strand of literature on the relationship between social media and trading activity. Regardless of the direction and exploitability of trading activity and pricing changes, the existence of a direct link between social media activity and stock buying is at odds with conventional economic theory and the efficient market hypothesis. This is especially apparent in the case of GameStop, where there was no confounding dissemination of material new information and a specific group of investors bought stocks and options at valuation levels that far exceeded any reasonable fair value. Our results show that an increase in Reddit posts on GameStop is followed by a significant increase in the GameStop trading volume for both stocks and options in the following 30-minute window. This effect is economically large, robust over multiple specifications and found for both the overall volume and the volume of retail investors. However, we are not able to reliably distinguish between the effect on retail and other trades using multiple criteria to classify retail trades. Additionally, we cannot establish causality beyond reasonable doubt, as the effect is not one-directional and the logarithmic rise of comments and trade activity in the sample poses challenges for empirical analyses.

In further results, we don’t find that Reddit posts are informative for GameStop returns. There is no significant relationship between Reddit posts and realized abnormal returns in the following 30-min window. We also find no significant relationship to marketable retail order imbalances as defined by Boehmer et al. (2021). This confirms e.g., the results by Antweiler and Frank (2004) and validates the result of Ammann and Schaub (2021), that “mainly followers who are typically considered to be unsophisticated [...] trade after comment postings” on a recent, large-sample case study.

The remainder of this paper is organized as follows: Sect. 2 contains a brief literature review, first elaborating on possible different mechanisms explaining the impact of online activity on stock and option trading in Sect. 2.1 before reviewing previous results in Sect. 2.2. Section 3 introduces the institutional setting that makes our analysis relevant, with an overview of Reddit and WallStreetBets as a specialized but widely used social media platform in Sect. 3.1 and a perspective on the enormous growth of retail trading in 2020 and 2021 in Sect. 3.2. Section 4 then explains data and methodology, before empirical results are presented and discussed in Sect. 5. Finally, Sect. 6 provides some directions for future research and concludes with a summary.

2 The effect of social media posts on trading activity and returns

The possible effect of online activity on asset prices and whether social media activity can contain information for the formation of prices is the topic of many studies in the area after De Long et al. (1989) coined the term “noise trading” and Daniel et al. (2002) established that psychological biases affect investor behavior and prices and concluded that “if investors are foolishly aggressive in their trading, they may earn higher rewards for [...] exploiting information signals more aggressively.” Following them, Antweiler and Frank (2004) then showed empirically that a relationship between online activity and trading volume can be demonstrated, using a dataset of Yahoo! Finance and Raging Bull comments. Since then, the mechanisms through which online communities impact (retail) investor behavior are an important topic in economic research and are well-discussed.

2.1 Possible explanations for a relationship between online and trading activity

The suggested mechanisms that affect the relationship of social media activity and trading activity by altering the decision-making of individual market participants can mainly be summarized into three strands of literature:

First, approaches that focus on investor Sentiment, including the tone, heterogeneous beliefs and also disagreement. Historically, most research focused on price effects of indirectly measured, “top-down” sentiment (e.g., Baker and Wurgler 2007; Tetlock 2007; García 2013; Kumar et al. 2020, finding for example that tone in newspaper articles can lead to pricing anomalies and results that “are consistent with theoretical models of noise and liquidity traders”). Related to our case study, Long et al. (2021) classify Reddit posts into sentiment categories and find that “both tone and number of comments influence GameStop intraday returns.” However, these effects are elusive and data suggests that textual sentiment classification into emotion-based categories is very challenging for WallStreetBets posts, as these contain a lot of different slang, memes and emoticons which are barely understandable for uninitiated readers or parsers.Footnote 3 More recent research also focuses on “bottom-up” investor sentiment, demonstrating how individual sentiment can affect trading activity (e.g., Cookson and Niessner 2020, who show that disagreement on an online platform can have an impact on trading volume).

Second and related to the “bottom-up” sentiment approach, there is the recent strand of Social Interaction, going beyond the cursory and visible measures often used for sentiment analysis. For example, Heimer (2016) explains the disposition effect with social interaction and peer pressure. Hirshleifer (2020) introduced the term “social transmission bias” that offers an endogenous social explanation for “action booms, price bubbles, and swings in investor sentiment” in contrast to the exogenous explanation of most sentiment literature. One recent example is the “echo chambers” on the StockTwits social media platform analyzed by Cookson et al. (2021). We think that the existence of similar echo chambers on WallStreetBets is likely, but impossible to empirically validate in our GameStop case study.

Thus, we concentrate on the third explanation for the relationship of social media and trading activity: Investor Attention. Barber and Odean (2008) showed that individual investors are more likely to buy attention-grabbing stocks, employing a range of proxies like exceptional volume, returns or news coverage. Da et al. (2011) extended this approach using the Google search volume index. They argue that an impact on investor beliefs or sentiment is not necessary to drive retail trading volume. Ben-Rephael et al. (2017) further validate this hypothesis, showing a similar effect for institutional investors as well. More recently, Ammann and Schaub (2021) find empirical evidence for a significant correlation between public Internet postings by traders and the investment activity of followers. However, they also show, “that it is mainly followers who are typically considered to be unsophisticated who trade after comment postings.” Related to our empirical choices and concentration of raw post count as measure of attention, Behrendt and Schmidt (2018) find that the amount of social media posts yields better results compared to sentiment measures, using a Twitter dataset, fitting to our conjecture of an attention-based mechanism.

2.2 Prior results on retail and online activity-induced trading and returns

While many studies find some significant relationship between online activity and trading volume, published results about the impact of investor attention and sentiment on future returns and the informativeness of online activity-induced trading are not very consistent. Whereas some published studies, at least partially, document a significant relation between online posts and returns (e.g., Chen et al. 2014; Avery et al. 2016; Crawford et al. 2017; Bradley et al. 2021, the latter on a small subset of Reddit WallStreetBets posts), other researchers cannot confirm a significant relationship in that regard (e.g., Tumarkin and Whitelaw 2001; Dewally 2003; Antweiler and Frank 2004; Kim and Kim 2014; Behrendt and Schmidt 2018; Nisar and Yeung 2018; Ammann and Schaub 2021), which mirrors the results of studies surveying the effect of media news on stock returns (e.g., Campbell et al. 2012; García 2013; Ahmad et al. 2016).

With regard to retail trading in general, the empirical evidence on the predictability of future stock returns is mixed as well. While many early studies in this strand of literature such as Barber et al. (2006) find that individual investors’ trading provides no information for equity markets and prices and that individual investors often achieve negative returns (e.g., Barber and Odean 2000), more recent studies such as Kelley and Tetlock (2013), Barrot et al. (2016) or Boehmer et al. (2021) find that retail investor trading can be informative and/or predict the cross section of future stock returns.

Fittingly for the GameStop scenario, Han and Kumar (2013) find empirical evidence that retail investors in contrast to institutional investors prefer “stocks with high volatility, high skewness and low prices.” In addition, the authors document that retail traders that prefer lottery stocks are often younger and male and have a lower income and lower education compared to other investors, as well as a strong gambling propensity (e.g., Kumar and Lee 2006, come to a similar conclusion), which matches well with commonly observable behavior of WallStreetBets users and could be an additional explanation for the lack of informativeness of comments.

3 Institutional setting: WallStreetBets, GameStop and the surge of retail trading

3.1 WallStreetBets and GameStop

Reddit is a social media platform that was founded in 2005. Like other social media platforms, contributors are able to post content which can then be commented on by other users. Reddit is a collection of forums, which are called subreddits and where each subreddit is dedicated to a specific topic. WallStreetBets, which is now one of Reddit’s largest subreddits with more than 11 million subscribers, was created in 2012 and focuses on speculative equity trading.Footnote 4 As speculative trading and “gambling” is emphasized, it is reasonable to assume that retail trading activity originating from WallStreetBets may exhibit different characteristics than retail trading activity from more conventional sources of investment advice or discussion.

However, due to this explicit focus on speculative trading, the high probability that active WallStreetBets users indeed engage in stock and option tradingFootnote 5 and the high volume of posts that often concentrates on a few feverishly discussed stocks, WallStreetBets recently became a valuable data source for the analysis of retail investor behavior (see e.g., Long et al. 2021; Bradley et al. 2021) and we regard the platform as uniquely suited for our research.

Starting in 2020, users of the message board WallStreetBets on Reddit turned their attention to the stock of the struggling video game retailer GameStop. While only a few users discussed the stock at firstFootnote 6, hundreds and thousands of retail investors joined them in early 2021, when the GameStop stock surged due to the expectation of an imminent short squeeze. While GameStop opened in 2021 on January 4th with a price of $19.03, the closing price on January 21st was already $43.03, an increase of more than 100%, without any new fundamental information released by the company in the meantime. However, the real surge had barely started by then: In the following five trading days, the share price increased 10-fold again and reached a top of $483 in the morning of Thursday, January 28th, before major brokers disabled market participants’ ability to open new or increase existing positions in GameStop. Huge losses of GameStop-shorting hedge funds, margin calls of unprecedented size and the failure of established risk management models led the CEO of Interactive Brokers, one of the biggest American brokers, to state that “we have come dangerously close to the collapse of the entire system and the public seems to be completely unaware of that, including Congress and the regulators”Footnote 7.

3.2 Surge of retail trading in 2020 and 2021

A reason for the relevance of our contribution despite the large amount of existing literature on the relationship between social media activity and stock trading is the strong increase in retail trading that started in 2020 and which is partially attributable to the COVID-19 pandemic, which limited many day-to-day (out-of-home) activities for a big part of 2020 and 2021.

Analysts from Credit Suisse estimated in February 2021 that retail trading as a share of overall market activity had nearly doubled from between 15% and 18% to over 30% since the start of 2020.Footnote 8 An additional reason for increased retail trading has been the decision of many retail-focused US brokers to drop trading fees in the fall of 2019. Robinhood, an app-based broker with more than 3 million app downloads in January 2021Footnote 9, is the most notable of a new kind of brokers which gamify trades and make stock and option trading available to a new demographic that also exhibits high social media affinity and activity. Ozik et al. (2021) confirm that “access to financial markets facilitated by fintech innovations to trading platforms, along with ample free time, are significant determinants of retail-investor stock-market participation.” The importance of this development for market mechanics is underscored by van der Beck and Jaunin’s (2021) finding that “despite their negligible market share of 0.2% [...] Robinhood traders account for 10% of the cross-sectional variation in stock returns during the second quarter of 2020.”

This and other results underscore that the recent surge in retail trading, combined with the enormous growth of WallStreetBets, likely also affects the relationship between social media online activity and stock and option trading. Our goal is to provide updated evidence on that relationship and add a timely new angle to the existing literature.

4 Data and methodology

4.1 Data sources

We collect all WallStreetBets posts from the start of 2020 to the end of February 2021, in total more than 40 million, thereof 33.2 million in our trading-hours sample. Posts from January 2021 onward were obtained directly from Reddit’s streaming API; for posts in 2020 we used the unofficial Pushshift API, which ingests all Reddit posts (see Baumgartner et al. 2020, for more information on the Pushshift dataset and API). Comments are then sorted into GameStop-related and non-GameStop-related comments by the following procedure:

  1. A comment is GameStop-related if “GameStop” or its ticker “GME” are mentioned in the comment.

  2. If neither GameStop nor another stock symbol is mentioned in a comment, we search iteratively in the parent comment or post for the mention of a stock symbol or company name and classify the post as GameStop-related if the first parent stock symbol or company name mention is related to GameStop.
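The two-step procedure above can be illustrated with a minimal Python sketch. It is not the code used for this study: the regular expressions, the short list of other tickers and the `Comment` structure are hypothetical placeholders, and we assume that a parent mentioning GameStop counts as GameStop-related even if it also names another stock.

```python
import re
from dataclasses import dataclass
from typing import Dict, Optional

# Hypothetical patterns; the exact matching rules are not specified in the text.
GME_PATTERN = re.compile(r"\b(gamestop|gme)\b", re.IGNORECASE)
# Placeholder list; a real run would need a complete symbol and company-name list.
OTHER_PATTERN = re.compile(r"\b(amc|tsla|tesla|nok|nokia)\b", re.IGNORECASE)


@dataclass
class Comment:
    id: str
    parent_id: Optional[str]  # None for a top-level post
    text: str


def is_gme_related(comment_id: str, comments: Dict[str, Comment]) -> bool:
    """Step 1: direct mention. Step 2: walk up the parents to the first mention."""
    current = comments.get(comment_id)
    while current is not None:
        if GME_PATTERN.search(current.text):
            return True   # GameStop or GME is mentioned at this level
        if OTHER_PATTERN.search(current.text):
            return False  # the first mention found belongs to another stock
        # No stock mentioned here: continue with the parent comment or post.
        current = comments.get(current.parent_id) if current.parent_id else None
    return False  # no stock mention anywhere in the thread


thread = {
    "p": Comment("p", None, "GME to the moon"),
    "c1": Comment("c1", "p", "holding"),
    "c2": Comment("c2", "c1", "what about AMC"),
    "c3": Comment("c3", "c2", "agreed"),
}
print([is_gme_related(cid, thread) for cid in ("p", "c1", "c2", "c3")])
```

In this toy thread, the reply "holding" inherits the GameStop classification from its parent post, whereas "agreed" does not, because the first stock mention among its parents refers to another company.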

In total, we find that 4 million or about 12% of the comments in our sample period are GameStop-related with the share increasing from below 1% for most of the first half of 2020 to 60% at the peak end of January 2021.

We obtain consolidated stock TAQ trade data for GameStop from Interactive Brokers and consolidated option TAQ data from IVolatility. In our sample, the stock price of GameStop oscillated from a low of $2.57 on 2020/4/3 to a high of $508.02 on 2021/1/28. There were on average 9.8 million GameStop shares and 115,500 GameStop options (corresponding to 11.55 million shares) traded per day.

Fig. 1 Logarithmic Chart of GameStop’s Stock Price, Traded Volume and Reddit WSB Comments. The Figure summarizes the development of GameStop’s stock price, the trading volume and the comments on WallStreetBets from January 2020 to March 2021

Figure 1 gives an overview of the development of GameStop’s share price, trading volume and GameStop-related WallStreetBets comments. Due to the enormous surge in the share price, as well as volume and comments, we use a logarithmic scale on the y-axis for all three measures. This strong trend introduces challenges for the empirical analysis, as it may cause spurious results with the strong growth clouding more nuanced relationships between variables. Thus, we employ multiple differencing and deseasonalizing approaches explained in Sect. 4.3 to check robustness of our results.

Table 1 Summary Statistics Raw Data

Summary statistics on our raw data can be found in Table 1. We divide our sample period into 3768 30-min windows with each full trading day containing 13 of these windows (we follow e.g., Sun et al. 2016; Gao et al. 2018; Farrell et al. 2020, with the use of 30-min intraday windows). The first three rows show statistics related to the trading volume. TRD denotes the number of trades in each 30-min window; the dataset includes 31.5 million trades in total with an average of 8,360 during each 30-min period. Like almost all of our variables, this is heavily skewed due to the logarithmic rise of GameStop’s price and trading volume in early 2021, evidenced by the large difference between mean and median (1,280 trades). The next variable, Volume, describes the number of shares traded, with a mean of 819,218 shares (which equals about 100 shares per trade on average) and a total of 3.2 billion shares traded. The turnover, TO, is calculated as trade size*price and is more important for our analysis than in comparable settings due to the huge price changes. Whereas turnover and volume are almost interchangeable for empirical analysis in most settings, the huge short-term price swings of the GameStop shares cause these variables to drift apart considerably. The average turnover is 49.0 million USD, which equals a total turnover of 184.7 billion USD.
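To make the construction of the window-level measures concrete, the following minimal Python sketch buckets trades into 30-min windows and computes TRD, Volume and TO per window. It is an illustration only: we assume regular trading hours from 9:30 AM to 4:00 PM (which yields the 13 windows per day mentioned above), timestamps given in exchange-local time, and a simplified trade tuple of (timestamp, shares, price); the function names are hypothetical.

```python
from datetime import datetime, time
from typing import Dict, Iterable, Optional, Tuple

OPEN = time(9, 30)    # assumed start of regular trading hours
CLOSE = time(16, 0)   # assumed end of regular trading hours
WINDOW_MINUTES = 30


def window_index(ts: datetime) -> Optional[int]:
    """Return 0..12 for timestamps inside trading hours, otherwise None."""
    if not (OPEN <= ts.time() < CLOSE):
        return None
    minutes_since_open = (ts.hour * 60 + ts.minute) - (OPEN.hour * 60 + OPEN.minute)
    return minutes_since_open // WINDOW_MINUTES


def aggregate(trades: Iterable[Tuple[datetime, int, float]]) -> Dict[tuple, dict]:
    """Sum trade count (TRD), shares (Volume) and shares*price (TO) per window."""
    out: Dict[tuple, dict] = {}
    for ts, shares, price in trades:
        idx = window_index(ts)
        if idx is None:
            continue  # outside trading hours: not part of the windowed sample
        stats = out.setdefault((ts.date(), idx), {"TRD": 0, "Volume": 0, "TO": 0.0})
        stats["TRD"] += 1
        stats["Volume"] += shares
        stats["TO"] += shares * price
    return out


sample = [
    (datetime(2021, 1, 25, 9, 31), 100, 100.0),
    (datetime(2021, 1, 25, 9, 45), 50, 102.0),
    (datetime(2021, 1, 25, 10, 5), 10, 110.0),
    (datetime(2021, 1, 25, 17, 0), 5, 100.0),  # after hours, dropped
]
for key, stats in sorted(aggregate(sample).items()):
    print(key, stats)
```

Because TO weights each trade by its price, it diverges from Volume whenever the price moves strongly between windows, which is the reason both measures are reported.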

The following three rows with a preceding O show the same measures for GameStop-related option trading. We have a total of 7.2 million option trades with 34.5 million traded options in total and a turnover of 40.1 billion USD in our sample. The average option turnover in each 30-min window is 10.1 million USD, about one-fifth of the stock turnover. As a myriad of different options are traded and the option price depends on several parameters, there is no direct connection between volume and (total) turnover for options in the aggregated dataset.

In the last four rows, summary statistics for WallStreetBets are shown. \(RC_{ALL}\) contains the measures for all comments, while GameStop-specific comments are denoted as \(RC_{GME}\). The last two rows (TH) limit comments to our trading-hours sample, with a total of 12.0 million comments in our sample, thereof 1.9 million related to GameStop. On average, there are 3,260 comments in each 30-minute window with 508 or almost 16% on GameStop; however, this is also heavily skewed towards the end of our sample period.

4.2 Variables used in the empirical analysis

Table 2 gives summary statistics on the variables used in our empirical analysis. As we do not conduct a broad cross section analysis but a case study focused on a specific stock with a highly unusual and very dynamic trading pattern and price and volume development, a single measure is not sufficient to shed light on the connection between Reddit posts and trading activity. Thus, we include multiple measures for (retail) trading activity and volume in our analysis and use a variety of control variables.

Table 2 Summary statistics regression variables

The first five columns include the Reddit comments on GameStop (lRC) and four different measures for general stock and option volume, as introduced in Sect. 4.1. In the second segment of Table 2, variables relating specifically to the retail stock and option turnover are given. As these measures contain only the turnover of trades classified as retail trades by different proxies, they are always subsets of the total stock or option turnover and thus smaller than lTO, respectively, lOTO. The retail classifications are defined as follows:

  1. lRTO(OL) classifies all odd-lot trades as retail trades:

    $$\begin{aligned} lRTO(OL)=log(Turnover\; where\; Volume\;mod\;100 \ne 0) \end{aligned}$$

    This identification is one of the oldest and most established ones and follows e.g., Dyl and Maberly (1992). However, more recently O’Hara et al. (2014) and others warned that odd-lot trading, while still often used by retail traders, is increasingly caused by high frequency or algorithmic traders. As these kinds of traders are less likely in very-high volatility environments like GameStop in our sample period and odd-lot trading is still widely used as a proxy for retail trading, we incorporate odd-lots in our analysis.

  2. The second measure, lRTO(ST), refers to small trades as main criterion. Here we follow e.g., Barber et al. (2006) and Han and Kumar (2013), who use a trade size of $5,000 as cut-off value for a classification as retail trade.

    $$\begin{aligned} lRTO(ST)=log(Turnover\; where\; Turnover<5000\,\, USD) \end{aligned}$$

    Han and Kumar (2013) confirm that their definition “closely captures the preferences and trading activities of retail investors” by comparing it with “actual retail holdings and trading data from a broker.” However, similar limitations as for lRTO(OL) apply.

  3. Our third measure lROTO(OC) is option-based and thus avoids many of the shortfalls of lRTO(OL) and lRTO(ST) as automated trading is less prevalent on the option market, mainly due to lower liquidity and bigger spreads. It is based on the observation that retail traders mostly trade single option contracts (one contract corresponds to 100 shares) while institutional traders who use options e.g., to hedge positions rarely trade single contracts:

    $$\begin{aligned} lROTO(OC)=log(Option\;Turnover\; where\; Volume=1) \end{aligned}$$

    Retail option trading is a novel phenomenon and we are—to our knowledge—the first to introduce this option-based measure. While Battalio et al. (2004) already wrote that they “examine one-contract trades separately to isolate retail orders more confidently,” a more recent cross-sectional analysis is still missing in the scientific literature. However, research by brokers, e.g., Goldman Sachs, shows that retail option trading and especially one-contract trading increased sharply since the beginning of 2020 with one-contract trades now accounting for 13% of total option volume and even more for popular stocks.Footnote 10 Our results indicate that there is indeed a significant relationship between Reddit posts and one-contract option turnover; however broader cross-sectional research is needed to confirm our GameStop-focused results and confirm whether this measure can be generalized to identify retail trading.

  4. Finally, lRTO(MR) identifies marketable retail orders as laid out by Boehmer et al. (2021):

    $$\begin{aligned} lRTO(MR)=log(Turnover\; where\; (Price\;mod\;0.01)*100\; in\; ]0,0.4]\; or\; [0.6,1[) \end{aligned}$$

    Trades are classified as retail trades if the TAQ data indicates that they have been reported through a FINRA-facility and are priced just below a round penny (fraction of a cent between 0.6 and 1) or just above a round penny (fraction of a cent between 0 and 0.4). While this classification captures retail trades reliably due to the regulatory rules around sub-penny price improvements and the increasing internalization of orders by retail brokers, it omits limit trades which are not marketable and all trades that are routed to exchanges.
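The four classification rules above can be summarized in a short Python sketch. It is an illustration under stated assumptions, not the implementation used for our results: the `Trade` structure and its `finra_reported` flag are hypothetical stand-ins for the corresponding TAQ fields, prices are held as `Decimal` to avoid floating-point artifacts in the sub-penny test, and returning `None` for windows without matching trades is our own choice because the handling of zero turnover is not specified in the text.

```python
import math
from dataclasses import dataclass
from decimal import Decimal
from typing import Callable, Iterable, Optional


@dataclass
class Trade:
    shares: int
    price: Decimal
    finra_reported: bool = False  # hypothetical flag for FINRA-facility reporting


def is_odd_lot(t: Trade) -> bool:
    """lRTO(OL): volume is not a multiple of 100 shares."""
    return t.shares % 100 != 0


def is_small_trade(t: Trade) -> bool:
    """lRTO(ST): turnover of the trade is below 5,000 USD."""
    return t.shares * t.price < 5000


def is_one_contract(contracts: int) -> bool:
    """lROTO(OC): option trade of exactly one contract."""
    return contracts == 1


def is_marketable_retail(t: Trade) -> bool:
    """lRTO(MR): FINRA-reported trade with a sub-penny fraction in ]0,0.4] or [0.6,1[."""
    if not t.finra_reported:
        return False
    frac = (t.price * 100) % 1  # fraction of a cent
    return 0 < frac <= Decimal("0.4") or Decimal("0.6") <= frac < 1


def log_turnover(trades: Iterable[Trade], rule: Callable[[Trade], bool]) -> Optional[float]:
    """Log of the summed turnover of all trades matching the rule (None if empty)."""
    total = sum((t.shares * t.price for t in trades if rule(t)), Decimal(0))
    return math.log(float(total)) if total > 0 else None


trades = [
    Trade(50, Decimal("20.00")),                         # odd lot and small trade
    Trade(200, Decimal("40.0030"), finra_reported=True), # sub-penny fraction 0.3
    Trade(100, Decimal("40.0050"), finra_reported=True), # fraction 0.5, excluded
]
print(log_turnover(trades, is_odd_lot), log_turnover(trades, is_marketable_retail))
```

Dividing the turnover selected by one of these rules by the total turnover of the same window gives the corresponding retail trading proportion.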

The RTP measures in the third section then denote the respective Retail Trading Proportion (retail turnover divided by total turnover). While the non-option-based measures indicate retail trading proportions of 30–52%, the option-based measure only shows a mean retail trading proportion of 9%. Although these measures are calculated as a fraction, their development over time shows a clear trend and indicates non-stationarity.

Fig. 2 Daily Development of RTP, measured as Proportion of Trade Volume (Rolling Average over 5 Days). The Figure summarizes the development of the Retail Trading Proportion in GameStop shares and options according to different proxies introduced in Sect. 4.2 from January 2020 to March 2021

In Fig. 2, one can see that most measures slowly increase in the second half of 2020 and peak at the end of January 2021, with the increases most pronounced for RTP(OL), while RTP(MR) changes least. The decrease for RTP(ST) is expected, as this measure is directly and inversely related to the price of the shares.

The last section contains return and volatility-related variables that are used for the analysis of informativeness and as control variables. R is the log return of GameStop shares during one 30-minute window and |R| the absolute log return. The AR variables denote the abnormal return of GameStop shares and are calculated using the simple market model:

$$\begin{aligned} AR_{GME,t}=R_{GME,t}-(\beta _{GME,t}R_{m,t}-R_{rf,t}) \end{aligned}$$

For the first variant (\(AR(\beta _{ges})\)), beta is calculated for our whole sample and equates to approximately 1.1. As this estimate seems to be very low given the huge price fluctuations, we also calculate \(AR(\beta _{30})\) and \(AR(\beta _{7})\) with the rolling beta over the previous 30 and 7 days, respectively (in calendar time). While the 7-day beta, for example, swings between a minimum of -20 and a maximum of +13, its mean of 1.3, 25% quantile of 0.7 and 75% quantile of 2.5 are in more usual territory and the summary statistics for abnormal returns indicate only negligible changes. The next variable, mroibvol, is the marketable retail order imbalance introduced by Boehmer et al. (2021), defined as the difference between the turnover of marketable retail buy orders and retail sell orders divided by the total turnover of marketable retail orders. IVOL, used as additional control variable, is the idiosyncratic volatility (calculated with the market model on a 7-day rolling window).
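As an illustration of the order imbalance measure, the following Python sketch computes mroibvol for one window. It follows the convention of Boehmer et al. (2021) that trades priced just below a round penny are treated as retail buys and trades priced just above a round penny as retail sells. The input format of (price, shares) tuples for FINRA-reported trades and the `None` return value for windows without marketable retail orders are assumptions made for this example.

```python
from decimal import Decimal
from typing import Iterable, Optional, Tuple


def mroibvol(trades: Iterable[Tuple[Decimal, int]]) -> Optional[float]:
    """(buy turnover - sell turnover) / (buy turnover + sell turnover)."""
    buy = Decimal(0)
    sell = Decimal(0)
    for price, shares in trades:
        frac = (price * 100) % 1      # fraction of a cent
        turnover = price * shares
        if Decimal("0.6") <= frac < 1:
            buy += turnover           # just below a round penny: retail buy
        elif 0 < frac <= Decimal("0.4"):
            sell += turnover          # just above a round penny: retail sell
        # fractions of 0 or between 0.4 and 0.6 are not classified as retail
    total = buy + sell
    return float((buy - sell) / total) if total else None


window = [
    (Decimal("40.0080"), 300),  # buy
    (Decimal("40.0020"), 100),  # sell
    (Decimal("40.0050"), 500),  # unclassified
]
print(mroibvol(window))
```

The measure lies between -1 (only retail selling) and +1 (only retail buying), so a positive value indicates net retail buying pressure in the window.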

4.3 Differencing and deseasonalizing approaches

Due to the aforementioned challenges with non-stationarity and logarithmic increase in comment and trading measures, we apply three different differencing and deseasonalizing approaches to our Reddit comment and trading activity data:

  1. First, in our main specification we difference each variable’s level with its level from one week prior. That means that when predicting, e.g., the log trading volume, we do not predict the level on Monday, the 25th of January 2021 between 9:30 AM and 10:00 AM but the difference between the log trading volume in that period and the log trading volume one week prior (in this case Monday, 18th of January 2021 between 9:30 AM and 10:00 AM). The same transformation is applied to the Reddit comments and non-stationary control variables (for the preceding 30-min window). This approach succeeds in achieving stationarity (as evidenced by non-significant intercepts) and at the same time alleviates concerns regarding time trends and seasonality due to daily patterns during our 30-minute windows. This transformation is denoted with WCHG (weekly change) in our results. However, due to the lag of one week, one could argue that short-term trends could still distort results.

  2. 2.

    Thus, we also show results that are differenced out with the preceding 30-minute window, denoted with CHG (change). We suppose that this approach includes a lot of noise due to seasonality of intraday and intraweek returns and Reddit comments but feel that the first difference is the most natural way of measuring changes and thus should be included in the analysis.

  3. 3.

    In addition to both differencing approaches described above, we explicitly try to remove seasonal and trend components in a third specification. For this, we use the seasonal-trend decomposition procedure based on LOESS (locally estimated scatterplot smoothing) introduced by Cleveland et al. (1990). This approach is well suited to split timeseries that contain nonlinearities and jumps into seasonal, trend and residual components. For our transformed timeseries that are denoted with STL, seasonal and trend components are subtracted so that only the residuals remain. While this transformation introduces look-ahead bias and should not be used to claim exploitable market anomalies, it serves us well to remove any short-term trend that could still affect results of the differencing approaches and check the robustness of our previous results. An example of the effect of STL decomposition on log Reddit comments is given in Fig. 3.

Fig. 3 STL Decomposition of Log Reddit Comments. The figure shows the effect of the LOESS-based STL decomposition of a timeseries into trend, seasonal and residual components, using the example of log WallStreetBets comments in the week from 2021/02/01 to 2021/02/05
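The two differencing approaches (WCHG and CHG) can be sketched on a synthetic series as follows. The sketch assumes a regular grid of 13 half-hour windows on each of 5 trading days per week, so that "one week prior" is a fixed lag of 65 windows; holidays and shortened sessions would break this assumption, and an actual implementation would have to match windows by timestamp instead. All names are hypothetical.

```python
import numpy as np

WINDOWS_PER_DAY = 13   # 30-minute windows between 9:30 AM and 4:00 PM
DAYS_PER_WEEK = 5      # assumption: no holidays or shortened sessions


def weekly_change(x: np.ndarray) -> np.ndarray:
    """WCHG: value minus the value in the same window one week earlier."""
    lag = WINDOWS_PER_DAY * DAYS_PER_WEEK
    return x[lag:] - x[:-lag]


def half_hour_change(x: np.ndarray) -> np.ndarray:
    """CHG: value minus the value in the preceding 30-minute window."""
    return x[1:] - x[:-1]


# Synthetic log series: a repeating intraday pattern plus a weekly step of 0.5.
intraday = np.linspace(1.0, 0.0, WINDOWS_PER_DAY) ** 2
week = np.tile(intraday, DAYS_PER_WEEK)
series = np.concatenate([week + 0.5 * k for k in range(3)])

wchg = weekly_change(series)   # intraday pattern cancels, only the step remains
chg = half_hour_change(series) # intraday pattern and overnight jumps remain
```

In this toy series the weekly change is constant at 0.5 because the recurring intraday pattern cancels out, whereas the half-hour change still carries the full intraday pattern. This mirrors the argument that CHG is expected to be noisier than WCHG.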

In addition to these transformations, we add half-hour and weekday controls as well as a continuous monthly trend control to our regression specifications (13\(\times \)5\(\times \)14).

5 Results

In this section, we discuss the results of our regression analyses for the relationship between Reddit comments on GameStop and the firm’s general trading volume, its retail trading volume and proportion, and its excess returns.

5.1 Regression analysis of the effect of comments on trading volume

For our main analysis, we explore the potential predictive effect of Reddit comments on GameStop trading volume using the following regression

$$\begin{aligned} lVolume_{30m,t}= & {} Intercept+\beta _{1}{lVolume_{30m,t-1}}+\beta _{2}{lVolume_{1d,t-1}}\nonumber \\&+\beta _{3}{lRC_{30m,t-1}}+\beta _{4}{lRC_{1d, t-1}}\nonumber \\&+\beta _{5}{|R|_{30m,t-1}}+\beta _{6}{|R|_{1d, t-1}}\nonumber \\&+\beta _{7}{R_{30m,t-1}}+\beta _{8}{R_{1d, t-1}}\nonumber \\&+\beta _{9}{IVOL_{30m,t-1}}+\beta _{10}{IVOL_{1d, t-1}}\nonumber \\&+{Time}+{Trend}+\epsilon \end{aligned}$$
(1)

where the dependent variable lVolume stands for our different trade volume measures (turnover, volume, option turnover and trade count for total trade activity) and the change in the log number of Reddit comments lRC in the preceding 30-minute window is the main variable of interest. The dependent variable, its lagged values and the variable of interest are transformed as described in Sect. 4.3. The absolute log return |R|, the log return R and the idiosyncratic volatility IVOL are used as additional control variables on the right-hand side. For all right-hand-side variables, we also add controls for the preceding trading day (in addition to the preceding 30-minute window), denoted with 1d.Footnote 11 As we are less interested in the results for the control variables, we accept some risk of multicollinearity between controls in exchange for a higher overall R-squared and higher confidence in the results for our comment-related variables of interest. For all independent variables, we subtract the mean and divide by the standard deviation to standardize them and make coefficients comparable. In addition, we add half-hour and weekday controls as well as a continuous monthly trend variable to control for time and trend fixed effects (13\(\times \)5\(\times \)14).
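The standardization of the independent variables, and its consequence for reading coefficients, can be illustrated with a small sketch. The data are synthetic, the function name is hypothetical, and the use of the sample standard deviation (`ddof=1`) is an assumption, as the text does not specify the estimator.

```python
import numpy as np


def standardize(columns: np.ndarray) -> np.ndarray:
    """Subtract each column's mean and divide by its standard deviation."""
    mean = columns.mean(axis=0)
    std = columns.std(axis=0, ddof=1)  # assumption: sample standard deviation
    return (columns - mean) / std


# Synthetic regressor and a noiseless toy relationship with raw slope 0.5.
rng = np.random.default_rng(0)
x = rng.normal(loc=3.0, scale=1.32, size=1_000)
y = 0.5 * x

z = standardize(x[:, None])[:, 0]
design = np.column_stack([np.ones_like(z), z])
intercept, beta_std = np.linalg.lstsq(design, y, rcond=None)[0]
# beta_std equals the raw slope (0.5) times the standard deviation of x,
# i.e. the change in y for a one-standard-deviation change in x.
```

Because every regressor is rescaled to unit standard deviation, the resulting coefficients are expressed per standard deviation of the respective regressor, which is what makes them comparable across variables.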

Standard errors are corrected for potential heteroskedasticity and autocorrelation using the Newey and West (1987) methodology with 13 lags, which equals one normal trading day of 30-minute periods. Significance is denoted by * for p < 0.1, ** for p < 0.05 and *** for p < 0.01.
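For readers unfamiliar with the correction, the following is a minimal NumPy sketch of OLS with Newey-West standard errors using Bartlett kernel weights and 13 lags. It is an illustration on synthetic data, not the estimation code behind the tables; the function name is hypothetical and no small-sample correction is applied, which is an assumption.

```python
import numpy as np


def ols_newey_west(X: np.ndarray, y: np.ndarray, lags: int = 13):
    """OLS coefficients with Newey-West (Bartlett kernel) standard errors."""
    xtx_inv = np.linalg.inv(X.T @ X)
    beta = xtx_inv @ X.T @ y
    resid = y - X @ beta
    scores = X * resid[:, None]             # row t holds x_t * u_t
    meat = scores.T @ scores                # lag-0 (White) term
    for j in range(1, lags + 1):
        weight = 1.0 - j / (lags + 1.0)     # Bartlett weights decline linearly
        gamma = scores[j:].T @ scores[:-j]  # lag-j cross-product of the scores
        meat += weight * (gamma + gamma.T)
    cov = xtx_inv @ meat @ xtx_inv          # sandwich estimator
    return beta, np.sqrt(np.diag(cov))


# Synthetic example with AR(1) errors (illustrative parameters only).
rng = np.random.default_rng(1)
n = 2_000
x = rng.normal(size=n)
noise = rng.normal(size=n)
u = np.zeros(n)
for t in range(1, n):
    u[t] = 0.7 * u[t - 1] + noise[t]
y = 1.0 + 0.16 * x + u

X = np.column_stack([np.ones(n), x])
beta, se_hac = ols_newey_west(X, y, lags=13)
_, se_white = ols_newey_west(X, y, lags=0)  # heteroskedasticity-robust only
```

With positively autocorrelated errors as in this toy example, the standard error of the intercept is larger with 13 lags than with the purely heteroskedasticity-robust variant, which illustrates why ignoring autocorrelation can overstate significance.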

Table 3 Effect of Reddit comments on GameStop trading volume—weekly differences

Results for our main specification, showing the relationship between Reddit comments and different trading volume measures under the weekly differences approach, can be found in Table 3. Coefficients for all volume measures are significantly positive (p < 0.0001) with t-values ranging from 6.76 for the number of trades to 9.17 for the trade volume. Coefficients can be interpreted as follows: the standardized coefficient of \(lRC_{wchg,30m}\) for the log turnover lTO of approximately 0.16 means that for a one standard deviation change in the log number of Reddit comments on GameStop in a 30-minute window compared to the previous week, the difference in log turnover versus the previous week in the following 30-minute window increases by 0.16. A standard deviation of 1.32 for \(lRC_{wchg,30m}\) equals an increase of \(exp(1.32)-1=272\%\) in the weekly difference of comments, which would be followed by a \(exp(0.16)-1=17\%\) weekly increase in turnover in the following 30-minute window. Surprisingly, the effect is largest for option turnover, with an increase of 30% for one standard deviation in comment change, while the coefficient for the number of trades is considerably smaller, suggesting that the number of trades is less influenced by Reddit comments than total turnover. In contrast to the difference of comments in the preceding 30-min window, the difference of comments on the previous trading day versus one week ago is not significant for the change in trading volume and even has negative coefficients for some variants.
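The conversion from a change in a log-valued variable to a percentage change used in this interpretation can be reproduced with a few lines. The function name is hypothetical. Note that with the rounded standard deviation of 1.32 the conversion yields roughly 274%, slightly above the 272% quoted in the text, which presumably rests on the unrounded value.

```python
import math


def log_change_to_percent(log_change: float) -> float:
    """Convert a change in a log-valued variable into a percentage change."""
    return (math.exp(log_change) - 1.0) * 100.0


# Standardized coefficient for log turnover from the text (approx. 0.16).
turnover_effect = log_change_to_percent(0.16)
print(round(turnover_effect))  # 17

# One standard deviation of the weekly comment change, rounded to 1.32.
comment_change = log_change_to_percent(1.32)
```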

For the control variables, we find that the lagged versions of the respective volume measure have significant positive coefficients in all cases, hinting at autocorrelation in the weekly difference in trading volume. While coefficients and t-values for the previous 30-min window are very high, the trading volume difference on the previous day in the fourth row is less relevant for the future trading volume change, with smaller coefficients and t-values than the change in Reddit comments in the preceding 30-min window. The only other control variables that are consistently significant in predicting the change in trading volume are the lagged measures of GameStop’s log return (with the exception of the previous-day return for lVolume). Interestingly, the 30-min-lagged idiosyncratic volatility, which is calculated with market-model abnormal returns for stock prices, does have a significant predictive effect for the difference in GameStop option turnover but not for any of the stock-based turnover measures. This could be explained by additional information from stock market activity that is already absorbed in the stock-related measures.

R-squared is unsurprisingly lowest for the change in option turnover with 0.39 and reaches 0.70 for the change in the number of trades, demonstrating a high explanatory power of the model.

Results for the 30-min changes in log trading volume in Table 4 mostly confirm the impression of the first results.

Table 4 Effect of Reddit comments on GameStop trading volume - half-hour differences

Differencing volume-related and comment measures against the previous 30-minute window does not remove seasonality and leaves a significant intercept in our model. Coefficients for all trading volume measures are still significantly positive, albeit with considerably smaller coefficients and t-values. In this case, the coefficient of 0.0187 for the effect of a one standard deviation change in the log number of comments in the previous 30-minute window on the number of trades means that an increase of approximately 90% in Reddit comments compared to the previous 30 minutes predicts a 2% increase in the number of trades in the following 30-min window. While small overall, the effect size and direction confirm our own and others’ prior results. In contrast, the one-day change of comments on the previous day does not have any significant impact on the following change in trade volume.

Notably, the lagged dependent variable is highly significant for all trading volume measures but shows a negative sign, in contrast to the weekly differences. That result has several important implications: First, this could be interpreted as some kind of reversion to the mean (e.g., a return to normal trading volume after periods of very high trading volume). Second, this result could also indicate a complex and nonlinear trend profile over different time horizons. While a high change of trading volume versus the volume one week ago predicts a positive weekly change in the following period as well, a positive change of trading volume versus the volume 30 minutes ago predicts a negative 30-min change in the following period. Third, while the sign changes for the impact of lagged trading volume, the sign for the impact of Reddit comments stays positive. This could mean that the attention mechanism is more stable, with an increase in the number of comments preceding an increase in trading volume regardless of the time horizon. This also somewhat alleviates concerns of reverse causality (e.g., trading volume spikes that drive comment numbers, which in turn increase trading volume again at a later time), as in this case we would expect the signs of both variables to behave similarly. However, despite this interesting result, we cannot argue that this underscores a causal relationship, as there are multiple alternative explanations for this behavior.

The change in idiosyncratic volatility over the 30-min window has a significant negative relationship to the trading volume change in the following window for all measures except option turnover, probably also indicating some kind of “cooling-down” effect after short-term volatility spikes. Unsurprisingly, the R-squared is considerably smaller for all variables in Table 4, ranging between 0.19 for option turnover and 0.37 for the number of trades.

As a robustness check, Table 5 shows the results for the STL decomposition of comments and trading volume as described in Sect. 4.3. With trend and seasonal components removed from both series, we analyze whether the remaining residuals in comment count can explain the residuals of trade volume in the following 30-min window. The results have serious caveats due to look-ahead bias, possible interaction effects and loss of valuable information in the decomposition procedure and should not be interpreted in isolation. However, using them as an additional robustness check to validate previous results and as another, non-conclusive piece of potential evidence for a causal effect of Reddit comments on trading activity seems beneficial, as this setup does not require taking a difference that could introduce additional noise.

Table 5 Effect of Reddit comments on GameStop trading volume - seasonal-trend decomposition

While the coefficients are very small and the t-values between 1.69 and 2.56 are only barely significant for all trade volume measures, the small but consistently positive predictive effect of residual Reddit comments on future residual trade volume confirms the picture of our previous results. Lagged values of trade volume are also significant, with considerably higher coefficients in this model, which is somewhat expected when using level values with imperfect decomposition. However, while for Table 3 with weekly differences all lagged versions of the dependent variable were positive and for Table 4 all were negative, in this case the residual trade volume in the preceding 30 minutes has a positive effect on the following trade volume while the residual trade volume on the previous day has a negative effect. This further underlines the complex, time-dependent autocorrelation profile of the trade volume and strengthens our suspicion of nonlinear behavior.

5.2 Regression analysis of the effect of comments on retail trading volume and proportion

Next, we turn to the trading volume specifically caused by retail investors. To differentiate between retail and non-retail trades, we use four different proxies described in Sect. 4.2. In addition to the control variables used for the previous results in Sect. 5.1, we add total (stock) turnover lTO as an additional control variable. The rest of the model remains unchanged.

Table 6 Effect of Reddit Comments on GameStop retail trading volume—weekly differences

Table 6 shows the relationship between weekly differences of Reddit comments and retail turnover in the following 30-min window. In general, the results are very similar to the results for the effect of Reddit comments on total turnover. While the coefficient for total turnover in Table 3 was 0.1595 with a t-value of 8.89, coefficients for stock-based retail turnover are slightly lower, ranging from 0.1061 for the turnover of small trades below 5,000 USD (lRTO(ST)) to 0.1578 for FINRA marketable retail orders (lRTO(MR)), with t-values between 7.23 and 7.65. For option-based turnover lROTO(OC), the coefficient halves from 0.2690 to 0.1281 and the t-value decreases from 8.88 to 5.83 when only looking at one-contract trades. However, the slight decrease in effect strength could also be due to the addition of turnover as an additional control variable, and despite this, the relationship remains consistently positive and significant. With regard to the other control variables, the general picture is similar. Coefficients for the 30-min lagged dependent variable are smaller than in the total turnover model, which is also likely caused by the addition of total turnover as an additional control and the resulting collinearity. One notable exception is the previous-day lagged dependent variable: while it had a t-value of 3.21 for its effect on the weekly difference of total turnover in Table 3 (and 5.35 for total option turnover), its maximum t-value of 1.55 for the stock-based retail measures fails to reach significance, and its t-value of 2.46 for retail option turnover barely reaches significance. The addition of total turnover does not seem to be the reason, as none of its one-day lagged variants reaches a t-value greater than 0.35. This result could be an indication that trends for retail turnover are shorter-lived, so that previous-day retail volume has a smaller impact compared to total volume.

R-squared is also similar for retail stock-based turnover and slightly higher for one-contract option turnover compared to total option turnover, increasing from 0.39 to 0.57 (which also could be caused by the addition of stock turnover as control variable).

Results for retail turnover calculated with 30-min differenced variables (Table 11) and STL-decomposed timeseries (Table 12) also mirror respective results for total turnover closely and can be found in the Supplementary Material.

As this approach fails to separate a specific effect of Reddit comments on GameStop retail turnover, we next turn to the proportion of retail turnover to total turnover (RTP for Retail Trading Proportion) as dependent variable. As seen in Sect. 4.2, the retail trading proportion is rising during our sample but as it is calculated as a fraction, it does not exhibit exponential growth. The rest of our model remains unchanged.

Table 7 Effect of Reddit comments on GameStop retail trading proportion—weekly Differences

To isolate an effect specifically on retail turnover, Table 7 reports the results for the effect of the one-week difference of log Reddit comments on the weekly difference in the retail trading proportion in the following 30-min window. This effect is negative across all our retail proxies, with very small coefficients, and is significant only for the small-trade retail proxy. For the weekly change of Reddit comments one day prior, we find a significant but small coefficient for its impact on the retail trading proportions based on odd-lots and marketable retail orders. There could be several explanations for these results (e.g., institutional turnover increasing faster than retail turnover after an increase in Reddit comments due to a better ability to screen social activity and react in real time, or proxies failing to pick up parts of the short-term retail-driven turnover).

However, as also evidenced by the low R-squared values between 0.03 and 0.16, our model fails to explain the change in retail trading proportion and we are not able to find evidence for a significantly different effect of Reddit comments on turnover caused by different kinds of investors.Footnote 12

5.3 Regression analysis of the effect of comments on abnormal returns

While an impact of online activity on future trade volume is a good indicator for the effect of investor attention, a second important question is whether there is an effect of Reddit comments on (short-term) abnormal returns, which could be an indicator for the informativeness of Reddit comments. If we found a significant positive effect of the change in the number of Reddit comments on abnormal returns for GameStop in the following 30-minute window, it would be less plausible that Reddit comments only facilitate noise trading with no contribution to price discovery. Note, however, that our analysis is limited to one 30-minute window and that a longer horizon would be needed for a conclusive analysis of informativeness.

Table 8 Effect of Reddit comments on GameStop abnormal return

Table 8 displays the results of the regression of abnormal returns in the following 30-min window on the weekly difference in log comments. While the first column is calculated using a whole-sample beta, columns two and three use the 30-day and the 7-day rolling beta, respectively. We find no significant effect of the weekly difference in Reddit comments on abnormal returns for any of the three variations. Control variables are also overwhelmingly insignificant, indicating that abnormal returns cannot be predicted with the available variables.

In column four, we use the marketable retail order imbalance (mroibvol) as dependent variable. Boehmer et al. (2021) showed that stocks with a positive retail order imbalance outperformed other stocks over one-week horizons, and thus a positive relationship could enable us to establish another (indirect) link between Reddit comments and abnormal returns. However, as for abnormal returns, we do not find a significant relationship between Reddit comments and retail order imbalance, and almost all coefficients in this model remain small and negative.

Neither the change in the number of comments nor the remaining control variables seem to have a significant impact on retail order imbalance or directly on abnormal returns. The low explanatory value of this model is also evidenced by the very small R-squared values of 0.01 for mroibvol and 0.04 for the excess return variables. This result is consistent with multiple prior studies (e.g., Antweiler and Frank 2004; Ammann and Schaub 2021, and others), which also demonstrated an impact of online activity and attention on trading volume but found that this online activity and the resulting investor attention are not informative.

Thus, investors are not able to exploit the relationship between online activity and trading volume in the following 30-minute window and cannot achieve abnormal returns over that time horizon. While one common explanation for this is simply that Reddit users posting comments lack information and do not add value, we cannot rule out that a nonlinear and complex interdependence between Reddit comments and trading activity or even reverse causality plays a role and prevents a measurable effect. Additionally, we cannot rule out that Reddit comments are informative for abnormal returns over longer time horizons.

5.4 Results of the Granger causality test

As an additional robustness check and due to prevalent autocorrelation and results of bi-directional effects in previous studies, we perform a Granger Causality Test for the effect of weekly changes of Reddit comments on trading volume and vice versa, as well as for the relationship between Reddit comments and the retail trading proportion. We choose 13 lags for the test as a full trading day consists of 13 half-hour windows.
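The logic of a bivariate Granger causality test with 13 lags can be sketched as an F-test that compares a regression of a series on its own lags with one that additionally includes lags of the other series. The sketch below uses synthetic data and hypothetical names; it is not the exact test setup behind Tables 9 and 10, whose implementation details are not given in the text.

```python
import numpy as np


def lagged_matrix(series: np.ndarray, lags: int) -> np.ndarray:
    """Columns hold the series lagged by 1..lags, aligned to t = lags..n-1."""
    n = len(series)
    return np.column_stack([series[lags - j:n - j] for j in range(1, lags + 1)])


def granger_f_stat(cause: np.ndarray, effect: np.ndarray, lags: int = 13) -> float:
    """F-statistic for H0: lags of `cause` do not help predict `effect`."""
    target = effect[lags:]
    const = np.ones((len(target), 1))
    restricted = np.hstack([const, lagged_matrix(effect, lags)])
    unrestricted = np.hstack([restricted, lagged_matrix(cause, lags)])

    def rss(design: np.ndarray) -> float:
        coef = np.linalg.lstsq(design, target, rcond=None)[0]
        return float(np.sum((target - design @ coef) ** 2))

    rss_r, rss_u = rss(restricted), rss(unrestricted)
    dof = len(target) - unrestricted.shape[1]
    return ((rss_r - rss_u) / lags) / (rss_u / dof)


# Synthetic data in which "comments" lead "volume" by one window.
rng = np.random.default_rng(42)
n = 1_000
comments = rng.normal(size=n)
volume = np.zeros(n)
volume[1:] = 0.8 * comments[:-1] + 0.1 * rng.normal(size=n - 1)

f_comments_to_volume = granger_f_stat(comments, volume)
f_volume_to_comments = granger_f_stat(volume, comments)
```

In this one-directional toy setup, the F-statistic is very large for the direction from comments to volume and small for the reverse direction. A large F-statistic in both directions, by contrast, would point to a bi-directional relationship.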

Table 9 Granger Causality test - trading volume

Results for the first test can be found in Table 9. It can be seen that the null hypothesis

$$\begin{aligned} lRC_{wchg} \nrightarrow \, Volume_{wchg} \end{aligned}$$

can be rejected at the 1% significance level for all specifications, confirming our results of a significant impact of Reddit comments on future GameStop trading volume. However, the mirrored relationship is also significant for all measures, with even higher F-statistics. While we did not systematically test for reverse causality with a lagged and controlled model, it seems likely that there is also an effect of trading volume on future Reddit comments, as has also been demonstrated, e.g., for Yahoo! Finance comments (see Antweiler and Frank 2004) and Twitter posts (see Behrendt and Schmidt 2018), among others. In this line of thinking, an increase in trading volume would cause investors to divert attention to the stock and discuss it on social media. Whether this effect is completely independent of the effect in the opposite direction that we demonstrated above for WallStreetBets comments and GameStop, or whether there is some systematic interaction between both effects, remains to be shown in a larger, cross-sectional study.

Table 10 Granger causality test - retail trading proportion

Granger causality results for retail trading proportions can be found in Table 10. The null hypothesis of no significant impact of Reddit comments on the weekly change of retail trading proportion can only be rejected with high confidence for the small-trades and marketable-retail-based variables, which was already expected given the non-significant results in Table 7. However, with the exception of the small-trades-based measure and in accordance with our interpretation of the results for trading volume, all F-statistics for the opposite effect of retail trading proportion on comments are lower and non-significant. While not very convincing in itself, this result adds a further small piece of evidence to our assumption of a causal impact of Reddit comments on future trading volume, regardless of a possible bi-directional effect.

6 Conclusion

In this article, we use the unique situation that arose around the GameStop share during 2020 and 2021 with highly elevated trading and investor attention to establish a link between social media activity and the trading volume in shares and options of a company.

Due to the unprecedented social media activity on Reddit’s WallStreetBets board with more than four million posts directly related to GameStop in our dataset, our results are not only robust over different specifications of trading volume but also significant for high-frequency intraday data. Finding a significant effect on trading volume but not on abnormal returns, our results confirm earlier studies (e.g., Tumarkin and Whitelaw 2001; Antweiler and Frank 2004; Kim and Kim 2014; Ammann and Schaub 2021) in a recent high-impact scenario and extend the literature by documenting an impact on option turnover as well. The results are in accordance with an attention-based mechanism that drives trading volume after frequent exposure.

However, our results come with several limitations: we are not able to find a distinct effect of WallStreetBets comments on retail volume or trading proportion specifically; some results indicate a more complex and nonlinear relation over different time horizons; and the Granger causality test also suggests a bi-directional effect, with trading volume affecting Reddit comments as well. Thus, we cannot establish causality for our findings, and alternative explanations for the volume effect are possible as well. A larger cross-sectional study, optimally augmented with broker-sourced, individual-level trade data, would be necessary to confirm our findings, prove a causal effect and disentangle the possible mechanisms through which online comments affect trading activity.