Background

The predictability of stock returns has long been a focus of financial economics, in both cross-sectional and time-series analysis. Numerous variables have been put forward to predict stock returns, including the book-to-market ratio (Pontiff and Schall, 1998), the inflation rate (Nelson, 1976), the dividend-price ratio (Fama and French, 1988) and the term structure of interest rates (Campbell, 1987). As for the predictability of the Chinese stock market, Lee and Rui (2000) show that US stock returns help predict the returns of Shanghai A and B shares. Chen et al. (2010) document that only 5 out of 18 firm-specific variables can predict returns in the Chinese stock market and attribute this weak predictability to the low informativeness and less heterogeneous distribution of stock returns. Goh et al. (2013) find that US economic variables have statistically significant predictive power for Chinese stock returns after China's entry into the WTO; specifically, Chinese and US economic variables used jointly have superior predictive power. Jordan et al. (2014) find that the returns of economically linked economies can predict the aggregate Chinese stock market. Chen et al. (2016) show that US economic variables can strongly forecast the monthly volatility of Chinese stock returns.

Recently, scholars have begun to use information extracted from the Internet as a new source of information for financial studies, including online stock message boards (Tumarkin and Whitelaw, 2001; Antweiler and Frank, 2004), Twitter (Bollen et al., 2011; Zhang et al., 2016a), Google Trends (Da et al., 2011; Da et al., 2015), Baidu Index (Zhang et al., 2013), Baidu News (Zhang et al., 2014; Shen et al., 2016), online stock commentary columns (Zhang et al., 2016c) and Sina Weibo (Jin et al., 2016). In particular, Da et al. (2011) show that search frequency from Google Trends can predict stock returns over the following weeks. Joseph et al. (2011) treat online ticker searches as a proxy for investor sentiment and find that this sentiment can predict abnormal stock returns and trading volume at a weekly horizon. Dimpfl and Jank (2016) find that internet search queries can predict the next day’s volatility.

This paper connects the two streams of literature above and contributes to the existing literature in two respects. First, we provide the first empirical study of the predictive power of Baidu Index, a newly emerged source of internet information, for Chinese stock returns, whereas other studies mainly focus on market variables (Goh et al., 2013; Chen et al., 2010; Chen et al., 2016). Although Zhang et al. (2013) advocate the search frequency of stock names in Baidu Index as a proxy for investor attention, they focus only on the explanatory power of this proxy for stock returns and do not investigate its predictive power. Second, we complement the existing studies on Chinese stock returns (Goh et al., 2013; Jordan et al., 2014) in that our predictive variable is at a daily horizon. The causes of changes in daily returns, volatility and trading volume differ significantly from those of the monthly movements of market performance (Admati and Pfleiderer, 1988; Amihud and Mendelson, 1987). Therefore, our study can offer more insight into the pricing mechanism and has potentially important implications for asset allocation and risk management.

The remainder of this paper is organized as follows. "Data description" section describes the Baidu Index and capital data. "Empirical analysis" section performs the empirical analysis, including the lead-lag relationships, the cross-sectional analysis and the trading strategy. "Conclusions" section sets forth the conclusions.

Methods

The sample covers both the Shanghai Stock Exchange and the Shenzhen Stock Exchange, with 30 stocks from each of three boards, i.e., the ChiNext, the SME Board and the Main Board, over the calendar days from March 1st 2011 to March 30th 2012. In total, there are 90 stocks with 267 trading days. In particular, we obtain the daily trading volume, the daily stock return after dividend reinvestment and the market return from the China Stock Market and Accounting Research (CSMAR) database. The 1-min prices are retrieved from the RESSET Database to calculate the intraday volatility. In a pooled analysis, the daily trading volume ranges from 33700 to 1.5 × 10^8, the firm capitalization ranges from 6.5 × 10^5 to 5.8 × 10^9, the PE ratio ranges from −667 to 3119 and the turnover ranges from 0.0398 to 58.94. Because the stocks are spread across different boards and have distinct characteristics, our sample can be viewed as a parsimonious representation of the Chinese stock market.

Baidu index

Baidu Index is a keyword search tool launched by China's largest search engine, and its main users are Chinese-language speakers. We search for each stock's name in Baidu Index and record the Search Frequency of Baidu Index (SFBI) for each stock during the sample period. To make the SFBI comparable across firms, we calculate the standardized SFBI (SSFBI) for each individual stock.

$$ SSFBI_t=\frac{SFBI_t-AV_{SFBI}}{SD_{SFBI}} $$
(1)

where AV_{SFBI} is the average value of the SFBI in the sample period and SD_{SFBI} is the standard deviation of the SFBI time series.
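As a minimal sketch of equation (1), the standardization for one stock can be written as follows. The function name `standardize_sfbi` is hypothetical, and the use of the sample standard deviation (`statistics.stdev`) is an assumption, since the text does not state whether the sample or the population formula is used.

```python
from statistics import mean, stdev


def standardize_sfbi(sfbi):
    """Return the standardized SFBI (SSFBI) series of one stock, as in equation (1).

    Assumption: SD_SFBI is the sample standard deviation; replace `stdev`
    with `statistics.pstdev` for the population version.
    """
    av = mean(sfbi)   # AV_SFBI: average SFBI over the sample period
    sd = stdev(sfbi)  # SD_SFBI: standard deviation of the SFBI series
    if sd == 0:
        raise ValueError("SFBI series is constant; SSFBI is undefined")
    return [(x - av) / sd for x in sfbi]
```

Because each stock is standardized with its own mean and standard deviation, the resulting series are unit-free and can be compared across firms with very different search levels.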

Variables

We calculate the absolute value of the difference between the individual stock return and the market return as the abnormal return (AbRet). This measure captures the magnitude of price changes rather than their upward or downward direction. In a similar way, we also calculate the cumulative abnormal return (CAR), which sums the signed differences over an event window, for later analysis.

$$ AbRet_t=\left| Ret_{I,t}-Ret_{M,t}\right| $$
(2)
$$ CAR\left(-30,+30\right)=\sum_{t=-30}^{+30}\left( Ret_{I,t}-Ret_{M,t}\right) $$
(3)

where Ret_{I,t} is the daily stock return after dividend reinvestment and Ret_{M,t} is the corresponding market index return. In particular, we use the return of the China Securities Index 300 (CSI 300) as the market return because the CSI 300 is the first index launched jointly by the Shanghai and Shenzhen Stock Exchanges and thus represents the whole market.
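Equations (2) and (3) can be illustrated with the short sketch below. The function names and the `event_idx` argument (the position of the event day in the return series) are hypothetical and introduced only for illustration. Note that AbRet uses the absolute difference, while CAR sums the signed differences.

```python
def abnormal_returns(stock_ret, market_ret):
    """AbRet_t = |Ret_{I,t} - Ret_{M,t}| for every trading day, as in equation (2)."""
    return [abs(ri - rm) for ri, rm in zip(stock_ret, market_ret)]


def cumulative_abnormal_return(stock_ret, market_ret, event_idx, before=30, after=30):
    """CAR(-before, +after) around the event day, as in equation (3).

    `event_idx` is the index of the event day (t = 0) in both return lists.
    """
    start, end = event_idx - before, event_idx + after
    if start < 0 or end >= len(stock_ret):
        raise ValueError("event window extends beyond the available data")
    # Signed (not absolute) differences are accumulated over the window.
    return sum(stock_ret[t] - market_ret[t] for t in range(start, end + 1))
```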

Following Barber and Odean (2008), we calculate the excess trading volume (ETV) for each stock as the ratio of the stock’s trading volume on a given day to its average trading volume over the whole sample period.

$$ ETV_t=\frac{TV_t}{AV_{TV}} $$
(4)

where TV_t is the trading volume on trading day t and AV_{TV} is the average trading volume over the whole sample period.
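A minimal sketch of equation (4) is given below; the function name `excess_trading_volume` is hypothetical. A value above 1 marks a day on which the stock trades more than its sample-period average.

```python
from statistics import mean


def excess_trading_volume(volume):
    """ETV_t = TV_t / AV_TV for each trading day of one stock, as in equation (4)."""
    av_tv = mean(volume)  # AV_TV: average daily volume over the whole sample
    if av_tv == 0:
        raise ValueError("average trading volume is zero; ETV is undefined")
    return [tv / av_tv for tv in volume]
```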

The intraday volatility is calculated as the standard deviation of the 1-min prices.

$$ Volatility_t=\sqrt{\frac{\sum_{1}^{N}\left(P_t-AV_P\right)^2}{N}} $$
(5)

where P_t is the 1-min stock price and AV_P is the average stock price. N denotes the number of observations, and N = 240.
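Equation (5) is the population standard deviation (division by N rather than N − 1) of the 1-min prices. The sketch below illustrates it; the function name `intraday_volatility` is hypothetical, and taking AV_P as the mean of the same day's 1-min prices is an assumption.

```python
from math import sqrt


def intraday_volatility(prices):
    """Standard deviation of one day's 1-min prices, dividing by N as in equation (5).

    Assumption: AV_P is the mean of the prices passed in (one trading day,
    N = 240 observations in the paper's sample).
    """
    n = len(prices)
    if n == 0:
        raise ValueError("no intraday prices supplied")
    av_p = sum(prices) / n
    return sqrt(sum((p - av_p) ** 2 for p in prices) / n)
```

The same value is returned by `statistics.pstdev(prices)`.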

Results and discussion

This section provides an empirical analysis of the lead-lag relationships between SFBI and stock returns, the market reactions around the MSD and LSD events, and the performance of the trading strategy.

Lead-lag relationships

In line with other recent studies on lead-lag relations (Schmeling, 2009; Siganos et al., 2014), we use five lags of the standardized SFBI and of the abnormal return as the explanatory variables and formulate the following ordinary least squares regression model.

$$ AbRet_{i,t}=\alpha_1+\sum_{j=1}^{5}\beta_{i,j}SSFBI_{i,t-j}+\sum_{j=1}^{5}\gamma_{i,j}AbRet_{i,t-j}+\varepsilon_{i,t} $$
(6)
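To make the lag structure of model (6) concrete, the sketch below builds the dependent variable and the regressor row for each trading day of one stock. The function name `lagged_rows` is hypothetical, and the estimation step itself is left out: the rows can be passed to any least-squares routine.

```python
def lagged_rows(ssfbi, abret, n_lags=5):
    """Build (y, x) pairs for model (6) for a single stock.

    y is AbRet_t; x is [1, SSFBI_{t-1..t-n_lags}, AbRet_{t-1..t-n_lags}].
    The first `n_lags` days are dropped because their lags are unavailable.
    """
    if len(ssfbi) != len(abret):
        raise ValueError("series must have the same length")
    rows = []
    for t in range(n_lags, len(abret)):
        x = [1.0]                                           # intercept
        x += [ssfbi[t - j] for j in range(1, n_lags + 1)]   # beta terms
        x += [abret[t - j] for j in range(1, n_lags + 1)]   # gamma terms
        rows.append((abret[t], x))
    return rows
```

With five lags, each stock loses its first five trading days, and every row has 11 regressors including the intercept.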

Table 1 reports the regression results of model (6). Since we run the regression stock by stock, the Bonferroni correction is used to adjust the p-values for the multiple comparisons problem. We observe that the one-day lagged SFBI is significantly and positively related to the next trading day’s abnormal return across the different subsamples. These findings suggest that the online search behavior of individual investors can predict price changes on the next trading day.

Table 1 Summary of the regression results. This table reports the regression results of model (6). Because the analysis is based on stock-by-stock regressions, we employ the Bonferroni correction to adjust the p-values for the multiple comparisons problem. The significance cut-off is set to α/n (α = 0.05 and n = 90). Since not all the variables are standardized, the reported coefficients are not in basis points but depend on the scale of the variables.
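The Bonferroni cut-off described in the table notes can be sketched as follows. The function name `bonferroni_significant` is hypothetical, and the p-values in the usage lines are made-up inputs for illustration, not results from the paper.

```python
def bonferroni_significant(p_values, alpha=0.05):
    """Flag p-values that pass the Bonferroni cut-off alpha / n.

    n is the number of tests (one per stock-level regression here).
    Returns the cut-off and one True/False flag per p-value.
    """
    n = len(p_values)
    if n == 0:
        raise ValueError("no p-values supplied")
    cutoff = alpha / n
    return cutoff, [p < cutoff for p in p_values]


# Illustrative inputs only: 90 tests, one very small p-value.
cutoff, flags = bonferroni_significant([0.0001] + [0.01] * 89)
```

With α = 0.05 and n = 90, the cut-off is about 0.00056, so a coefficient with p = 0.01 that would pass a conventional 5% test is not counted as significant.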

MSD and LSD

To examine the market reaction more closely, for each individual stock we sort the trading days by the corresponding SFBI and select the 10 trading days with the highest SFBI as the Most Searching Days (MSD) and the 10 trading days with the lowest SFBI as the Least Searching Days (LSD). We then treat the MSD and LSD as event days and observe the market reactions around them using the event study methodology. Figure 1 illustrates the CAR, ETV and Volatility around the MSD and LSD. In particular, Panel A plots the changes in CAR around the MSD and LSD: the stock price goes down after the MSD and goes up after the LSD. These results are inconsistent with the findings for the US stock market (Da et al., 2011), which suggest that higher search volume predicts higher future stock returns. This inconsistency may be driven by the “big position construction” of institutional investors. After building a position in certain stocks, the institutional investors release news about the companies and attract individual investors to buy these stocks. Meanwhile, the institutional investors sell their holdings at a profit. This argument is consistent with a report by the Shanghai Stock Exchange showing that individual investors accounted for 93.20% in A shares at the end of 2012, and with scholars who argue that the Chinese stock market is dominated by irrational individual investors who are subject to strong psychological biases, which results in speculation (Feng and Seasholes, 2008; Zhang et al., 2016b). In addition, Fig. 1 shows that ETV and Volatility are significantly larger on the MSD.
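The selection of event days can be sketched as below. The function name `most_and_least_searching_days` is hypothetical, and the treatment of ties (days with equal SFBI keep their chronological order) is an assumption, as the text does not specify a tie-breaking rule.

```python
def most_and_least_searching_days(sfbi, k=10):
    """Return (msd, lsd): indices of the k highest- and k lowest-SFBI trading days.

    Assumption: ties are broken by chronological order (Python's sort is stable).
    """
    if len(sfbi) < 2 * k:
        raise ValueError("need at least 2*k trading days")
    order = sorted(range(len(sfbi)), key=lambda t: sfbi[t])  # ascending SFBI
    lsd = sorted(order[:k])    # Least Searching Days, in chronological order
    msd = sorted(order[-k:])   # Most Searching Days, in chronological order
    return msd, lsd
```

Each returned index can then serve as the event day (t = 0) around which CAR, ETV and Volatility are tracked.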

Fig. 1 CAR, ETV and Volatility around the MSD and LSD. Panel a: CAR. Panel b: ETV. Panel c: Volatility

Trading strategy

In this section, we further investigate the economic significance of our findings by constructing a long-short trading strategy based on the SFBI. The strategy is formed as follows: on each trading day, the 90 firms are sorted into quartiles (Q) based on their SFBI on the previous trading day. Q1 contains the firms with the lowest SFBI and Q4 contains the firms with the highest SFBI. The firms are held in their portfolios over the sample period with different holding periods, e.g., 20, 40, 60, 80, 100 and 120 trading days. We then obtain a portfolio that consists of a long position in the lowest quartile of firms (Q1) and a short position in the highest quartile of firms (Q4), i.e., the return of the portfolio is the return of Q1 minus that of Q4. Figure 2 illustrates the cumulative profit for the different holding periods. As plotted, this trading strategy has a positive return for all holding periods. Specifically, Fig. 3 illustrates the strategy with a holding period of 120 trading days. Panel A of Fig. 3 plots the returns of Q1 and Q4; the return of Q1 is significantly larger than that of Q4 at the 1% level (p-value = 0.0017). The blue line in Panel B of Fig. 3 is the difference between Q1 and Q4 and the red line is the corresponding market return. Our trading strategy outperforms the market significantly at the 1% level (p-value < 0.0001).
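The portfolio formation step can be sketched as follows. The function name `long_short_return` and the dictionary inputs are hypothetical. Two points are assumptions, since the text does not specify them: stocks are equally weighted within a quartile, and each quartile holds `len(stocks) // 4` firms (22 of the 90), with the remainder left in the middle groups.

```python
from statistics import mean


def long_short_return(prev_sfbi, holding_returns):
    """Return of a portfolio that is long Q1 (lowest SFBI) and short Q4 (highest SFBI).

    prev_sfbi:       {stock: SFBI on the previous trading day}
    holding_returns: {stock: return over the chosen holding period}
    Assumptions: equal weights within each quartile; no transaction costs.
    """
    ranked = sorted(prev_sfbi, key=prev_sfbi.get)  # ascending SFBI
    size = len(ranked) // 4
    if size == 0:
        raise ValueError("need at least four stocks to form quartiles")
    q1, q4 = ranked[:size], ranked[-size:]
    ret_q1 = mean([holding_returns[s] for s in q1])
    ret_q4 = mean([holding_returns[s] for s in q4])
    return ret_q1 - ret_q4
```

Repeating this for each formation day and each holding period gives the series of long-short returns whose cumulative sum is plotted in the figures.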

Fig. 2 Cumulative Holding Period Returns

Fig. 3 An Illustration of Cumulative Returns in Holding Period with 120 Trading Days. The left subfigure is Panel a; the right subfigure is Panel b

Conclusions

This paper employs the Baidu Index as a predictive variable and investigates its predictive power for Chinese stock returns. The empirical findings show that Baidu Index can predict price changes on the next trading day. After constructing the MSD and LSD, we find that stock prices go up when individual investors pay less attention to the stocks and go down when they pay more attention. We also construct a trading strategy that shorts the stocks with the highest SFBI and longs the stocks with the lowest SFBI. The trading strategy outperforms the corresponding market index returns. However, we caution against adopting our trading strategy in real investment, because the transaction costs associated with rebalancing are not considered.