1 Introduction

Neoclassical financial theory argues that the arrival of news is the major driver of asset prices. Focusing particularly on those aspects that neoclassical finance usually assumes away, the (more recent) literature on market microstructure provides a wealth of models featuring effects on market prices that could not be explained in the neoclassical framework. In this literature, a number of papers investigate the effect of buying/selling pressure currently prevailing in the market for an asset on its price movements. A popular measure of buying/selling pressure in intermediated markets, where market makers ensure liquidity, is order (flow) imbalance, which measures the disparity between buyer- and seller-initiated trades (a precise definition will be provided in Sect. 3.1.1).

The present paper analyzes the effects of order imbalance on daily returns of German stocks. It contributes to the empirical literature on order imbalance effects in stock returns in several ways. First, to date, no studies have investigated imbalance–return relations for German stocks. An advantage of the German data over most US data is that every trade is identified as either buyer- or seller-initiated, which avoids errors from the use of trade classification algorithms. Second, while most of the literature uses time series regressions, we rely on fixed-effects panel regressions as described in Sect. 3.2. Third, there appear to be no studies based on a recent sample of daily order imbalances: as stock markets worldwide have become more efficient, it is of interest whether effects documented for the 1990s still persist at the daily frequency. We therefore examine concurrent and unconditional relations for the German market and provide results on recent day-to-day effects. Fourth, we document size and liquidity effects in the imbalance–return relation. Fifth, in contrast to the previous literature, we find imbalance effects to be weaker for very high levels of order imbalance. Sixth, we are the first to analyze imbalance effects during the financial crisis and show that the concurrent relation strengthened in that period.

The paper is organized as follows: Section 2 provides a review of previous papers on order imbalance, which positions our results in the context of the existing literature. Section 3 defines the variables and the regression models used. Section 4 describes our data together with the sample selection criteria we applied. Section 5 discusses our results and compares them to those in the literature. Section 6 concludes.

2 Review of the literature and contribution

In this section, we provide an overview of the literature on order imbalance, its causes, and its effects on asset returns. We start with theoretical explanations for the existence of order imbalance and its effects on asset prices. This will be followed by a comparison of previous empirical results.

2.1 Theoretical models related to order imbalance

A very simple model of an intermediated stock market is presented by Roll (1984). A risk-neutral market maker sets quotes for trading with a non-discretionary liquidity trader. The assumption of an efficient market implies that the quotes remain unchanged unless new information arrives. In this situation, a market buy order is executed at the ask and can be followed either by a trade at the same price or by one at a lower price (the bid). This induces a negative link between order imbalance and subsequent price changes. The resulting bid-ask bounce creates negative first-order autocorrelation in returns calculated from traded prices over adjacent time intervals (e.g., daily closing prices).
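The bid-ask bounce can be illustrated with a small simulation in the spirit of Roll (1984). The sketch is a hypothetical illustration, not taken from the paper: it assumes a fundamental value following a Gaussian random walk (news arrival), quotes set symmetrically around it with a fixed half-spread c (`half_spread`), and trade directions that are independent and equally likely to be buys or sells. Under these assumptions, changes in traded prices have a first-order autocorrelation of about -c²/(σ² + 2c²), where σ (`sigma`) is the standard deviation of the value innovations, whereas changes in the mid-quote are serially uncorrelated.

```python
import random


def first_order_autocorrelation(x):
    """Sample first-order autocorrelation of a sequence."""
    n = len(x)
    mean = sum(x) / n
    num = sum((x[t] - mean) * (x[t - 1] - mean) for t in range(1, n))
    den = sum((v - mean) ** 2 for v in x)
    return num / den


def simulate_roll(n_trades=100_000, half_spread=0.05, sigma=0.05, seed=42):
    """Simulate traded prices and mid-quotes in a Roll (1984)-style market.

    The fundamental value follows a random walk, quotes are set symmetrically
    around it, and each trade is a buy (at the ask) or a sell (at the bid) with
    probability 1/2, independently of past trades.
    """
    rng = random.Random(seed)
    value = 100.0
    traded, mids = [], []
    for _ in range(n_trades):
        value += rng.gauss(0.0, sigma)      # quotes move only when news arrives
        is_buy = rng.random() < 0.5
        traded.append(value + half_spread if is_buy else value - half_spread)
        mids.append(value)                  # mid-quote = (ask + bid) / 2
    traded_changes = [b - a for a, b in zip(traded, traded[1:])]
    mid_changes = [b - a for a, b in zip(mids, mids[1:])]
    return traded_changes, mid_changes


traded_changes, mid_changes = simulate_roll()
rho_traded = first_order_autocorrelation(traded_changes)
rho_mid = first_order_autocorrelation(mid_changes)
# Theory for these parameters: rho_traded = -c^2 / (sigma^2 + 2 c^2) = -1/3, rho_mid = 0
print(round(rho_traded, 3), round(rho_mid, 3))
```

Setting `sigma=0.0` reproduces the pure no-news case of the text, in which the theoretical autocorrelation of traded-price changes is -1/2.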

When the market maker is assumed to be risk-averse instead of risk-neutral, trading leads to (potentially undesirable) changes in his risk position. A model in this spirit is studied by Stoll (1978). Starting from an initially optimal portfolio, this implies two types of risk: first, any trade changes the overall risk of the market maker’s portfolio, moving it away from his target risk level. Second, assuming the initial portfolio was perfectly diversified, any trade moves the portfolio away from perfect diversification by increasing unsystematic risk. To reduce this inventory holding risk, the market maker adjusts the level of his quotes to induce trading on the desired side of the spread, which initially results in a positive link between order imbalance and price change. Once this succeeds, quotes are reset to their initial values. The resulting higher probability of a price change that is negatively related to order imbalance has been termed the induced order arrival effect by Huang and Stoll (1994, p. 183).

Acknowledging that some traders may have private information not yet incorporated in market prices, market makers anticipate possible losses due to informed trading (adverse selection) by widening bid-ask spreads. This allows them to recover losses to informed traders through increased profits from trading with liquidity traders (Bagehot 1971, p. 13). The risk of other traders obtaining the same information creates time pressure on informed traders (Glosten 1994, p. 1151), which leads to a preference for market orders or aggressively priced limit orders for exploiting private information (Harris 1998, pp. 1ff.). By pushing the price towards the asset’s fundamental value, informed trading creates a positive link between order imbalance and price changes (Huang and Stoll 1997, p. 999).

Private information may also lead to serial correlation in trades, since informed investors try to avoid revealing their information to the market by splitting their orders and buying or selling repeatedly until the price has moved to the extent indicated by their information (Kyle 1985, p. 1330). Similar effects occur when institutions split large orders to reduce price impact, sometimes over several days (Chan and Lakonishok 1995, p. 1152). Herding (see, e.g., Lakonishok et al. 1992), arising, e.g., from peer-group pressure (see, e.g., Lee et al. 2004, p. 332), information cascades (Chiao et al. 2011, p. 132), or the processing of correlated (Chiao et al. 2011, p. 132) or even identical public (Lakonishok et al. 1992, p. 26) or private information (Hasbrouck and Seppi 2001, p. 386), as well as positive feedback trading (Lakonishok et al. 1992, p. 26) and exogenous factors (Hasbrouck and Seppi 2001, p. 386), also leads to serial correlation in trades. This amplifies the inventory holding and adverse selection effects described above.

Several more recent models study the interrelation of all these effects and their total impact on market prices. Examples of such models include Huang and Stoll (1994, 1997); Stoll (2000); Llorente et al. (2002); Chordia and Subrahmanyam (2004), and Subrahmanyam (2008). Some of the effects amplify each other, while others act in opposite directions. Table 1 lists the component effects together with their respective signs.

Table 1 Effects of order imbalance on subsequent price changes implied by theoretical models

Which effect dominates depends on the circumstances. However, once bid-ask bounces are excluded by using mid-quote returns, most of the remaining effects point towards a positive predictive relation between order imbalance and subsequent price changes. Chordia and Subrahmanyam (2004, p. 487) argue that adding current order imbalance as an explanatory variable will turn the coefficients of past order imbalances negative. This is due to the autocorrelation in trades: because order imbalances are positively autocorrelated, lagged imbalances partly proxy for the current imbalance when the latter is omitted from the regression.
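The sign change described by Chordia and Subrahmanyam can be illustrated with a small simulation. The data-generating process below is a hypothetical assumption chosen for illustration only: imbalances follow an AR(1) process with coefficient `rho`, and returns load positively on current imbalance (`beta0`) and negatively on lagged imbalance (`beta1`). Regressing returns on the lagged imbalance alone then yields a slope of about `beta0 * rho + beta1`, which is positive for the chosen values, while adding the current imbalance recovers the negative lagged coefficient.

```python
import numpy as np


def ols_slopes(y, *regressors):
    """OLS with an intercept; returns the slope coefficients only."""
    X = np.column_stack([np.ones_like(y), *regressors])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef[1:]


def simulate_sign_flip(n=100_000, rho=0.5, beta0=1.0, beta1=-0.3, seed=0):
    """Hypothetical process (illustration only):
    I_t = rho * I_{t-1} + u_t  and  R_t = beta0 * I_t + beta1 * I_{t-1} + e_t.
    """
    rng = np.random.default_rng(seed)
    u = rng.normal(size=n)
    imb = np.empty(n)
    imb[0] = u[0]
    for t in range(1, n):
        imb[t] = rho * imb[t - 1] + u[t]    # positively autocorrelated order imbalance
    current, lagged = imb[1:], imb[:-1]
    ret = beta0 * current + beta1 * lagged + rng.normal(scale=0.5, size=n - 1)
    unconditional = ols_slopes(ret, lagged)          # R_t on I_{t-1} only
    conditional = ols_slopes(ret, current, lagged)   # R_t on I_t and I_{t-1}
    return unconditional, conditional


unconditional, conditional = simulate_sign_flip()
# In expectation: unconditional slope = 1.0 * 0.5 - 0.3 = 0.2 (positive),
# conditional slopes = (1.0, -0.3), i.e., the lagged coefficient is negative.
print(unconditional, conditional)
```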

2.2 Empirical results on order imbalance effects in asset returns

The majority of empirical studies confirm the signs of imbalance–return relations suggested by market microstructure theory: contemporaneous order imbalance is positively linked to returns, whereas conditional lags are negatively linked. The unconditional first lag is positive, whereas higher lags are either negative or insignificant. However, the strength of these dependencies differs across the markets and sample periods analyzed.

Existing empirical studies can be broadly classified by data frequency. We first discuss results for intra-day data before covering studies based on daily or lower observation frequencies. Table 2 summarizes the intra-day studies, the samples used, and their findings. Most intra-day studies document a strong contemporaneous relationship, with conditional lag coefficients that decrease in magnitude. When the relation is shifted by one interval, stock market studies document a strong unconditional first lag for observation intervals of up to several minutes. Order imbalances have more explanatory power in less efficient markets. Higher unconditional lags are mostly insignificant.

Table 2 Empirical studies dealing with order imbalance and return (intra-day data)

Harford and Kaul (2005) document a strong concurrent relation on the US stock market for 1986 and 1996. In the 2000s, this is confirmed for special samples such as top losers or gainers by Su and Huang (2008), Su et al. (2009b), Su et al. (2011), and Huang et al. (2012). Apart from stocks, Locke and Onayev (2007, S&P 500) and Huang and Chou (2007, Taiwan) find strong intra-day relations for index futures. The relation for higher lags is weak or even insignificant when controlling for concurrent imbalance.

Studies with more recent sample periods mainly focus on the unconditional lagged relation. For NYSE stocks, Chordia et al. (2008) find significant coefficients for lag 1 based on 5-min returns. The relation is stronger for smaller firms. Their sample covers the largest 500 stocks from 1993 to 2002. The more detailed results for 1996, 1999 and 2002 in Chordia et al. (2005) (covering the biggest 150 NYSE stocks) reveal that in earlier years, the link was significant up to an interval length of 30 min. In 2002, however, there is no significant link beyond five minutes. In this regard, the Japanese stock market seems to be as efficient as its US counterpart. Yamamoto (2012) documents a strong relationship for intervals of up to five minutes in a sample covering 2006 and 2007. There is a U-shaped size effect with a stronger relation for both small and large firms.

In other markets, the unconditional link is more persistent. Visaltanachoti and Yang (2010) compare non-US and US firms and show that imbalances have more explanatory power for non-US firms, for which significant effects last for up to 15 min. The analysis by Jiang et al. (2011) comprises 20 randomly drawn stocks traded on the Shanghai and Shenzhen stock exchanges and covers 2000 to 2008. The average coefficients are highly significant for 10- and 15-min intervals before becoming insignificant from 30 min onwards. Chang and Shie (2011) study Taiwanese index futures from 2006 to 2007. At the 5-min observation frequency, order imbalances are found to be related only to extreme (positive or negative) returns.

Insignificant or negative unconditional links are documented for samples selected in a non-random manner. For example, stocks with extremely negative returns show faster return reversals than other stocks do. Accordingly, Su et al. (2011) and Huang et al. (2012) find strong negative links at lag 1 for NASDAQ and NYSE stocks, respectively. Conversely, stocks with extremely positive returns do not show any significant imbalance–return relation. This is shown for the NASDAQ by Su and Huang (2008) and Su et al. (2009a, b). The first paper deals with 5-min returns, the two others with 90-s intervals. In all three time series studies, the percentage of significantly positive or negative coefficients is low and almost equal. Visaltanachoti and Luo (2009) find no significant imbalance–return relation for Taiwanese stocks at a 30-min observation frequency.

Table 3 Empirical studies dealing with order imbalance and return (daily and lower frequencies)

Table 3 presents the evidence of studies using daily or lower frequencies. The strong concurrent imbalance–return relation found for 5–15 min is also present at daily and weekly intervals. However, it declines markedly when unconditional lags are examined.

Studies based on daily returns for US stocks focus on the period from 1988 to 1998. They find a strong positive contemporaneous link and a weaker negative link for conditional lags (see, e.g., Chan and Fong 2000; Aktas et al. 2008; Stoll 2000; Chordia et al. 2002; Chordia and Subrahmanyam 2004). For the early 2000s, the positive concurrent relation is confirmed by Bailey et al. (2006) and Shenoy and Zhang (2007) on Asian markets. Conditional lags, however, are found to be insignificant. Similar results apply for the FTSE 100 index future from 1993 to 2005 (Ning and Tse 2009, pp. 342–343) and for currency pairs during 2007 (Chen et al. 2012, pp. 606–607). Kao (2011) does not find any relation for the Taiwanese index futures market over a period from 2008 to 2009.

The evidence on unconditional imbalance–return relations is scarce. Analyzing NYSE stocks from 1988 to 1998, Chordia and Subrahmanyam (2004) find a strong positive first-lag relation, which is most pronounced in the three smallest size quartiles. Chordia et al. (2002) use a similar sample and find a strong negative first-lag relation for extremely negative returns. However, they do not control for the bid-ask bounce, which might have biased the results. Studies of Taiwanese stocks (Lee et al. 2004, pp. 334–335) and currency pairs (Chen et al. 2012, pp. 606–607) do not find any pronounced relationships. Kao (2011) finds a strong positive unconditional first lag only for extreme positive imbalances.

A positive relation between order imbalances and returns has been documented even beyond the daily horizon. Studying Taiwanese stocks from 1994 to 2002, Andrade et al. (2008) find a significantly positive contemporaneous relation for weekly data. Conditional lags are significantly negatively related. In the cross-sectional regression of Kaniel et al. (2008), the unconditional first lag is significantly positive. The study analyzes order imbalances of individual investors trading NYSE stocks from 2000 to 2003. Subrahmanyam (2008) aggregates order imbalances to monthly data. His sample consists of NYSE stocks from 1988 to 2002. The first and the second unconditional lags are negatively related to returns. The relation is significant for the second lag and can be traced back to mid-sized firms.

Whereas the initial imbalance effects on US markets are strong and last only for several minutes, inventories appear to be offloaded gradually and over longer periods, sometimes of up to several weeks. This is suggested by the fact that a positive link can be found even at daily and weekly frequencies, for both concurrent and unconditional first lags. For Chinese stock and futures markets, the daily relation is significant only for the concurrent view. Size effects have also been documented, but their nature varies from market to market.

3 Methodology

3.1 Variables

3.1.1 Order imbalance

In the literature, three major approaches to measuring order imbalance are used: one is based on the number of buy and sell orders, another also considers the size of orders (i.e., the number of shares in each order), and a third additionally accounts for the current share price by multiplying it by the order size. Most of the literature on order imbalance uses the first approach, sometimes combined with the second. A number of studies favor the simple number measure: Jones et al. (1994) find a much stronger effect of the number of trades (as compared to trading volume) on return volatility. On a sample of NYSE stocks observed over roughly 10 years, Chordia and Subrahmanyam (2004) find a markedly higher correlation between returns and order imbalance when the latter is measured by the number of trades. Scaling order imbalance by the total number of trades may diminish autocorrelation (Chordia and Subrahmanyam 2004, p. 498) but has the advantage of allowing meaningful comparisons across stocks despite differences in liquidity. Hence, we define the order (flow) imbalance for stock i on day t as

$$\begin{aligned} I_{i,t}=\frac{{\text {No. of buyer-initiated trades}}_{i,t} - {\text {No. of seller-initiated trades}}_{i,t}}{{\text {Total no. of trades}}_{i,t}}. \end{aligned}$$
(1)
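Eq. (1) translates directly into code. The sketch below is a minimal illustration; it assumes, as stated for Xetra, that every trade is classified as either buyer- or seller-initiated, so that the total number of trades equals the sum of the two counts. The function name and the convention of returning `None` for a day without trades are hypothetical choices for illustration.

```python
from typing import Optional


def order_imbalance(n_buyer_initiated: int, n_seller_initiated: int) -> Optional[float]:
    """Daily order (flow) imbalance of Eq. (1).

    Assumes every trade is classified as buyer- or seller-initiated, so the
    total number of trades is the sum of both counts. Returns None if there
    were no trades, i.e., if the imbalance cannot be computed for that day.
    """
    total = n_buyer_initiated + n_seller_initiated
    if total == 0:
        return None
    return (n_buyer_initiated - n_seller_initiated) / total


# Hypothetical stock-day with 60 buyer- and 40 seller-initiated trades
print(order_imbalance(60, 40))  # 0.2
```

Because the difference is scaled by the number of trades, the measure always lies in [-1, 1], regardless of how actively a stock is traded.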

Xetra allows every single transaction to be identified as either buyer- or seller-initiated, even for transactions within the bid-ask spread. This avoids any need to apply the Lee and Ready (1991) trade classification algorithm used in many previous quote-driven studies (see, e.g., Chan and Fong 2000, p. 254; Chordia and Subrahmanyam 2004, p. 494; Yamamoto 2012, p. 9). Moreover, by including both market orders and marketable limit orders (i.e., limit buy orders priced at or above the ask quote or limit sell orders priced at or below the bid), all traders demanding immediacy of execution are included. Li et al. (2010) argue that withdrawing a limit buy (sell) order has the same effect as submitting a limit sell (buy) order. Including such canceled orders increases the explanatory power of order imbalance for concurrent returns. Unfortunately, our dataset does not contain information on canceled limit orders, which precludes us from using this extended measure of order imbalance.

3.1.2 Returns

We compute daily log returns from the last mid-quotes before the closing auction:

$$\begin{aligned} R_{i,t}=\log \left( \frac{{\text {ask}}_{i,t}+{\text {bid}}_{i,t}}{{\text {ask}}_{i,t-1}+{ \text {bid}}_{i,t-1}}\right) , \end{aligned}$$
(2)

where \({\text {ask}}_{i,t}\) is the last ask quote for stock i before the closing auction of day t and \({\text {bid}}_{i,t}\) is the corresponding bid quote. Since the factor 1/2 in the mid-quote cancels, Eq. (2) is simply the log change of the mid-quote between consecutive days. Using mid-quotes instead of traded prices avoids bid-ask bounce effects, which would induce negative first-order autocorrelation in returns (see, e.g., Roll 1984; Kaul and Nimalendran 1990; Jegadeesh 1990).
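A minimal sketch of Eq. (2), with hypothetical quote values, shows why mid-quote returns are immune to the bid-ask bounce: if quotes do not change, the mid-quote return is exactly zero, whereas a traded-price return can be nonzero merely because the last trade switched from the bid to the ask.

```python
import math


def midquote_log_return(ask_t: float, bid_t: float, ask_prev: float, bid_prev: float) -> float:
    """Daily log return of Eq. (2) from the last quotes before the closing auction.

    The factor 1/2 of each mid-quote cancels, so summing ask and bid suffices.
    """
    return math.log((ask_t + bid_t) / (ask_prev + bid_prev))


# Hypothetical quotes: unchanged at 99.90 / 100.10 on both days
r_mid = midquote_log_return(100.10, 99.90, 100.10, 99.90)
# If the last trade on day t-1 hit the bid and the last trade on day t lifted the
# ask, a traded-price return would be positive although no quote has moved.
r_traded = math.log(100.10 / 99.90)
print(r_mid, r_traded)
```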

When investigating lead–lag relations as in the present study, infrequent trading may distort the results (see, e.g., Lo and MacKinlay 1990, p. 178). Following the literature, we deal with this potential problem by focusing on the most liquid stocks only and eliminating stocks with missing values for order imbalance. The exact exclusion procedure will be described in Sect. 4.2.

3.2 Regression models and hypotheses

3.2.1 General relation

Our literature review in Sect. 2 shows that there is a large number of papers investigating the relation between order imbalances and returns. The models used in these papers can be broadly classified into two categories: one group tries to forecast returns from (only) past order imbalances (unconditional lagged relation), the other aims at explaining returns using current and past order imbalances (concurrent and conditional lagged relation).

In this paper, we investigate both types of relations between order imbalances and returns. In contrast to most previous studies based on time series regressions, however, we stack all observations across the stocks in our sample and perform panel regressions. We account for time- and stock-specific effects by applying the within transformation (see Wooldridge 2010, p. 302).

Unobserved effects such as market sentiment may be present in our data and may well be correlated with order imbalance. To assess whether a fixed- or a random-effects model is more appropriate, we perform Hausman (1978) tests. The fixed- and random-effects estimates differ significantly (at the 1 % level) for both the unconditional and the conditional models. This rejection indicates that the unobserved effects are correlated with the regressors, in which case the random-effects estimator is inconsistent; a fixed-effects specification is therefore preferred.
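The Hausman statistic compares the two sets of estimates through H = d'(V_FE - V_RE)^{-1} d, with d = b_FE - b_RE, which is asymptotically chi-squared with as many degrees of freedom as compared coefficients. The sketch below computes H from given estimates and covariance matrices; the input numbers are hypothetical and not taken from the paper, and the sketch simply assumes that V_FE - V_RE is invertible (in finite samples it need not be positive definite).

```python
import numpy as np


def hausman_statistic(b_fe, b_re, cov_fe, cov_re) -> float:
    """Hausman (1978) statistic H = d' (V_FE - V_RE)^{-1} d with d = b_FE - b_RE.

    Under the null that the unobserved effects are uncorrelated with the
    regressors, H is asymptotically chi-squared with len(d) degrees of freedom.
    """
    d = np.asarray(b_fe, dtype=float) - np.asarray(b_re, dtype=float)
    v = np.asarray(cov_fe, dtype=float) - np.asarray(cov_re, dtype=float)
    return float(d @ np.linalg.solve(v, d))


# Hypothetical single-coefficient example; the 1 % critical value of the
# chi-squared distribution with one degree of freedom is about 6.63.
h = hausman_statistic([0.5], [0.4], [[0.002]], [[0.001]])
print(h, h > 6.63)
```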

For a generic variable Y, unit-specific effects are removed using

$$\begin{aligned} \ddot{Y}_{i,t}:=Y_{i,t}-\bar{Y_i}, \end{aligned}$$
(3)

where \(\bar{Y_i}\) is the time-average of the observations on \(Y_i\). When applied to return data, this transformation is equivalent to applying the constant-mean-return correction (see Brown and Warner 1985, pp. 4–5). Time-specific effects are removed by subsequently applying the within transformation cross-sectionally, i.e.,

$$\begin{aligned} \tilde{Y}_{i,t}:=\ddot{Y}_{i,t}-N^{-1}\sum _{i=1}^N \ddot{Y}_{i,t}, \end{aligned}$$
(4)

where N is the total number of stocks in the sample.
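Eqs. (3) and (4) amount to subtracting each stock's time average and then each day's cross-sectional average. The sketch assumes a balanced panel stored as an N × T array (one row per stock, one column per trading day); with missing stock-days, the averages would have to be taken over the available observations only. The function name is hypothetical.

```python
import numpy as np


def within_transform(y: np.ndarray) -> np.ndarray:
    """Apply Eqs. (3) and (4) to a balanced panel y of shape (N stocks, T days)."""
    y_ddot = y - y.mean(axis=1, keepdims=True)          # Eq. (3): remove each stock's time average
    return y_ddot - y_ddot.mean(axis=0, keepdims=True)  # Eq. (4): remove each day's cross-sectional average


# Hypothetical panel consisting only of additive stock and day effects
stock_effect = np.array([[0.01], [-0.02], [0.03]])   # shape (3, 1)
day_effect = np.array([[0.005, -0.01, 0.0, 0.02]])   # shape (1, 4)
panel = stock_effect + day_effect
print(np.allclose(within_transform(panel), 0.0))  # True: both effects are removed
```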

The fixed-effects regression model for the conditional lagged relation is specified as

$$\begin{aligned} \tilde{R}_{i,t}=\sum _{k=0}^K \beta _k^c \tilde{I}_{i,t-k}+\tilde{\epsilon }_{i,t}^c, \end{aligned}$$
(5)

where K is the highest order imbalance lag included, and \(\epsilon _{i,t}^c\) is the error term for stock i at time t. We test whether \(\beta _k^c\) equals zero by means of two-tailed t tests.

The fixed-effects regression model for the unconditional lagged relation is given by

$$\begin{aligned} \tilde{R}_{i,t}=\sum _{k=1}^K \beta _k^u \tilde{I}_{i,t-k}+\tilde{\epsilon }_{i,t}^u, \end{aligned}$$
(6)

with analogous definitions. The null hypothesis of \(\beta _k^u=0\) is again tested using two-tailed t tests.
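A minimal sketch of how Eqs. (5) and (6) could be estimated by pooled OLS on stacked, within-transformed data. It assumes a balanced N × T panel and that the within transformation has been applied to the full series before lags are formed; both are assumptions for illustration. The simulated panel below has no fixed effects, so the transformation is skipped there, and its coefficients are hypothetical.

```python
import numpy as np


def lagged_design(ret: np.ndarray, imb: np.ndarray, K: int, conditional: bool):
    """Stack a balanced (N, T) panel into the regression of Eq. (5) or Eq. (6).

    The conditional model uses imbalance lags 0..K, the unconditional model
    lags 1..K. The first K days of every stock are dropped so that all lags
    are available.
    """
    _, T = ret.shape
    lags = range(0 if conditional else 1, K + 1)
    y = ret[:, K:].reshape(-1)
    X = np.column_stack([imb[:, K - k:T - k].reshape(-1) for k in lags])
    return y, X


def pooled_coefficients(y: np.ndarray, X: np.ndarray) -> np.ndarray:
    """Pooled OLS without intercept (means are removed by the within transformation)."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


# Hypothetical panel: 50 stocks, 200 days, returns driven by I_t and I_{t-1}
rng = np.random.default_rng(7)
imb = rng.normal(scale=0.2, size=(50, 200))
ret = np.zeros_like(imb)
ret[:, 1:] = 0.5 * imb[:, 1:] - 0.1 * imb[:, :-1] + rng.normal(scale=0.01, size=(50, 199))
y, X = lagged_design(ret, imb, K=2, conditional=True)
print(pooled_coefficients(y, X))   # close to [0.5, -0.1, 0.0]
```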

Preliminary data analyses reveal that the error terms are subject to both heteroskedasticity and autocorrelation. Robust standard errors are, therefore, calculated using the methodology suggested by Arellano (1987, pp. 432–433).
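A minimal sketch of a cluster-robust (sandwich) covariance estimator in the spirit of Arellano (1987), treating each stock as a cluster, so that standard errors remain valid under heteroskedasticity and arbitrary autocorrelation within a stock. The sketch applies no finite-sample correction, which is an assumption; the simulated panel and its parameters are hypothetical.

```python
import numpy as np


def cluster_robust_ols(y: np.ndarray, X: np.ndarray, groups: np.ndarray):
    """OLS coefficients with sandwich standard errors clustered by group.

    V = (X'X)^{-1} [sum_g X_g' u_g u_g' X_g] (X'X)^{-1}, where g indexes stocks.
    No finite-sample (degrees-of-freedom) correction is applied.
    """
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ (X.T @ y)
    resid = y - X @ beta
    k = X.shape[1]
    meat = np.zeros((k, k))
    for g in np.unique(groups):
        mask = groups == g
        score = X[mask].T @ resid[mask]   # sum over this stock's days of x_it * u_it
        meat += np.outer(score, score)
    cov = XtX_inv @ meat @ XtX_inv
    return beta, np.sqrt(np.diag(cov))


# Hypothetical stacked panel: 40 stocks x 100 days, AR(1) errors within each stock
rng = np.random.default_rng(11)
n_stocks, n_days = 40, 100
groups = np.repeat(np.arange(n_stocks), n_days)
X = rng.normal(size=(n_stocks * n_days, 2))
e = rng.normal(size=(n_stocks, n_days))
u = np.zeros((n_stocks, n_days))
for t in range(n_days):
    u[:, t] = e[:, t] if t == 0 else 0.6 * u[:, t - 1] + e[:, t]
y = X @ np.array([0.3, -0.2]) + u.reshape(-1)   # rows ordered stock by stock, like groups
beta, se = cluster_robust_ols(y, X, groups)
print(beta, se)
```

With one observation per cluster, the estimator reduces to White's heteroskedasticity-robust (HC0) covariance.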

3.2.2 Size and liquidity effects

Previous studies suggest that additional variables, such as size and liquidity, influence the imbalance–return relation. Adverse selection effects, for example, are presumably weaker for large firms and liquid stocks owing to better analyst coverage (Huang et al. 2012, p. 9584) or a stronger presence of uninformed (noise) traders (Kyle 1985, pp. 1317–1320). However, the impact of liquidity on inventory holding effects is still unclear. On the one hand, inventory holding effects could be stronger for illiquid stocks because liquidity providers face difficulties in offloading undesired inventories (Jiang et al. 2011, p. 475). On the other hand, stronger herding may lead to amplified inventory holding effects for highly liquid stocks (see Keim and Madhavan 1995, p. 385 or Bailey et al. 2006, p. 14).

We measure size by market capitalization, \(C_{i,t}\) (provided by Datastream and updated at the beginning of each year), and liquidity by the bid-ask spread, \(S_{i,t}\). Size and liquidity effects are interrelated, since stocks of large firms are likely to be traded more liquidly than those of small firms. The correlation between market capitalization and bid-ask spread is \(-\)0.27 in our sample. Stratifying the sample by size shows that the correlation is strongest for the smallest (\(-\)0.24) and the largest quintiles (\(-\)0.16). This correlation is not high enough to raise concerns about multicollinearity, but it may make it difficult to clearly separate size from liquidity effects.

We employ regressions that include control and interaction variables for market capitalization and spread. Each interaction variable is the product of the corresponding imbalance lag and a size or liquidity variable. Preliminary data analyses show that imbalance effects seem to be weakest for mid-cap stocks and stronger for large and small stocks. We capture the resulting U-shape by including “abnormal” market capitalization, \(C^a_{i,t}\), defined as the absolute deviation of market capitalization from its sample mean:

$$\begin{aligned} C^a_{i,t}:=\left| C_{i,t} - (N T)^{-1} \sum _{i=1}^N \sum _{t=1}^T C_{i,t} \right| , \end{aligned}$$
(7)

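where T is the number of trading days in the sample, so that the double sum scaled by \((N T)^{-1}\) is the average market capitalization over all stock-days.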
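Eq. (7) can be sketched in a few lines. The sketch assumes the market capitalizations are given as a flat sequence with one entry per stock-day observation; the function name and the example values are hypothetical.

```python
def abnormal_market_cap(caps):
    """Eq. (7): |C_it - grand mean of C| for a flat sequence of stock-day values."""
    caps = list(caps)
    grand_mean = sum(caps) / len(caps)
    return [abs(c - grand_mean) for c in caps]


# Hypothetical market capitalizations (EUR bn) of a small, a mid-sized and a large firm
print(abnormal_market_cap([1.0, 2.0, 9.0]))  # [3.0, 2.0, 5.0]
```

Observations close to the grand mean receive small values, while both small and large firms receive large ones, which is how a single regressor can capture a U-shaped size effect.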

In a first step, we analyze size effects. The regression model for the conditional relation is

$$\begin{aligned} \tilde{R}_{i,t}=\sum _{k=0}^K \beta _k^{c} \tilde{I}_{i,t-k}+ \sum _{k=0}^K \gamma _k^{lc} \tilde{C}_{i,t-k}+ \sum _{k=0}^K \delta _k^{lc} \tilde{C}_{i,t-k} \cdot \tilde{I}_{i,t-k} + \sum _{k=0}^K \zeta _k^{qc} \tilde{C}^a_{i,t-k}+ \sum _{k=0}^K \eta _k^{qc} \tilde{C}^a_{i,t-k} \cdot \tilde{I}_{i,t-k} +\tilde{\epsilon }_{i,t}^{c}, \end{aligned}$$
(8)

where we test the null hypotheses of \(\gamma _k^{lc}=0\), \(\delta _k^{lc}=0\), \(\zeta _k^{qc}=0\), and \(\eta _k^{qc}=0\) separately by means of two-tailed t tests.

The regression model for the unconditional relation is given by

$$\begin{aligned} \tilde{R}_{i,t}=\sum _{k=1}^K \beta _k^u \tilde{I}_{i,t-k}+ \sum _{k=1}^K \gamma _k^{lu} \tilde{C}_{i,t-k}+ \sum _{k=1}^K \delta _k^{lu} \tilde{C}_{i,t-k} \cdot \tilde{I}_{i,t-k}+ \sum _{k=1}^K \zeta _k^{qu} \tilde{C}^a_{i,t-k}+ \sum _{k=1}^K \eta _k^{qu} \tilde{C}^a_{i,t-k} \cdot \tilde{I}_{i,t-k} +\tilde{\epsilon }_{i,t}^{u}, \end{aligned}$$
(9)

with analogous definitions.
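The regressors of Eqs. (8) and (9) can be assembled from the within-transformed panels as sketched below, forming the interaction terms as products of the transformed variables, as written in the equations. The sketch assumes balanced N × T arrays; the helper name and the random inputs are hypothetical. The liquidity specifications, Eqs. (10) and (11), would use only the first three column blocks, with the spread in place of market capitalization.

```python
import numpy as np


def size_design(imb, cap, abn_cap, K, conditional=True):
    """Regressors of Eq. (8) (lags 0..K) or Eq. (9) (lags 1..K) from
    within-transformed (N, T) arrays. Column blocks, in order:
    I lags, C lags, C*I interactions, C^a lags, C^a*I interactions."""
    _, T = imb.shape
    lags = range(0 if conditional else 1, K + 1)

    def lag(a, k):
        # Values at t - k for t = K, ..., T - 1, stacked stock by stock
        return a[:, K - k:T - k].reshape(-1)

    blocks = [
        [lag(imb, k) for k in lags],
        [lag(cap, k) for k in lags],
        [lag(cap, k) * lag(imb, k) for k in lags],
        [lag(abn_cap, k) for k in lags],
        [lag(abn_cap, k) * lag(imb, k) for k in lags],
    ]
    return np.column_stack([col for block in blocks for col in block])


# Hypothetical transformed inputs: 3 stocks, 10 days, four imbalance lags
rng = np.random.default_rng(5)
imb, cap, abn_cap = (rng.normal(size=(3, 10)) for _ in range(3))
X_cond = size_design(imb, cap, abn_cap, K=4, conditional=True)
X_uncond = size_design(imb, cap, abn_cap, K=4, conditional=False)
print(X_cond.shape, X_uncond.shape)   # (18, 25) (18, 20)
```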

In the second step, we analyze liquidity effects. The regression model for the conditional relation is

$$\begin{aligned} \tilde{R}_{i,t}=\sum _{k=0}^K \beta _k^c \tilde{I}_{i,t-k}+ \sum _{k=0}^K \gamma _k^{lc} \tilde{S}_{i,t-k}+ \sum _{k=0}^K \delta _k^{lc} \tilde{S}_{i,t-k} \cdot \tilde{I}_{i,t-k}+ \tilde{\epsilon }_{i,t}^{c}, \end{aligned}$$
(10)

where we test the null hypotheses of \(\gamma _k^{lc}=0\) and \(\delta _k^{lc}=0\) separately by means of two-tailed t tests.

The regression model for the unconditional relation is given by

$$\begin{aligned} \tilde{R}_{i,t}=\sum _{k=1}^K \beta _k^u \tilde{I}_{i,t-k}+ \sum _{k=1}^K \gamma _k^{lu} \tilde{S}_{i,t-k}+ \sum _{k=1}^K \delta _k^{lu} \tilde{S}_{i,t-k} \cdot \tilde{I}_{i,t-k}+ \tilde{\epsilon }_{i,t}^{u}, \end{aligned}$$
(11)

with analogous definitions.

Finally, we run two regressions (conditional and unconditional) including size and liquidity interaction terms simultaneously, i.e., we combine Eqs. (8) and (10) as well as Eqs. (9) and (11).

4 Data

4.1 Initial dataset

Our dataset includes stocks traded on the German Xetra trading system from Feb. 1, 2002, to Sept. 30, 2009 (1950 trading days). For all stocks, the last quotes before the closing auction and the order imbalances are available on a daily basis. In addition, market capitalization, which is updated once a year, is used to classify companies by size. Quotes and market capitalization are retrieved from Thomson Reuters Datastream, and the order imbalances are computed from data provided by the Karlsruher Kapitalmarktdatenbank. Data are adjusted backwards for corporate actions such as dividend payouts, stock splits, reverse splits, or repurchases.

The sample selection procedure described in Sect. 4.2 yields a single sample of daily data. Its filtering and exclusion criteria are applied separately to eight subperiods: the calendar years 2003 to 2008 and two shorter periods, from Feb. 2002 to the end of 2002 and from the beginning of 2009 to the end of September 2009.

4.2 Sample selection

Three filtering criteria are applied to the initial dataset to arrive at the sample used in our study. First, for the effects we want to examine, insufficient liquidity may distort the results. For this reason, we follow previous studies in this field (e.g., Chan and Fong 2000; Lo and Coggins 2006) and exclude stocks with low liquidity. Second, ex-dividend dates and similar events are dropped. Third, days with missing data are excluded. We will now provide more details on each of these steps.

To filter out stocks with insufficient liquidity, the initial dataset is analyzed by subperiods. This is inspired by the empirical observation that liquidity varies considerably over time for individual stocks. We consider a stock to be sufficiently liquid (or traded sufficiently actively) if order imbalance can be computed for each single trading day. For each subperiod described in Sect. 4.1, a stock is excluded if there is one illiquid day or more. Out of 1225 stocks in the initial dataset, 214 stocks meet this criterion for at least one of the subperiods. Some of the stocks are included in all subperiods while others meet the selection criterion only in some subperiods, but not in others.
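The liquidity criterion can be sketched as a filter over stock–subperiod pairs. The sketch assumes that, for each pair, the daily order imbalances are stored as a list with one entry per trading day of the subperiod and `None` marking days on which no imbalance could be computed; stock identifiers and values are hypothetical.

```python
def liquid_stock_subperiods(daily_imbalances):
    """Keep (stock, subperiod) pairs in which order imbalance exists on every day.

    daily_imbalances maps (stock, subperiod) to a list with one entry per
    trading day of the subperiod; None marks a day without a computable
    imbalance. A single such day excludes the stock for the whole subperiod.
    """
    return {
        key
        for key, values in daily_imbalances.items()
        if values and all(v is not None for v in values)
    }


# Hypothetical stocks and values
data = {
    ("STOCK_A", "2003"): [0.10, -0.05, 0.20],
    ("STOCK_A", "2004"): [0.00, None, 0.15],   # one illiquid day: excluded for 2004
    ("STOCK_B", "2003"): [-0.30, 0.25, 0.05],
}
kept = liquid_stock_subperiods(data)
print(sorted(kept))                        # [('STOCK_A', '2003'), ('STOCK_B', '2003')]
print(len({stock for stock, _ in kept}))   # 2 stocks pass in at least one subperiod
```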

In a second step, ex-dividend days and days with capital changes (e.g., stock splits) are excluded. The corresponding dates are obtained from Thomson Reuters Datastream.

Third, all relevant variables are screened for missing observations. After steps 1 and 2, there are 116 days with missing quote data; these days are excluded for the corresponding stocks. Bid and ask quotes show a large number of missing values on Aug. 24, 2009. Since no information about special market circumstances could be found for this day (CDAX volatility and volume behave normally), this appears to be a data integrity issue, which we address by eliminating this day for all stocks. For Continental AG, all quotes are missing from April 2–12, 2002; this stock is therefore dropped for the 2002 subperiod. Market capitalization shows missing values throughout entire subperiods for six of the 214 stocks remaining after steps 1 and 2. This leads to two stocks being dropped completely and two other stocks being removed from the affected subperiods but retained in the sample in other subperiods. For other stocks, market capitalization is missing only on some days; since it remains constant throughout a year, such temporarily missing values pose no problem.

4.3 Validity checks

The sample is then checked for data errors and invalid observations. No negative quotes are detected. Four ask quotes are found to be lower than the corresponding bid quotes. These observations are dropped from the sample. The remaining order imbalances, bid-ask spreads, and returns are tested for validity as described in the following.

First, order imbalance data are checked. Extreme values are rare. Only three observations differ from the cross-sectional daily average by more than 1.0. Two of these observations are accompanied by other large order imbalances in the same direction. Hence, despite these observations looking extreme at first glance, they seem to validly document the true development of the market at the time. One observation is dropped from the sample because the extreme imbalance is not supported by other market variables during a period of five days around the extreme observation.

Second, absolute spreads larger than 20 % of the bid quote are examined. Four quote pairs for one stock and two for a second stock violate this criterion and are excluded for the stocks in question. In addition, IKB Deutsche Industriebank AG faced an extraordinary decrease in share price, leading to 17 invalid spreads in December 2008. To avoid any distortion of the results, IKB is dropped from the 2008 subperiod.

Third, returns above 50 % or below \(-\)50 % are analyzed in detail. On May 25, 2005, there are 17 returns outside this interval and many more that are larger than usual. Quotes differ markedly from those on adjacent days, which leads to another 13 extreme returns on May 26, 2005. There is no unusual economic news on either of these days, and neither the DAX nor the CDAX shows abnormal returns or volumes. To ensure data validity, we exclude May 25, 2005, for all stocks. Aside from May 25 and 26, 2005, there are 13 other extreme return observations. Three of them concern IKB in the 2008 subperiod, which has already been excluded due to invalid spreads. The remaining 10 extreme returns are deemed valid (and kept in the sample) because quotes before and after the extreme observation confirm the return development.
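The screens described in this subsection can be sketched as a function that flags a stock-day for inspection rather than deleting it, since the flagged observations are examined case by case above. The thresholds follow the text (spread above 20 % of the bid, returns beyond ±50 %); applying the return bound to the mid-quote log return of Eq. (2) is an assumption, and the function name is hypothetical.

```python
import math


def quote_flags(ask, bid, prev_ask=None, prev_bid=None,
                max_rel_spread=0.20, max_abs_return=0.50):
    """Return a list of reasons why a stock-day should be inspected manually."""
    flags = []
    if ask < 0 or bid < 0:
        flags.append("negative quote")
    if ask < bid:
        flags.append("ask below bid")
    elif bid > 0 and (ask - bid) / bid > max_rel_spread:
        flags.append("wide spread")            # default: spread above 20 % of the bid
    if prev_ask is not None and prev_bid is not None:
        if ask + bid > 0 and prev_ask + prev_bid > 0:
            r = math.log((ask + bid) / (prev_ask + prev_bid))   # mid-quote return, Eq. (2)
            if abs(r) > max_abs_return:
                flags.append("extreme return")
    return flags


print(quote_flags(10.0, 10.5))              # ['ask below bid']
print(quote_flags(13.0, 10.0))              # ['wide spread']
print(quote_flags(20.2, 20.0, 10.1, 10.0))  # ['extreme return']
```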

4.4 Final data set

Application of the sample selection criteria described in Sect. 4.2 and the validity checks in Sect. 4.3 reduces the initial dataset of 624,236 daily observations for 1225 stocks to 207,939 observations for 212 stocks. Table 4 provides the number of stocks in the various subperiods.

Table 4 Number of stocks included in the final sample by subperiod (out of 1225 stocks in the initial dataset)

Figure 1 shows the number of daily observations by subperiod. The years with the highest number of observations are 2006–2008. These three years account for 52 % of the total number of observations. The subperiods 2002 and 2009 are shorter than 12 months. The remaining variation is due to different numbers of stocks included in the eight subperiods.

Fig. 1 Daily observations included in the final sample by subperiod

Table 5 provides descriptive statistics for order imbalance and return in the final sample. The share of positive order imbalances, 50.08 %, indicates that buying and selling pressure are almost exactly balanced. Nevertheless, the standard deviation of 21.06 % shows that there is considerable variation across observations: 1.6 % of all observations are below \(-\)0.5, and 1.4 % are above 0.5. Although there is a slight tendency towards positive order imbalances, negative returns are more prevalent.
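Statistics of this kind can be computed along the following lines. This is a minimal sketch assuming the common definition \(I = (B - S)/(B + S)\), with \(B\) and \(S\) the buyer- and seller-initiated volume of a stock-day; the definition actually used in this paper is the one given in Sect. 3.1.1 and may differ (e.g., counting trades instead of volume). Function and key names are hypothetical.

```python
import statistics


def order_imbalance(buy_volume: float, sell_volume: float) -> float:
    """Assumed definition: (B - S) / (B + S), which lies in [-1, 1]."""
    total = buy_volume + sell_volume
    if total == 0:
        raise ValueError("no trades: order imbalance is undefined")
    return (buy_volume - sell_volume) / total


def describe_imbalances(imbalances):
    """Summary statistics of the kind reported in Table 5 (as fractions, not percent)."""
    n = len(imbalances)
    return {
        "share_positive": sum(i > 0 for i in imbalances) / n,
        "std": statistics.stdev(imbalances),  # sample standard deviation
        "share_below_-0.5": sum(i < -0.5 for i in imbalances) / n,
        "share_above_0.5": sum(i > 0.5 for i in imbalances) / n,
    }


imbs = [order_imbalance(b, s) for b, s in [(300, 100), (100, 300), (900, 100), (50, 950)]]
print(imbs)                                         # [0.5, -0.5, 0.8, -0.9]
print(describe_imbalances(imbs)["share_positive"])  # 0.5
```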

Table 5 Descriptive statistics for the final sample (all values in percent)

The standard deviation of order imbalance is not uniform across firm sizes and liquidity levels. As shown in Table 6, it is largest for size quintile 1 (smallest firms) and decreases steadily towards quintile 5 (largest firms). Results for liquidity quintiles are similar. This suggests that size and liquidity may play an important role in explaining the imbalance–return relation.
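The stratification behind Table 6 can be sketched as follows, assuming that observations are assigned to quintiles by rank of a sorting variable (market capitalization for size, the spread for liquidity) pooled over all observations. The breakpoints actually used for Table 6 (e.g., per subperiod or per stock average) are not specified here, and the helper names are hypothetical.

```python
import statistics


def quintile_labels(values):
    """Rank-based quintile labels: 1 for the smallest 20 % of values, 5 for the largest."""
    n = len(values)
    order = sorted(range(n), key=lambda k: values[k])
    labels = [0] * n
    for rank, idx in enumerate(order):
        labels[idx] = rank * 5 // n + 1
    return labels


def std_by_quintile(imbalances, sort_variable):
    """Sample standard deviation of order imbalance within each quintile of sort_variable."""
    groups = {q: [] for q in range(1, 6)}
    for imb, q in zip(imbalances, quintile_labels(sort_variable)):
        groups[q].append(imb)
    return {q: statistics.stdev(g) for q, g in groups.items() if len(g) > 1}


# Usage with toy data in which smaller firms have more volatile order imbalance.
sizes = list(range(1, 11))
imbs = [0.6, -0.6, 0.4, -0.4, 0.2, -0.2, 0.1, -0.1, 0.05, -0.05]
print(std_by_quintile(imbs, sizes))  # standard deviations shrink from quintile 1 to quintile 5
```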

Table 6 Standard deviations for order imbalance stratified by size and liquidity (entire sample: 0.211)

To confirm the significance of this pattern, we regress the absolute value of order imbalance on market capitalization, \(C_{i,t}\), and spread, \(S_{i,t}\):

$$\begin{aligned} | \tilde{I}_{i,t} |= \gamma \tilde{C}_{i,t} + \delta \tilde{S}_{i,t} +\tilde{\epsilon }_{i,t}. \end{aligned}$$
(12)

The null hypotheses \(\gamma =0\) and \(\delta =0\) are tested separately using two-tailed t tests. The results reported in Table 7 show that the absolute value of order imbalance is related to the bid-ask spread. This relation is significant at the 1 % level. In contrast to liquidity, market capitalization does not have a significant impact.
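A minimal sketch of the estimation and the two-tailed tests for Eq. (12), written with NumPy. It follows Eq. (12) literally (no intercept), uses conventional homoskedastic OLS standard errors, and approximates the t distribution by the standard normal, which is harmless for samples as large as the present one. The standard errors behind Table 7 may be computed differently (e.g., clustered), and the function name is hypothetical.

```python
import math

import numpy as np


def ols_no_intercept(y, X):
    """OLS without intercept, as in Eq. (12); returns coefficients, t statistics, two-tailed p-values."""
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float)
    n, k = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (n - k)       # residual variance
    cov = sigma2 * np.linalg.inv(X.T @ X)  # conventional OLS covariance matrix
    t_stats = beta / np.sqrt(np.diag(cov))
    # Two-tailed p-value under a normal approximation: P(|Z| > |t|) = erfc(|t| / sqrt(2)).
    p_values = np.array([math.erfc(abs(t) / math.sqrt(2.0)) for t in t_stats])
    return beta, t_stats, p_values


# Usage with synthetic data: the dependent variable loads on the spread S but not on size C.
rng = np.random.default_rng(0)
C = rng.normal(size=5000)
S = rng.normal(size=5000)
y = 0.3 * S + rng.normal(scale=0.5, size=5000)
beta, t, p = ols_no_intercept(y, np.column_stack([C, S]))
print(beta, t, p)
```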

Table 7 Dependence of the magnitude of order imbalance on size and liquidity

5 Results

5.1 Conditional lagged relation

Table 8 reports the regression results for the conditional lagged relation. The second column provides the results for Eq. (5), i.e., using only current and past order imbalance as explanatory variables. Preliminary analyses suggested including four lags of order imbalance. Consistent with previous findings, the coefficient of concurrent order imbalance is positive and significant. This can be explained by serially correlated trades induced by order-splitting or herding (cf. Sect. 2.1). Moreover, as suggested by theory, the coefficients of conditional lagged imbalances are negative and significant. This is because part of the price effect of an order imbalance is compensated by liquidity providers over the following days. The negative relation is strongest at the second lag and wanes with higher lags.
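The fixed-effects panel regression can be illustrated with the within estimator: demeaning returns and regressors stock by stock removes the stock-specific intercepts, after which ordinary least squares yields the slope coefficients. The sketch below assumes that Eq. (5) has the form \(R_{i,t} = \alpha_i + \sum_k \beta_k I_{i,t-k} + \epsilon_{i,t}\) with lags 0 to 4 (passing `lags=(1, 2)` gives an unconditional specification in the spirit of Eq. (6)), gap-free daily series for each stock, and no further control variables. The names are hypothetical.

```python
import numpy as np


def lagged_design(imbalance, lags):
    """Design matrix for one stock: column j holds I_{t-lags[j]} for t = max(lags), ..., T-1."""
    imbalance = np.asarray(imbalance, dtype=float)
    start, T = max(lags), len(imbalance)
    X = np.column_stack([imbalance[start - k:T - k] for k in lags])
    return X, start


def fixed_effects_lagged(panel, lags=(0, 1, 2, 3, 4)):
    """Within estimator of R_{i,t} = a_i + sum_k b_k * I_{i,t-k} + e_{i,t}.

    panel maps a stock identifier to (returns, imbalances), two aligned daily series.
    Lags are built stock by stock, so no lag ever reaches into another stock's data.
    """
    Xs, ys = [], []
    for returns, imbalance in panel.values():
        X, start = lagged_design(imbalance, lags)
        y = np.asarray(returns, dtype=float)[start:]
        # Demeaning each stock's series removes its fixed effect a_i.
        Xs.append(X - X.mean(axis=0))
        ys.append(y - y.mean())
    beta, *_ = np.linalg.lstsq(np.vstack(Xs), np.concatenate(ys), rcond=None)
    return beta


# Usage with simulated data: concurrent effect +0.03, reversal -0.01 and -0.015 at lags 1 and 2.
rng = np.random.default_rng(1)
panel = {}
for i in range(20):
    imb = rng.uniform(-1, 1, 300)
    ret = rng.normal(0, 0.005, 300) + 0.05 * i  # stock-specific level (fixed effect)
    ret[2:] += 0.03 * imb[2:] - 0.01 * imb[1:-1] - 0.015 * imb[:-2]
    panel[i] = (ret, imb)
print(fixed_effects_lagged(panel, lags=(0, 1, 2)))  # close to the simulated values
```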

The remaining columns in Table 8 give the results for the conditional relation when size and liquidity effects are included (Eqs. (8), (10), and both equations combined). The number of lags included was determined by starting with four lags, followed by eliminating insignificant higher lags. There are pronounced size and liquidity effects for concurrent order imbalance. The size interaction coefficients \(\tilde{C}_{i,t} \cdot \tilde{I}_{i,t}\) are negative and significant at the 1 % level for the concurrent and lag 1 interaction terms. This means that smaller stocks, in general, are more sensitive to concurrent imbalances than are larger stocks, and that they show a weaker reversal effect at lag 1. The positive coefficient for the first two lags of \(\tilde{C}^a_{i,t} \cdot \tilde{I}_{i,t}\) confirms the U-shape on top of the linear relation just described: very small and very large stocks show higher sensitivity with respect to concurrent order imbalance, and a smaller reversal effect on the following day.
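The link between the signs of the two size interaction terms and the U-shape can be made explicit. Assume that Eq. (7), which is not reproduced here, defines the absolute size term as the absolute deviation from the mean, \(\tilde{C}^a_{i,t} = |\tilde{C}_{i,t} - \bar{C}|\), and write \(\tilde{R}_{i,t}\) for the return and \(\beta_0\), \(\gamma_0\), \(\gamma^a_0\) (hypothetical labels) for the coefficients of \(\tilde{I}_{i,t}\), \(\tilde{C}_{i,t} \cdot \tilde{I}_{i,t}\) and \(\tilde{C}^a_{i,t} \cdot \tilde{I}_{i,t}\). The sensitivity of returns to concurrent order imbalance and its slope in size are then

$$\begin{aligned} \frac{\partial \tilde{R}_{i,t}}{\partial \tilde{I}_{i,t}} &= \beta_0 + \gamma_0 \tilde{C}_{i,t} + \gamma^a_0 \, |\tilde{C}_{i,t} - \bar{C}|, \\ \frac{\partial}{\partial \tilde{C}_{i,t}} \frac{\partial \tilde{R}_{i,t}}{\partial \tilde{I}_{i,t}} &= \begin{cases} \gamma_0 - \gamma^a_0, & \tilde{C}_{i,t} < \bar{C}, \\ \gamma_0 + \gamma^a_0, & \tilde{C}_{i,t} > \bar{C}. \end{cases} \end{aligned}$$

With \(\gamma_0 < 0\) and \(\gamma^a_0 > 0\), the sensitivity always falls with size below the mean; it rises again above the mean only if \(\gamma^a_0 > |\gamma_0|\). Otherwise the absolute term merely flattens the decline for large stocks.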

Liquidity effects are strong for the concurrent and lag 1 interaction terms, which have positive and significant coefficients. This shows that illiquid stocks have a stronger concurrent imbalance–return relation, but a weaker reversal on the following day. The magnitude of these coefficients varies somewhat depending on whether the size interaction terms are included alongside the liquidity terms. We interpret this as an effect of the correlation between size and liquidity and as a hint that the size effect may be stronger and more important than the liquidity effect.

Table 8 Conditional relation with and without size/liquidity effects

5.2 Unconditional lagged relation

Table 9 shows the regression results for the unconditional lagged relation. Results for the regression specified in Eq. (6) are presented in the second column. The first unconditional lagged coefficient is positive and significant, which is consistent with previous research. However, it is much smaller than the concurrent coefficient from Table 8 (\(2.89\cdot 10^{-3}\) vs. \(25.88 \cdot 10^{-3}\)). Thus, the strong contemporaneous effect of order imbalances wanes markedly already one day later. In addition, the second lag of order imbalance is negative as expected, but only significant at the 10 % level. Higher lags are eliminated because they turned out to be insignificant in preliminary analyses. The fact that the imbalance effect dies out completely within two days is in contrast to previous studies based on daily data. This may be due to higher efficiency in stock markets in the 2000s compared to the sample periods of previous studies given in Table 3.
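Expressed as a ratio, the first unconditional lag retains only about a ninth of the concurrent coefficient:

$$\begin{aligned} \frac{2.89\cdot 10^{-3}}{25.88\cdot 10^{-3}} \approx 0.11. \end{aligned}$$

Since the two coefficients come from different specifications (Eqs. (5) and (6)), this ratio is a rough gauge of the decay rather than a formal test.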

Columns 3–5 in Table 9 report regression results for Eqs. (9) and (11) as well as both equations combined. Size interaction coefficients are highly significant for the unconditional first lag, but insignificant for higher lags. The first-lagged linear relation is negative, which means that order imbalances have a stronger impact on the returns of small stocks. The “absolute relation” is positive and supports a U-shaped pattern (similar to the findings of Yamamoto 2012 on Japanese data), in which mid-sized stocks have a weaker imbalance–return relation than small and large stocks. Once size effects are included, the coefficient of the first imbalance lag, \(\tilde{I}_{i,t-1}\), becomes insignificant: the interaction between size and order imbalance has more explanatory power than order imbalance per se.

The unconditional first-lagged relation exhibits liquidity effects as well. The first interaction coefficient \(\tilde{S}_{i,t-1} \cdot \tilde{I}_{i,t-1}\) is positive and significant at the 5 % level (at the 1 % level when size effects are not included). This shows that returns of illiquid stocks are more sensitive to order imbalance than are returns of very liquid stocks. However, similar to the conditional lagged relation discussed in Sect. 5.1, liquidity effects are again less stable than size effects. A U-shaped liquidity pattern as suggested by theory (see, e.g., Keim and Madhavan 1995; Bailey et al. 2006) could not be detected in the data. We initially also included interaction terms based on the absolute difference of the spread from its mean, defined analogously to Eq. (7). Their coefficients were insignificant, so the corresponding terms were dropped from the final regressions.

Table 9 Unconditional relation with and without size/liquidity effects

5.3 Different order imbalance levels

Previous research finds higher coefficients when confining the analysis to extreme order imbalances, see Chordia et al. (2002, pp. 124–126) analyzing aggregated NYSE stocks, or Chang and Shie (2011, pp. 74–77) covering the Taiwan index futures market. To see how the effect on returns depends on the level of order imbalance, we re-run the regressions in Eqs. (5) and (6) on corresponding sub-samples stratified by the magnitude of order imbalance. Table 10 provides the results.

Table 10 Dependence of the imbalance–return relation on the magnitude of order imbalance

The concurrent effect of order imbalance on returns is strongest for small order imbalances (\(|I_{i,t}|<0.2\)) and decreases for the two categories of higher order imbalance (\(0.2 \le |I_{i,t}| < 0.4\) and \(0.4 \le |I_{i,t}|\), respectively). For the unconditional relation, the coefficient of the first order imbalance lag increases for higher order imbalances, but the difference between high and intermediate order imbalance levels is negligible. This shows that our results are not driven by extreme observations of order imbalance. Furthermore, this is in contrast to previous studies, which found higher coefficients when confining the analysis to extreme order imbalances. A possible explanation is that very large orders may be filled outside the stock exchange’s regular trading, which is not captured in our sample.
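The stratification by magnitude can be sketched as follows. The class boundaries 0.2 and 0.4 are those of Table 10; everything else is a simplification: within each class the sketch fits a pooled simple regression of returns on concurrent imbalance, without the fixed effects and lags of Eqs. (5) and (6). The function names are hypothetical.

```python
def imbalance_class(imb: float) -> str:
    """Magnitude classes of Table 10: |I| < 0.2, 0.2 <= |I| < 0.4, and 0.4 <= |I|."""
    a = abs(imb)
    if a < 0.2:
        return "small"
    if a < 0.4:
        return "medium"
    return "large"


def ols_slope(x, y):
    """Slope of a simple regression of y on x (with intercept)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return sxy / sxx


def slopes_by_class(returns, imbalances):
    """Pooled return-on-imbalance slope within each magnitude class."""
    groups = {}
    for r, imb in zip(returns, imbalances):
        xs, ys = groups.setdefault(imbalance_class(imb), ([], []))
        xs.append(imb)
        ys.append(r)
    # A slope needs at least two distinct imbalance values in the class.
    return {label: ols_slope(xs, ys) for label, (xs, ys) in groups.items() if len(set(xs)) > 1}


# Usage with toy data: returns react more strongly to small imbalances.
imbs = [-0.15, -0.05, 0.05, 0.15, -0.35, -0.25, 0.25, 0.35, -0.8, -0.5, 0.5, 0.8]
rets = [(0.03 if abs(i) < 0.2 else 0.01) * i for i in imbs]
print(slopes_by_class(rets, imbs))
```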

5.4 Financial crisis

Since this paper is the first one on order imbalance effects using data covering the recent financial crisis, we take the opportunity to analyze the relation between order imbalance and return during this period of extreme market stress. To this end, we create a sub-sample for the period from July 1, 2007 to Sept 30, 2009, and re-run the regressions in Eqs. (5), (6), (8), (9), (10), (11), and the corresponding combinations. Table 11 provides the results.

Table 11 Imbalance–return relation during the financial crisis with and without size/liquidity effects

Conditional imbalance coefficients increase during the crisis period when controlling for size and/or liquidity effects, cf. the top lines of Tables 8 and 11. Unconditional imbalance coefficients remain largely unaffected, cf. the corresponding lines in Tables 9 and 11. \(R^2\) increases during the crisis. The coefficients of the control variables market capitalization and abnormal market capitalization are larger during the crisis period, between two and three times their values for the entire sample.

For the conditional relation, concurrent interaction terms decrease in magnitude, while lag 1 interaction terms increase in magnitude (sometimes subject to decreased significance as mentioned above). For the unconditional relation, size interaction terms decrease in magnitude, whereas liquidity interaction terms increase. To rule out a possible increase in the number of large order imbalances as the cause for the changes during the financial crisis, we compared the fractions of small, medium and large order imbalances for the crisis sub-sample to those in the entire sample. During the crisis, the fraction of small imbalances shows a small increase, while the two categories of larger imbalances decrease slightly. Hence, the results in Table 11 are not driven by changes in the magnitude of order imbalances.
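The check on the composition of order imbalances can be sketched with the standard library. The crisis window (July 1, 2007 to Sept 30, 2009) and the class boundaries 0.2 and 0.4 follow the text; the input format (a list of (date, imbalance) pairs) and the function names are assumptions.

```python
from bisect import bisect_right
from collections import Counter
from datetime import date

CRISIS_START = date(2007, 7, 1)
CRISIS_END = date(2009, 9, 30)
CLASSES = ("small", "medium", "large")  # |I| < 0.2, 0.2 <= |I| < 0.4, 0.4 <= |I|


def crisis_subsample(observations):
    """Keep (date, imbalance) pairs inside the crisis window, both ends inclusive."""
    return [(d, imb) for d, imb in observations if CRISIS_START <= d <= CRISIS_END]


def class_fractions(observations):
    """Fraction of observations in each magnitude class of |order imbalance|."""
    counts = Counter(CLASSES[bisect_right((0.2, 0.4), abs(imb))] for _, imb in observations)
    n = sum(counts.values())
    return {label: counts[label] / n for label in CLASSES}


sample = [(date(2006, 1, 2), 0.1), (date(2007, 7, 1), 0.3), (date(2008, 10, 1), -0.1),
          (date(2009, 9, 30), -0.5), (date(2009, 10, 1), 0.9)]
print(class_fractions(sample))  # {'small': 0.4, 'medium': 0.2, 'large': 0.4}
print(class_fractions(crisis_subsample(sample)))
```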

6 Summary

In this paper, we investigated effects of order flow imbalance on daily returns of German stocks. In contrast to previous studies based on time series regressions, we used fixed-effects panel regressions. For the conditional relation (including concurrent order imbalance), our results confirm those of previous studies. For the unconditional relation (which allows forecasting returns from past order imbalance), our results are qualitatively in line with the literature, but the effects are weaker. This may point to increased efficiency of stock markets in the first decade of this century (this paper) compared to the 1990s (previous studies). We find pronounced and stable size effects and somewhat weaker liquidity effects. The general imbalance–return link in our sample is not driven by extreme order imbalances. Concurrent imbalance effects turn out to be stronger during the financial crisis. If information on canceled limit orders had been available for our dataset, effects of order imbalance would have been even more pronounced. A further limitation of our dataset is that it may not contain very large orders, which may be filled through channels outside the stock exchange. This may explain why we found decreasing effects for higher order imbalances, which is in contrast to some previous studies.

An interesting direction for further research would be a more comprehensive coverage and comparison of order imbalance effects across markets and observation frequencies: the geographical focus of existing studies lies mainly on the U.S. and some Asian countries, whereas there are hardly any results on other European markets. This holds both for daily frequencies and for intra-day data.