A homogeneous approach to testing for Granger non-causality in heterogeneous panels


This paper develops a new method for testing for Granger non-causality in panel data models with large cross-sectional (N) and time series (T) dimensions. The method is valid in models with homogeneous or heterogeneous coefficients. The novelty of the proposed approach lies in the fact that under the null hypothesis, the Granger-causation parameters are all equal to zero, and thus they are homogeneous. Therefore, we put forward a pooled least-squares (fixed effects type) estimator for these parameters only. Pooling over cross sections guarantees that the estimator has a \(\sqrt{NT}\) convergence rate. In order to account for the well-known “Nickell bias”, the approach makes use of the Split Panel Jackknife method. Subsequently, a Wald test is proposed, which is based on the bias-corrected estimator. Finite-sample evidence shows that the resulting approach performs well in a variety of settings and outperforms existing procedures. Using a panel data set of 350 U.S. banks observed during 56 quarters, we test for Granger non-causality between banks’ profitability and cost efficiency.


Predictive causality and feedback between variables is one of the main subjects of applied time series analysis. Granger (1969) provided a definition that allows formal statistical testing of the hypothesis that one variable is not temporally related to (or does not “Granger-cause”) another one. Besides time series models, this hypothesis is also important in panel data analysis when examining relationships between macroeconomic or microeconomic variables.

The seminal paper of Holtz-Eakin et al. (1988) provided one of the early contributions to the panel data literature on Granger non-causality testing. Using Anderson and Hsiao (1982) type moment conditions, the authors put forward a Generalised Method of Moments (GMM) testing framework for short T panels with homogeneous coefficients. Unfortunately, this approach is less appealing when T is sizeable. This is due to the well-known problem of using too many moment conditions, which often renders the usual GMM-based inference highly inaccurate. While there exist alternative fixed T procedures that can be applicable to cases where T is large (e.g. those of Binder et al. 2005; Karavias and Tzavalis 2017; Juodis 2013; Arellano 2016; Juodis 2018), these methods are designed to estimate panels with homogeneous slope parameters only. Thus, when feedback based on past own values is heterogeneous (i.e. the autoregressive parameters vary across individuals), inferences may not be valid even asymptotically.

For the reasons above, one of the most popular approaches among practitioners has been the one proposed by Dumitrescu and Hurlin (2012), which can accommodate heterogeneous slopes under both the null and alternative hypotheses. Their approach is reminiscent of the so-called IPS panel unit root test for heterogeneous panels proposed by Im et al. (2003) and involves averaging individual Wald statistics. The resulting standardized Wald test statistic has an asymptotically normal limit as \(T\rightarrow \infty \) followed by \(N\rightarrow \infty \). However, this approach does not account for the “Nickell” bias and is therefore theoretically justified only for sequences with \(N/T^{2}\rightarrow 0\), as is the case with standard Mean-Group type approaches.Footnote 1

The aim of this paper is to propose a new test for Granger non-causality that explicitly accounts for “Nickell” bias and is valid in both homogeneous and heterogeneous panels. The novelty of our approach comes from exploiting the fact that under the null hypothesis, while the individual effects and the autoregressive parameters may be heterogeneous across individuals, the Granger-causation parameters are all equal to zero and thus they are homogeneous. We therefore propose the use of a pooled estimator for these parameters only. Pooling over cross sections guarantees that the estimator has the faster \(\sqrt{NT}\) convergence rate.

The pooled estimator suffers from the incidental parameters problem of Neyman and Scott (1948) due to the presence of the predetermined regressors, see, e.g. Nickell (1981) and Karavias and Tzavalis (2016). This result implies that standard tests for pooled estimators do not control size asymptotically, unless \(N<<T\). To overcome this problem, we use the idea of Split Panel Jackknife (SPJ) of Dhaene and Jochmans (2015) and construct an estimator that is free from the “Nickell bias”. This type of bias correction works very well under circumstances that are empirically relevant: moderate time dimension, heterogeneous nuisance parameters, and high persistence, as argued by Dhaene and Jochmans (2015), Fernández-Val and Lee (2013) and Chambers (2013), respectively. Furthermore, Chudik et al. (2018) argue that SPJ procedures are suitable so long as \(N/T^{3} \rightarrow 0\). Thus, we test the null hypothesis of Granger non-causality by using a Wald test based on our bias-corrected estimator.

A Monte Carlo study shows that the proposed method has good finite sample properties even in panels with a moderate time dimension. In contrast, the Wald statistic of Dumitrescu and Hurlin (2012) can suffer from substantial size distortions, especially when \(T<<N\). In terms of power, the proposed method appears to dominate the method of Dumitrescu and Hurlin (2012), especially so in panels with N and T both large.

Using a panel data set of 350 U.S. banks observed during the period 2006:Q1-2019:Q4, we test for Granger non-causality between banks’ profitability and cost efficiency. The null hypothesis is rejected in all cases, except for large banks during a period spanning the financial crisis (2007–2009) and prior to the introduction of the Dodd–Frank Act in 2011. This outcome may be indicative of past moral hazard-type behaviour of large financial institutions.

The remainder of the present paper is organized as follows: Sect. 2 sets up the model and the hypothesis of interest. Section 3 outlines the SPJ estimator and the proposed test statistic. Section 4 studies the finite sample performance of the approach using Monte Carlo experiments. Section 5 presents the empirical illustration, and Sect. 6 concludes.

Testing framework

We consider a simple linear dynamic panel data model with a single covariate \(x_{i,t}\):

$$\begin{aligned} y_{i,t}=\phi _{0,i}+\sum _{p=1}^{P}\phi _{p,i}y_{i,t-p}+\sum _{q=1}^{Q} \beta _{q,i} x_{i,t-q}+\varepsilon _{i,t}; \quad t=1,\ldots ,T, \end{aligned}$$

for \(i=1,\ldots ,N\), where \(\phi _{0,i}\) captures the individual-specific fixed effects, \(\varepsilon _{i,t}\) denotes the innovation for individual i at time t, \(\phi _{p,i}\) denotes the heterogeneous autoregressive coefficients and \(\beta _{q,i}\) denotes the heterogeneous feedback coefficients or Granger causation parameters.Footnote 2 Thus, we assume that \(y_{i,t}\) follows an ARDL(P,Q) process; more generally, \(y_{i,t}\) can be considered as one of the equations of a joint VAR model for \((y_{i,t},x_{i,t})'\). Such a bivariate system is studied for simplicity of presentation, as our results are straightforwardly extendable to multivariate systems.Footnote 3

The null hypothesis that the time series \({x_{i,t}}\) does not Granger-cause (linearly) the time series \(y_{i,t}\) can be formulated as a set of linear restrictions on the \(\beta \)’s in Eq. (2.1):

$$\begin{aligned} H_{0}: \quad \beta _{q,i}=0, \quad \text {for all}\ i\ \text {and}\ q, \end{aligned}$$

against the alternative

$$\begin{aligned} H_{1}: \quad \beta _{q,i}\ne 0 \quad \text {for some}\ i\ \text {and}\ q. \end{aligned}$$

The model, null and alternative hypotheses presented here are as in Dumitrescu and Hurlin (2012). Similarly to the case of panel unit root testing, rejection of the null hypothesis should be interpreted as evidence of the existence of a large enough number of cross-sectional units i in which the null hypothesis is violated (see, e.g. Pesaran 2012).


Equation (2.1) can be re-written as follows:

$$\begin{aligned} y_{i,t}=\varvec{z}_{i,t}'\varvec{\phi }_{i} +\varvec{x}_{i,t}'\varvec{\beta }_{i}+\varepsilon _{i,t}, \end{aligned}$$

where \(\varvec{z}_{i,t}=(1,y_{i,t-1},\ldots ,y_{i,t-P})'\) and \(\varvec{x}_{i,t}=(x_{i,t-1},\ldots ,x_{i,t-Q})'\) are column vectors of order \(1+P\) and Q, respectively, while \(\varvec{\phi }_{i}=(\phi _{0,i},\ldots ,\phi _{P,i})'\) and \(\varvec{\beta }_{i}=(\beta _{1,i},\ldots ,\beta _{Q,i})'\) denote the corresponding parameter vectors.

Define \(\varvec{y}_{i}=(y_{i,1},\ldots ,y_{i,T})'\) and \(\varvec{\varepsilon }_{i}=(\varepsilon _{i,1},\ldots ,\varepsilon _{i,T})'\), both of which are column vectors of order T, and let \(\varvec{Z}_{i}=(\varvec{z}_{i,1},\ldots ,\varvec{z}_{i,T})'\) be a matrix of dimension \(\left[ T \times (1+P) \right] \), and \(\varvec{X}_{i}=(\varvec{x}_{i,1},\ldots ,\varvec{x}_{i,T})'\), a matrix of dimension \(\left[ T \times Q \right] \). Equation (3.1) can be expressed in vector form as

$$\begin{aligned} \varvec{y}_{i}=\varvec{Z}_{i}\varvec{\phi }_{i} +\varvec{X}_{i}\varvec{\beta }_{i}+\varvec{\varepsilon }_{i}. \end{aligned}$$

Observe that under the null hypothesis of Granger non-causality, the true coefficient vector of \(\varvec{X}_{i}\) equals zero. Thus, assuming homogeneity in \(\varvec{\beta }_{i}\), Eq. (3.2) becomes

$$\begin{aligned} \varvec{y}_{i}=\varvec{Z}_{i}\varvec{\phi }_{i} +\varvec{X}_{i}\varvec{\beta }+\varvec{\varepsilon }_{i}. \end{aligned}$$

In what follows, we shall use the above model specification to estimate the common parameters \(\varvec{\beta }\). In particular, we propose the following least-squares (fixed effects type) estimator of \(\varvec{\beta }\):

$$\begin{aligned} \hat{\varvec{\beta }}=\left( \sum _{i=1}^{N}\varvec{X}_{i}' \varvec{M}_{\varvec{Z}_{i}} \varvec{X}_{i}\right) ^{-1} \left( \sum _{i=1}^{N}\varvec{X}_{i}'\varvec{M}_{\varvec{Z}_{i}} \varvec{y}_{i}\right) , \end{aligned}$$

where \(\varvec{M}_{\varvec{Z}_{i}}\) denotes a \(\left[ T \times T \right] \) matrix that projects on the orthogonal complement of \(\varvec{Z}_{i}\), i.e. \(\varvec{M}_{\varvec{Z}_{i}} =\varvec{I}_{T}-\varvec{Z}_{i}\left( \varvec{Z}_{i}^{\prime } \varvec{Z}_{i} \right) ^{-1}\varvec{Z}_{i}^{\prime }\). The estimator in Eq. (3.4) generalizes the standard FE estimator, as the latter imposes that all slope coefficients are homogeneous, including the autoregressive parameters (see, e.g. Hahn and Kuersteiner 2002). Note that for this estimator to be well defined, a sufficient number of the \(\varvec{M}_{\varvec{Z}_{i}}\) matrices should be nonzero. We limit our attention to balanced panels, in which case the necessary condition is \(T>1+P\), which ensures that the coefficients \(\varvec{\phi }_{i}\) are estimable.
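To make the construction concrete, the pooled estimator in Eq. (3.4) can be sketched in a few lines of NumPy. The function names (`annihilator`, `pooled_beta`) and the array layout are our own illustrative choices, not part of the paper; the code simply builds \(\varvec{Z}_{i}\) and \(\varvec{X}_{i}\) from the raw series and accumulates the two sums in Eq. (3.4).

```python
import numpy as np

def annihilator(Z):
    """M_Z = I - Z (Z'Z)^{-1} Z': projects on the orthogonal complement of Z."""
    return np.eye(Z.shape[0]) - Z @ np.linalg.solve(Z.T @ Z, Z.T)

def pooled_beta(y, x, P, Q):
    """Pooled FE-type estimator of the common parameters beta (Eq. (3.4)).

    y, x : (N, T0) arrays of raw series; the first max(P, Q) observations
           of each unit serve as initial conditions.
    """
    N, T0 = y.shape
    s = max(P, Q)                      # effective sample runs over t = s, ..., T0 - 1
    A = np.zeros((Q, Q))
    c = np.zeros(Q)
    for i in range(N):
        # Z_i = (1, y_{i,t-1}, ..., y_{i,t-P}); X_i = (x_{i,t-1}, ..., x_{i,t-Q})
        Zi = np.column_stack([np.ones(T0 - s)] +
                             [y[i, s - p:T0 - p] for p in range(1, P + 1)])
        Xi = np.column_stack([x[i, s - q:T0 - q] for q in range(1, Q + 1)])
        M = annihilator(Zi)
        A += Xi.T @ M @ Xi
        c += Xi.T @ M @ y[i, s:]
    return np.linalg.solve(A, c)
```

Because \(\varvec{M}_{\varvec{Z}_{i}}\) is recomputed per unit, the heterogeneous \(\varvec{\phi }_{i}\) are concentrated out separately for each i, while only \(\varvec{\beta }\) is pooled.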

The model in (2.1) belongs to a class of panel data models with nonadditive unobserved heterogeneity studied in Fernández-Val and Lee (2013). In particular, under Conditions 1–2 of that paper, which restrict \(\varvec{q}_{i,t}=(y_{i,t},x_{i,t})'\) to be a strong mixing sequence, conditional on all time-invariant effects, with at least \(4+\delta \) moments (for some \(\delta >0\)), the asymptotic distribution of \(\hat{\varvec{\beta }}\) is readily available. Note that the aforementioned restriction rules out non-stationary and local-to-unity dynamics in \(\varvec{y}_{i}\) and \(\varvec{X}_{i}\).

In order to facilitate further discussion, we shall adapt the conclusions of Theorem 1 in Fernández-Val and Lee (2013) to the present setup:

Theorem 3.1

Under Conditions 1–2 (Fernández-Val and Lee 2013) and given \(N/T\rightarrow a^{2} \in [0;\infty )\) as \(N,T\rightarrow \infty \) jointly:

$$\begin{aligned} \sqrt{NT}\left( \hat{\varvec{\beta }}-\varvec{\beta }_{0}\right) \mathop {\rightarrow }\limits ^{d}\varvec{J}^{-1}N\left( -a \varvec{b},\varvec{V}\right) . \end{aligned}$$

The Hessian matrix \(\varvec{J}\) in our case is given by:

$$\begin{aligned} \varvec{J}=\mathrm{plim}_{N,T\rightarrow \infty }\frac{1}{NT} \sum _{i=1}^{N}\varvec{X}_{i}'\varvec{M}_{\varvec{Z}_{i}}\varvec{X}_{i}, \end{aligned}$$

while the exact forms of \(\varvec{V}\) and \(\varvec{b}\) depend on the underlying assumptions on \(\varepsilon _{i,t}\). For example, if \(\varepsilon _{i,t}\) is independent and identically distributed (i.i.d.) over i and t, i.e. \(\varepsilon _{i,t}\sim i.i.d.(0,\sigma ^{2})\), then

$$\begin{aligned} \varvec{V}=\sigma ^{2} \varvec{J}. \end{aligned}$$

The vector \(\varvec{b}\) captures the incidental parameter bias of the common parameter estimator, which is induced by the estimation of \(\varvec{\phi }_{1},\ldots ,\varvec{\phi }_{N}\). We will not elaborate on the exact form of this vector, as it is not needed for the purposes of this paper.Footnote 4

Although \(\hat{\varvec{\beta }}\) is consistent, its asymptotic distribution is not centered at zero under sequences where N and T grow at a similar rate. The presence of bias invalidates asymptotic inference because the bias is of the same order as the standard deviation of the estimator (unless \(a=0\)). In particular, the use of \(\hat{\varvec{\beta }}\) for Granger non-causality testing of \(H_{0}: \varvec{\beta }_{0}=\varvec{0}_{Q}\) will not lead to a test with correct asymptotic size. As a result, the Wald test statistic:

$$\begin{aligned} W=NT \hat{\varvec{\beta }}'\left( \varvec{J}^{-1} \varvec{V}\varvec{J}^{-1}\right) ^{-1}\hat{\varvec{\beta }}, \end{aligned}$$

converges to a non-central \(\chi ^{2}(Q)\) distribution under the null hypothesis even if \(\varvec{J}\) and \(\varvec{V}\) are assumed to be known.

The above discussion implies that \(\hat{\varvec{\beta }}\) should not be used in the construction of the Wald test statistic (3.8). Instead, we suggest the use of the same test statistic, but based on an alternative estimator that is free from the asymptotic bias term \(-a \varvec{b}\). Below, we shall focus on a bias-corrected estimator constructed based on the Jackknife principle, using the Half Panel Jackknife (HPJ) procedure of Dhaene and Jochmans (2015). Given a balanced panel with an even number of time series observations, the HPJ estimator is defined as

$$\begin{aligned} \tilde{\varvec{\beta }}\equiv 2\hat{\varvec{\beta }} -\frac{1}{2}\left( \hat{\varvec{\beta }}_{1/2} +\hat{\varvec{\beta }}_{2/1}\right) , \end{aligned}$$

where \(\hat{\varvec{\beta }}_{1/2}\) and \(\hat{\varvec{\beta }}_{2/1}\) denote the FE estimators of \(\varvec{\beta }\) based on the first \(T_{1}=T/2\) observations, and the last \(T_{2}=T-T_{1}\) observations, respectively. The HPJ estimator can be decomposed into a sum of two terms:

$$\begin{aligned} \tilde{\varvec{\beta }}=\hat{\varvec{\beta }}+\left( \hat{\varvec{\beta }} -\frac{1}{2} \left( \hat{\varvec{\beta }}_{1/2}+\hat{\varvec{\beta }}_{2/1}\right) \right) =\hat{\varvec{\beta }}+T^{-1}\hat{\varvec{b}}, \end{aligned}$$

where the second component implicitly estimates the bias term in (3.5). The use of this estimator can be justified in our setting given that the bias of \(\hat{\varvec{\beta }}\) is of order \(O(T^{-1})\) and thus satisfies the expansion requirement of Dhaene and Jochmans (2015). Although there exist alternative ways of splitting the panel to construct a bias-corrected estimator, as shown in Dhaene and Jochmans (2015), the HPJ estimator minimizes the higher-order bias in the class of Split Panel Jackknife (SPJ) estimators, provided that the data are stationary. For this reason, we limit our attention to Eq. (3.9).
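The half-panel construction in Eq. (3.9) can be sketched as follows, assuming an illustrative helper `fe_beta` that implements the pooled estimator of Eq. (3.4). One convention detail is our own assumption: each half-panel re-uses its own first max(P, Q) observations as initial conditions.

```python
import numpy as np

def fe_beta(y, x, P=1, Q=1):
    """Pooled FE-type estimator of beta (Eq. (3.4)) over the supplied sample."""
    N, T0 = y.shape
    s = max(P, Q)
    A = np.zeros((Q, Q)); c = np.zeros(Q)
    for i in range(N):
        Zi = np.column_stack([np.ones(T0 - s)] +
                             [y[i, s - p:T0 - p] for p in range(1, P + 1)])
        Xi = np.column_stack([x[i, s - q:T0 - q] for q in range(1, Q + 1)])
        M = np.eye(T0 - s) - Zi @ np.linalg.solve(Zi.T @ Zi, Zi.T)
        A += Xi.T @ M @ Xi
        c += Xi.T @ M @ y[i, s:]
    return np.linalg.solve(A, c)

def hpj_beta(y, x, P=1, Q=1):
    """HPJ correction: beta_tilde = 2*beta_hat - (beta_{1/2} + beta_{2/1}) / 2."""
    h = y.shape[1] // 2               # T assumed even, as in Eq. (3.9)
    return (2.0 * fe_beta(y, x, P, Q)
            - 0.5 * (fe_beta(y[:, :h], x[:, :h], P, Q)
                     + fe_beta(y[:, h:], x[:, h:], P, Q)))
```

The second line of `hpj_beta` is exactly the decomposition in Eq. (3.10): the average of the two half-panel estimates implicitly estimates the \(O(T^{-1})\) bias, which is then subtracted from the full-sample estimate.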

Corollary 3.1

Under Conditions 1–2 of Fernández-Val and Lee (2013) and given \(N/T\rightarrow a^{2}\in [0;\infty )\) as \(N,T\rightarrow \infty \) jointly:

$$\begin{aligned} \hat{W}_{HPJ}=NT \tilde{\varvec{\beta }}'\left( \hat{\varvec{J}}^{-1} \hat{\varvec{V}} \hat{\varvec{J}}^{-1}\right) ^{-1} \tilde{\varvec{\beta }}\mathop {\rightarrow }\limits ^{d}\chi ^{2}(Q), \end{aligned}$$

where, assuming \(\varepsilon _{i,t}\sim i.i.d.(0,\sigma ^{2})\),

$$\begin{aligned} \hat{\varvec{J}}&=\frac{1}{NT}\sum _{i=1}^{N}\varvec{X}_{i}' \varvec{M}_{\varvec{Z}_{i}}\varvec{X}_{i}\\ \hat{\varvec{V}}&=\hat{\sigma }^{2}\hat{\varvec{J}}\\ {\hat{\sigma }}^{2}&=\frac{1}{N(T-1-P)-Q}\sum _{i=1}^{N}\left( \varvec{y}_{i} -\varvec{X}_{i}\hat{\varvec{\beta }}\right) ' \varvec{M}_{\varvec{Z}_{i}}\left( \varvec{y}_{i} -\varvec{X}_{i}\hat{\varvec{\beta }}\right) . \end{aligned}$$

The proof of this corollary follows from the corresponding results in Fernández-Val and Lee (2013) and Dhaene and Jochmans (2015). The formula for \(\hat{\varvec{V}}\) can be easily modified to allow for heteroskedasticity in both cross-sectional and time-series dimensions, based, e.g. on the clustered-covariance matrix estimator of Arellano (1987). For instance, cross-sectional heteroskedasticity can be accommodated by setting

$$\begin{aligned} \hat{\varvec{V}}=\frac{1}{N(T-1-P)-Q}\sum _{i=1}^{N} \varvec{X}_{i}'\varvec{M}_{\varvec{Z}_{i}} \hat{\varvec{\varepsilon }}_{i}\hat{\varvec{\varepsilon }}_{i}' \varvec{M}_{\varvec{Z}_{i}}\varvec{X}_{i}, \end{aligned}$$

where \(\hat{\varvec{\varepsilon }}_{i}=\varvec{y}_{i} -\varvec{X}_{i}\hat{\varvec{\beta }}\). Given the recent results in Chudik et al. (2018), we conjecture that for the HPJ approach to work it is only necessary to assume \(N/T^{3}\rightarrow 0\).
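Given the HPJ estimate and per-unit design matrices, annihilators and residuals, the robust Wald statistic of Corollary 3.1 with the \(\hat{\varvec{V}}\) of Eq. (3.12) can be sketched as below; the function name and interface are illustrative assumptions, not the authors' code.

```python
import numpy as np

def wald_hpj(beta_tilde, X_list, M_list, resid_list, T_eff, P, Q):
    """HPJ-based Wald statistic with the cross-sectionally robust V of Eq. (3.12).

    X_list[i]     : (T_eff, Q) matrix of lagged x's for unit i
    M_list[i]     : (T_eff, T_eff) annihilator of Z_i
    resid_list[i] : (T_eff,) residual vector y_i - X_i beta_hat
    """
    N = len(X_list)
    dof = N * (T_eff - 1 - P) - Q
    J = sum(X.T @ M @ X for X, M in zip(X_list, M_list)) / (N * T_eff)
    V = sum(np.outer(X.T @ M @ e, X.T @ M @ e)
            for X, M, e in zip(X_list, M_list, resid_list)) / dof
    b = np.atleast_1d(beta_tilde)
    Jinv = np.linalg.inv(J)
    # W = N*T * beta' (J^{-1} V J^{-1})^{-1} beta, compared with chi2(Q) critical values
    return float(N * T_eff * (b @ np.linalg.inv(Jinv @ V @ Jinv) @ b))
```

For \(Q=1\) the statistic is compared with, e.g., the 5% critical value of \(\chi^{2}(1)\), which is 3.841.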

Remark 3.1

An alternative homogeneous estimator is available by taking into account the fact that under the null hypothesis, not only \(\varvec{\beta }_{i}=\varvec{\beta }\) for all i but also \(\beta _{1}=\beta _{2}=\ldots =\beta _{Q}=0\). Therefore, letting \(\varvec{x}_{i,-1}=(x_{i,0},\ldots ,x_{i,T-1})'\), one can also consider the following restricted fixed effects type estimator:

$$\begin{aligned} \hat{\beta }_{1}=\left( \sum _{i=1}^{N}\varvec{x}_{i,-1}' \varvec{M}_{\varvec{Z}_{i}} \varvec{x}_{i,-1}\right) ^{-1} \left( \sum _{i=1}^{N}\varvec{x}_{i,-1}'\varvec{M}_{\varvec{Z}_{i}} \varvec{y}_{i}\right) . \end{aligned}$$

This estimator is attractive because, under the null hypothesis, it does not require specifying a value for Q. However, the resulting Wald test statistic is expected to have lower power compared to that in Eq. (3.11).

Remark 3.2

Jackknife is by no means the only approach that corrects the incidental parameters bias of the FE estimator. Alternatively, one can consider an analytical bias-correction, as in Hahn and Kuersteiner (2002) and Fernández-Val and Lee (2013). However, the analytical approach has several practical limitations such as the need to specify a kernel function and the corresponding bandwidth. In this respect, the HPJ approach of Dhaene and Jochmans (2015) has some clear advantages.

Monte Carlo simulation


To illustrate the performance of the new testing procedure, we adapt the Monte Carlo setup of Binder et al. (2005) and Juodis (2018). In particular, we assume that the bivariate vector \(\varvec{y}_{i,t} =(y_{i,t},x_{i,t})'\) is subject to the following VAR(1) process:

$$\begin{aligned} \varvec{y}_{i,t}={\varvec{\varPhi }}_{i}\varvec{y}_{i,t-1} +\varvec{\varepsilon }_{i,t}; \quad \varvec{\varepsilon }_{i,t} \sim N(\varvec{0}_{2},\varvec{\varSigma }), \end{aligned}$$

for all \(i=1,\ldots ,N,\) and \(t=1,\ldots ,T\). The vector \(\varvec{y}_{i,t}\) is assumed to be initialized in the distant past; in particular, we set \(\varvec{y}_{i,-50} =\varvec{0}_{2}\) and discard the first 50 observations in estimation.

In order to simplify parametrization, our baseline setup specifies that some of the design matrices are common for all i. In particular, we adopt Design 2 of Juodis (2018) for the error variance matrix, setting

$$\begin{aligned} \varvec{\varSigma } \equiv \left( \begin{array}{ll} \sigma _{\varepsilon _{y}}^{2} &{}\quad \sigma _{\varepsilon _{y,x}} \\ \sigma _{\varepsilon _{y,x}} &{}\quad \sigma _{\varepsilon _{x}}^{2} \\ \end{array}\right) =\left( \begin{array}{ll} 0.07 &{}\quad 0.05 \\ 0.05 &{}\quad 0.07 \\ \end{array}\right) . \end{aligned}$$

Matrix \({\varvec{\varPhi }}_{i}\) is set equal to

$$\begin{aligned} {\varvec{\varPhi }}_{i}=\left( \begin{array}{ll} \alpha _{i} &{}\quad \beta _{i} \\ -0.5 &{}\quad \rho \\ \end{array}\right) , \end{aligned}$$

where in the homogeneous case we impose \(\alpha _{i}=\alpha =0.4\), while in the heterogeneous case \(\alpha _{i}=\alpha +\xi _{i}^{(y)} =0.4+\xi _{i}^{(y)}\), \(\xi _{i}^{(y)} \sim i.i.d.U \left[ -0.15,0.15\right] \). The parameter \(\rho \) takes values in \(\{0.4;0.8\}\) and controls the degree of persistence in \(x_{i,t}\), which can be either moderate (\(\rho =0.4\)) or high (\(\rho =0.8\)).

The main parameter of interest is \(\beta _{i}\). For \(\beta _{i}=0\), the \({\varvec{\varPhi }}_{i}\) matrix is lower triangular so that \(x_{i,t}\) does not Granger-cause \(y_{i,t}\). In this case, the empirical rejection rate corresponds to the size of the test. On the other hand, for \(\beta _{i}\ne 0\), the empirical rejection rate reflects power. In order to cover a broad range of possible alternative hypotheses, we consider the following schemes:

  1. (Homogeneous). \(\beta _{i}=\beta \) for all i, with \(\beta =\{0.00;0.02;0.03;0.05\}\).

  2. (Heterogeneous). \(\beta _{i}=\beta +\xi _{i}^{(x)}\), \(\xi _{i}^{(x)} \sim i.i.d.U\left[ -0.1;0.1\right] \), where \(\beta \) is as in the homogeneous case.

The homogeneous design covers the classical pooled setup of Holtz-Eakin et al. (1988). On the other hand, heterogeneity introduced in the second design is qualitatively closer to Dumitrescu and Hurlin (2012). Note that in the heterogeneous case \(\mathrm{E}[\beta _{i}]=\beta \).
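The design of Eqs. (4.1)–(4.3) and the two heterogeneity schemes above can be sketched as follows; the function name, burn-in handling, and random-number conventions are our own illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_panel(N, T, beta=0.0, rho=0.4, heterogeneous=False, burn=50):
    """Simulate the bivariate panel VAR(1) Monte Carlo design."""
    Sigma = np.array([[0.07, 0.05],
                      [0.05, 0.07]])
    L = np.linalg.cholesky(Sigma)               # eps ~ N(0, Sigma) via L @ z
    alpha_i = np.full(N, 0.4)
    beta_i = np.full(N, beta)
    if heterogeneous:
        alpha_i = alpha_i + rng.uniform(-0.15, 0.15, N)
        beta_i = beta_i + rng.uniform(-0.1, 0.1, N)
    y = np.empty((N, T)); x = np.empty((N, T))
    for i in range(N):
        Phi = np.array([[alpha_i[i], beta_i[i]],
                        [-0.5,       rho]])
        state = np.zeros(2)                     # initialized at zero in the distant past
        for t in range(burn + T):
            state = Phi @ state + L @ rng.standard_normal(2)
            if t >= burn:                       # discard the burn-in observations
                y[i, t - burn], x[i, t - burn] = state
    return y, x
```

Setting `beta=0.0` produces a lower-triangular \({\varvec{\varPhi }}_{i}\), so rejection rates estimate size; nonzero `beta` values trace out power.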

Given that the procedure of Dumitrescu and Hurlin (2012) is primarily used in medium-sized macro-panels, we focus on combinations of (N, T) that better reflect such applications. In particular, we limit our attention to the following 9 combinations:

$$\begin{aligned} N=\{50;100;200\};\quad T=\{20;50;100\}. \end{aligned}$$

We consider the following test statistics:

  • “DHT”—the Dumitrescu and Hurlin (2012) Wald test statistic given byFootnote 5

    $$\begin{aligned} \widetilde{W}_{DH}=\sqrt{\frac{N}{2P}\frac{T-2P-5}{T-P-3}} \left( \left( \frac{T-2P-3}{T-2P-1}\right) \frac{1}{N}\sum _{i=1}^{N}W_{i}-P\right) . \end{aligned}$$
  • “HPJ”—the proposed pooled Wald test statistic in Eq. (3.11), which is based on the HPJ bias-corrected estimator.
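As a quick reference, the standardized DHT statistic above translates directly into code. This is a sketch of the standardization formula only; the individual Wald statistics \(W_{i}\) must first be computed from unit-by-unit time series regressions, and the function name is our own.

```python
import numpy as np

def dht_statistic(W_i, T, P):
    """Standardized Dumitrescu-Hurlin Wald statistic (finite-T version)."""
    W_i = np.asarray(W_i, dtype=float)
    N = W_i.size
    scale = np.sqrt(N / (2.0 * P) * (T - 2 * P - 5) / (T - P - 3))
    return scale * ((T - 2 * P - 3) / (T - 2 * P - 1) * W_i.mean() - P)
```

Under the null, the statistic is compared with standard normal critical values; note the finite-T scaling factors require \(T > 2P + 5\).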

Inference is conducted at the \(5\%\) level of significance. The total number of Monte Carlo replications is set to 5,000. Size-adjusted power is reported.

In an alternative setup, we also consider heteroskedastic innovations, where the first diagonal entry of the variance–covariance matrix \(\varvec{\varSigma }\) in Eq. (4.2), \(\sigma _{\varepsilon _{y}}^{2}\), is scaled by \(\xi _{i}^{(\varepsilon )} \sim i.i.d.U\left[ 0,2\right] \), such that \(\mathrm{E}\left[ \sigma _{\varepsilon _{y},i}^{2}\right] =\sigma _{\varepsilon _{y}}^{2} \mathrm{E}\left[ \xi _{i}^{(\varepsilon )} \right] =0.07\).


This section provides a brief summary of the simulation results, which are reported in Tables 3, 4, 5 and 6 in Appendix A. Specifically,

  • (size) when the degree of persistence in \(x_{i,t}\) is moderate, such that \(\rho =0.4\), the HPJ and DHT tests perform similarly. In particular, empirical size is fairly close to its nominal value in most circumstances, with some size distortions observed when \(T<< N\), especially for DHT. For \(\rho =0.8\), the performance of both tests deteriorates. This is particularly so for DHT, where in 8 out of 18 cases size exceeds 20%; in fact, for \(N=200\) and \(T=20\), size is over 50%. By contrast, HPJ appears to be more reliable, with size remaining below 15% under all circumstances.

  • (power) for \(\rho =0.4\) HPJ dominates DHT almost uniformly in terms of power. Similar conclusions can be drawn for \(\rho =0.8\). Note that on average, for any fixed value of N, power increases with T at a higher rate for HPJ than DHT, which reflects the \(\sqrt{NT}\) convergence rate of the bias-corrected least-squares estimator employed by the HPJ test.

  • (homogeneous vs heterogeneous models) The performance of the tests in the heterogeneous model is similar to the homogeneous one in terms of both size and power.

  • (homoskedasticity vs heteroskedasticity) The results are similar in terms of both size and power under homoskedasticity and heteroskedasticity. This implies that heteroskedasticity does not distort the performance of the tests, once appropriately accounted for.

In summary, the above results suggest that HPJ has good finite sample properties even in panels with a moderate time dimension. In contrast, DHT can suffer from substantial size distortions, especially when \(T<<N\). Moreover, in terms of power, HPJ dominates DHT, especially so in panels where N and T are both large.Footnote 6

Illustration: Granger causality evidence on bank profitability and efficiency

We perform Granger non-causality tests in order to examine the sign and the type of temporal relation between banks’ profitability and cost efficiency. We employ panel data from a random sample of 350 U.S. banking institutions, each one observed over 56 time periods, namely 2006:Q1-2019:Q4. This data set has also been used by Cui et al. (2020), albeit in a different context related to the estimation of a spatial dynamic panel model with common factors. The data are publicly available, and they have been downloaded from the Federal Deposit Insurance Corporation (FDIC) website.Footnote 7

Data and model specification

We consider the following specification:

$$\begin{aligned} y_{i,t}=\phi _{0,i}+\sum _{p=1}^{P}\phi _{p,i}y_{i,t-p} +\sum _{q=1}^{Q}\beta _{q,i} x_{i,t-q}+\varepsilon _{i,t}, \end{aligned}$$

for \(i=1,\ldots ,N\) and \( t=1,\ldots ,T\), where y denotes profitability, which is proxied by the return on assets (ROA), defined as annualized net income after taxes expressed as a percentage of average total assets, and x denotes the time-varying operational cost efficiency of bank i at period t, to be defined shortly. The parameters of the model above are described in Sect. 2. For the purposes of the present illustration, we shall focus on the unidirectional link (one-way causation) from cost efficiency to profitability. In addition, we shall impose \(P=Q\).

A measure of cost efficiency has been constructed based on a cost frontier model using a translog functional form, two outputs and three inputs. In particular, following Altunbas et al. (2007), we specify

$$\begin{aligned} \mathrm{ln} TC_{i,t}&= \sum _{h=1}^{3} \gamma _{h} \mathrm{ln} P_{h,i,t} + \sum _{h=1}^{2} \delta _{h} \mathrm{ln} Y_{h,i,t} + 0.5 \sum _{m=1}^{2}\sum _{n=1}^{2} \mu _{mn} \mathrm{ln} Y_{m,i,t} \mathrm{ln} Y_{n,i,t}\nonumber \\&\quad + \sum _{m=1}^{3}\sum _{n=1}^{3} \pi _{mn} \mathrm{ln} P_{m,i,t} \mathrm{ln} P_{n,i,t} +\sum _{m=1}^{2}\sum _{n=1}^{3} \xi _{mn} \mathrm{ln} Y_{m,i,t} \mathrm{ln} P_{n,i,t} +\eta _{i} + \tau _{t} + \upsilon _{it}, \end{aligned}$$

where TC represents total cost, while \(Y_{1}\) and \(Y_{2}\) denote two outputs, net loans and securities, respectively; \(Y_{1}\) is defined as gross loans minus reserves for loan loss provision, and \(Y_{2}\) is the sum of securities held to maturity and securities held for sale. \(P_{1}\), \(P_{2}\) and \(P_{3}\) denote three input prices, namely the price of capital, the price of labour and the price of loanable funds. The model above is estimated using a two-way fixed effects regression. The bank-specific, time-varying operational inefficiency component is captured by the sum of the two fixed effects, i.e. \(\eta _{i} + \tau _{t}\). Subsequently, cost efficiency, \(x_{i,t}\), is computed as follows:

$$\begin{aligned} x_{i,t}= e^{ \mathrm{min}\{\hat{\eta }_{i} + \hat{\tau }_{t}\}_{i,t} -(\hat{\eta }_{i} + \hat{\tau }_{t}) }, \end{aligned}$$

which ensures that larger scores imply higher cost efficiency such that the most efficient bank scores one.
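Given estimated fixed effects \(\hat{\eta }_{i}\) and \(\hat{\tau }_{t}\), the normalization in the equation above can be sketched as follows (array names are illustrative):

```python
import numpy as np

def efficiency_scores(eta, tau):
    """Cost efficiency x_{i,t} = exp(min_{i,t}(eta_i + tau_t) - (eta_i + tau_t))."""
    ineff = eta[:, None] + tau[None, :]   # (N, T) inefficiency components
    return np.exp(ineff.min() - ineff)    # most efficient bank-quarter scores one
```

Since the exponent is nonpositive by construction, all scores lie in (0, 1], with larger values indicating higher cost efficiency.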

We initially test for Granger non-causality using Eq. (5.1) based on the entire sample, i.e. all 350 banks during 2006:Q1-2019:Q4. Subsequently, we split banks into two groups based on their average size, which is proxied by the natural logarithm of banks’ total assets. The grouping of banks is performed using a k-means algorithm, as advocated, e.g. in Lin and Ng (2012) and Sarafidis and Weber (2015). In addition, we distinguish between two subperiods, namely “Basel II” (2006:Q1-2010:Q4) and a period under the Dodd–Frank Act “DFA” (2011:Q1-2019:Q4). Basel II represents the second of the Basel Accords and constitutes recommendations on banking laws and regulations issued by the Basel Committee on Banking Supervision (BCBS).Footnote 8 The DFA is a federal law enacted towards the end of 2010, aiming “to promote the financial stability of the United States by improving accountability and transparency in the financial system, to end “too big to fail”, to protect the American taxpayer by ending bailouts, to protect consumers from abusive financial services practices, and for other purposes”.Footnote 9 In a nutshell, the DFA has instituted a new failure-resolution regime, which seeks to ensure that losses resulting from bad decisions by managers are absorbed by equity and debt holders, thus potentially reducing moral hazard.


Table 1 reports summary statistics for the two groups of banks in terms of their size, proxied by the natural logarithm of the average value (over time) of total assets.

Table 1 Summary statistics for bank size

Table 2 reports results for the Wald test statistic and its p value for the null hypothesis \({H_{0}}: {\beta _{q,i}=0} \quad \text {for all}\,\, {i} \text { and } q\). We also report the estimated number of lags employed, \(\hat{P}\), which is obtained using BIC,Footnote 10 as well as estimates for the pooled estimator (standard errors in parentheses) of the Granger-causation parameters, defined in Eq. (3.9) and denoted as \(\hat{\beta }\). When \(\hat{P}=1\), \(\hat{\beta }=\hat{\beta }_{1}\) in Eq. (5.1), whereas for \(\hat{P}>1\) we report the sum of the estimates of \(\beta _{q}\), \(q=1,\dots ,\hat{P}\), i.e. \(\hat{\beta }=\sum _{q=1}^{\hat{P}} \hat{\beta }_{q}\). The variance–covariance matrix of the pooled estimator, \(\widehat{\varvec{V}}\), is computed as in Eq. (3.12), i.e. it accommodates cross-sectional heteroskedasticity. For the purposes of comparison, we also report the mean-group estimator of the Granger-causation parameters, \(\hat{\beta }_{MG}\), computed using the sample mean (across i) of the corresponding individual-specific regression estimates.

Table 2 Results for the HPJ-based Wald test approach

The top panel corresponds to the entire sample of 350 banks. Column “Full” reports results for the entire period of the sample, i.e. 2006:Q1-2019:Q4. Columns “Basel II” and “DFA” present results for two different subperiods, namely 2006:Q1-2010:Q4 and 2011:Q1-2019:Q4, respectively. The middle panel contains results for “small-sized” banks, followed by “large-sized” banks at the bottom panel.

As we can see, in almost all cases the null hypothesis is rejected at the \(1\%\) level of significance, which implies that cost efficiency Granger-causes profitability, i.e. past values of x contain information that helps to predict y over and above the information contained in past values of y. The only exception occurs when it comes to large banks during Basel II, where the null hypothesis is not rejected, with a p value approximately equal to 0.509. This result is important because it signifies potential moral hazard-type behaviour prior to the introduction of the DFA; such outcome is consistent with findings in the existing literature, such as those of Cui et al. (2020) and Zhu et al. (2020). However, following the introduction of DFA, the null of Granger non-causality is rejected for large banks as well.

Regarding the remaining quantities, in most cases \(\widehat{P}=1\), i.e. the optimal lag order of x and y equals one, except for large banks during the DFA period, where \(\widehat{P}=2\). As expected, the Granger-causation parameters are statistically significant at the \(5\%\) level, except for \(\hat{\beta }_{MG}\) when the null hypothesis of Granger non-causality is not rejected.

We have also run Granger non-causality tests based on the method of Dumitrescu and Hurlin (2012) (the “DHT” test statistic), using the Stata algorithm developed by Lopez and Weber (2017) (see footnote 11). Lag selection based on BIC yields identical results. However, this time the null hypothesis of Granger non-causality is rejected in all cases, including for the sample of large banks during the Basel II subperiod. In particular, in this case the DHT statistic equals 2.58, with a p value of 0.0099. Given that this result is marginal at the \(1\%\) level of significance, and taking into account the potentially substantial size distortions observed in the simulations for the DHT test when \(T=20\), one is inclined to trust the outcome of the HPJ-based Wald test reported in Table 2.


This paper considers the problem of Granger non-causality testing in panels with large cross-sectional and time series dimensions. First, we put forward a pooled fixed effects type estimator for the Granger-causation parameters, which makes use of the fact that, under the null hypothesis, these parameters are all equal to zero and, thus, they are homogeneous. Pooling over cross sections guarantees that the estimator has a \(\sqrt{NT}\) convergence rate. In order to account for the well-known “Nickell bias”, we make use of the Split Panel Jackknife procedure of Dhaene and Jochmans (2015). Subsequently, a Wald test is proposed, which is based on the bias-corrected fixed effects type estimator. The resulting approach is valid irrespective of whether the alternative hypothesis is homogeneous or heterogeneous, or whether the autoregressive parameters vary across individuals or not, so long as T is (at least moderately) large.
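The half-panel jackknife (HPJ) correction at the core of the approach is simple to state: estimate on the full panel and on each half of the time dimension, then combine the three estimates so that the O(1/T) “Nickell bias” cancels. The sketch below illustrates this recombination for any scalar estimator; the function name is ours and the even-T assumption is a simplification.

```python
import numpy as np

def hpj(estimator, y, x):
    """Half-panel jackknife in the spirit of Dhaene and Jochmans (2015):
    combine the full-sample estimate with estimates from the two half-panels
    so that the O(1/T) bias cancels.  `estimator` is any function mapping
    (y, x) panels, each of shape (N, T), to a scalar estimate."""
    T = y.shape[1]
    h = T // 2                                  # assumes T is even, for simplicity
    full   = estimator(y, x)
    first  = estimator(y[:, :h], x[:, :h])      # first half-panel
    second = estimator(y[:, h:], x[:, h:])      # second half-panel
    return 2.0 * full - 0.5 * (first + second)  # bias-corrected estimate
```

The Wald statistic is then built from the bias-corrected estimate and its estimated variance (in the scalar case, the squared t-ratio), and is compared against the appropriate \(\chi ^{2}\) critical value.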

The statistical model considered in this paper rules out any form of cross-sectional dependence in \(\varepsilon _{i,t}\). This restriction can easily be relaxed if one is willing to assume that cross-sectional dependence is strong and generated by an unobserved factor component, \(\varvec{\lambda }_{i}'\varvec{f}_{t}\). In particular, in this case one can use either the Common Correlated Effects (CCE) approach of Pesaran (2006)/Chudik and Pesaran (2015) combined with HPJ, as in Juodis et al. (2020), or the PC estimator of Bai (2009)/Ando and Bai (2017). In these setups, the HPJ-based statistic provides a natural starting point, as the finite T corrections proposed by Dumitrescu and Hurlin (2012) are not feasible. In panels with homogeneous autoregressive parameters and T fixed, one can employ the GMM framework of Robertson and Sarafidis (2015) and the linear GMM estimator of Juodis and Sarafidis (2020) (see footnote 12). We leave these avenues for future research.
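To convey the CCE idea mentioned above, the following sketch augments each unit's regression with cross-sectional averages of the observables, which proxy an unobserved common factor \(f_{t}\), and then averages the unit-specific estimates of the Granger-causation coefficient. This is illustrative only; the function name, the one-factor setup, and the single-lag specification are our assumptions, not the procedure studied in the paper.

```python
import numpy as np

def cce_mg_beta(y, x):
    """Sketch of the CCE idea (Pesaran 2006): augment each unit's regression
    with cross-sectional averages of the observables as proxies for an
    unobserved common factor f_t, then mean-group the unit-specific
    estimates of the coefficient on lagged x.  y, x: (N, T) arrays."""
    N, T = y.shape
    ybar, xbar = y.mean(axis=0), x.mean(axis=0)   # cross-sectional averages
    betas = np.empty(N)
    for i in range(N):
        Z = np.column_stack([
            np.ones(T - 1),                       # unit-specific intercept
            y[i, :-1], x[i, :-1],                 # own lags
            ybar[1:], ybar[:-1], xbar[:-1],       # factor proxies
        ])
        coef, *_ = np.linalg.lstsq(Z, y[i, 1:], rcond=None)
        betas[i] = coef[2]                        # coefficient on lagged x
    return betas.mean()
```

In a dynamic panel, such a CCE-type regression still carries an O(1/T) bias, which is why combining it with the HPJ correction, as in Juodis et al. (2020), is the natural route.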


Notes

  1.

    For panels with a fixed-T dimension, and under normality of the innovations, Dumitrescu and Hurlin (2012) propose centering their test statistic using moments of an appropriate F distribution rather than \(\chi ^{2}\). However, the modified statistic is not standard normal for fixed-T (even under normality of the innovations) because the suggested approximation assumes that regressors are strictly exogenous.

  2.

    Since the model above is observed over T time periods, it is implicitly assumed that \(y_{i,-P+1}, y_{i,-P+2}, \dots , y_{i,0}\) are observed, and so are \(x_{i,-Q+1}, x_{i,-Q+2}, \dots , x_{i,0}\).

  3.

    Also, to save space, we do not provide an exposition for how to test bi-directional causality, which can take place in a similar manner by expressing x as a function of own lags and lagged values of y.

  4.

    For more details on the exact form of all matrices in Theorem 3.1, the interested reader is referred to Fernández-Val and Lee (2013).

  5.

    The authors also propose an alternative Wald test statistic that is not centered. However, in the present setup we prefer using DHT because it provides better size control.

  6.

    In further simulations, we have studied cases where both y and x are drawn based on a VAR(2) process with either homogeneous or heterogeneous coefficients. The results are similar to those already reported here, and so we refrain from discussing these further.

  7.

    See https://www.fdic.gov/.

  8.

    Basel II was eventually superseded by the Basel III framework internationally.

  9.

    See https://www.cftc.gov/sites/default/files/idc/groups/public/@swaps/documents/file/hr4173_enrolledbill.pdf.

  10.

    To ensure that BIC is consistent under both the null and the alternative hypotheses, we estimate P under the alternative, thus allowing for heterogeneity of the Granger-causation parameters.

  11.

    We do not report the results to save space. They are available from the authors upon request.

  12.

    It is possible to use alternative estimators for this class of models in fixed-T panels, such as those reviewed by Juodis and Sarafidis (2018).


  1. Altunbas Y, Carbo S, Gardener EP, Molyneux P (2007) Examining the relationships between capital, risk and efficiency in European banking. Eur Financ Manag 13:49–70

  2. Anderson TW, Hsiao C (1982) Formulation and estimation of dynamic models using panel data. J Econom 18:47–82

  3. Ando T, Bai J (2017) Clustering huge number of financial time series: a panel data approach with high-dimensional predictors and factor structures. J Am Stat Assoc 112:1182–1198

  4. Arellano M (1987) Computing robust standard errors for within-groups estimators. Oxf Bull Econ Stat 49:431–434

  5. Arellano M (2016) Modeling optimal instrumental variables for dynamic panel data models. Res Econ 70:238–261

  6. Bai J (2009) Panel data models with interactive fixed effects. Econometrica 77:1229–1279

  7. Binder M, Hsiao C, Pesaran MH (2005) Estimation and inference in short panel vector autoregressions with unit root and cointegration. Econom Theory 21:795–837

  8. Chambers MJ (2013) Jackknife estimation of stationary autoregressive models. J Econom 172:142–157

  9. Chudik A, Pesaran MH (2015) Common correlated effects estimation of heterogeneous dynamic panel data models with weakly exogenous regressors. J Econom 188:393–420

  10. Chudik A, Pesaran MH, Yang J-C (2018) Half-panel Jackknife fixed effects estimation of panels with weakly exogenous regressors. J Appl Econom 33:816–836

  11. Cui G, Sarafidis V, Yamagata T (2020) Large IV estimation of spatial dynamic panels with interactive effects: large sample theory and an application on bank attitude toward risk. Working Paper MPRA Paper No. 102488, Munich Personal RePEc Archive

  12. Dhaene G, Jochmans K (2015) Split-panel Jackknife estimation of fixed-effect models. Rev Econ Stud 82:991–1030

  13. Dumitrescu E-I, Hurlin C (2012) Testing for Granger non-causality in heterogeneous panels. Econ Model 29:1450–1460

  14. Fernández-Val I, Lee J (2013) Panel data models with nonadditive unobserved heterogeneity: estimation and inference. Quant Econ 4:453–481

  15. Granger CWJ (1969) Investigating causal relations by econometric models and cross-spectral methods. Econometrica 37:424–438

  16. Hahn J, Kuersteiner G (2002) Asymptotically unbiased inference for a dynamic panel model with fixed effects when both N and T are large. Econometrica 70(4):1639–1657

  17. Holtz-Eakin D, Newey WK, Rosen HS (1988) Estimating vector autoregressions with panel data. Econometrica 56:1371–1395

  18. Im KS, Pesaran MH, Shin Y (2003) Testing for unit roots in heterogeneous panels. J Econom 115:53–74

  19. Juodis A (2013) A note on bias-corrected estimation in dynamic panel data models. Econ Lett 118:435–438

  20. Juodis A (2018) First difference transformation in panel VAR models: robustness, estimation and inference. Econom Rev 37:650–693

  21. Juodis A, Sarafidis V (2018) Fixed T dynamic panel data estimators with multifactor errors. Econom Rev 37:893–929

  22. Juodis A, Sarafidis V (2020) A linear estimator for factor-augmented fixed-T panels with endogenous regressors. J Bus Econ Stat:1–15

  23. Juodis A, Karabıyık H, Westerlund J (2020) On the robustness of the pooled CCE estimator. J Econom (forthcoming)

  24. Karavias Y, Tzavalis E (2016) Local power of fixed-T panel unit root tests with serially correlated errors and incidental trends. J Time Ser Anal 37:222–239

  25. Karavias Y, Tzavalis E (2017) Local power of panel unit root tests allowing for structural breaks. Econom Rev 36:1123–1156

  26. Lin C, Ng S (2012) Estimation of panel data models with parameter heterogeneity when group membership is unknown. J Econom Methods 1:42–55

  27. Lopez L, Weber S (2017) Testing for Granger causality in panel data. Stata J 4:972–984

  28. Neyman J, Scott EL (1948) Consistent estimates based on partially consistent observations. Econometrica 16:1–32

  29. Nickell S (1981) Biases in dynamic models with fixed effects. Econometrica 49:1417–1426

  30. Pesaran MH (2006) Estimation and inference in large heterogeneous panels with a multifactor error structure. Econometrica 74:967–1012

  31. Pesaran MH (2012) On the interpretation of panel unit root tests. Econ Lett 116:545–546

  32. Robertson D, Sarafidis V (2015) IV estimation of panels with factor residuals. J Econom 185:526–541

  33. Sarafidis V, Weber N (2015) A partially heterogeneous framework for analyzing panel data. Oxf Bull Econ Stat 77:274–296

  34. Zhu H, Sarafidis V, Silvapulle MJ (2020) A new structural break test for panels with common factors. Econom J 23:137–155



Acknowledgements

We are delighted to contribute this paper to a special issue in honour of Professor Badi Baltagi, who has made an enormous contribution to the field of econometrics. We are grateful to the Guest Editors and all referees involved. We would like to thank participants at the IPDC2019 conference (Vilnius) for useful comments and suggestions. Part of this research project was conducted while the first author was visiting the Applied Macroeconomic Research Division (TMTS) at the Bank of Lithuania (BoL). The views expressed in this paper are those of the authors and do not necessarily represent the official views of the Bank of Lithuania or the Eurosystem. Financial support from the Netherlands Organization for Scientific Research (NWO) is gratefully acknowledged by Juodis. Sarafidis gratefully acknowledges financial support from the Australian Research Council, under research Grant No. DP-170103135.

Author information



Corresponding author

Correspondence to Artūras Juodis.

Ethics declarations

Conflict of interest

The authors declare that they have no conflicts of interest.

Ethical approval

This article does not contain any studies with human participants or animals performed by any of the authors.


Appendix A: Monte Carlo Results


See Tables 3, 4, 5 and 6.

Table 3 Empirical rejection rates
Table 4 Empirical rejection rates
Table 5 Empirical rejection rates
Table 6 Empirical rejection rates

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.


About this article


Cite this article

Juodis, A., Karavias, Y. & Sarafidis, V. A homogeneous approach to testing for Granger non-causality in heterogeneous panels. Empir Econ 60, 93–112 (2021). https://doi.org/10.1007/s00181-020-01970-9



Keywords

  • Panel data
  • Granger causality
  • VAR
  • “Nickell bias”
  • Bias correction
  • Fixed effects

JEL Classification

  • C12
  • C13
  • C23
  • C33