1 Introduction

As a statistical concept, causality plays a central role from both a theoretical and a practical point of view (see Berzuini et al. 2012). In time series analysis, it was first introduced by Wiener in the context of prediction theory (Wiener 1956) and then formalized by Granger in the context of linear regression modelling of stochastic processes (Granger 1969). Since then, it has been known as Granger-causality (GC). Specifically, \({Y}_t\) is said to Granger-cause \({X}_t\) if the past values of \({Y}_{t}\), i.e. \({Y}_{t-1}\), \({Y}_{t-2}\), \(\dots \), help predict the current value of \({X}_t\).
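As a minimal numerical illustration of this definition (a sketch with purely illustrative coefficients, not taken from the paper), one can compare the residual sums of squares of autoregressions for \(X_t\) with and without lagged \(Y_t\) and form the usual F statistic:

```python
import numpy as np

rng = np.random.default_rng(0)
T = 500
y = rng.standard_normal(T)
x = np.zeros(T)
for t in range(1, T):
    # X_t loads on Y_{t-1}, so Y Granger-causes X by construction
    x[t] = 0.5 * x[t - 1] + 0.8 * y[t - 1] + rng.standard_normal()

x_now = x[1:]
Zr = np.column_stack([np.ones(T - 1), x[:-1]])           # restricted: own lag only
Zf = np.column_stack([np.ones(T - 1), x[:-1], y[:-1]])   # full: adds lagged Y

def rss(Z):
    beta = np.linalg.lstsq(Z, x_now, rcond=None)[0]
    return np.sum((x_now - Z @ beta) ** 2)

# F statistic for the single restriction "coefficient on Y_{t-1} = 0"
F = (rss(Zr) - rss(Zf)) / (rss(Zf) / (T - 1 - Zf.shape[1]))
print(F > 10.0)  # a large F flags Granger-causality from Y to X
```

With the strong lagged effect built in above, the F statistic is far beyond any conventional critical value.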

The concept of causality was then extended from the time to the frequency domain. Pierce (1979) explicitly measured the effect of the lagged values of \(Y_t\) onto \(X_t\) through an \(R^2\) indicator, defined frequency-wise to evaluate the intensity of the linear causal dependence at each frequency.

Geweke (1982) proposed to estimate a VAR model on \([X_t, Y_t]'\), defining the causality from \(Y_t\) to \(X_t\) at each frequency as the proportional effect of the lagged values of \(Y_t\) on the spectral density of \(X_t\). This approach also extends to causality conditional on the lagged values of a third variable \(W_t\) (Geweke 1984). The described measures are respectively defined as unconditional and conditional Granger-causality spectra (Geweke 1982, 1984), and were also discussed by Hosoya (1991, 2001). Conditional Granger-causality was later reformulated by Chen et al. (2006).

While the use of GC in the time domain dates back to the sixties, GC in the frequency domain has been employed only in more recent years. Lemmens et al. (2008) analyze the causality structure of European production expectation surveys by comparing the methods of Pierce (1979) and Geweke (1982), both of which require studying appropriate frequency-wise coherence coefficients. The same approach is used in Tiwari (2014) to explore the relationship between energy consumption and income in the United States. In this paper, we aim to study the mutual relationship between gross domestic product and money stock in the Euro Area by making inference on Granger-causality spectra. The problem has been widely addressed for the U.S. economy, which is a key player from an international macroeconomic perspective. In Belongia and Ireland (2016), for instance, the methodology of Friedman and Schwartz (1975), based on structural VAR models, is revisited and applied to U.S. data over the period 1967–2013. The same consideration applies to the Indian economy (see Nadig and Viswanathan 2019). By contrast, since the European Central Bank was established in 1998, evidence and analyses regarding the Euro Area are less settled and more controversial.

The advantage of frequency-domain GC lies in disentangling the causality structure across a range of frequencies, whereas traditional time-domain GC only provides an overall indication of the presence of a causality relationship. In particular, GC measures at each frequency the intensity of the causal relationship between the components of the series at that particular frequency. For this reason, we aim to provide an inferential procedure highlighting the specific frequencies at which causality exists among the data generating processes of the observed time series.

In Breitung and Candelon (2006), a parametric test (henceforth, BC test) for Granger-causality in the frequency domain is proposed. Its convergence rate is \(O(T^{-1/2})\) (where T is the time series length), and its power decreases as the distance of the frequency of interest from \(\frac{\pi }{2}\) increases (even if Yamada and Yanfeng 2014 show that the same test is still useful at extreme frequencies). The test is based upon a set of linear restrictions on the parameters of the (possibly cointegrated) VAR model best representing the series (we refer to Lütkepohl 2005 for VAR selection and estimation). Breitung and Schreiber (2018) generalize the BC procedure to test causality and delay within a frequency band.

The BC test is widely used in the economic literature. In Croux and Reusens (2013), it is applied to test the predictive power of domestic stock prices for the economic activity of G-7 countries. Bozoklu and Yilanci (2013) consider the relationship between energy consumption and economic growth in the OECD area. The relationship between real and financial business cycles is analyzed in Gomez-Gonzalez et al. (2015). Wei (2015) deals with the link between commodity prices and monetary policies. Keho et al. (2015) test Wagner’s law for some African economies. Two relevant applications to the Turkish economy are described in Gül et al. (2018) and Tastan and Sahin (2020). Fromentin and Tadjeddine (2020) analyze the relationship between cross-border workers and financial instability in Luxembourg.

Some alternative testing approaches have also been proposed. Hidalgo (2000) estimates VAR filters via generalized least squares and accordingly derives a test statistic for GC in the presence of long-range dependence. Hidalgo (2005) extends the framework of Hidalgo (2000) to the multivariate case, computing the relevant quantiles under the null via resampling bootstrap. A consistent nonparametric test for nonlinear causality is proposed by Nishiyama et al. (2011). A nonparametric approach to test for Granger-causality in tail events is proposed by Hong et al. (2009) and extended by Candelon and Tokpavi (2016). The Phillips spectral estimator (Phillips 1988) is instead exploited by Assenmacher-Wesche and Gerlach (2008) to estimate causality both at frequency 0 and at the remaining frequencies in a cointegrated setting. This method is used in Berger and Osterholm (2011) to test the relationship between money growth and inflation in the Euro Area. A comprehensive computational and inferential strategy for time-domain and frequency-domain GC spectra has been proposed in Barnett and Seth (2014).

The present work proposes an approach different from the classical framework of testing the no-causality hypothesis. Our null hypothesis of zero-causality is in fact tested frequency-wise by comparing each causality to the \(100 \times (1-\alpha )\)-th percentile of the distribution of the median causality across frequencies under the stronger case of stochastic independence, where \(\alpha \) is the significance level. The null distribution is approximated via the stationary bootstrap of Politis and Romano (1994). As empirically shown in Sect. 2.4, the median is chosen because under the null the individual causalities and the median causality are indistinguishable from a stochastic point of view. We thus derive the desired bootstrap quantile of the median causality across frequencies and compare each observed causality to it. Our test is consistent provided that \(T^{\frac{1}{3}} \rightarrow \infty \).

The paper is organized as follows. In the next Section, the concept of Granger-causality in the frequency domain is recalled, our bootstrap inference approach is explained in detail, and a simulation study clarifying the features of our test is presented. In Sect. 3 we show the potential of our method in outlining the causal relationships between money and output in the Euro Area during the period 1999–2017. Finally, we conclude the paper with a discussion.

2 Granger-Causality Spectra: A Bootstrap Testing Approach

2.1 Definition

We now briefly recall the basics of Granger-causality spectral theory, following the approach of Ding et al. (2006), to which we refer for details. Suppose that the stochastic processes \({X}_t\) and \({Y}_t\), jointly covariance-stationary, follow a non-singular VAR(p) model. Defining the bivariate stochastic process \({Z}_t=[{X}_t, {Y}_t]'\), we have

$$\begin{aligned} {Z}_t={A}_{1}{Z}_{t-1} +\ldots +{A}_{p}{Z}_{t-p} +{\epsilon }_t, \end{aligned}$$
(1)

where \({\epsilon }_t\) is a zero-mean bivariate stochastic process with covariance matrix \({{\varSigma _2}}\) and \({A}_{1}, \ldots , {A}_{p}\) are \(2 \times 2\) coefficient matrices. We stress that \({\epsilon }_t\) has no auto-covariance structure at any lag different from 0.
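For concreteness, model (1) with \(p=1\) can be simulated as follows (the coefficient values are purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
A1 = np.array([[0.5, 0.3],
               [0.0, 0.4]])        # illustrative: Y feeds into X, not vice versa
Sigma2 = np.eye(2)                 # covariance of the serially uncorrelated noise
T = 400
Z = np.zeros((T, 2))               # Z_t = [X_t, Y_t]'
eps = rng.multivariate_normal(np.zeros(2), Sigma2, size=T)
for t in range(1, T):
    Z[t] = A1 @ Z[t - 1] + eps[t]  # Eq. (1) with p = 1

# the VAR is stable: all eigenvalues of A1 lie inside the unit circle
print(np.max(np.abs(np.linalg.eigvals(A1))) < 1)  # True
```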

Moving to the frequency domain, for each frequency \(\omega \) we define the transfer function \({P}(\omega )\) of \({Z}_t\) in (1) as

$$\begin{aligned} {P}(\omega )=\left( {I}_2-\sum _{j=1}^p {A}_j e^{-i j\omega }\right) ^{-1}, \; -\pi \le \omega \le \pi , \end{aligned}$$
(2)

which is invertible if and only if the roots of the equation \(\det ({I}_2-\sum _{j=1}^{p}{A}_jL^j)~=~0\) (where L is the lag operator) lie outside the unit circle. Setting \({P}(\omega )= \left[ \begin{array}{cc} {P}_{{X}{X}}(\omega ) & {P}_{{{X}{Y}}}(\omega )\\ {P}_{{{Y}{X}}}(\omega ) & {P}_{{{Y}{Y}}}(\omega ) \end{array}\right] \), definition (2) allows us to define the model-based spectrum \({h}(\omega )\) in a compact way as follows:

$$\begin{aligned} {h}(\omega )={P}(\omega ){\varSigma _2} {P}(\omega )^{*}, \; -\pi \le \omega \le \pi , \end{aligned}$$

where \(^{*}\) denotes the complex conjugate transpose.
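A direct numerical transcription of (2) and of the spectral formula above, for an illustrative VAR(1), could read:

```python
import numpy as np

A = [np.array([[0.5, 0.3], [0.0, 0.4]])]   # illustrative VAR(1) coefficients
Sigma2 = np.eye(2)

def transfer(omega):
    """P(omega) as in Eq. (2)."""
    acc = sum(Aj * np.exp(-1j * (j + 1) * omega) for j, Aj in enumerate(A))
    return np.linalg.inv(np.eye(2) - acc)

def spectrum(omega):
    """Model-based spectrum h(omega) = P(omega) Sigma2 P(omega)*."""
    P = transfer(omega)
    return P @ Sigma2 @ P.conj().T

h = spectrum(np.pi / 4)
print(np.allclose(h, h.conj().T))   # a spectral matrix is Hermitian: True
print(h[0, 0].real > 0)             # diagonal entries are positive: True
```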

Setting \({\varSigma _2}= \left[ \begin{array}{cc} \sigma _2 & \upsilon _2\\ \upsilon _2 & \gamma _2 \end{array}\right] \), for computational reasons we define the transform matrix \({S}= \left[ \begin{array}{cc} 1 & 0\\ -\frac{\upsilon _2}{\sigma _2} & 1 \end{array}\right] ,\) from which we derive the transformed transfer function matrix \({\tilde{{P}}}(\omega )= {P}(\omega ){S}^{-1}\). The process \({Z}_t=[{X}_t, {Y}_t]'\) is normalized accordingly as \({{\tilde{Z}}}_t={\tilde{{P}}}(L)[{X}_t, {Y}_t]'\), with \({{\tilde{Z}}}_t=[{{\tilde{X}}}_t, {{\tilde{Y}}}_t]'\).

The unconditional Granger-causality spectrum of \({X}_t\) (effect-variable) with respect to \({Y}_t\) (cause-variable) is then defined as (Geweke 1982)

$$\begin{aligned} h_{{Y}\rightarrow {X}}(\omega )=\ln \left( \frac{{{h}}_{{{X} {X}}}(\omega )}{\sigma _2|{\tilde{{P}}}_{{{X} {X}}}(\omega )|^2}\right) .\end{aligned}$$
(3)

In the empirical analysis, the theoretical values of coefficient and covariance matrices will be replaced by the corresponding SURE estimates (Zellner 1962).
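For known (here, purely illustrative) VAR(1) parameters, formula (3) can be evaluated numerically as follows; in the empirical analysis the matrices would instead be the SURE estimates:

```python
import numpy as np

A1 = np.array([[0.5, 0.3], [0.0, 0.4]])      # second component Y causes X
Sigma2 = np.array([[1.0, 0.2], [0.2, 1.0]])
sigma2, upsilon2 = Sigma2[0, 0], Sigma2[0, 1]
S = np.array([[1.0, 0.0],
              [-upsilon2 / sigma2, 1.0]])    # transform matrix of the text
S_inv = np.linalg.inv(S)

def gc_uncond(omega):
    """Unconditional GC spectrum h_{Y->X}(omega), Eq. (3)."""
    P = np.linalg.inv(np.eye(2) - A1 * np.exp(-1j * omega))  # transfer function
    P_tilde = P @ S_inv                                      # tilde-P = P S^{-1}
    h_xx = (P @ Sigma2 @ P.conj().T)[0, 0].real              # spectrum of X
    return np.log(h_xx / (sigma2 * abs(P_tilde[0, 0]) ** 2))

omegas = np.linspace(0.01, np.pi, 200)
gc = np.array([gc_uncond(w) for w in omegas])
print(gc.min() >= -1e-10)   # the causality spectrum is non-negative: True
print(gc[0] > gc[-1])       # here causality is stronger at low frequencies
```

Non-negativity follows because \(h_{XX}(\omega )=\sigma _2|{\tilde{P}}_{XX}(\omega )|^2+(\gamma _2-\upsilon _2^2/\sigma _2)|{\tilde{P}}_{XY}(\omega )|^2\), so the ratio inside the logarithm is at least 1.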

Moreover, we can define the conditional Granger-causality spectrum of \({X}_t\) with respect to \({Y}_t\) given an exogenous variable \({W}_t\) (conditioning variable). Suppose that we estimate a VAR on \([{X}_t, {W}_t]'\) with noise covariance matrix \({\varSigma _{2'}}= \left[ \begin{array}{cc} \sigma _{2'} & \upsilon _{2'}\\ \upsilon _{2'} & \gamma _{2'} \end{array}\right] \) and transfer function \({G}(\omega )\) [defined as in (2)]. The corresponding normalized process of \([{X}_t, {W}_t]'\) (obtained by the procedure described above) is denoted by \([\tilde{{X}}_t, \tilde{{W}}_t]'\).

We then estimate a VAR on \([{X}_t, {Y}_t, {W}_t]'\) with covariance matrix of the noise terms

$$\begin{aligned} {{\varSigma }_3}=\left[ \begin{array}{ccc} \sigma _{{{X}{X}}} & \sigma _{{{X}{Y}}} & \sigma _{{{X}{W}}} \\ \sigma _{{{Y}{X}}} & \sigma _{{{Y}{Y}}} & \sigma _{{{Y}{W}}}\\ \sigma _{{{W}{X}}} & \sigma _{{{W}{Y}}} & \sigma _{{{W}{W}}} \end{array}\right] \end{aligned}$$

and transfer function \({P}'(\omega )\). Building the matrix

$$\begin{aligned} {C}(\omega )=\left[ \begin{array}{ccc} {G}_{{{X}{X}}}(\omega ) & 0 & {G}_{{{X}{W}}}(\omega ) \\ 0 & 1 & 0\\ {G}_{{{W}{X}}}(\omega ) & 0 & {G}_{{{W}{W}}}(\omega ) \end{array}\right] , \end{aligned}$$

we can define \({Q}(\omega )={C}^{-1}(\omega ) {P}'(\omega )\), which is a sort of “conditional” transfer function matrix. The theoretical spectrum of \({{{\tilde{X}}}}_t\) can thus be written as

$$\begin{aligned} h_{{{\tilde{X}}}{{\tilde{X}}}}(\omega )&= {Q}_{{{X}{X}}}(\omega )\sigma _{{{X} {X}}} {Q}_{{{X}{X}}}(\omega )^{*} \\&\quad +{Q}_{{{X}{Y}}}(\omega )\sigma _{{{Y}{Y}}} {Q}_{{{X}{Y}}}(\omega )^{*} +{Q}_{{{X}{W}}}(\omega ) \sigma _{{{W}{W}}} {Q}_{{{X}{W}}}(\omega )^{*}. \end{aligned}$$

The conditional spectrum of \({X}_t\) (effect-variable) with respect to \({Y}_t\) (cause-variable) given \({W}_t\) (conditioning variable) is (Geweke 1984)

$$\begin{aligned} h_{{Y}\rightarrow {X}|{W}}(\omega )=\ln \left( \frac{{h}_{{{\tilde{X}}} {{\tilde{X}}}}(\omega )}{\sigma _{{{X}{X}}} |{Q}_{{{X}{X}}}(\omega )|^2}\right) . \end{aligned}$$
(4)

Both \(h_{{Y}\rightarrow {X}}(\omega )\) and \(h_{{Y}\rightarrow {X}|{W}}(\omega )\) range from 0 to \(\infty \) for \(-\pi ~\le ~\omega ~\le ~\pi \). \(h_{{Y}\rightarrow {X}}(\omega )\) expresses the strength of the relationship from \({Y}\) to \({X}\) at frequency \(\omega \), while \(h_{{Y}\rightarrow {X}|{W}}(\omega )\) expresses the strength of the relationship from \({Y}\) to \({X}\) at frequency \(\omega \) given \({W}\). Therefore, the unconditional spectrum accounts for the whole effect of the past values of \({Y}_t\) on \({X}_t\), while the conditional spectrum accounts for the direct effect of the past values of \({Y}_t\) on \({X}_t\), excluding the effect mediated by the past values of \({W}_t\).

Granger-causality spectra \(h_{{Y}\rightarrow {X}}(\omega )\) and \(h_{{Y}\rightarrow {X}|W}(\omega )\) can be interpreted as follows. If \(h_{{Y}\rightarrow {X}}(\omega )>0\), past values of \({Y}_t\) help predict \({X}_t\), and \(\frac{1}{\omega }\) is a relevant cycle. If \(h_{{Y}\rightarrow {X}|W}(\omega )>0\), past values of \({Y}_t\), in addition to those of \({W}_t\), help predict \({X}_t\), and \(\frac{1}{\omega }\) is a relevant cycle. Significant frequencies thus give hints on the relevant delay structure of the cause variable with respect to the effect variable.

2.2 Testing Framework

The inference on Granger-causality spectra in the frequency domain is still an open problem. Indeed, unlike for the corresponding time-domain quantities, the limiting distribution of unconditional and conditional spectra under the null hypothesis of no-causality is unknown (see Barnett and Seth 2014, Sect. 2.5). In spite of that, Breitung and Candelon (2006) test the nullity of unconditional and conditional GC at each frequency \(\omega \), imposing a necessary and sufficient set of linear restrictions on the (possibly cointegrated) VAR model best fitting the series. The resulting test statistic is distributed under their null as an F distribution with 2 and \(T-2p\) degrees of freedom (except at \(\omega \in \{0,\pi \}\), where the distribution is \(F_{(1,T-p)}\)), where p is the VAR lag order and T is the time series length.

By contrast, as explained in the Introduction, our benchmark is the GC spectrum under the null hypothesis of stochastic independence. At each frequency \(\omega \), we test in Fisher’s sense the null hypothesis \(H_0: r(\omega )=0\), where the functional \(r(\omega )\) may be the unconditional GC \(h_{{Y}\rightarrow {X}}(\omega )\), assuming that \(X_t\) is stochastically independent of \(Y_t\), or the conditional GC \(h_{{Y}\rightarrow {X}|{W}}(\omega )\), assuming that \(X_t\) is stochastically independent of \(Y_t\) conditionally on \(W_t\). This amounts to assuming that under the null the bivariate stochastic process \([X_t,Y_t]\) is an independent process, unconditionally or conditionally on \(W_t\).

In order to test the null hypothesis, we refer the estimated causality at each frequency to the distribution of \({\hat{r}}_{med}\), the estimated median of \(r(\omega )\) across frequencies, approximated by the stationary bootstrap of Politis and Romano (1994). The median is chosen because under the null all causalities are indistinguishable with respect to the median causality from a stochastic point of view (see Sect. 2.4). We stress that unconditional and conditional spectra must be assessed separately, because their distributions are different under the null.

Our idea derives from Politis and Romano (1994), who show that each Fréchet-differentiable functional may be successfully approximated by the stationary bootstrap, with the resulting bias depending on the sum of the Fréchet differential \(g_F\) evaluated at each \(X_t\). The bootstrap series obtained via the stationary bootstrap of Politis and Romano (1994) are stationary Markov chains conditionally on the data, i.e. each bootstrap series \({X}_1^{*},\ldots ,{X}_T^{*}\) is a Markov chain conditionally on \({X}_1,\ldots ,{X}_T\).

Suppose that we apply the stationary bootstrap procedure independently to \({X}_t\) and \({Y}_t\), obtaining the stationary bootstrap series \({X}_t^{*}\) and \({Y}_t^{*}\). Computing unconditional Granger-causality spectra on those series amounts to assessing unconditional causalities under the assumption of stochastic independence, because \({X}_t^{*}\) and \({Y}_t^{*}\) are independent Markov chains. Therefore, testing each Granger-causality computed on the original series \({X}_t\), \({Y}_t\) against the median causality computed across frequencies on \({X}_t^{*}\) and \({Y}_t^{*}\) provides an effective test of causality strength under the null.
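A minimal sketch of one stationary-bootstrap replicate (geometric block lengths with circular wrapping, as in Politis and Romano 1994) is given below; applying it independently to \({X}_t\) and \({Y}_t\) produces the independent series used under the null:

```python
import numpy as np

def stationary_bootstrap(x, p=0.1, rng=None):
    """One stationary bootstrap replicate of the series x.

    Blocks start at uniform random positions, have geometric lengths
    with mean 1/p, and indices wrap around circularly."""
    if rng is None:
        rng = np.random.default_rng()
    T = len(x)
    idx = np.empty(T, dtype=int)
    t = 0
    while t < T:
        start = rng.integers(T)       # uniform block start
        length = rng.geometric(p)     # geometric block length
        for j in range(min(length, T - t)):
            idx[t + j] = (start + j) % T
        t += length
    return x[idx]

rng = np.random.default_rng(2)
x = np.sin(np.arange(200) / 10) + rng.standard_normal(200)
y = rng.standard_normal(200)
x_star = stationary_bootstrap(x, p=0.1, rng=rng)   # resampled independently...
y_star = stationary_bootstrap(y, p=0.1, rng=rng)   # ...so x_star, y_star are independent
print(len(x_star) == len(x))  # True
```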

For the conditional case, we proceed as follows. First, we estimate a VAR model on the observed series \({X}_t\) and \({W}_t\) jointly considered. Then, we apply the residual bootstrap (see Tibshirani and Efron 1993) to the VAR residuals, thus obtaining the consistent residual bootstrap series \({X}_t^{*}\) and \({W}_t^{*}\). This step is needed in order to avoid any bias which may arise from ignoring the dependence structure between \({X}_t\) and \({W}_t\). Finally, we independently apply the stationary bootstrap to the observed series \({Y}_t\) to obtain the series \({Y}_t^{*}\). Computing conditional Granger-causality spectra on \({X}_t^{*}\), \({Y}_t^{*}\) and \({W}_t^{*}\) then amounts to assessing conditional causalities under the assumption that \({X}_t\) is stochastically independent of \({Y}_t\) given \({W}_t\). For this reason, we use the resulting distribution of the conditional median causality across frequencies computed on those series as our benchmark under the null hypothesis.

Let us consider \(\hat{{{r}}}(\omega )={{\hat{h}}}_{{Y}\rightarrow {X}}(\omega )\), defined as in (3) with the coefficient matrices \({A}_j, j=1,\ldots ,p\), and the error covariance matrix \({\varSigma _2}\) replaced by the corresponding SURE estimates (Zellner 1962). We know that the SURE estimates \({{\hat{A}}}_j\), \(j=1,\ldots ,p\), are rational functions of \(X_t\), and are thus Fréchet-differentiable. \({\hat{\tilde{{P}}}}(\omega )\) and \({{\hat{\varSigma }}}_2\) are functions of the \({{\hat{A}}}_j\), and are thus rational in turn; as a consequence, the same holds for \(\hat{{h}}(\omega )\). Therefore, \({{\hat{h}}}_{{Y}\rightarrow {X}}(\omega )\), the natural logarithm of a rational function of the data, is Fréchet-differentiable. At this point, even if the median is not Fréchet-differentiable, bootstrapping \({\hat{r}}_{med}=med({{\hat{h}}}_{{Y}\rightarrow {X}}(\omega ))\) under the null is still valid, because the distribution of \({\hat{r}}_{med}\) is indistinguishable from the causality distribution at any frequency under the null when T is large enough (see Sect. 2.4). As a consequence, according to Politis and Romano (1994), Sect. 4.3, we can consistently estimate via the stationary bootstrap any quantile of the distribution of the median of \({{\hat{h}}}_{{Y}\rightarrow {X}}(\omega )\) across frequencies under the null hypothesis. A similar reasoning can be carried out for \(\hat{{{r}}}(\omega )={{\hat{h}}}_{{Y}\rightarrow {X}|{W}}(\omega )\), defined as in (4) with the coefficient matrices and the error covariance matrix replaced by the corresponding SURE estimates.

We stress that our aim is not to represent the common multivariate distribution function F of the process \([{X}_t, {Y}_t]\) or \([{X}_t, {Y}_t, {W}_t]\). That problem is an estimation one, which would be effectively solved by parametric or residual bootstrap. Our aim is to exploit the random process \([{X}_t^{*}, {Y}_t^{*}, {W}_t^{*}]\) to derive the bootstrap quantile \(q_{{r},1-\alpha }^{*}\), which satisfies

$$\begin{aligned} P\left( {r}_{med}^{*}\le q_{{r},1-\alpha }^{*}\right) =1-\alpha , \end{aligned}$$

where \(\alpha \) is the significance level and \({r}_{med}^{*}\) is the bootstrap median across frequencies of unconditional or conditional GC under the assumption of stochastic independence. Since the stationary bootstrap procedure is valid for \({{\hat{r}}}_{med}\) under the null, \(P({r}_{med}^{*}\le q_{{r},1-\alpha }^{*})\) approximates consistently \(P({{\hat{r}}}_{med}\le q_{{r},1-\alpha |H_0})\) as \(T^{\frac{1}{3}}\rightarrow \infty \).

In more detail, suppose that \({r}\) is a Fréchet-differentiable functional, that is, there exists some influence function \(g_F\), giving the influence of single observations on the functional \({r}\) with respect to some distribution function F, such that

$$\begin{aligned} {r}(G)={r}(F)+\int g_F \mathrm {d}(G-F)+\mathrm {o}(||G-F||) \end{aligned}$$

with \(\int g_F \mathrm {d}F=0\) (||.|| is the supremum norm). We define the mixing coefficient (which controls for the overall amount of time dependence) as

$$\begin{aligned} \alpha _{X}(k)=\sup _{A,B} |P(A,B)-P(A)P(B)| \end{aligned}$$

where A and B vary over events in the \(\sigma \)-fields generated by \(\{{X}_t, t \le 0\}\) and \(\{{X}_t, t \ge k\}\), respectively. The following theorem, which extends a result of Politis and Romano (1994), holds.

Theorem 1

Suppose that \({X}^{\dagger }_t\) and \({Y}^{\dagger }_t\) are strictly stationary stochastic processes with distribution functions \(F_{{X}^{\dagger }}\), \(F_{{Y}^{\dagger }}\). Assume that, for some \(d \ge 0\), \(E(g_{F_{{X}^{\dagger }}}(X^{\dagger }_{1})^{2+d})<\infty \), \(\sum _{k=1}^{\infty } \alpha _{{X}^{\dagger }}(k)^{\frac{d}{2+d}}<\infty \) and \(\sum _{k=1}^{\infty } k^2 \alpha _{{X}^{\dagger }}(k)^{1/2-\tau }< \infty \), \(0<\tau <\frac{1}{2}\). Further assume that these assumptions also hold for \({Y}^{\dagger }_t\). Then, under \(H_0\), i.e. if the distribution function of the stochastic process \({Z}^{\dagger }_t=[{X}^{\dagger }_t, {Y}^{\dagger }_t]\) can be factorized as \(F_{{X}^{\dagger }}F_{{Y}^{\dagger }}\), it holds

$$\begin{aligned} \sup _{c \in {\mathcal {R}}} \Big \vert P\left\{ \sqrt{T}({r}({F}^{*}_{{Z}^{\dagger }}) -{r}({\hat{F}}_{{Z}^{\dagger }}))\le c\right\} - P\left\{ \sqrt{T}({r}({\hat{F}}_{{Z}^{\dagger }}) -{r}(F_{{Z}^{\dagger }}))\le c\right\} \Big \vert {\mathop {\longrightarrow }\limits ^{prob}} 0 \end{aligned}$$

for any Fréchet-differentiable functional \({r}\) under the assumption \(T^{\frac{1}{3}} \rightarrow \infty \).

The key to prove Theorem 1 is to bootstrap the cause and the effect series independently, and then show that any inference on a Fréchet-differentiable functional is weakly consistent.

As for the conditional case, the following Theorem holds.

Theorem 2

Suppose that \({Z}^{''}_t=[{X}^{\dagger }_t|{W}^{\dagger }_t, {W}^{\dagger }_t|{X}^{\dagger }_t]\) and \({Y}^{\dagger }_t\) are strictly stationary stochastic processes with distribution functions \(F_{{Z}^{''}}\), \(F_{{Y}^{\dagger }}\). Assume that, for some \(d \ge 0\), \(E(g_{F_{{Y}^{\dagger }}}(Y^{\dagger }_{1})^{2+d})<\infty \), \(\sum _{k=1}^{\infty } \alpha _{{Y}^{\dagger }}(k)^{\frac{d}{2+d}}<\infty \) and \(\sum _{k=1}^{\infty } k^2 \alpha _{{Y}^{\dagger }}(k)^{1/2-\tau '}< \infty \), \(0<\tau '<\frac{1}{2}\). Further assume that \(0<E(g_{F_{Z^{''}}}({Z^{''}_{{1}}})^{2})<\infty \) and \(\int _{D_{{Z^{''}}}} \{F_{Z^{''}}(z'')(1-F_{Z^{''}}(z''))\}^{\tau ''/2}dz''<\infty \) for some \(\tau ''>0\). Finally, assume that \({W}^{\dagger }_t|{X}^{\dagger }_t\) is stochastically independent of \({Y}^{\dagger }_t|{X}^{\dagger }_t\). Then, under \(H_0\), i.e. if \({X}^{\dagger }_t|{W}^{\dagger }_t\) is stochastically independent of \({Y}^{\dagger }_t|{W}^{\dagger }_t\), for the stochastic process \({Z}_t^{'}=[{X}^{\dagger }_t|{W}^{\dagger }_t, {Y}^{\dagger }_t, {W}^{\dagger }_t|{X}^{\dagger }_t]\) it holds

$$\begin{aligned} \sup _{c \in {\mathcal {R}}} \Big \vert P\left\{ \sqrt{T}({r}({F}^{*}_{{Z^{'}}}) -{r}({\hat{F}}_{{Z^{'}}}))\le c\right\} - P\left\{ \sqrt{T}({r}({\hat{F}}_{{Z^{'}}}) -{r}(F_{{Z^{'}}}))\le c\right\} \Big \vert {\mathop {\longrightarrow }\limits ^{prob}} 0 \end{aligned}$$

for any Fréchet-differentiable functional \({r}\) under the assumption \(T^{\frac{1}{3}} \rightarrow \infty \).

By assuming that the functional \({r}\) is \(g_{\tau '}\)-Fréchet-differentiable with respect to \(F_{{Y}^{\dagger }}\) and \(g_{\tau ''}\)-Fréchet-differentiable with respect to \(F_{{Z}^{''}}\), we can ensure bootstrap consistency in the weak sense for any conditional GC. Note that the assumption of stochastic independence between \({W}^{\dagger }_t|{X}^{\dagger }_t\) and \({Y}^{\dagger }_t|{X}^{\dagger }_t\) is needed to avoid any circular causality effect under the null, so that the stationary bootstrap may be meaningfully applied. We refer to “Appendix B” for the proofs.

2.3 Testing Procedure

We now report in detail the testing procedures relative to the two functionals. For the functional \({{h}}_{{Y}\rightarrow {X}}(\omega )\), our bootstrap procedure is

  • Simulate N independent stationary bootstrap series \(({X}^{*}_t,{Y}^{*}_t)\) given the observed series \(({X}_t,{Y}_t)\).

  • On each simulated series \(({X}^{*}_t,{Y}^{*}_t)\):

    1. estimate a VAR model on \(({X}^{*}_t,{Y}^{*}_t)\) via SURE, using BIC for model selection;

    2. at the Fourier frequencies \(f_i=\frac{i}{T}, i=1,\ldots ,[\frac{T}{2}]\), compute \({{\hat{h}}}_{{{Y}^{*}}\rightarrow {X}^{*}}(2\pi f_i)\);

    3. compute \(med_{\{f_i, i=1,\ldots ,[\frac{T}{2}]\}} {{\hat{h}}}_{{{Y}^{*}}\rightarrow {X}^{*}}(2\pi f_i)\).

  • Then, compute \(q_{uncond,1-\alpha }\), the \(100 \times (1-\alpha )\)-th percentile across the N bootstrap series of the median computed at Step 3, where \(\alpha \) is the significance level.

  • Finally, at each \(f_i\), flag \({{\hat{h}}}_{{Y}\rightarrow {X}}(2\pi f_i)\) as significant if it is larger than \(q_{uncond,1-\alpha }\).
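The steps above can be sketched end-to-end as follows. As a simplification, the VAR is fixed at order 1 and estimated by OLS rather than SURE with BIC selection, so this is an illustrative approximation of the procedure rather than the packaged implementation:

```python
import numpy as np

def sb_indices(T, p, rng):
    # stationary bootstrap indices (Politis and Romano 1994)
    idx, t = np.empty(T, dtype=int), 0
    while t < T:
        start, length = rng.integers(T), rng.geometric(p)
        for j in range(min(length, T - t)):
            idx[t + j] = (start + j) % T
        t += length
    return idx

def gc_uncond_spectrum(x, y):
    """Unconditional GC spectrum of X w.r.t. Y at Fourier frequencies,
    from an OLS-estimated VAR(1) (a stand-in for SURE/BIC)."""
    Z = np.column_stack([x, y])
    T = len(x)
    A = np.linalg.lstsq(Z[:-1], Z[1:], rcond=None)[0].T   # VAR(1) coefficients
    E = Z[1:] - Z[:-1] @ A.T                              # residuals
    Sig = E.T @ E / (T - 1)
    S_inv = np.linalg.inv(np.array([[1, 0], [-Sig[0, 1] / Sig[0, 0], 1]]))
    gc = np.empty(T // 2)
    for i in range(1, T // 2 + 1):
        w = 2 * np.pi * i / T
        P = np.linalg.inv(np.eye(2) - A * np.exp(-1j * w))
        Pt = P @ S_inv
        h_xx = (P @ Sig @ P.conj().T)[0, 0].real
        gc[i - 1] = np.log(h_xx / (Sig[0, 0] * abs(Pt[0, 0]) ** 2))  # Eq. (3)
    return gc

rng = np.random.default_rng(3)
T, N, alpha = 300, 200, 0.05
y = rng.standard_normal(T)
x = np.zeros(T)
for t in range(1, T):
    x[t] = 0.4 * x[t - 1] + 0.7 * y[t - 1] + rng.standard_normal()

# bootstrap distribution of the median causality under independence
meds = np.array([
    np.median(gc_uncond_spectrum(x[sb_indices(T, 0.1, rng)],
                                 y[sb_indices(T, 0.1, rng)]))
    for _ in range(N)
])
q_uncond = np.quantile(meds, 1 - alpha)
flags = gc_uncond_spectrum(x, y) > q_uncond   # significant Fourier frequencies
print(flags.any())  # True: the built-in causality is detected
```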

For the functional \({{h}}_{{Y}\rightarrow {X}|{W}}(\omega )\), the procedure becomes

  • Estimate a VAR on \(({X}_t,{W}_t)\) via SURE, using BIC for model selection.

  • Simulate N residual bootstrap series \(({X}^{*}_t,{W}^{*}_t)\) from the VAR estimates (Tibshirani and Efron 1993).

  • Simulate N stationary bootstrap series \({Y}^{*}_t\), independently of \(({X}^{*}_t,{W}^{*}_t)\), given the observed series \({Y}_t\).

  • On each simulated series \(({X}^{*}_t,{Y}^{*}_t,{W}^{*}_t)\):

    1. estimate VAR models on \(({X}^{*}_t,{W}^{*}_t)\) and \(({X}^{*}_t,{Y}^{*}_t,{W}^{*}_t)\) via SURE, using BIC for model selection;

    2. at the Fourier frequencies \(f_i=\frac{i}{T}, i=1,\ldots ,[\frac{T}{2}]\), compute \({{\hat{h}}}_{{{Y}^{*}}\rightarrow {{X}^{*}}|{{W}^{*}}}(2\pi f_i)\);

    3. compute \(med_{\{f_i, i=1,\ldots ,[\frac{T}{2}]\}} {{\hat{h}}}_{{{Y}^{*}}\rightarrow {{X}^{*}}|{{W}^{*}}}(2\pi f_i)\).

  • Then, compute \(q_{cond,1-\alpha }\), the \(100 \times (1-\alpha )\)-th percentile across the N bootstrap series of the median computed at Step 3.

  • Finally, at each \(f_i\), flag \({{\hat{h}}}_{{Y}\rightarrow {X}|{W}}(2\pi f_i)\) as significant if it is larger than \(q_{cond,1-\alpha }\).
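The residual-bootstrap step for the pair \(({X}_t,{W}_t)\) can be sketched as below (again with a fixed VAR(1) estimated by OLS in place of SURE/BIC, purely as an illustration):

```python
import numpy as np

def var1_residual_bootstrap(Z, rng):
    """One residual bootstrap replicate of a bivariate VAR(1):
    fit by OLS, resample centered residuals with replacement,
    then regenerate the series from the fitted dynamics."""
    T = len(Z)
    A = np.linalg.lstsq(Z[:-1], Z[1:], rcond=None)[0].T
    E = Z[1:] - Z[:-1] @ A.T
    E = E - E.mean(axis=0)                     # center the residuals
    draws = E[rng.integers(len(E), size=T)]    # i.i.d. resampling
    Zb = np.zeros_like(Z)
    Zb[0] = Z[0]                               # keep the initial condition
    for t in range(1, T):
        Zb[t] = A @ Zb[t - 1] + draws[t]
    return Zb

rng = np.random.default_rng(4)
T = 200
Z = np.zeros((T, 2))                           # Z_t = [X_t, W_t]'
A_true = np.array([[0.5, 0.2], [0.1, 0.4]])
for t in range(1, T):
    Z[t] = A_true @ Z[t - 1] + rng.standard_normal(2)

Zb = var1_residual_bootstrap(Z, rng)           # preserves the X-W dependence
print(Zb.shape == Z.shape)  # True
```

Regenerating the series from the fitted dynamics (rather than resampling blocks independently) is what preserves the dependence between \({X}_t\) and \({W}_t\) under the null.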

We provide an R package, called “grangers”, performing these routines (see Declarations).

In addition, we extend our framework to test the nullity of \({r}(2\pi f_i)\), \(i=1,\ldots ,[\frac{T}{2}]\), across the entire frequency range. In order to do so, we apply the conservative Bonferroni correction; that is, we apply the testing procedure at each frequency with significance level \(\frac{2\alpha }{T}\), since \([\frac{T}{2}]\) frequencies are tested. In this way, we ensure that the overall level across frequencies is not larger than \(\alpha \) under the null.

2.4 Test Features and Simulation Results

We now describe the performance of our test (with a significance level of \(5\%\)) in a number of situations, in comparison with the BC test of Breitung and Candelon (2006). We do not include further competitors, such as Hidalgo (2000) and Nishiyama et al. (2011), for the following reasons. First, both approaches are designed to test unconditional causality only. Second, their assumption settings are quite different and somewhat more demanding: the former requires explicit control of the spectral matrix, while the latter requires particular conditions on the process distribution. Third, the BC test has become the standard test for GC in the frequency domain in economic applications, as shown in the Introduction.

First of all, suppose that we simulate \(N=300\) replicates from a VAR process of length \(T=100,300\) of the form (1) with \(p=1\), \({\varSigma _2}=diag(1,1)\) and no causality coefficients. The VAR lag order is selected for each bootstrap setting by the BIC criterion. Our first tested coefficient matrix \({A}_1\) has \({{A}_{1,(jj)}}=0,0.5,1\), \(j=1,2\). We observe that for \(T=300\) the estimated level is approximately \(5\%\) at all Fourier frequencies as long as \({A}_{1,(jj)}=0\) (Case 1), both for the unconditional and the conditional GC. If \({\varSigma _2}=\left[ \begin{array}{cc} 1 & \rho \\ \rho & 1 \end{array}\right] \), the estimated level is still close to \(5\%\) when \(\rho =0.5\).

As \({A}_{1,(jj)}\) increases (Case 2), our test shows an increasing rejection rate, particularly at low frequencies. This happens because our test signals any violation of the null hypothesis of stochastic independence. As \({A}_{1,(jj)}\) approaches 1, the power tends to 1 at the lowest frequency, as shown for the unconditional GC in Fig. 1. By contrast, the BC test has a level close to \(5\%\) over all frequencies for \({A}_{1,(jj)}=0,0.5\), and its rejection rate approaches 0.4 for \({A}_{1,(jj)}=1\). A similar pattern is observed for the conditional GC in Fig. 2.

In order to show that under the null hypothesis the distributions of the individual causalities are indistinguishable from that of the median causality, we have calculated the Kullback-Leibler (KL) divergence (Kullback and Leibler 1951) between the density functions of each causality and the median causality, estimated via a Gaussian kernel with bandwidth equal to 0.25. The results for Case 1 (with \({A}_{1,(jj)}=0\)) are reported in Fig. 3. We note that the observed divergence is extremely small, considering that the KL divergence between two simulated uniform distributions on [0, 1] is around 0.2. The pattern of the KL divergence over frequencies may vary, but its magnitude is always very small, even if we set \(T=100\). Similar results are observed for the conditional case, or allowing for \(\rho =0.5\) and \({{A}_{1,(jj)}}=0.5\).
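The comparison can be reproduced in outline as follows: the two samples' densities are estimated with a fixed-bandwidth Gaussian kernel and the KL divergence is computed on a common grid (an illustrative reimplementation, not the paper's exact code):

```python
import numpy as np

def gauss_kde(sample, grid, bw=0.25):
    # Gaussian kernel density estimate with fixed bandwidth bw
    u = (grid[:, None] - sample[None, :]) / bw
    return np.exp(-0.5 * u**2).sum(axis=1) / (len(sample) * bw * np.sqrt(2 * np.pi))

def kl_divergence(a, b, bw=0.25):
    """KL(a || b) between kernel density estimates on a common grid."""
    grid = np.linspace(min(a.min(), b.min()) - 1, max(a.max(), b.max()) + 1, 512)
    dx = grid[1] - grid[0]
    pa = gauss_kde(a, grid, bw) + 1e-12
    pb = gauss_kde(b, grid, bw) + 1e-12
    pa, pb = pa / (pa.sum() * dx), pb / (pb.sum() * dx)   # renormalize
    return float((pa * np.log(pa / pb)).sum() * dx)

rng = np.random.default_rng(5)
same = kl_divergence(rng.standard_normal(500), rng.standard_normal(500))
shifted = kl_divergence(rng.standard_normal(500), rng.standard_normal(500) + 1.0)
print(same < shifted)  # True: samples from the same law are far less divergent
```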

Another relevant case is \(p=1\) and \({{A}_{1,(j2)}}=0.2,0.5,1\), \(j=1,2\). This process shows a causality which decreases as the frequency increases (Case 3). If \({{A}_{1,(j2)}}\) is at least 0.5, \(T=100\) is enough to reject the null hypothesis at all frequencies with probability 1. If \({{A}_{1,(j2)}}=0.2\), \(T=300\) is required to reject the null at all frequencies, since for \(T=100\) the power is far from 1 and stands around \(60\%\) at the lowest frequency. At the same time, we observe in Figs. 4 and 5 that the BC test presents remarkable power losses at all frequencies even for \(T=300\), both in the unconditional and in the conditional case.

Fig. 1

Zero unconditional GC with \({{A}_{1,(jj)}}=1\), \(j=1,2\) (Case 2), \(T=300\). GC shape (solid line) is reported at left; rejection rates for our NEW test and the BC test are reported at center and at right. One-standard-deviation bands are reported in dotdash line. The values \(\alpha =0.05\) and 1 are reported in dashed line

Moreover, we consider the VAR models described in Breitung and Candelon (2006), Sect. 4. Those models have \(p=3\), \({{A}_{k,(j2)}}=1\) for \(k=1,3\), \(j=1\), and \({{A}_{k,(j2)}}=-2\cos (\omega ^{*})\) for \(k=2\), \(j=1\). Such a coefficient structure results in a null causality at frequency \(\omega ^{*}\). In addition, we test the sensitivity of the results to the condition number of the covariance matrix, setting \({\varSigma _2}=diag(1,1)\), \({\varSigma _2}=diag(1,0.2)\), and \({\varSigma _2}=diag(1,5)\).

If \(\omega ^{*}=\frac{\pi }{2}\) and \({\varSigma _2}=diag(1,1)\) (Fig. 6), we observe that the power of our test is close to that of the BC test. This happens because \({\varSigma _2}\) has condition number 1, i.e. the process shows no collinearity. If \({\varSigma _2}=diag(1,0.2)\) (Fig. 7), the BC test has a level approximately equal to \(5\%\) at \(\omega ^{*}\), while ours is larger, because the two series are both dependent in distribution and collinear. On the contrary, if \({\varSigma _2}=diag(1,5)\) (Fig. 8), even the BC test loses power at \(\omega ^{*}\), because both the magnitude and the variance of the estimated GC increase considerably. Setting a different \(\omega ^{*}\), like \(\frac{3}{4}\pi \), the same patterns persist (Fig. 9).

Setting \({A}_{k,(22)}\), \(k=1,3\), to 0.25 and then 0.5 amounts to increasing the magnitude of the VAR roots towards the limit value of 1 (the non-stationary case). In that case, consistently with our null hypothesis, we observe that the rejection rate of our test increases as \({A}_{k,(22)}\) increases (Figs. 10, 11).
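The effect of such coefficient changes on the VAR roots can be checked through the companion matrix: the process is stationary only if all companion eigenvalues lie strictly inside the unit circle. A generic sketch (the function name and the example coefficients are ours, for illustration only):

```python
import numpy as np

def companion_root_moduli(A_list):
    """Moduli of the companion-matrix eigenvalues of a VAR(p):
    the process is stationary iff all of them lie strictly below 1."""
    n, p = A_list[0].shape[0], len(A_list)
    C = np.zeros((n * p, n * p))
    C[:n, :] = np.hstack(A_list)           # [A_1 A_2 ... A_p] in the top block row
    C[n:, :-n] = np.eye(n * (p - 1))       # identity blocks below
    return np.abs(np.linalg.eigvals(C))

# Example: a stationary VAR(2) with A_1 = 0.5 I, A_2 = 0.25 I
roots = companion_root_moduli([0.5 * np.eye(2), 0.25 * np.eye(2)])
```

Here each variable's roots solve \(\lambda ^2 - 0.5\lambda - 0.25 = 0\), giving a largest modulus of about 0.809, safely inside the unit circle.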

Fig. 2
figure 2

Zero conditional GC with \({{A}_{1,(jj)}}=1\), \(j=1,2\) (Case 2), \(T=300\). In solid line, GC shape is reported at left, rejection rates for our NEW test and BC test are reported at center and at right. One standard deviation bands are reported in dotdash line. The values \(\alpha =0.05\) and 1 are reported in dashed line

Fig. 3
figure 3

An example of the Kullback–Leibler divergence between the density of each individual causality and that of the median causality, estimated via a Gaussian kernel with bandwidth equal to 0.25, for Case 1 with \({{A}_{1,(jj)}}=0\), \(j=1,2\), \(T=300\) and \(N=300\)

Fig. 4
figure 4

Decreasing unconditional GC with \({{A}_{1,(j2)}}=0.2\), \(j=1,2\) (Case 3), \(T=300\). In solid line, GC shape is reported at left, rejection rates for our NEW test and BC test are reported at center and at right. One standard deviation bands are reported in dotdash line. The values \(\alpha =0.05\) and 1 are reported in dashed line

Fig. 5
figure 5

Decreasing conditional GC with \({{A}_{1,(j2)}}=0.2\), \(j=1,2\) (Case 3), \(T=300\). In solid line, GC shape is reported at left, rejection rates for our NEW test and BC test are reported at center and at right. One standard deviation bands are reported in dotdash line. The values \(\alpha =0.05\) and 1 are reported in dashed line

Fig. 6
figure 6

Case with \(p=3\), \({{A}_{k,(j2)}}=1\), \(k=1,3\), \(j=1\), \({A}_{2,(12)}=0\), \({\varSigma _2}=diag(1,1)\), and \(T=300\). In solid line, GC shape is reported at left, rejection rates for our NEW test and BC test are reported at center and at right. One standard deviation bands are reported in dotdash line. The values \(\alpha =0.05\) and 1 are reported in dashed line

Fig. 7
figure 7

Case with \(p=3\), \({{A}_{k,(j2)}}=1\), \(k=1,3\), \(j=1\) and \(\omega ^{*}=\frac{\pi }{2}\), \({A}_{2,(12)}=0\), \({\varSigma _2}=diag(1,0.2)\), and \(T=300\). In solid line, GC shape is reported at left, rejection rates for our NEW test and BC test are reported at center and at right. One standard deviation bands are reported in dotdash line. The values \(\alpha =0.05\) and 1 are reported in dashed line

Fig. 8
figure 8

Case with \(\omega ^{*}=\frac{\pi }{2}\), \({A}_{2,(12)}=0\), \({\varSigma _2}=diag(1,5)\), and \(T=300\). In solid line, GC shape is reported at left, rejection rates for our NEW test and BC test are reported at center and at right. One standard deviation bands are reported in dotdash line. The values \(\alpha =0.05\) and 1 are reported in dashed line

Fig. 9
figure 9

Case with \(\omega ^{*}=\frac{3}{4} \pi \), \({A}_{2,(12)}=0\), \({\varSigma _2}=diag(1,1)\), and \(T=300\). In solid line, GC shape is reported at left, rejection rates for our NEW test and BC test are reported at center and at right. One standard deviation bands are reported in dotdash line. The values \(\alpha =0.05\) and 1 are reported in dashed line

Fig. 10
figure 10

Case with \(\omega ^{*}=\frac{\pi }{2}\), \({A}_{k,(22)}=0.25\), \(k=1,3\), and \(T=300\). In solid line, GC shape is reported at left, rejection rates for our NEW test and BC test are reported at center and at right. One standard deviation bands are reported in dotdash line. The values \(\alpha =0.05\) and 1 are reported in dashed line

Fig. 11
figure 11

Case with \(\omega ^{*}=\frac{\pi }{2}\), \({A}_{k,(22)}=0.5\), \(k=1,3\). In solid line, the rejection rates for our NEW test and BC test are reported at left and at right respectively. One standard deviation bands are reported in dotdash line. The values \(\alpha =0.05\) and 1 are reported in dashed line. Here we omit the GC shape because the process is non-stationary

To sum up, the rejection rate of our test depends on three factors:

  • the magnitude of the VAR roots: the rejection rate is perturbed at low frequencies as the process approaches non-stationarity;

  • the true underlying spectral variability, which in turn depends on the relationship between the magnitude of causality and non-causality coefficients;

  • the condition number of the autocovariance matrices \({R}_j\), \(j\ge 0\), which affects the degree of stochastic dependence.

We also compute the rejection rates of the conservative test on all causalities jointly considered, obtained by Bonferroni correction. The results for our test are reported in Table 1, those for the BC test in Table 2. We note that our test becomes more powerful as the underlying setting moves towards stochastic dependence.
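The joint test can be sketched as follows: with m frequency-wise p-values, the Bonferroni correction rejects the overall null as soon as any p-value falls below \(\alpha /m\). This is illustrative code, not the paper's implementation:

```python
import numpy as np

def bonferroni_joint_reject(pvalues, alpha=0.05):
    """Overall test across all frequencies: reject the joint null when any
    frequency-wise p-value falls below alpha / m, m = number of frequencies."""
    pvalues = np.asarray(pvalues, dtype=float)
    return bool((pvalues < alpha / len(pvalues)).any())

# With 3 frequencies the per-frequency threshold is 0.05 / 3, roughly 0.0167
```

The correction controls the family-wise error rate at level \(\alpha \) regardless of the dependence between frequencies, which is what makes the joint test conservative.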

Table 1 Rejection rates of our test on all causalities jointly considered obtained by Bonferroni correction
Table 2 Rejection rates of BC test on all causalities jointly considered obtained by Bonferroni correction

Table 3 reports the mean computational time (with the corresponding standard deviation) of our inferential procedures (both for unconditional and conditional GC) for different values of the time series length T and the number of bootstrap samples N under the null hypothesis. We can see that the computational cost increases with the number of Fourier frequencies (equal to T/2). Considering that \(N=300\) is enough to obtain a sufficient approximation of the critical values, our approach appears to be perfectly feasible. For very long series, provided that they are covariance stationary, an appropriate subsample of Fourier frequencies can easily be drawn to keep computational times low.

Table 3 Mean computational times in seconds of our NEW bootstrap testing approaches, with the corresponding standard deviations, under the null hypothesis for different values of the time series length T and the number of bootstrap samples N across 10 trials. Calculations are performed on a 64-bit Intel Core i7-8665U processor with 1.90 GHz base frequency and 16 GB RAM

3 A Granger-Causality Analysis of Euro Area GDP, M3 and M1 in the Frequency Domain

In this section we study the co-movements of gross domestic product (GDP) and money stock (the M3 and M1 aggregates) in the Euro Area. Unlike the Federal Reserve, the European Central Bank still considers the M3 growth rate as a policy target (see Jung and Villanova 2020). We have thus considered M3 instead of M2, also because the two are quite close in the Euro Area (see Darvas 2015). We test in the frequency domain both the direct link from one variable to the other and the indirect link with respect to further explanatory variables like the inflation rate (HICP), the unemployment rate (UN), or the long-term interest rate (LTN). Published works on this research topic make use of time-domain methods: some use factor modelling (Cendejas et al. 2014), others likelihood methods (Andrés et al. 2006; Canova and Menz 2011), large-dimensional VAR models (Giannone et al. 2013), dynamic factor models (Breitung and Eickmeier 2006), or VAR models with time-varying parameters (Psaradakis et al. 2005). On the contrary, we apply the inferential framework for GC in the frequency domain developed in Sect. 2. In this way, we provide explicit inference on unconditional and conditional GC. HICP, UN and LTN are used as conditioning variables, with the aim of accounting for the mediating power of each of the three variables with respect to the relationship between output and money supply.

3.1 Data Preparation

We have considered the time series of GDP at market price in the Euro Area (chain linked volumes in Euro) and the monetary aggregates M3 and M1 (outstanding amount of loans to the whole economy excluding the monetary and financial sector, all currencies combined). M3 is also called “broad money”, M1 “narrow money”.

Since our goal is to focus on the effect of monetary policy on output, we restrict our analysis to the period 1999–2017, during which the ECB has taken actual policy decisions for the Euro Area. We can thus denote our series by \({GDP}_t\), \({M3}_{t}\), \({M1}_{t}\), \({HICP}_{t}\), \({UN}_{t}\), \({LTN}_{t}\), where \(t=1,\ldots ,76\) (there are 76 quarters from Winter 1999 to Autumn 2017). The data are drawn from the ECB Real Time Research database, where national figures are aggregated according to the changing composition of the Euro Area across time (see Giannone et al. 2012). We consider quarterly series, as GDP is regularly published on a quarterly basis. The other series (all but GDP) are therefore made quarterly by averaging. We refer to “https://www.ecb.europa.eu/stats” and ECB (2012) for technical and computational details.

Fig. 12
figure 12

Left figures contain quarterly Euro Area GDP, M3 and M1 in logs across the period 1999–2017. In dashed, the trend estimated by the Hodrick-Prescott filter (\(\lambda =1600\)). Right figures contain the cycle estimated by the same method

Fig. 13
figure 13

Left figures contain quarterly Euro Area HICP, UN, LTN rates across the period 1999–2017. In dashed, the trend estimated by the Hodrick-Prescott filter (\(\lambda =1600\)). Right figures contain the cycle estimated by the same method

According to the Dickey-Fuller test, the logarithmic transforms of \({GDP}_t\), \({M3}_t\), \({M1}_t\) are non-stationary, as are the three conditioning variables \({HICP}_t\), \({UN}_t\) and \({LTN}_t\). Therefore, following Friedman and Schwartz (1975), we pass all series through the Hodrick-Prescott filter (Hodrick and Prescott 1997), with the canonical value \(\lambda =1600\), in order to remove any trend and extract the cyclical components. We do not use the Baxter-King filter (Baxter and King 1999), as suggested in Belongia and Ireland (2016), because we lack sufficient end-of-sample data. Cycle extraction is performed via the R package “mFilter”.
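For reference, the Hodrick-Prescott filter minimises \(\sum _t (y_t-\tau _t)^2 + \lambda \sum _t (\Delta ^2 \tau _t)^2\), so the trend solves a ridge-type linear system. A self-contained sketch of the filter (our own minimal Python reimplementation, not the "mFilter" routine):

```python
import numpy as np

def hp_filter(y, lam=1600.0):
    """Hodrick-Prescott filter: the trend tau solves (I + lam * D'D) tau = y,
    with D the second-difference operator. Returns (trend, cycle)."""
    y = np.asarray(y, dtype=float)
    T = len(y)
    D = np.zeros((T - 2, T))
    for i in range(T - 2):
        D[i, i], D[i, i + 1], D[i, i + 2] = 1.0, -2.0, 1.0
    trend = np.linalg.solve(np.eye(T) + lam * D.T @ D, y)
    return trend, y - trend

# A purely linear series has zero second differences, so it is left
# untouched by the filter: trend = series, cycle = 0.
trend, cycle = hp_filter(1.0 + 0.5 * np.arange(20))
```

The canonical \(\lambda =1600\) used in the paper is the standard choice for quarterly data.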

Figures 12 and 13 contain the plots of \({GDP}_t\), \({M3}_{t}\), \({M1}_{t}\), \({HICP}_{t}\), \({UN}_{t}\), \({LTN}_{t}\) respectively. Left figures contain the original series and the estimated trend, while right figures contain the estimated cycles. The patterns of the estimated ACF and CCF are reported and commented on in “Appendix A”. Since our ultimate goal is to make inference on the cause-effect relationship between money stock and economic output, we test the Granger-causality spectra at each frequency via our approach. Due to the use of the Fast Fourier Transform, the frequencies used are \(f_i=\frac{i}{80}, i=1,\ldots ,40\), because \(T=76\). The frequency range is re-scaled to [0, 2] for the quarterly frequency of our series.

Relevant VAR models, estimated including an intercept via the R package “vars” (Pfaff et al. 2008), are selected by the Bayesian Information Criterion (BIC), imposing a maximum of four lags. BIC is used because it consistently estimates the unknown number of lags, while AIC may overestimate it, thus increasing the probability of estimating non-stationary VAR models. In any case, the roots of all estimated characteristic polynomials are strictly smaller than one. The number of bootstrap samples is 1000.
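The BIC-based lag selection can be sketched as follows. This is a generic OLS-based Python illustration, not the routine of the "vars" package; the function name and the simulated example are ours:

```python
import numpy as np

def var_bic(data, max_lag=4):
    """Select the VAR order by minimising
    BIC(p) = ln det(Sigma_hat_p) + k_p * ln(T_eff) / T_eff,
    where k_p is the total number of estimated coefficients."""
    data = np.asarray(data, dtype=float)
    T, n = data.shape
    best_p, best_bic = 1, np.inf
    for p in range(1, max_lag + 1):
        Y = data[p:]                                    # responses, (T - p) x n
        X = np.hstack([np.ones((T - p, 1))] +           # intercept plus p lags
                      [data[p - k:T - k] for k in range(1, p + 1)])
        B, *_ = np.linalg.lstsq(X, Y, rcond=None)
        resid = Y - X @ B
        sigma = resid.T @ resid / (T - p)
        k_params = n * X.shape[1]
        bic = np.log(np.linalg.det(sigma)) + k_params * np.log(T - p) / (T - p)
        if bic < best_bic:
            best_p, best_bic = p, bic
    return best_p

# Simulated VAR(1) data: BIC should recover the true order
rng = np.random.default_rng(0)
A = np.array([[0.5, 0.1], [0.0, 0.5]])
x = np.zeros((300, 2))
for t in range(1, 300):
    x[t] = A @ x[t - 1] + rng.standard_normal(2)
```

The heavier per-parameter penalty \(\ln (T)/T\) is what makes BIC consistent for the true order, in contrast with AIC's fixed penalty of \(2/T\).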

Note that for computational reasons the BC test cannot be computed for \(p=1\). Besides, its p-value is constant across frequencies (except for \(\omega =\{0,\pi \}\)) for \(p=2\). The BC test requires a large number of lags, while ours works for any lag order, provided that the resulting VAR is stationary and non-singular. Therefore, we cannot directly compare our test to the BC test on real data, because the latter is not informative for cases with \(p\le 2\).

3.2 Causality Results

We start by describing the VAR estimates for the couple GDP-M3. Our lag selection procedure chooses 2 lags. In the \({GDP}_t\) equation, \({GDP}_{t-1}\) and \({GDP}_{t-2}\) are highly significant, while \({M3}_{t-1}\) and \({M3}_{t-2}\) are only weakly significant (at \(5\%\) and \(10\%\) respectively). This results in a GC spectral shape which is approximately constant and slightly significant across frequencies. In the \({M3}_{t}\) equation, \({M3}_{t-1}\) is highly significant, while \({GDP}_{t-2}\) is significant at \(10\%\). The corresponding GC shape is significant in the first half of the frequency range.

Concerning the couple GDP-M1, our VAR lag selection procedure chooses 2 lags. In the \({GDP}_t\) equation, \({GDP}_{t-1}\), \({GDP}_{t-2}\) and \({M1}_{t-2}\) are highly significant. The related unconditional GC shape is significant across the entire frequency range. In the \({M1}_{t}\) equation, only \({M1}_{t-1}\) is highly significant, while \({GDP}_{t-1}\) has a p-value of \(12\%\). The resulting GC spectral shape is thus significant in the first third of the frequency range.

In Figs. 14 and 15, the unconditional and conditional GC spectra from M3 to GDP and vice versa are reported. The same spectra from M1 to GDP and vice versa are in Figs. 16 and 17 respectively. In dashed, our bootstrap threshold at \(5\%\) is outlined. In dotdash, the same threshold for the overall test obtained by Bonferroni correction is depicted.

Fig. 14
figure 14

GC spectra from M3 to GDP: unconditional, conditional on HICP, UN and LTN respectively. In dashed, our bootstrap threshold at \(5\%\) is outlined. In dotdash, the same threshold for the overall test obtained by Bonferroni correction is depicted

Fig. 15
figure 15

GC spectra from GDP to M3: unconditional, conditional on HICP, UN and LTN respectively. In dashed, our bootstrap threshold at \(5\%\) is outlined. In dotdash, the same threshold for the overall test obtained by Bonferroni correction is depicted

Fig. 16
figure 16

GC spectra from M1 to GDP: unconditional, conditional on HICP, UN and LTN respectively. In dashed, our bootstrap threshold at \(5\%\) is outlined. In dotdash, the same threshold for the overall test obtained by Bonferroni correction is depicted

Fig. 17
figure 17

GC spectra from GDP to M1: unconditional, conditional on HICP, UN and LTN respectively. In dashed, our bootstrap threshold at \(5\%\) is outlined. In dotdash, the same threshold for the overall test obtained by Bonferroni correction is depicted

We first comment on the conditional GC spectra for the couple GDP-M3. Conditioning on HICP, the significance of \({M3}_{t-1}\) and \({M3}_{t-2}\) increases in the \({GDP}_t\) equation. This results in a GC which decreases across frequencies and is significant across the entire frequency range. In the \({M3}_{t}\) equation, the significance of \({GDP}_{t-2}\) increases to \(5\%\). As a result, GC is significant up to the period of 1 year. Conditioning on UN, in the \({GDP}_t\) equation \({M3}_{t-1}\) is significant at \(5\%\) while \({M3}_{t-2}\) is no longer significant. This results in a GC which is no longer significant. In the \({M3}_{t}\) equation, \({GDP}_{t-2}\) is no longer significant, resulting in a non-significant GC everywhere. Conditioning on LTN, \({M3}_{t-1}\) is significant at \(5\%\) and \({M3}_{t-2}\) at \(10\%\) in the \({GDP}_t\) equation. The corresponding GC is significant across the entire frequency range. In the \({M3}_{t}\) equation, \({GDP}_{t-2}\) is significant at \(5\%\), causing GC to be significant almost everywhere.

We now comment on the conditional GC spectra for the couple GDP-M1. Conditioning on HICP, in the \({GDP}_{t}\) equation \({M1}_{t-2}\) is still highly significant. The GC spectral shape is almost the same as the unconditional one. In the \({M1}_{t}\) equation, the significance is considerably weaker, so that causalities are significant only in the first quarter of the frequency range. Conditioning on UN, \({M1}_{t-2}\) is still highly significant in the \({GDP}_{t}\) equation. The conditional GC magnitude is slightly weaker than the corresponding unconditional one. In the \({M1}_{t}\) equation, \({GDP}_{t-1}\) has a p-value of \(20\%\) and the related GC shape is significant in the first half of the frequency range. Conditioning on LTN, \({M1}_{t-2}\) is still significant at \(1\%\) in the \({GDP}_{t}\) equation, causing the GC shape to be significant over all frequencies. In the \({M1}_{t}\) equation, \({GDP}_{t-1}\) has a p-value of \(26\%\). As a consequence, we observe significance at low frequencies only.

Concerning the overall test on all causalities, we observe the absence of any significance in four cases out of sixteen: the unconditional GC from M3 to GDP, the conditional GC from M3 to GDP on both UN and LTN, and the conditional GC from GDP to M3 on UN. We remark that this test is conservative in nature; however, it allows us to adequately contextualize the significance of the individual tests. In particular, we can claim that, for those four cases, the GC spectrum is indistinguishable at all frequencies, at a significance level of at most \(5\%\), from the one we would observe under stochastic independence between output and money stock.

4 Conclusions

In this paper, we have developed a bootstrap testing procedure for Granger-causality (GC) in the frequency domain. In particular, our test is based on the null hypothesis of stochastic independence, which embeds the traditional one of no Granger-causality. Via our test, we highlight those frequencies which present a Granger-causality systematically larger than the median causality across frequencies observed under the null hypothesis. This goal is reached by appropriately exploiting the stationary bootstrap procedure of Politis and Romano (1994). We deal with both unconditional GC and conditional GC with respect to third variables. A simulation study has shown that our procedure is complementary to the test of Breitung and Candelon (2006).

Our test has general validity, as it only requires the stationary bootstrap of Politis and Romano (1994) to be consistent for the data generating process of interest, which requires \(T^{\frac{1}{3}} \rightarrow \infty \), where T is the time series length. Therefore, our procedure may find application in several fields like macroeconomics, neuroscience, meteorology, seismology and finance. Among those, monetary economics is a particularly suitable application field, as the time series of interest often present a rich causality structure, and the need arises to identify, among the relevant frequencies, the ones significant for causality.
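The resampling scheme at the heart of the procedure can be sketched as follows: the stationary bootstrap of Politis and Romano (1994) concatenates blocks of geometric length (mean \(1/p\)), wrapping around the end of the series. This is a minimal illustrative Python implementation; the function and parameter names are ours:

```python
import numpy as np

def stationary_bootstrap(x, p=0.1, seed=None):
    """One stationary-bootstrap resample (Politis and Romano 1994):
    blocks of geometric length (mean 1/p), wrapping around the series end."""
    rng = np.random.default_rng(seed)
    T = len(x)
    idx = np.empty(T, dtype=int)
    idx[0] = rng.integers(T)
    for t in range(1, T):
        if rng.random() < p:                 # start a fresh block
            idx[t] = rng.integers(T)
        else:                                # continue the block, circularly
            idx[t] = (idx[t - 1] + 1) % T
    return x[idx]

resample = stationary_bootstrap(np.arange(50), p=0.2, seed=1)
```

Unlike the fixed-block bootstrap, the random block lengths make each resample itself a stationary series, which is the property the testing procedure relies on.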

For this reason, we have applied our methodology to Euro Area money stock and GDP. From an empirical point of view, we have found that a relationship between money supply and output is present in the Euro Area across the period 1999–2017. We have provided evidence that M3 (M1) in some cases reacts to economic shocks, while in others it acts as a policy shock with respect to economic output. We have observed that the link between GDP and M1 is much stronger in both directions than the link between GDP and M3. In particular, the causal relationship from M1 to GDP appears to be significant at all frequencies, while the opposite one is significant at low frequencies only. This allows us to conclude that the intensity of the causal link from money to output appears to be stronger than the reverse one in the Euro Area, reinforcing the idea, outlined among others in Musso et al. (2019), that M1 has a strong predictive power for GDP in the Euro Area.