A Bootstrap Method to Test Granger-Causality in the Frequency Domain

We propose a bootstrap test for unconditional and conditional Granger-causality spectra in the frequency domain. Our test aims to detect if the causality at a particular frequency is systematically different from zero. In particular, we consider a stochastic process derived applying independently the stationary bootstrap to the original series. At each frequency, we test the sample causality against the distribution of the median causality across frequencies estimated for that process. Via our procedure, we infer about the relationship between money stock and GDP in the Euro Area during the period 1999–2017. We point out that the money stock aggregate M1 had a significant impact on economic output at all frequencies, while the opposite relationship is significant only at low frequencies.


Introduction
As a statistical concept, causality has a central role both from a theoretical and a practical point of view (see Berzuini et al. 2012). In time series analysis, it was first introduced by Wiener in the context of prediction theory (Wiener 1956) and then formalized by Granger in the context of linear regression modelling of stochastic processes (Granger 1969). Since then it is denoted as Granger-causality (GC). Specifically, Y t is said to Granger-cause X t if the past values of Y t , i.e. Y t−1 , Y t−2 , . . . , help predicting the current value of X t .
The concept of causality was then extended from the time to the frequency domain. Pierce (1979) explicitly measured the effect of the lagged values of Y t onto X t through an R 2 indicator, defined frequency-wise to evaluate the intensity of the linear causal dependence at each frequency. Geweke (1982) proposed to estimate a VAR model on [X t , Y t ] by defining the causality from Y t to X t at each frequency as the proportional effect of the lagged values of Y t onto the spectral density of X t . This approach can be implemented even considering the conditional causality with respect to the lagged values of a third variable W t (Geweke 1984). The described measures are respectively defined as unconditional and conditional Granger-causality spectra (Geweke 1982(Geweke , 1984, and also discussed by Hosoya (1991) and Hosoya (2001). The conditional Granger-causality has then been reformulated by Chen et al. (2006).
While the use of GC in the time-domain dates back to the sixties, GC in the frequency domain has been employed in more recent years. In Lemmens et al. (2008) the causality structure of European production expectation surveys is analyzed by the methods of Pierce (1979) and Geweke (1982) comparatively, which require to study appropriate frequency-wise coefficients of coherence. The same approach is used in Tiwari (2014) for exploring the relationship between energy consumption and income in the United States. In this paper, we aim to study the mutual relationship between gross domestic product and money stock in the Euro Area by making inference on Granger-causality spectra. The problem has been widely addressed for the U.S. economy, which is a key player from an international macroeconomic perspective. In Belongia and Ireland (2016), for instance, the methodology of Friedman and Schwartz (1975), based on structural VAR models, is revisited and applied to U.S. data across the period 1967-2013. The same consideration applies to the Indian economy (see Nadig and Viswanathan 2019). On the contrary, since the European Central Bank was born in 1998, evidences and analyses regarding the Euro Area are less established and more controversial.
The advantage of frequency-domain GC lies in the disentanglement of the causality structure across a range of frequencies, while traditional time-domain GC only provides an overall indication of the presence of a causality relationship. In particular, GC measures at each frequency the intensity of the causal relationship between the components of the series at that particular frequency. For this reason, we aim to provide an inferential procedure to highlight at which specific frequency there exists causality among the data generating processes of the observed time series.
In Breitung and Candelon (2006), a parametric test (henceforth, BC test) for Granger-causality in the frequency domain is proposed. Its convergence rate is O(T −1/2 ) (where T is the time series length) and its power is decreasing as the distance of the frequency of interest from π 2 increases (even if Yamada and Yanfeng 2014 show that the same test is still useful at extreme frequencies). The test is based upon a set of linear restrictions on the parameters of the (possibly cointegrated) VAR model best representing the series (we refer to Lütkepohl 2005 for VAR selection and estimation). Breitung and Schreiber (2018) generalize the BC procedure to test causality and delay within a frequency band.
BC test is widely used in the economic literature. In Croux and Reusens (2013), it is applied to test the predictive power of domestic stock prices for the economic activ-ity of G-7 countries. Bozoklu and Yilanci (2013) consider the relationship between energy consumption and economic growth in the OECD area. The relationship between real and financial business cycles is analyzed in Gomez-Gonzalez et al. (2015). Wei (2015) deals with the link between commodity prices and monetary policies. Keho et al. (2015) test Wagner's law for some African economies. Two relevant applications for Turkish economy are described in Gül et al. (2018) and Tastan and Sahin (2020). Fromentin and Tadjeddine (2020) analyze the relationship between cross-border workers and financial instability in Luxembourg.
Some alternative testing approaches were also proposed. Hidalgo (2000) estimates VAR filters via generalized least squares and then accordingly derives a test statistics for GC in the presence of long-range dependence. Hidalgo (2005) extends the framework of Hidalgo (2000) to the multivariate case computing relevant quantiles under the null via resampling bootstrap. A consistent nonparametric test for nonlinear causality is proposed by Nishiyama et al. (2011). A nonparametric approach to test for Grangercausality in tail events is proposed by Hong et al. (2009) and extended by Candelon and Tokpavi (2016). The Philipps spectral estimator (Phillips 1988) is instead exploited to estimate causality both at frequency 0 and at the rest of frequencies in a cointegrated setting by Assenmacher-Wesche and Gerlach (2008). Such a method is used in Berger and Osterholm (2011) to test the relationship between money growth and inflation in the Euro Area. A comprehensive computational and inferential strategy for time-domain and frequency-domain GC spectra has been proposed in Barnett and Seth (2014).
The present work proposes a different approach to the classical testing framework of the no-causality hypothesis. Our null hypothesis of zero-causality, in fact, is tested frequency-wise comparing each causality to the 100×(1−α)−th percentile of the distribution of the median causality across frequencies under the stronger case of stochastic independence, where α is the significance level. The null distribution is approximated via the stationary bootstrap of Politis and Romano (1994). As empirically shown in Sect. 2.4, the median is chosen because under the null the individual causalities and the median causality are indistinguishable from a stochastic point of view. We thus derive the desired bootstrap quantile of the median causality across frequencies and we compare each observed causality to it. Our test is consistent provided that T 1 3 → ∞. The paper is organized as follows. In the next Section the concept of Grangercausality in the frequency domain is recalled, our bootstrap inference approach is explained in detail and a simulation study which clarifies the features of our test is presented. In Sect. 3 we show the potentialities of our method in outlining the causal relationships between money and output in the Euro Area during the period 1999-2017. Finally, we conclude the paper with a discussion.

Definition
We now briefly recall the bases of Granger-causality spectral theory. We follow the approach in Ding et al. (2006), which we refer to for the details. Suppose that the stochastic processes X t and Y t , jointly covariance-stationary, follow a non-singular V AR( p) model. Defining the bivariate stochastic process Z t = [X t , Y t ] , we have where t is a zero-mean bivariate stochastic process with covariance matrix Σ 2 and A 1 , . . . , A p are 2 × 2 coefficient matrices. We stress that t has no auto-covariance structure at any lag different from 0. Moving to the frequency domain, for each frequency ω we define the transfer function P(ω) of Z t in (1) as which is invertible if and only if the roots of the equation det( , the definition (2) allows to define in a compact way the modelbased spectrum h(ω) as follows: where * denotes the complex conjugate transpose.
Setting Σ 2 = σ 2 υ 2 υ 2 γ 2 , we need for computational reasons to define the transform , from which we derive the transformed transfer function matrix The unconditional Granger-causality spectrum of X t (effect-variable) respect to Y t (cause-variable) is then defined as (Geweke 1982) In the empirical analysis, the theoretical values of coefficient and covariance matrices will be replaced by the corresponding SURE estimates (Zellner 1962). Moreover, we can define the conditional Granger-causality spectrum of X t respect to Y t given an exogenous variable W t (conditioning variable). Suppose that we estimate a VAR on [X t , W t ] with covariance matrix of the noise terms Σ 2 = σ 2 υ 2 υ 2 γ 2 and transfer function G(ω) [defined as in (2)]. The corresponding normalized process of [X t , W t ] (according to the procedure described above) is denoted by [X t ,W t ] .
We then estimate a VAR on [X t , Y t , W t ] with covariance matrix of the noise terms and transfer function P (ω). Building the matrix we can define Q(ω) = C −1 (ω)P (ω), which is a sort of "conditional" transfer function matrix. The theoretical spectrum ofX t can thus be written as The conditional spectrum of X t (effect-variable) respect to Y t (cause-variable) given W t (conditioning variable) is (Geweke 1984) Both h Y →X (ω) and h Y →X |W (ω) range from 0 to ∞, with −π ≤ ω ≤ π . h Y →X (ω) expresses the power of the relationship from Y to X at frequency ω, h Y →X |W (ω) expresses the strength of the relationship from Y to X at frequency ω given W . Therefore, the unconditional spectrum accounts for the whole effect of the past values of Y t onto X t , while the conditional spectrum accounts for the direct effect of the past values of Y t onto X t excluding the effect mediated by the past values of W t .
Granger-causality spectra h Y →X (ω) and h Y →X |W (ω) can be interpreted as follows. If h Y →X (ω) > 0, it means that past values of Y t help predicting X t , and 1 ω is a relevant cycle. If h Y →X |W (ω) > 0, it means that past values of Y t in addition to those of W t help predicting X t , and 1 ω is a relevant cycle. Significant frequencies give us some hints on the relevant delay structure of the cause variable with respect to the effect variable.

Testing Framework
The inference on Granger-causality spectra in the frequency domain is still an open problem. In fact, differently from the corresponding time-domain quantities, the limiting distribution for unconditional and conditional spectra under the null hypothesis of no-causality is unknown (see Barnett and Seth 2014, Sect. 2.5). In spite of that, Breitung and Candelon (2006) test the nullity of unconditional and conditional GC at each frequency ω, imposing a necessary and sufficient set of linear restrictions to the (possibly cointegrated) VAR model best fitting the series. The resulting test statistics is distributed under their null as a Fisher distribution with 2 and T − 2 p degrees of freedom (except for ω = {0, π} at which the distribution is F (1,T − p) ), where p is the VAR delay and T is the time series length.
On the contrary, as explained in the Introduction, our benchmark is GC spectrum under the null hypothesis of stochastic independence. At each frequency ω, we test in Fisher's sense the null hypothesis H 0 : r (ω) = 0 where the functional r (ω) may be the unconditional GC h Y →X (ω), assuming that X t is stochastically independent from Y t , or the conditional GC h Y →X |W (ω), assuming that X t is stochastically independent from Y t conditionally on W t . This equals to assume that under the null the bivariate stochastic process [X t , Y t ] is an independent process, unconditionally or conditionally on W t .
In order to test the null hypothesis, we refer the estimated causality at each frequency to the distribution ofr med , the estimated median of r (ω) across frequencies, approximated by the stationary bootstrap of Politis and Romano (1994). The median is chosen because under the null all causalities are indistinguishable with respect to the median causality from a stochastic point of view (see Sect. 2.4). We stress that unconditional and conditional spectra must be assessed separately, because their distributions are different under the null.
Our idea derives from Politis and Romano (1994), according to which each Fréchetdifferentiable functional may be successfully approximated by the stationary bootstrap, and the resulting bias depends on the sum of the Fréchet-differential g F evaluated at each X t . The bootstrap series obtained via the stationary bootstrap of Politis and Romano (1994) are stationary Markov chains conditionally on the data. This means that each bootstrap series X * 1 , . . . , X * T is a Markov chain conditionally on X 1 , . . . , X T . Suppose that we apply the stationary bootstrap procedure independently to X t and Y t , obtaining the stationary bootstrap series X * t and Y * t . Computing unconditional Granger-causality spectra on those series equals to assess unconditional causalities under the assumption of stochastic independence, because X * t and Y * t result to be independent Markov chains. Therefore, testing each Granger-causality computed on the original series X t , Y t against the median causality computed across frequencies on X * t and Y * t is effective as a test for causality strength under the null. For the conditional case, we proceed as follows. First, we estimate a VAR model on the observed series X t and W t jointly considered. Then, we apply the residual bootstrap (see Tibshirani and Efron 1993) to VAR residuals, thus obtaining the consistent residual bootstrap series X * t and W * t . This step is needed in order to avoid any bias which may arise ignoring the dependence structure between X t and W t . Finally, we apply independently the stationary bootstrap to the observed series Y t to obtain the series Y * t . Computing conditional Granger-causality spectra on X * t , Y * t and W * t then equals to assess conditional causalities under the assumption that X t is stochastically independent from Y t given W t . For this reason, we use the resulting distribution of the conditional median causality across frequencies computed on those series as our benchmark under the null hypothesis.
Let us considerr (ω) =ĥ Y →X (ω), which is defined as (3) where the coefficient matrices A j , j = 1, . . . , p, and the error covariance matrix Σ 2 are replaced by the corresponding SURE estimates (Zellner 1962). We know that SURE estimatesÂ j , j = 1, . . . , p, are rational functions of X t , thus being Fréchet-differentiable.P(ω) andΣ 2 are functions of theÂ j , thus being rational in turn; the same holds as a consequence forĥ(ω). Therefore,ĥ Y →X (ω), the natural logarithm of a rational function of the data, is Fréchet-differentiable. At this point, even if the median is not Fréchetdifferentiable, bootstrappingr med = med(ĥ Y →X (ω)) under the null is still valid, because the distribution ofr med is indistinguishable from the causality distribution at any frequency under the null as T is enough large (see Sect. 2.4). As a consequence, according to Politis and Romano (1994), paragraph 4.3, we can estimate consistently any quantile of the distribution of the median ofĥ Y →X (ω) across frequencies under the null hypothesis via the stationary bootstrap. Consideringr (ω) =ĥ Y →X |W (ω), which is defined as (4) where the coefficient matrices and the error covariance matrix are replaced by the corresponding SURE estimates, a similar reasoning can be carried out.
We stress that our aim is not to represent the common multivariate distribution That problem is an estimation one, which would be effectively solved by parametric or residual bootstrap. Our aim is to exploit the random process [ where α is the significance level and r * med is the bootstrap median across frequencies of unconditional or conditional GC under the assumption of stochastic independence. Since the stationary bootstrap procedure is valid forr med under the null, P(r * med ≤ q * r ,1−α ) approximates consistently P(r med ≤ q r ,1−α|H 0 ) as T 1 3 → ∞. In more detail, suppose that r is a Fréchet-differentiable functional, that is, there exists some influence function g F , giving the influence of single observations on the functional r with respect to some distribution function F, such that with g F dF = 0 (||.|| is the supremum norm). We define the mixing coefficient (which controls for the overall amount of time dependence) as The following Theorem, expanded by Politis and Romano (1994), holds.
Theorem 1 Suppose that X † t and Y † t are strictly stationary stochastic processes with The key to prove Theorem 1 is to bootstrap the cause and the effect series independently, and then show that any inference on a Fréchet-differentiable functional is weakly consistent.
For what concerns the conditional case, the following Theorem holds.
for any Fréchet-differentiable functional r under the assumption T 1 3 → ∞.
By assuming that the functional r is g τ − Fréchet-differentiable with respect to F Y † and g τ − Fréchet-differentiable with respect to F Z we can ensure bootstrap consistency in weak sense for any conditional GC. Note that the assumption of stochastic independence between W † t |X † t and Y † t |X † t under the null is needed to avoid any circular causality effect under the null, such that the stationary bootstrap may be meaningfully applied. We refer to "Appendix B" for the proofs.

Testing Procedure
We now report in detail the testing procedures relative to the two functionals. For the functional h Y →X (ω), our bootstrap procedure is -Then, compute q uncond,1−α , the 100 × (1 − α)-th percentile of the bootstrap distribution at Step 3 across the N bootstrap series, where α is the significance level.
For the functional h Y →X |W (ω), the procedure becomes -Estimate a VAR on (X t , W t ) via SURE using BIC for model selection.
We provide an R package, called "grangers", performing these routines (see Declarations).
In addition, we extend our framework to test the nullity of , across the entire frequency range. In order to do that, we apply the conservative Bonferroni correction, that is, we apply the testing procedure to each frequency with significance level 2α T . In this way, we ensure that the overall level across frequencies is not larger than α under the null.

Test Features and Simulation Results
We now describe the performance of our test (with a significance level of 5%) in a number of situations, in comparison to BC test by Breitung and Candelon (2006). We do not use more competitors like Hidalgo (2000) and Nishiyama et al. (2011) for the following reasons. First of all, both approaches are intended to test the unconditional causality. Second, the assumption settings of both procedures are quite different and somehow more demanding: the former requires an explicit control of the spectral matrix, while the second requires particular conditions on the process distribution. Third, BC test has become the standard test for GC in the frequency domain for economic applications, as shown in the Introduction.
First of all, suppose that we simulate N = 300 replicates from a VAR process of length T = 100, 300 in the form (1) with p = 1, Σ 2 = diag(1, 1) and no causality coefficients. The VAR delay is selected for each bootstrap setting by BIC criterion. Our first tested coefficient matrix A 1 is A 1,( j j) = 0, 0.5, 1, j = 1, 2. We observe that for T = 300 the estimated level is approximately 5% at all Fourier frequencies as long as A 1,( j j) = 0 (Case 1), both for the unconditional and the conditional GC. If , the estimated level is still close to 5% when ρ = 0.5.
As A 1,( j j) increases (Case 2), our test presents an increasing rejection rate particularly at low frequencies. This happens because our test signals any violation of the null hypothesis of stochastic independence. As A 1,( j j) approaches 1, the power tends to 1 at the lowest frequency, as showed for the unconditional GC in Fig. 1. On the contrary, BC test has a level close to 5% over all frequencies as A 1,( j j) = 0, 0.5, and approaches 0.4 as A 1,( j j) = 1. A similar pattern is observed also for the conditional GC in Fig. 2.
In order to show that under the null hypothesis the distributions of the individual causalities are indistinguishable from the one of the median causality, we have calculated the Kullback-Leibler (KL) divergence (Kullback and Leibler 1951) between the density functions of each causality and the median causality, estimated via a Gaussian kernel with bandwidth equal to 0.25. The results for Case 1 (with A 1,( j j) = 0) are reported in Fig. 3. We note that the observed divergence is extremely small, considered that the KL-divergence between two simulated uniform distributions located in [0, 1] is around 0.2. The pattern of KL over frequencies may vary, but its magnitude is always very small, even if we set T = 100. Similar results are observed for the conditional case, or allowing for ρ = 0.5 and A 1,( j j) = 0.5.
Another relevant case we deal with is for p = 1 and A 1,( j2) = 0.2, 0.5, 1, j = 1, 2. This process has a causality which decreases as the frequency increases (Case 3). If A 1,( j2) is at least 0.5, T = 100 is enough to reject the null hypothesis at all frequencies with probability 1. If A 1,( j2) = 0.2, T = 300 is required to reject the null at all frequencies, as for T = 100 the power is far from 1 and stands around 60% at the lowest frequency. At the same time, we observe in Figs. 4 and 5 that BC presents remarkable power losses at all frequencies even for T = 300, both for the unconditional and the conditional case.
If ω * = π 2 and Σ 2 = diag(1, 1) (Fig. 6), we observe that the power of our test is close to the BC one. This happens because Σ 2 has condition number 1, i.e. the process shows no collinearity. If Σ 2 = diag(1, 0.2) (Fig. 7), BC test has a level approximately equal to 5% at ω * , while ours is larger, because the two series are both dependent in distribution and collinear. On the contrary, if Σ 2 = diag(1, 5) (Fig. 8), even BC test loses power at ω * because both the magnitude and the variance of the estimated GC increase considerably. Setting a different ω * , like 3 4 π , the exposed patterns remain the same (Fig. 9).
Setting instead A k, (22) , k = 1, 3, to 0.25 and 0.5 equals to increase the magnitude of the VAR roots until the limit value of 1 (non-stationary case). In that case, consistently with our null hypothesis, we observe that the rejection rate of our test increases as A k, (22) increases (Figs. 10, 11).
To sum up, the rejection rate of our test depends on three factors: -the magnitude of VAR roots: the rejection rate is perturbed at low frequencies as the process is closer to non-stationarity; -the true underlying spectral variability, which in turn depends on the relationship between the magnitude of causality and non-causality coefficients; -the condition number of the autocovariance matrices R j , j ≥ 0, which impacts on the degree of stochastic dependence.
We also compute the rejection rates of the conservative test on all causalities jointly considered obtained by Bonferroni correction. The results for our test are reported in Table 1, the results for BC test are reported in Table 2. We note that our test is more powerful as soon as the underlying setting goes towards stochastic dependence. Table 3 reports the mean computational time (with the relative standard deviation) of our inferential procedures (both for unconditional and conditional GC) for different values of the time series length T and the number of bootstrap samples N under the null hypothesis. We can see that the computational cost increases with the number of Fourier frequencies (equal to T /2). Considering that N = 300 is enough to obtain a sufficient approximation of critical values, our approach appears to be perfectly  feasible. For sets of very large series, provided that they are covariance stationary, we note that an appropriate subsample of Fourier frequencies can be easily drawn to keep computational times low.

A Granger-Causality Analysis of Euro Area GDP, Mand M1 in the Frequency Domain
In this section we study the co-movements of gross domestic product (GDP) and money stock (M3 and M1 aggregate) in the Euro Area. Differently from the Federal Reserve, the European Central Bank still considers the M3 growth rate as a policy  target (see Jung and Villanova 2020). We have thus considered M3 instead of M2, also because they are quite close in the Euro Area (see Darvas 2015). We test in the frequency domain both the direct link from one variable to the other one and the indirect link with respect to further explanatory variables like the inflation rate  factor modelling (Cendejas et al. 2014), some others use likelihood methods (Andrés et al. 2006, Canova andMenz 2011), or large-dimensional VAR models (Giannone et al. 2013), or dynamic factor models (Breitung and Eickmeier 2006) or VAR models Case with p = 3, A k,( j2) = 1, k = 1, 3, j = 1 and ω * = π 2 , A 2,(12) = 0, Σ 2 = diag(1, 5), and T = 300. In solid line, GC shape is reported at left, rejection rates for our NEW test and BC test are reported at center and at right. One standard deviation bands are reported in dotdash line. The values α = 0.05 and 1 are reported in dashed line with time-varying parameters (Psaradakis et al. 2005). On the contrary, we apply the inferential framework for GC in the frequency domain developed in Sect. 2. In this way, we provide explicit inference on unconditional and conditional GC. HICP, UN and LTN are used as conditioning variables, with the aim to discount for the mediating power of each of the three variables with respect to the relationship between output and money supply.

Data Preparation
We have considered the time series of GDP at market price in the Euro Area (chain linked volumes in Euro) and the monetary aggregate M3 and M1 (outstanding amount of loans to the whole economy excluded the monetary and financial sector, all currencies combined). M3 is also called "broad money", M1 "narrow money".
Since our goal is to focus on the effect of monetary policy on output, we restrict our analysis to the period 1999-2017, when the ECB has taken actual decisions on the Euro Area. We can thus denote our series by G D P t , M3 t , M1 t , H I C P t , U N t , LT N t , where t = 1, . . . , 76 (there are 76 quarters from Winter 1999 to Autumn  Giannone et al. 2012). We consider quarterly series as the GDP is regularly published on a quarterly base. The other series (all but GDP) are therefore made quarterly by averaging. We refer to "https://www.ecb.europa.eu/stats" and ECB (2012) for technical and computational details.
According to the Dickey-Fuller test, the logarithmic transforms of G D P t , M3 t , M1 t are non-stationary, as well as the three conditioning variables H I C P t , U N t and LT N t . Therefore, following Friedman and Schwartz (1975), we pass all series by Hodrick-Prescott filter (Hodrick and Prescott 1997), with the canonical value of λ = 1600, in order to remove any trend and to extract cyclical components. We do not use Baxter-King filter (Baxter and King 1999), as suggested in Belongia and Ireland (2016), because we have not enough end of sample data. Cycle extraction is performed via the R package "mFilter". Figures 12 and 13  reported and commented in "Appendix A". Since our ultimate goal is to infer about the cause-effect relationship of money stock and economic output, we test at each frequency Granger-causality spectra via our approach. Due to the use of Fast Fourier Transform, the frequencies used are the following: f i = i 80 , i = 1, . . . , 40, because T = 76. The frequency range is re-scaled to [0, 2] for the quarterly frequency of our series.
Relevant VAR models, estimated including an intercept by the R package "vars" (Pfaff et al. 2008), are selected by the Bayesian Information Criterion (BIC), imposing a maximum of four lags. BIC is used because we know that BIC is correctly estimating the unknown number of delays, while AIC may overestimate it, thus increasing the probability to estimate non-stationary VAR models. In any case, the roots of all estimated characteristic polynomials are strictly smaller than one. The number of bootstrap samples is 1000.
Note that for computational reasons BC test cannot be computed for p = 1. Besides, its p-value is constant across frequencies (except for ω = {0, π}) for p = 2. BC test requires a large number of delays, while ours works for all values, given that the resulting VAR is stationary and non-singular. Therefore, we can not compare directly our test to BC test on real data, because BC is not useful for all cases with p ≤ 2.

Causality Results
We start describing VAR estimates on the couple GDP-M3. Our lag selection procedure chooses 2 lags. In the G D P t equation, G D P t−1 and G D P t−2 are heavily significant, while M3 t−1 and M3 t−2 slightly are (at 5% and 10% respectively). This results in a GC spectral shape which is approximately constant and slightly significant across frequencies. In the M3 t equation, M3 t−1 is heavily significant, while G D P t−2 is at 10%. The corresponding GC shape is significant in the first half of the frequency range.
Concerning the couple GDP-M1, our VAR lag selection procedure chooses 2 lags. In the G D P t equation, G D P t−1 , G D P t−2 and M1 t−2 are heavily significant. The related unconditional GC shape is significant across the entire frequency range. In the M1 t equation, only M1 t−1 is heavily significant, while G D P t−1 has a p-value of 12%. The resulting GC spectral shape is thus significant in the first third of the frequency range.
In Figs. 14 and 15, unconditional and conditional GC spectra from M3 to GDP and viceversa are reported. The same spectra from M1 to GDP and viceversa are in Figs. 16 and 17 respectively. In dashed, our bootstrap threshold at 5% is outlined. In We first comment conditional GC spectra for the couple GDP-M3. Conditioning on HICP, the level of significance of M3 t−1 and M3 t−2 increases in the G D P t equation. This results in a GC decreasing across frequencies and significant across the entire frequency range. In the M3 t equation, the level of significance of G D P t−2 increases to 5%. As a result, GC is significant until the period of 1 year. Conditioning on UN, in the G D P t equation the level of significance of M3 t−1 is 5% while M3 t−2 is no longer significant. This results in a GC which is no longer significant. In the M3 t equation, G D P t−2 is no longer significant, resulting in a non-significant GC everywhere. Conditioning on LTN, the level of significance is 5% for M3 t−1 and 10% for M3 t−2 in the G D P t equation. The corresponding GC is significant across the entire frequency range. In the M3 t equation, G D P t−2 is significant at 5%, causing again GC to be significant almost everywhere.
We now comment conditional GC spectra for the couple GDP-M1. Conditioning on HICP, in the G D P t equation M1 t−2 is still heavily significant. The GC spectral shape is almost the same as the unconditional one. In the M1 t equation, the level of significance is quite smaller, so that causalities are significant only in the first quarter of the frequency range. Conditioning on UN, M1 t−2 is still heavily significant in the G D P t equation. The conditional GC magnitude is slightly weaker than the corresponding unconditional one. In the M1 t equation, G D P t−1 has a p-value of 20% and the related GC shape is significant in the first half of the frequency range. Conditioning on LTN, M1 t−2 is still significant at 1% in the G D P t equation, causing GC shape to be significant over all frequencies. In the M1 t equation, G D P t−1 has a p-value of 26%. As a consequence, we observe significance at low frequencies only.
Concerning the overall test on all causalities, we observe the absence of any significance in four cases out of sixteen: the unconditional GC from M3 to GDP, the conditional GC from M3 to GDP both on UN and LTN, and the conditional GC from GDP to M3 on UN. We remark that this test is conservative in nature: however, it allows to adequately contextualize the significance of individual tests. In particular, we can claim that, for those four cases, the GC spectrum is indistinguishable at all   GC spectra GDP to M1: unconditional, conditional on HICP, UN and LTN respectively. In dashed, our bootstrap threshold at 5% is outlined. In dotdash, the same threshold for the overall test obtained by Bonferroni correction is depicted frequencies with a significance level of at most 5% from the one we would observe in case of stochastic independence between output and money stock.

Conclusions
In this paper, we have developed a bootstrap testing procedure for Granger-causality (GC) in the frequency domain. In particular, our test is based on the null hypothesis of stochastic independence, which embeds the traditional one of no Granger-causality. Via our test, we highlight those frequencies which present a Granger-causality systematically larger than the median causality across frequencies observed under the null hypothesis. Such a goal is reached by appropriately exploiting the stationary bootstrap procedure of Politis and Romano (1994). We both deal with unconditional GC and conditional GC with respect to third variables. A simulation study has shown that our procedure is complementary with respect to the test of Breitung and Candelon (2006).
Our test has a general validity, as it only requires the stationary bootstrap of Politis and Romano (1994) to be consistent on the data generating process of interest, which requires T 1 3 → ∞, where T is the time series length. Therefore, our procedure may find application in several fields like macroeconomics, neuroscience, meteorology, seismology and finance. Among those, monetary economics is a very suitable application field, as the time series of interest often present a rich causality structure, and the need rises to disambiguate among relevant frequencies the significant ones for causality.
For this reason, we have applied our methodology to Euro Area money stock and GDP. From an empirical point of view, we have been able to say that the relationship between money supply and output is present in the Euro Area across the period 1999-2017. We have provided evidence that M3 (M1) in some cases reacts to economic shocks, in some others it acts as a policy shock with respect to economic output. We have observed that the link between GDP and M1 is much stronger in both directions than the link between GDP and M3. In particular, the causal relationship from M1 to GDP appears to be significant at all frequencies, while the opposite one is significant at low frequencies only. This allows us to conclude that the intensity of the causal link from money to output appears to be stronger than the reverse one in the Euro Area, reinforcing the idea, outlined among others in Musso et al. (2019), that M1 has a strong predictive power for GDP in the Euro Area. Data Availability The R package "grangers", downloadable at https://gthub.com/MatFar88/grangers, contains the data used for the analysis.
Code Availability The R package "grangers", downloadable at https://github.com/MatFar88/grangers, contains four functions performing the calculation of unconditional and conditional Granger-causality spectra and bootstrap inference on both, as well as two functions performing the tests of Breitung and Candelon (2006) on unconditional and conditional Granger-causality respectively (see Tastan 2015 for the STATA routines computing the same tests).

Conflict of interest
The authors declare that they have no conflict of interest.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. Figures 18 and 19 show the ACF of the estimated cycles by Hodrick-Prescott filter (λ = 1600) for the series of Euro Area GDP, M3, M1, HICP, UN and LTN across the period 1999-2017. The patterns are very similar across series: positive for the first 4-5 quarters, negative for all quarters around 2 years and non-significant elsewhere. UN shows a rebound for the quarters around 5 years. LTN is no longer significant after 2 quarters. Figure 20 shows the CCF for the couples GDP-M3 and GDP-M1. Their pattern is similar: we have positive correlation around 0 and negative correlation at sides around the lag of 2 years. Let us define the stochastic process Z † t = [X † t , Y † t ]. We assume that X † t , Y † t are stochastically independent, which causes the null hypothesis H 0 to hold. This is like assuming that the distribution function F Z † can be factorized as F X † F Y † . In addition, we assume that X † t and Y † t are strictly stationary. By Politis and Romano (1994)

A Auto and Cross Covariance Functions
whereF X † is the empirical density function of X † t and F X † is the corresponding true distribution function. The same equation holds for Y † .
If, for some d ≥ 0, E(g F X † (X † 1 )) 2+d < ∞, and if it holds ∞ k=1 α X † (k) the whole process Z † t . Therefore, for any Fréchet-differentiable functional r we can write under the assumption T 1 3 → ∞.

B.2 Proof of Theorem 2
Under the assumptions of Theorem 2, the stochastic processes Z and Y † are independent in distribution. Therefore, the distribution function of any statistics r (Z , Y † ) is the product of the distribution function of r (Z ) and the one of r (Y † ). Under the two assumptions on Z , we can apply Theorem 3.6 in Shao and Tu (1995), which prescribes that any re-sampling bootstrap approximation of r (Z ), like the one we adopt based on residual bootstrap, is consistent in weak sense as T → ∞. Since in our case the median of the functional r across frequencies is stochastically indistinguishable from the functional r at any frequency, we can apply the Theorem in paragraph 4.3 of Politis and Romano (1994) to claim that the bootstrap approximation of F Y † is also consistent in weak sense as T 1 3 diverges. From these arguments, the thesis follows. In fact, defined Z = [Z , Y † ], for any Fréchet-differentiable functional r we can write under the assumption T 1 3 → ∞.