This section describes how we evaluate forecasts, and introduces the forecasting methods we consider.
Measure of forecast accuracy
We measure the accuracy of distribution forecasts via the Continuous Ranked Probability Score (CRPS; Matheson and Winkler 1976; Gneiting and Raftery 2007). For a cumulative distribution function F (i.e., the forecast distribution) and a realization \(y \in \mathbb {R}\), the CRPS is given by
$$\begin{aligned} \text {CRPS}(F, y) = \int _{-\infty }^{\infty } (F(z) - 1(z \ge y))^2~\mathrm{d}z, \end{aligned}$$
(1)
where 1(A) is the indicator function of the event A. Closed-form expressions of the integral in (1) are available for several types of distributions F, such as the two types we use below: the two-piece normal distribution (Gneiting and Thorarinsdottir 2010) and mixtures of normal distributions (Grimit et al. 2006). Our implementation is based on the R package scoringRules (Jordan et al. 2016), which includes both of these variants.
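For a Gaussian forecast distribution, the integral in (1) has a well-known closed form (Gneiting and Raftery 2007). The following minimal Python sketch illustrates it; the paper's own computations use the R package scoringRules, so this is purely expository:

```python
import math

def crps_normal(mu, sigma, y):
    """Closed-form CRPS for a N(mu, sigma^2) forecast and realization y
    (Gneiting and Raftery 2007). Lower values indicate better forecasts."""
    z = (y - mu) / sigma
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2 * math.pi)   # standard normal pdf at z
    cdf = 0.5 * (1 + math.erf(z / math.sqrt(2)))            # standard normal cdf at z
    return sigma * (z * (2 * cdf - 1) + 2 * pdf - 1 / math.sqrt(math.pi))

print(crps_normal(0.0, 1.0, 0.0))  # about 0.2337: best case, y at the forecast mean
```

The score penalizes both miscalibration and lack of sharpness: a realization far out in the tails of the forecast distribution yields a larger CRPS than one near its center.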
Ensemble-based methods
Let \(x_{t+h}^i\) denote the point forecast of participant i, made at date t for date \(t + h\). Furthermore, let \(\mathcal {S}_{t+h}\) denote the collection of available participants, and denote by \(N_{t+h} = |\mathcal {S}_{t+h}|\) the number of participants. As described above, the set of available participants may change over time and across forecast horizons. Ensemble methods construct a forecast distribution for \(Y_{t+h}\), based on the individual forecasts \(\{x_{t+h}^i\}_{i\in \mathcal {S}_{t+h}}\). The ensemble forecast may depend on ‘who says what’, that is, it may vary across permutations of the indexes \(i \in \mathcal {S}_{t+h}\). If that is the case, the individual forecasters are said to be non-exchangeable. Alternatively, if the ensemble forecast is invariant to ‘who says what’, the forecasters are said to be exchangeable.
In order to treat forecasters as non-exchangeable, one must be able to compare them. For example, if we knew that Anne is a better forecaster than Bob, we might want to design an ensemble method which puts more weight on Anne’s forecast than on Bob’s. However, relative performance is hard to estimate in the ECB-SPF data set, since the past track records of different forecasters typically refer to different time periods. Similarly, estimating correlation structures among the forecasts, as is required for regression-type combination approaches (e.g., Timmermann 2006), requires imputing the missing forecasts in some way. Given the difficulties just described, it is perhaps not surprising that Genre et al. (2013) find very little evidence in favor of either performance-based or regression methods for the ECB-SPF data. Motivated by these concerns, we consider two simple ensemble methods which treat the forecasters as exchangeable:
1.
The first method (‘BMA’) constructs a forecast distribution as follows:
$$\begin{aligned} f_{t+h}^\text {BMA} = \frac{1}{N_{t+h}} \sum _{i \in \mathcal {S}_{t+h}} \mathcal {N}(x_{t+h}^i, \theta ), \end{aligned}$$
(2)
where \(\theta \in \mathbb {R}_+\) is a scalar parameter to be estimated. In words, (2) posits an equally weighted mixture of \(N_{t+h}\) forecast distributions, each of which corresponds to an individual forecaster i. Each of these distributions is assumed to have the same variance, \(\theta \). The method was proposed by Krüger and Nolte (2016), who find that it performs well for the US SPF data. The label ‘BMA’ hints at the method’s close conceptual connection to the Bayesian model averaging approach proposed by Raftery et al. (2005) in the meteorological forecasting literature. Denote the mean of the survey forecasts by \(\bar{x}_{t+h} = N_{t+h}^{-1} \sum _{i \in \mathcal {S}_{t+h}} x_{t+h}^i\). Then, the variance of the forecast distribution in (2) is given by \(D_{t+h} + \theta \), where
$$D_{t+h} = N_{t+h}^{-1} \sum _{i \in \mathcal {S}_{t+h}} \left( x_{t+h}^i-\bar{x}_{t+h}\right) ^2$$
is the cross-sectional variance of the survey forecasts (‘disagreement’). The two-component structure of the BMA forecast variance is similar to the model of Ozturk and Sheng (2016), which decomposes forecast uncertainty into a common component (similar to \(\theta \) in Eq. 2) and an idiosyncratic component (which they proxy by forecaster disagreement, \(D_t\)).
2.
The ‘EMOS’ method assumes that
$$\begin{aligned} f_{t+h}^\text {EMOS} = \mathcal {N}(\bar{x}_{t+h}, \gamma ), \end{aligned}$$
(3)
where \(\gamma \in \mathbb {R}_+\) is a parameter to be estimated. The method simply fits a normal distribution around the mean of the survey forecasts. Unlike in the BMA method, forecaster disagreement does not enter the variance of the distribution. The label ‘EMOS’ alludes to the method’s similarity to the ensemble model output statistics approach of Gneiting et al. (2005), again proposed in a meteorological context.
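The BMA construction in (2) can be sketched as follows (illustrative Python, not the paper's R implementation): the predictive CDF is an equally weighted average of normal CDFs centered at the individual point forecasts, and the predictive variance decomposes into disagreement plus the common variance \(\theta \):

```python
import math

def bma_cdf(z, point_forecasts, theta):
    """CDF of the equally weighted normal mixture in Eq. (2):
    each forecaster i contributes a N(x_i, theta) component."""
    sd = math.sqrt(theta)
    n = len(point_forecasts)
    return sum(0.5 * (1 + math.erf((z - x) / (sd * math.sqrt(2))))
               for x in point_forecasts) / n

def bma_variance(point_forecasts, theta):
    """Predictive variance of the mixture: disagreement D plus theta."""
    n = len(point_forecasts)
    mean = sum(point_forecasts) / n
    disagreement = sum((x - mean) ** 2 for x in point_forecasts) / n
    return disagreement + theta
```

Note that disagreement enters the BMA variance automatically through the spread of the mixture components, whereas the EMOS variance \(\gamma \) in (3) is a single free parameter.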
Both ensemble methods require only one parameter to be estimated, which can be done via grid search methods. In each case, we fit the parameter to minimize the sample average of the CRPS (see Sect. 3.1), based on a rolling window of 20 observations. Conceptually, the BMA method is based on the idea of fitting a simplistic distribution to each individual forecaster, and then averaging over these distributions. This is why Krüger and Nolte (2016) call it a ‘micro-level’ method. In contrast, the EMOS method fixes a normal distribution and fits the parameters of that distribution via a summary statistic (the mean) from the ensemble of forecasters.
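The one-dimensional grid search can be sketched as follows, here for the EMOS parameter \(\gamma \) in (3). The grid values are illustrative, and crps_normal implements the closed-form Gaussian CRPS; the BMA parameter \(\theta \) would be fitted analogously, with the mixture CRPS in place of the normal one:

```python
import math

def crps_normal(mu, sigma, y):
    """Closed-form CRPS for a N(mu, sigma^2) forecast and realization y."""
    z = (y - mu) / sigma
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2 * math.pi)
    cdf = 0.5 * (1 + math.erf(z / math.sqrt(2)))
    return sigma * (z * (2 * cdf - 1) + 2 * pdf - 1 / math.sqrt(math.pi))

def fit_emos_gamma(mean_forecasts, realizations, grid):
    """Grid search for the EMOS variance gamma minimizing the
    average CRPS over a rolling training window."""
    best_gamma, best_score = None, float("inf")
    for gamma in grid:
        sigma = math.sqrt(gamma)
        score = sum(crps_normal(m, sigma, y)
                    for m, y in zip(mean_forecasts, realizations))
        if score < best_score:
            best_gamma, best_score = gamma, score
    return best_gamma
```

In the paper's setup, mean_forecasts and realizations would each hold the 20 rolling-window observations, and the fitted parameter is then held fixed for the next out-of-sample forecast.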
Survey histograms
In our out-of-sample analysis, we consider the average histogram over all forecasters (for a given date, variable, and forecast horizon). In order to convert the histogram into a complete forecast distribution, we consider a parametric approximation, obtained by fitting a two-piece normal distribution (Wallis 2004, Box A) to the histogram. Specifically, the parameters of the approximating distribution solve the following minimization problem (suppressing time and forecast horizon for ease of notation):
$$\begin{aligned} \min _{\mu , \sigma _1, \sigma _2} \sum _{j=1}^J \left[ P_j - F_{\text {2PN}}(r_j; \mu , \sigma _1, \sigma _2)\right] ^2, \end{aligned}$$
(4)
where \(r_j\) is the right endpoint of histogram bin \(j = 1, \ldots , J\); \(P_j\) is the cumulative probability of bins 1 to j, and \(F_{\text {2PN}}(\cdot ~; \mu , \sigma _1, \sigma _2)\) is the cumulative distribution function of the two-piece normal distribution with parameters \(\mu , \sigma _1\) and \(\sigma _2\).
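The objective in (4) can be sketched as below, assuming the standard parameterization of the two-piece normal CDF (Wallis 2004), with left-scale \(\sigma _1\) and right-scale \(\sigma _2\); the minimization itself would be handed to a numerical optimizer:

```python
import math

def phi(z):
    """Standard normal CDF."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def two_piece_normal_cdf(x, mu, s1, s2):
    """CDF of the two-piece normal: scale s1 left of the mode mu, s2 right of it."""
    if x <= mu:
        return 2 * s1 / (s1 + s2) * phi((x - mu) / s1)
    return (s1 - s2) / (s1 + s2) + 2 * s2 / (s1 + s2) * phi((x - mu) / s2)

def histogram_fit_objective(params, right_endpoints, cum_probs):
    """Sum of squared differences in Eq. (4): histogram cumulative
    probabilities P_j versus the 2PN CDF at bin endpoints r_j."""
    mu, s1, s2 = params
    return sum((p - two_piece_normal_cdf(r, mu, s1, s2)) ** 2
               for r, p in zip(right_endpoints, cum_probs))
```

With \(\sigma _1 = \sigma _2\) the distribution reduces to a conventional Gaussian, which is the comparison mentioned below.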
Parametric approximations to survey histogram forecasts have been proposed by Engelberg et al. (2009), who consider a very flexible generalized beta distribution as an approximation. In our case, replacing the two-piece normal distribution by a conventional Gaussian distribution yielded very similar forecasting results, suggesting that the flexibility of the two-piece normal is more than sufficient.
Simple time series benchmark model
We also compare the ECB-SPF to a simple Gaussian benchmark forecast distribution, with mean equal to the random walk prediction, and variance estimated from a rolling window of 20 observations (in line with the sample used for fitting the ensemble methods). Specifically, denote the series of training sample observations by \(\{y_t\}_{t = T-19}^T.\) Then, the benchmark distribution has mean \(y_T\) and variance \(h_q \times \frac{1}{19} \sum _{t = T-18}^T (y_t-y_{t-1})^2,\) where \(h_q\) is the forecast horizon (in quarters). Our choice of the random walk is motivated by the quarter-on-quarter definition of the predictands, which implies considerable persistence almost by definition (see Fig. 3).
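The benchmark can be sketched as follows (illustrative Python; y_window holds the 20 training observations \(y_{T-19}, \ldots , y_T\)):

```python
def benchmark_distribution(y_window, h_q):
    """Random-walk Gaussian benchmark: mean equals the last observation y_T,
    variance equals the mean squared first difference, scaled by the horizon h_q."""
    diffs = [y_window[t] - y_window[t - 1] for t in range(1, len(y_window))]
    mean = y_window[-1]                                   # random walk point forecast
    variance = h_q * sum(d * d for d in diffs) / len(diffs)  # 19 differences from 20 obs
    return mean, variance
```

The linear scaling of the variance in the horizon mirrors the h-step-ahead variance of a random walk with uncorrelated increments.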