German forecasters’ narratives: How informative are German business cycle forecast reports?

Müller, Karsten

doi:10.1007/s00181-021-02100-9

Open access
Published: 31 July 2021

Volume 62, pages 2373–2415, (2022)

Empirical Economics

Karsten Müller ORCID: orcid.org/0000-0001-8485-7615¹

This article has been updated

Abstract

Based on German business cycle forecast reports covering 10 German institutions for the period 1993–2017, the paper analyses the information content of German forecasters’ narratives for German business cycle forecasts. The paper applies textual analysis to convert qualitative text data into quantitative sentiment indices. First, a sentiment analysis utilizes dictionary methods and text regression methods, using recursive estimation. Next, the paper analyses the different characteristics of sentiments. In a third step, sentiment indices are used to test the efficiency of numerical forecasts. Using 12-month-ahead fixed horizon forecasts, fixed-effects panel regression results suggest some informational content of sentiment indices for growth and inflation forecasts. Finally, a forecasting exercise analyses the predictive power of sentiment indices for GDP growth and inflation. The results suggest weak evidence, at best, for in-sample and out-of-sample predictive power of the sentiment indices.

1 Introduction

German business cycle forecast reports offer quantitative point forecasts and qualitative text data for growth and inflation, among other variables. The qualitative texts describe forecasters’ views on the macroeconomic situation and development. The narratives also express the forecasters’ expectations about future economic development. Using the narratives, the forecasters’ expectations can be objectified by applying textual analysis methods to generate sentiment indices. The key issue is to analyse whether the forecasters’ narratives contain additional information beyond the quantified forecasts.

The evaluation of German and international business cycle forecasts has traditionally focused on the analysis of quantitative point forecasts. A large number of existing studies have examined the accuracy and efficiency of German macroeconomic forecasts (see e.g. Heilemann and Stekler 2013; Fritsche and Tarassow 2017; Döpke et al. 2019, and the literature cited therein). Prior research suggests three key insights. First, macroeconomic forecasts for Germany are (mostly) unbiased, but inefficient (see e.g. Döpke et al. 2010; Krüger and Hoss 2012). Second, forecast errors appear to be stable on average over decades, neither increasing nor decreasing in tendency (Heilemann and Stekler 2013). Third, no forecaster’s performance is uniformly superior (Döpke and Fritsche 2006), and there are no significant institutional differences in accuracy across a long time horizon (Döhrn and Schmidt 2011).^{Footnote 1}

Recently, another forecast evaluation approach, which uses qualitative text as data, has become increasingly popular. In this context, textual analysis methods are applied to convert qualitative text data into quantitative scores. The generated indices are used for forecast evaluation tests. Two major strands of the literature can be identified.

One strand is subsumed here under the term ‘elicited forecasts’, which was used by Jones et al. (2020). This concept applies a manual scoring procedure to quantify qualitative assessments about the future stance of the economy. Goldfarb et al. (2005) mapped newspaper articles published during the Great Depression into an index series using a scoring system to compare the quantified qualitative assessments with numerical forecasts and realized values. A series of forecast evaluation studies applied the scoring procedure of Goldfarb et al. (2005) in several contexts to generate and evaluate elicited forecasts (see e.g. Lundquist and Stekler 2012; Stekler and Symington 2016; Mathy and Stekler 2018). The recent analysis of Jones et al. (2020) investigates the Bank of England’s growth forecasts using elicited forecasts over the period 2005–2015. The more general research question as to whether the text contains additional information for the numerical forecasts is similar to this work. Jones et al. (2020) find that the economic development in the UK is accurately represented by the elicited forecasts. Moreover, regression results suggest informational content of the text index in the sense that it can improve the Bank of England’s numerical growth nowcasts and one-quarter-ahead forecasts.

A second strand of the literature uses computational text analysis methods to generate text-based sentiment indices. Clements and Reade (2020) and Sharpe et al. (2020) are two seminal related studies. The latter study applies computational text analysis to quantify the ‘tonality’ (the degree of optimism versus pessimism) of the Federal Reserve Board’s Greenbooks and examines whether this measure has predictive power for the economic development over the period 1972–2009. The investigation shows some predictive power of the Greenbook tonality on Greenbook numerical GDP growth and unemployment forecasts, as well as on private GDP forecasts. The latter point implies that the sentiment index also covered policy-relevant information (Sharpe et al. 2020). Clements and Reade (2020) analyse whether the narratives in the Bank of England’s Inflation Reports contain useful information about the future course of GDP growth and inflation between 1997 and 2018. Encompassing tests show some informational content for predicting GDP forecast errors for one and two quarters ahead, but no evidence that sentiment indices are useful to predict forecast revisions. Both studies use the dictionary-based approach to generate sentiment indices, and both studies show that ‘an important element of economic forecasting is in the accompanying narrative’ (Sharpe et al. 2020, p. 31).

Considering German forecasters’ narratives, Fritsche and Puckelwald (2018) analyse the topics of German business cycle forecast reports using generative models. The authors find that textual expressions vary with the business cycle, which is in line with the hypothesis of adaptive expectations. However, a number of questions regarding German forecasters’ narratives remain to be addressed.

There is a broader and growing literature in (computational) textual analysis in economics, finance, and accounting (see e.g. Loughran and McDonald 2016; Gentzkow et al. 2019, and the literature cited therein). The following examples give a selective overview of literature that is related to this paper. For example, Shapiro et al. (2020) for the US and Lamla et al. (2020) for Germany use textual analysis tools to create news media sentiment indicators. Both studies provide evidence for a correlation between news media sentiment indicators and the business cycle and show that sentiment indicators can serve as predictors of the future stance of the economy. Another strand of the literature concerns the predictability of stock market activity. Tetlock (2007), Tetlock et al. (2008) and Garcia (2013) use a dictionary-based approach to generate sentiment indices via news coverage. Loughran and McDonald (2011, 2016) developed a finance-specific dictionary to improve the forecasting performance relative to existing linguistic dictionaries. Jegadeesh and Wu (2013) and Manela and Moreira (2017) apply text regression methods to predict stock market outcomes, while Jegadeesh and Wu (2013) show that text regression-based sentiment indices are superior to sentiment indices based on the Loughran and McDonald (2011) dictionary in an out-of-sample forecast environment. The analysis of central bank communication is another topic in text mining. Jegadeesh and Wu (2017) find incremental information value in the Federal Open Market Committee meeting minutes. The authors use a generative model to quantify the tone and the topics of texts. Tillmann and Walter (2018) apply dictionary-based sentiment indices to analyse the tone of Bundesbank and ECB speeches. The authors find significant divergences between the tone of the two institutions. An additional topic is the measurement of policy uncertainty. Baker et al. (2016) developed the prominent economic policy uncertainty index (EPU) by analysing news coverage with a dictionary method. Using a (nonlinear) text regression method to construct an EPU for Belgium, Tobback et al. (2018) show that they have improved the predictive power of the EPU.

This paper makes several contributions to the literature on forecast evaluation and textual analysis. First, German forecasters’ narratives are analysed using textual analysis methods. Second, previous studies have almost exclusively focused on dictionary methods to generate sentiment indices. To the best of the author’s knowledge, this paper is the first in forecast evaluation to apply (linear) text regression approaches, and additionally, it uses a recursive estimation technique. Third, the paper tests why forecasters’ narratives have predictive power. Although recent studies discussed several explanatory hypotheses, the answer is still insufficiently explored.

The purpose of the paper is to analyse German forecasters’ narratives and the question as to whether the forecasters’ stories and expectations contain additional information relative to numerical forecasts. Based on 534 business cycle forecast reports covering 10 German institutions from 1993 to 2017, the paper creates sentiment indices using text mining techniques. Regression results suggest that some sentiment indices can reduce the absolute magnitude of the quantitative forecast errors for GDP growth and inflation forecasts. German forecasters’ narratives are informative for the accuracy of German business cycle forecasts. One explanation might be that forecasters’ narratives contain useful information about the future stance of the German economy. An in-sample and out-of-sample forecasting exercise tests whether the sentiment indices can predict the evolution of German economic activity. Forecasting results indicate weak in-sample and out-of-sample predictive power of the sentiment indices.

The following section explains the methodology used to convert qualitative text data into quantitative sentiment scores. Section 3 describes the employed text corpus and numerical data. Section 4 analyses the empirical results, and Sect. 5 concludes and discusses these results.

2 Methodology: sentiment analysis

There are various computational analysis methods to connect word counts to attributes to generate sentiment indices, e.g. dictionary-based methods, text regression methods, generative models, and word embeddings (Gentzkow et al. 2019). This paper uses dictionary-based methods and text regression methods to convert qualitative text data into quantitative indices.

Furthermore, qualitative measures can only be directly related to macro-variables, provided that they are appropriately scaled (Clements and Reade 2020, p. 1491). Hence, all weighted sentiment indices are standardized to have a mean equal to zero and a standard deviation equal to one. In order to avoid bias in the measure, all weighted sentiments are normalized by the total number of words per report to account for varying text lengths and numbers of documents per year (Fritsche and Puckelwald 2018).

2.1 Dictionary-based method

Following Clements and Reade (2020) and Sharpe et al. (2020), the dictionary-based method is applied to develop sentiment indices. In fact, three well-established linguistic dictionaries are used to generate five different indices.

First, the word list prepared by Bannier et al. (2018) is used. This is the German equivalent of the original English dictionary provided by Loughran and McDonald (2016). The latter word list is well established for textual analysis in finance- and accounting-specific contexts. The word list prepared by Bannier et al. (2018) includes over 2200 positive and 10,000 negative word forms. The dictionary is binary coded for polarity in positive and negative terms.
Second, there is a forecast-specific German dictionary based on Sharpe et al. (2020). According to Di Fatta et al. (2015), words have different connotations and meanings in different contexts, and sentiment indices have to be adapted to the content to which they are applied. To this end, Sharpe et al. (2020) developed a forecast-specific word list which excludes words that have special meanings in an economic forecasting context. The word list contains 205 positive and 103 negative words (see Tables 8, 9) and is binary coded like the previous one.
Finally, there is the SentimentWortschatz (SentiWS) dictionary (Remus et al. 2010). The SentiWS dictionary contains a German-specific word list for sentiment analysis. The current version (v2.0) contains about 16,000 positive and 18,000 negative word forms, and unlike the other two dictionaries, it includes weights for polarity within the interval of $[-1; 1]$.

Two different score systems will be applied for the two binary dictionary-based sentiments (hereinafter called ‘Bannier’ and ‘Sharpe’). Sentiment score number one consists of the difference between positive word count, P, and negative word count, N, normalized by the total number of words, T, per report:

$$\begin{aligned} \hbox {Sentiment \,score}_1 = (P - N) / T \end{aligned}$$

(1)

The second sentiment score (polarity score) is defined as the quotient of the difference between positive and negative word counts and the sum of positive and negative words:

$$\begin{aligned} \hbox {Sentiment \,score}_{2} = (P - N) / (P + N) \end{aligned}$$

(2)

In contrast, the SentiWS index is a continuous score. The polarity weights of all matched words are summed up and normalized by the total number of words per report.
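The three scoring rules can be illustrated with a minimal Python sketch. The word lists, weights, and tokens below are made up for illustration and are not entries of the Bannier, Sharpe, or SentiWS dictionaries; the function names are hypothetical as well.

```python
def binary_scores(tokens, positive, negative):
    """Return (score_1, score_2) as in Eqs. 1 and 2 for one tokenized report."""
    total = len(tokens)
    p = sum(1 for tok in tokens if tok in positive)  # positive word count P
    n = sum(1 for tok in tokens if tok in negative)  # negative word count N
    score_1 = (p - n) / total if total else 0.0      # (P - N) / T
    score_2 = (p - n) / (p + n) if (p + n) else 0.0  # (P - N) / (P + N)
    return score_1, score_2


def weighted_score(tokens, weights):
    """Continuous score: summed polarity weights divided by the word count."""
    if not tokens:
        return 0.0
    return sum(weights.get(tok, 0.0) for tok in tokens) / len(tokens)


# Toy inputs (assumptions, not taken from the actual dictionaries).
positive = {"aufschwung", "erholung"}
negative = {"rezession", "einbruch"}
weights = {"aufschwung": 0.5, "rezession": -0.75}
tokens = ["aufschwung", "setzt", "fort", "rezession",
          "erholung", "bleibt", "verhalten", "stabil"]

score_1, score_2 = binary_scores(tokens, positive, negative)
senti = weighted_score(tokens, weights)
```

With two positive and one negative hit among eight tokens, the first score relates the net count to the report length, while the polarity score relates it only to the matched words.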

2.2 Automatic variable selection approach

The automatic variable selection approach is a promising text regression method to generate regression-based sentiment indices (e.g. Pröllochs et al. 2018). In contrast to the dictionary-based method, here the required dictionary is not given and will be recursively estimated. In fact, the estimated parameters will be updated by expanding the estimation windows by one observation in chronological order (see Sect. 2.3). Generally, text regression methods introduce a regularization penalty that reduces the complexity, number, and size of the predictors included in the model. Penalized linear models use each word in the text corpus as explanatory variables, shrink non-informative noise variables to zero, and select decisive variables (Pröllochs et al. 2015).

Regularization methods can serve as mathematical mechanisms to extract important terms, which is why they are a common tool for variable selection in data science (Pröllochs et al. 2018; Varian 2014). Given a standard multivariate regression with y (dependent variable) as a linear function of $\beta _0$ (constant) and $x_j$ (explanatory variables), a penalty term of the form:

$$\begin{aligned} \lambda \sum _{j=1}^{P} \left[ (1-\alpha ) \vert \beta _{j} \vert + \alpha \vert \beta _{j}^2 \vert \right] \end{aligned}$$

(3)

can be added (Varian 2014). Setting $\alpha = 0$, the term Eq. 3 reduces to the linear $l_1$-norm penalty $\lambda \sum _{j=1}^{P} \vert \beta _{j} \vert $, which represents the least absolute shrinkage and selection operator (LASSO) introduced by Tibshirani (1996). Formally, the LASSO estimator is given by (Pröllochs et al. 2015):

$$\begin{aligned} {\hat{\beta }}_\mathrm{LASSO} = {{\,\mathrm{arg\,min}\,}}_\beta \sum _{i=1}^{N} \left[ y_i - \beta _0 - \sum _{j=1}^{P} \beta _j x_{ij} \right] ^2 + \lambda \sum _{j=1}^{P} \vert \beta _{j} \vert \end{aligned}$$

(4)

where $x_{ij}$ are the document terms (words $j = 1, \ldots , P$) for forecast report $i = 1, \ldots , N$, and $y_i$ represents the 12-month-ahead fixed horizon growth and inflation forecasts as response variables. If $\lambda = 0$, the penalty vanishes, and we obtain the classical OLS estimator by simply minimizing the residual sum of squares. The higher $\lambda $, the stronger the shrinkage, with the result that more coefficients end up being zero. The optimal $\lambda ^*$ is estimated by minimizing the mean squared error (MSE) (Dimpfl and Kleiman 2019):

$$\begin{aligned} \hbox {MSE}_\mathrm{CV} (\lambda ) = \frac{1}{K} \sum _{i=1}^{K} \frac{1}{n_{i}} \vert \vert y_{i}-X_{i}{\hat{\beta }}_\mathrm{LASSO}^{-i} \vert \vert _{2}^{2} \end{aligned}$$

(5)

using an established 10-fold cross-validation, where $n_i$ is the size of the ith subsample. To this end, the data are split into K subsets, one part i is removed, the coefficients ${\hat{\beta }}_\mathrm{LASSO}^{-i}$ are estimated on the remaining data, and the cross-validated $\hbox {MSE}_\mathrm{CV} (\lambda )$ is calculated for any given value of $\lambda $.
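The interplay of Eq. 4 and Eq. 5 can be sketched in plain Python. This is an illustrative sketch under stated assumptions, not the estimation code behind the reported results: it minimizes the LASSO objective by cyclic coordinate descent (a standard solver choice assumed here), uses simulated toy data instead of the text corpus, and searches a small hypothetical grid of $\lambda $ values.

```python
import random


def soft_threshold(rho, t):
    """Shrink rho towards zero by t and return exactly 0.0 inside [-t, t]."""
    if rho > t:
        return rho - t
    if rho < -t:
        return rho + t
    return 0.0


def fit_lasso(X, y, lam, n_iter=100):
    """Minimize sum_i (y_i - b0 - sum_j b_j x_ij)^2 + lam * sum_j |b_j|."""
    n, p = len(X), len(X[0])
    beta0 = sum(y) / n
    beta = [0.0] * p
    resid = [yi - beta0 for yi in y]  # current residuals y - b0 - X b
    col_ss = [sum(row[j] ** 2 for row in X) for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_ss[j] == 0.0:
                continue
            # Correlation of word j with the residual that excludes word j.
            rho = sum(X[i][j] * resid[i] for i in range(n)) + beta[j] * col_ss[j]
            new_bj = soft_threshold(rho, lam / 2.0) / col_ss[j]
            delta = new_bj - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= delta * X[i][j]
                beta[j] = new_bj
        # The intercept is not penalized: shift it by the mean residual.
        shift = sum(resid) / n
        beta0 += shift
        resid = [r - shift for r in resid]
    return beta0, beta


def cv_mse(X, y, lam, k=10, seed=0):
    """Cross-validated MSE as in Eq. 5: average of the K held-out fold MSEs."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[f::k] for f in range(k)]
    total = 0.0
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        b0, b = fit_lasso([X[i] for i in train], [y[i] for i in train], lam)
        sq_err = 0.0
        for i in fold:
            pred = b0 + sum(bj * xj for bj, xj in zip(b, X[i]))
            sq_err += (y[i] - pred) ** 2
        total += sq_err / len(fold)
    return total / k


# Simulated toy data: 40 "reports", 5 "words", only the first two matter.
rng = random.Random(42)
X = [[rng.gauss(0.0, 1.0) for _ in range(5)] for _ in range(40)]
y = [1.0 + 2.0 * row[0] - 1.5 * row[1] + rng.gauss(0.0, 0.1) for row in X]

grid = [0.0, 0.5, 2.0, 8.0, 32.0]  # hypothetical candidate values
best_lam = min(grid, key=lambda lam: cv_mse(X, y, lam))
beta0, beta = fit_lasso(X, y, best_lam)
```

A very large penalty drives every slope to exactly zero, which is the variable selection property that turns the estimated coefficients into a dictionary.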

In contrast, setting $\alpha = 1$ shortens the term Eq. 3 to the quadratic $l_2$-norm penalty $\lambda \sum _{j=1}^{P} \beta _{j}^2$, and the ridge estimator is implemented (Pröllochs et al. 2015):

$$\begin{aligned} {\hat{\beta }}_\mathrm{Ridge} = {{\,\mathrm{arg\,min}\,}}_\beta \sum _{i=1}^{N} \left[ y_i - \beta _0 - \sum _{j=1}^{P} \beta _j x_{ij} \right] ^2 + \lambda \sum _{j=1}^{P} \beta _{j}^2 \end{aligned}$$

(6)

Again, the tuning parameter $\lambda $ controls the strength of the regularization penalty. The quadratic $l_2$-norm penalty behaves similarly to the LASSO penalty: if $\lambda $ reaches zero, we get the OLS coefficients; if $\lambda $ moves towards infinity, the coefficients shrink towards zero. However, in contrast to the LASSO regularization, the ridge estimator does not explicitly set any coefficients equal to zero (Pröllochs et al. 2015).^{Footnote 2} Again, the optimal $\lambda ^*$ is estimated by minimizing the MSE using 10-fold cross-validation.
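The difference between the two penalties is easiest to see in a textbook special case that is not part of the paper's setup: assume centred data without an intercept and orthonormal regressors, so that $\sum _{i} x_{ij} x_{ik}$ equals one for $j = k$ and zero otherwise. Under these assumptions, the objectives in Eqs. 4 and 6 separate across coefficients and the estimators have closed forms in terms of the OLS coefficients ${\hat{\beta }}_{j}^\mathrm{OLS}$:

$$\begin{aligned} {\hat{\beta }}_{j}^\mathrm{LASSO} = \mathrm{sign} \left( {\hat{\beta }}_{j}^\mathrm{OLS} \right) \max \left( \vert {\hat{\beta }}_{j}^\mathrm{OLS} \vert - \frac{\lambda }{2}, 0 \right) , \qquad {\hat{\beta }}_{j}^\mathrm{Ridge} = \frac{{\hat{\beta }}_{j}^\mathrm{OLS}}{1 + \lambda } \end{aligned}$$

LASSO subtracts a fixed amount and therefore sets every coefficient with $\vert {\hat{\beta }}_{j}^\mathrm{OLS} \vert \le \lambda / 2$ exactly to zero, whereas ridge rescales all coefficients proportionally and leaves every non-zero coefficient non-zero for any finite $\lambda $.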

Equations 4 and 6 are used to estimate the LASSO and ridge regression coefficients ${\hat{\beta }}_\mathrm{LASSO}$ and ${\hat{\beta }}_\mathrm{Ridge}$. The magnitude of ${\hat{\beta }}_\mathrm{LASSO}$ and ${\hat{\beta }}_\mathrm{Ridge}$ serves as the weight and as a measure of variable importance, specifying which variables (words) are included in the final dictionary (Pröllochs et al. 2015). A linear rule is then applied to calculate the sentiment score of the ith document. Again, the document’s score is defined as the continuous score normalized by the total number of words.

2.3 Recursive estimation

In order to guarantee that the tests for forecast efficiency and predictive power use no information that was (hypothetically) unknown to the forecaster at time t, a recursive estimation technique is applied to the sentiment indices based on the automated variable selection approach. First, a sufficiently large text corpus is generated as a basis (pre-estimation corpus) using business cycle forecast reports from the period 1993–1998, including 74 observations. Second, based on the pre-estimation corpus, a recursive estimation approach is applied, expanding the estimation window by one observation per estimation in chronological order. The following procedure is executed in each recursive estimation step: first, the extended text corpus is established and weighted; second, the optimal $\lambda ^*$ is estimated by minimizing the MSE using 10-fold cross-validation; third, the LASSO and ridge estimators (Eqs. 4, 6) are used to estimate the respective dictionaries and weights (${\hat{\beta }}_\mathrm{LASSO}$ and ${\hat{\beta }}_\mathrm{Ridge}$); finally, the respective sentiment (document) score is calculated and stored in a common series.
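The expanding-window logic can be sketched as follows. The sketch rests on two assumptions that the description leaves open: each window includes the newest report, and the stored score is the one for that newest report. The learner `toy_fit` is a deliberately simple stand-in (it is not LASSO or ridge), and all names and data are hypothetical.

```python
def recursive_scores(documents, targets, fit, score, n_pre):
    """Expanding-window scores for chronologically ordered reports.

    The first n_pre reports form the pre-estimation corpus. Each later report
    is scored with a dictionary estimated only on reports up to and including
    that report (assumption), so no later information enters the score.
    """
    scores = []
    for t in range(n_pre + 1, len(documents) + 1):
        dictionary = fit(documents[:t], targets[:t])
        scores.append(score(documents[t - 1], dictionary))
    return scores


def toy_fit(docs, targets):
    """Stand-in learner: word weight = mean target of the reports containing
    the word minus the overall mean target."""
    overall = sum(targets) / len(targets)
    sums, counts = {}, {}
    for tokens, target in zip(docs, targets):
        for word in set(tokens):
            sums[word] = sums.get(word, 0.0) + target
            counts[word] = counts.get(word, 0) + 1
    return {word: sums[word] / counts[word] - overall for word in sums}


def toy_score(tokens, dictionary):
    """Linear rule: summed word weights normalized by the number of words."""
    return sum(dictionary.get(word, 0.0) for word in tokens) / len(tokens)


# Toy corpus in chronological order with made-up forecast values.
docs = [["aufschwung", "export"], ["rezession", "export"],
        ["aufschwung", "konsum"], ["rezession", "konsum"],
        ["aufschwung", "export"]]
targets = [2.0, -1.0, 1.5, -0.5, 1.8]

scores = recursive_scores(docs, targets, toy_fit, toy_score, n_pre=2)
```

Appending further reports to the corpus leaves the earlier scores unchanged, which is the property the recursive design is meant to guarantee.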

3 Corpus and data

3.1 The text corpus

The plain corpus includes business cycle forecast reports for Germany issued by 10 institutions with different institutional backgrounds. First, the corpus covers the six largest economic research institutes in Germany that are formally politically and economically independent. These comprise the five publicly funded institutes, the Ifo Institute Munich (Ifo), the Berlin Institute (DIW), the Essen Institute (RWI), the Halle Institute (IWH), the Kiel Institute (IfW), and the privately funded Hamburg Institute (HWWI).^{Footnote 3} Second, the corpus contains institutes that are funded by interest groups: the employer’s institute of the German economy located in Cologne (IW Köln), and the trade union’s macroeconomic policy institute (IMK). Third, the corpus includes the ‘joint diagnosis’ (GD), the economic projection of the leading research institutes as an institution within the process of economic policy advice. Fourth, the corpus covers a financial institution, the Bundesbank. The German central bank is another formally politically and economically independent public institution.

The entire corpus contains 534 documents.^{Footnote 4} A wider range of business cycle forecast reports for Germany exists, but the reports of other institutions did not meet the defined criteria. For the selection, a range of criteria was checked:

Business cycle forecast (sub-)section: Business cycle forecast reports are heterogeneous in size and content. Some reports are structured into different subsections like recent national or international economic development, business cycle forecasts, economic policy advice, or methodological explanations. Other reports are miscellaneous texts of various themes and cannot be split in a meaningful way. Therefore, business cycle reports should contain a clearly defined forecast (sub-)section.
Time range: The corpus covers business cycle forecast reports for Germany from 1993 to 2017 to circumvent the German reunification and possible misspecification for East and West Germany.
Forecasters’ experiences: Continuity and regularity of publication within the examined period ensure forecasters’ experience in the field of economic forecasting and thus a sufficient level of homogeneity in language across institutes.
Language homogeneity: The (relatively short) period of 25 years as well as forecasters’ experience ensure a sufficient degree of homogeneity in language over time.
Quantitative forecast availability: To use a comparable sample for growth and inflation forecast analysis, only business cycle forecast reports with a calculable fixed horizon forecast for growth and inflation are used. The availability of numerical point forecasts of growth and inflation for the current and next year restricts the number of incorporated forecast reports (see Sect. 3.2).
Forecasting date: The forecasting date is distributed over the whole year, depending on the respective institutional practice and the frequency of publication. In most cases, the frequency of publication is bi-annual or higher.
Text availability: Another criterion was the public availability of business cycle forecast reports, which is why private institutions like banks are not included.

As a result, 534 business cycle forecast reports for Germany issued by 10 institutions are used for the creation of the corpus. In the first step of the textual analysis, data cleaning and linguistic pre-processing are applied to all texts: line breaks, numbers, and words with fewer than four characters are eliminated; all text is converted to lower case; and stopwords (e.g. from German linguistic stopword lists or names) as well as sparse terms, i.e. words that occur in less than 10% of documents, are removed. With reference to Zipf’s law (Zipf 1949), the texts are weighted with their term frequency–inverse document frequency (tf-idf).^{Footnote 5} Zipf’s law for empirical language implies that a word’s frequency is inversely proportional to its rank. Consequently, the corpus is adjusted for this pattern. Figure 1 shows the wordcloud of the weighted corpus. The wordcloud sorts terms by frequency in descending order. The larger the word, the more often the term occurs. The wordcloud shows that the weighted corpus includes a lot of important forecast-specific vocabulary, for example ‘Anstieg’ (growth), ‘Prognose’ (forecast), and ‘Exporte’ (exports).^{Footnote 6}

Finally, Porter’s stemming algorithm (Porter et al. 1980) is used to truncate the different word forms to their base forms.^{Footnote 7}
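The pre-processing chain can be sketched in plain Python under stated assumptions. The tf-idf variant below (relative term frequency times the logarithm of the inverse document share) is one common definition and may differ from the variant used for the corpus; the stopword list and example sentences are made up; stemming is omitted because the standard library has no stemmer.

```python
import math
import re
from collections import Counter


def tokenize(text, stopwords, min_len=4):
    """Lower-case the text, keep alphabetic words only (this drops numbers),
    and remove short words and stopwords."""
    words = re.findall(r"[a-zäöüß]+", text.lower())
    return [w for w in words if len(w) >= min_len and w not in stopwords]


def prune_sparse(docs, min_share=0.10):
    """Drop terms that occur in fewer than min_share of all documents."""
    doc_freq = Counter(w for doc in docs for w in set(doc))
    keep = {w for w, df in doc_freq.items() if df / len(docs) >= min_share}
    return [[w for w in doc if w in keep] for doc in docs]


def tf_idf(docs):
    """Weight each term by tf * log(N / df) (assumed variant)."""
    n_docs = len(docs)
    doc_freq = Counter(w for doc in docs for w in set(doc))
    weighted = []
    for doc in docs:
        counts = Counter(doc)
        weighted.append({w: (c / len(doc)) * math.log(n_docs / doc_freq[w])
                         for w, c in counts.items()})
    return weighted


# Toy texts and stopwords (assumptions for illustration only).
stopwords = {"sich", "fort", "bleibt"}
texts = ["Die Prognose für 2016: Der Anstieg der Exporte setzt sich fort.",
         "Die Prognose bleibt unverändert, die Exporte sinken.",
         "Der Konsum stützt die Prognose."]

docs = [tokenize(text, stopwords) for text in texts]
weighted = tf_idf(docs)
pruned = prune_sparse(docs, min_share=0.5)
```

A term that appears in every document, such as ‘prognose’ in the toy texts, receives a tf-idf weight of zero, which is how the weighting offsets the dominance of very frequent words.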

3.2 The sample

The incorporated business cycle forecast reports for Germany typically contain numerical fixed event forecasts of growth and inflation for the current and next year. Depending on the forecast date, the forecast horizon of fixed event forecasts varies from one up to 11 months. Heilemann and Müller (2018) show in a forecast evaluation study for Germany that forecast accuracy decreases with increasing forecast horizon, and that differences in forecast accuracy are mainly determined by the different timings of the production of the forecasts.^{Footnote 8}

Furthermore, uncertainty and cross-sectional dispersion of fixed event forecasts show a pronounced seasonal pattern (Dovern et al. 2012). Consequently, fixed horizon forecasts are used to reduce different forecast horizons within one quarter. The method of Dovern and Fritsche (2008), Heppke-Falk and Hüfner (2004) and Smant (2002)

$$\begin{aligned} {\hat{y}}^{12}_{i,t} = \frac{4-q+1}{4}{\tilde{y}}^{0}_{i,t} + \frac{q-1}{4}{\tilde{y}}^{1}_{i,t} \end{aligned}$$

(7)

is applied to construct 12-month-ahead fixed horizon forecasts for growth and inflation. Here, ${\tilde{y}}^{0}_{i,t}$ and ${\tilde{y}}^{1}_{i,t}$ denote the fixed event forecasts for the current and next year, and q is the quarter in which the forecast is made. The fixed horizon forecast is thus approximated as a weighted average of the two fixed event forecasts, with weights given by the share of the 12-month horizon that falls in each year. For example, considering the forecasts of the Berlin institute from September 2015, ${\tilde{y}}^{0}_{i,t} = 1.8$ and ${\tilde{y}}^{1}_{i,t} = 1.9$, q is equal to three and the 12-month-ahead fixed horizon forecast is ${\hat{y}}^{12}_{i,t} = 1.85$.
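Equation 7 translates directly into a small function; the following sketch (function and variable names are hypothetical) reproduces the Berlin institute example from the text.

```python
def fixed_horizon_forecast(current_year, next_year, quarter):
    """12-month-ahead fixed horizon forecast from two fixed event forecasts (Eq. 7)."""
    if quarter not in (1, 2, 3, 4):
        raise ValueError("quarter must be 1, 2, 3 or 4")
    weight_current = (4 - quarter + 1) / 4  # share attributed to the current year
    weight_next = (quarter - 1) / 4         # share attributed to the next year
    return weight_current * current_year + weight_next * next_year


# September 2015 forecasts: 1.8 for the current year, 1.9 for the next year, q = 3.
berlin_2015 = fixed_horizon_forecast(1.8, 1.9, quarter=3)
```

For a forecast made in the first quarter, the weight of the next-year forecast is zero, so the fixed horizon forecast coincides with the current-year forecast.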

Moreover, forecast narratives cannot distinguish between different forecast horizons within a quantitative textual analysis. All in all, nine different sentiment indices will be calculated for each forecasting report at time t.

Figure 2 depicts the different forecast horizons and the construction of the 12-month-ahead fixed horizon forecast and sentiment indices using a forecast report of the German Institute for Economic Research (DIW Berlin).

In addition, seasonally adjusted and finally revised real GDP is used for realized GDP growth (quarterly data, source Federal Statistical Office 2019b). Finally, the revised consumer price index is used for the actual inflation outcome (monthly data, source Federal Statistical Office 2019a).^{Footnote 9} Dovern et al. (2012) point out that the approximation error in the fixed horizon series in Eq. 7 could result in a correlation if the dependent variable and the regressors are constructed in the same way. To avoid this, the annualized cumulative percentage change from past quarter $t-h$ to current quarter t is used for the realized values. Thus, $h=4$ denotes the forecasting horizon in quarters based on the 12-month-ahead fixed horizon forecasts.

The forecast error is defined as $e_{t} = A_{t} - P_{t}$, i.e. the realized value in period t minus the forecast made in period $t-j$. Hence, a positive forecast error represents an underestimation of the growth (inflation) rate, whereas a negative forecast error corresponds to an overestimation.

Table 1 Descriptive statistics on forecast accuracy in Germany, 1993–2017

Table 1 provides an overview of some standard error measures of forecast evaluation (see, for example, Fildes and Stekler 2002) for the pooled data of the introduced sample. On the whole, the error measures correspond to previous forecast evaluation studies for Germany (Heilemann and Stekler 2013; Döpke et al. 2019). The mean error (ME) is nearly zero, indicating unbiased forecasts. The mean absolute error (MAE) and root mean squared error (RMSE) of the growth forecasts are on average large compared to Heilemann and Stekler (2013) and Döpke et al. (2019) due to the forecasting error in the Great Recession 2008/2009.^{Footnote 10}
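For reference, the three error measures can be computed as in the following sketch, using the error definition $e_{t} = A_{t} - P_{t}$ from above. The numbers are invented for illustration and are not values from Table 1.

```python
import math


def error_measures(actual, predicted):
    """Return (ME, MAE, RMSE) for forecast errors e_t = A_t - P_t."""
    errors = [a - p for a, p in zip(actual, predicted)]
    n = len(errors)
    me = sum(errors) / n                              # mean error (bias)
    mae = sum(abs(e) for e in errors) / n             # mean absolute error
    rmse = math.sqrt(sum(e * e for e in errors) / n)  # root mean squared error
    return me, mae, rmse


# Invented realized values and forecasts (percent).
actual = [2.0, 1.0, -1.0, 3.0]
predicted = [1.5, 1.5, 1.0, 2.0]
me, mae, rmse = error_measures(actual, predicted)
```

The single large miss in the third observation raises the RMSE more than the MAE, which mirrors the way one severe recession can dominate squared-error measures.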

Considering the ability to forecast turning points, three directional analysis measures are included. First, referring to Diebold and Lopez (1996, p. 28) and Merton (1981), the information content of a forecast series is calculated.^{Footnote 11} The forecasts beat a pure coin-flip if the information content has a value above one. Second, a $\chi ^2$-test checks whether the forecasts are significantly better than chance, testing the null hypothesis of no information content of the forecasts under investigation. The results indicate that both growth and inflation forecasts have significant information content at conventional significance levels. Third, the area under the receiver operating characteristic curve (AUROC), a frequently used measure of the quality of directional forecasts (see, e.g. Berge and Jordà 2011; Pierdzioch and Rülke 2015; Liu and Moench 2016), is calculated. An AUROC $ < 0.5 $ indicates that forecasts are even worse than a pure coin-flip, and an AUROC $ = 0.5 $ that forecasts are indistinguishable from a pure coin-flip because the ROC curve coincides with the $45^{\circ }$ line. An AUROC between 0.5 and 1 beats the coin-flip, whereas an AUROC $ = 1 $ represents perfect forecasts. Considering the AUROC for growth and inflation forecasts, both again beat a pure coin-flip, indicating some directional accuracy.
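The AUROC benchmarks can be made concrete with a short sketch. It uses the rank-based (Mann-Whitney) formulation, in which the AUROC equals the share of pairs of one ‘up’ and one ‘down’ period where the ‘up’ period received the higher signal. How signals and binary outcomes are defined for the reported results is not specified here, so the inputs below are purely illustrative.

```python
def auroc(signals, outcomes):
    """Area under the ROC curve via pairwise comparison; ties count one half."""
    up = [s for s, o in zip(signals, outcomes) if o]
    down = [s for s, o in zip(signals, outcomes) if not o]
    if not up or not down:
        raise ValueError("both outcome classes must be present")
    wins = sum(1.0 if u > d else 0.5 if u == d else 0.0
               for u in up for d in down)
    return wins / (len(up) * len(down))


outcomes = [1, 0, 1, 0, 1]                                # 1 = 'up' period (invented)
perfect = auroc([0.9, 0.4, 0.7, 0.2, 0.5], outcomes)      # every 'up' ranked above every 'down'
mixed = auroc([0.9, 0.4, 0.3, 0.2, 0.5], outcomes)        # one pair ranked the wrong way
coin_flip = auroc([0.5, 0.5, 0.5, 0.5, 0.5], outcomes)    # uninformative constant signal
```

The three cases correspond to the benchmarks in the text: a perfect ranking, a ranking that beats the coin-flip, and a signal that is indistinguishable from it.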

4 Empirical results

4.1 Sentiments’ characteristics

Table 2 gives an overview of sentiment characteristics.

Table 2 Overview dictionaries metrics

Considering dictionary metrics such as the number of positive and negative entries and standard statistical measures, Table 2 shows how differently the individual sentiment approaches work. The ridge estimation results show that the ridge estimator does not explicitly set any coefficients equal to zero. Consequently, the ridge approach selects many more words than its LASSO counterpart.

Tables 10, 11 and 12 list, as a full-sample example, the (stemmed) dictionaries and weights generated by the automated variable selection approach. Table 10 shows the estimated 71 words and their coefficients according to LASSO regression with real GDP growth forecasts as the response variable (hereinafter ‘LASSO_GDP_P’). The term with the most positive weight is ‘upswing’ (‘Aufschwung’), which in German is also a synonym for ‘boom’ or ‘recovery’, whereas ‘drastic’ (‘drastisch’) is the word with the most negative coefficient. The list of plausible words and weights with respect to GDP development is long, e.g. ‘export dynamic’ (‘Exportdynamik’), ‘continuation’ (‘Fortsetzung’), ‘lively’ (‘schwungvoll’) with positive coefficients, or ‘deep’ (‘tief’), ‘layoffs’ (‘Entlassungen’), and ‘shrink’ (‘schrumpfen’) with negative coefficients. Nevertheless, the list contains a few outliers whose economic sense is not immediately clear, e.g. ‘a third’ (‘drittel’), or where the words have a non-intuitive weight, such as ‘recover’ (‘erholen’).^{Footnote 12}

Similar patterns can be observed in other text regression-based dictionaries. Table 11 lists the estimated 69 words and weights according to LASSO regressions, with inflation forecasts as the response variable (hereinafter ‘LASSO_INF_P’). Table 12 lists ridge regression results for real GDP growth forecasts (hereinafter ‘Ridge_GDP_P’) and inflation forecasts (hereinafter ‘Ridge_INF_P’). Both tables list the top 30 estimated words with the largest positive and negative coefficients.

Figures 3 and 4 give a visual impression of the generated sentiment indices. The figures illustrate the sentiment values per business cycle forecast report, aggregated over years and across institutes, in combination with the realized real GDP growth or inflation rate, respectively. Panels (a) to (i) present, for each sentiment specification, the aggregate sentiment value per year on the left axis (solid line) and the realized value of GDP growth or inflation, respectively, on the right axis (dashed line).

Considering each of the panels from (a) to (i) separately, we can conclude that each sentiment specification varies in its pattern. Concerning, for instance, the Great Recession in 2008–09, it can be seen that some sentiment indices are closer to the real development, e.g. LASSO_GDP forecast in Fig. 3, whereas some sentiment indices have a longer time lag, e.g. Sharpe1 in Fig. 3. Other sentiment indices are even ahead of the real development, e.g. Sharpe2 in Fig. 4. Still other indices show a (partly) countercyclical behaviour. For example, Bannier1 and Bannier2 in Fig. 4 behave countercyclically, which could be explained by a very long time lag or an opposite polarity of terms.

In summary, the generated sentiment indices differ across patterns and in amplitude, as well as in terms of time lag and lead.

4.2 Forecast efficiency

Forecast efficiency analysis is used to test whether the narratives of German business cycle reports contain useful information for the numerical forecasts of German forecasters. More precisely, we test whether the sentiment indices can be used to improve the accuracy of the quantitative point forecasts. In particular, we test for weak and strong efficiency of forecasts by using the specification of Holden and Peel (1990):

$$\begin{aligned} e_{i,t} = \beta _{0,i} + \beta _1 e_{i,t-1} + \beta _2 \hbox {Sentiment}_{i,t} + u_{i,t}, \end{aligned}$$

(8)

and test the joint null hypothesis $H_0 : \beta _{0,i} = \beta _1 = \beta _2 = 0$.

In Eq. 8, $e_{i,t}$ is the forecast error of forecaster i at time t, $\beta _{0,i}$ is institution i’s individual effect, $e_{i,t-1}$ is the institution’s forecast error made in $t-1$, $\hbox {Sentiment}_{i,t}$ is the forecaster’s sentiment index at time t, an exogenous variable which is known by the forecasters on the forecasting date, and $u_{i,t}$ is the error term. Forecasts are weakly efficient if the forecast errors are not autocorrelated, and forecasts are strongly efficient if there is no variable that helps to predict the forecast errors, including the lagged forecast error. Optimal forecasts should consider all available information at the date of the forecast. A fixed effects estimation approach is used to account for individual institutional effects, such as different forecast horizons during the quarter. According to Gaibulloev et al. (2014), panel-corrected standard errors (PCSE) suggested by Beck and Katz (1995) are reliable for panels with $T>N$ to deal with unit heterogeneity and panel heteroscedasticity. The standard test statistics are reliable and the Nickell bias (Nickell 1981) is negligible (see Gaibulloev et al. 2014, and the literature cited therein).^{Footnote 13} Estimates are corrected for serial and cross-sectional correlation. Comparable forecast evaluation studies have used this kind of robust standard errors (see, among others, Keane and Runkle 1990; Kauder et al. 2017; Döpke et al. 2019).
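The point estimates of a regression like Eq. 8 can be obtained with the within (fixed-effects) transformation. The following is a minimal sketch on simulated data, assuming hypothetical values for the number of institutions, the number of periods, and the true coefficients; it computes coefficients only and does not implement the panel-corrected standard errors or the joint efficiency test used in the paper.

```python
import numpy as np

def within_ols(y, X, groups):
    """Fixed-effects (within) estimator: demean y and X by group, then run OLS."""
    y_dm = y.astype(float).copy()
    X_dm = X.astype(float).copy()
    for g in np.unique(groups):
        mask = groups == g
        y_dm[mask] -= y_dm[mask].mean()
        X_dm[mask] -= X_dm[mask].mean(axis=0)
    coef, *_ = np.linalg.lstsq(X_dm, y_dm, rcond=None)
    return coef

# Hypothetical panel: 10 institutions, 40 forecast dates each
rng = np.random.default_rng(1)
n_inst, n_periods = 10, 40
beta1_true, beta2_true = 0.3, -0.5  # assumed values, not estimates from the paper

rows = []
for i in range(n_inst):
    alpha_i = rng.normal()                  # institution-specific effect
    sentiment = rng.normal(size=n_periods)  # sentiment index of institution i
    e_prev = 0.0
    for t in range(n_periods):
        e_now = (alpha_i + beta1_true * e_prev
                 + beta2_true * sentiment[t] + rng.normal(scale=0.5))
        if t > 0:  # the first period has no observed lagged forecast error
            rows.append((i, e_now, e_prev, sentiment[t]))
        e_prev = e_now

data = np.array(rows)
groups = data[:, 0].astype(int)
beta1_hat, beta2_hat = within_ols(data[:, 1], data[:, 2:4], groups)
print(beta1_hat, beta2_hat)
```

Under strong efficiency both slope coefficients would be zero; in this simulated example they are non-zero by construction, so the estimates recover a lagged-error effect and a sentiment effect.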

Table 3 Tests for efficiency of forecasts—1999-2017

Table 3 presents the estimated parameters and the standard errors (in parentheses) of the individual coefficients and the p-value [in brackets] for the joint efficiency test. In almost all cases, the weak efficiency condition of no serial correlation of the forecast errors has to be rejected for GDP growth forecasts. Moreover, test results with sentiment indices indicate several significant influences of forecasters’ narratives on forecast accuracy. For both Sharpe sentiment indices, as well as for all text regression-based sentiment indices, the null of no correlation has to be rejected at a conventional significance level. The negative coefficients indicate that a higher sentiment value correlates with a higher GDP prediction in that smaller (or negative) forecast errors imply higher forecast values. In addition, all specifications reject the joint test on efficiency. However, it is not clear whether the autocorrelated forecast error or the sentiment indices are the reason for the rejection of the joint tests.

Considering inflation forecasts, again, the lagged forecast error has generally a significant influence on the forecast error of the following period, at a conventional significance level. Moreover, we find some hints for explanatory power of the narratives on the numerical point forecast errors. Sharpe2 and the LASSO, as well as the ridge sentiment with inflation forecast as response variable, are significantly correlated with the forecast error. Both text regression-based sentiment indices are the only two out of nine specifications that also reject the joint efficiency hypothesis without having autocorrelated errors. The varying signs of sentiment indices’ coefficients indicate sentiment indices with different polarity. Thus, rising inflation, e.g. the word ‘inflation’, could have both positive and negative weights, depending on the given dictionary (dictionary-based methods) and the used response variable (text regression methods).

The efficiency test results suggest that forecasters’ narratives have informational power for the forecast errors at the time when the forecasts were made, implying that the numerical forecasts do not make efficient use of all available information. Previous studies (e.g. Döpke et al. 2010, 2019) confirm that forecasts for Germany are not strongly (and in part not weakly) efficient because they do not incorporate all available information. However, these studies do not test the forecasters’ own narratives. Sentiment indices, based on business cycle forecast reports, seem informative for the accuracy of German business cycle forecasts.^{Footnote 14} Thus, forecasters’ narratives contain information which is not exhausted by numerical forecasts. One explanation might be that the forecasters’ narratives contain useful information about the future stance of the German economy.

4.3 Predictive power

To test whether the narratives of German business cycle forecast reports contain useful information for the future stance of the German economy, the paper applies an in-sample and an out-of-sample forecast exercise.

4.3.1 In-sample forecasting regressions

Following Estrella and Hardouvelis (1991), Stock and Watson (2003) and Ferreira (2018), single forecasting equations are used to predict actual GDP growth and the rate of inflation. The in-sample and (pseudo) out-of-sample forecasting exercise tests whether text-based sentiment indices have predictive power for actual GDP growth and inflation. Similar methods were used to find predictors of economic activity (Estrella and Hardouvelis 1991) or predictors of business cycle fluctuations (Ferreira 2018). To this end, the sentiment indices are transformed by averaging all observations per quarter to build quarterly time series as explanatory variables. Hence, we get a quarterly time series with 100 observations from 1993Q1 to 2017Q4. The dependent variable in the basic forecasting regression is the annualized cumulative percentage change in real GDP or in the consumer price index, respectively. Following Estrella and Hardouvelis (1991) and Stock and Watson (2003), it is defined as:

$$\begin{aligned} {\hat{Y}}_{t|t+h} = (400/{h}) [\hbox {ln} ({Y_{t+h}}/{Y_t}) ] \end{aligned}$$

(9)

where $Y_t$ and $Y_{t+h}$ denote the level of real GDP (consumer price index) in period t and $t+h$, ${\hat{Y}}_{t|t+h}$ is the annualized cumulative percentage change from current quarter t to future quarter $t+h$, and $h=4$ denotes the forecasting horizon in quarters. The single forecasting equation follows Ferreira (2018):

$$\begin{aligned} {\hat{Y}}_{t|t+h} = \alpha + \underbrace{\sum _{i=1}^p \rho _{i} {\hat{Y}}_{t-i}}_{\text {Lag. endog. var.}} + \underbrace{\sum _{j=0}^q \beta _{j} \hbox {SI}_{t-j}}_{\text {Sentiment indices}} + \underbrace{\sum _{m=1}^3 \sum _{j=0}^q \gamma _{j}^m \text {IN(m)}_{t-j}}_{\text {Control variables}} +\,\, \epsilon _{t+h} \end{aligned}$$

(10)

where $\hbox {SI}_t$ denotes the respective sentiment index, and $\text {IN(m)}$ represents German leading indicators as control variables. The control variables are also standardized by subtracting the mean from each variable and dividing it by its standard deviation. The forecast horizon h is set to four quarters to capture the annualized cumulative percentage change of GDP growth (${\hat{Y}}_{t|t+h}$), or inflation, respectively, from current quarter t to future quarter $t+h$. To keep the model parsimonious, the lag length p of the endogenous variable is set to one, and q is set equal to 0.
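The transformation in Eq. 9 and the standardization of the regressors can be sketched in a few lines. The level series and the indicator values below are hypothetical, and the use of the population standard deviation is an assumption, since the text does not say which variant is applied.

```python
import numpy as np

def annualized_cumulative_change(levels, h=4):
    """Eq. 9: (400 / h) * ln(Y_{t+h} / Y_t); the last h entries are undefined (NaN)."""
    levels = np.asarray(levels, dtype=float)
    out = np.full(levels.shape, np.nan)
    out[:-h] = (400.0 / h) * np.log(levels[h:] / levels[:-h])
    return out

def standardize(x):
    """Subtract the mean and divide by the (population) standard deviation."""
    x = np.asarray(x, dtype=float)
    return (x - x.mean()) / x.std()

# Hypothetical quarterly level series growing by 0.5 per cent (in logs) per quarter
quarters = np.arange(12)
gdp_level = 100.0 * np.exp(0.005 * quarters)
growth = annualized_cumulative_change(gdp_level, h=4)
print(growth[:8])  # each defined entry equals 2.0, i.e. 2 per cent per year

# Hypothetical leading indicator
indicator = np.array([1.0, 2.0, 3.0, 4.0])
z = standardize(indicator)
```

With $h=4$ the factor $400/h$ equals 100, so the measure is simply the four-quarter log change expressed in per cent.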

Under these simplifying assumptions, the single forecast regression given in Eq. 10 reduces to a simple forecast equation, as suggested by Estrella and Hardouvelis (1991). According to Estrella and Hardouvelis (1991), the overlapping forecasting horizons induce a moving average error term of order $h-1$, resulting in consistent but inefficient estimates. Therefore, Newey and West (1987)-corrected standard errors for heteroscedasticity and autocorrelation are applied with a lag length set equal to three ($h-1$ with $h=4$), in line with Estrella and Hardouvelis (1991).^{Footnote 15}
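The Newey and West (1987) correction with Bartlett weights and a truncation lag of $h-1$ can be sketched as follows. This is an illustrative implementation without any small-sample adjustment; the simulated regressor, the moving-average error, and the coefficient values are assumptions and not taken from the paper.

```python
import numpy as np

def newey_west_se(X, y, lags):
    """OLS coefficients with Newey-West (Bartlett kernel) standard errors."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    beta = np.linalg.solve(X.T @ X, X.T @ y)
    resid = y - X @ beta
    scores = X * resid[:, None]             # row t holds x_t * u_t
    S = scores.T @ scores                   # lag-0 (White) term
    for l in range(1, lags + 1):
        w = 1.0 - l / (lags + 1.0)          # Bartlett weight
        gamma = scores[l:].T @ scores[:-l]  # lag-l autocovariance of the scores
        S += w * (gamma + gamma.T)
    XtX_inv = np.linalg.inv(X.T @ X)
    cov = XtX_inv @ S @ XtX_inv
    return beta, np.sqrt(np.diag(cov))

# Hypothetical example: overlapping horizons induce an MA(h-1) error
rng = np.random.default_rng(2)
n, h = 80, 4
x = rng.normal(size=n)
shocks = rng.normal(size=n + h - 1)
u = np.convolve(shocks, np.ones(h) / h, mode="valid")  # moving average of h shocks
y = 1.0 + 0.5 * x + u
X = np.column_stack([np.ones(n), x])

beta, se_nw = newey_west_se(X, y, lags=h - 1)  # lag length three for h = 4
_, se_white = newey_west_se(X, y, lags=0)      # heteroscedasticity-robust only
print(beta, se_nw, se_white)
```

Setting `lags=0` reduces the estimator to White's heteroscedasticity-robust covariance, which makes the role of the additional autocovariance terms easy to inspect.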

As control variables for the forecasting regressions, several established economic predictors for the German business cycle are introduced:^{Footnote 16}

First, the term ‘spread’ (long-term interest rate minus the short-term interest rate) serves as a monetary control variable. The long-term interest rate is the yield on debt securities outstanding issued by residents with a mean residual maturity of more than nine and up to 10 years (monthly average, source: Deutsche Bundesbank 2020). As the short-term interest rate, the EURIBOR 3-month funds money market rate is used (monthly average, source: Deutsche Bundesbank 2020).
Second, total orders received by the German industry serve as the industry control variable. We take the change over the previous month in calendar and seasonally adjusted orders at constant prices (source: Deutsche Bundesbank 2020).
Third, the Ifo business climate index serves as a leading business cycle indicator (monthly data, source: Ifo institute 2020).

Table 4 presents the in-sample forecasting regression results, including selected business cycle indicators as control variables, as given by Eq. 10. While neither the lagged endogenous variable nor the Ifo business climate index is significantly different from zero, the order inflow and the interest rate spread have a significant impact on the average GDP growth rate. All control variables have the expected sign and a notable magnitude, indicating a robust specification. Considering the generated sentiment indices, it can be seen that the coefficients are statistically significant in only three out of nine cases. The bag-of-words approach of Bannier1 and both text regression-based sentiments with inflation prediction as response variable (LASSO_INF_P, Ridge_INF_P) are statistically different from zero at conventional significance levels.

Noteworthy is the performance of text regression-based sentiment indices with inflation forecasts as response variables, instead of GDP growth prediction. It seems that this ‘wrong’ macroeconomic target variable captures the real GDP development as well.^{Footnote 17} This result may be a hint that GDP sub-aggregates, such as investments and consumption, could be promising response variables for text analysis tools to predict GDP growth.

Table 4 Forecasting equations including sentiment indices and control variables for Germany, GDP, 1999Q1 to 2017Q4

Table 5 presents results regarding inflation in-sample forecasting regressions. Both dictionary-based Bannier sentiment indices have a significant influence on the average growth rate of inflation over the next four quarters. Both sentiment indices are negatively correlated with the target variable.^{Footnote 18} However, most of the generated sentiment indices do not show a significant impact on the average growth rate of inflation over the next four quarters at a conventional significance level.

In brief, changes in the narratives have weak in-sample predictive power on the average growth rate of GDP and inflation over the next four quarters.

Table 5 Forecasting equations including sentiment indices and control variables for Germany, Inflation, 1993Q1 to 2017Q4

4.3.2 Out-of-sample forecasting performance

To evaluate the pseudo out-of-sample predictive power of the narratives, a reduced forecasting model of Eq. 10 is used to predict the 12-month-ahead average growth rate of real GDP respectively inflation:

$$\begin{aligned} {\hat{Y}}_{t|t+h} = \alpha + \sum _{i=1}^p \rho _{i} {\hat{Y}}_{t-i} + \sum _{j=0}^q \beta _{j} SI_{t-j} + \epsilon _{t+h} \end{aligned}$$

(11)

Following Ferreira (2018), we include only the lagged endogenous variable in the forecasting model as an additional regressor. The training sample covers 56 observations for the period from 1999Q1 to 2013Q4. The test sample includes 20 observations for the period from 2014Q1 to 2017Q4, which meets the recommended value of 20 per cent of the full sample (Hyndman and Athanasopoulos 2018). The model is re-estimated at each iteration of the pseudo out-of-sample exercise before each one-step-ahead forecast is computed. The number of lags of the endogenous variable (p) and the predictor variable $SI_t$ (q) is obtained by minimizing the Akaike information criterion (AIC) at each forecasting period. An autoregressive model is used as a comparative benchmark model. The order of the autoregressive model is also determined by minimizing the AIC at each forecasting period. In order to evaluate the predictive ability of the narratives, two common forecast evaluation metrics are calculated in a first step. The relative MAE:

$$\begin{aligned} \text {Relative MAE} = \frac{\frac{1}{T}\sum _{t=1}^T \left| e_{t}^{\hbox {SI}(k)} \right| }{\frac{1}{T}\sum _{t=1}^T \left| e_{t}^{\hbox {AR}} \right| } \end{aligned}$$

(12)

with a linear loss function, and the relative MSE with quadratic loss:

$$\begin{aligned} \text {Relative MSE} = \frac{\frac{1}{T} \sum _{t=1}^T \left( e_{t}^{\hbox {SI}(k)}\right) ^2}{\frac{1}{T} \sum _{t=1}^T \left( e_{t}^{\hbox {AR}}\right) ^2 } \end{aligned}$$

(13)

is calculated using the respective forecast error $e_{t}$ of the model in Eq. 11 relative to that of the benchmark autoregressive model. If the value of the relative measure is smaller than 1, the current model outperforms the benchmark model.
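Both relative measures are simple ratios of average losses, as the following minimal sketch shows. The forecast errors are hypothetical numbers chosen so that the result can be checked by hand.

```python
import numpy as np

def relative_mae(e_model, e_bench):
    """Eq. 12: mean absolute error of the sentiment model relative to the benchmark."""
    return np.mean(np.abs(e_model)) / np.mean(np.abs(e_bench))

def relative_mse(e_model, e_bench):
    """Eq. 13: mean squared error of the sentiment model relative to the benchmark."""
    return np.mean(np.square(e_model)) / np.mean(np.square(e_bench))

# Hypothetical forecast errors: the sentiment model's errors are half as large
e_si = np.array([0.5, -1.0, 0.5, -1.0])
e_ar = np.array([1.0, -2.0, 1.0, -2.0])

rel_mae = relative_mae(e_si, e_ar)  # 0.75 / 1.5 = 0.5
rel_mse = relative_mse(e_si, e_ar)  # 0.625 / 2.5 = 0.25
print(rel_mae, rel_mse)
```

Because the quadratic loss penalizes large errors more heavily, halving every error reduces the relative MSE to a quarter but the relative MAE only to a half.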

In a second step, a Diebold–Mariano test (Diebold and Mariano 1995; Harvey et al. 1997) is employed to test the out-of-sample forecasting performance. To this end, the null hypothesis of equal predictive accuracy (i.e. equal expected loss) between the forecasts with the sentiment index and those without it (benchmark model) is tested against the one-sided alternative hypothesis that the forecasts without the sentiment index are less accurate:^{Footnote 19}

$$\begin{aligned} H_0: L \left( e_t^\mathrm{AR} \right) = L\left( e_t^{\mathrm{SI}(k)} \right) \, \text {versus} \, H_1: L \left( e_t^\mathrm{AR} \right) > L \left( e_t^{\mathrm{SI}(k)} \right) \end{aligned}$$

(14)

where $L(e_t)$ represents the respective linear (absolute) loss $L(e_t)=|e_t|$ or quadratic loss $L(e_t)=e_t^2$. Again, the Newey and West (1987) procedure is applied to correct for autocorrelation, and the lag length is set equal to 3 ($h-1$), following Estrella and Hardouvelis (1991).
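A one-sided Diebold–Mariano statistic with a Newey–West long-run variance and truncation lag $h-1$ can be sketched as below. Several points are assumptions made for illustration: the linear case is implemented as absolute-error loss (matching the MAE in Eq. 12), the p-value uses the standard normal approximation, the small-sample correction of Harvey et al. (1997) is not included, and the forecast errors are simulated.

```python
import math
import numpy as np

def diebold_mariano(e_bench, e_model, h=4, loss="quadratic"):
    """One-sided DM test of H0: equal loss vs H1: the benchmark has larger loss."""
    e_bench = np.asarray(e_bench, dtype=float)
    e_model = np.asarray(e_model, dtype=float)
    if loss == "quadratic":
        d = e_bench ** 2 - e_model ** 2
    elif loss == "absolute":
        d = np.abs(e_bench) - np.abs(e_model)
    else:
        raise ValueError("loss must be 'quadratic' or 'absolute'")
    T = d.size
    d_bar = d.mean()
    dc = d - d_bar
    lrv = dc @ dc / T                    # lag-0 variance of the loss differential
    for l in range(1, h):                # autocovariances up to lag h - 1
        w = 1.0 - l / h                  # Bartlett weight
        lrv += 2.0 * w * (dc[l:] @ dc[:-l]) / T
    if lrv <= 0:
        raise ValueError("long-run variance is not positive")
    stat = d_bar / math.sqrt(lrv / T)
    p_value = 0.5 * math.erfc(stat / math.sqrt(2.0))  # upper tail of N(0, 1)
    return stat, p_value

# Hypothetical errors: the benchmark errors are twice as dispersed
rng = np.random.default_rng(3)
T = 60
e_si = rng.normal(scale=1.0, size=T)
e_ar = rng.normal(scale=2.0, size=T)

stat, p_value = diebold_mariano(e_ar, e_si, h=4, loss="quadratic")
stat_rev, p_rev = diebold_mariano(e_si, e_ar, h=4, loss="quadratic")
print(stat, p_value)
```

A large positive statistic, and hence a small p-value, indicates that the benchmark's expected loss exceeds that of the sentiment model; swapping the two error series flips the sign of the statistic.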

Table 6 Out of sample forecasting performance

Table 6 shows the pseudo out-of-sample forecasting performance results for real GDP growth and inflation. The first two columns present the relative forecast performance based on relative MAE and MSE measures. Considering GDP growth, two forecasting series with regression-based sentiment indices (LASSO_GDP_P, Ridge_INF_P) beat the benchmark series in both relative measures, MAE and MSE, whereas one forecasting series (LASSO_INF_P) outperforms the benchmark series at least in relative MAE. In contrast, no dictionary-based sentiment index outperforms the benchmark forecasts in relative forecast performance metrics. Statistical tests to check whether the forecasting series with sentiment indices are more accurate than the benchmark forecasts without sentiment indices are given in lines three to six. The Diebold–Mariano tests for linear and quadratic losses do not reject the null hypothesis of equal predictive accuracy for all except one (linear: LASSO_GDP_P) forecasting series with sentiment indices at a conventional significance level. Thus, the generated sentiment indices do not seem to be a statistically powerful out-of-sample predictor for the average growth rate of GDP over the next four quarters.

Forecasting performance results for inflation are also given in Table 6. On average, the relative forecast performance of the sentiment series, measured by the relative MAE and MSE, is also weak. Again, two forecasting series (Sharpe2, Ridge_INF_P) outperform the benchmark series in relative MAE and relative MSE, whereas one forecast series with the LASSO_INF_P index beats the benchmark series in relative MSE. Considering Diebold–Mariano tests, the null hypothesis of equal forecast accuracy can only be rejected for the forecasting series with the Ridge_INF_P index, under both linear and quadratic loss.

To summarize, forecasters’ narratives in the form of sentiment indices have weak, at best, predictive ability regarding future GDP growth and inflation in a (pseudo) out-of-sample environment.

5 Discussion and conclusion

Based on 534 business cycle forecast reports covering 10 German institutions for the period 1993–2017, the paper analysed the information content of German forecasters’ narratives for German business cycle forecasts and macroeconomic development. In order to do that, textual analysis is used to convert qualitative text data into quantitative sentiment indices.

In a first step, computational textual analysis methods are used to transform forecasters’ expectations about the future macroeconomic development into nine sentiment indices.

Second, sentiment analysis shows that the generated sentiment indices vary in their behaviour, pattern, and amplitude. In addition, the sentiment indices differ in their temporal relationship to the realized macroeconomic development. Some sentiment indices develop nearly in parallel to the realized value, while other sentiment indices lag behind the real development, and a small number of exceptions (partly) lead the realized value.

Third, sentiment indices are used to test forecast efficiency for GDP growth and inflation forecasts. Using 12-month-ahead fixed horizon forecasts, fixed-effects panel regression results suggest several sentiment indices with informational content for GDP growth and inflation forecasts. German forecasters’ narratives can enhance the accuracy of German business cycle forecasts. Overall, the results are in line with the findings of Jones et al. (2020), Sharpe et al. (2020) and Clements and Reade (2020). The four-quarter forecast horizon is comparable with the results of Sharpe et al. (2020) for the Fed’s Greenbook, whereas findings for the UK show shorter forecast horizons (Jones et al. 2020; Clements and Reade 2020).

Fourth, a forecasting exercise analysed the predictive power of sentiment indices for realized growth and inflation. Such predictive power might explain why forecasters’ narratives have predictive power for forecast errors. However, the forecasting exercise finds modest evidence, at best, for this hypothesis. The results indicate weak in-sample and out-of-sample predictive power of the sentiment indices for the future stance of the economy. More sophisticated forecasting models, e.g. mixed-data sampling (MIDAS) regression models, could nevertheless improve the results.

There are several explanatory hypotheses as regards why the narratives contain information that is not exhausted by numerical forecasts. One of these is information rigidity. Based on the hypothesis that forecast revisions have predictive power for forecast errors (Nordhaus 1987), Coibion and Gorodnichenko (2015) and Dovern et al. (2015) find some hints supporting this hypothesis using tests for numerical forecasts in an international setting. Kirchgässner and Müller (2006) also find some evidence that German forecasters are reluctant to revise numerical forecasts. In a similar vein, forecasters’ narratives could be adjusted faster than their numerical counterparts. The analysis of sticky point forecasts by Sharpe et al. (2020) finds only weak evidence, at best, for this hypothesis. Another explanatory approach for the predictive power of forecasters’ narratives is the ‘modal-forecast explanation’ (Sharpe et al. 2020, p. 5). This hypothesis is based on the concept that the sentiment indices are particularly informative about tail risks, whereas numerical forecasts do not balance the risks because they are modal rather than mean forecasts. The findings of Sharpe et al. (2020) suggest such an interpretation. An additional explanation could be that the forecast narrative offers a wider scope for individuality than the quantitative forecast. The numerical forecast is limited to a number, and the production of the forecasts also depends on the institutes’ hierarchy and other influencing factors (see e.g. Fritsche and Heilemann 2010, for the Joint Diagnosis). Thus, the forecast report may allow the forecaster a higher degree of freedom. A study of the general issue, namely why forecasters’ narratives have predictive power for forecast errors, could form part of further research.

Last but not least, there is not a single sentiment index or sentiment analysis approach which is generally superior to other methods. The forecast-specific dictionary (Sharpe et al. 2020) and text regression methods perform well in tests for forecast efficiency. Considering the predictive power for GDP growth and inflation, dictionary-based approaches and text regression methods perform relatively weakly. However, the sentiment analysis could be improved in further research using more sophisticated text analysis and machine learning tools.

Change history

01 November 2021
In the published article, the funding note “Open Access funding enabled and organized by Projekt DEAL” was missed. The article has been updated.

Notes

Several studies suggest similar results for example for the US (Batchelor 1990), Japan (Ashiya 2006) or Austria (Fortin et al. 2020).
Ridge regularization is introduced as an opposite of LASSO because the ridge estimator cannot benefit from a parsimonious model (Pröllochs et al. 2018). Therefore, the elastic net, a mixture of both regularization methods, is not absolutely necessary for this investigation.
Until 2005, the HWWI was known as HWWA and mainly funded by public money. It became a privately funded institute in 2006.
See Table 7 for an overview.
The principle behind the tf-idf weighting scheme is that the more often a word appears in a document, the more important it is (term frequency). But, the more the word appears in all documents, the less important it is (inverse document frequency). The tf-idf weighting scheme is a commonly used metric in text analysis literature (see e.g. Loughran and McDonald 2011; Sharpe et al. 2020).
Nevertheless, the pre-processed corpus contains some meaningless terms such as ‘gegenüber’ (in relation to) or ‘deutlich’ (obvious). To avoid a selection bias, the linguistic stopword lists were not manually expanded.
German is a morphologically rich language and the text corpus is a specific economic text corpus; therefore, the meaning of a word is crucial. Stemming reduces different word forms to their base forms while retaining the meaning and semantic interpretation of the word (Jivani 2011). Porter’s stemming algorithm is one of the best stemming algorithms; it has a lower error rate and it is a light stemmer (Jivani 2011). Thus, the stemming procedure reduces complexity without losing the meaning of the word form. In contrast, lemmatization reduces the word forms to their root forms, and the semantic interpretation can be lost (Jivani 2011).
An analysis of forecast revision patterns shows an inverse L-curve relationship between accuracy and shortening forecast horizon (Heilemann and Müller 2018).
In forecast evaluation contexts, it is appropriate to use first published (real-time) data or the last available revised data (Döpke et al. 2019). Here, the revised data are used because of data availability.
Calculations without the period of the Great Recession in 2008/2009 result in similar error measures.
The information content is the ratio of the number of forecasts with a correctly predicted direction to the number of all forecasts.
An extended pursuit of stopwords could reduce some ‘outliers’ to a minimum. But first, the objective of this paper is not to find the best stopword list, and, second, the few outliers should not matter from a purely statistical point of view.
Therefore, it is not necessary to employ the dynamic panel estimator proposed by Arellano and Bond (1991).
Robustness checks with the last known forecast error instead of the lagged forecast error support this finding. The results are available on request.
An automatic selection method for the number of lags is given by the approximation rule of Andrews (1991). Another widely used method is to set the lag length simply to the integer part of $T^\frac{1}{4}$, where T is the sample size (Greene 2012).
For a detailed discussion about German business cycle leading indicators, see Heinisch and Scheufele (2018) and the literature cited therein.
The reason for the correlations is the generated dictionaries. For example, consider the full sample dictionary and weights for LASSO_INF_P in Table 11 again. Words such as ‘recovery’ (‘erholung’), ‘stable’ (‘stabil’), and ‘expansive’ (‘expansiv’) have negative weights, whereas words such as ‘slow down’ (‘abkühlung’) and ‘deficit’ (‘verlust’) have positive weights. All these words are related to GDP growth but have a reversed sign in relation to GDP growth, which explains the correlation and the negative coefficient.
The negative polarity of inflation is not surprising, given the finance-specific context of the dictionary. There is no ‘right’ sign of coefficient; it depends only on the given polarity (or weight).
The autoregressive benchmark model can be seen as nested in the sentiment model. Clark and McCracken (2001) show that the asymptotics of the Diebold–Mariano test can fail when comparing nested models. Diebold (2015) demonstrates that the Diebold–Mariano test is still useful and valid for comparing forecasts. Here, we simply ask whether the forecasts of one series are statistically more accurate than those of another, not whether one forecasting model is better than the other.

References

Andrews DW (1991) Heteroskedasticity and autocorrelation consistent covariance matrix estimation. Econometrica 59(3):817–858
Arellano M, Bond S (1991) Some tests of specification for panel data: Monte Carlo evidence and an application to employment equations. Rev Econ Stud 58(2):277–297
Ashiya M (2006) Forecast accuracy and product differentiation of Japanese institutional forecasters. Int J Forecast 22(2):395–401
Baker S, Bloom N, Davis S (2016) Measuring economic policy uncertainty. Q J Econ 131(4):1593–1636
Bannier CE, Pauls TT, Walter A (2018) Content analysis of business communication: introducing a German dictionary. J Bus 89(1):79–123
Batchelor RA (1990) All forecasters are equal. J Bus Econ Stat 8(1):143–144
Beck N, Katz JN (1995) What to do (and not to do) with time-series cross-section data. Am Political Sci Rev 89(3):634–647
Berge TJ, Jordà Ò (2011) Evaluating the classification of economic activity into recessions and expansions. Am Econ J Macroecon 3(2):246–277
Clark TE, McCracken MW (2001) Tests of equal forecast accuracy and encompassing for nested models. J Econom 105(1):85–110
Clements MP, Reade JJ (2020) Forecasting and forecast narratives: the bank of England inflation reports. Int J Forecast 36(4):1488–1500
Coibion O, Gorodnichenko Y (2015) Information rigidity and the expectations formation process: a simple framework and new facts. Am Econ Rev 105(8):2644–78
Deutsche Bundesbank (2020) Time series data base. https://www.bundesbank.de/Navigation/EN/Statistics/Time_series_databases/time_series_databases.html. Accessed 5 April 2020
Di Fatta G, Reade JJ, Jaworska S, Nanda A (2015) Big social data and political sentiment: the tweet stream during the UK general election 2015 campaign. In: 2015 IEEE international conference on smart city/socialcom/sustaincom (smartcity). IEEE, pp 293–298
Diebold FX (2015) Comparing predictive accuracy, twenty years later: a personal perspective on the use and abuse of Diebold–Mariano tests. J Bus Econ Stat 33(1):1–9
Diebold FX, Lopez JA (1996) Forecast evaluation and combination. Handb Stat 14:241–268
Diebold FX, Mariano RS (1995) Comparing predictive accuracy. J Bus Econ Stat 13(3):253–263
Dimpfl T, Kleiman V (2019) Investor pessimism and the German stock market: exploring google search queries. German Econ Rev 20(1):1–28
Döhrn R, Schmidt C (2011) Information or institution? On the determinants of forecast accuracy. J Econ Stat (Jahrbuecher fuer Nationalökonomie und Statistik) 231(1):9–27
Döpke J, Fritsche U (2006) Growth and inflation forecasts for Germany: a panel-based assessment of accuracy and efficiency. Empir Econ 31(3):777–798
Döpke J, Fritsche U, Siliverstovs B (2010) Evaluating German business cycle forecasts under an asymmetric loss function. OECD J J Bus Cycle Meas Anal 1:1–18
Döpke J, Fritsche U, Müller K (2019) Has macroeconomic forecasting changed after the great recession? Panel-based evidence on forecast accuracy and forecaster behavior from Germany. J Macroecon 62:103–135
Dovern J, Fritsche U (2008) Estimating fundamental cross-section dispersion from fixed event forecasts (787). DIW Berlin Discussion Paper
Dovern J, Fritsche U, Slacalek J (2012) Disagreement among forecasters in G7 countries. Rev Econ Stat 94(4):1081–1096
Dovern J, Fritsche U, Loungani P, Tamirisa N (2015) Information rigidities: comparing average and individual forecasts for a large international panel. Int J Forecast 31(1):144–154
Estrella A, Hardouvelis GA (1991) The term structure as a predictor of real economic activity. J Finance 46(2):555–576
Federal Statistical Office (2019a) Preise, Verbraucherpreisindizes für Deutschland, Lange Reihen ab 1948. https://www.destatis.de. Accessed 14 Mar 2019
Federal Statistical Office (2019b) Volkswirtschaftliche gesamtrechnungen, Bruttoinlandsprodukt ab 1970, Vierteljahres- und Jahresergebnisse. https://www.destatis.de. Accessed 14 Sept 2019
Ferreira T (2018) Stock market cross-sectional skewness and business cycle fluctuations. International Finance Discussion Papers 1223, Board of Governors of the Federal Reserve System (U.S.). https://www.fedinprint.org/items/fedgif/1223.html. Accessed 5 May 2020
Fildes R, Stekler H (2002) The state of macroeconomic forecasting. J Macroecon 24(4):435–468
Fortin I, Koch SP, Weyerstrass K (2020) Evaluation of economic forecasts for Austria. Empir Econ 58(1):107–137
Fritsche U, Heilemann U (2010) Too many cooks? The German joint diagnosis and its production. Technical Report 1/2010, DEP (Socioeconomics) Discussion Papers, Macroeconomics and Finance Series
Fritsche U, Puckelwald J (2018) Deciphering professional forecasters’ stories: analyzing a corpus of textual predictions for the German economy. DEP (Socioeconomics) discussion papers—macroeconomics and finance series 4/2018, Hamburg. http://hdl.handle.net/10419/194021
Fritsche U, Tarassow A (2017) Vergleichende Evaluation der Konjunkturprognosen des Instituts für Makroökonomie und Konjunkturforschung an der Hans-Böckler-Stiftung für den Zeitraum 2005-2014. IMK Study 54, Düsseldorf. http://hdl.handle.net/10419/156388
Gaibulloev K, Sandler T, Sul D (2014) Dynamic panel analysis under cross-sectional dependence. Political Anal 22(2):258–273
Garcia D (2013) Sentiment during recessions. J Finance 68(3):1267–1300
Gentzkow M, Kelly B, Taddy M (2019) Text as data. J Econ Lit 57(3):535–574
Goldfarb RS, Stekler HO, David J (2005) Methodological issues in forecasting: insights from the egregious business forecast errors of late 1930. J Econ Methodol 12(4):517–542
Greene WH (2012) Econometric analysis, 7th edn. Pearson, New York
Harvey D, Leybourne S, Newbold P (1997) Testing the equality of prediction mean squared errors. Int J Forecast 13(2):281–291
Heilemann U, Müller K (2018) Wenig Unterschiede-Zur Treffsicherheit internationaler Prognosen und Prognostiker. AStA Wirtschafts-und Sozialstatistisches Archiv 12(3–4):195–233
Heilemann U, Stekler HO (2013) Has the accuracy of macroeconomic forecasts for Germany improved? German Econ Rev 14(2):235–253
Heinisch K, Scheufele R (2018) Bottom-up or direct? Forecasting German GDP in a data-rich environment. Empir Econ 54(2):705–745
Heppke-Falk K, Hüfner FP (2004) Expected budget deficits and interest rate swap spreads-evidence for France, Germany and Italy. Deutsche Bundesbank Discussion Paper (40/2004)
Holden K, Peel DA (1990) On testing for unbiasedness and efficiency of forecasts. Manch Sch 58(2):120–127
Hyndman R, Athanasopoulos G (2018) Forecasting: principles and practice, 2nd edn. OTexts. https://otexts.com/fpp2/. Accessed 03 Sept 2020
Ifo Institute (2020) Business climate index. http://www.cesifo-group.de/ifoHome/facts/Survey-Results/Business-Climate.html. Accessed 04 May 2020
Jegadeesh N, Wu D (2013) Word power: a new approach for content analysis. J Financial Econ 110(3):712–729
Jegadeesh N, Wu DA (2017) Deciphering Fedspeak: the information content of FOMC meetings. Technical report, SSRN. https://ssrn.com/abstract=2939937. Accessed 19 Oct 2019
Jivani AG (2011) A comparative study of stemming algorithms. Int J Comput Technol Appl 2(6):1930–1938
Jones JT, Sinclair TM, Stekler HO (2020) A textual analysis of Bank of England growth forecasts. Int J Forecast 36(4):1478–1487
Kauder B, Potrafke N, Schinke C (2017) Manipulating fiscal forecasts: evidence from the German states. FinanzArchiv Public Finance Anal 73(2):213–236
Keane MP, Runkle DE (1990) Testing the rationality of price forecasts: new evidence from panel data. Am Econ Rev 80:714–735
Kirchgässner G, Müller UK (2006) Are forecasters reluctant to revise their predictions? Some German evidence. J Forecast 25(6):401–413
Krüger JJ, Hoss J (2012) German business cycle forecasts, asymmetric loss and financial variables. Econ Lett 114(3):284–287
Lamla MJ, Lein SM, Sturm JE (2020) Media reporting and business cycles: empirical evidence based on news data. Empir Econ 59(3):1085–1105
Liu W, Moench E (2016) What predicts US recessions? Int J Forecast 32(4):1138–1150
Loughran T, McDonald B (2011) When is a liability not a liability? Textual analysis, dictionaries, and 10-Ks. J Finance 66(1):35–65
Loughran T, McDonald B (2016) Textual analysis in accounting and finance: a survey. J Account Res 54(4):1187–1230
Lundquist K, Stekler HO (2012) Interpreting the performance of business economists during the great recession. Bus Econ 47(2):148–154
Manela A, Moreira A (2017) News implied volatility and disaster concerns. J Financ Econ 123(1):137–162
Mathy G, Stekler H (2018) Was the deflation of the depression anticipated? An inference using real-time data. J Econ Methodol 25(2):117–125
Merton RC (1981) On market timing and investment performance. I. An equilibrium theory of value for market forecasts. J Bus 54:363–406
Newey W, West K (1987) A simple, positive semi-definite, heteroskedasticity and autocorrelation consistent covariance matrix. Econometrica 55(3):703–708
Nickell S (1981) Biases in dynamic models with fixed effects. Econometrica 49(6):1417–1426
Nordhaus WD (1987) Forecasting efficiency: concepts and applications. Rev Econ Stat 69(4):667–674
Pierdzioch C, Rülke JC (2015) On the directional accuracy of forecasts of emerging market exchange rates. Int Rev Econ Finance 38:369–376
Porter MF et al (1980) An algorithm for suffix stripping. Program 14(3):130–137
Pröllochs N, Feuerriegel S, Neumann D (2015) Generating domain-specific dictionaries using Bayesian learning. ECIS 2015 Completed Research Papers (Paper 144)
Pröllochs N, Feuerriegel S, Neumann D (2018) Statistical inferences for polarity identification in natural language. PLoS ONE 13(12):1–21
Remus R, Quasthoff U, Heyer G (2010) SentiWS: a publicly available German-language resource for sentiment analysis. In: Calzolari N, Choukri K, Maegaard B, Mariani J, Odijk J, Piperidis S, Rosner M, Tapias D (eds) Proceedings of the 7th international language resources and evaluation (LREC’10), European language resources association (ELRA), Valletta, Malta
Shapiro AH, Sudhof M, Wilson D (2020) Measuring news sentiment. Federal Reserve Bank of San Francisco. Working Paper 2017-01
Sharpe SA, Sinha NR, Hollrah CA (2020) The power of narratives in economic forecasts. FEDS Working Paper (2020-001)
Smant DJ (2002) Has the European Central Bank followed a Bundesbank policy? Evidence from the early years. Kredit und Kapital 35(3):327–343
Stekler H, Symington H (2016) Evaluating qualitative forecasts: the FOMC minutes, 2006–2010. Int J Forecast 32(2):559–570
Stock JH, Watson MW (2003) Forecasting output and inflation: the role of asset prices. J Econ Lit 41(3):788–829
Tetlock PC (2007) Giving content to investor sentiment: the role of media in the stock market. J Finance 62(3):1139–1168
Tetlock PC, Saar-Tsechansky M, Macskassy S (2008) More than words: quantifying language to measure firms’ fundamentals. J Finance 63(3):1437–1467
Tibshirani R (1996) Regression shrinkage and selection via the lasso. J R Stat Soc Ser B (Methodol) 58(1):267–288
Tillmann P, Walter A (2018) ECB vs Bundesbank: diverging tones and policy effectiveness. MAGKS Joint Discussion Paper Series in Economics 20-2018, Marburg
Tobback E, Naudts H, Daelemans W, de Fortuny EJ, Martens D (2018) Belgian economic policy uncertainty index: improvement through text mining. Int J Forecast 34(2):355–365
Varian HR (2014) Big data: new tricks for econometrics. J Econ Perspect 28(2):3–28
Zipf GK (1949) Human behavior and the principle of least effort. Addison-Wesley, Cambridge

Funding

Open Access funding enabled and organized by Projekt DEAL.

Author information

Authors and Affiliations

University of Applied Sciences Merseburg, Eberhard-Leibnitz-Straße 2, 06217, Merseburg, Germany
Karsten Müller

Corresponding author

Correspondence to Karsten Müller.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This research was supported by the German Science Foundation (DFG) under the Priority Program 1859. The author thanks the reviewers, Jörg Döpke, Ulrich Fritsche, and Christian Schmeißer for constructive criticism and comments.

Appendix

See Tables 7, 8, 9, 10, 11 and 12.

Table 7 List of included institutions and publications

Table 8 Forecasting specific word list: positive words (205 words)

Table 9 Forecasting specific word list: negative words (103 words)

Table 10 Dictionary and weights—Lasso GDP (71 words)

Table 11 Dictionary and weights—Lasso inflation (69 words)

Table 12 Dictionary and weights—Ridge GDP and inflation—Top 30 positive and negative words in descending order

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Müller, K. German forecasters’ narratives: How informative are German business cycle forecast reports? Empir Econ 62, 2373–2415 (2022). https://doi.org/10.1007/s00181-021-02100-9

Received: 27 October 2020
Accepted: 14 July 2021
Published: 31 July 2021
Issue Date: May 2022
DOI: https://doi.org/10.1007/s00181-021-02100-9
