1 Introduction

Macroeconomic analyses, as well as forecasts, are to a large extent based on national accounts (NA) data. However, national accounts data are usually revised several times after first release, which may lead policymakers to draw incorrect conclusions (Runkle, 1998), as well as forecasters (Döhrn, 2019). In Germany, NA data are typically designated as ‘final’ three and a half years after the end of the reported year. Thereafter, the data will only be subject to benchmark revisions.

The differences between first release and final data are substantial: The mean absolute revision of the year-over-year growth rate for the quarterly real GDP from 1994 to 2013 was 0.47 percentage points in Germany, which is at the lower end among OECD countries (Zwijnenburg, 2015). Particularly in small countries, the extent of the revisions is much larger. Seemingly, there was also little progress made in reducing the size of revisions. For Germany, almost the same results were found for earlier periods of time as today (Ahmad et al., 2004; York & Atkinson, 1997).

Given the volume of revisions, it is worthwhile to look for ways to reduce them. One option is to establish additional surveys in areas that are particularly prone to changes. However, this would place new reporting responsibilities on the economy. Another option might be to use big data, i.e., data that are byproducts of business and administrative systems, social networks, and the internet of things. These data may shed some light on activities not yet covered by statistics, however, this is still an evolving field of research (for an overview, see Hammer et. al., 2017). A third option, which is included in the focus of this paper, is improving NA compilation methods. An evaluation of revisions may indicate how to do this (George, 2004, Hoven, 2008).

From a methodological point of view, the third option is closely linked to the question of whether revisions are noise or result from new information (Mankiew and Shapiro, 1986). If all information is used efficiently, revisions should be noise, i.e., there is no systematic pattern, and in that case, a more efficient use of data sources would not help reduce revisions. For the question analyzed here, the opposite is interesting: If revisions are not noise, this suggests there is room for their reduction by using data sources more efficiently.

However, judging whether revisions are noise or the result of new information is not that simple. As an analogy to the forecast evaluation literature, a distinction could be made between weak efficiency, strong efficiency (Stekler, 2002), and—as introduced by Nordhaus (1987)—the Nordhaus efficiency. Weak efficiency results when first release data are orthogonal to the final data. Strong efficiency, in the present case, means that revisions are uncorrelated with any data that are already known when the NA are calculated. Finally, Nordhaus efficiency differentiates between vintages of revisions, and results when a revision at time t-1 is uncorrelated with the preceding revision at time t.

Although tests for weak efficiency can be found in most papers analyzing NA revisions, tests for strong efficiency are scarce. Faust et al. (2005) analyzed covariations of GDP revisions in G7 countries with oil prices, stock rents, and interest rates, finding significant coefficients for some countries but not for Germany. York and Atkinson (1997) found in their G7 study some covariations of GDP revisions with inflation in the case of Germany and of Canada. Nordhaus efficiency has not been investigated yet in the context of revisions, to the author’s knowledge.Footnote 1

This paper contributes to the existing literature in four ways. First, the study focuses on strong efficiency and, thus, on the question of what could be done to reduce revisions. Second, whereas most literature concentrates on GDP revisions (an exception is York & Atkinson 1997), this paper analyses many demand side and some production side components of GDP. Third, this study makes a distinction between benchmark revisions and current revisions. Benchmark revisions are internationally coordinated and guided by methodological considerations. They cannot be reduced through better compilation methods. The lever to decrease revisions is the reduction of current revisions, which can be achieved, for example, through better estimation methods. Therefore, this paper uses an intuitive approach described by Döhrn (2019) that splits total NA revisions into benchmark and current revisions. Finally, this paper takes into account the characteristics of the German NA revision process. As a rule, the initial missing primary statistics come from annual surveys. When these become available, the annual NA data are first revised and thereafter, the results are broken down to quarters. Finally, the quarterly data are seasonally adjusted using time-varying seasonal factors.Footnote 2 Earlier papers like York and Atkinson (1997), Faust et al. (2005), Garatt et al. (2008), Glass (2018), and Strohsal and Wolf (2020) analyse the total outcome of these steps in the revision process. Thus, they take the user’s view asking what size of revisions can be expected and how the data must be interpreted. Focusing on annual NA revisions, since only these can be reduced by a more efficient use of the available data, this paper provides hints, how revisions could be reduced in the future.

This paper is organized as follows. Section 2 will scrutinize how “efficiency” is measured in the context of revisions, and will propose tests of the properties of data revisions. In Section 3, the data are presented, along with a description of how current revisions are separated from benchmark revisions. Section 4 details the tests for weak efficiency, strong efficiency, and Nordhaus efficiency. Section 5 concludes and provides recommendations for producing NA.

2 Measuring Efficiency

Revisions should have three properties (Strohsal & Wolf 2020, p. 1253):

  1. I.

    They should be small, as measured by the mean absolute or the mean squared revision;

  2. II.

    They should be unbiased; and

  3. III.

    They should be efficient, i.e., not forecastable.

Analogous with the forecast evaluation literature, weak and strong efficiency can be distinguished (Nordhaus, 1987; Stekler, 2002: 223). Whereas revisions are weakly efficient when they are nonsystematic, they are strongly efficient when not correlated with any other information known at the time a given NA release is published.

A widely used descriptive measure about whether revisions are unbiased is the mean revision, which is defined as:

$${\text{MR}} = \frac{1}{T}\mathop \sum \nolimits_{t} F_{{\text{t}}} - P_{{\text{t}}}$$
(1)

with Ft and Pt being the annual growth rates of the final and the preliminary (first release) data for observation period t, and T the number of observations. As a test for bias, the regression

$$F_{{\text{t}}} - P_{{\text{t}}} = \, c \, + \varepsilon_{{\text{t}}}$$
(2)

is run, with εt as an iid residual. The preliminary data are unbiased if c does not differ significantly from zero, which is measured by the t-statistic. However, this test presents only a partial view. A more comprehensive test is the Mincer-Zarnowitz regression (Mincer & Zarnowitz 1969), which augments (2) by adding the growth rate of the preliminary data as a regressor (Faust, Rogers, & Wright, 2005, p. 406; Glass, 2018, p. 10).

$$F_{\text{t}} - P_{\text{t}} = \, c + {\beta}P_{\text{t}} + \varepsilon_{\text{t}}$$
(3)

If the NA data are efficient, c as well as β should be zero. Whether both restrictions are met can be determined by a F-Test.Footnote 3 Furthermore, the residuals of (3) should be uncorrelated, since autocorrelation of the residuals could also hint at inefficiencies. This was tested by employing the Ljung-Box Q-statistic for first degree autocorrelation.

A standard test for strong efficiency is proposed by Holden and Peel (1990). They augmented the Mincer-Zarnowitz equation to:

$$F_{{\text{t}}} - P_{{\text{t}}} = {\beta}_{0} + {\beta}_{1} P_{{\text{t}}} + {\beta}_{2} X_{{\text{t}}} + \varepsilon_{{{\text{t}}}}.$$
(4)

Where X stands for any indicator that is available when P is released. For normalized Xt the null is β0=β1=β2=0. As Holden and Peel (1990) point out, the results of the traditional Mincer-Zarnowitz test can be spoiled in cases where the revisions are correlated with X. Hence, β0 and β1 in the augmented equation may differ from c and β in (3).

Applying the Nordhaus (1987) approach to revisions deviates from previous measurements of efficiency in two respects. First, the revision process steps are considered separately, instead of comparing the first release and the final data. Second, the approach is based on a fixed event procedure (Nordhaus, 1987: 669), i.e., it looks at years individually. Efficiency in this approach means that statisticians do not stick to the previous NA release, but use new information immediately.

PT|n denote the nth vintages of preliminary data for year T, and vT|n the revision in vintage n compared to vintage n − 1. The total number of vintages is N, and vintage N is identical to the final data F. Thus, the following restriction holds:

$$F_{{\text{t}}} - P_{{\text{t|N}}} = \mathop \sum \limits_{2}^{n} v_{{{\text{t}}|{\text{n}}}}$$
(5)

To test for Nordhaus efficiency, the regression

$$v_{{{\text{t}}|{\text{n}}}} = {\beta}_{1} \cdot v_{{{\text{t}}|{\text{n}} - 1}} + \varepsilon_{{\text{t}}}$$

can be used. A positive and significant β1 is a sign of stickiness and, therefore, of inefficiency.

In the present context, however, testing for Nordhaus efficiency is difficult for two reasons:

  • The first is the small number of revisions of Germany’s annual GDP. The first release of the NA for year t is published in February of year t+1. The data are revised for the first time in May, but only since 2003. All subsequent revisions are made in August, the first one in t+1 and the last one in t+4. In the intervening quarterly publications, the annual figures are left unchanged. Thus, only four revisions can be analyzed for the full sample.Footnote 4

  • Second, analyses are impeded by the benchmark revisions. Benchmark revisions are externally caused events, not a “logic” step from first release to final data. Therefore, they cannot be interpreted in the sense of stickiness or non-stickiness, and they should be left out of consideration.

These problems are difficult to overcome. Due to the small number of observations, a sign test is advisable: If NA are revised at each step in the same direction, this would hint at stickiness and inefficiencies. To eliminate the troubling effect of benchmark revisions, an examination of years without this influence may provide some clues; unfortunately, this is possible for only eight years of the sample.Footnote 5

3 Data

As already stated, this paper focuses on revisions of annual data. Thus, it takes into account the revision process of the German Federal Statistical Office (Statistisches Bundesamt). It starts with the arrival of new data that substitute or help to improve estimates made in earlier NA publications. These new data typically come from annual surveys, and they are incorporated into the NA with the August publication. Having calculated new annual data, in a second step the quarterly figures are broken down to a quarter profile. To a large part this is done mechanically, because newly arriving data contain little information on quarters.Footnote 6 As a last step, the quarterly data are seasonally adjusted, which is done purely mechanically. Consequently, for the question of information efficiency the annual revisions are important.

The revisions of value added may serve as an example. Value added is defined as the difference between production and the inputs companies acquire. For the first release of NA, production is estimated from turnover, which is based on a monthly census of the manufacturing sector and a quarterly census in many service sectors. However, these sources cover only large companies. The contribution of small companies to value added must be estimated at this stage. For inputs, only annual data exist, so for the first release of NA, the share of inputs in production is kept constant at the last observed value. One and a half years after the first release of NA have been published, annual data on the production of all companies and the relation of inputs to production become available. At this stage, the German Statistical Office will revise its first estimate using an annual base, later breaking down the annual figures into a quarterly profile.

Of course, the use of annual data is associated with a loss of information and observations. This disadvantage can be partially overcome with the size of the sample. This study covers revisions for the years 1993 to 2016, i.e., a sample of 24 years.Footnote 7 The start is marked by the first year for which the full set of revisions can be traced for a unified Germany. The last year for which currently final data have been published is 2016. In Germany, final data are released 44 months after the end of the reported data year. All later revisions are the result of benchmark revisions. Subsequently, the first release data for year t are the NA published in February of year t+1. Final data are taken from the publication in August of year t+4.

As stated earlier, total revisions are the result of current and benchmark revisions. The latter are applied in five year’s turn, on average, and are typically internationally coordinated, incorporating conceptual, methodological, or benchmark changes (Jacobs and van Norden, 2011). Current revisions are a consequence of missing data in early releases, which are replaced by observations or better estimates when new basic statistics become available. Better compilation methods can reduce current revisions, whereas benchmark revisions are not data-driven but follow conceptual considerations.

Most papers combine both types of revision, which makes it difficult to draw conclusions for NA compilation. Here, an approach described in Döhrn (2019) was used to separate current revisions from benchmark revisions. First, all data were transformed into year-over-year growth rates. Next, the entire growth rate change in the quarter a benchmark revision took place was ascribed to the benchmark revision. Finally, subsequent current revisions were chained to the rates before the benchmark revision. This procedure only provides a rough approximation. However, the direction in which the current revisions are biased by the adjustment remains an open question, since it is unknown how both types of revisions interact.

Table 1 provides some descriptive statistics for the total revisions and the estimated current revisions. It shows that the mean absolute revisions (MAR), the mean squared revisions (MSR), and the noise-to-signal ratio (NSR), defined as MSR relative to the variance of the observed growth rates, are considerably smaller for current revisions than for total revisions. Particularly large are the investment and external trade revisions. However, since the growth rates of these variables are highly volatile, the NSR is small. On the other hand, private consumption and government consumption show lower revisions but are less volatile, leaving the NSR relatively large. The change in stocks is the only variable for which the NSR is above 1.

Table 1 Descriptive statistics of NA revisions in Germany, 1993-2016

An issue raised by Glass (2018) and Öller and Hansson (2005) is the statistical properties of the revisions. Most statistical tests assume normal distribution of the residuals. Non-normally distributed revisions would spoil the tests. As Table 2 shows, the revisions are mostly skewed, and the kurtosis is below 3 in many cases. However, the Jarque-Bera test does not reject the null of normal distribution for most variables, with three exceptions. For private consumption expenditure, imports, and employment, the null of being normally distributed must be rejected, the last two at a high level of significance. Therefore, in addition to the OLS estimates, a least absolute deviation (LAD) estimator will also be presented, which is more robust to outliers.Footnote 8

Table 2 Skewness, Kurtosis, and test for normality of NA revisions in Germany, 1993-2016

4 Results

4.1 Weak Efficiency

As outlined above, three tests for weak efficiency are employed. Table 3 shows the results of the test for (partial) bias according to (2). Both approaches confirm that revisions to exports, imports, and employment are biased. For private consumption, only the OLS estimate exhibits bias. Seemingly, the results are strongly influenced by outliers, demonstrated by the median being closer to zero than the mean for most variables.

Table 3 Test for bias of NA revisions in Germany, 1993-2016

As noted above, correlations of revisions with the preliminary data may distort the results of the partial test. Table 4 shows the results of the Mincer-Zarnowitz equation (3), which tested both dimensions of weak efficiency—bias and correlation between revisions and preliminary data.

Table 4 Test for Mincer-Zarnowitz efficiency of NA revisions in Germany, 1993-2016

Again, the results of the OLS and the LAD estimates differ to some extent, hinting at outliers influencing the OLS results. For government consumption, the change in stocks and exports both estimates hint in the same direction. For government consumption, bias and negative correlation of the revisions with the preliminary data interact. For the change in stocks, the revisions are negatively correlated with the preliminary data, for exports bias is the problem. For imports, only the OLS regression shows bias but for the LAD regression, the null of both coefficients being zero was rejected. For employment, only the OLS regression exhibited signs of inefficiency.

Finally, the residuals of the Mincer-Zarnowitz equation were tested for autocorrelation. As indicated by Table 5, revisions of total investment and employment are significantly autocorrelated. The coefficients are positive, which shows that a revision in year t is likely to be followed by a revision in the same direction in year t+1. The result for employment is mirrored in GDP per employee, which suggests that new information on employment does not translate into revisions of GDP.

Table 5 Autocorrelation of residuals of the Mincer-Zarnowitz equation, 1993-2016

In summary, there is evidence of a lack of informational efficiency for the first release data of some NA components. Revisions of private consumption, exports, imports, and employment show bias. Revisions of government consumption, change in stocks, imports, and employment are correlated with the preliminary data. Finally, revisions of investment and employment seem to be autocorrelated. However, this only indicates room for improvement but does not reveal the method of utilization. Here, tests for strong efficiency might provide some insights.

4.2 Strong Efficiency

Testing strong efficiency is associated with the problem of choosing the X-variable in equation (4). Faust et al. (2005) consider the influence of oil prices, stock returns, and the short-term interest rate.Footnote 9 For Germany, these variables do not covariate with GDP revisions, but there are other variables that are worth considering, such as company or consumer surveys. However, there is no rule what additional variables should be included.

The indicators used in this study try to cover different types of information (Table 6). They enter (4) as annual averages. If necessary, the indicators were made stationary either by calculating growth rates or first differences. Furthermore, all variables normalized, to ensure that β2 in (4) is zero if an indicator is not correlated with the revision. To test whether the extent of the revisions changes systematically over time, a trend-variable was also included as an additional regressor.

Table 6 Indicators entering the strong efficiency test

Table 7a presents the results of the OLS estimator and Table 7b the LAD estimator. Only results that meet two conditions are shown: First, β2 differs significantly from zero and second, an F-test rejects the null of β0=β1=β2=0.

Table 7 Test for strong efficiency of NA revisions in Germany, 1993-2016, (a) OLS estimate (b) LAD estimate

Whereas the tests for weak efficiency indicated that GDP revisions are noise, the tests for strong efficiency show a different picture. The strong efficiency tests find covariations with several indicators, including business climate in industry and trade, as well as business climate in manufacturing.Footnote 10 A favorable business climate seems to be followed by an upward revision of GDP growth. Furthermore, there is a positive correlation of GDP revisions with the time trend, indicating that upward revisions have become more important over time. Moreover, a growing number of insolvencies tends to indicate a downward revision of GDP growth. This may be due to the fact that the monthly production census may lose representativeness when many companies go bankrupt. However, this effect was only found in the OLS estimate.

Also, for the GDP components, some covariations of indicators and revisions were found. Some seem plausible: In years world trade grew strongly, export growth tends to be revised upward. In other cases, it is harder to explain the correlation between indicators and revision. Why, for example, should revisions of government consumption be correlated with raw material prices? All in all, many results seem spurious. This is underpinned by the fact that the LAD estimator shows a considerably smaller number of significant covariations.

Particularly for imports and exports, there are indicators at hand that might help reduce revisions. An interesting feature is that revisions of the change in stocks also correlates with business survey results and capacity utilization. In the first release of NA, this variable is mostly a residual, catching many discrepancies between the supply side and the demand side of NA. This suggests that the use of business surveys may help reduce these discrepancies.

To summarize, analyses of strong efficiency provide some ideas about how NA compilations methods could be improved to reduce revisions. However, the indicators tested here represent only a small fraction of the available indicators. Thus, a test of more indicators can provide additional insights.

4.3 Nordhaus-Efficiency

Given the limitations discussed above, the outcomes of the Nordhaus calculations presented below can only provide an indication of whether revisions of German NA are efficient. Table 8 shows the number of upward revisions for 13 NA variables, as well as the eight years not influenced by a benchmark revision. For the question of Nordhaus efficiency, the cases with 4 and 0 upward revisions (the latter means 4 downward revisions) are important because they mean that all revisions shifted the data in the same direction. This may indicate that statisticians adhered to their earlier publications.

Table 8 Test for Nordhaus efficiency of NA revisions in Germany, number of upward revisions in years not influenced by benchmark revisions

Among the variables considered, domestic demand, exports, and GDP per employee stand out. If downward and upward revisions are randomly distributed, only one case in which all revision are in the same direction can be expected among eight cases. For the three variables mentioned, this was the case in three out of eight years. Grouped by years, the year 2006 shows a high degree of stickiness. Almost half of the variables analyzed were persistently revised upward.

Summarizing the results of Table 8, in 20 out of 104 cases (19.2%) all revisions were in the same direction, showing the values 0 or 4. If upward and downward revisions were distributed randomly, this would be expected only in 12.5% of cases. The probability of observing 20 or more quadruples of the same sign among 104 random draws is only 3.2 %. The difference between the observed and the expected share hints at revisions being inefficient in the sense of Nordhaus, namely at producers of NA being sticked to the data they have published before. However, due to data limitations the results should be treated with caution.

5 Conclusions

National accounts are the most important statistical basis for macroeconomic analyses and forecasts. However, revisions can be extensive and reducing them would be beneficial. One option for decreasing revisions is improving NA calibration. Evaluations of the revisions may provide some hints as to where to put the lever. If revisions were noise, making better use of the data will not help reduce revisions, meaning the NA are information efficient. What is interesting is the opposite, in which revisions result from new information. Identifying these cases might reduce future revisions.

The paper distinguishes three forms of efficiency: Weak efficiency, strong efficiency, and Nordhaus efficiency. As far as weak efficiency is concerned, GDP revisions during the period 1993–2016 were noise. Thus, the study confirms the results of Strohsal and Wolf (2020), who used quarterly data. Compared to the results of York and Atkinson (1997), as well as Faust et al. (2005), efficiency seems to have improved. Both found inefficiencies in GDP revisions for earlier periods. Among GDP components, government consumption, exports, imports, and employment showed signs of inefficiencies. Revisions of these variables show bias, covariate with the first release data, or are autocorrelated. For most of these variables, York and Atkinson (1997) found bias as well.

Tests of strong efficiency provided a different assessment of GDP revisions. The hypothesis that revisions are noise was rejected, also for the sample used in this study. Revisions covariate with some economic indicators, particularly with the results of business surveys. Thus, there are some indicators that may be used to improve the calibration of NA, leading to reduced revisions. This also seems to be the case for exports and imports, where weak efficiency tests indicate the need for action.

The discussion of Nordhaus efficiency tackled the interesting question of whether statistical agencies are hesitant to revise previously published NA data. Due to poor data, the analyses could only give a weak indication. However, the unexpectedly large number of cases for which all vintages of revisions showed the same sign indicates some stickiness.

In conclusion, analyses of the revisions of many aggregates of German NA reject the hypothesis that the revisions are noise. For those who produce the NA, this means there is room to improve the calibration methods. The tests for strong efficiency should encourage more intensive use of business survey results to improve the first estimates of missing data.