Revisions in the Norwegian National Accounts: accuracy, unbiasedness and efficiency in preliminary figures

This paper investigates the quality of preliminary figures in the Norwegian National Accounts. To address the problem of few observations in such analyses, we use some recently developed system tests for forecast evaluation. We find that preliminary figures for growth rates of National Accounts (NA) figures (measured in real terms) are accurate, unbiased and efficient. The exception is growth rates for real gross fixed capital formation, which under-predict the final figures. Early published vintages of growth rates for real gross fixed capital formation are often closer to the final vintages than later vintages are.


Introduction
In Norway, the final vintage of annual National Accounts figures (based on the normal revision cycle) is published approximately one year and a half after the end of the year for which they apply. Before the final figures are released, preliminary figures are used, among other things, in policy formulation. Therefore, it is important that the preliminary figures are good predictors of the final figures and as accurate as possible.
Data revisions and their implications have been studied for many years. Cole (1969) found that for certain types of forecasts, "The use of preliminary rather than revised data resulted in a doubling of the forecast error." Investigating preliminary National Accounts figures for Germany, Strohsal and Wolf (2020) conclude that the revisions are "biased, large and predictable," with the noteworthy exception for GDP. Similar results were found for the USA in Aruoba (2008). A recent special issue of Empirical Economics, see Kunst and Wagner (2020), focuses on forecasting of macroeconomic variables and on the consequence of final vintage of National Accounts figures not being available when forecasts (or nowcasts) are made. For example, Siliverstovs (2020) considers the problem of nowcasting (both point forecasts and density forecasts) when conditioning on variables that are preliminary and finds that a simple univariate model can be better than a sophisticated mixed frequency model to obtain good nowcasts. On the other hand, Claudio et al. (2020) find that a mixed frequency model outperforms forecasts obtained by more traditional single-frequency models when applying data available in real time. Glocker and Wegmueller (2020) consider the problem of dating recessions when taking revisions of GDP into account.
There are only a few previous studies of revisions of the Norwegian National Accounts. Bernhardsen et al. (2005) consider the problem of estimating the output gap based on preliminary National Accounts figures in Norway. They "find that total revisions of output gap estimates are heavily influenced by uncertainty about the trend at the end of the sample and that data revisions are of less importance." Jore (2017), studying quarterly Norwegian National Accounts data, finds that first releases of growth in both nominal GDP and its deflator under-predict the final figures. However, these biases cancel out such that "there is no tendency for the first released data [for growth in real GDP] to either over- or under-predict the final data." Although these two papers analyze revisions in GDP for Norway, they do not study the revisions for all of the main aggregates in the National Accounts.
In this paper, we compare the preliminary published figures for the growth rate of GDP and its main components to the final published figures of these variables.
For the preliminary figures to be characterized as good, they must satisfy certain requirements (see also Aruoba 2008). First, they must be unbiased estimates of the final figures, which we find in tests for most of the figures, both separately and jointly. Second, they must have a small variance (compared to the variance of the final vintage of the figures), and this variance must decrease with new vintages. For most of the National Accounts figures, we find support for both a small and a decreasing variance with newer vintages. Third, they must be efficient; that is, they utilize all available information at the time they are published and, thus, future revisions are unpredictable. In practice, it is impossible to test that there does not exist any available information at the time the preliminary National Accounts figures were made that could have improved them. However, it is possible to test whether one can improve the preliminary figures by using their unconditional mean or earlier vintages of the same variables (also known as weak efficiency). We test weak efficiency both with a test based on Mincer and Zarnowitz (1969) and with an encompassing test (see Chong and Hendry 1986). The efficiency test based on Mincer and Zarnowitz (1969) indicates that the preliminary National Accounts figures cannot be improved by replacing them with a weighted average of the preliminary figures and an unconditional expectation estimate. We show that this test also implies that more "news" is incorporated into the preliminary figures through the revision process. The results from the encompassing tests support this conclusion; new preliminary vintages of the National Accounts figures generally seem to contain all information from earlier vintages, so the latest vintages cannot be improved using earlier vintages.
When testing for efficiency, we do not only apply tests on the variables in the National Accounts separately. We also apply tests of efficiency for the whole vector of variables. The test we use for this is proposed in Hungnes (2018). One of the advantages of testing all of the elements in this vector jointly is that the test can have better power than tests for each variable separately have. Also, it may be easier to draw conclusions from such a joint test if you get divergent results for different variables when testing these separately.
The tests for efficiency might have weak power in small data sets, even if tested jointly. We therefore also apply equal predictability tests, where we examine whether two vintages of the preliminary National Accounts figures are equally good predictors of the final ones. We usually fail to reject this hypothesis when we test the variables separately (with some exceptions). However, when we use the test in Hungnes (2020) to test this hypothesis for all variables jointly, we reject the hypothesis that two vintages of the National Accounts figures are equally good predictors for the final ones. Furthermore, the estimates underlying the tests show that this rejection occurs because the vector of the most recent vintage of preliminary National Accounts figures is significantly better than the vector of an earlier vintage of the same figures.
Although the tests generally show that the preliminary National Accounts figures are unbiased and weakly efficient, there are some exceptions. In particular, this applies to the gross fixed capital formation (a component of gross capital formation). For this variable, it turns out that the preliminary figures are significantly too small for all vintages and that this bias is increasing through the revision process. Thus, the predictions of this variable get worse the closer you get to the time when the final vintage of National Accounts figures is published. The preliminary vintages of the National Accounts figures would have been better if you had kept the figures published in the first vintage of this variable until the final figure is published.
The present paper takes benchmark revisions directly into account. In the main analysis, we do so by treating the change in the growth rate from the last publication before the benchmark revision to the first publication after it as the effect of the benchmark revision.
Whenever possible, we compare our results with those in Strohsal and Wolf (2020). Strohsal and Wolf (2020) study German quarterly National Accounts data and consider many of the same series as we do here. We find the preliminary figures of growth in Norwegian GDP components to be more accurate than those for Germany. Furthermore, all preliminary National Accounts figures for the Norwegian GDP components are efficient, while this is not the case for either private or public consumption in Germany.
The paper is organized as follows: Section 2 describes the revision cycle in the Norwegian National Accounts and gives an overview of the different sources used for the various vintages throughout the revision cycle. The section also presents the National Accounts variables we are considering in this study. Section 3 describes the accuracy of the preliminary vintages of the variables considered, tests for unbiasedness of the preliminary vintages, and considers two efficiency tests (including an encompassing test) and an equal predictability test. Section 4 concludes.

Revision cycles, sources and the National Accounts data
The national statistical office publishes annual National Accounts figures in a relatively fixed cycle: The first to third vintages of the annual figures are preliminary estimates based on the system of the Quarterly National Accounts (QNA). The fourth vintage is the final in the regular revision cycle and is based on the system of the annual National Accounts (NA). The first vintage of the annual figures is published when all the quarters of the year they apply for are available. Since the NA figures of 2006, the first vintage has been published at the beginning of February (about 40 days after the end of the year to which it applies). The second vintage (for the year 2006 and later) has been published in May (about 19 weeks after the end of the year to which it applies), and the third vintage (for the year 2014 and later) has been published in August (about 34 weeks after the end of the year to which it applies). The fourth vintage, which is the final vintage in the regular revision cycle, has (for the year of 2013 and later) been published in August 1 year after the third vintage (about 20 months after the end of the year to which it applies). As a result of benchmark revisions, figures can also be revised after the publication of the fourth vintage.
The times for publishing the different preliminary vintages have changed somewhat over time. For the NA figures for the years 2003-2017, the changes have been exclusively in the direction of higher timeliness. Table 1 shows an overview of the revision cycle for GDP for the mainland Norwegian economy (Mainland GDP). The table also indicates by means of color codes according to which benchmark revision standard each figure was published.
Benchmark revisions usually imply changes in the National Accounts' definitions and guiding principles. If the first vintage of the NA figures for a year was published before a benchmark revision, while the final vintage was published after or as part of a benchmark revision, then revisions from preliminary to final figures could come from definitions and guidance changes, in addition to normal revisions within the regular publishing cycle.

Sources in the revision cycle
When the first vintage is published, most of the short-term statistics are included in the QNA system. This publication is based on monthly and quarterly figures from the state accounts and KOSTRA (Municipality-State-Reporting), respectively, as the basis for developments in public administration. Several units, including state education and health, have no reporting obligation other than annual figures, so these are estimated in all quarters for the first vintage. For foreign trade, goods data are available, while import and export of services are estimated at a smaller subset for the fourth quarter of the year.
For the second vintage, updated figures for public administration are available, since annual figures from both the state accounts and KOSTRA have now been reported. For the state accounts, this implies a full census, while for KOSTRA the reporting deadline for the annual figures is somewhat later, so that several units are still missing. For foreign trade, there are some revisions in the goods data, but the main source of revision is that there are complete fourth-quarter figures for service trade. Otherwise, revisions in the short-term statistics can lead to revisions in the NA figures.
The third vintage has full KOSTRA figures, which can give revisions in municipal administration. Also, preliminary structural statistics are available, which will improve the estimates of the market-oriented industries. In connection with the publication of the third vintage, the base year in the QNA is also updated. The shift of base year implies that the short-term indicators in the QNA are weighted together with the NA figures from a more recent year. Changing the base year can provide revisions in itself, even though we have no new information in the short-term statistics.
For the fourth and final vintage, the figures are based on the NA system. The NA system utilizes more sources than the QNA system, and the calculations are done at a more detailed level. In addition, the volume calculations in QNA are mainly done by extrapolating the NA sizes with suitable volume indicators. The NA system consists of accounting sizes at current prices, which are then deflated by suitable price indices to give the figures in real terms.

National account series considered
In the current paper, we consider revisions in the growth rates (measured in percent growth from the previous year) of gross domestic product (GDP) and its main components, all measured in real terms. The volume and growth rates of the considered series are reported in Table 2. Gross capital formation (J) includes changes in stocks and statistical discrepancies. Since it also includes statistical discrepancies, the figure is derived as the sum of GDP and imports minus final consumption expenditure and exports (CP + CO + EXP), and changes in the figure for gross capital formation between vintages may not reflect new information on gross capital formation. Therefore, we also consider gross fixed capital formation (JK), which constitutes most of gross capital formation.
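The residual nature of gross capital formation can be illustrated with a stylized sketch in levels. The function name and all numbers below are hypothetical and chosen only for illustration; the sketch assumes the simple accounting identity GDP = CP + CO + J + EXP - IMP.

```python
def gross_capital_formation(gdp, imp, cp, co, exp):
    """Residual J implied by GDP = CP + CO + J + EXP - IMP (levels)."""
    return gdp + imp - cp - co - exp


# Hypothetical levels for one year (not taken from the National Accounts).
first_vintage = gross_capital_formation(gdp=100.0, imp=30.0, cp=45.0, co=25.0, exp=35.0)

# Only private consumption is revised; no new information on investment arrives.
second_vintage = gross_capital_formation(gdp=100.0, imp=30.0, cp=46.0, co=25.0, exp=35.0)

print(first_vintage, second_vintage)
```

In this sketch the residual changes between the two vintages although nothing was learned about investment itself, which is the reason for also considering JK.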
In Norway, GDP Mainland is considered as the most important NA figure. GDP Mainland is defined as the total GDP minus Petroleum activities and ocean transport. In 2017, GDP for Mainland Norway constituted about 85% of total GDP.

Tests of unbiasedness and efficiency
In this section, we consider measures for accuracy and tests for unbiasedness and efficiency. To do so, we must define some variables. Let $y^i_{t,(j)}$ be the $j$'th vintage of the growth rate of the NA figure of variable $i$ applying for year $t$, where $i = 1, 2, \ldots, K$ with $K$ as the number of NA variables we are considering. The 4th vintage of this figure is treated as the final value. Thus, the prediction error for the $j$'th vintage of variable $i$ for year $t$ is defined as $e^i_{t,(j)} = y^i_{t,(4)} - y^i_{t,(j)}$ ($j = 1, 2, 3$). The variables are measured in percent growth from the previous year with one decimal as in Table 1.
Due to benchmark revisions, for many years, we have that the first vintage of the NA figures is based on one benchmark revision standard, while the final version is based on another benchmark revision standard. We consider here three different approaches to handling this.
In the first approach, we ignore that such revisions have taken place. Although benchmark revisions may change the level of the variable, they will not necessarily change the year-to-year growth of the variable, since the level of the variable for the previous year is also changed. As the first year with NA figures for GDP Mainland Norway was in 1988, and the last year with final NA figures is from 2017, we consider all years from 1988 to 2017, giving us a sample of 30 observations.
In the second approach, we exclude the years where such benchmark revisions have taken place between the first and the last vintage. As there are 15 years where there has been a benchmark revision between the publication of the first and the final NA figures, we only have a sample of 15 years where there have not been such revisions during the publication process.
In the third approach, we adjust for the effect of the benchmark revisions on the figures. If a revision has taken place between vintage $j$ and vintage $j + 1$ of variable $i$ applying for year $t$ (and we expect that when adjusting for this revision the vintage $j$ should be an unbiased predictor of vintage $j + 1$ of the same variable for the same year), the best adjustment for the benchmark revision is the change in the preliminary figure of this variable for year $t$ from vintage $j$ to vintage $j + 1$. Let $R_{t,(j)}$ be an indicator variable, taking the value of 1 if there is a benchmark revision between version $j$ and $j + 1$ for year $t$, and zero otherwise. Then, the adjusted predictions are given by

$$y^{i*}_{t,(j)} = y^i_{t,(j)} + \sum_{k=j}^{3} R_{t,(k)} \left( y^i_{t,(k+1)} - y^i_{t,(k)} \right), \qquad (1)$$

such that the adjusted prediction errors are given by

$$e^{i*}_{t,(j)} = y^i_{t,(4)} - y^{i*}_{t,(j)}. \qquad (2)$$

Equivalently, $e^{i*}_{t,(j)} = \sum_{k=j}^{3} \left( 1 - R_{t,(k)} \right) \left( y^i_{t,(k+1)} - y^i_{t,(k)} \right)$.

For example, consider the Mainland GDP growth rate for 1991 from vintage 3 (see Table 1), which is −0.6, i.e., $y^{GDP}_{1991,(3)} = -0.6$, a decline of 0.6% from 1990 to 1991. Before the final figure is published there is a benchmark revision, so we have $R_{1991,(3)} = 1$. Applying (1), we have $y^{GDP*}_{1991,(3)} = y^{GDP}_{1991,(3)} + (y^{GDP}_{1991,(4)} - y^{GDP}_{1991,(3)}) = y^{GDP}_{1991,(4)} = 1.1$, implying that we are revising the GDP growth for vintage 3 up by 1.7 percentage points. The GDP growth for 1991 in vintages 1 and 2 is revised up by the same number of percentage points, as this is our estimate of the effect of the benchmark revision for GDP in 1991. Furthermore, applying (2), we have $e^{GDP*}_{1991,(3)} = 0$. Thus, if the benchmark revision takes place between the 3rd and the final (4th) vintage, we are essentially comparing the preliminary figures with the 3rd vintage.
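The benchmark adjustment can be sketched in a few lines of Python. This is a minimal illustration, assuming that each preliminary vintage is shifted by the sum of the vintage-to-vintage changes that coincide with a benchmark revision at or after that vintage. The function name is hypothetical; the 1991 values for vintages 3 and 4 are those quoted in the text, while the values used for vintages 1 and 2 are placeholders.

```python
def adjust_for_benchmark(y, R):
    """Adjust preliminary vintages for benchmark revisions.

    y: [y1, y2, y3, y4], growth rates for vintages 1-4 of one variable and year.
    R: [R1, R2, R3], where R[j-1] = 1 if a benchmark revision took place
       between vintage j and vintage j + 1, and 0 otherwise.
    Returns the adjusted vintages 1-3 and the adjusted prediction errors.
    """
    adjusted = []
    for j in range(3):
        # Sum of all benchmark-related changes from vintage j+1 onwards.
        shift = sum(R[k] * (y[k + 1] - y[k]) for k in range(j, 3))
        adjusted.append(y[j] + shift)
    errors = [y[3] - a for a in adjusted]
    return adjusted, errors


# Mainland GDP growth for 1991: vintages 3 and 4 are from the text,
# vintages 1 and 2 are hypothetical placeholders.
y_1991 = [-0.9, -0.4, -0.6, 1.1]
R_1991 = [0, 0, 1]  # benchmark revision between vintage 3 and vintage 4
adj, err = adjust_for_benchmark(y_1991, R_1991)
print(adj, err)
```

With a benchmark revision between vintages 3 and 4, the adjusted vintage 3 coincides with the final figure, so its adjusted error is zero by construction.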
The first approach, where we ignore that a benchmark revision has taken place between the first and final vintage (for the normal revision cycle), seems to be the usual approach in this type of analysis (especially when the variables are formulated in percentage growth); see, e.g., Strohsal and Wolf (2020) and Aruoba (2008). Our third approach, where we adjust for benchmark revisions, is more in line with Clements and Galvão (2013), who include "benchmark dummies" to adjust for benchmark revisions. We use this third approach in the main part of the paper. The results obtained by using the other two approaches are reported in "Appendix B."

Accuracy
The root-mean-squared error (RMSE) for vintage $j$ of variable $i$ is given by

$$\mathrm{RMSE}^i_{(j)} = \sqrt{N^{-1} \sum_{t \in T} \left( e^i_{t,(j)} \right)^2}, \qquad (3)$$

where the set $T$ contains the years included in the sample, and $N$ is the number of elements in $T$. If we include all years, we have $N = 30$. If we only include years for which there has been no benchmark revision, we have $N = 15$. And if we adjust for benchmark revisions by using (2), we have $N = 30$ for vintages 1 and 2 and $N = 22$ for vintage 3. When we adjust for benchmark revisions, we replace $e^i_{t,(j)}$ with $e^{i*}_{t,(j)}$ defined in (2). The RMSE is a measure of the accuracy of the preliminary figures. It is reported in Table 3 for the adjusted prediction errors (see also Table 8 and Table 13 in "Appendix B" for the results obtained under alternative treatments of benchmark revisions). The average bias in the preliminary figures is $\bar{e}^i_{(j)} = N^{-1} \sum_{t \in T} e^i_{t,(j)}$. The RMSE can then be decomposed into a prediction variance and a bias component:

$$\left( \mathrm{RMSE}^i_{(j)} \right)^2 = PV^i_{(j)} + \left( \bar{e}^i_{(j)} \right)^2, \qquad PV^i_{(j)} = N^{-1} \sum_{t \in T} \left( e^i_{t,(j)} - \bar{e}^i_{(j)} \right)^2. \qquad (4)$$

The root of the prediction variance is also reported in Table 3 (whereas the bias component will be considered in the next section). For comparison, Table 3 also reports the root of the variance of the final vintage of the variable, see the final column of the table, where the variance is given as

$$V^i = N^{-1} \sum_{t \in T} \left( y^i_{t,(4)} - \bar{y}^i_{(4)} \right)^2, \qquad \bar{y}^i_{(4)} = N^{-1} \sum_{t \in T} y^i_{t,(4)}.$$

This measure can be used as a benchmark for the prediction variance of the preliminary figures. If the preliminary figure for a variable were the same each year (that is, if $y^i_{1988,(j)} = y^i_{1989,(j)} = \cdots = y^i_{2017,(j)}$), then the prediction variance of this vintage of the variable would be equal to the variance of the final vintage. Therefore, $V^i$ could be considered as an upper limit for the prediction variance.
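The decomposition of the squared RMSE into a prediction variance and a squared bias can be checked numerically. The following is a minimal sketch with hypothetical growth rates (not taken from the tables); the function name and dictionary keys are illustrative, and all moments are divided by N, as assumed above.

```python
import math


def accuracy_measures(final, prelim):
    """RMSE, bias, root prediction variance and root variance of the final vintage."""
    n = len(final)
    errors = [f - p for f, p in zip(final, prelim)]
    bias = sum(errors) / n
    mse = sum(e * e for e in errors) / n
    pv = sum((e - bias) ** 2 for e in errors) / n
    mean_final = sum(final) / n
    v_final = sum((f - mean_final) ** 2 for f in final) / n
    return {
        "rmse": math.sqrt(mse),
        "bias": bias,
        "root_pv": math.sqrt(pv),
        "root_v": math.sqrt(v_final),
    }


# Hypothetical final and preliminary growth rates for five years.
final = [2.0, 3.5, 1.0, -0.5, 4.0]
prelim = [1.8, 3.0, 1.4, -0.2, 3.1]
m = accuracy_measures(final, prelim)

# A preliminary figure that never changes has PV equal to the final variance.
constant = accuracy_measures(final, [1.0] * len(final))
print(m, constant)
```

The second call illustrates why the variance of the final vintage serves as a natural benchmark: a preliminary figure that is the same every year carries no information about year-to-year variation.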
From Table 3 (and also Table 8 and Table 13), we draw the following conclusions: First, the preliminary figures for private consumption expenditure of households and non-profit institutions serving households (CP) are the most accurate of the NA figures we are considering, based on both RMSE and the root of the prediction variance. However, for the third vintage, the preliminary figures for GDP and Mainland GDP (GDPM) are approximately equally accurate as the figures for private consumption expenditure (CP).
Second, the preliminary figures for gross capital formation (J) and its main component gross fixed capital formation (JK) are the least accurate NA figures we are considering here. For the first two vintages, these figures have an RMSE about four times as high as those for private consumption expenditure (CP), the GDP and the Mainland GDP (GDPM). However, this is due to the high volatility in investments over time. The root of the final vintage figures' variance for these two investment types is also about four times as high as those for CP, GDP and GDPM.
Third, for most variables, the accuracy increases for later vintages. Recall that the number of observations is N = 22 in vintage 3, implying that vintage 2 and vintage 3 are not directly comparable; for government consumption (CO), the RMSE (and also the root of PV) is equal for vintages 2 and 3. However, this seems to be due to the changed number of observations, as RMSE decreases both when we consider the full sample (see Table 8) and only years for which there are no benchmark revisions (see Table 13). For gross fixed capital formation (JK), the accuracy decreases throughout the revision cycle, as RMSE (and PV) increases with the vintage number for this variable. For exports (EXP), the third vintage seems to be less accurate than the second vintage, independent of how benchmark revisions are treated.
Fourth, based on the ratio between prediction variance and the variance of the final vintage (also known as the noise-to-signal ratio, see, e.g., Aruoba 2008), government consumption (CO) has the least accurate figure for all vintages.
Fifth, RMSE for the adjusted prediction errors given by (2) reported in Table 3 is smaller than the corresponding RMSE for the unadjusted prediction errors for the full sample reported in Table 8. This indicates that the correction for benchmark revisions in (2) works well.
Sixth, the small difference between RMSE and the root of prediction variance indicates only small biases. The biggest difference is found for gross fixed capital formation (JK). Below, we will formally test for the absence of bias.
Strohsal and Wolf (2020) consider the accuracy of the first vintage of many of the same NA figures for Germany, though they consider quarterly NA figures. For GDP, Strohsal and Wolf (2020) estimate a noise-to-signal ratio of 0.44, which is in line with the estimates we get for the Norwegian GDP. For private consumption, public consumption and investments, which correspond to CP, CO and J in our analysis, they identify noise-to-signal ratios about twice as large (they report ratios of 0.74, 1.24 and 0.60, respectively) as those we find for Norway. For exports, the noise-to-signal ratio estimated for Germany in Strohsal and Wolf (2020) is larger than what we identify for Norway (0.43 vs. 0.32).

Unbiasedness
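The test of unbiasedness is a test of the null hypothesis $\mu^i_{(j)} = 0$, where $\mu^i_{(j)}$ is the expectation of $d^i_t = e^{i*}_{t,(j)}$. It is based on the test statistic

$$\frac{\bar{d}^i}{\sqrt{\hat{q}^i / N}}, \qquad (5)$$

where $\bar{d}^i = N^{-1} \sum_{t \in T} d^i_t$ and $\hat{q}^i$ is a consistent estimate of the variance of $d^i_t$. We use

$$\hat{q}^i = N^{-1} \sum_{l=-\tau}^{\tau} w_l \sum_{t, t+l \in T} \left( d^i_t - \bar{d}^i \right) \left( d^i_{t+l} - \bar{d}^i \right), \qquad (6)$$

where $\tau$ is the order of autocorrelation (where we in the current paper assume $\tau = 1$ due to few observations), and $w_l$ denotes weights (where we follow Newey and West (1987) and use $w_l = 1 - \frac{|l|}{\tau + 1}$). Furthermore, the notation $t, t + l \in T$ means that we take the sum over all combinations where both $t$ and $t + l$ are elements in $T$. The test statistic given by (5) is asymptotically normally distributed. However, in small samples we assume it to be t-distributed with $N - 1$ degrees of freedom.

Table 3. Root-mean-squared error (RMSE) and root of both the absolute and the relative prediction variance, as well as the root of the variance of the final vintage.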
We also consider the joint test of all elements in the vector $\mu_{(j)} = (\mu^1_{(j)}, \mu^2_{(j)}, \ldots, \mu^K_{(j)})'$ being zero, by applying the test statistic

$$N \bar{d}' \hat{Q}^{-1} \bar{d}, \qquad (7)$$

with $\bar{d} = (\bar{d}^1, \bar{d}^2, \ldots, \bar{d}^K)'$, and $\hat{Q}$ being a matrix version of $\hat{q}$ defined as

$$\hat{Q} = N^{-1} \sum_{l=-\tau}^{\tau} w_l \sum_{t, t+l \in T} \left( d_t - \bar{d} \right) \left( d_{t+l} - \bar{d} \right)'. \qquad (8)$$

The test statistic in (7) is asymptotically $\chi^2$-distributed with $K$ degrees of freedom. In small samples, however, we assume it to be F-distributed with $K$ degrees of freedom in the numerator and $N - 1$ degrees of freedom in the denominator.
A potential problem with this and the remaining joint tests is the approximately linear relationship between the variables measured in percent growth:

$$y^{GDP}_t \approx s_{CP,t-1} y^{CP}_t + s_{CO,t-1} y^{CO}_t + s_{J,t-1} y^{J}_t + s_{EXP,t-1} y^{EXP}_t - s_{IMP,t-1} y^{IMP}_t,$$

where $s_{CP,t}$ is the private consumption to GDP ratio in year $t$ (and similarly for $s_{CO,t}$, $s_{J,t}$, $s_{EXP,t}$, and $s_{IMP,t}$). If these ratios are time-invariant, the covariance matrix in (8) (and also $\hat{\Omega}$ for the later defined t test) will not be positive definite and, thus, not invertible. If this turns out to be a problem, one can exclude one of the variables.
Table 4 reports the test results for unbiasedness (see also Tables 9 and 14): Using a 5% significance level, we do not reject the null hypothesis that the preliminary data are unbiased, except for the second vintage of gross fixed capital formation (JK). This is also supported in Table 14, whereas in Table 9 we cannot reject the null hypothesis for JK. Since we are using a 5% significance level for the tests, we will expect 1 out of 20 independent tests to yield rejection even if the null hypothesis is true. Therefore, based on the tests for each combination of variables and vintages, we could argue that the overall conclusion from the test results reported in Table 4 is that they are in line with the hypothesis that the preliminary figures are unbiased. However, the joint hypothesis for unbiasedness, which considers the hypothesis that all preliminary figures are unbiased, is rejected for both vintages 2 and 3. Also, unbiasedness for vintage 1 is close to being rejected at the 5% significance level, which is due to the biased preliminary figures for JK; if we exclude JK in the vector of considered variables, we cannot reject that the vector of preliminary variables is unbiased. Then, our overall conclusion is that preliminary figures for JK are significantly biased and under-predict the final vintage. Our results are also in line with Strohsal and Wolf (2020), who rejected unbiasedness for preliminary German NA figures for vintage 1.
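The single-variable unbiasedness test can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the statistic is taken to be the sample mean of the prediction errors divided by the square root of q/N, where q is a Newey-West variance estimate with Bartlett weights 1 - |l|/(tau + 1), and cross products are only formed when both years are in the sample. The function names and the data are hypothetical.

```python
import math


def long_run_variance(d, tau=1):
    """Newey-West variance estimate with Bartlett weights.

    d maps year -> value; a cross product for lag l is included only
    when both t and t + l are years in the sample.
    """
    years = sorted(d)
    n = len(years)
    dbar = sum(d.values()) / n
    q = 0.0
    for l in range(-tau, tau + 1):
        w = 1.0 - abs(l) / (tau + 1)
        q += w * sum((d[t] - dbar) * (d[t + l] - dbar) for t in years if t + l in d)
    return q / n


def unbiasedness_statistic(d, tau=1):
    """Sample mean of d divided by its estimated standard error."""
    n = len(d)
    dbar = sum(d.values()) / n
    return dbar / math.sqrt(long_run_variance(d, tau) / n)


# Hypothetical adjusted prediction errors for one variable and vintage.
errors = {2010: 0.3, 2011: -0.1, 2012: 0.4, 2013: 0.2, 2014: -0.2, 2015: 0.5}
stat = unbiasedness_statistic(errors, tau=1)
print(stat)
```

The resulting value would be compared with critical values of the t-distribution with N - 1 degrees of freedom. Keying the data by year makes it straightforward to drop years from the sample without creating spurious lag-one pairs.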

Weak efficiency
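A problem with the test of unbiasedness in the previous section is that we fail to reject the null of absence of bias not only if the estimated bias ($\hat{\mu}^i_{(j)}$) is close to zero but also if the variance in (6) is large. When $\tau = 0$, the variance in (6) is equal to the prediction variance and can be decomposed as

$$PV^i_{(j)} = N^{-1} \sum_{t \in T} \left( y^i_{t,(4)} - \bar{y}^i_{(4)} \right)^2 + N^{-1} \sum_{t \in T} \left( y^i_{t,(j)} - \bar{y}^i_{(j)} \right)^2 - 2 N^{-1} \sum_{t \in T} \left( y^i_{t,(4)} - \bar{y}^i_{(4)} \right) \left( y^i_{t,(j)} - \bar{y}^i_{(j)} \right) \qquad (9)$$

(where $\bar{y}^i_{(j)} = N^{-1} \sum_{t \in T} y^i_{t,(j)}$ ($j = 1, 2, 3, 4$), and, then by definition, $\bar{e}^i_{(j)} = \bar{y}^i_{(4)} - \bar{y}^i_{(j)}$), which shows that this variance does not only become high if there is a large observed variance in the variable we want to predict (the first term), but also if there is a large observed variance in the prediction (the second term), and in particular if the preliminary predictions are not highly positively correlated with the variable we want to predict (the third term).
To handle the problem that non-rejection of unbiasedness can be due to a large variance in (6), we also apply the Mincer and Zarnowitz (1969) regression, which usually is used to test the joint hypothesis of unbiasedness and weak efficiency, given as $\beta^i_0 = 0$ and $\beta^i_1 = 1$ in

$$y^i_{t,(4)} = \beta^i_0 + \beta^i_1 y^i_{t,(j)} + \varepsilon^i_{t,(j)}.$$

Subtracting $y^i_{t,(j)}$ on both sides gives $e^i_{t,(j)} = \beta^i_0 + \beta^{i*}_1 y^i_{t,(j)} + \varepsilon^i_{t,(j)}$ with $\beta^{i*}_1 = \beta^i_1 - 1$. The OLS estimator for $\beta^{i*}_1$ is

$$\hat{\beta}^{i*}_1 = \frac{\sum_{t \in T} \left( e^i_{t,(j)} - \bar{e}^i_{(j)} \right) \left( y^i_{t,(j)} - \bar{y}^i_{(j)} \right)}{\sum_{t \in T} \left( y^i_{t,(j)} - \bar{y}^i_{(j)} \right)^2}.$$

Inserting the expression for $\hat{\beta}^{i*}_1$ in (9) yields

$$PV^i_{(j)} = V^i - \left( 1 + 2 \hat{\beta}^{i*}_1 \right) V^i_{(j)},$$

where $V^i_{(j)} = N^{-1} \sum_{t \in T} ( y^i_{t,(j)} - \bar{y}^i_{(j)} )^2$ is the observed variance of vintage $j$ and $V^i$ is the observed variance of the final vintage. Therefore, for the observed prediction variance to exceed the variance of the final vintage, we must have $\hat{\beta}^{i*}_1 < -0.5$.
When $\hat{\beta}^{i*}_1 = 0$, the prediction variance equals $V^i - V^i_{(j)}$, which is non-negative only if $V^i_{(j)} \le V^i$, i.e., the variance of preliminary figures must be smaller than the variance of the final figures. Thus, the test of $\beta^*_1 = 0$ (or the more common joint test of $\beta_0 = 0$ combined with $\beta^*_1 = 0$) is also a test of whether the revision from vintage $j$ to the final vintage contains "news," see Mankiw et al. (1984) and Croushore and Stark (2003).
Furthermore, when $\hat{\beta}^{i*}_1 = 0$ for all vintages, a prediction variance that decreases with the vintage version implies that $V^i_{(j)} = V^i - PV^i_{(j)}$ increases with the vintage version. Since $PV^i_{(j)}$ decreases with the vintage version for most variables (as can be seen from Table 3), the test of $\beta^*_1 = 0$ also becomes a test of whether each revision step contains "news." The hypothesis of $\beta^i_1 = 1 \Leftrightarrow \beta^{i*}_1 = 0$ can be tested by defining $d^i_t = e^i_{t,(j)} ( y^i_{t,(j)} - \bar{y}^i_{(j)} )$, collecting these elements in the vector $d_t = (d^1_t, d^2_t, \ldots, d^K_t)'$, and applying (7). Table 5 reports the results (see also Table 10 and Table 15).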
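The link between the Mincer-Zarnowitz slope and the prediction variance can be verified numerically. The sketch below uses hypothetical data and an illustrative function name; it estimates the slope from a regression of the prediction error on the preliminary figure (with an intercept) and checks that the prediction variance equals the final variance minus (1 + 2 times the slope) times the preliminary variance, with all moments divided by N.

```python
def mz_decomposition(final, prelim):
    """Sample moments and the OLS slope of the prediction error on the preliminary figure."""
    n = len(final)
    mean_f = sum(final) / n
    mean_p = sum(prelim) / n
    v_final = sum((f - mean_f) ** 2 for f in final) / n
    v_prelim = sum((p - mean_p) ** 2 for p in prelim) / n
    cov = sum((f - mean_f) * (p - mean_p) for f, p in zip(final, prelim)) / n
    errors = [f - p for f, p in zip(final, prelim)]
    mean_e = sum(errors) / n
    pv = sum((e - mean_e) ** 2 for e in errors) / n
    # Slope of e on the preliminary figure: cov(e, y_j) / var(y_j).
    beta_star = (cov - v_prelim) / v_prelim
    return {"v_final": v_final, "v_prelim": v_prelim, "cov": cov,
            "pv": pv, "beta_star": beta_star}


# Hypothetical final and preliminary growth rates.
final = [2.0, 3.5, 1.0, -0.5, 4.0, 2.2]
prelim = [1.8, 3.0, 1.4, -0.2, 3.1, 2.5]
r = mz_decomposition(final, prelim)
implied_pv = r["v_final"] - (1 + 2 * r["beta_star"]) * r["v_prelim"]
print(r["pv"], implied_pv, r["beta_star"])
```

Because the identity is purely algebraic, the two printed prediction variances agree up to floating-point error for any data set with a non-constant preliminary series.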
If we under the alternative hypothesis impose the restriction that $\beta^*_1$ is equal across all variables ($\beta^{1*}_1 = \beta^{2*}_1 = \cdots = \beta^{K*}_1$), then the test of the hypothesis that this common parameter is equal to zero can be formulated by defining a scalar counterpart $d_t$ that pools the $K$ variables and using the test statistic given in (5), which is assumed to be t-distributed with $NK - 1$ degrees of freedom. This follows from an analogous derivation of the test statistic for the equal predictability test in Hungnes (2020).
The usual approach when applying the Mincer and Zarnowitz (1969) regression is to test the joint hypothesis $\beta^i_0 = 0$ and $\beta^i_1 = 1$. Here, we consider these tests separately: the results of testing $\beta^i_1 = 1$ are reported in Table 5, and test results related to the hypothesis $\beta^i_0 = 0$ conditioned on $\beta^i_1 = 1$ are reported in Table 4. With only one exception (vintage 2 of final consumption expenditure of general government, CO), we do not reject the null hypothesis of $\beta^{i*}_1 = 0$. Thus, since we concluded that the results in Table 4 indicate that the preliminary figures also are unbiased for all variables excluding gross fixed capital formation (JK), the results indicate that preliminary figures are both unbiased and efficient for these variables.
Table note: ** and * indicate significance at the 1% and 5% level, respectively.
We apply two different tests for testing the null hypothesis of joint efficiency. In the F test, we allow the individual $\beta^{i*}_1$ to differ across variables under the alternative hypothesis. Using the t test, this parameter is restricted to be equal under the alternative hypothesis. The advantage of the latter test is increased power. For both these tests, we do not reject the null hypothesis of $\beta^{i*}_1 = 0, \forall i$, which implies that, considered jointly, the preliminary figures are efficient estimates of the final vintage of the NA figures.
Strohsal and Wolf (2020) consider the joint hypothesis $\beta^i_0 = 0$ and $\beta^i_1 = 1$ for the first vintage of German NA figures. They find that this hypothesis is not rejected for GDP, gross capital formation and export. However, for private consumption and public consumption (corresponding to CP and CO here), Strohsal and Wolf (2020) need to apply later vintages for the hypothesis not to be rejected.

Encompassing
Granger and Newbold (1973) and Chong and Hendry (1986) suggest an encompassing test that can be used to test whether one vintage of the data is inferior to a later vintage of the data, i.e., the first of the two vintages contains no additional information. Consider two different vintages of the value for variable $i$ in year $t$, denoted $y^i_{t,(j_1)}$ and $y^i_{t,(j_2)}$, where $1 \le j_2 < j_1 \le 3$ such that $j_1$ represents the latest vintage. Then, consider the "composite artificial model"

$$y^i_{t,(4)} = (1 - \alpha) y^i_{t,(j_1)} + \alpha y^i_{t,(j_2)} + u^i_{t,(j_1,j_2)}, \qquad (10)$$

which is a weighted average of the values of the two preliminary vintages with weight $\alpha$ for preliminary vintage $j_2$, and where $u^i_{t,(j_1,j_2)}$ is an error term. The encompassing test of the hypothesis $\alpha = 0$ investigates whether vintage $j_1$ contains all information (i.e., there is no additional information in vintage $j_2$). This test can be considered as an alternative test for efficiency to the one considered in Table 5; if the latter vintage of the variables is efficient, its prediction cannot be improved by using an earlier vintage of the same variable.
Note that if we subtract $y^{i}_{t,(j_1)}$ from both sides of (10), we get the formulation (11). Harvey et al. (1998) show that a test of encompassing based on (10) or (11) is related to the equal predictability test put forward by Diebold and Mariano (1995). Define $d^{i}_{(a),t,(j_1,j_2)}$ in terms of the error in (11) when $a = \alpha$. The encompassing version of the Diebold and Mariano (1995) test is then based on $d^{i}_{(a),t,(j_1,j_2)}$ with $a = 0$. The Diebold and Mariano (1995) test involves testing whether the population equivalent of the mean of $d^{i}_{(0),t,(j_1,j_2)}$ is equal to zero. The corresponding test statistic is given in (5) with $d^{i}_{t} = d^{i}_{(0),t,(j_1,j_2)}$.
The results are reported in Table 6 (see also Table 11 and Table 16). In the last line of the table, results from a vector version of the test are also reported; see Hungnes (2018). This test imposes the restriction that the parameter α is equal across all variables and considers the null hypothesis that this common parameter is equal to zero. We define $d_t = (e_{t,(j_1)} - e_{t,(j_2)})' \hat{\Omega}^{-1} e_{t,(j_1)}$, where $\hat{\Omega} = N^{-1} \sum_{t \in T} e_{t,(j_1)} e_{t,(j_1)}'$, and apply the test statistic given in (5). This test statistic is assumed to be approximately t-distributed with $NK - 1$ degrees of freedom. This follows from a derivation analogous to that of the test statistic for the equal predictability test in Hungnes (2020), which shows that the test statistic is asymptotically normally distributed; see also Hungnes (2018).
First, we test whether vintage 2 encompasses vintage 1. (Here, N = 28, as we exclude the years 1994 and 2001 because there is a benchmark revision between vintage 1 and vintage 2 for these 2 years.) For government consumption (CO), we get an estimate of 0.40, implying that the best (numerical) estimate of the growth in CO after the first two vintages are published is a weighted average with 0.60 weight on vintage 2 and 0.40 weight on vintage 1. For gross fixed capital formation (JK), the best estimate would put a weight of as much as 0.71 on vintage 1. However, this may be a result of how we have adjusted for benchmark revisions. When ignoring the effect of benchmark revisions, we do not reject that the second vintage of either CO or JK encompasses the first vintage; see Table 11. And when excluding the years in which a benchmark revision has taken place, we only reject that vintage 2 of CO encompasses vintage 1 at the 5% significance level, and with a much smaller estimate of the optimal weight of vintage 1 (see Table 16). Considering all variables jointly, we reject that vintage 2 encompasses vintage 1 (at the 5% significance level). 8 If we exclude gross fixed capital formation (JK) from the joint test, we do not reject the null hypothesis that vintage 2 encompasses vintage 1.
Second, we test whether vintage 3 encompasses vintage 2. Table 6 shows that we do not reject the null hypothesis of encompassing, either for the variables individually or jointly.
Third, we test whether vintage 3 encompasses vintage 1. Again, we reject the null hypothesis for gross fixed capital formation (JK). The rejection of encompassing for JK also leads to rejection in the joint test: if we exclude JK from the vector of considered variables, we do not reject the null hypothesis.
The overall conclusion we draw from Table 6, Table 11 and Table 16 is that when excluding gross fixed capital formation (JK), the test results more or less support the null hypothesis that the latest vintage encompasses an earlier vintage, both considered individually and jointly.
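The single-variable encompassing test can be illustrated with a minimal Python sketch. Several things here are assumptions rather than the paper's own specification. The loss differential is taken as $d_t = (e_{t,(j_1)} - e_{t,(j_2)})\, e_{t,(j_1)}$, the scalar analogue of the vector $d_t$ defined above. The statistic is the sample mean of $d_t$ divided by its standard error, standing in for statistic (5), which is not reproduced in this section. No autocorrelation correction is applied. The function name `encompassing_test` and the numbers are hypothetical and are not taken from Table 6.

```python
import math
import statistics


def encompassing_test(e_late, e_early):
    """Sketch of a Diebold-Mariano-type encompassing test for one variable.

    e_late, e_early: revision errors (vintage minus final vintage) of the
    later and the earlier vintage for the same years.
    Returns (alpha_hat, t_stat, dof); alpha_hat is the estimated weight on
    the earlier vintage, and the null hypothesis is alpha = 0.
    """
    if len(e_late) != len(e_early) or len(e_late) < 2:
        raise ValueError("need two equally long series with at least 2 years")
    n = len(e_late)
    # Gap between the error of the later and the earlier vintage.
    gap = [a - b for a, b in zip(e_late, e_early)]
    ss_gap = sum(g * g for g in gap)
    if ss_gap == 0:
        raise ValueError("the two vintages are identical")
    # Assumed loss differential: d_t = (e_late - e_early) * e_late.
    d = [a * g for a, g in zip(e_late, gap)]
    # OLS (no intercept) of e_late on the gap gives the weight on the earlier vintage.
    alpha_hat = sum(d) / ss_gap
    # Assumed stand-in for statistic (5): mean of d_t over its standard error.
    se = statistics.stdev(d) / math.sqrt(n)
    if se == 0:
        raise ValueError("d_t has no variation")
    t_stat = statistics.mean(d) / se
    return alpha_hat, t_stat, n - 1


# Illustrative revision errors in percentage points (made-up numbers).
late = [0.1, -0.2, 0.3, 0.0, -0.1]
early = [0.4, -0.5, 0.2, 0.3, -0.6]
alpha_hat, t_stat, dof = encompassing_test(late, early)
print(f"alpha_hat={alpha_hat:.3f}, t={t_stat:.3f}, dof={dof}")
```

The returned statistic would be compared with a t distribution with N − 1 degrees of freedom, in line with the single-variable tests reported in the tables.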

Equal predictability
The equal predictability test implies testing the null hypothesis of α = 0.5 in (11). By defining $d^{i}_{t} = d^{i}_{(0.5),t,(j_1,j_2)}$, we can apply the test statistic given in (5). This test statistic is assumed to be approximately t-distributed with $N - 1$ degrees of freedom. The vector version of the test is derived in Hungnes (2020); it restricts α to be equal across all variables and tests the null hypothesis that this common parameter is equal to one-half. To apply this test, we define $d_t = (e_{t,(j_1)} - e_{t,(j_2)})' \hat{\Omega}^{-1} (e_{t,(j_1)} + e_{t,(j_2)})$, where $\hat{\Omega} = N^{-1} \sum_{t \in T} (e_{t,(j_1)} + e_{t,(j_2)}) (e_{t,(j_1)} + e_{t,(j_2)})'$, and apply the test statistic given in (5). According to Hungnes (2020), this test statistic is asymptotically normally distributed, and we assume it to be approximately t-distributed with $NK - 1$ degrees of freedom in small samples.
The results of these tests are reported in Table 7. For the individual tests of equal predictability, we only reject the null hypothesis of α = 0.5 in a few cases. However, when testing this hypothesis on the vector of all variables except gross fixed capital formation (JK), we reject the null hypothesis of equal predictability. Furthermore, in the joint test, the point estimates of the optimal weight between the two vintages are in all three cases well below 0.5. Hence, the later vintage is significantly better than an earlier vintage of the variables.
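A minimal Python sketch of the single-variable equal predictability test follows. The assumptions are these. The loss differential is the scalar analogue of the vector $d_t$ above, $(e_{t,(j_1)} - e_{t,(j_2)})(e_{t,(j_1)} + e_{t,(j_2)})$, which is the difference in squared errors. The statistic is the sample mean of $d_t$ over its standard error, a stand-in for statistic (5) with no autocorrelation correction. The function name `equal_predictability_test` and the numbers are hypothetical.

```python
import math
import statistics


def equal_predictability_test(e_late, e_early):
    """Sketch of a Diebold-Mariano-type equal predictability test for one variable.

    e_late, e_early: revision errors (vintage minus final vintage) of the
    later and the earlier vintage for the same years.
    Returns (alpha_hat, t_stat, dof); the null hypothesis is alpha = 0.5.
    """
    if len(e_late) != len(e_early) or len(e_late) < 2:
        raise ValueError("need two equally long series with at least 2 years")
    n = len(e_late)
    gap = [a - b for a, b in zip(e_late, e_early)]
    ss_gap = sum(g * g for g in gap)
    if ss_gap == 0:
        raise ValueError("the two vintages are identical")
    # Point estimate of the weight on the earlier vintage (OLS, no intercept).
    alpha_hat = sum(a * g for a, g in zip(e_late, gap)) / ss_gap
    # Assumed loss differential: difference in squared errors.
    d = [a * a - b * b for a, b in zip(e_late, e_early)]
    # Assumed stand-in for statistic (5): mean of d_t over its standard error.
    se = statistics.stdev(d) / math.sqrt(n)
    if se == 0:
        raise ValueError("d_t has no variation")
    t_stat = statistics.mean(d) / se
    return alpha_hat, t_stat, n - 1


# Illustrative revision errors in percentage points (made-up numbers).
late = [0.1, -0.2, 0.3, 0.0, -0.1]
early = [0.4, -0.5, 0.2, 0.3, -0.6]
alpha_hat, t_stat, dof = equal_predictability_test(late, early)
print(f"alpha_hat={alpha_hat:.3f}, t={t_stat:.3f}, dof={dof}")
```

In this sketch, a negative statistic means that the later vintage has the smaller squared errors on average. Algebraically, the estimated weight on the earlier vintage is below 0.5 exactly when the later vintage has the smaller sum of squared errors.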

Conclusions
In this paper, we have investigated the revision process of National Accounts figures for Norway. We have found that the accuracy of most of the preliminary figures increases throughout the revision process, as shown by decreasing root-mean-squared errors (RMSEs). For most of the variables considered here, the preliminary figures are unbiased estimates of their final values. The exception is gross fixed capital formation, which tends to be underestimated in all preliminary versions.
We also conducted two different tests to investigate the efficiency of the preliminary National Accounts figures. The first, a variant of the Mincer and Zarnowitz (1969) test, indicates that the preliminary figures are weakly efficient estimates of the final National Accounts figures.
The second type of efficiency test involves comparing different vintages of the National Accounts figures against each other, both variable by variable and for the full vector of variables. We have conducted an equal predictability test between two different vintages of a vector of the National Accounts series that excludes gross fixed capital formation. For each pair of vintages that we have compared, we have rejected the equal predictability hypothesis, and combined with the estimated parameters, we find that the most recent vintage of the preliminary National Accounts data is significantly better than an earlier vintage of the data. We have also conducted encompassing tests for the same pairs of preliminary vintages.
The results for the variables tested separately are somewhat mixed. But when tested jointly, we cannot reject the null hypothesis that a later vintage encompasses an earlier one when we exclude gross fixed capital formation. When including gross fixed capital formation in the vector of variables, we reject the null hypothesis for a majority of the compared vintages, indicating that the National Accounts figures for this series are not optimally updated.
The figure for total fixed capital formation is given as the sum of fixed capital formation across all industries. For some industries, Statistics Norway has good sources for preliminary figures on gross fixed capital formation. For other industries, few or no sources are available for early vintages of the gross fixed capital formation figures. The lack of sound sources for these industries may be why revisions of preliminary National Accounts figures for gross fixed capital formation are biased and predictable.

A Benchmark revisions in Norway since 1988
At more or less regular intervals, approximately every 5 years, revisions of the National Accounts series are carried out. These are referred to as benchmark revisions and normally include the incorporation of new definitions and classifications that come with international regulations. Benchmark revisions may also include the incorporation of new source material, new calculation schemes and any error correction of earlier publications without any definition changes being made. Benchmark revisions often lead to level shifts in the time series. In connection with the publication of benchmark revisions, the time series in the National Accounts are updated. This is done to ensure that the time series are consistent and comparable back in time to provide the most accurate picture of developments. For the years 1988-2017, there have been six benchmark revisions (BR): BR1995, BR2002, BR2006, BR2011, BR2014 and BR2019.
BR1995 involved the incorporation of new definitions and guidelines from SNA93, as well as the review and inclusion of new statistical data from the last 10-15 years before the benchmark revision started. Due to the extensive work for preparing the benchmark revision, the final vintage for the year 1991 was delayed and finally published according to BR1995.
BR2002 was a comprehensive revision without new definitions and classifications. The main reason for carrying out the numerical revision was that Statistics Norway compiled new structural statistics for several industries during the 1990s. The results of the revision were published in June 2002, with revised final figures for 1991-1999, as well as new preliminary figures for 2000 and 2001.
BR2006 was published in December 2006. The main reason for the revision was an EU regulation that required the size of indirectly measured banking and financial services to be distributed to end users, such as product intermediates or consumption, rather than being deducted from the gross domestic product in a correction item. This revision resulted in a higher level of GDP.
BR2011 was published in November 2011. The most significant change was the incorporation of a new industry standard, which is consistent with the EU standard NACE Rev. 2. The new industry standard was the reason why the final vintage for the year 2008 was delayed by one year.
BR2014 was published in November 2014. The most significant change that this major revision entailed was that research and development work went from being treated as intermediate consumption to being treated as investments. Therefore, the benchmark revision redistributed costs from intermediate consumption to investments, and the result was a higher level of GDP. The definition of Mainland Norway was also changed and contributed to increased growth in Mainland GDP (see also footnote 1).
BR2019 was published in August 2019, and incorporation of a new data source for salaries and employment ("a-ordningen") was the most important single cause of the revisions. Transfer of some specific units from market producers to the government sector, as well as a change in how some existing sources are used, has caused other corrections in earlier published figures.

B Estimation result with alternative treatment for benchmark revisions
See Tables 8, 9, 10, 11, 12, 13, 14, 15, 16 and 17.
Table caption: Root-mean-squared error (RMSE) and root of both the absolute and the relative forecast variance, as well as the root of the variance of the final vintage.
Table notes: $e^{i}_{t,(j)}$ is the difference between vintage $j$ and the final vintage of a National Accounts figure. All reported tests are t tests with $N - 1$ degrees of freedom when one variable is considered and with $NK - 1$ degrees of freedom when there is a joint test involving all $K$ variables. ** and * indicate significance at the 1% and 5% level, respectively.

C Graphs
The graphs below show percentage growth rates as published in the first to fourth versions (top), and associated revisions in percentage points (bottom). In the upper graph, the color code indicates which benchmark revision the different versions belong to (Figs. 1, 2, 3, 4, 5, 6, 7 and 8).
Fig. 1