4.1 Potential Solutions to Stochastic Trend Non-stationarity

As described in Sect. 2.2, Yule created integrated processes deliberately, but there are many economic, social and natural mechanisms that induce integratedness in data. Perhaps the best-known example of an I(1) process is a random walk, where the current value equals the previous value plus a random error, so the change in a random walk is just a random error. Such a process can wander widely, and was first proposed by Bachelier (1900) to describe the behaviour of prices in speculative markets. However, such processes also occur in demography (see Lee and Carter 1992) as well as economics, because the stock of a variable, like population or inventories, cumulates the net inflow, as discussed for Fig. 2.3. A natural integrated process is the concentration of atmospheric CO\(_2\): emissions cumulate due to CO\(_2\)’s long atmospheric lifetime, as in the right-hand panel of Fig. 3.4, and have been mainly anthropogenic since the industrial revolution. When the inflows to an integrated process are random, the variance will grow over time by cumulating past perturbations, violating stationarity. Thus, unlike an I(0) process, which varies around a constant mean with a constant variance, an I(1) process has an increasing variance, usually called a stochastic trend, and may also ‘drift’ in a general direction over time, inducing a trend in the level.
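A short simulation makes the growing variance concrete. The following sketch, assuming Python with numpy (not part of the original analysis), generates many independent random walks and shows their cross-section variance growing roughly linearly with time:

```python
import numpy as np

rng = np.random.default_rng(42)

# Simulate many independent random walks: y_t = y_{t-1} + e_t with e_t ~ N(0, 1).
T, n_paths = 200, 5000
shocks = rng.standard_normal((n_paths, T))
walks = shocks.cumsum(axis=1)  # each row is one random walk

# For a pure random walk, Var(y_t) = t * sigma^2, so the cross-section
# variance across the 5000 paths should grow roughly linearly with t.
var_early = walks[:, 19].var()    # at t = 20, theory gives 20
var_late = walks[:, 199].var()    # at t = 200, theory gives 200

print("variance at t=20: ", round(var_early, 1))
print("variance at t=200:", round(var_late, 1))
```

The tenfold increase in variance between t = 20 and t = 200 is exactly the violation of stationarity described above.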

Fig. 4.1 Twenty successive serial correlations for (a) nominal wages; (b) real wages; (c) wage inflation; and (d) real wage growth

Cumulating past random shocks should make the resulting time series relatively smooth, since successive observations share a large number of past inputs. The correlations between successive values will also be high, declining only slowly as their distance apart increases: the persistence discussed in Sect. 2.1. Figures 4.1(a) and (b) illustrate this for the logs of wages and real wages, where the sequence of successive correlations shown is called a correlogram. Taking wages in the top-left panel (a) as an example, the outcome in any year is still correlated 0.97 with the outcome 20 years previously, and similarly high correlations between values 20 years apart hold for real wages. Values outside the dashed lines are significantly different from zero at 5%.
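The correlogram itself is easy to compute: the lag-\(k\) serial correlation is the correlation between the series and its own value \(k\) periods earlier. A minimal sketch, again assuming Python with numpy (the function name `correlogram` is ours, purely for illustration), applied to an artificial I(1) series:

```python
import numpy as np

def correlogram(y, max_lag=20):
    # Sample serial correlations r_1, ..., r_max_lag of a single series.
    y = np.asarray(y, dtype=float)
    y = y - y.mean()
    denom = (y ** 2).sum()
    return np.array([(y[k:] * y[:-k]).sum() / denom
                     for k in range(1, max_lag + 1)])

rng = np.random.default_rng(0)
walk = rng.standard_normal(500).cumsum()  # an artificial I(1) series

r = correlogram(walk)
# For an integrated series, even the lag-20 correlation remains high
# relative to the +/- 2/sqrt(T) significance band (about 0.09 here).
print("lag 1: ", round(r[0], 2))
print("lag 20:", round(r[-1], 2))
```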

Fig. 4.2 Densities of the differences for: (a) nominal wages; (b) prices; (c) real wages and (d) productivity

Differencing is the opposite of integration, so an I(1) process has first differences that are I(0). Thus, despite its non-stationarity, an I(1) process can be reduced to I(0) by differencing, an idea that underlies the empirical modelling and forecasting approach in Box and Jenkins (1976). Now successive values in the correlogram should decline quite quickly, as Figs. 4.1(c) and (d) show for the differences of these two time series. Wage inflation is quite highly correlated with its values one and two periods earlier, but the correlations further back are much smaller, although even as far back as 20 years, all remain positive. However, the growth of real wages seems essentially random in terms of its correlogram. As a warning, such evidence does not imply that real wage growth cannot be modelled empirically, merely that the preceding value by itself does not explain the current outcome.
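The contrast between the correlograms of a level and its difference can be reproduced with a few lines of numpy (an illustrative sketch on artificial data, not the wage series themselves):

```python
import numpy as np

def acf(y, max_lag=20):
    # Sample serial correlations r_1, ..., r_max_lag.
    y = np.asarray(y, dtype=float) - np.mean(y)
    denom = (y ** 2).sum()
    return np.array([(y[k:] * y[:-k]).sum() / denom
                     for k in range(1, max_lag + 1)])

rng = np.random.default_rng(1)
level = rng.standard_normal(500).cumsum()  # I(1): correlogram dies out slowly
growth = np.diff(level)                    # first difference: back to I(0)

# The level is highly persistent; its difference is essentially uncorrelated.
print("lag-1 correlation of level:     ", round(acf(level)[0], 2))
print("lag-1 correlation of difference:", round(acf(growth)[0], 2))
```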

Differences of I(1) time series should also be approximately Normally distributed when the shocks are nearly Normal. Such outcomes implicitly suppose there are no additional ‘abnormal’ shocks such as location shifts. Figure 4.2 illustrates this for wage and price inflation, and for the growth in real wages and productivity. None of the four distributions is Normal, all revealing large outliers, which cannot be a surprise given their time series graphs in Fig. 3.2.

To summarise, both the mean and the variance of \(\mathsf{I}(1)\) processes change over time, and successive values are highly interdependent. As Yule (1926) showed, this can lead to nonsense regression problems. Moreover, the conventional forms of distributions assumed for estimates of parameters in empirical models under stationarity no longer hold, so statistical inference becomes hazardous unless the non-stationarity is taken into account.

4.2 Cointegration Between I(1) Processes

Linear combinations of several I(1) processes are usually I(1) as well. However, stochastic trends can cancel between series to yield an I(0) outcome. This is called cointegration. Cointegrated relationships define a ‘long-run equilibrium trajectory’, departures from which induce ‘equilibrium correction’ that moves the relevant system back towards that path. Equilibrium-correction mechanisms are a very large class of models that coincide with cointegrated relations when data are I(1), but also apply to I(0) processes which are implicitly always cointegrated in that all linear combinations are I(0). When the data are I(2) there is a generalized form of cointegration leading to I(0) combinations. Equilibrium-correction mechanisms (EqCMs) can be written in a representation in which changes in variables are inter-related, but also include lagged values of the I(0) combinations. EqCMs have the key property that they converge back to the long-run equilibrium of the data being modelled. This is invaluable when that equilibrium is constant, but as we will see, can be problematic if there are shifts in equilibria.
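The cancellation of stochastic trends can be simulated directly. In the sketch below (artificial data; the loading of 0.5 is an arbitrary choice for illustration), two I(1) series share one stochastic trend, so a suitable linear combination of them is I(0):

```python
import numpy as np

rng = np.random.default_rng(2)
T = 1000

# Both (artificial) series load on the same stochastic trend ...
trend = rng.standard_normal(T).cumsum()
x = trend + rng.standard_normal(T)         # I(1)
y = 0.5 * trend + rng.standard_normal(T)   # I(1)

# ... so the linear combination y - 0.5 * x cancels that trend,
# leaving an I(0) cointegrating relation with bounded variance.
combo = y - 0.5 * x

print("sample variance of x:          ", round(x.var(), 1))
print("sample variance of y - 0.5 * x:", round(combo.var(), 1))
```

The combination’s variance stays near its theoretical value of 1.25 however long the sample, whereas the variance of either series alone keeps growing.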

Fig. 4.3 Time series for the wage share

Real wages and productivity, shown in Fig. 3.2, are each I(1), but their differential, which is the wage share shown in Fig. 4.3, could be I(0). The wage share cancels the separate stochastic trends in real wages and productivity to create a possible cointegrating relation where the stochastic trends have been removed, but there also seem to be long swings and perhaps location shifts, an issue we consider in Sect. 4.3.

To illustrate pairs of variables that are (i) unrelated I(0) but autocorrelated, (ii) unrelated I(1), and (iii) cointegrated, Fig. 4.4 shows 500 observations on computer-generated data. The very different behaviours are marked, and although rarely so obvious in practice, the close trajectories of real wages and productivity in Fig. 3.3 over 150 years resemble the bottom panel.

Fig. 4.4 Pairs of artificial time series: (i) unrelated I(0); (ii) unrelated I(1); (iii) cointegrated

In economics, integrated-cointegrated data seem almost inevitable because of the Granger (1981) Representation Theorem, for which he received the Sveriges Riksbank Prize in Economic Sciences in Memory of Alfred Nobel in 2003. His result shows that cointegration between variables must occur if there are fewer decision variables (e.g., your income and bank account balance) than the number of decisions (e.g., hundreds of shopping items: see Hendry 2004, for an explanation). If that setting were the only source of non-stationarity, there would be two ways of bringing an analysis involving integrated processes back to I(0): differencing to remove cumulative inputs (which always achieves that aim), or finding linear combinations that form cointegrating relations. There must always be fewer cointegrating relations than the total number of variables, as otherwise the system would be stationary, so some variables must still be differenced to represent the entire system as I(0).

Cointegration is not exclusive to economic time series. The radiative forcing of greenhouse gases and other variables affecting global climate cointegrate with surface temperatures, consistent with models from physics (see Kaufmann et al. 2013; Pretis 2019). Thus, cointegration occurs naturally, and is consistent with many existing theories in the natural sciences where interacting systems of differential equations in non-stationary time series can be written as a cointegrating model.

Other sources of non-stationarity also matter, however, especially shifts in the means of the data distributions of I(0) variables, including the means of equilibrium-correction relations and average growth rates, so we turn to this second main source of non-stationarity. There is a tendency in the econometrics literature to identify ‘non-stationarity’ purely with integrated data (time series with unit roots), and hence to claim incorrectly that differencing a time series induces stationarity. Certainly, a unit root is removed by taking the difference, but there are other sources of non-stationarity, so for clarity we refer to the general case as wide-sense non-stationarity.

4.3 Location Shifts

Location shifts are changes from the previous mean of an I(0) variable. There have been enormous historical changes since 1860 in hours of work, real incomes, disease prevalence, sanitation, infant mortality, and average age of death among many other facets of life: see http://ourworldindata.org/ for comprehensive coverage. Figure 3.2 showed how greatly log wages and prices had increased over 1860–2014 with real wages rising sevenfold. Such huge increases could not have been envisaged in 1860. Uncertainty abounds, both in the real world and in our knowledge thereof. However, some events are so uncertain that probabilities of their happening cannot be sensibly assigned. We call such irreducible uncertainty ‘extrinsic unpredictability’, corresponding to unknown unknowns: see Hendry and Mizon (2014). A pernicious form of extrinsic unpredictability affecting inter-temporal analyses, empirical modelling, forecasting and policy interventions is that of unanticipated location shifts, namely shifts that occur at unanticipated times, changing by unexpected magnitudes and directions.

Fig. 4.5 Location shift in a normal or a fat-tailed distribution

Figure 4.5 illustrates a hypothetical setting. The initial distribution is either a standard Normal (solid line) with mean zero and variance unity, or a ‘fat-tailed’ distribution (dashed line), which has a high probability of generating ‘outliers’ at unknown times and of unknown magnitudes and signs (sometimes called anomalous ‘black swan events’ as in Taleb 2007). As I(1) time series can be transformed back to I(0) by differencing or cointegration, the Normal distribution often remains the basis for calculating probabilities for statistical inference, as in random sampling from a known distribution. Hendry and Mizon (2014) call this ‘intrinsic unpredictability’, because the uncertainty in the outcome is intrinsic to the properties of the random variables. Large outliers provide examples of ‘instance unpredictability’ since their timings, magnitudes and signs are uncertain, even when they are expected to occur in general, as in speculative asset markets.

However, in Fig. 4.5 the baseline distribution experiences a location shift to a new Normal distribution (dotted line) with a mean of −5. As we have already seen, there are many causes for such shifts, and many shifts have occurred historically, precipitated by changes in legislation, wars, financial innovation, science and technology, medical advances, climate change, social mores, evolving beliefs, and different political and economic regimes. Extrinsically unpredictable location shifts can make the new ordinary seem highly unusual relative to the past. In Fig. 4.5, after the shift, outcomes will now usually lie between 3 and 7 standard deviations away from the previous mean, generating an apparent ‘flock’ of black swans, which could never happen with independent sampling from the baseline distribution, even when fat tails are possible. During the Financial Crisis in 2008, the possibility of location shifts generating many extremely unlikely bad draws does not seem to have been included in risk models. But extrinsic unpredictability happens in the real world (see e.g., Soros 2008): as we have remarked, current outcomes are not highly discrepant draws from the distributions prevalent in the Middle Ages, but ‘normal’ draws from present distributions that have shifted greatly. Moreover, the distributions of many data differences are not stationary: for example, real growth per capita in the UK has increased intermittently since the Industrial Revolution as seen in Fig. 3.2, and most nominal differences have experienced location shifts, illustrated by Fig. 3.3. Hendry (2015) provides dozens of other examples.
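The ‘flock of black swans’ is easily quantified. In the sketch below (assuming Python with numpy; the sample sizes are arbitrary), events more than 3 standard deviations below the old mean move from a roughly 1-in-700 rarity to near certainty after the shift to a mean of −5:

```python
import numpy as np

rng = np.random.default_rng(3)

baseline = rng.normal(0.0, 1.0, 100_000)  # N(0, 1): the old regime
shifted = rng.normal(-5.0, 1.0, 100_000)  # after a location shift to mean -5

# Probability of a draw more than 3 standard deviations below the OLD mean.
p_before = np.mean(baseline < -3.0)
p_after = np.mean(shifted < -3.0)

print("P(y < -3) before the shift:", round(p_before, 4))
print("P(y < -3) after the shift: ", round(p_after, 3))
</imports>```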

4.4 Dynamic-Stochastic General Equilibrium (DSGE) Models

Everyone has to take decisions at some point in time that will affect their future in important ways: marrying, purchasing a house with a mortgage, making an investment in a risky asset, starting a pension or life insurance, and so on. The information available at the time reflects the past and present but obviously does not include knowledge of the future. Consequently, a view has to be taken about possible futures that might affect the outcomes.

All too often, such views are predicated on there being no unanticipated future changes relevant to that decision, namely the environment is assumed to be relatively stationary. Certainly, there are periods of reasonable stability when observing how past events unfolded can assist in planning for the future. But as this book has stressed, unexpected events occur, especially unpredicted shifts in the distributions of relevant variables at unanticipated times. Hendry and Mizon (2014) show that the intermittent occurrence of ‘extrinsic unpredictability’ has dramatic consequences for any theoretical analysis of time-dependent behaviour, empirical modelling of time series, forecasting, and policy interventions. In particular, the mathematical basis of the class of models widely used by central banks, namely DSGE models, ceases to be valid, as DSGEs are based on an inter-temporal optimization calculus that requires the absence of distributional shifts.

This is not an ‘academic’ critique: the supposedly ‘structural’ Bank of England Quarterly Model (BEQM) broke down during the Financial Crisis, and has since been replaced by another DSGE called COMPASS, which may be pointing in the wrong direction: see Hendry and Muellbauer (2018).

DSGE Models

Many of the theoretical equations in DSGE models take a form in which a variable today, denoted \(y_t\), depends on its ‘expected future value’ often written as \(\mathsf{E}_t[y_{t+1}|\mathcal{I}_t]\), where \(\mathsf{E}_t[\cdot ]\) indicates the date at which the expectation is formed about the variable in the \([\;]\). Such expectations are conditional on what information is available, which we denoted by \(\mathcal{I}_t\), so are naturally called conditional expectations, and are defined to be the average over the relevant conditional distribution. If the relation between \(y_{t+1}\) and \(\mathcal{I}_t\) shifts as in Fig. 4.5, \(y_{t+1}\) could be far from what was expected.

As we noted above, a ‘classic’ proof in elementary statistics courses is that, in a stationary world, the conditional expectation is the minimum-variance unbiased predictor of the outcome. By basing their expectations for tomorrow on today’s distribution, DSGE formulations assume stationarity, possibly after ‘removing’ stochastic trends by some method of de-trending. From Fig. 4.5 it is rather obvious that the previous mean, and hence the previous conditional expectation, is not an unbiased predictor of the outcome after a location shift.
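A small simulation shows the point. In the sketch below (artificial data, assuming Python with numpy), a forecaster forms the conditional expectation as the mean of the first regime, which is then badly biased for outcomes after an unanticipated shift to a mean of −5:

```python
import numpy as np

rng = np.random.default_rng(4)

# Regime 1: outcomes are N(0, 1); under stationarity the in-sample mean
# is the 'rational' conditional expectation of tomorrow's outcome.
history = rng.normal(0.0, 1.0, 500)
forecast = history.mean()

# Regime 2: an unanticipated location shift moves the mean to -5.
future = rng.normal(-5.0, 1.0, 500)

# The old conditional expectation is now systematically biased.
bias = (future - forecast).mean()
print("forecast:", round(forecast, 2), " average forecast error:", round(bias, 2))
```

The average forecast error is close to −5, the size of the shift, rather than the zero that the unbiasedness proof promises under stationarity.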

As we have emphasized, underlying distributions can and do shift unexpectedly. Of course, we are all affected to some extent by unanticipated shifts of the distributions relevant to our lives, such as unexpectedly being made redundant, sudden increases in mortgage costs or tax rates, or reduced pension values after a stock market crash. However, we then usually change our plans, and perhaps also our views of the future. The first unfortunate outcome for DSGE models is that their parameters shift after a location shift. The second is that their mathematical derivations usually assume that the agents in their model do not change their behaviour from what would be the optimum in a stationary world. However, as ordinary people seem unlikely to be better at forecasting breaks than professional economists, or even quickly learning their implications after they have occurred, most of us are forced to adapt our plans after such shifts.

By ignoring the possibility of distributional shifts, conditional expectations can certainly be ‘proved’ to be unbiased, but that does not imply they will be in practice. Some econometric models of inflation, such as the so-called new-Keynesian Phillips curve, involve expectations of the unknown future value written as \(\mathsf{E}[y_{t+1}|\mathcal{I}_t]\). A common procedure is to replace that conditional expectation by the actual future outcome \(y_{t+1}\), arguing that the conditional expectation is unbiased for the actual outcome, so will only differ from it by unpredictable random shocks with a mean of zero. That implication only holds if there have been no shifts in the distributions of the variables, and otherwise will entail mis-specified empirical models that can seriously mislead in their policy implications as Castle et al. (2014) demonstrate.

There is an intimate link between forecast failure, the biasedness of conditional expectations and the inappropriate application of inter-temporal optimization analysis: when the first is due to an unanticipated location shift, the other two follow. Worse, a key statistical theorem in modern macroeconomics, called the law of iterated expectations, no longer holds when the distributions from which conditional expectations are formed change over time. The law of iterated expectations implies that today’s expectation of tomorrow’s outcome, given what we know today, is equal to tomorrow’s expectation. Thus, one can ‘iterate’ expectations over time. The theorem is not too hard to prove when all the distributions involved are the same, but it need not hold when any of the distributions shift between today and tomorrow for exactly the same reasons as Fig. 2.8 reveals: that shift entails forecast failure, a violation of today’s expectation being unbiased for tomorrow, and the failure of the law of iterated expectations.

As a consequence, dynamic stochastic general equilibrium models are inherently non-structural: their mathematical basis fails when substantive distributional shifts occur, and their parameters change. This adverse property of all DSGEs explains the ‘break down’ of BEQM facing the Financial Crisis and Great Recession, as many distributions shifted markedly, including that of interest rates (to unprecedentedly low levels from Quantitative Easing) and consequently the distributions of endowments across individuals and families. Unanticipated changes in underlying probability distributions, especially location shifts, have detrimental impacts on all economic analyses involving conditional expectations, and hence on inter-temporal derivations, as well as causing forecast failure. What we now show is that with appropriate tools, the impacts of outliers and location shifts on empirical modelling can be taken into account.

4.5 Handling Location Shifts

At first sight, location shifts seem highly problematic for econometric modelling, but as with stochastic trends, there are several potential solutions. First, differencing a time series converts a location shift in the level into a single impulse in the difference (an impulse in the first difference is equivalent to a step shift in the level). Second, time series can co-break, analogous to cointegration, in that location shifts can cancel between series.
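The first solution is mechanical, as a tiny numpy sketch confirms (the break date and shift size are arbitrary choices for illustration):

```python
import numpy as np

# An artificial location shift: mean 0 for 50 periods, then mean 3.
level = np.concatenate([np.zeros(50), np.full(50, 3.0)])

# Differencing converts the step in the level into a single impulse.
change = np.diff(level)

print("non-zero differences at index:", np.flatnonzero(change))
print("impulse size:", change[49])
```

The difference is zero everywhere except at the break point, where it equals the size of the shift.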

Fig. 4.6 Partial co-breaking between wage and price inflation

Thus, time series can be combined to remove some or all of the individual shifts. Individual series may exhibit multiple shifts, but when modelling one series by another, co-breaking implies that fewer shifts will be detected when the series break together. Figure 3.2 showed the divergent strong but changing trends in nominal wages and prices, and Fig. 3.3 recorded the many shifts in wage inflation. Nevertheless, as shown by the time series of real wage growth in Fig. 4.6, almost all the shifts in wage inflation and price inflation cancelled over 1860–2014. The only one that does not cancel is the huge ‘spike’ in 1940, which was a key step in the UK’s war effort to encourage new workers to replace army recruits.

The third possible solution is to find all the location shifts and outliers, whatever their magnitudes and signs, then include indicators for them in the model. To do so requires us to solve the apparently impossible problem of selecting from more candidate variables in a model than observations. Hendry (1999) accidentally stumbled over a solution. Most contributors to Magnus and Morgan (1999) had found that models of US real per capita annual food demand were non-constant over the sample 1929–1952, so dropped that earlier data from their empirical modelling. Figure 2.4(a) indeed suggests very different behaviour pre- and post-1952, but by itself that does not entail that econometric models which include explanatory variables like food prices and real incomes must shift. To investigate why, yet replicate others’ models, Hendry added impulse indicators (‘dummy variables’ that are zero everywhere except for unity at one data point) for all observations pre-1952, which revealed three large outliers corresponding to a US Great Depression food programme and post-war de-rationing. To check that his model was constant from 1953 onwards, he later added impulse indicators for that period as well, thereby including more variables plus indicators than observations, but entered in the model in two large blocks, each much smaller than the number of observations. This has led to a statistical theory for modelling multiple outliers and location shifts (see e.g., Johansen and Nielsen 2009; Castle et al. 2015), available in our computational tool Autometrics (Doornik 2009) and in the package Gets (Pretis et al. 2018) in the statistical software environment R. This approach, called indicator saturation, considers a possible outlier or shift at every point in time, but only retains significant indicators. That is how the location-shift lines drawn on Fig. 3.3 were chosen, and is the subject of Chapter 5.
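The block-wise idea can be sketched in a few lines. The function below is our own toy version, assuming Python with numpy; it is emphatically not the Autometrics or Gets algorithm, which handle general regressors, multiple block splits and step indicators, but it shows how an indicator for every observation can be considered in blocks each smaller than the sample:

```python
import numpy as np

def impulse_saturation(y, crit=2.5):
    """Split-half impulse-indicator saturation for a constant-mean model.

    A deliberately minimal sketch: an impulse indicator is considered
    for every observation, half the sample at a time, and only those
    with |t| > crit are retained.
    """
    y = np.asarray(y, dtype=float)
    T = len(y)
    retained = []
    for block in (np.arange(T // 2), np.arange(T // 2, T)):
        other = np.setdiff1d(np.arange(T), block)
        mu = y[other].mean()             # mean estimated on the other half
        s = y[other].std(ddof=1)
        t_stats = (y[block] - mu) / s    # approximate t-value of each indicator
        retained.extend(int(i) for i in block[np.abs(t_stats) > crit])
    return sorted(retained)

rng = np.random.default_rng(5)
y = rng.standard_normal(100)
y[30] += 6.0  # planted outlier
y[70] += 6.0  # planted outlier

print("retained impulse indicators:", impulse_saturation(y))
```

The two planted outliers are retained, while almost all of the 100 candidate indicators are discarded, despite there being more candidates than observations overall.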

Location shifts are of particular importance in policy, because a policy change inevitably creates a location shift in the system of which it is a part. Consequently, a necessary condition for the policy to have its intended effect is that the parameters in the agency’s empirical models of the target variables must remain invariant to that policy shift. Thus, prior to implementing a policy, invariance should be tested, and that can be done automatically as described in Hendry and Santos (2010) and Castle et al. (2017).

4.6 Some Benefits of Non-stationarity

Non-stationarity is pervasive, and as we have documented, needs to be handled carefully to produce viable empirical models, but its occurrence is not all bad news. When time series are I(1), their variance grows over time, which can help establish long-run relationships. Some economists believe that so-called ‘observational equivalence’—where several different theories look alike on all data—is an important problem. While that worry could be true in a stationary world, cointegration can only hold between I(1) variables that are genuinely linked. ‘Observational equivalence’ is also unlikely facing location shifts: no matter how many co-breaking relations exist, there must always be fewer than the number of variables, as some must shift to change others, separating the sheep from the goats.

When I(1) variables also trend, or drift, that can reveal the underlying links between variables even when measurement errors are quite large (see Duffy and Hendry 2017). Those authors also establish the benefits of location shifts that co-break in identifying links between mis-measured variables: intuitively, simultaneous jumps in both variables clarify their connection despite any ‘fog’ from measurement errors surrounding their relationship. Thus, large shifts can help reveal the linkages between variables, as well as the absence thereof.

Moreover, empirical economics is plagued by very high correlations between variables (as well as over time), but location shifts can substantively reduce such collinearity. In particular, as demonstrated by White and Kennedy (2009), location shifts can play a positive role in clarifying causality. Also, White (2006) uses large location shifts to estimate the effects of natural experiments.

Finally, location shifts also enable powerful tests of the invariance of the parameters of policy models to policy interventions before new policies are implemented, potentially avoiding poor policy outcomes (see Hendry and Santos 2010). Thus, while wide-sense non-stationarity poses problems for economic theories, empirical modelling and forecasting, there are benefits to be gained as well.

Non-stationary time series are the norm in many disciplines including economics, climatology, and demography, as illustrated in Figs. 2.3–3.2: the world changes, often in unanticipated ways. Research, and especially policy, must acknowledge the hazards of modelling what we have called wide-sense non-stationary time series, where distributions of outcomes change, as illustrated in Fig. 4.5. Individually and together, unaddressed stochastic trends and location shifts can distort in-sample inferences, lead to systematic forecast failure out-of-sample, and substantively increase forecast uncertainty, as we will discuss in Chapter 7. However, both forms can be tamed in part, using cointegration and the modelling of location shifts respectively, as Fig. 4.6 showed.

A key feature of every non-stationary process is that the distribution of outcomes shifts over time, illustrated in Fig. 4.7 for histograms and densities of logs of UK real GDP in each of three 50-year epochs. Consequently, probabilities of events calculated in one time period do not apply in another: recent examples include increasing longevity affecting pension costs, and changes in frequencies of flooding vitiating flood-defence systems.

Fig. 4.7 Histograms and densities of logs of UK real GDP in each of three 50-year epochs

The problem of shifts in distributions is not restricted to the levels of variables: distributions of changes can also shift albeit that is more difficult to see in plots like Fig. 4.7. Consequently, Fig. 4.8 shows histograms and densities of changes in UK CO\(_2\) emissions in each of four 40-year epochs in four separate graphs but on common scales for both axes. The shifts are now relatively obvious at least between the top two plots and between pre and post World War II, although the wide horizontal axis makes any shifts between the last two periods less obvious.

Conversely, we noted some benefits of stochastic trends and location shifts as they help reveal genuine links between variables, and also highlight non-constant links, both of which are invaluable knowledge in a policy context.

Fig. 4.8 Histograms and densities of changes in UK CO\(_2\) emissions in each of four 40-year epochs