Introduction

Stationarity is a minimal requirement in time series analysis for valid statistical inference. Until the 1980s, the standard approach for removing the nonstationary component was to regress on time, under the assumption that the estimated errors were then stationary. Later on, and especially after the seminal paper of Nelson and Plosser (1982), taking first (or second) differences of the series became standard practice. This followed the development of various unit root tests, starting with Fuller (1976) and Dickey and Fuller (1979) (ADF), and followed by others such as Phillips and Perron (1988), Kwiatkowski et al. (1992), Elliot et al. (1996), etc. Nevertheless, all these methods consider only two alternatives for the series of interest: stationarity, I(0), or nonstationarity, I(1). In other words, one has to decide whether to work with the series itself, in levels, if it is I(0), or in first differences if it is I(1) (in some special cases the series might require differencing twice if it is I(2)). During the 1990s, however, a more flexible approach became very popular, one that allows the number of differences adopted in the series to be a fractional value. Thus, it may be 0 or 1, but also a value below 0, in the interval (0, 1), or even above 1.

Granger (1980) proposed this type of model based on the observation that the periodogram of many aggregated time series possessed a very large value at the smallest (zero) frequency, which may be consistent with the need for differencing; once the series were differenced, however, the periodograms displayed a value close to zero at the zero frequency, which was consistent with overdifferencing. Earlier, Robinson (1978) had also justified these models on the basis of the aggregation of heterogeneous autoregressive processes. Years later, Gil-Alana and Robinson (1997) examined an updated version of the fourteen macro series studied in Nelson and Plosser (1982). The latter authors, using the tests of Dickey and Fuller (1979), found that all except one of the series were consistent with I(1) nonstationary models. Gil-Alana and Robinson (1997), however, discovered that fractional degrees of differencing were appropriate for all the series examined. Fractional integration then became popular in the following years (see, e.g., Michelacci and Zaffaroni 2000; Box-Steffensmeier and Tomlinson 2000; Candelon and Gil-Alana 2004; Gil-Alana and Moreno 2007; Abbritti et al. 2016, etc.). On the other hand, many authors have demonstrated that fractional integration is closely related to the potential presence of breaks in the data that are not taken into account (see, e.g., Gil-Alana 2008a; Barassi et al. 2018). However, other authors argue that structural breaks produce abrupt changes in the models that are not observed in the data, and non-linear models have been proposed as alternative approaches (see, e.g., Diebold and Inoue 2001).

In this paper, various models based on fractional integration are employed in the analysis of road casualties in Great Britain. Linear and non-linear trends along with breaks are permitted in the data, and the results admit different interpretations depending on the model employed. The rest of the paper is structured as follows. “Methodology” section is devoted to the methodology and the different models used in the manuscript. “Data and Empirical Results” section presents the data and the main empirical results, while “Conclusions” section contains the most relevant conclusions.

Methodology

We define L as the lag operator, such that L^k x(t) = x(t − k). In this context, a series is said to be integrated of order 1, or I(1), if it can be represented as

$$\left(1 - L\right) x\left(t\right) = u(t), \quad t=1, 2, \ldots ,$$
(1)

where x(t) represents a time series, and u(t) is a covariance stationary process that is short memory or integrated of order 0. Short memory processes are characterized by the property that the infinite sum of their autocovariances, defined as γ(u) = E[(x(t) − μ)(x(t + u) − μ)], is finite, i.e.,

$$ \sum\limits_{u = - \infty }^{\infty } {\left| {\gamma (u)} \right|} < \infty. $$
(2)
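As a quick numerical illustration of the condition in (2) (not part of the original analysis), the autocovariances of a stationary AR(1) process can be summed and compared with their closed form:

```python
import numpy as np

# Illustrative check: for a stationary AR(1) process u(t) = phi*u(t-1) + e(t)
# with |phi| < 1, the autocovariances are gamma(u) = sigma^2 * phi^|u| / (1 - phi^2),
# and their absolute sum is finite, consistent with Eq. (2).
phi, sigma2 = 0.6, 1.0
gamma0 = sigma2 / (1.0 - phi**2)

# Truncated sum of |gamma(u)| over u = -K..K
K = 10_000
lags = np.arange(-K, K + 1)
truncated_sum = np.sum(gamma0 * np.abs(phi) ** np.abs(lags))

# Closed form of the infinite sum for 0 < phi < 1: gamma0 * (1 + phi) / (1 - phi)
closed_form = gamma0 * (1 + phi) / (1 - phi)
print(truncated_sum, closed_form)  # the two agree to many decimal places
```

By contrast, for a long memory process the analogous sum diverges, which is the distinction drawn in Eq. (4) below.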

In this category of short memory processes we can include the classical AutoRegressive Moving Average (ARMA) processes, such that if u(t) is ARMA(p, q), x(t) is said to be ARIMA(p, 1, q). As mentioned in the above section, there are many statistical tests to determine whether a series is I(1) or I(0) (the latter also named short memory), such as ADF (Dickey and Fuller 1979), PP (Phillips and Perron 1988), KPSS (Kwiatkowski et al. 1992), ERS (Elliot et al. 1996) and NP (Ng and Perron 2001), but all of them consider only integer orders of integration. On the other hand, authors such as Diebold and Rudebush (1991), Hassler and Wolters (1994), Lee and Schmidt (1996) and others showed that most of the above-mentioned approaches have very low power if the true data generating process is I(d) with d a fractional value. If the differencing parameter d is fractional, Eq. (1) can be extended as

$$ (1 - L)^{d} x(t) = u(t), \quad t = 0, \pm 1, \ldots, $$
(3)

and, as long as d is positive, x(t) in (3) displays long memory in the sense that the infinite sum of the autocovariances is infinite, i.e.,

$$ \sum\limits_{u = - \infty }^{\infty } {\left| {\gamma (u)} \right|} = \infty. $$
(4)
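In practice, the fractional difference in Eq. (3) is applied through its binomial expansion, (1 − L)^d = Σ_{k≥0} π_k L^k, with π_0 = 1 and π_k = π_{k−1}(k − 1 − d)/k. A minimal sketch of these weights (illustrative, not the paper's estimation code):

```python
import numpy as np

def frac_diff_weights(d: float, n: int) -> np.ndarray:
    """Coefficients pi_k of the binomial expansion (1 - L)^d = sum_k pi_k L^k,
    computed with the standard recursion pi_0 = 1, pi_k = pi_{k-1}*(k-1-d)/k."""
    w = np.empty(n)
    w[0] = 1.0
    for k in range(1, n):
        w[k] = w[k - 1] * (k - 1 - d) / k
    return w

# d = 1 recovers the ordinary first difference (1 - L): weights 1, -1, 0, 0, ...
print(frac_diff_weights(1.0, 5))  # [ 1. -1.  0.  0.  0.]

# A fractional d instead gives slowly (hyperbolically) decaying weights
print(frac_diff_weights(0.4, 5))  # [1, -0.4, -0.12, -0.064, -0.0416]
```

The slow decay of the weights for fractional d is exactly what makes every past observation matter, i.e., the long memory behaviour described above.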

Alternatively, we can describe the two types of processes, long memory and short memory, in the frequency domain. Defining the spectral density function, f(λ), as the Fourier transform of the autocovariances, i.e.,

$$ f(\lambda ) = \frac{1}{2\pi }\sum\limits_{u = - \infty }^{\infty } {\gamma (u)\,} e^{ - i\,\lambda \,u} = \frac{1}{2\pi }\sum\limits_{u = - \infty }^{\infty } {\gamma (u)\,} \cos \,(\lambda \,u), $$
(5)

we say that x(t) displays long memory if the spectrum tends to infinity at some frequency in the interval [0, π),

$$ f(\lambda ) \to \infty \quad \text{for some } \lambda \in [0, \pi ). $$
(6)

On the other hand, it is said to be short memory if f(λ) is positive and bounded at all frequencies,

$$ 0 < f(\lambda ) < \infty $$
(7)

as in the white noise case, where f(λ) is constant across all frequencies. In the context of (3), with u(t) white noise, it can be shown that the spectral density function of x(t) is

$$ f(\lambda ) = \frac{\sigma^{2}}{2\,\pi }\,\left| 1 - e^{i\,\lambda } \right|^{ - 2d} , $$
(8)

which tends to infinity as λ → 0+ whenever d > 0, justifying the long memory property.
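The divergence of the spectrum at the origin is easy to verify numerically. A small sketch (illustrative only), evaluating the spectral density of an I(d) process with unit-variance white noise innovations at frequencies approaching zero:

```python
import numpy as np

# Spectral density of an I(d) process with white-noise innovations:
#   f(lambda) = (sigma^2 / (2*pi)) * |1 - exp(i*lambda)|^(-2d),
# which blows up as lambda -> 0+ whenever d > 0.
def spectral_density(lam: np.ndarray, d: float, sigma2: float = 1.0) -> np.ndarray:
    return (sigma2 / (2 * np.pi)) * np.abs(1 - np.exp(1j * lam)) ** (-2 * d)

lams = np.array([1e-1, 1e-2, 1e-3, 1e-4])  # frequencies shrinking toward zero
f = spectral_density(lams, d=0.4)
print(f)  # strictly increasing as lambda -> 0+
```

Since |1 − e^{iλ}| = 2 sin(λ/2) ≈ λ near the origin, f(λ) behaves like λ^(−2d) there, so any d > 0 produces the unbounded spectrum of Eq. (6).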

The fractional integrated process in (3) allows for a great variety of specifications, including among others:

  (i) anti-persistence, if d < 0;

  (ii) short memory, if d = 0;

  (iii) long memory, though covariance stationary, if 0 < d < 0.5;

  (iv) nonstationary but mean reverting processes, if 0.5 ≤ d < 1;

  (v) unit roots, if d = 1;

  (vi) explosive processes, if d > 1.
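For readers estimating d themselves, the taxonomy above can be encoded as a small helper (purely didactic; the thresholds are exactly those of cases (i)–(vi)):

```python
def classify_integration_order(d: float) -> str:
    """Map a differencing parameter d to the regimes (i)-(vi) listed above
    (a didactic helper, not part of the original analysis)."""
    if d < 0:
        return "anti-persistence"
    if d == 0:
        return "short memory"
    if d < 0.5:
        return "long memory, covariance stationary"
    if d < 1:
        return "nonstationary but mean reverting"
    if d == 1:
        return "unit root"
    return "explosive"

# E.g., the estimate d = 0.86 reported later falls in case (iv)
print(classify_integration_order(0.86))  # nonstationary but mean reverting
```

Note that the empirically most interesting boundary is d = 1: values below it imply mean reversion (shocks die out), while d ≥ 1 implies permanent effects of shocks.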

In addition, the model in (1) can be extended to allow for linear (or even non-linear) trends. In the most classical way, and following the standard unit root literature (Bhargava 1986; Schmidt and Phillips 1992), we consider a linear time trend model of the following form,

$$ y(t) = \alpha + \beta t + x(t), \quad t = 1, 2, \ldots, $$
(9)

where α and β are unknown coefficients to be estimated (an intercept and a linear time trend coefficient, respectively), and where x(t) is given by Eq. (3).

In the empirical application conducted in the following section, we first estimate the model given by Eqs. (9) and (3), assuming that the error term u(t) follows, in turn, a white noise process, an AR(1) structure, and the exponential spectral model of Bloomfield (1973), a non-parametric technique that approximates AR structures through the logged spectral density function. Bloomfield (1973) showed that the log spectrum of an AR process can be well approximated by the following expression,

$$ f\left( \lambda ; \tau \right) = \frac{\sigma^{2}}{2\pi } \exp \left[ 2 \sum\limits_{i = 1}^{m} \tau_{i} \cos \left( \lambda i \right) \right]. $$
(10)

This model displays exponentially decaying autocorrelations, as in the AR case, but with this specification there is no need to rely on as many parameters as in ARMA models, whose estimation, testing, and specification are always tedious. In addition, this model is stationary for the whole range of parameter values, unlike the AR case. See Gil-Alana (2004, 2008b) for implementations of Bloomfield's model in the context of fractional integration.
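Equation (10) can be evaluated directly. As an illustrative check (not the estimation procedure of the paper), setting τ_k = φ^k/k makes a short Bloomfield expansion reproduce the spectral density of an AR(1) process almost exactly, which is why the model approximates AR structures so parsimoniously:

```python
import numpy as np

# Evaluate the Bloomfield (1973) exponential spectral model of Eq. (10)
# and compare it with the spectral density of an AR(1) process; with
# tau_k = phi^k / k the truncated expansion is essentially exact.
def bloomfield_spectrum(lam, tau, sigma2=1.0):
    k = np.arange(1, len(tau) + 1)
    return (sigma2 / (2 * np.pi)) * np.exp(
        2 * np.sum(tau[:, None] * np.cos(np.outer(k, lam)), axis=0)
    )

def ar1_spectrum(lam, phi, sigma2=1.0):
    # f(lambda) = sigma^2 / (2*pi) / |1 - phi*exp(-i*lambda)|^2
    return (sigma2 / (2 * np.pi)) / (1 - 2 * phi * np.cos(lam) + phi**2)

phi = 0.5
m = 20  # expansion order
tau = np.array([phi**k / k for k in range(1, m + 1)])
lam = np.linspace(0.1, np.pi, 50)

max_rel_err = np.max(np.abs(bloomfield_spectrum(lam, tau) / ar1_spectrum(lam, phi) - 1))
print(max_rel_err)  # tiny: the 20-term expansion matches the AR(1) spectrum
```

The identity behind this is −log(1 − 2φ cos λ + φ²) = 2 Σ_{k≥1} (φ^k/k) cos(kλ), so a handful of τ coefficients already captures the AR(1) log spectrum.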

The presence of structural breaks is also taken into account by using the approaches of Bai and Perron (2003) and Gil-Alana (2008a), the latter being an extension of the former to the fractional case. This is relevant because various authors have argued that fractional integration might be a spurious artifact generated by the presence of breaks in the data that have not been taken into account (Diebold and Inoue 2001; Granger and Hyung 2004, etc.).

Finally, in order to avoid the abrupt changes produced by structural breaks, a non-linear deterministic model based on Chebyshev polynomials in time, used by Gil-Alana and Cuestas (2016) in the context of I(d) models, is also employed in the paper. Using this model, we replace Eq. (9) by

$$ y(t) = \sum\limits_{i = 0}^{m} {\theta_{i} P_{i,T} } (t) + x(t),\quad t = 1,2,\ldots, $$
(11)

with m indicating the order of the Chebyshev polynomials P_{i,T}(t), defined as:

$$ P_{0,T} (t) = 1, $$
$$ P_{i,T} (t) = \sqrt 2 \cos \left( {i\pi (t - 0.5)/T} \right),\quad t = 1,2,...,T;\quad i = 1,2,... $$
(12)

See Hamming (1973) and Smyth (1998) for a detailed description of these polynomials. Authors such as Bierens (1997) and Tomasevic and Stanivuk (2009) argued that it is possible to approximate highly non-linear trends with rather low-degree polynomials. If m = 0 the model contains only an intercept; if m > 0, it becomes non-linear, and the higher m is, the less linear the approximated deterministic component becomes.
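The polynomials in Eq. (12) are orthonormal over t = 1, …, T, which is what keeps the regression in Eq. (11) numerically well behaved. A quick illustrative check (T = 96 matches the length of the 1926–2021 annual sample used below):

```python
import numpy as np

# The Chebyshev-in-time polynomials of Eq. (12):
#   P_{0,T}(t) = 1,  P_{i,T}(t) = sqrt(2)*cos(i*pi*(t - 0.5)/T)
def chebyshev_time(i: int, T: int) -> np.ndarray:
    t = np.arange(1, T + 1)
    if i == 0:
        return np.ones(T)
    return np.sqrt(2.0) * np.cos(i * np.pi * (t - 0.5) / T)

T, m = 96, 3
P = np.column_stack([chebyshev_time(i, T) for i in range(m + 1)])
G = P.T @ P / T  # Gram matrix; orthonormality means G is the identity
print(np.round(G, 10))
```

Because the regressors are orthonormal, each θ_i in Eq. (11) can be estimated essentially independently of the others, and adding higher-order terms does not destabilize the lower-order coefficients.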

Data and Empirical Results

We use annual data on all road user casualties (killed) in Great Britain from 1926 until 2021, obtained from the UK Department for Transport (https://en.wikipedia.org/wiki/Department_for_Transport). The time series plot of the data is displayed in Fig. 1. It can be observed that the highest death toll corresponds to 1941, during World War II, while 1966 is the year with the highest value in peacetime. Since then, the reported deaths have generally been decreasing, and the lowest number corresponds to 2020 (1460). National casualty reduction targets were set in 1987, 1999 and 2010, and were met in all three cases.

Fig. 1
figure 1

Time series data and estimated segmented trend

The first model examined is the one given by Eqs. (9) and (3), i.e.,

$$ y(t) = \alpha + \beta t + x(t);\quad (1 - L)^{d} x(t) = u(t),\quad t = 1,2,\ldots, $$
(13)

and the results are displayed in Tables 1 and 2. In Table 1 we report the estimates of d under the three classical scenarios in the unit root literature: (i) no deterministic terms, i.e., imposing a priori that both unknown coefficients, α and β, are equal to zero in (13); (ii) an intercept only (β = 0); and (iii) an intercept and a linear time trend. The values in bold in the table refer to the selected specification for each type of residuals, which are (1) white noise, (2) an AR(1) process, and (3) the exponential spectral model of Bloomfield (1973). This selection is based on the statistical significance of the estimated coefficients in the first equation in (13).

Table 1 Estimates of the differencing parameter d
Table 2 Estimated coefficients of the selected specifications in Table 1

The first thing we observe in Table 1 is that the time trend coefficient is not required in any case, while the intercept is significant in all three cases. The estimates of the differencing parameter are 1.05, 0.71 and 0.86 for white noise, AR(1) and Bloomfield disturbances, respectively, and the unit root null hypothesis cannot be rejected in any case, implying permanency of shocks according to this simple model. Table 2 displays the estimated coefficients of the selected model for each type of disturbance.
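As a complementary way of gauging d (this is not the paper's parametric procedure, which estimates d jointly with the deterministic terms), the semiparametric log-periodogram regression of Geweke and Porter-Hudak can be sketched on simulated I(0.4) data:

```python
import numpy as np

# Log-periodogram (GPH) regression on simulated I(d) data - an illustrative
# cross-check, not the estimation method used in the paper.
rng = np.random.default_rng(0)
n, d_true = 4000, 0.4

# Simulate I(d) by filtering white noise with the MA weights of (1 - L)^(-d):
# psi_0 = 1, psi_k = psi_{k-1} * (k - 1 + d) / k.
psi = np.empty(n)
psi[0] = 1.0
for k in range(1, n):
    psi[k] = psi[k - 1] * (k - 1 + d_true) / k
x = np.convolve(rng.standard_normal(n), psi)[:n]

# Periodogram at the first m Fourier frequencies lambda_j = 2*pi*j/n
m = int(n ** 0.5)
j = np.arange(1, m + 1)
lam = 2 * np.pi * j / n
I = np.abs(np.fft.fft(x)[1 : m + 1]) ** 2 / (2 * np.pi * n)

# Regress log I(lambda_j) on -log(4 sin^2(lambda_j / 2)); the slope estimates d
X = -np.log(4 * np.sin(lam / 2) ** 2)
d_hat = np.polyfit(X, np.log(I), 1)[0]
print(round(d_hat, 2))  # close to 0.4, up to sampling error of order 0.1
```

The bandwidth m = n^0.5 is a conventional choice; larger m reduces variance but risks bias from the short memory part of the spectrum.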

Applying the methods for detecting structural breaks, we find a single break in 1963 (see Table 3). Tables 4 and 5 re-estimate the model given by Eq. (13) for each subsample, using both uncorrelated (white noise) and Bloomfield disturbances.

Table 3 Structural breaks
Table 4 Estimates of d
Table 5 Estimated coefficients
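The single-break search behind Table 3 can be illustrated, in the spirit of Bai and Perron (2003), by minimizing the total sum of squared residuals of segment-wise linear trends over candidate break dates. This is a simplified sketch on synthetic data; the paper's procedure additionally accommodates fractionally integrated errors:

```python
import numpy as np

# Minimal single-break search: for each candidate date, fit separate linear
# trends to the two segments and pick the date minimizing the total SSR.
def find_single_break(y: np.ndarray, trim: int = 10) -> int:
    t = np.arange(len(y), dtype=float)

    def ssr(yy, tt):
        beta = np.polyfit(tt, yy, 1)       # linear trend fit on one segment
        resid = yy - np.polyval(beta, tt)
        return float(resid @ resid)

    candidates = range(trim, len(y) - trim)  # trim the sample ends
    return min(candidates, key=lambda b: ssr(y[:b], t[:b]) + ssr(y[b:], t[b:]))

# Synthetic series (purely illustrative) with a level shift and slope change
# at t = 60; the search recovers the true break date.
t = np.arange(100, dtype=float)
y = np.where(t < 60, 0.5 * t, 50.0 - (t - 60))
print(find_single_break(y))  # 60
```

In the paper's application, the same idea (with the appropriate error structure) locates the break in 1963, after which the two subsamples are modelled separately.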

We observe that if u(t) is a white noise process, the estimates of d are 1.01 for the first subsample and 0.85 for the second, and the unit root null cannot be rejected in either subsample. However, the time trend coefficient is now significantly negative in the second subsample. Allowing for autocorrelation through the model of Bloomfield, the estimates of d are much lower, 0.51 for the first subsample and 0.11 for the second, and the confidence intervals are wider. In fact, for the first subsample we can reject neither the I(0) nor the I(1) hypothesis. In the second one, however, the I(1) hypothesis is rejected in favour of mean reversion and thus transitory shocks. The time trend coefficient is also negative in this case. Thus, according to this specification, if an exogenous shock affects the death rates, its impact will not be permanent, since the series will return to its trend by itself.

As a final approach, the non-linear deterministic trend model in (11) is employed along with (3), and the results are displayed in Table 6. We see that if u(t) is white noise, all the Chebyshev coefficients are statistically insignificant. However, allowing for autocorrelation, a non-linear structure is observed, with an estimated value of d equal to 0.60, for which the unit root null cannot be rejected.

Table 6 Non-linear deterministic terms

Conclusions

We have examined in this paper the total number of road user casualties (killed) in Great Britain over the period from 1926 to 2021. Using a variety of linear and non-linear models, all based on fractional integration, our results indicate that if the sample is considered as a whole, the data are very persistent, supporting the existence of unit roots and implying permanency of shocks. However, if segmented trends are permitted, a structural break is found in 1963, and a different pattern is observed before and after the break (see Fig. 1 with the estimated trend for the model with autocorrelated errors). Thus, for the sample ending in 1963, no time trend is observed and the estimated order of integration is 0.51, though the unit root null cannot be rejected due to the large confidence interval for d; for the second subsample, however, the estimated value of d is 0.11 and the unit root null (d = 1) is decisively rejected in favour of d < 1, implying transitory shocks. Moreover, a significant time trend is also observed for this subperiod. A clear implication of this result is that mean reversion seems to take place in the data under examination and thus there is no need for strong policy actions in the event of exogenous shocks, since the series will return to its original long-term projection by itself.

Alternative methods, based for instance on time-varying differencing parameters or on alternative non-linear structures such as deterministic functions of time, like Fourier functions (Gil-Alana and Yaya 2021) or neural networks (Yaya et al. 2021), are currently being investigated. Future work should also investigate non-linear stochastic models within the fractional integration framework, as in Caporale and Gil-Alana (2007).