Estimating persistence for irregularly spaced historical data

This paper introduces to the literature on Economic History a measure of persistence which is particularly useful when the data are irregularly spaced. An illustration with ten unevenly spaced historical data series for Holland, covering 1738 to 1779, shows the merits of the methodology. It is found that the weight of the slave-based contribution in that period grew along a deterministic trend.


Introduction and motivation
One way to study economic history amounts to the construction and analysis of historical time series data, see for example van Zanden and van Leeuwen (2012) amongst many others. A particularly interesting period to study concerns the times of the Atlantic slave trade. One frequently examined aspect is the contribution of the slave trade to the size of an economy. Recent important studies are Eltis and Engerman (2000), Fatah-Black and van Rossum (2015) and Eltis et al. (2016). Another recent study is Brandon and Bosma (2019), who show that 5 to 10% of Gross Domestic Product (GDP) in Holland around 1770 was based on slave trade, see Table 1.
An important feature to study concerns the trends in the data. Did the contribution of the slave trade to GDP grow at a steady pace, as with a deterministic trend? Or did that contribution jump to plateaus due to structural breaks, perhaps caused by technological developments? If growth followed a deterministic trend, then shocks to the data were not persistent. If the growth patterns followed sequences of structural breaks, then those shocks were persistent. Hence, it is of interest to study the persistence properties of the historical data.
Ideally, the constructed historical data are equally spaced, like per year or per ten years, as then basic time series analytical tools can be used to study the properties of the data. In the present paper the focus is on the analysis of unequally spaced data, which can also occur in historical research, as will be evident below.

Introductory remarks
An important property of time series data is what is called the persistence of shocks. Such persistence is perhaps best illustrated when we consider the following simple time series model for a variable $y_t$, which is observed for a sequence of $T$ years, $t = 1, 2, \ldots, T$, that is,
$$y_t = \alpha y_{t-1} + \varepsilon_t$$
This model is called a first order autoregression, with acronym AR(1). The $\varepsilon_t$ is a series of shocks (or news) that drives the data over time, and these shocks have mean 0 and common variance $\sigma^2$, and over time these shocks are uncorrelated. In other words, future shocks or news cannot be predicted from past shocks or news. The $\alpha$ is an unknown parameter that needs to be estimated from the data. Usually one relies on the ordinary least squares (OLS) method to estimate this parameter, see for example Franses et al. (2014, Chapter 3) for details.
In an AR(1) model,[1] the persistence of shocks to $y_t$ is reflected by (functions of) the parameter $\alpha$. This is best understood by explicitly writing down all the observations on $y_t$ when the AR(1) is the model for these data. The first observation is then
$$y_1 = \alpha y_0 + \varepsilon_1$$
where $y_0$ is some known starting value, that can be equal to 0 or not. In practice this starting value is usually taken as the first available observation, and then the estimation sample runs from $t = 2, 3, 4, \ldots, T$. The second observation is
$$y_2 = \alpha y_1 + \varepsilon_2 = \alpha^2 y_0 + \varepsilon_2 + \alpha \varepsilon_1$$
where the expression on the right-hand side now incorporates the expression for $y_1$. When this recursive inclusion of past observations is continued, we have for any $y_t$ observation that
$$y_t = \alpha^t y_0 + \varepsilon_t + \alpha \varepsilon_{t-1} + \alpha^2 \varepsilon_{t-2} + \cdots + \alpha^{t-1} \varepsilon_1$$
This expression shows that the immediate impact of a shock $\varepsilon_t$ is equal to 1. The impact of a shock one period ago (which is $\varepsilon_{t-1}$) is $\alpha$, and the impact of a shock $j$ periods ago is $\alpha^j$. The total effect of a shock if $t \to \infty$ is thus
$$1 + \alpha + \alpha^2 + \cdots = \frac{1}{1 - \alpha}$$
when $|\alpha| < 1$. So, when $\alpha = 0.5$, the total effect of a shock is 2. When $\alpha = 0.9$, the total effect is 10. So, when $\alpha$ approaches 1, the impact gets larger. When $\alpha = 1$, the total effect is infinite. At the same time, when $\alpha = 1$, each shock in the past has the same permanent effect 1, as $1^j = 1$. In that case, shocks are said to have a permanent effect.
[1] If one were to consider an autoregression of higher order, then the measure of persistence is the sum of the autoregressive coefficients. One may also want to consider so-called fractionally integrated time series models, where the degree of differencing $d$ is a measure of persistence. Nonparametric methods to measure persistence also exist, like the number of times a time series crosses its mean value.
One may also be interested in what is called a duration interval. For example, a 95% duration interval is the time period $\tau_{0.95}$ within which 95% of the cumulative or total effect of a shock has occurred. It is defined by
$$\tau_{0.95} = \frac{\log(1 - 0.95)}{\log(\alpha)}$$
where log denotes the natural logarithm. When $\alpha = 0.5$, then $\tau_{0.95} = 4.32$, and when $\alpha = 0.9$, then $\tau_{0.95} = 28.4$. These persistence measures are informative about how many years (or periods) shocks last.
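The two persistence measures described above, the total effect of a shock and the 95% duration interval, can be checked numerically. The following is a minimal sketch in Python, assuming the AR(1) parameter (called alpha here) lies strictly between 0 and 1. The function names are illustrative and do not come from the paper.

```python
import math


def total_effect(alpha):
    """Cumulative effect 1 + alpha + alpha^2 + ... = 1 / (1 - alpha), valid for |alpha| < 1."""
    if not abs(alpha) < 1:
        raise ValueError("the total effect is finite only when |alpha| < 1")
    return 1.0 / (1.0 - alpha)


def duration_interval(alpha, share=0.95):
    """Number of periods within which `share` of the total effect has occurred."""
    if not 0 < alpha < 1:
        raise ValueError("this sketch assumes 0 < alpha < 1")
    # The share reached after k periods is 1 - alpha**k, so solve alpha**k = 1 - share.
    return math.log(1.0 - share) / math.log(alpha)


for alpha in (0.5, 0.9):
    print(alpha, round(total_effect(alpha), 2), round(duration_interval(alpha), 2))
```

For alpha equal to 0.5 and 0.9 this reproduces the values quoted in the text, up to rounding.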

Motivation of this paper
In this paper the focus is on persistence measures when the data do not form a connected sequence of years but instead have observations missing at irregular intervals.
Consider for example the data on Gross Domestic Product (GDP) in Holland for the sample 1738-1779 in Fig. 1. In principle the sample size is 42, but it is clear that data are missing for various years, and hence the sample effectively covers 24 years. Take for example the data in the final column of Table 2, which concern the Weights of slave-based activities in GDP Holland, for the sample 1738-1779. The data are in Fig. 2. The issue is now how we can construct persistence measures, that is, functions of $\alpha$ like above, when such irregularly spaced data follow a first order autoregression.
The paper proceeds as follows. The next section presents a useful model for unevenly spaced data. It also gives a step-by-step illustration of how to implement this method, which can be done using any statistical package. The empirical section implements this method for ten variables with irregularly spaced data, all of which appeared in a recent study of Brandon and Bosma (2019) on the economic impact of the Atlantic slave trade. The final section concludes.

Methodology
The starting point of our analysis is the representation of an AR(1) process given in Robinson (1977) (see also for example Schulz and Mudelsee, 2002). Suppose an AR(1) process is observed at times $t_i$ where $i = 1, 2, 3, \ldots, N$. A general expression for an AR(1) process with arbitrary time intervals is
$$y_{t_i} = \alpha_i y_{t_{i-1}} + \varepsilon_{t_i} \qquad (1)$$
with
$$\alpha_i = \exp\left(-\frac{t_i - t_{i-1}}{\tau}\right) \qquad (2)$$
where $\tau$ is scaling the memory, see Robinson (1977). For ease of analysis, it is assumed here that $\varepsilon_{t_i}$ is an uncorrelated white noise process with mean 0 but with time-variation in the variance. This means that in practice, one should correct for this heteroskedasticity by using the Newey and West (1987) HAC estimator.
One may continue with (1) and (2), but it may be easier to define
$$\alpha = \exp\left(-\frac{1}{\tau}\right)$$
This means that the general AR(1) model can be written as
$$y_{t_i} = \alpha^{t_i - t_{i-1}} y_{t_{i-1}} + \varepsilon_{t_i} \qquad (3)$$
When the data are regularly spaced, then $t_i - t_{i-1} = 1$ and this model collapses into
$$y_{t_i} = \alpha y_{t_{i-1}} + \varepsilon_{t_i}$$
which is the standard AR(1) model above. Or, suppose the data are unequally spaced because only each even observation is sampled and all the odd observations are treated as missing. Then $t_i - t_{i-1} = 2$, and the model reads as
$$y_{t_i} = \alpha^2 y_{t_{i-1}} + \varepsilon_{t_i}$$
Before one proceeds with estimating the parameter $\alpha$ in (3), one first needs to demean and detrend the data, see Robinson (1977).
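The link between the memory scale and the autoregressive parameter can be illustrated with a few lines of Python. This is a sketch under the assumption that the two are related by alpha = exp(-1/tau), so that the coefficient applying across a gap of g time units is exp(-g/tau) = alpha**g. The names alpha, tau and gap_coefficient are illustrative.

```python
import math


def tau_from_alpha(alpha):
    """Memory scale tau implied by alpha = exp(-1 / tau), for 0 < alpha < 1."""
    return -1.0 / math.log(alpha)


def gap_coefficient(alpha, gap):
    """Autoregressive coefficient that applies across a gap of `gap` time units."""
    return alpha ** gap


alpha = 0.9
tau = tau_from_alpha(alpha)
for gap in (1, 2, 5):
    via_tau = math.exp(-gap / tau)          # coefficient written in terms of tau
    via_alpha = gap_coefficient(alpha, gap)  # the same coefficient written as alpha**gap
    print(gap, round(via_tau, 4), round(via_alpha, 4))
```

A gap of 1 returns alpha itself, which is the regularly spaced case, and a gap of 2 returns alpha squared, which is the case where every other observation is missing.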

Estimation
Given a sample $\{t_i, y_{t_i}\}$, one can use Nonlinear Least Squares (NLS) to estimate $\alpha$ (and hence $\tau$). Table 3 provides the key variables relevant for estimation concerning the variable in Fig. 2. The first column gives the demeaned and detrended irregularly spaced time series, that is $x_{t_i}$, where this variable follows from the OLS regression
$$y_t = \mu + \delta t + x_t$$
where $t = 1, 2, 3, \ldots, T$ with $T = 42$ here. The demeaned and detrended data are in Fig. 3. The next column in Table 3 contains $t_i - t_{i-1}$, with acronym DIFT. The last column of Table 3 contains the new variable $x_{t_{i-1}}$. With this new variable, one can apply NLS to
$$x_{t_i} = \alpha^{t_i - t_{i-1}} x_{t_{i-1}} + \varepsilon_{t_i}$$
and obtain an estimate of $\alpha$ and an associated HAC standard error.
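The estimation steps above can be sketched in Python using only the standard library. Several choices here are assumptions made for illustration and are not taken from the paper: the data are simulated rather than the historical series, a grid search over values between 0 and 1 stands in for a packaged NLS routine, and the HAC standard error uses a Bartlett kernel with a hypothetical lag length of 2 and no small-sample correction.

```python
import math
import random


def detrend(times, y):
    """OLS of y on a constant and time; returns the residuals (the x series)."""
    n = len(times)
    t_bar, y_bar = sum(times) / n, sum(y) / n
    slope = (sum((t - t_bar) * (v - y_bar) for t, v in zip(times, y))
             / sum((t - t_bar) ** 2 for t in times))
    intercept = y_bar - slope * t_bar
    return [v - intercept - slope * t for t, v in zip(times, y)]


def fit_alpha(times, x, grid_size=999):
    """Least squares for x_i = alpha**DIFT_i * x_{i-1} + e_i over a grid of alpha values."""
    dift = [times[i] - times[i - 1] for i in range(1, len(times))]
    lagged, current = x[:-1], x[1:]

    def ssr(alpha):
        return sum((c - alpha ** d * l) ** 2 for c, d, l in zip(current, dift, lagged))

    grid = [k / (grid_size + 1) for k in range(1, grid_size + 1)]
    return min(grid, key=ssr), dift, lagged, current


def hac_se(alpha, dift, lagged, current, max_lag=2):
    """Newey-West style standard error for the scalar alpha (Bartlett weights)."""
    grad = [d * alpha ** (d - 1) * l for d, l in zip(dift, lagged)]
    resid = [c - alpha ** d * l for c, d, l in zip(current, dift, lagged)]
    score = [g * e for g, e in zip(grad, resid)]
    bread = sum(g * g for g in grad)
    meat = sum(s * s for s in score)
    for lag in range(1, max_lag + 1):
        weight = 1.0 - lag / (max_lag + 1)
        meat += 2.0 * weight * sum(score[i] * score[i - lag]
                                   for i in range(lag, len(score)))
    return math.sqrt(max(meat, 0.0)) / bread


# Simulated example: a trending AR(1) series with about 40% of the periods missing.
random.seed(1)
alpha_true, state, times, y = 0.7, 0.0, [], []
for t in range(1, 401):
    state = alpha_true * state + random.gauss(0.0, 1.0)
    if random.random() < 0.6:
        times.append(t)
        y.append(2.0 + 0.05 * t + state)

x = detrend(times, y)
alpha_hat, dift, lagged, current = fit_alpha(times, x)
se_hat = hac_se(alpha_hat, dift, lagged, current)
print(alpha_hat, se_hat)
```

In practice one would replace the grid search with the NLS routine of a statistical package and let that package compute the HAC standard error. The sketch only makes the construction of DIFT and the lagged variable explicit.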

Illustration
Let us see how this works out for the ten historical series in Table 2, which are taken from Brandon and Bosma (2019, Annex page XXX). Table 4 reports the estimation results for the auxiliary regression for demeaning and detrending. Two series, Sugar refinery and Army and Navy, do not seem to have a trend, as the associated parameter is not significant at the 5% level. However, we do use the residuals of the auxiliary regressions in the subsequent analysis. Table 5 reports the estimated parameters $\alpha$. The estimates range from 0.278 (Total size GDP of Holland) to 0.907 (Sugar refinery). Comparing the estimated parameters with their associated HAC standard errors, we see that 0 is included in the 95% confidence interval only for Total size GDP of Holland. So, this variable fully follows a deterministic trend. Table 6 presents the estimated persistence of shocks (news), measured by the 95% duration interval $\tau_{0.95}$ and by the total effect $1/(1 - \alpha)$. Clearly, persistence is largest for Sugar refinery and Notaries. The parameter for Notaries, 0.862 (Table 5), is very close to 1 given its HAC standard error, so one might even claim that shocks to this sector in the observed period were permanent.

Conclusion
This paper has introduced to the literature on Economic History a measure of persistence which is particularly useful if the data are irregularly spaced. An illustration with ten historical series on the impact and contribution of the slave trade in Holland for 1738-1779 showed the merits of the methodology. The question was whether the contribution of the slave trade to GDP grew at a steady pace, as with a deterministic trend, or whether that contribution jumped to plateaus due to structural breaks, perhaps caused by technological developments. The following conclusion can be drawn. The persistence in the variable "Weight of slave-based activities in GDP Holland", as measured by the parameter $\alpha$ in an AR(1) regression, is equal to 0.536 with HAC standard error 0.214. This persistence is not equal to 1, meaning that there is no sign of occasional structural breaks with a long-lasting effect. Hence, in the considered period, the contribution to GDP grew steadily along a deterministic pattern.
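The statement that the estimate of 0.536 with HAC standard error 0.214 differs from 1 can be illustrated with a simple interval calculation. The sketch below assumes a normal approximation with critical value 1.96, which is an assumption made here for illustration. It is not a formal unit root test, for which the asymptotic theory is non-standard. The function name is illustrative.

```python
def normal_interval(estimate, standard_error, z=1.96):
    """Approximate 95% interval: estimate plus or minus z times the standard error."""
    return estimate - z * standard_error, estimate + z * standard_error


lower, upper = normal_interval(0.536, 0.214)
print(round(lower, 3), round(upper, 3))  # 0.117 0.955
```

Under this approximation the interval lies strictly between 0 and 1, so it excludes both the case of no persistence and the case of permanent shocks.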
Further applications should emphasize the practical relevance of the method. Also, an extension to an autoregressive process of higher order could be relevant, in order to provide additional measures of persistence. An extension to fractionally integrated processes is also relevant. Finally, as a further technical issue, one may want to formally test whether $\alpha = 1$. This amounts to a so-called test for a unit root, for which the asymptotic theory is non-standard, see for example Chapter 4 of Franses et al. (2014).
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.