Spatial long memory

We discuss developments and future prospects for statistical modeling and inference for spatial data that have long memory. While a number of contributons have been made, the literature is relatively small and scattered, compared to the literatures on long memory time series on the one hand, and spatial data with short memory on the other. Thus, over several topics, our discussions frequently begin by surveying relevant work in these areas that might be extended in a long memory spatial setting.


Introduction
"Long memory', or 'long range dependence', or 'strong dependence', in time series is a topic that has been quite extensively studied in recent years. After a number of probabilistic contributions and empirical studies, serious treatment of issues of statistical inference could be said to have begun in the mid-1980s, with activity then increasing through the 1990s and this century. Predominately, this literature has focussed on observations that are regularly spaced over time, and the bulk of the theoretical development has been in terms of asymptotic statistical theory, with the number of observations, n, regarded as diverging, finite-sample theory proving mathematically intractable, even under the precise distributional assumptions that are typically not required in a large-sample treatment. Parametric, semiparametric and nonparametric models have all featured. The major characteristic feature of a long memory covariance stationary time series process is that its autocovariance function decays so slowly with increase in lag length as not to be summable, or, nearly equivalently, its spectral density diverges, typically at zero frequenccy; while some nonstationary processes, such as ones with a unit root can, a fortiori, be regarded as having even longer memory. By contrast, 'short memory' time series typically have autocovariances that are summable and spectral density that is more or less smooth (for example a stationary autoregressive moving average (ARMA) has exponentuially decaying autocovariance and analytic spectral density); though for some relevant purposes, a short memory process is sometimes defined as merely having spectral density that is positive and finite at zero frequency.
Spatial data have long attracted the attention of statisticians, and the configurations of some such data, especially some arising in such fields as meteorology, cosmology and agriculture, can be viewed as generalizations of the typical regularly spaced time series one mentioned above. In particular, they constitute observations in M ≥ 2 dimensions, (where M = 1 in the tme series case), such that there are n m observations across dimension m, for m = 1, … , M, so n = Π M m=1 n m . We call this a '(rectangular) lattice', and data recorded on it belong to the class of 'random fields'. There is equal spacing across each dimension, but the spacing can vary across dimensions. In Fig. 1 for (horizontal) dimension 1, n 1 = 15, while for (vertical) dimension 2, n 2 = 7. When n 1 = n 2 we can speak of a 'square lattice'.
Figure 1 brings to mind regularly spaced agricultural plots on a field. In fact, 'spatial long memory' goes back at least to the agriculturally motivated paper of Smith (1938), which is also a very early reference relative to the literature on time series long memory. Consider n = n 1 n 2 agricultural plots on a field, as in Fig. 1. Denote the yield (of rice, say) at location i = i 1 , i 2 by x i 1 i 2 and the sample mean by Smith (1938) assumed in effect that and estimated the unknown d and C from the logged relation meaning that the left side is estimated from data over several fields. Now, in the special square lattice case, n 1 = n 2 = n 1∕2 , the model (1) is consistent with the underlying isotropic model which is equivalent to the usual autocovariance function of a stationary long memory time series with differencing or memory parameter d, e.g., a fractional autoregressive integrated moving average (FARIMA). To verify (1), note that It is interesting that Smith (1938) thought of a power law decay: this might seem natural to one coming to the subject unschooled in time series modelng which, after World War II, stressed exponential decay, as in ARMA modeling.
Since then, many papers on 'spatial long memory' have appeared, but the topic has not been developed as systematically or comprehensively as 'long memory time series'. Some distinctive issues arising in the 'spatial' case, all of which have been studied far more under short memory than long memory, are: 1. Is there isotropy or not? If not, we might model each dimension separately, or with some interaction, and possibly with a different memory parameter for each dimension. 2. Is sampling regularly or irregularly spaced? Whereas in time series regular spacing has been predominately studied, irregular spacing is perhaps more likely to be found with spatial data. 3. Should modeling be unilateral or multilateral? For time series unilateral modeling, reflecting one-sided transition from past to future, is usually natural, but this is not the case with spatial data, where, for example, the dimensions might be latitude and longitude. 4. The edge effect. In estimating lagged quantities, there is loss of information at the boundary of the observation region, which has negligible effect when M = 1 , but increasing, and damaging, effect as M increases unless corrected for.
In this, quite personal, view of the subject, we discuss the following topics in the remaining three sections of the paper: 1. Inference on location and regression with long memory errors. 2. Inference on second-order properties of long memory stationary processes. 3. Miscellaneous topics: nonstationary processes, irregular spacing, adaptive estimation, nonparametric regression.
We do not consider 'spatial autoregressive'-type ('SAR') models (which depend on a user-chosen weight matrix of geographic or economic inverse distances); these do not fit into our framework but typically possess a kind of short memory.

Inference on location and regression with long memory errors
The sample mean x is a basic statistic, whose asymptotic properties are well known under time series short and long memory. In the time series case, suppose where j is a sequence of independent and identically distributed (iid) random variables with zero mean and unit variance, or even homoscedastic (but not necessarily identically distributed) martingale differences. The model (2) implies x t is covariance stationary, and it can have 'short memory' or 'long memory' or 'negative memory'.
First suppose x i has short memory, i.e., x i has spectral density that satisfies with also f ( ) continuous at = 0 . Then, x = n −1 ∑ n i=1 x i is an asymptotically normal and efficient estimate of , Long memory can be described by (2) with, for example, for a slowly varying function (j) so f (0) = ∞ . We can also consider the asymptotic approximation in (3) for −1∕2 < d < 0, assuming also ∑ ∞ j=0 j = 0, whence f (0) = 0 , and there is 'negative memory'. In both cases, there is interest in allowing for a nonconstant (j) (e.g., (j) = log j ), and this has been done in limit theory for basic statistics, and also in (semiparametric) estimation of d. But for simpliciy, we focus on constant (j), which covers FARIMA and fractional Brownian motion models. In this setting, x is no longer asymptotically efficient (Adenstedt 1974) but it can still be asymptotically normal, albeit with a different rate of convergence. In particular, under suitable conditions on j , (2) for a function 2 (d) of known form. More generally consider the centered and self-normalized statistic Ibragimov and Linnik (1971) showed that if only the j are iid and Note that under negative memory diverges more and more slowly as d ↓ −1∕2 , indeed in the moving average (MA) model with an MA unit root, For spatial data, consider an M−dimensional lattice, as described in the previous section (though results can be extended to a more general, irregular region). Let i be the multiple index (i 1 , … , i M ) with i j ∈ ℤ = {0, ±1, …} , j = 1, … , M. Consider a covariance stationary process x t observed on the lattice ℕ={i ∶ 1 ≤ i m ≤ n m , m = 1, … , M}, for j iid (say) with zero mean and unit variance. For example under isotropy, with M = 2, with memory parameter > 1 . Note that for time series, long memory has been 'explained' as arising from cross-sectional aggregation of AR(1) processes, see, Robinson (1978), Granger (1980), an interpretation extended to spatial processes by, e.g., Lavancier (2011).
For n = Π M m=1 n m and the multiple partial sum process

3
Specializing their results to the case M = 2, Lahiri and Robinson (2016) obtained with and also with Such results can be extended to regression models, especially ones with deterministic regressors. One interesting case, for M = 2, and with y i a response variable that is obserrvable on the lattice is, with 0 , 1 , 2 , 1 , 2 all unknown, with x i now an unobservable error process. For a more general regression model, and with general M, Robinson (2012) considered nonlinear least squares estimation of the ′ s and ′ s, for short memory x i , where 1 > −1∕2, 2 > −1∕2 is required (with negative values of the i conferring a decaying trend that could be relevant for some spatial data). In this case, the estimates are asymptotically efficient (in the Gauss-Markov sense), but the ′ s are estimated slightly less well than the ′ s, with rates dependng on the ′ s (and slightly less well than if the ′ s were known) . With long memory x i , where regression powers will 'interact' with memory parameters, the rates will be slower, and if memory is strong enough relative to the magnitude of the ′ s we may not be able to estimate some ′ s and ′ s , due to domination of the regression component by the error x i .

Inference on second-order properties of long memory stationary processes
Now consider estimating either a full parametric model for the spectrum/autocovariance function of x i , or else a 'semiparametric' one specified for only low frequencies/ long lags. In the parametric case, for short memory time series, early work on asymptotics covered least squares and Yule-Walker estimates of stationary autoregressive (AR) processes. Continuous and discrete frequency Whittle, and Gaussian pseudomaximum likelihood, estimates of ARMA and other short memory time series were well covered by Hannan (1973). He relaxed the commonly imposed iid assumption on ∈(1, 2) (long memory), ∈(2, 2.5), ∑ j∈ℤ 2 j = 0 (negative memory), innovations, allowing them, and centred squared innovations, to be stationary martingale differences (expressing the natural ordering of time series data), with only finite second moments required (for estimation of dependence parameters); in fact stationarity can be relaxed, the squared innovations needing only to be uniformly integrable). His central limit theorem proof was essentially found to work under long memory with differencing parameter d ∈ (0, 1∕4) by Yajima, (1985). Using a different method of proof, Fox and Taqqu (1986) established a central limit theorem valid over the whole stationary long memory interval d ∈ (0, 1∕2). For spatial lattice data, a vital early (short memory) reference is Whittle (1954). Noting that multilateral MA/AR representations are more natural for spatial data than the unilateral or one-sided ones normal in time series, he pointed out identifiability problems with multilateral representations, and extended the (unilateral) Wold representation for purely nondeterministic time series (cf. (1)) to spatial processes, introducing 'half-plane' representations. But a given multilateral ARMA does not necessarily have a neat half-plane representation, where AR or MA orders may be infinite; so this property may have limited practical use. Sometimes, a 'quarter plane' representation is possible. Whittle (1954) and others (e.g., Martin 1979,Tjostheim 1978,Kashyap and Lapsa 1984Huang and Anh 1992;Ma 2003) considered estimation of various short memory parametric spatial models, some extending ARMA in an isotropic or separable way, e.g., for M = 2, where L 1 x t = x i 1 −1,i 2 , L 2 x t = x i 1 ,i 2 −1 and the i are iid, a kind of first-order, unilateral AR in 2 dimensions. Other specifications, such as the Matern model, were considered by Stein (1999).
For asymptotic theory of estimates, one basic question is whether to use increasing domain (as usually in time series) or fixed-domain (infill) asymptotics (or some hybrid of the two, where the domain increases slowly while the sampling interval decays slowly). Since spatial observations can often be thought of as confined to a bounded region, infill asymptotics, as studied by Stein (1999), might seem natural. However, infill asymptotics can lead to results that are not easy to use in practice and even to inconsistency, whereas increasing domain asymptotics often provides a central limit theorem that is convenient for practical use.
Another important issue is the 'edge effect'. Consider estimating the autocovariance j = Cov x i , x i+j of a (zero mean) covariance stationary process x i on an M−dimensional lattice, by This is of great interest because various estimates of parameters, in particular ones based on a Gaussian pseudo-likelihood or approximation thereto (and indeed nonparametric spectral estimates) are essentially functions of the ̂ j . For M = 1 , as in time series, the summation includes n − j points and so (for fixed j), ̂ j has bias O n −1 , which does not prevent a central limit theorem for n 1∕2 ̂ j − j from holding. But for M = 2, and ℕ = 1, n 1∕2 × 1, n 1∕2 the bias is of exact order n −1∕2 so the limit distribution of n 1∕2 ̂ j − j has nonzero mean. And for M = 3 and ℕ = 1, n 1∕3 × 1, n 1∕3 × 1, n 1∕3 the bias is of exact order n −1∕3 ; so, no central limit theorem results, likewise for M > 3 . Thus, the obvious extensions of time series estimates to such spatial lattice data do not work.
The following solutions, have been proposed, all in a short memory context.
1. (Guyon (1982)) Use unbiased estimates of j . However the ̃ j become unstable for large ‖j‖ and a desirable non-negative definiteness property of the ̂ j is lost. 2. (Dahlhaus and Kuensch (1987)) A data taper, of the type extending that used to correct for bias in a variety of time series settings, is applied to x i . 3. (Robinson and Vidal Sanz (2006); Robinson (2007)) Trimming is employed: omitting the ̃ j defined above for large ‖j‖ , the first reference considering a parametric setting, the second a nonparametric one.
Because of the greater importance of long lags, i.e., large ‖j‖, in long memory processes relative to short memory ones, the edge effect seems an even bigger issue under long memory. Nevertheless, a number of parametric long memory stationary linear spatial models, both isotropic and separable, have been considered in the literature, see, e.g., Sethuram and Basawa (1995), Boissy et al. (2005), Shitan (2009), Beran et al. (2009). A simple illustration of a separable long memory spatial parametric model is For cyclic/seasonal long memory models, with a spectral pole at one or more nonzero frequencies, see, e.g., Cisse et al. (2016).
For time series 'semiparametric' estimates of the memory parameter, only local-to-zero assumptions are made about the spectrum, and an increasing number but vanishing fraction of Fourier frequencies near the origin, depending on a user-chosen bandwidth, are used. These have also been extended to spatial lattice data. Log periodogram spatial estimates have been considered by Ghodsi and Shitan (2016), the latter extending asymptotic theory of Robinson (1995a) (though Ghodsi and Shitan (2016) assume a parametric model). Local Whittle or Gaussian semiparametric estimates have been developed by Guo et al. (2009), Durastanti et al. (2014, extending asymptotc theory of Robinson (1995b).
There is considerable scope for for further study of issues of model choice (especially due to the danger of 'curse of dimensionality' in lattice models) and bandwidth choice. Basic asymptotic theory for estimation of second-order features of long memory lattice processes has been developed by Lavancier (2006Lavancier ( , 2007Lavancier ( , 2008, and Lavancier and Philippe (2011).
Sometimes, though observations are made only at discrete points in ℤ M , it is natural to think in terms of an underlying continuous model defined on R M , where R denotes the real line. This topic was long ago extensively discussed in the time series case M = 1 , especially for short memory models. The most popular covariance stationary model here has been expressed as a rational spectral density, of a continuous process x(t), t ∈ R, where a and b are polynomials of degrees r and s, respectively such that r > s (required for integrability of g, equivalently for x(t) to have finite variance), and such that the zeroes of a have negative real parts. When s = 0 , we can think of x(t) as generated by a stochastic differential equation (of order r) driven by 'continuous white noise', that is an unobservable process (t), t ∈ R , having (non-integrable) spectral density the b factor in (4) providing further structure when r > 1 , s > 0 . Suppose that x t = x(t) is observable at t ∈ ℤ only. Then (see, e.g., Bartlett 1946, Walker 1960, Phillips 1959Robinson 1980a) x t has an ARMA(r, r − 1 ) representation. This can be established in more than one way. Note that, if f ( ) denotes the spectral density of x t , and Robinson (1980a) thence used (6) with contour integration/theorem of residues to derive the ARMA(r, r − 1 ) property of x t , also using it to derive the ARMA(r, r) property of the averaged process X t = ∫ t t−1 x(u)du, t ∈ ℤ (as well as analogous properties when the underlying process is skip-sampled, being defined discretely at narrow intervals 1 / k, for an integer k ≥ 2, see also Telser (1967)). Generally, there are efficiency gains if the underlying parameters in a and b are estimated, rather than an unconstrained ARMA. But f ( ) is naturally parameterised in terms of the zeroes of a and b, and when r ≥ 2 its form depends on the extent of any multiplicities in a, whose number is likely unknown, even for given r. Note also that due to the aliasing property (6), a cannot be uniquely estimated from the AR coefficients of x t when r ≥ 2 , and though the presence of b may help, a Gaussian pseudo-likelihood may have multiple peaks from which it is difficult to choose (Robinson 1980b). Generally, corresponding to (6), there are uncountably many continuous processes consistent with a given discrete one, so attempting to estimate a continuous model is highly hazardous. It should be added that though (4) is classical, the model has two parts, the frequency response function b / a and the noise , and either might be specified differently, for example, the aliasing problem is avoided if is band limited to Nyqvist frequency, so instead of (5) leading to a quite different model for the discretely observed x t , see Robinson (1977a), while Robinson (1976a, b) evaluated bias of estimates based on (7) when in fact (4) holds, again using the theorem of residues. Extensions to vector x(t) were developed by Phillips (1974), Robinson (1993). The topic has been rediscovered more recently, with (4) referred to by the acronyms CAR (when s = 0 ) and CARMA (when s > 0).
For long memory, one can consider a continuous process such as with memory parameter d ∈ (0, 1∕2) . Again (6) can be applied but there is now no neat analytic solution characterizing x t and f ( ) (see Comte and Renault 1996;Chambers 1998).
For continuously defined spatial processes that are observed on a lattice, extensions of the above are clearly possible (see, e.g. Matsuda 2019), and analogous issues discussed for the time series case apply.

Nonstationary processes
Really, a time series AR with a unit root is a 'long memory' process because it has even longer memory than a stationary long memory process, and it coincides with a fractional process with memory parameter d = 1 . Asymptotic theory for unit root processes in an AR setting produces nonstandard limit distributiions, but nesting the unit root in the fractional, rather than AR, class leads to standard asymptotics, for example a central limit theorem with n 1∕2 norming and classical local power properties, for memory and other parameter estimates, even under nonstationarity, using either tapering and skipping of frequencies in discrete-frequency Whittle estimation (see Velasco and Robinson 2000), or using conditional sum-of-squares estimation (see Hualde and Robinson 2011). These ideas seem capable of extension to nonstationary fractional processes on a spatial lattice. Kuensch (1987) considered an alternative class of nonstationary spatial processes.
In a semiparametric time series setting, Velasco (1999) studied log-periodogram regression estimation under nonstationarity, using tapering and skipping of frequencies, as did Yajima and Matsuda (2019) for data on a spatial lattice of dimension M = 2 with an underlying continuous nonstationary isotropic process. In econometrics, study of nonstationary time series has much focussed on multivariate processes, and cointegration and fractional cointegration. Clearly, there is scope for corresponding study of multivarate spatial processes.

Irregular spacing
There is a time series literature on missing and irregularly spaced data, for example, Robinson (1977b) considered the underlying continuous model (4) with r = 1 , s = 0 , and showed that irregularly spaced observations have a kind of time-varying AR representation, with unknown parameters estimable by Gaussian pseudo likelihood. Irregular spacing is more likely to be an issue with spatial data. A general approach, for a parametric underlying continuous stationary spatial process, would be to construct a Gaussian pseudo-likelihood, conditional on the observation points. But conditions for asymptotic theory are messy relative to the regularly spaced case, mixing properties of the continuous process x(t) with those of the observation points. Matsuda and Yajima (2009) developed parametric and nonparametric methods and theory in the frequency demain, for spatial data with an underlying model defined on R M . Another approach that could be pursued in an irregularly spaced long memory spatial setting involves averaging over the process generating the observation times as well as over x(t) itself. In particular, in the time series case, Robinson (1980b) considered the model (4) with irregularly spaced observation points k , k ∈ ℤ , generated by a point process, deriving autocovariance properties that are not conditional on the observation times. For example, treating the time intervals as a renewal process, he obtained an ARMA(r, r − 1 ) representation for the observed pseudo-regularly spaced process X k = x k , k ∈ ℤ.
Sometimes irregular spacing is due to missing from a regular lattice. Parzen (1963) proposed an amplitude modulating sequence for missing data in time series x t , t ∈ ℤ , studying the regularly spaced process a t x t , where a t is a (possibly stochastic) zero-one sequence, taking the value 1 when x t , is observed, and 0 when it is missed. Dunsmuir and Robinson (1981) studied Whittle estimation of parametric models using this approach. It seems clear how it could be extended to cover spatial lattice processes with missing observations and long memory.

Adaptive estimation in semiparametric models
As is well known, under regularity conditions Gaussian pseudo-maximum likelihood estimates for parametric models are typically consistent and asymptotically normal with the same asymptotic covariance matrix irrespective of whether or not Gaussianity actually holds. But such estimates are not asymptotically efficient if the proces is non-Gaussian. Considering a semiparametric model, with innovations having distribution of unknown form, it is possible to obtain asymptotically efficient 'adaptive' estimates, using nonparametric smoothing. For long memory time series, this was studied by Hallin and Serroukh (1998) under stationarity, and by Robinson (2005) under stationarity and nonstationarity. Such ideas are certainly extendable to spatial lattice processes.

When ordering does not matter
Unlike with time series data, there is no natural uni-dimensional ordering of spatial data. For a spatial lattice, there is typically an ordering of locations only with respect to each of the dimensions. For inference on many features, such as spatial dependence parameters, respect for relative spatial locations is crucial. But for estimating some 'instantaneous' features, such as location and static parametric regression, and in stochastic design nonparametric regression, ordering can be disregarded and the spatial process x i i ∈ ℕ, mapped arbitrarily into a sequence u i , i = 1, … , n.
We might assume, say (Robinson 2011), the nonstationary infinite MA which can be checked in some spatial model settings. We have 'long memory' if For stochastic design nonparametric (kernel) regression, with spatial data, Robinson (2011) gave conditions for asymptotic normality, based partly on assuming (8) for the error process and on conditions on the distance between multivariate and product univariate densities for the regressors, which can also cover long memory. Such conditions might be extended to other models and methods for spatial data. (8) a ik a jk → ∞, as n → ∞.