A functional central limit theorem on non-stationary random fields with nested spatial structure

In this paper, we establish a functional central limit theorem for high-dimensional random fields in the context of model-based survey analysis. For strongly-mixing non-stationary random fields, we provide an upper bound for the fourth moment of the finite population total. This inequality generalizes a key tool for proving functional central limit theorems in Rio (Asymptotic theory of weakly dependent random processes, Springer, Berlin, 2017). Under the nested sampling strategy, we introduce assumptions on the strongly-mixing coefficients and the quantile functions to show that a functional stochastic process converges weakly to a Gaussian process.


Introduction
In survey sampling, the model-based approach models finite populations as outputs from some random process; see for example Brewer (1963), Royall (1970), and the descriptions in Chambers and Clark (2012) and Valliant et al. (2000). A natural model for a spatially based survey is a random field, not necessarily stationary; see, e.g., the study of non-stationary Gaussian random fields with varying anisotropy by Fuglstad et al. (2015) and the research on simulations of non-stationary random fields by Emery and Arroyo (2018). Generally, to construct asymptotically based confidence intervals on such random fields, we require a suitable central limit theorem, or, if a function of means or empirical distribution functions is to be considered, a functional central limit theorem.

In memory of Alastair Scott, 1939-2017.
There is an extensive literature on functional central limit theorems for dependent random processes and fields. Most of this work concentrates on stationary processes, using a variety of conditions to control the weak dependence of the process. One strand of development does this using uniform mixing, e.g. Billingsley (1968), Peligrad (1985), Deo (1975), and Chen (1991) for random fields. Another strand of work assumes positive or negative dependence, e.g. Newman (1984). A group of papers works with the weaker concept of strong mixing, e.g. Doukhan et al. (1994), Doukhan (1995), Rio (2017), and Merlevède and Peligrad (2000). Peligrad (1998) uses the maximal coefficient of correlation together with strong mixing to prove the invariance principle for stationary random processes. Some of these results have been extended to non-stationary processes, e.g. Matuła and Rychlik (1990) for positively dependent random processes, and Utev and Peligrad (2003) using strong mixing and conditions on the maximal coefficient of correlation. However, we have been unable to find any results for general non-stationary strongly-mixing random fields.
In this paper, using the approach in Doukhan et al. (1994), we prove a functional central limit theorem under conditions on the strongly-mixing coefficients and the quantile functions of the variables, where the finite population is modelled by a non-stationary strongly-mixing random field. Let $X_{i \in \mathbb{Z}^d}$ stand for a random field, where each $X_i$ is a random variable, $\mathbb{Z}^d$ is a high-dimensional grid, and $\mathbb{Z}$ is the set of integers. We use a subset $D_n$ of $\mathbb{Z}^d$ to index the finite population on the random field $X_{i \in \mathbb{Z}^d}$, and $|D_n|$, the cardinality of the set $D_n$, describes the population size, i.e. $X_{i \in D_n}$ stands for the finite population of size $|D_n|$. The subscript $n$ indexes the sequence of subsets. We write $S_n = \sum_{i \in D_n} X_i$ and $\sigma_n^2 = \mathrm{Var}(S_n)$. Then, under certain assumptions, we will show that
$$G_n(t) = \frac{S_{[nt]}}{\sigma_n} \xrightarrow{D} G(t),$$
where $t \in [0, 1]$, $G(t)$ is a Gaussian process on $[0, 1]$, and the symbol $\xrightarrow{D}$ is interpreted as weak convergence in the Skorohod space $D[0, 1]$.
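As a concrete illustration (ours, not from the paper), the rescaled partial-sum process can be simulated for the simplest strongly-mixing field, an i.i.d. Gaussian field on $\mathbb{Z}^2$, for which $\alpha_{k,\ell}(m) = 0$ for every $m \ge 1$ and $\sigma_n = \sqrt{|D_n|}$:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustration only: an i.i.d. standard-normal field on a large square of Z^2.
# Independent fields are strongly mixing with alpha_{k,l}(m) = 0 for m >= 1.
n = 200
X = rng.standard_normal((n, n))

def S(m):
    # Partial sum over the domain D_m = [0, m)^2, so |D_m| = m^2.
    return X[:m, :m].sum()

# Rescaled process t -> S_{[nt]} / sigma_n on a grid of t values; here
# sigma_n^2 = Var(S_n) = n^2, so sigma_n = n.
ts = np.linspace(0.0, 1.0, 11)
G = np.array([S(int(n * t)) / n for t in ts])
print(G.round(3))  # one simulated path of the limiting Gaussian process
```

For square domains in $d = 2$, $\mathrm{Var}(G_n(t)) = [nt]^2/n^2 \to t^2$, so the simulated path approximates a sample of a Gaussian process with variance function $F(t) = t^2$, consistent with the discussion of $F(t) = t^d$ later in the paper.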
After introducing some preliminaries in Sect. 2, we present the Skorohod space, our main result, and the key tools in Sect. 3. All proofs are collected in Sect. 4.

Mixing coefficients and a central limit theorem
Let $X_{i \in \mathbb{Z}^d}$ be a zero-mean non-stationary random field satisfying the strongly-mixing condition $\alpha_{k,\ell}(m) \to 0$ as $m \to \infty$, where $\alpha_{k,\ell}(m)$ is the strongly-mixing coefficient of $X_{i \in \mathbb{Z}^d}$,
$$\alpha_{k,\ell}(m) = \sup\bigl\{\, |P(A \cap B) - P(A)P(B)| : A \in \sigma(\Lambda_1),\ B \in \sigma(\Lambda_2) \,\bigr\}.$$
In the above definition, $\Lambda_1$ and $\Lambda_2$ are subsets of $\mathbb{Z}^d$ with $|\Lambda_1| = k$, $|\Lambda_2| = \ell$ and $\mathrm{dist}(\Lambda_1, \Lambda_2) \ge m$; the supremum is also taken over all such pairs $(\Lambda_1, \Lambda_2)$; and $\sigma(\Lambda)$ denotes the sigma-field generated by the random variables $X_i$, $i \in \Lambda$. We also write $\alpha^X_{k,\ell}(m)$ to indicate that the coefficient belongs to the random field $X_{i \in \mathbb{Z}^d}$.
For the strongly-mixing coefficients, we introduce two applications, which are used in our proofs. The first bounds the covariance between variables on random fields:

Lemma 2.1 [Theorem 3 of Doukhan (1994)] Let $X_{i \in \mathbb{Z}^d}$ be a random field and $\Lambda_1, \Lambda_2 \subseteq \mathbb{Z}^d$.

The second application of the strongly-mixing coefficient provides a bound on the information carried by the correlations:

Lemma 2.2 Let $X_{i \in \mathbb{Z}^d}$ be a random field.

In our proofs, we also require the following central limit theorem for non-stationary random fields:

Theorem 2.1 [Theorem 3.3.1 in Guyon (1995)] Let $X_{i \in \mathbb{Z}^d}$ be a zero-mean random field, not necessarily stationary, and $\{D_n\}$ an increasing sequence of domains. Put $S_n = \sum_{i \in D_n} X_i$ and $\sigma_n^2 = \mathrm{Var}(S_n)$. If the random field satisfies conditions (H1)-(H3), and if we assume additionally condition (H4), then
$$\frac{S_n}{\sigma_n} \xrightarrow{d} N(0, 1).$$

Quantile functions
Following Rio (2017), the quantile function of a random variable $X$ is denoted by $Q_X$. It is the inverse of the non-increasing, left-continuous tail function of $|X|$; specifically, writing $H_{|X|}(t) = P(|X| > t)$,
$$Q_X(u) = \inf\{\, t \ge 0 : H_{|X|}(t) \le u \,\}.$$
Some properties of the quantile function are established in Rio (2017), where they are used to connect the covariance function of the random field to the mixing coefficients. We extend the strongly-mixing coefficient to two random variables $X$ and $Y$ by setting $\alpha(X, Y) = \alpha(\sigma(X), \sigma(Y))$, where $\sigma(X)$ and $\sigma(Y)$ are the sigma-fields generated by $X$ and $Y$ respectively. Then we have the following lemma:

Lemma 2.3 [Rio (2017)] Let $X$ and $Y$ be integrable real-valued random variables, and $\alpha = \alpha(X, Y)$. Then
$$|\mathrm{Cov}(X, Y)| \le 2 \int_0^{2\alpha} Q_X(u) Q_Y(u)\, du \le 4 \int_0^{\alpha} Q_X(u) Q_Y(u)\, du.$$
If $X$ and $Y$ are complex-valued, the constants 2 and 4 before the integrals are replaced by 8 and 16.

Lemma 2.4 [Rio (2017)] Let $X_1, \cdots, X_p$ be random variables.

Practically, Lemma 2.4 remains true if the upper limit in the integrals is replaced by any $\alpha \in [0, 1]$:

Lemma 2.5 Let $X_1, \cdots, X_p$ be random variables. Then, for any $\alpha \in [0, \frac{1}{4}]$,
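As a quick numerical illustration (ours, not from the paper), the quantile function can be computed for an empirical distribution, where $Q_X(u)$ reduces to an order statistic of $|X|$:

```python
import numpy as np

def Q(sample, u):
    # Empirical version of the quantile function from Rio (2017):
    # Q_X(u) = inf{ t : P(|X| > t) <= u }, computed here for the empirical
    # distribution of a finite sample (illustration only).
    a = np.sort(np.abs(np.asarray(sample, dtype=float)))
    m = len(a)
    for j in range(m):
        # Empirical tail: P(|X| > a[j]) = (number of values > a[j]) / m.
        if np.sum(a > a[j]) / m <= u:
            return a[j]
    return a[-1]

sample = [-3, 1, -1, 2]   # |X| takes values 1, 1, 2, 3, each with prob. 1/4
print(Q(sample, 0.5))     # P(|X| > 1) = 0.5 <= 0.5, so Q(0.5) = 1.0
print(Q(sample, 0.2))     # P(|X| > 2) = 0.25 > 0.2, so Q(0.2) = 3.0
```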

The space D[0, 1]
Let
$$d(x, y) = \inf_{\lambda} \max\Bigl\{\, \sup_{t} |\lambda(t) - t|,\ \sup_{t} |x(t) - y(\lambda(t))| \,\Bigr\},$$
where $\lambda(t)$ is in the class of strictly increasing, continuous mappings from $[0, 1]$ onto itself satisfying $\|\lambda\| < \infty$, with $\|\lambda\| = \sup_{s \ne t} \bigl|\log \frac{\lambda(t) - \lambda(s)}{t - s}\bigr|$. The Skorohod topology is defined by the metric $d(\cdot, \cdot)$, but the space $D[0, 1]$ with metric $d$ is not a complete metric space. However, the metric $d_0$, in which $\sup_t |\lambda(t) - t|$ is replaced by $\|\lambda\|$, generates the same topology as $d$, and $(D[0, 1], d_0)$ is a complete metric space, so we can apply the standard theory of weak convergence in complete metric spaces. For details, see Billingsley (1968). A convenient set of sufficient conditions for weak convergence in $D[0, 1]$ is contained in the following theorem (Theorem 3.1):

A moment inequality
Along the lines of Rio (2017), to generalize Theorem 2.1 therein, for the strongly-mixing coefficient $\alpha_{k,\ell}(m)$ we define the inverse function of $\alpha_{k,\ell}(m)$ by
$$\alpha^{-1}_{k,\ell}(u) = \inf\{\, m \in \mathbb{N} : \alpha_{k,\ell}(m) \le u \,\}.$$
Therefore, for any constant $u \in (0, 1)$, we have $\alpha_{k,\ell}(m) \le u$ whenever $m \ge \alpha^{-1}_{k,\ell}(u)$. By applying the approach in Rio (2017), we prove the following inequality for non-stationary high-dimensional random fields. To simplify the expression, we combine $\alpha^{-1}_{2,2}(u)$, $\alpha^{-1}_{1,3}(u)$ and $\alpha^{-1}_{3,1}(u)$ by defining $\alpha^{-1}_4(u)$ in (6).

Theorem 3.2 Let $X_{i \in \mathbb{Z}^d}$ be a zero-mean random field with strongly-mixing coefficients $\alpha_{k,\ell}(m)$, let $\alpha^{-1}_4(u)$ be defined as in (6), and let $D \subseteq \mathbb{Z}^d$. Then there exists a positive constant $K$, which depends only on $d$, such that

It is obvious that if (H1) is satisfied and
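With the generalized-inverse reading above, $\alpha^{-1}$ can be computed numerically. As a hypothetical example (the decay rate $\rho$ is our assumption, not the paper's), take a geometrically decaying coefficient $\alpha(m) = \rho^m$:

```python
def alpha(m, rho=0.5):
    # Hypothetical geometrically decaying mixing coefficient alpha(m) = rho^m.
    return rho ** m

def alpha_inv(u, rho=0.5):
    # Generalized inverse: the smallest m with alpha(m) <= u.
    m = 0
    while alpha(m, rho) > u:
        m += 1
    return m

print(alpha_inv(0.3))  # alpha(2) = 0.25 <= 0.3, so alpha_inv(0.3) = 2
print(alpha_inv(1.0))  # alpha(0) = 1 <= 1, so alpha_inv(1.0) = 0
```

For geometric decay, $\alpha^{-1}(u) = \lceil \log(1/u)/\log(1/\rho) \rceil$ grows only logarithmically as $u \downarrow 0$, which is the kind of behaviour that makes DMR-type integral conditions easy to satisfy.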

Nested structure and a functional CLT
To state our functional central limit theorem, we need an assumption on the increasing sequence of domains $\{D_n\}$ used in the definition of $S_n$. Specifically, we require that there exist constants $K_0 > 0$ and $K_1 > 0$ such that, for any $n_1 \le n_2$,
$$\mathrm{dist}(D_{n_1}, D^c_{n_2}) \ge K_1 (n_2 - n_1)^{K_0},$$
where $D^c_n$ is the complement of $D_n$. In this case, we say that $\{D_n\}$ has a nested spatial structure. For example, in Fig. 1 we use black dots to stand for a random field, the closed solid lines are boundaries of domains, and the dots inside a boundary describe the domains introduced above. In this case, $K_0 = 1$, $K_1 = 2$ and $\mathrm{dist}(D_{n_1}, D^c_{n_2}) = 2(n_2 - n_1)$; i.e., as shown in Fig. 1, $D_1 \subset D_2$ and $\mathrm{dist}(D_1, D^c_2) = 2$. Regarding the strongly-mixing coefficient in a structured random field, the coefficient between $D_{n_1}$ and $D^c_{n_2}$ can be specified by $\alpha_{|D_{n_1}|, |D^c_{n_2}|}\bigl(\mathrm{dist}(D_{n_1}, D^c_{n_2})\bigr)$, and by the nested structure it satisfies the corresponding bound with $m = K_1 (n_2 - n_1)^{K_0}$. If we introduce a random field as shown in Fig. 1, and define $D_1$ as a 3-by-3 grid, $D_2$ a 6-by-6 grid, $\cdots$, $D_n$ a $(2 + n^2)$-by-$(2 + n^2)$ grid, $\cdots$, then we have $|D_n| = (2 + n^2)^2$ and $\mathrm{dist}(D_{n-1}, D^c_n) \ge n$ for all $n \ge 1$. Furthermore, if we assume $|D^c_n| = \infty$, the definition of the strongly-mixing coefficient implies that the coefficient between $D_{n-1}$ and $D^c_n$ is bounded by $\alpha_{|D_{n-1}|, \infty}(n)$. Practically, a random field with nested structure arises in many scenarios, such as soil chemical concentrations in earth science (Christakos 1992), colorectal cancer in health science (Jha et al. 2021), the mobile robotic sensors of artificial-intelligence products (Nguyen et al. 2021), and so on. However, the nested structure assumes that all objects in the random field can be observed at any time, i.e. if an object is observable in a smaller domain, it must remain observable in every wider domain. This assumption may not suit some short-lifespan objects, such as the mayflies studied in Sroka et al. (2021).
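The nested-domain bookkeeping can be checked by brute force. The sketch below is ours: it uses centered max-norm balls of radius $n^2$, a variant of the concentric-grid example (the paper's even-sided grids cannot be centered on $\mathbb{Z}^2$), and verifies nesting together with the growing separation $\mathrm{dist}(D_{n-1}, D^c_n) \ge n$:

```python
def domain(n):
    # Hypothetical nested domains: centered squares D_n = {i : ||i||_inf <= n^2},
    # so |D_n| = (2 n^2 + 1)^2.
    r = n * n
    return {(x, y) for x in range(-r, r + 1) for y in range(-r, r + 1)}

def dist(A, B):
    # Max-norm (Chebyshev) distance between two finite point sets.
    return min(max(abs(ax - bx), abs(ay - by)) for ax, ay in A for bx, by in B)

for n in (2, 3):
    D_prev, D_n = domain(n - 1), domain(n)
    # Complement of D_n, restricted to a box just large enough to contain
    # the nearest complement points.
    r = n * n + 2
    box = {(x, y) for x in range(-r, r + 1) for y in range(-r, r + 1)}
    comp = box - D_n
    assert D_prev < D_n                 # nesting: D_{n-1} is a proper subset
    assert dist(D_prev, comp) >= n      # separation grows with n
    print(n, len(D_n), dist(D_prev, comp))
```

Here the separation is $n^2 + 1 - (n-1)^2 = 2n$, so the example satisfies the nested-structure condition with $K_0 = 1$ and $K_1 = 2$.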

Theorem 3.3 Let the zero-mean random field $X_{i \in \mathbb{Z}^d}$ satisfy (H1), (H2) and (H3) in Theorem 2.1. We also assume:
(1) The sequence of domains $D_n \subset \mathbb{Z}^d$ has a nested spatial structure, and the separation condition holds for any $n_1 < n_2$; (2) the limit $F(t) = \lim_{n \to \infty} |D_{[nt]}|/|D_n|$ exists for each fixed $t \in [0, 1]$; (3) the limit $\lim_{n \to \infty} \sigma_n^2/|D_n|$ exists, and is positive; (4) the condition on the quantile functions holds, with $\alpha^{-1}_4(u)$ defined as in (6).
In Theorem 3.3, Assumption (1) uses the relationship between $\mathrm{dist}(D_{n_1}, D^c_{n_2})$ and $D_{n_1}$. This contrasts with, e.g., Bradley (1993), where the distance $m$ is not related to the size of the domains. The nested domains provide a connection between $\mathrm{dist}(D_{n-1}, D^c_n)$ and $|D_{n-1}|$, which makes it easier to check the dependence conditions asymptotically. Therefore, in Theorem 3.3 we introduce Assumption (1) rather than assuming $\alpha_{\infty,\infty}(m) \to 0$ as $m \to \infty$.
Assumption (2) of Theorem 3.3 reflects that when the cardinality grows like $|D_n| = O(n^d)$, the limit function is simply $F(t) = t^d$. Therefore, in the proof of Theorem 3.3, it directly contributes to verifying the third assumption of Theorem 3.1.
Assumption (3) of Theorem 3.3 is a stronger version of condition (H4) of Theorem 2.1. A sufficient condition in the case $d = 1$ is that the random field be stationary and the mixing coefficients decay sufficiently fast. For more details we refer to Lemma 1.1 in Rio (2017). The result of Theorem 18.2.3 in Ibragimov (1971) is also related to this sufficient condition for Assumption (3). A similar result for strongly-mixing coefficients is Lemma 2.2 in Merlevède and Peligrad (2000). However, when $d \ge 2$ we have been unable to find such sufficient conditions for Assumption (3), so one is provided in Sect. 4, where the domains are assumed to be of the form $D_n = ([0, n] \times [0, n]) \cap \mathbb{Z}^2$; the result there can be extended to higher-dimensional random fields.
Inspired by the DMR condition used in Corollary 1.2 and Theorem 4.3 of Rio (2017), we introduce Assumption (4) for Theorem 3.3. Because the random field in this paper is not stationary, we cannot obtain a single integral as in the DMR condition; we therefore use the supremum to unify the expression. Additionally, since the random fields in this paper are higher-dimensional, we have to introduce higher orders for the inverse function and the quantile function in Assumption (4). The reason a higher order is needed can be found in the proof of Theorem 3.2.
Compared with (H1) and (H3), Assumption (4) uses information about distances and the quantiles of the random variables, whereas (H1) and (H3) use distances and the strongly-mixing coefficients. Assumption (4) places a more direct restriction on the moments of the random variables. However, even when the fourth moment is bounded, with $\delta = 2$ in (H3), Assumption (4) may fail, because the distance term $[\alpha^X_4]^{-1}(u)$ could be unbounded. On the other hand, the bounded product in Assumption (4) does not imply any of the statements in (H1) or (H3). Therefore, in the present theorem we use these relatively weaker assumptions rather than combining them into stronger conditions.

Proof of Lemma 2.2
We proceed by induction. For $r = 1$, the result (2) holds. We suppose it holds for $r = p$; following this, we need to prove that it is still true for $r = p + 1$.
Now, by using the monotonicity properties of the strongly-mixing coefficients, we obtain the required estimates. This completes the proof.
Then the tail function of $Z$ is as above. Now, for any random variables $X_1$ and $X_2$, we have

Proof of Theorem 3.2
Let $\{S_1, S_2\}$ be a partition of the distinct points $\{i, j, k, \ell\}$, $i, j, k, \ell \in \mathbb{Z}^d$, so that $S_1$ and $S_2$ are two non-empty sets. We define
$$M_{ijk\ell} = \max_{\{S_1, S_2\}} \mathrm{dist}(S_1, S_2), \qquad (8)$$
the maximum separation over all such partitions.

Lemma 4.1 Each of $j, k, \ell$ lies within $3 M_{ijk\ell}$ of $i$. (9)

Proof We suppose the contrary of (9). Then there are two cases, (a) and (b).
(a) Firstly, suppose $i, j, k, \ell$ are distinct; then there are three cases to consider. (a.1) All of $j, k, \ell$ are more than $3M_{ijk\ell}$ from $i$. But this is impossible, as the split $S_1 = \{i\}$ and $S_2 = \{j, k, \ell\}$ would then have a separation greater than $M_{ijk\ell}$.
(a.2) Two of $j, k, \ell$ are more than $3M_{ijk\ell}$ from $i$, say $k$ and $\ell$. Then we must have $\mathrm{dist}(i, j) \le M_{ijk\ell}$, or else the separation between $S_1 = \{i\}$ and $S_2 = \{j, k, \ell\}$ would be more than $M_{ijk\ell}$. Hence $\mathrm{dist}(j, k) \ge \mathrm{dist}(i, k) - \mathrm{dist}(i, j) \ge 2M_{ijk\ell}$, and by the same argument $\mathrm{dist}(j, \ell) \ge 2M_{ijk\ell}$. Thus, the separation between $S_1 = \{i, j\}$ and $S_2 = \{k, \ell\}$ is more than $M_{ijk\ell}$. This contradicts the definition of $M_{ijk\ell}$.
(a.3) One of $j, k, \ell$ is more than $3M_{ijk\ell}$ from $i$, say $\ell$: that is, $\mathrm{dist}(i, \ell) > 3M_{ijk\ell}$, $\mathrm{dist}(i, j) < M_{ijk\ell}$ and $\mathrm{dist}(i, k) < M_{ijk\ell}$. Then, by the same argument, we have $\mathrm{dist}(j, \ell) > 2M_{ijk\ell}$ and $\mathrm{dist}(k, \ell) > 2M_{ijk\ell}$, so the separation between $S_1 = \{i, j, k\}$ and $S_2 = \{\ell\}$ is more than $M_{ijk\ell}$. This contradicts the definition of $M_{ijk\ell}$.
Thus all these alternatives are impossible, so $j, k, \ell$ are all within $3M_{ijk\ell}$ of $i$. (b) Secondly, suppose $i, j, k, \ell$ are not distinct. The following cases can occur. (b.1) If only three of $i, j, k, \ell$ are distinct, say $i = j$, then in (8) we can interpret $S_1$ and $S_2$ as a partition of $\{i, k, \ell\}$. In this case, by a similar argument, the distance between $i$ and each of the other two distinct points must be less than or equal to $2M_{ijk\ell}$.
(b.2) If the points are equal in pairs, then the distance between the two distinct points must equal $M_{ijk\ell}$.
(b.3) If they are all equal, the statement becomes trivial. So, in every case, the conclusion of Lemma 4.1 holds.
Returning to the proof of Theorem 3.2, we write
$$E\Bigl(\sum_{i \in D} X_i\Bigr)^{4} = \sum_{i, j, k, \ell \in D} E(X_i X_j X_k X_\ell).$$
Consider partitioning $\{i, j, k, \ell\}$ into two non-empty subsets, $S_1$ and $S_2$, and let $M_{ijk\ell}$ be defined as in (8).
Case 1: Suppose the maximum distance $M_{ijk\ell}$ occurs at a partition $\{S_1, S_2\}$ in which $S_1$ contains two elements and $S_2$ the other two. Without loss of generality, set $S_1 = \{i, j\}$ and $S_2 = \{k, \ell\}$. Then, using the inequality $abcd \le \frac{1}{4}(a^4 + b^4 + c^4 + d^4)$ for $a, b, c, d \in \mathbb{R}$, Lemmas 2.3 and 2.5 imply the corresponding bound on $E(X_i X_j X_k X_\ell)$. Case 2: Suppose $M_{ijk\ell}$ occurs at a partition in which $S_1$ contains one element and $S_2$ the other three. Without loss of generality, set $S_1 = \{i\}$ and $S_2 = \{j, k, \ell\}$. Then, similarly, we obtain a bound of the same form. The above two estimates of $E(X_i X_j X_k X_\ell)$ also apply when $i, j, k, \ell$ are not distinct. We note that the sum over $i, j, k, \ell \in D$ can be controlled by Lemma 4.1 and (8), and then by using (4) and (5). Therefore, the discussion above implies that the upper bound of this fourth moment is not greater than the sum of the contributions from these two cases. This leads to the stated inequality.
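The elementary inequality used above, $abcd \le \frac{1}{4}(a^4 + b^4 + c^4 + d^4)$, is the AM-GM inequality applied to $a^4, b^4, c^4, d^4$. A quick numerical spot-check (ours, for illustration only):

```python
import random

random.seed(0)
for _ in range(10_000):
    a, b, c, d = (random.uniform(-5.0, 5.0) for _ in range(4))
    # AM-GM on the four fourth powers gives |abcd| <= (a^4+b^4+c^4+d^4)/4,
    # and abcd <= |abcd| handles the sign.
    assert a * b * c * d <= (a**4 + b**4 + c**4 + d**4) / 4 + 1e-9
print("inequality verified on 10,000 random tuples")
```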

Proof of Theorem 3.3
We first introduce a lemma in preparation for the proof.
Lemma 4.2 Let $X_n$ and $Y_n$ be two random sequences such that $X_n - Y_n$ and $Y_n$ are asymptotically independent. Also assume that $X_n$ has a non-zero characteristic function.
Proof Let $\varphi_Z$ denote the characteristic function of a random variable $Z$. Now the asymptotic independence implies that $\varphi_{X_n} - \varphi_{X_n - Y_n}\, \varphi_{Y_n}$ converges to zero. Since the characteristic function of $Y$ is non-zero, this proves the lemma.
for all $n$ and $i = 2, \cdots, r$. Then Assumption (1) implies that, for all $i = 1, \cdots, r - 1$, where $H_i \in \mathcal{B}^d$, the $d$-dimensional Borel sets, we have $E_i \in \mathcal{M}_i$. Lemma 2.2 guarantees the corresponding estimate. Since the strongly-mixing condition implies $\alpha^X_{k,\ell}(m) \to 0$ as $m \to \infty$, the right-hand side of the above inequality goes to zero as $n \to \infty$. Hence $G_n(t)$ has asymptotically independent increments. By using Theorem 4.5.3 in Chung (2001), the fact that $E(|G_n(t)|) \le M < \infty$, where $M$ is a constant, and the fact that, for all $\varepsilon > 0$, there exists $\lambda(\varepsilon, M)$ such that for any $E \in \sigma(G_n(t))$ with $P(E) < \lambda(\varepsilon, M)$ we have $\int_E |G_n(t)|\, dP < \varepsilon$, it follows that $G_n(t)$ is uniformly integrable. Then, referring to the arguments in the proof of Theorem 19.2 in Billingsley (1968) and in the proof of Theorem 4.3 in Rio (2017), Theorem 8.2 in Billingsley (1968) implies that $G_n(t)$ is tight. Therefore $G(t)$ has independent increments. Its variance is $F(t)$, which is proved by (13) in Part two.
Furthermore, the covariance matrix of $G(t)$ is positive definite, since the covariance function of $G(t)$ is that of the Gaussian process $(B_{F(t)})_{0 \le t \le 1}$, where $(B_t)_{0 \le t \le 1}$ is a standard Brownian motion. Therefore, the covariance function of $G(t)$ is $C(s, t) = \min\{F(s), F(t)\}$.
Part two: checking the conditions in Theorem 3.1. We first check the three conditions in Theorem 3.1. For a finite positive integer $k$, reals $0 = t_0 < t_1 < \cdots < t_{k-1} < t_k = 1$, and sufficiently large $n$, to prove convergence of the finite-dimensional distributions it is enough to show that (a) $G_n(t) \to N(0, F(t))$, and (b) the limiting characteristic function factorizes as in (12). For (a), by Theorem 2.1 and Assumptions (2) and (3), we have
$$G_n(t) = \frac{S_{[nt]}}{\sigma_{[nt]}} \cdot \frac{\sigma_{[nt]}}{\sqrt{|D_{[nt]}|}} \cdot \frac{\sqrt{|D_{[nt]}|}}{\sigma_n} \xrightarrow{d} N(0, F(t)).$$
For (b), as in the proof of Theorem 4.3 in Rio (2017), we consider characteristic functions. Let $s = (s_1, s_2, \cdots, s_k) \in \mathbb{R}^k$. Since $G(t)$ has independent increments, the increments $(G(t_i) - G(t_{i-1}))_{1 \le i \le k}$ are independent. Therefore, to prove (12), it is sufficient to prove the factorization of the limiting characteristic function. We write the difference as the sum of the terms (I)-(IV). For term (IV), by recalling that $\varepsilon = n^{-1/2}$ and using the continuity property of characteristic functions, we have (IV) $\to 0$ as $n \to \infty$.
For term (I), using the inequality $|e^{ix} - e^{iy}| \le |x - y|$, together with (H3) and Assumption (2), there is a constant $C$ bounding the resulting difference. Therefore, for all $s$, we have $|\varphi_n(s) - \varphi_{n,\varepsilon}(s)| \to 0$ as $n \to \infty$. For (II), the random variables $Y_j$ are measurable with respect to the corresponding sigma-fields. We set $d_n = K_3 [n\varepsilon]^{K_0}$; then $d_n \to \infty$ as $n \to \infty$. Furthermore, since $|Y_j| = |Y_1 \cdots Y_k| = 1$ and $Q_{|1|}(u) = 1$, Lemma 2.3 implies the required bound on (14). For term (III), it is sufficient to prove asymptotic independence for $t_2 > t_1$. Since Theorem 2.1 implies $G_n(t) \to G(t)$ for any fixed $t \in [0, 1]$, by Lemma 4.2 we only need to show that $G_n(t_2) - G_n(t_1)$ and $G_n(t_1)$ are asymptotically independent, i.e. that $\varphi_1(s) - \varphi_2(s)\varphi_3(s) \to 0$, where $\varphi_1(s)$, $\varphi_2(s)$ and $\varphi_3(s)$ are the characteristic functions of $G_n(t_2)$, $G_n(t_2) - G_n(t_1)$ and $G_n(t_1)$, respectively. Let $\varphi_*(s)$ be the characteristic function of $G_n(t_2) - G_n(t_1) + G_n(t_1 - \varepsilon)$, and $\varphi_\varepsilon(s)$ the characteristic function of $G_n(t_1 - \varepsilon)$. Then we decompose the difference into the terms (i)-(iii). For terms (i) and (iii), we use the same arguments as in the proof of (I). For (ii), arguing as above yields the required convergence. Hence, the first condition of Theorem 3.1 is satisfied.
To check the second condition of Theorem 3.1, we show that $G(t)$ has continuous sample paths with probability one. For $t > s$, there is a constant $K$ bounding $E(G(t) - G(s))^2$, where the last inequality follows from the fact that $\log x < \sqrt{x}$ for $x > 1$. Hence, by Theorem 3.4.1 in Adler (2010), $G(t)$ has continuous sample paths almost surely.
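The elementary fact $\log x < \sqrt{x}$ for $x > 1$, used in the last step, can be verified directly (our check, not in the paper): the function $f(x) = \sqrt{x} - \log x$ is positive at its minimum.

```latex
% f decreases on (1, 4) and increases on (4, \infty), so its minimum over
% x > 1 is attained at x = 4 and is positive:
f(x) = \sqrt{x} - \log x, \qquad
f'(x) = \frac{1}{2\sqrt{x}} - \frac{1}{x} = \frac{\sqrt{x} - 2}{2x}, \qquad
\min_{x > 1} f(x) = f(4) = 2 - \log 4 \approx 0.61 > 0,
```

hence $\log x < \sqrt{x}$ for all $x > 1$.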
Finally, we check the third condition of Theorem 3.1. We set $\gamma = 2$ and $\alpha = 1$; then, for $t_1 < t < t_2$, Theorem 3.2 and Assumption (4) imply the required moment bound. Therefore, the third condition of Theorem 3.1 is satisfied. This completes the proof.

Then
$$\mathrm{Cov}\Bigl(\sum_{i \in D_n \setminus D_{n-1}} X_i,\ \sum_{i \in D_{n-1}} X_i\Bigr) \quad \text{and} \quad \mathrm{Var}\Bigl(\sum_{i \in D_n \setminus D_{n-1}} X_i\Bigr)$$
can be expanded in terms of $\mathrm{Cov}(X_{\ell, j}, X_{0,n})$. But assumption (15) implies that the series $\sum_{j=0}^{n-1} \sum_{\ell=0}^{n-1} \mathrm{Cov}(X_{\ell, j}, X_{0,n})$ converges absolutely, since, by using Lemma 2.1, there exists a constant $K$ bounding its terms. Therefore, (18) converges to a limit.

The contribution to $v_n/n$ is
$$\mathrm{Var}(X_{0,0}) + 2 \sum^{n-1} \cdots$$
The same arguments imply the absolute convergence of the last two terms above, and apply to the terms in (16) and (17), so that $v_n/n$ converges to a value $v$. Put $w_n = v_n/n$. Now
$$\frac{\mathrm{Var}(S_n)}{|D_n|} = \frac{1}{n^2}(v_1 + \cdots + v_n) = \frac{1}{n}(w_1 + \cdots + w_n),$$
which also converges to $v$.