1 Introduction

In the past few years, much interest has been paid to the study of random fields over graphs, driven by the growing need for both theoretical and practical results for data indexed by graphs. On the one hand, the definition of graphical models by Darroch et al. [9] fostered new interest in Markov fields, and many tools have been developed in this direction (see, for instance, [25, 26]). On the other hand, the industrial demand linked to graphical problems has risen with the advent of new technologies. In particular, the Internet and social networks provide a huge field of applications, but biology, economics, geography and image analysis also benefit from models taking a graph structure into account. For a general review of inference on graphs we refer, for instance, to [15, 16] and references therein.

The analysis of road traffic is at the root of this work. Indeed, road traffic prediction deals with forecasting vehicle speeds, which may be seen as a spatial random field over the traffic network. Some work has been done without taking into account the particular graph structure of the speed process (see for example [11, 18] for related statistical issues). In this paper, we build a new model for Gaussian random fields over graphs and study statistical properties of such stochastic processes.

A random field over a graph is a spatial process indexed by the vertices of a graph, namely \((X_i)_{i \in G}\), where \(G\) is a given graph. Many models already exist in the probabilistic literature, ranging from geostatistical processes and Markov fields to autoregressive processes. Graphical models are defined as Markov fields (see for instance [15]) with a particular dependency structure, built by specifying a dependency between \(X_i\) and \(X_j\), conditionally on the other variables, as soon as the locations \(i \in G\) and \(j \in G\) are connected. For graphical models, we refer for instance to [9] and references therein. Generalizations to the graph case also exist for some processes, for instance autoregressive models on \(\mathbb Z ^d\) (see [15]). Finally, geostatistical processes are defined by directly modeling the correlation between observations at nodes \(i\) and \(j\) through a graph-based distance. We refer to [6] and references therein.

The usual purpose of graphical models is to design an underlying graph which reflects the dependency of the data and to use it for statistical inference. Indeed, this methodology aims at building a graph of conditional correlations which helps in understanding the relationships within highly complex data (for instance for biological purposes or inference in social networks). When the graph is known, the purpose is to estimate the dependency structure; see for instance [23].

Our approach differs since, in our case, the graph is known, and we aim at designing a large class of random processes enjoying a stationarity property. These processes will serve as models for velocity fields of vehicles, whose correlations depend only on the local structure of the network. This stationarity assumption is commonly accepted among road traffic professionals, who refer to it as road capacity. Moreover, these processes must be easy to handle in order to be used at a large scale.

In this paper, we extend some classical results from time series to spatial fields over general graphs and provide a new framework to define a class of stationary Gaussian processes on graphs. For this, we make use of spectral analysis and extend to our framework some classical results of time series. In particular, the notion of spectral density may be extended to graphs. This enables us to construct a maximum likelihood estimate for parametric models of spectral densities, and also leads to an extension of Whittle's approximation (see [2, 13]). Actually, many extensions of this approximation have been obtained, even in non-stationary cases (see [8, 12, 21]). The extension studied here concerns general processes over graphs. We point out that, throughout the paper, we compare our new framework with the case \(G= \mathbb Z ^d, d \ge 1\).

Section 2 is devoted to some definitions for spectral analysis on graphs. Then we provide in Sect. 3 a general construction of stationary processes indexed by a graph. These models depend on parameters for which we provide estimators in Sect. 4. Some simulations are given in Sect. 5. The last section, Sect. 6, provides all the tools needed to prove the main theorems; in particular, Szegö's Lemmas for graphs are given in Sect. 6.1, while the proofs of the technical Lemmas are postponed to Sect. 6.3.

2 Definitions for spectral analysis on graphs

In the whole paper, we will consider a Gaussian spatial process \((X_i)_{i \in G}\) indexed by the vertices of an infinite undirected weighted graph.

We will call \(\mathbf G =(G,W)\) this graph, where

  • \(G\) is the set of vertices. \(\mathbf G \) is said to be infinite as soon as \(G\) is infinite (but countable).

  • \(W \in [-1,1]^{G\times G}\) is the symmetric weighted adjacency operator. That is, \(W_{ij}\ne 0\) if, and only if, \(i\in G\) and \(j \in G\) are connected.

We assume that \(W\) is symmetric (\(W_{ij}=W_{ji},\; i,j\in G\)) since we deal only with undirected graphs. For any vertex \(i \in G\), a vertex \(j \in G\) is said to be a neighbor of \(i\) if, and only if, \(W_{ij} \ne 0\). The degree \(\text{ deg }(i)\) of \(i\) is the number of neighbors of the vertex \(i\), and the degree of the graph \(\mathbf G \) is defined as the maximum degree of the vertices of the graph \(\mathbf G \):

$$\begin{aligned} \text{ deg }(\mathbf G ) := \max _{i \in G} \text{ deg }(i). \end{aligned}$$

From now on, we assume that the degree of the graph \(\mathbf G \) is bounded:

$$\begin{aligned} \text{ deg }(\mathbf G )< + \infty . \end{aligned}$$

Assume now that \(W\) is renormalized: its entries belong to \([-\frac{1}{\text{ deg }(\mathbf G )},\frac{1}{\text{ deg }(\mathbf G )}]\). This is not restrictive, since renormalizing the adjacency operator does not change the objects introduced later. In particular, the spectral representation of a Hilbertian operator is not sensitive to a renormalization.

Notice that in the classical case \(G=\mathbb Z \), the renormalized adjacency operator is

$$\begin{aligned} W^{(\mathbb Z )}_{ij}=\frac{1}{2}1\!\!1_{\{|i-j|=1\}}, (i,j\in \mathbb Z ). \end{aligned}$$
(1)

Here, \(\text{ deg }(\mathbb Z ) = 2\). This case will be used throughout the paper as an illustrative example.

We denote by \(B_G\) the set of all bounded Hilbertian operators on \(l^2(G)\) (the set of square summable real sequences indexed by \(G\)).

To introduce the spectral decomposition, consider the action of the adjacency operator on \(l^2(G)\) as

$$\begin{aligned} \forall u \in l^2(G), (Wu)_i := \sum _{j \in G} W_{ij} u_j, (i\in G). \end{aligned}$$

The operator space \(B_G\) will be endowed with the classical operator norm

$$\begin{aligned} \forall A \in B_G, \left\| A \right\| _{2,op}: = \sup _{u \in l^2(G), \left\| u\right\| _2 \le 1} \left\| Au \right\| _2 , \end{aligned}$$

where \(\left\| . \right\| _2\) stands for the usual norm on \(l^2(G)\).

Notice that, as the degree of \(\mathbf G \) and the entries of \(W\) are both bounded, \(W\) lies in \(B_{G}\), and we have

$$\begin{aligned} \left\| W \right\| _{2,op} \le 1. \end{aligned}$$
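As a quick numerical sanity check of this bound (a toy random graph of our own choosing, not taken from the paper), one can verify that dividing the 0/1 adjacency matrix by the maximal degree forces the operator norm below 1:

```python
import numpy as np

# Toy illustration (our own sketch): after dividing the 0/1 adjacency
# matrix by deg(G), the operator norm of W is at most 1, since each row
# then has at most deg(G) entries, each of modulus at most 1/deg(G).
rng = np.random.default_rng(0)
n = 30
A = (rng.random((n, n)) < 0.1).astype(float)
A = np.triu(A, 1)
A = A + A.T                                  # random undirected graph
deg = max(int(A.sum(axis=1).max()), 1)       # deg(G): maximal degree
W = A / deg                                  # renormalized adjacency
assert np.linalg.norm(W, 2) <= 1 + 1e-12     # spectral norm bound
```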

Recall that for any bounded Hilbertian operator \(A \in B_G\), the spectrum \(\text{ Sp }(A)\) is defined as the set of all complex numbers \(\lambda \) such that \(\lambda \text{ Id }- A\) is not invertible (here \(\text{ Id }\) stands for the identity on \(l^2(G)\)). Since \(W\) is bounded and symmetric, \(\text{ Sp }(W)\) is a non-empty compact subset of \(\mathbb R \) [22].

We now aim at providing a spectral representation of any bounded normal Hilbertian operator. For this, first recall the definition of a resolution of identity (see for example [22]):

Definition 2.1

Let \(\mathcal{M }\) be a \(\sigma \)-algebra over a set \(\Omega \). We call identity resolution (on \(\mathcal{M }\)) a map

$$\begin{aligned} E : \mathcal{M } \rightarrow B_G \end{aligned}$$

such that,

  1. \(E(\emptyset {}) = 0, E(\Omega )= I\).

  2. For any \(\omega \in \mathcal{M }\), the operator \(E(\omega )\) is a projection operator.

  3. For any \(\omega ,\omega ^{\prime } \in \mathcal{M }\), we have

    $$\begin{aligned} E(\omega \cap \omega ^{\prime }) =E(\omega )E(\omega ^{\prime })=E(\omega ^{\prime })E(\omega ). \end{aligned}$$

  4. For any \(\omega ,\omega ^{\prime } \in \mathcal{M }\) such that \(\omega \cap \omega ^{\prime } = \emptyset \), we have

    $$\begin{aligned} E(\omega \cup \omega ^{\prime }) = E(\omega )+E(\omega ^{\prime }). \end{aligned}$$

We can now recall the fundamental decomposition theorem (see for example [22])

Theorem 2.1

(Spectral decomposition) If \(A \in B_G\) is symmetric, then there exists a unique identity resolution \(E\) over the Borel subsets of \(\text{ Sp }(A)\), such that

$$\begin{aligned} A = \int \limits _{\mathrm{{Sp}}(A)} \lambda \mathrm{d } E (\lambda ). \end{aligned}$$

From the last theorem, we obtain the spectral representation of the adjacency operator \(W\) through an identity resolution \(E\) over the Borel subsets of \(\text{ Sp }(W)\):

$$\begin{aligned} W = \int \limits _{\text{ Sp }(W)} \lambda \mathrm{d } E (\lambda ). \end{aligned}$$

Obviously, we have

$$\begin{aligned} W^k = \int \limits _{\mathrm{{Sp}}(W)} \lambda ^k \mathrm{d } E (\lambda ), k \in \mathbb N . \end{aligned}$$

Define now, for any \(i \in G\), the sequences \(\delta _i\) in \(l^2(G)\) by

$$\begin{aligned} \delta _i := (1\!\!1_{k = i})_{k \in G}. \end{aligned}$$

For any \(i,j \in G\), the sequences \(\delta _i\) and \(\delta _j\) define the real measure \(\mu _{ij}\) by

$$\begin{aligned} \forall \omega \subset \text{ Sp }(W), \mu _{ij}(\omega ) : = \langle E(\omega )\delta _i,\delta _j\rangle _{l^2(G)}. \end{aligned}$$

Hence, we can write:

$$\begin{aligned} \forall k \in \mathbb N , \forall i,j \in G, \left( W^k\right) _{ij} = \int \limits _{\text{ Sp }(W)} \lambda ^k \mathrm{d }\mu _{ij}. \end{aligned}$$

This family of measures \((\mu _{ij})_{i,j \in G}\) will be used throughout the paper. They convey both spectral information on the adjacency operator and combinatorial information on the number of paths and loops in \(\mathbf G \). Indeed, the quantity \(\left( W^k\right) _{ij}\) is the number of paths (counted with their weights) going from \(i\) to \(j\) with length \(k\). Note also that all diagonal measures \(\mu _{ii}, i \in G\), are probability measures.
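On a finite graph these measures become discrete and are easy to compute from an eigendecomposition. The following minimal sketch (a toy 4-vertex path graph, our choice, not from the paper) checks the identity \(\left( W^k\right)_{ij} = \int \lambda ^k \mathrm{d }\mu _{ij}\) and the fact that each \(\mu_{ii}\) is a probability measure:

```python
import numpy as np

# Toy 4-vertex path graph with the renormalization of the paper
# (deg = 2, so edge weights are 1/2).
n = 4
W = 0.5 * (np.eye(n, k=1) + np.eye(n, k=-1))

# Finite-dimensional spectral decomposition W = V diag(lam) V^T:
# mu_ij puts mass V[i,m]*V[j,m] at the eigenvalue lam[m].
lam, V = np.linalg.eigh(W)

# Check (W^k)_ij = int lambda^k d mu_ij = sum_m lam_m^k V[i,m] V[j,m].
k = 3
Wk = np.linalg.matrix_power(W, k)
assert np.allclose(Wk, (V * lam**k) @ V.T)

# Each diagonal measure mu_ii is a probability measure: masses sum to 1.
assert np.allclose((V**2).sum(axis=1), 1.0)
```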

In the usual case of \(\mathbb Z \), an explicit expression for \(\mu _{ij}\) can be given. Denote by \(T_{k}(X)\) the \(k\)th Chebychev polynomial (\(k \in \mathbb N \)). We can provide the spectral decomposition of \(W^{(\mathbb Z )}\) (defined in Eq. 1):

$$\begin{aligned} \forall i,j \in \mathbb Z , \left( \left( W^{(\mathbb Z )}\right) ^k\right) _{ij} = \frac{1}{\pi } \int \limits _{[-1,1]} \lambda ^k \frac{T_{\left| j-i\right| }(\lambda )}{\sqrt{1- \lambda ^2}} \mathrm{d }\lambda . \end{aligned}$$

We point out that the spectrum of \(W^{(\mathbb Z )}\) is \([-1,1].\) This shows that, in this case, and for any \(i,j \in \mathbb Z \), the measure \(\mathrm{d }\mu _{ij}\) is absolutely continuous with respect to the Lebesgue measure, and its density is given by

$$\begin{aligned} \frac{\mathrm{d }\mu _{ij}}{\mathrm{d }\lambda } = \frac{1}{\pi }\frac{T_{\left| j-i\right| }(\lambda )}{\sqrt{1- \lambda ^2}}. \end{aligned}$$
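This density can be checked numerically (our own sketch): for \(k+d\) even, the weighted number of length-\(k\) walks from \(0\) to \(d\) on \(\mathbb Z \) is \(\binom{k}{(k+d)/2}/2^k\), and Gauss–Chebyshev quadrature evaluates the integral against \(\mathrm{d}\lambda /(\pi \sqrt{1-\lambda^2})\) exactly for polynomial integrands:

```python
import numpy as np
from math import comb

# Check (W^k)_{0d} = (1/pi) * int_{-1}^{1} lam^k T_d(lam)/sqrt(1-lam^2) dlam.
k, d = 6, 2

# Left-hand side: weighted walk count on Z with edge weight 1/2.
lhs = comb(k, (k + d) // 2) / 2**k if (k + d) % 2 == 0 else 0.0

# Right-hand side via Gauss-Chebyshev quadrature (exact for polynomial
# integrands of degree < 2m): nodes cos((2j-1)pi/(2m)), weights pi/m.
m = 32
nodes = np.cos((2 * np.arange(1, m + 1) - 1) * np.pi / (2 * m))
T_d = np.cos(d * np.arccos(nodes))        # Chebychev polynomial T_d
rhs = np.mean(nodes**k * T_d)             # (1/pi)*(pi/m)*sum = mean

assert np.isclose(lhs, rhs)
```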

Notice that we recover the usual spectral decomposition: \(\mu _{ij}\) is the pushforward, by the cosine function, of the measure \(\hat{\mu }_{ij}\) defined by

$$\begin{aligned} \forall i,j \in \mathbb Z , \mathrm{d }\hat{\mu }_{ij}(t) := \frac{1}{2\pi } \cos \left( (j-i)t \right) \mathrm{d }t. \end{aligned}$$

We get

$$\begin{aligned} \forall i,j \in \mathbb Z , \left( \left( W^{(\mathbb Z )}\right) ^k\right) _{ij} = \int \limits _{[0,2 \pi ]} \cos (t)^k \mathrm{d }\hat{\mu }_{ij}(t). \end{aligned}$$

3 General definition of stationary processes on graphs

3.1 Spectral representation of time series

Our aim is to study a class of stationary processes indexed by the vertices \(G\) of the graph \(\mathbf G \). To begin with, let us recall the usual case of \(\mathbb Z \), and in particular introduce the Toeplitz operators associated to stationary time series.

Let \(\mathbf X = (X_i)_{i \in \mathbb Z }\) be a strongly stationary Gaussian process indexed by \(\mathbb Z \). Since \(\mathbf X \) is Gaussian, strong stationarity is equivalent to second order stationarity, that is, \(\forall i,k \in \mathbb Z , \text{ Cov }(X_i, X_{i+k}) \) does not depend on \(i\). Thus, we can define

$$\begin{aligned} r_k := \text{ Cov }(X_i, X_{i+k}). \end{aligned}$$

Assume further that \((r_k)_{k \in \mathbb Z } \in l^1(\mathbb Z )\). This leads to a particular form of the covariance operator \(\Gamma \) defined on \(l^2(\mathbb Z )\) by

$$\begin{aligned} \forall i,j \in \mathbb Z , \quad \Gamma _{ij} := r_{i-j}. \end{aligned}$$

Recall that \(B_\mathbb{Z }\) denotes here the set of bounded Hilbertian operators on \(l^2(\mathbb Z )\). Notice that, since \((r_k)_{k \in \mathbb Z } \in l^1(\mathbb Z )\), we have \(\Gamma \in B_\mathbb Z \) (see for instance [5] for more details). This bounded operator is constant along each diagonal, and is therefore called a Toeplitz operator (see also [4] for a general introduction to Toeplitz operators).

As \((r_k)_{k \in \mathbb Z } \in l^1(\mathbb Z )\), we have

$$\begin{aligned} \forall i,j \in \mathbb Z , \mathcal{T }(g)_{ij}:=\Gamma _{ij} = \frac{1}{2\pi }\int \limits _{[0,2 \pi ]} g(t) \cos \left( (i-j)t\right) \mathrm{d }t, \end{aligned}$$

where \(g\) is the spectral density of the process \(\mathbf X \), defined by

$$\begin{aligned} g(t) := 2\sum _{k \in \mathbb N ^*} r_k \cos (kt)+r_0. \end{aligned}$$

This expression can be written, using the Chebychev polynomials \((T_k)_{k \in \mathbb N }\),

$$\begin{aligned} g(t) := 2\sum _{k \in \mathbb N ^*} r_k T_k\left( \cos (t)\right) +r_0 T_0\left( \cos (t)\right) \! . \end{aligned}$$

Let, for \(\lambda \in [-1,1]\),

$$\begin{aligned} f(\lambda ) := 2\sum _{k \in \mathbb N ^*} r_k T_k(\lambda )+r_0 T_0(\lambda ). \end{aligned}$$
(2)

We get, using the family \((\hat{\mu }_{ij} )_{i,j \in \mathbb Z }\) defined above,

$$\begin{aligned} \forall i,j \in \mathbb Z , \quad \Gamma _{ij} = \int \limits _{[0,2 \pi ]} f\left( \cos (t)\right) \mathrm{d }\hat{\mu }_{ij}(t). \end{aligned}$$

Notice that the last expression may also be written as \(\Gamma = f(W^{(\mathbb Z )}) \); the convergence of the operator-valued series defined by Eq. 2 is ensured by the boundedness of \(W^{(\mathbb Z )}\) and of the Chebychev polynomials (\(T_k([-1,1]) \subset [-1,1], \forall k \in \mathbb N \)), together with the summability of the sequence \((r_k)_{k \in \mathbb Z }\). For \(p\le +\infty \), we will use this remark to extend the usual \(MA_p\) processes to any graph. This is the purpose of Sect. 3.2.
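As an elementary illustration of the identity \(\Gamma = f(W^{(\mathbb Z )})\) (our own sketch on a finite truncation of \(\mathbb Z \)): for an \(MA_1\)-type density \(f(\lambda ) = r_0 T_0(\lambda ) + 2 r_1 T_1(\lambda ) = r_0 + 2 r_1 \lambda \), i.e. \(r_k = 0\) for \(k \ge 2\) in Eq. 2, the operator \(f(W)\) is tridiagonal Toeplitz with exactly the prescribed covariances:

```python
import numpy as np

# Truncation of W^(Z) to n points; f(lam) = r0 + 2*r1*lam (MA_1-type).
n, r0, r1 = 6, 1.0, 0.3
W = 0.5 * (np.eye(n, k=1) + np.eye(n, k=-1))
Gamma = r0 * np.eye(n) + 2 * r1 * W           # Gamma = f(W)
assert np.allclose(np.diag(Gamma), r0)        # Cov(X_i, X_i)     = r0
assert np.allclose(np.diag(Gamma, 1), r1)     # Cov(X_i, X_{i+1}) = r1
assert np.allclose(np.diag(Gamma, 2), 0.0)    # no longer-range terms
```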

Let us recall some properties of the moving average representation \(MA_\infty \) of a process on \(\mathbb Z \). This representation exists as soon as the \(\log \) of the spectral density is integrable (see for instance [5]). In this case, there exist a sequence \((a_k)_{k \in \mathbb N }\), with \(a_0 = 1\), and a Gaussian white noise \(\mathbf \epsilon = (\epsilon _k)_{k \in \mathbb Z }\), such that the process \(\mathbf X \) may be written as

$$\begin{aligned} \forall i \in \mathbb Z , \quad X_i = \sum _{k \in \mathbb N } a_k \epsilon _{i-k}. \end{aligned}$$

Defining the function \(h\) over the unit circle \(\mathcal{C }\) by

$$\begin{aligned} \forall x \in \mathcal{C }, \quad h(x) = \sum _{k\in \mathbb N } a_k x^k, \end{aligned}$$

we recover, with a few computations, the spectral decomposition of the covariance operator \(\Gamma \) of \(\mathbf X \):

$$\begin{aligned} \forall i,j \in \mathbb Z , \quad \Gamma _{ij} = \int \limits _{[0,2 \pi ]} \left| h(e^{it})\right| ^2 \mathrm{d }\hat{\mu }_{ij}(t). \end{aligned}$$

This implies the equality

$$\begin{aligned} f\left( \cos (t)\right) = \left| h(e^{it})\right| ^2. \end{aligned}$$

Recall that when \(h\) is a polynomial of degree \(p\) (with nonzero leading coefficient), the process is said to be \(MA_p\). In this case, \(f\) is also a polynomial of degree \(p\). Conversely, if \(f\) is a real polynomial of degree \(p\), then as soon as \(f\left( \cos (t)\right) \) is even and non-negative for any \(t \in [0,2\pi ]\), the Fejér-Riesz theorem provides a factorization \(f\left( \cos (t)\right) = \left| h(e^{it})\right| ^2\) (see for instance [17]). This proves that \(\mathbf X \) is \(MA_p\) if, and only if, its covariance operator may be written \(f(W^{(\mathbb Z )})\), where \(f\) is a polynomial of degree \(p\). This remark is fundamental for the construction we provide in the following section (see Definition 3.1).
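For instance (our own \(MA_1\) sketch, not from the paper): with \(h(x) = 1 + a x\) one gets \(\left| h(e^{it})\right| ^2 = (1+a^2) + 2a\cos t\), i.e. the degree-one polynomial \(f(\lambda ) = (1+a^2) + 2a\lambda \):

```python
import numpy as np

# MA_1 instance of f(cos t) = |h(e^{it})|^2 with h(x) = 1 + a*x:
# |1 + a e^{it}|^2 = (1 + a^2) + 2a cos(t), a degree-1 polynomial in cos t.
a = 0.4
t = np.linspace(0, 2 * np.pi, 101)
lhs = np.abs(1 + a * np.exp(1j * t)) ** 2
rhs = (1 + a**2) + 2 * a * np.cos(t)
assert np.allclose(lhs, rhs)
```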

3.2 Graph analytical type process

In this section, we will define moving average and autoregressive processes over the graph \(\mathbf G \).

As explained in the last section, since \(W\) is bounded and self-adjoint, \(\text{ Sp }(W)\) is a non-empty compact subset of \(\mathbb R \), and \(W\) admits a spectral decomposition through an identity resolution \(E\), given by

$$\begin{aligned} W = \int \limits _{\text{ Sp }(W)} \lambda \mathrm{d } E (\lambda ). \end{aligned}$$

We define here \(MA\) and \(AR\) Gaussian processes, with respect to the operator \(W\), by defining the corresponding classes of covariance operators, since the covariance operator fully characterizes any Gaussian process.

Definition 3.1

(Graph Analytical Model) Let \((X_i)_{i \in G}\) be a Gaussian process, indexed by the vertices \(G\) of the graph \(\mathbf G \), and \(\Gamma \) its covariance operator.

If there exists an analytic function \(f\) defined on the convex hull of \(\text{ Sp }(W)\), such that

$$\begin{aligned} \Gamma =\int \limits _{\text{ Sp }(W)} f(\lambda ) \mathrm{d } E (\lambda ), \end{aligned}$$

we will say that \(X\) is

  • \(MA_q\) if \(f\) is a polynomial of degree \(q\).

  • \(AR_p\) if \(\frac{1}{f}\) is a polynomial of degree \(p\) which has no root in the convex hull of \(\text{ Sp }(W)\).

  • \(ARMA_{p,q}\) if \(f = \frac{P}{Q}\) with \(P\) a polynomial of degree \(p\) and \(Q\) a polynomial of degree \(q\) with no roots in the convex hull of \(\text{ Sp }(W)\).

Otherwise, we will talk about the \(MA_\infty \) representation of the process \(\mathbf X \). We call \(f\) the spectral density of the process \(\mathbf X \), and denote its corresponding covariance operator by

$$\begin{aligned} \Gamma = \mathcal{K }(f). \end{aligned}$$

Remark

Actually, this last construction may also be understood as

$$\begin{aligned} \Gamma = \mathcal{K }(f) = f(W), \end{aligned}$$

in the sense of normal convergence of the associated power series. However, the spectral representation will be useful in what follows. Even though we consider only very regular functions \(f\) in this work, the definition using the spectral representation allows weaker regularity than the one based on the normal convergence of the associated power series.

The notation \(\mathcal{K }(.)\) has to be understood by analogy with the notation \(\mathcal{T }(.)\) used for Toeplitz operators.
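To illustrate Definition 3.1 (a toy truncated-\(\mathbb Z \) sketch of our own): for an \(MA_2\) model, \(f\) is a polynomial of degree 2, so \(\mathcal{K }(f) = f(W)\) vanishes beyond graph distance 2, the finite-range property familiar from moving averages on \(\mathbb Z \):

```python
import numpy as np

# MA_2 covariance on a truncated path graph: Gamma = c0*I + c1*W + c2*W^2
# vanishes beyond graph distance 2 (since (W^k)_{ij} = 0 when d(i,j) > k).
n = 7
W = 0.5 * (np.eye(n, k=1) + np.eye(n, k=-1))       # truncated W^(Z)
c0, c1, c2 = 1.0, 0.4, 0.2                          # f(x) = c0 + c1*x + c2*x^2
Gamma = c0 * np.eye(n) + c1 * W + c2 * (W @ W)
D = np.abs(np.subtract.outer(np.arange(n), np.arange(n)))
assert np.allclose(Gamma[D > 2], 0)                 # no long-range covariance
assert np.all(np.linalg.eigvalsh(Gamma) > 0)        # f > 0 on [-1,1]: pos. def.
```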

Notice that, in the usual case of \(\mathbb Z \), and for finite order \(ARMA\), we recover the usual definition, as shown in Sect. 3.1. So the last definition may be seen as an extension of isotropic \(ARMA\) processes to any graph \(\mathbf G \). Besides, note that this extension is given by the following equivalence: for any \(g \in \mathbb L ^2\left( [0, 2\pi ]\right) \) such that \(\int _{[0, 2\pi ]} \log g(\lambda )d\lambda <+ \infty \),

$$\begin{aligned} \forall f \in \mathbb L ^2([-1,1]), \left( g = f\left( \cos (t)\right) \Leftrightarrow \mathcal{T }(g) = \mathcal K (f)\right) \!. \end{aligned}$$

This means that, in the usual case \(\mathbf G =\mathbb Z \), the definition of the spectral density in our framework is the usual one, up to the change of variables \(\lambda = \cos (t)\) (see Sect. 3.1).

We thus have a representation of moving average processes over any graph \(\mathbf G \). In Sect. 4 we give the main result of this paper, which deals with maximum likelihood identification.

3.3 Intuitions behind graph analytical type processes

We provided in the previous section a new frame to build processes indexed on graphs. Driven by our application to road trafficking, our aim was twofold:

  • First, we want to model a process on a graph which inherits the graph structure and enjoys some stationarity properties: since the behavior of the car velocity process depends mainly on the local structure of the road network, and not on its position or orientation, we would like to consider stationary and isotropic models. For a regular lattice \(\mathcal{G } = (G,W)\), a Gaussian random field \((X_i)_{i \in G}\) is said to be stationary and isotropic if its covariance operator \(K\) satisfies \(K_{i,j} = K_{\sigma (i)\sigma (j)} \) for any \(i,j\) in \(G\) and any automorphism \(\sigma \). Let us recall that an automorphism is a permutation \(\sigma \) of the vertex set \(G\) of \(\mathcal{G }\) that leaves the adjacency operator invariant: \(\forall i,j \in G, W_{ij} = W_{\sigma (i)\sigma (j)}\).

  • Second, in order to deal with a large number of observations at a large scale, we looked for a model with low computational cost. Hence, we considered processes whose spectral density lies in a parametric model with few parameters. We also want the model to provide a spectral representation of the process, so as to get an easy way to build positive definite operators.

Hereafter, we compare these processes with other models of processes on graphs that have been developed in the literature.

Random processes on graphs have been built by considering spatial correlations between observations at proximal locations; see for instance [6] for a review. For this, a structure (represented by a function \(\phi \)) is chosen for the covariance operator, such that

$$\begin{aligned} \forall i,j \in G, K_{ij} = \phi (d(i,j)). \end{aligned}$$

These models are quite close to our point of view. However, they do not provide a spectral representation in general, and conditions for positive definiteness of the operator may be difficult to obtain.

Modeling the dependency of observations on a graph is also the purpose of Gaussian Markov graphical models (see for example [23]). A Gaussian process \((X_i)_{i \in G}\) indexed by the vertices of a graph \(\mathcal{G }=(G,W)\) is such a model if the inverse of the associated covariance operator (the so-called precision operator) \(Q = \Gamma ^{-1}\) satisfies

$$\begin{aligned} Q_{ij} \ne 0 \Leftrightarrow i \sim j. \end{aligned}$$

In other words, for \(i,j\in G\), \(X_i\) and \(X_j\) are conditionally independent given \((X_k)_{ k \ne i,j}\) whenever \(i\) and \(j\) are not neighbors. Indeed, \(Q_{ij}\) is the conditional covariance between \(X_i\) and \(X_j\) given \((X_k)_{k \ne i,j}\). In our framework, we can in some cases provide an interpretation of the dependency structure that we have designed. For a Graph-\(AR_p\) process introduced in Definition 3.1, the precision operator is \(Q = P(W)\) for \(P\) a polynomial of degree \(p\). So the conditional independence of \(X_i\) and \(X_j\) given \((X_k)_{k \ne i,j}\) holds as soon as \(d(i,j) >p\). This means that the particular Graph-\(AR_p\) model may be seen as a Gaussian graphical model. The underlying graph is obtained by drawing an edge between two vertices \(i\) and \(j\) as soon as \(d(i,j) \le p\). Conversely, if the precision operator \(Q\) of a Gaussian graphical model \((X_i)_{i \in G}\) has entries lying in \([-1,1]\), then it is a Graph-\(AR_1\) process with underlying graph \((G,Q)\). In this case, the spectral density (see Definition 3.1) is given by \(x^{-1}\).
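The sparsity of the precision operator can be seen directly (our own toy path-graph sketch): for a Graph-\(AR_1\) process with \(f(x) = 1/(1-\phi x)\), the precision \(Q = I - \phi W\) vanishes beyond distance 1, while the covariance itself is dense:

```python
import numpy as np

# Graph-AR_1 on a truncated path graph: the precision Q = I - phi*W is
# tridiagonal (conditional independence beyond distance 1), whereas the
# covariance Gamma = Q^{-1} couples every pair of vertices.
n, phi = 5, 0.8
W = 0.5 * (np.eye(n, k=1) + np.eye(n, k=-1))   # truncated path graph
Q = np.eye(n) - phi * W                         # 1/f with f(x) = 1/(1-phi*x)
Gamma = np.linalg.inv(Q)
assert np.allclose(np.triu(Q, 2), 0)            # Q_ij = 0 when d(i,j) > 1
assert np.all(np.abs(Gamma) > 1e-12)            # Gamma is entrywise nonzero
```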

Note also that \(ARMA\) processes have been built on regular lattices, for example in [15]. On \(\mathbb Z \), \(MA_q\) (resp. \(AR_p\)) processes have a spectral density that is a polynomial of degree \(q\) (resp. the inverse of a polynomial of degree \(p\)) and, conversely, processes with a rational spectral density have such an autoregressive representation. This property no longer holds on \(\mathbb Z ^d\), since a process with a rational spectral density may not have an \(ARMA\) representation. Hence, we provide a construction of graph analytical processes, and the terminology introduced in Definition 3.1 can be seen as a natural extension of the simple case \(\mathbf G = \mathbb Z \). The extension is performed using the natural spectral representation and operator theory. It allows us both to build general parametric models for Gaussian processes indexed by a graph and to give sharp results on their asymptotic statistical properties.

4 Convergence of maximum approximated likelihood estimators

4.1 Parametric maximum likelihood estimation of the density of a graph ARMA process

The aim of this section is parametric inference for the processes introduced in Definition 3.1. For this, we generalize the Whittle method, which is a standard tool in the time series paradigm.

The data is observed on \((\mathbf G _n)_{ n \in \mathbb N }\), a growing sequence of finite nested subgraphs. This means that if \(\mathbf G _n = (G_n,W_n)\), we have \(G_n \subset G_{n+1} \subset G\) and that for any \(i,j \in G_n\), it holds that \(W_n(i,j) = W(i,j)\). Let \(m_n = \text{ Card }(G_n)\). The sequence \((m_n)_{n \in \mathbb N }\) may be seen as the “volume” of the graph \(\mathbf G _n\).

We assume that the density of the process belongs to a parametric family of densities indexed by a parameter \( \theta \in \Theta \), a compact interval of \(\mathbb R \). We point out that, although for the sake of simplicity we choose a one-dimensional parameter space \(\Theta \), all the results could be easily extended to the case \(\Theta \subset \mathbb R ^k, k \ge 1\). Define \(\mathcal{F }\) as the set of positive analytic functions over the convex hull of \(\text{ Sp }(W)\).

In this framework, we consider a parametric family of functions \((f_\theta )_{ \theta \in \Theta }\) in \(\mathcal{F }\). They define a parametric set of covariances on \(G\) (see Sect. 3.2) by considering

$$\begin{aligned} \mathcal{K }(f_\theta ) = f_\theta (W). \end{aligned}$$

Let \(\theta _0 \in \mathring{\Theta }\) be such that the process \(\mathbf X \) is a centered Gaussian process over \(\mathbf G \) with covariance operator \(\mathcal{K }(f_{\theta _0})\) (see Sect. 3.2). Our aim is to compute an estimator of \(\theta _0\) based on the observations \(X_n = (\mathbf X _i)_{i\in G_n}\).

In this frame, we get that

$$\begin{aligned} X_n \sim \mathcal{N }\left( 0,\mathcal{K }_n(f_{\theta _0}) \right) , \end{aligned}$$

where \(\mathcal{K }_n(f_{\theta _0})\) is the covariance matrix of the vector \(X_n\). In the case \(G=\mathbb Z \), \((X_i)_{i \in \mathbb Z }\) is a Gaussian time series with spectral density \(f_{\theta _0}\). If we observe \(\mathbf X _n:= (X_i)_{i = 1, \ldots , n}, n>0\), we can define the maximum log-likelihood estimate \(\hat{\theta }_n\) of \(\theta _0\) by

$$\begin{aligned} \hat{\theta }_n := \arg \max L_n(\theta ,\mathbf X _n), \end{aligned}$$

where

$$\begin{aligned} L_n (\theta ,\mathbf X _n) :=-\frac{1}{2} \left( n \log (2 \pi ) + \log \det \left( \mathcal{T }_{n}(f_\theta )\right) + \mathbf X _{n }^T \big (\mathcal{T }_{n}(f_\theta )\big )^{-1}\mathbf X _{n} \right) . \end{aligned}$$

Here, \(\mathcal{T }_{n}(f_\theta )\) denotes the Toeplitz matrix associated with the function \(f_\theta \). This estimator is consistent as soon as the spectral densities are regular enough, and under assumptions on the function \(\theta \mapsto f_\theta \) (see for instance [2]). However, in practical situations, it is hard to compute. For this reason, one considers instead Whittle's estimate, obtained by maximizing an approximation of the likelihood rather than the likelihood itself:

$$\begin{aligned} \tilde{\theta }_n := \arg \max \tilde{L}_n(\theta ,\mathbf X _n), \end{aligned}$$

where

$$\begin{aligned} \tilde{ L}_n (\theta ,\mathbf X _n) :=-\frac{1}{2} \left( n \log (2 \pi ) +\frac{n}{2\pi } \int \limits _{[0,2\pi ]} \log \left( f_\theta (\lambda ) \right) \mathrm{d }\lambda + \mathbf X _{n }^T \mathcal{T }_{n}\left( \frac{1}{f_\theta }\right) \mathbf X _{n} \right) . \end{aligned}$$

The Whittle estimate is also consistent, asymptotically normal and efficient, as soon as the spectral densities are regular enough.
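To see why the first Whittle substitution is reasonable, here is a numerical sketch of ours for \(\mathbb Z \) with an \(MA_1\) density: by Szegö's first theorem, \(\frac{1}{n}\log \det \mathcal{T }_n(g) \rightarrow \frac{1}{2\pi }\int _0^{2\pi } \log g(t) \mathrm{d }t\), which vanishes for \(g(t) = |1 + a e^{it}|^2\) with \(|a|<1\):

```python
import numpy as np

# MA_1 density g(t) = |1 + a e^{it}|^2: covariances r_0 = 1+a^2, r_1 = a.
# Szego: (1/n) log det T_n(g) -> (1/2pi) int log g dt = 0 for |a| < 1.
a, n = 0.5, 400
r = np.zeros(n)
r[0], r[1] = 1 + a**2, a
Tn = np.array([[r[abs(i - j)] for j in range(n)] for i in range(n)])
sign, logdet = np.linalg.slogdet(Tn)
assert sign > 0
assert abs(logdet / n) < 0.01        # already close to the limit 0
```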

In this paper, we generalize this methodology to graph ARMA processes. The corresponding log-likelihood at \(\theta \) is

$$\begin{aligned} L_n (\theta ) :=-\frac{1}{2} \left( m_n \log (2 \pi ) + \log \det \left( \mathcal{K }_{n}(f_\theta )\right) + X_{n }^T \big (\mathcal{K }_{n}(f_\theta )\big )^{-1}X_{n} \right) . \end{aligned}$$

Then we consider the following two approximations of the log-likelihood, which consist first in replacing

$$\begin{aligned} \frac{1}{m_n}\log \det \left( \mathcal{K }_{n}(f_\theta )\right) \quad \mathrm{by} \quad \int \log \left( f_\theta \left( x\right) \right) \mathrm{d }\mu (x) ; \end{aligned}$$

second in replacing \(\left( \mathcal{K }_{n}(f_\theta )\right) ^{-1}\) by \( \mathcal{K }_{n}\left( \frac{1}{f_\theta }\right) \). This gives rise to the corresponding functions:

$$\begin{aligned} \bar{L}_n (\theta )&:= -\frac{1}{2}\left( m_n \log (2 \pi ) + m_n \int \log (f_\theta (x)) \mathrm{d }\mu (x) + X_{n }^T \left( \mathcal{K }_{n}(f_\theta )\right) ^{-1}X_{n} \right) .\\ \tilde{L}_n (\theta )&:= -\frac{1}{2}\left( m_n \log (2 \pi ) + m_n \int \log (f_\theta (x)) \mathrm{d }\mu (x) + X_{n }^T \left( \mathcal{K }_{n}\left( \frac{1}{f_\theta }\right) \right) X_{n} \right) . \end{aligned}$$
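To build intuition for the second substitution (our own toy sketch, not a claim from the paper): when the whole (finite) graph is observed, \(\mathcal{K }_n(1/f_\theta )\) coincides exactly with \(\left( \mathcal{K }_n(f_\theta )\right) ^{-1}\), since \(f(W)^{-1} = (1/f)(W)\); the approximation error comes from observing only a subgraph of an infinite graph:

```python
import numpy as np

# On a finite 4-cycle observed entirely, K(1/f) equals K(f)^{-1} exactly.
W = 0.5 * np.array([[0, 1, 0, 1],
                    [1, 0, 1, 0],
                    [0, 1, 0, 1],
                    [1, 0, 1, 0]])                 # renormalized 4-cycle
lam, V = np.linalg.eigh(W)
f = lambda x: 1.0 / (1.0 - 0.8 * x)                # AR_1-type density
K = (V * f(lam)) @ V.T                             # K(f)   = f(W)
K_inv_f = (V * (1.0 / f(lam))) @ V.T               # K(1/f) = (1/f)(W)
assert np.allclose(np.linalg.inv(K), K_inv_f)
```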

In Sect. 4.2, we prove the consistency of the estimators \((\hat{\theta }_n)_{n \in \mathbb N }\), \((\bar{\theta }_n)_{n \in \mathbb N }\), \((\tilde{\theta }_n)_{n \in \mathbb N }\), defined as the maximum log-likelihood estimators, maximizer of respectively \(L_n (\theta ) ,\,\bar{L}_n (\theta )\), \(\tilde{L}_n (\theta )\).

Notice that approximated maximum likelihood estimators are not asymptotically normal in general (see for instance [14] for \(\mathbb Z ^d\)). Indeed, the score associated with the approximated \(\log \)-likelihood has to be asymptotically unbiased at an adequate rate [2]. A solution to this problem on \(\mathbb Z ^d\) is to use the tapered periodogram (see [7, 14, 15]).

We provide in the following a generalization to the graph processes for the following two cases.

  • The \(MA_P\) case: There exists \(P>0\) such that the true spectral density \(f_{\theta _0}\) is a polynomial of degree bounded by \(P\).

  • The \(AR_P\) case: There exists \(P>0\) such that all the spectral densities (for any \(\theta \in \Theta \)) of the parametric set are such that \(\frac{1}{f_\theta }\) is a polynomial of degree bounded by \(P\).

To achieve a good approximated \(\log \)-likelihood, we first introduce the unbiased periodogram in each of these cases. Now, let \(P>0\).

Define a subset \(V_P\) of signed measures on \(\mathbb R \) as

$$\begin{aligned} V_P: = \left\{ \mu _{ij}, i,j \in G , d_\mathbf{G }(i,j) \le P \right\} \!, \end{aligned}$$

where \(d_\mathbf{G }(i,j), i,j \in G\) stands for the usual distance on the graph \(\mathbf G \), i.e. the length of the shortest path going from \(i\) to \(j\).

Define also the matrix \(B^{(n)}\) (the dependency on \(P\) is omitted, for clarity) by

$$\begin{aligned} B^{(n)}_{ij}&:= \frac{\text{ Card }\left\{ (k,l) \in G_n\times G, \mu _{kl}= \mu _{ij} \right\} }{ \text{ Card }\left\{ (k,l) \in G_n\times G_n, \mu _{kl}= \mu _{ij} \right\} },\quad \text{ if } d_\mathbf{G }(i,j)\le P\\&:= 1,\quad \text{ if }\,\, d_\mathbf{G }(i,j)> P. \end{aligned}$$

The matrix \(B^{(n)}\) provides a boundary correction: for any \(v\in V_P\), it compares the number of couples of vertices in \(G_n \times G\) with local measure \(v\) to the number of couples in \(G_n \times G_n\) with local measure \(v\). Actually, this way of dealing with the edge effect is very similar to the one used for \(\mathbf G =\mathbb Z ^d\) (see [7, 14]).

Now we can define the unbiased periodogram as \(X_n^T \mathcal{Q }_n(\frac{1}{f}) X_n, \) where \(\mathcal{Q }_n(f) := B^{(n)}\odot \mathcal{K }_n(f).\) Here, the operation \(\odot \) denotes the Hadamard product for matrices, that is

$$\begin{aligned} \forall i,j \in G_n, \left( B^{(n)}\odot \mathcal{K }_n(f)\right) _{ij} = \left( B^{(n)} \right) _{ij}{\mathcal{K }_n(f)}_{ij}. \end{aligned}$$
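As a purely illustrative numpy sketch (all matrices below are placeholders, not the operators of the model), the unbiased periodogram is simply a quadratic form in the Hadamard product \(B^{(n)}\odot \mathcal{K }_n(1/f)\):

```python
import numpy as np

def unbiased_periodogram(X, B, K_inv_f):
    """X^T (B o K_n(1/f)) X, where o is the entrywise (Hadamard) product."""
    Q = B * K_inv_f          # Hadamard product: (B o K)_{ij} = B_{ij} K_{ij}
    return X @ Q @ X

# toy example: B is the all-ones matrix, so Q reduces to K_n(1/f)
n = 4
K = np.eye(n)
B = np.ones((n, n))
X = np.ones(n)
```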

Notice that this is actually a way to extend the so-called tapered periodogram (see for instance [14]). Finally, we can define the unbiased empirical log-likelihood by, for any \(\theta \in \Theta \),

$$\begin{aligned} L^{(u)}_n (\theta ) := -\frac{1}{2}\left( m_n \log (2 \pi ) + m_n \int \log (f_\theta (x)) \mathrm{d }\mu (x) + X_{n }^T \left( \mathcal{Q }_{n}\left( \frac{1}{f_\theta }\right) \right) X_{n} \right) . \end{aligned}$$

We will finally consider \(\hat{\theta }^{(u)}_n\), the maximum likelihood estimator associated with \(L^{(u)}_n(\theta )\), which will be proved to be an efficient estimator of the true parameter \(\theta _0\).

4.2 Main result: convergence and asymptotic optimality

Consider the following assumptions:

Assumption 4.1

Set \(\delta _n = \text{ Card }\left\{ i \in G_n, \exists j \in G \backslash G_n, W_{ij} \ne 0 \right\} \!.\) Assume that

$$\begin{aligned} \delta _n = o(m_n). \end{aligned}$$

Assumption 4.2

Define

$$\begin{aligned} \alpha (f) := \sum _{k \in \mathbb N } \left| f_k \right| (k+1), \end{aligned}$$
(3)

where, for \(f \in \mathcal{F }\), the following expansion holds: \( f(x) = \sum _k f_k x^k \left( x \in \text{ Sp }(W)\right) .\) Let \(\rho >0\). We make the following assumptions:

  • The map \(\theta \rightarrow f_\theta \) is injective.

  • For any \( \lambda \in \text{ Sp }(W)\), the map \(\theta \rightarrow f_\theta (\lambda )\) is continuous.

  • \(\forall \theta \in \Theta , f_\theta \in \mathcal{F }_\rho = \left\{ f \in \mathcal{F } ,\alpha (\log (f)) \le \rho \right\} \).
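As an illustration of the regularity factor \(\alpha \), consider the hypothetical spectral density \(f(x) = 1/(1-\theta x)\) with \(|\theta |<1\), whose coefficients are \(f_k = \theta ^k\); then \(\alpha (f) = \sum _k (k+1)\theta ^k = (1-\theta )^{-2}\). A truncated numerical check in Python:

```python
def alpha(coeffs):
    """Regularity factor alpha(f) = sum_k |f_k| (k+1), as in Eq. (3),
    for a density given by its (truncated) list of coefficients."""
    return sum(abs(fk) * (k + 1) for k, fk in enumerate(coeffs))

theta = 0.5
coeffs = [theta**k for k in range(200)]  # f_k = theta^k, truncated
# closed form: (1 - theta)^(-2) = 4 for theta = 1/2
```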

Assumption 4.3

There exists a positive measure \(\mu \), such that

$$\begin{aligned} \frac{1}{m_n} \sum _{i \in G_n} \mu _{ii} \underset{n \rightarrow \infty }{\underrightarrow{\mathcal{D }}} \mu . \end{aligned}$$

Here, \(\mathcal{D }\) stands for the convergence in distribution.

Assumption 4.4

The set \(V_P\) of possible local measures over \(G\) is finite, and \(n\) is large enough to ensure that

$$\begin{aligned} \forall v \in V_P, \exists (i,j) \in G_n^2, \mu _{ij} = v. \end{aligned}$$

Assumption 4.5

There exists a positive sequence \((u_n)_{n \in \mathbb N }\) such that,

$$\begin{aligned} u_n \underset{n \rightarrow \infty }{\rightarrow } 0, \end{aligned}$$

and

$$\begin{aligned} \sup _{ij} \left| B^{(n)}_{ij}-1\right| \le u_n. \end{aligned}$$

Assumption 4.6

Assume that

  • There exists a positive sequence \((v_n)_{n\in \mathbb N } \) such that \(v_n = o(\frac{1}{\sqrt{m_n}})\) and

    $$\begin{aligned} \forall f \in \mathcal{F }_{2\rho }, \left| \frac{1}{m_n} \text{ Tr }(\mathcal{K }_{G_n}(f)) - \int f \mathrm{d }\mu \right| \le \alpha (f) v_n. \end{aligned}$$
  • For any \(\theta \in \Theta \), \(f_\theta \) is twice differentiable on \(\Theta \) and

    $$\begin{aligned} \frac{\mathrm{d }}{\mathrm{d }\theta }f_\theta \in \mathcal{F }_\rho ,\quad \frac{\mathrm{d }^2}{\mathrm{d }\theta ^2}f_\theta \in \mathcal{F }_\rho . \end{aligned}$$

We can now state one of our main results:

Theorem 4.1

Under Assumptions 4.1, 4.2 and 4.3, the sequences \((\hat{\theta }_n)_{n \in \mathbb N }\), \((\bar{\theta }_n)_{n \in \mathbb N }\), \((\tilde{\theta }_n)_{n \in \mathbb N }\) converge, as \(n\) goes to infinity, \(P_{f_{\theta _0}}\)-a.s. to the true value \(\theta _0\). If moreover Assumption 4.5 holds, this is also true for \((\hat{\theta }^{(u)}_n)_{n \in \mathbb N }\).

Proof

The proof follows the guidelines of [2]; we highlight the main changes performed here. Let us denote by \(\mathbb P _{f_{\theta _0}}\) the probability distribution of the process. First, we define the Kullback information on \(G_n\) of \(f_{\theta _0}\) with respect to \(f \in \mathcal{F }_\rho \), by

$$\begin{aligned} {\mathbb{IK }}_n(f_{\theta _0},f) := \mathbb{E }_{\mathbb{P }_{f_{\theta _0}}}\left[ L_n(\theta _0,X_n)-L_n(\theta ,X_n)\right] \!, \end{aligned}$$

and the asymptotic Kullback information (on \(\mathbf G \)) by

$$\begin{aligned} {\mathbb{IK }}(f_{\theta _0},f) = \lim _n \frac{1}{m_n} {\mathbb{IK }}_n(f_{\theta _0},f) \end{aligned}$$

whenever it is finite.

The convergence of the approximated maximum likelihood estimators is a direct consequence of the following lemmas:

Lemma 4.1

For any \(f\in \mathcal{F }_\rho \), and under Assumptions 4.1, 4.2 and 4.3, the asymptotic Kullback information exists and may be written as

$$\begin{aligned} {\mathbb{IK }}(f_{\theta _0},f) = \frac{1}{2}\int \left( -\log \left( \frac{f_{\theta _0}}{f}\right) - 1 + \frac{f_{\theta _0}}{f}\right) \mathrm{d } \mu . \end{aligned}$$

Furthermore, if we set \(l_n(\theta ,X_n) = \frac{1}{m_n}L_n(\theta ,X_n)\), we have that \(P_{f_{\theta _0}}\)-a.s.,

$$\begin{aligned} l_n(\theta _0,X_n) - l_n( \theta ,X_n) \underset{n \rightarrow \infty }{\rightarrow } \mathbb{IK }(f_{\theta _0},f_\theta ) \end{aligned}$$

uniformly in \(\theta \in \Theta \). This property also holds for \(\bar{l}_n := \frac{1}{m_n}\bar{L}_n\) and \(\tilde{l}_n:= \frac{1}{m_n}\tilde{L}_n\). Furthermore, for \(P>0\), and in both the \(AR_P\) and \(MA_P\) cases (see above), this also holds for \(l_n^{(u)} := \frac{1}{m_n}L^{(u)}_n\).

Lemma 4.2

Let \(f_{\theta _0}\) be the true spectral density, and \((\ell _n)_{n \in \mathbb N }\) be a deterministic sequence of continuous functions such that

$$\begin{aligned} \forall \theta \in \Theta , \quad \ell _n(\theta _0) - \ell _n(\theta ) \underset{n \rightarrow \infty }{\rightarrow } \mathbb{IK }(f_{\theta _0},f_\theta ) \end{aligned}$$

uniformly as \(n\) tends to infinity. Then, if \(\theta _n = \arg \max _\theta \ell _n(\theta )\), we have

$$\begin{aligned} \theta _n \underset{n \rightarrow \infty }{\rightarrow } \theta _0. \end{aligned}$$

The proofs of these lemmas are postponed to the Appendix (Sect. 6.2).\(\square \)

The second main result of the paper provides the asymptotic behaviour of the unbiased estimator \(\hat{\theta }^{(u)}_n \).

Theorem 4.2

In both the \(AR_P\) and \(MA_P\) cases, and under Assumptions 4.1, 4.2, 4.3, 4.4, 4.5 and 4.6, the estimator \(\hat{\theta }^{(u)}_n\) of \(\theta _0\) is asymptotically normal:

$$\begin{aligned} \sqrt{m_n}(\hat{\theta }^{(u)}_n - \theta _0 ) \underset{n \rightarrow \infty }{ \underrightarrow{\mathcal{D }} } \mathcal{N }\left( 0,\left( \frac{1}{2}\int \left( \frac{f^{\prime }_{\theta _0}}{f_{\theta _0}}\right) ^2\mathrm{d }\mu \right) ^{-1}\right) . \end{aligned}$$

Furthermore, the Fisher information of the model is

$$\begin{aligned} J(\theta _0):= \frac{1}{2}\int \left( \frac{f^{\prime }_{\theta _0}}{f_{\theta _0}}\right) ^2\mathrm{d }\mu . \end{aligned}$$

Hence, this estimator is asymptotically efficient.

To build the estimator \(\hat{\theta }_n^{(u)}\), stronger assumptions on the graph \(\mathbf G \) are needed; this is the price to pay to obtain its asymptotic distribution. With such results, we are able to estimate the parameters of a process indexed by a graph and to use the model to generate new data. A practical application is data completion over a graph with missing data. Note that we are strongly convinced that Theorem 4.2 may be applied to the \(\mathbb Z ^d\) case with holes, provided the holes remain few enough. Indeed, Assumption 4.1 is required, so the boundary of the subgraphs (counting the holes) has to be small compared to the volume of these subgraphs. The holes must be independent of the data, but their distribution must satisfy a vanishing condition with respect to the total number of observed edges. Extending our results to this case and using the procedure to predict road traffic velocities will be the subject of future work.

Proof

Here again, we mimic the usual proof by extending the result of [2] to the graph case.

Using a Taylor expansion, we get

$$\begin{aligned} (l_n^{(u)})^{\prime }(\theta _0) = (l_n^{(u)})^{\prime }(\hat{\theta }_n^{(u)}) + (\theta _0 - \hat{\theta }_n^{(u)})(l_n^{(u)})^{\prime \prime }(\breve{\theta }_n) , \end{aligned}$$

where \(\breve{\theta }_n \in ]\hat{\theta }^{(u)}_n, \theta _0[.\) As \(\hat{\theta }_n^{(u)} = \arg \max l_n^{(u)}\), we have

$$\begin{aligned} (l_n^{(u)})^{\prime }(\hat{\theta }_n^{(u)}) = 0. \end{aligned}$$

So that,

$$\begin{aligned} \sqrt{m_n}(\theta _0 - \hat{\theta }_n^{(u)}) = \left( (l_n^{(u)})^{\prime \prime }(\breve{\theta }_n)\right) ^{-1}\sqrt{m_n}(l_n^{(u)})^{\prime }(\theta _0). \end{aligned}$$

The end of the proof relies on three lemmas:

Lemma 4.3 provides the asymptotic normality of \(\sqrt{m_n}(l_n^{(u)})^{\prime }(\theta _0) \). Combined with Lemma 4.4, it yields the asymptotic normality of \(\sqrt{m_n}(\theta _0 - \hat{ \theta }_n^{(u)})\). Finally, Lemma 4.5 gives the Fisher information.

Lemma 4.3

$$\begin{aligned} \sqrt{m_n}(l_n^{(u)})^{\prime }(\theta _0) \underset{n \rightarrow \infty }{\underrightarrow{\mathcal{D }}} \mathcal{N }\left( 0,\frac{1}{2}\int \left( \frac{f^{\prime }_{\theta _0}}{f_{\theta _0}}\right) ^2\mathrm{d }\mu \right) . \end{aligned}$$

Lemma 4.4

$$\begin{aligned} \left( (l_n^{(u)})^{\prime \prime }(\breve{\theta }_n)\right) ^{-1} \underset{n\rightarrow \infty }{\rightarrow } 2\left( \int \left( \frac{f^{\prime }_{\theta _0}}{f_{\theta _0}}\right) ^2\mathrm{d }\mu \right) ^{-1},\quad P_{f_{\theta _0}}-\text{ a.s. } \end{aligned}$$

Lemma 4.5

The asymptotic Fisher information is:

$$\begin{aligned} J(\theta _0) = \frac{1}{2}\int \left( \frac{f^{\prime }_{\theta _0}}{f_{\theta _0}}\right) ^2\mathrm{d }\mu . \end{aligned}$$

The proofs of these lemmas are postponed to the Appendix (Sect. 6.3).\(\square \)

4.3 Comments on assumptions

  • Assumption 4.1 deals with the dimension of the graph. Indeed, recall that \(\delta _n\) is the size of the boundary of \(G_n\). In the special case \(G = \mathbb Z ^d\) and \(G_n = [-n,n]^d\), we get \(m_n = (2n+1)^d\) and \(\delta _n = 2d(2n+1)^{d-1}\). Hence, the ratio \(\frac{\delta _n}{m_n}\) is a natural quantity associated with the expansion of the graph, which also appears in isoperimetric [20] and graph-expander problems. Assumption 4.1 is a non-expansion criterion stating that this ratio goes to \(0\) as the size of the graph goes to infinity. The graph has to be amenable; this is satisfied for the previous example \(G = \mathbb Z ^d\) with \(G_n = [-n,n]^d\), but not for a homogeneous tree, whatever the choice of the sequence of subgraphs \((\mathbf G _n)_{n \in \mathbb N }\).
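The computation above is easy to check numerically; the following sketch evaluates \(\delta _n / m_n = 2d/(2n+1)\) for \(G = \mathbb Z ^d\) and \(G_n = [-n,n]^d\):

```python
def boundary_ratio(n, d):
    """delta_n / m_n for G = Z^d with G_n = [-n, n]^d."""
    m_n = (2 * n + 1) ** d                    # number of vertices of G_n
    delta_n = 2 * d * (2 * n + 1) ** (d - 1)  # size of the boundary of G_n
    return delta_n / m_n                      # equals 2d / (2n + 1)
```

The ratio vanishes as \(n \rightarrow \infty \) for every fixed \(d\), which is exactly the non-expansion criterion of Assumption 4.1.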

  • Assumption 4.2 is a usual assumption which ensures the model to be identifiable. Note that the definition of the regularity factor \(\alpha \) is very close to the one used in [2].

  • The limit measure \(\mu \) that appears in Assumption 4.3 is classically called the spectral measure of \(\mathbf G \) with respect to the sequence of subgraphs \((\mathbf G _n)_{n \in \mathbb N }\) (see [19] for example). Actually, under Assumption 4.1, Assumption 4.3 is equivalent to the convergence of the empirical distribution of eigenvalues of \(W_{G_n}\) (here, \(W_{G_n}\) denotes the restriction of \(W\) to the subgraph \(G_n\)), as shown in the following lemma, whose proof is given in Sect. 6.3.

Lemma 4.6

Let \(\lambda ^{(n)}_1,\ldots , \lambda ^{(n)}_{m_n}\) be the eigenvalues (written with their multiplicity orders) of \(W_{G_n}\). Define

$$\begin{aligned} \mu ^{[1]}_n := \frac{1}{m_n} \sum _{i = 1}^{m_n} \delta _{\lambda ^{(n)}_i}, \end{aligned}$$

and

$$\begin{aligned} \mu ^{[2]}_n = \frac{1}{m_n} \sum _{i \in G_n} \mu _{ii}. \end{aligned}$$

Then, under Assumption 4.1, the convergence of \(\mu ^{[1]}_n\) to \(\mu \) and the convergence of \(\mu ^{[2]}_n\) to \(\mu \) are equivalent.
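For illustration, \(\mu ^{[1]}_n\) can be computed directly from the eigenvalues of \(W_{G_n}\); the sketch below uses a cycle graph with normalized adjacency operator as a toy example (not a graph considered in the paper), whose spectrum is known to be \(\{\cos (2\pi k/m_n)\}\):

```python
import numpy as np

def empirical_spectrum(W_n):
    """Eigenvalues of the symmetric restricted operator W_{G_n};
    mu_n^[1] is the uniform distribution on these values."""
    return np.linalg.eigvalsh(W_n)

# normalized adjacency of a cycle on m vertices (each entry 1/2)
m = 8
W = np.zeros((m, m))
for i in range(m):
    W[i, (i + 1) % m] = W[i, (i - 1) % m] = 0.5
lam = empirical_spectrum(W)
```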

Note also that Assumption 4.3 holds as soon as there is some homogeneity in the graph. The simplest example is a quasi-transitive graph. Indeed, take for instance a finite graph (the pattern) and reproduce it at each vertex of an infinite (amenable) vertex-transitive graph. The resulting graph is then quasi-transitive, and all the previous assumptions hold. Note that if \(\mathbf G \) is “close” to being quasi-transitive, Assumption 4.3 is still true. We could also adapt notions of unimodularity [1] or stationarity [3] to our framework and prove the existence of a spectral measure.

  • Now, let us discuss Assumption 4.4. This assumption is quite strong; it holds for instance for quasi-transitive graphs (i.e. such that the quotient of the graph by its automorphism group is finite). Relaxing this assumption can be achieved with very technical modifications, which fall outside the scope of this paper. Yet, we can clarify the meaning of the operator \(B^{(n)}\) through an example. Let us now describe the case \(G = \mathbb Z ^2\), for \(P=2\). In this case \(W^{(\mathbb Z ^2)}\) is

    $$\begin{aligned} \forall i,j,k,l \in \mathbb Z , W^{(\mathbb Z ^2)}\left( (i,j),(k,l)\right) := \frac{1}{4} 1\!\!1_{\left| i-k\right| +\left| j-l\right| = 1}. \end{aligned}$$

    In this example, we set \(G_n = [1,n]^2\), and we can compute the matrix \(B^{(n)}\). Indeed, it remains to notice that

    $$\begin{aligned} \mu _{(i_1,j_1),(i_1+k,j_1+l)} \!=\! \mu _{(i_2,j_2),(i_2 + \epsilon _1 k,j_2 + \epsilon _2 l)}, i_1,i_2,j_1,j_2,k,l \in \mathbb Z , \epsilon _1, \epsilon _2 \in \left\{ -1,1\right\} \!. \end{aligned}$$

    This means that the local measure of a couple of vertices depends only on their relative positions (stationarity and isotropy of this set of measures). So, we only need to count the configurations given in Fig. 1, since we consider only couples of vertices \(u,v \in \mathbb Z ^2\) such that \(d_\mathbb{Z ^2}(u,v) \le 2\). We get, for any \(i,j \in \mathbb Z \),

    • \(B^{(n)}_{(i,j),(i,j)} = \frac{n^2}{n^2}= 1.\)

    • \(B^{(n)}_{(i,j),(i,j \pm 1)} = B^{(n)}_{(i,j),(i \pm 1,j)} = \frac{4n^2}{4n(n-1)}. \)

    • \(B^{(n)}_{(i,j),(i \pm 1,j \pm 1)}= \frac{4n^2}{4(n-1)^2}. \)

    • \(B^{(n)}_{(i,j),(i,j\pm 2)}= B^{(n)}_{(i,j),(i \pm 2,j)}= \frac{4n^2}{4n(n-2)}. \)

    One can notice that

    $$\begin{aligned} \sup _{ij} \left| B^{(n)}_{ij}-1\right| \underset{n \rightarrow \infty }{\rightarrow } 0. \end{aligned}$$

    Assumption 4.5 just ensures that this property holds for the graph we consider.
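The closed-form entries of \(B^{(n)}\) listed above can be verified by brute-force counting; the following sketch (function and variable names are ours) counts, for a class of offsets, the couples with first vertex in \(G_n = [1,n]^2\) and second vertex anywhere in \(\mathbb Z ^2\), versus the couples lying entirely in \(G_n\):

```python
def b_entry(n, offsets):
    """Ratio Card{(k,l) in G_n x G} / Card{(k,l) in G_n x G_n} for the
    class of couples whose relative position belongs to `offsets`."""
    in_gn = lambda x, y: 1 <= x <= n and 1 <= y <= n
    num = den = 0
    for x in range(1, n + 1):
        for y in range(1, n + 1):
            for dx, dy in offsets:
                num += 1                      # second vertex is always in G = Z^2
                den += in_gn(x + dx, y + dy)  # second vertex must stay in G_n
    return num / den

n = 5
nearest = [(0, 1), (0, -1), (1, 0), (-1, 0)]          # distance-1 couples
diagonal = [(1, 1), (1, -1), (-1, 1), (-1, -1)]       # diagonal couples
```

For \(n=5\), the counts reproduce \(\frac{4n^2}{4n(n-1)} = \frac{5}{4}\) for nearest neighbours and \(\frac{4n^2}{4(n-1)^2} = \frac{25}{16}\) for diagonal couples.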

  • Finally, Assumption 4.6 contains two points. The first one means that the convergence of the empirical distribution of eigenvalues of \(\mathcal{K }_{G_n}(f)\) to the spectral measure \(\mu \) is faster than \(\frac{1}{\sqrt{m_n}}\). It holds for instance for quasi-transitive graphs, with a suitable sequence of subgraphs. The second one concerns regularity assumptions for the spectral density of the process. Such smoothness assumptions are required, for instance, in the case \(\mathbf G = \mathbb Z \) (see [2]).

Fig. 1 Possible configurations for couples of vertices

5 Simulations

In this section, we present some simulations in a very simple case, where the graph \(G\) is built from rhombi, each connected to the next by a single edge on the left and on the right (see Fig. 2).

Fig. 2 Graph \(G\)

The sequence of nested subgraphs chosen here is the growing-neighborhood sequence (we choose a vertex \(x\) and take \(G_n = \left\{ y \in G, d_\mathbf{G }(x,y)\le n\right\} \)). We study an \(\text{ AR }_2\) model, where

$$\begin{aligned} \Theta&= \left] -1,1 \right[,\\ f_\theta (x)&= \left( \frac{1}{1-\theta x}\right) ^2 ( \theta \in \Theta ). \end{aligned}$$

Here, we take for \(W\) the adjacency operator of \(G\) normalized in order to get \(\sup _{i,j \in G} W_{ij} \le \frac{1}{\text{ deg }(G)}\). We choose \(\theta _0 = \frac{1}{2}\), \(m_n = 724\). We approximate the spectral measure of \(G\) by the spectral measure of a very large graph (around \(10000\) vertices) built in the same way. Figure 3 shows the empirical spectrum of the graph \(G\) with respect to the sequence of subgraphs \((G_n)_{n \in \mathbb N }\).

Fig. 3 Empirical spectrum
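A minimal sketch of the sampling step, assuming that the covariance of \(X_n\) is approximated by \(f_{\theta _0}(W_n)\), with \(W_n\) the restriction of \(W\) to \(G_n\); for simplicity, a cycle graph with normalized adjacency operator stands in for the rhombus graph of Fig. 2:

```python
import numpy as np

def sample_field(W_n, theta, rng):
    """Draw X ~ N(0, f_theta(W_n)) with f_theta(x) = (1 - theta*x)**(-2),
    via the spectral decomposition of the symmetric operator W_n."""
    lam, U = np.linalg.eigh(W_n)
    f = (1.0 - theta * lam) ** (-2)  # spectral density evaluated at eigenvalues
    return U @ (np.sqrt(f) * rng.standard_normal(len(lam)))

# normalized adjacency of a cycle: eigenvalues lie in [-1, 1], so |theta| < 1 is safe
m = 100
W = np.zeros((m, m))
for i in range(m):
    W[i, (i + 1) % m] = W[i, (i - 1) % m] = 0.5
rng = np.random.default_rng(1)
X = sample_field(W, 0.5, rng)
```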

To compute \(\left( \mathcal K _n(f_\theta )\right) ^{-1}\), we use the power series representation of \(f_\theta \), truncated after the first \(15\) coefficients. This choice ensures that the simulation errors are negligible with respect to the theoretical ones. Figure 4 gives the empirical distribution of

$$\begin{aligned} \sqrt{m_n} \sqrt{\int _{\text{ Sp }(A)} \left( \frac{f^{\prime }_\theta }{f_\theta }\right) ^2}\left( \tilde{\theta }_n-\theta _0\right) . \end{aligned}$$
Fig. 4 Empirical distribution
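The truncation mentioned above relies on the expansion \(f_\theta (x) = (1-\theta x)^{-2} = \sum _{k \ge 0} (k+1)\theta ^k x^k\). A sketch of the truncated evaluation of \(f_\theta (W)\) on a small symmetric toy operator (via a Horner scheme), compared with the exact value:

```python
import numpy as np

def f_theta_truncated(W, theta, K=15):
    """Truncated evaluation of f_theta(W) = (I - theta*W)^{-2} using
    f_theta(x) = sum_{k>=0} (k+1) theta^k x^k, cut after K terms (Horner)."""
    m = W.shape[0]
    S = K * theta ** (K - 1) * np.eye(m)      # start from the top coefficient
    for k in range(K - 2, -1, -1):
        S = (k + 1) * theta**k * np.eye(m) + W @ S
    return S

# sanity check on a small symmetric W with small spectral radius
W = np.array([[0.0, 0.2], [0.2, 0.0]])
theta = 0.5
approx = f_theta_truncated(W, theta)
exact = np.linalg.matrix_power(np.linalg.inv(np.eye(2) - theta * W), 2)
```

Since the spectral radius of \(\theta W\) is small here, the truncation error after \(15\) terms is already far below floating-point tolerance.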