Abstract
In this paper, using spectral theory of Hilbertian operators, we study a special class of Gaussian processes indexed by graphs. We extend Whittle maximum likelihood estimation of the parameters for the corresponding spectral density and show their asymptotic optimality.
1 Introduction
In the past few years, much interest has been paid to the study of random fields over graphs, driven by the growing need for both theoretical and practical results for data indexed by graphs. On the one hand, the definition of graphical models by Darroch et al. [9] fostered new interest in Markov fields, and many tools have been developed in this direction (see, for instance, [25, 26]). On the other hand, the industrial demand linked to graphical problems has risen with the advent of new technologies. In particular, the Internet and social networks provide a huge field of applications, but biology, economics, geography and image analysis also benefit from models taking a graph structure into account. For a general review of inference on graphs we refer for instance to [15, 16] and references therein.
The analysis of road traffic is at the root of this work. Indeed, road traffic prediction deals with the forecast of vehicle speeds, which may be seen as a spatial random field over the traffic network. Some work has been done without taking into account the particular graph structure of the speed process (see for example [11, 18] for related statistical issues). In this paper, we build a new model for Gaussian random fields over graphs and study the statistical properties of such stochastic processes.
A random field over a graph is a spatial process indexed by the vertices of a graph, namely \((X_i)_{i \in G}\), where \(G\) is a given graph. Many models already exist in the probabilistic literature, ranging from geostatistical processes and Markov fields to autoregressive processes. Graphical models are defined as Markov fields (see for instance [15]) with a particular dependency structure, built by specifying a dependency between \(X_i\) and \(X_j\), conditionally on the other variables, as soon as the locations \(i \in G\) and \(j \in G\) are connected. For graphical models, we refer for instance to [9] and references therein. Generalizations to the graph case also exist for some processes such as, for instance, autoregressive models on \(\mathbb Z ^d\) (see [15]). Finally, geostatistical processes are defined by directly modeling the correlation between observations at nodes \(i\) and \(j\) through a graph-based distance. We refer to [6] and references therein.
The usual purpose of graphical models is to design an underlying graph which reflects the dependency of the data, and to use it for statistical inference. Indeed, this methodology aims at building a graph of conditional correlations which helps in understanding the relationships between highly complex data (for instance for biological purposes or inference in social networks). For a known graph, the purpose is to estimate the dependency structure; see for instance [23].
Our approach differs since, in our case, the graph is known, and we aim at designing a large class of random processes that nevertheless enjoy a stationarity property. These processes will serve as models for the velocity fields of vehicles, whose correlations depend only on the local structure of the network. This stationarity assumption is commonly accepted among road traffic professionals, who relate it to the road capacity. Moreover, these processes must be easy to handle in order to be used at a large scale.
In this paper, we extend some classical results from time series to spatial fields over general graphs and provide a new framework to define a class of stationary Gaussian processes on graphs. For this, we will make use of spectral analysis and extend to our framework some classical results from time series. In particular, the notion of spectral density may be extended to graphs. This will enable us to construct a maximum likelihood estimate for parametric models of spectral densities. This also leads to an extension of Whittle’s approximation (see [2, 13]). Actually, many extensions of this approximation have been performed, even in non-stationary cases (see [8, 12, 21]). The extension studied here concerns general processes over graphs. We point out that, throughout the paper, we will compare our new framework with the case \(G= \mathbb Z ^d, d \ge 1\).
Section 2 is devoted to some definitions for spectral analysis on graphs. Then we provide in Sect. 3 a general construction of stationary processes indexed by a graph. These models depend on parameters for which we provide estimators in Sect. 4. Some simulations are provided in Sect. 5. The last section, Sect. 6, provides all the necessary tools to prove the main theorems; in particular, Szegö’s Lemmas for graphs are given in Sect. 6.1, while the proofs of the technical lemmas are postponed to Sect. 6.3.
2 Definitions for spectral analysis on graphs
In the whole paper, we will consider a Gaussian spatial process \((X_i)_{i \in G}\) indexed by the vertices of an infinite undirected weighted graph.
We will call \(\mathbf G =(G,W)\) this graph, where
- \(G\) is the set of vertices. \(\mathbf G \) is said to be infinite as soon as \(G\) is infinite (but countable).
- \(W \in [-1,1]^{G\times G}\) is the symmetric weighted adjacency operator. That is, \(W_{ij}\ne 0\) when \(i\in G\) and \(j \in G\) are connected.
We assume that \(W\) is symmetric (\(W_{ij}=W_{ji},\; i,j\in G\)) since we deal only with undirected graphs. For any vertex \(i \in G\), a vertex \(j \in G\) is said to be a neighbor of \(i\) if, and only if, \(W_{ij} \ne 0\). The degree \(\text{ deg }(i)\) of \(i\) is the number of neighbors of the vertex \(i\), and the degree of the graph \(\mathbf G \) is defined as the maximum degree of the vertices of the graph \(\mathbf G \):
From now on, we assume that the degree of the graph \(\mathbf G \) is bounded:
Assume now that \(W\) is renormalized: its entries belong to \([-\frac{1}{\text{ deg }(\mathbf G )},\frac{1}{\text{ deg }(\mathbf G )}]\). This is not restrictive since renormalizing the adjacency operator does not change the objects introduced later. In particular, the spectral representation of a Hilbertian operator is not sensitive to renormalization.
Notice that in the classical case \(G=\mathbb Z \), the renormalized adjacency operator is
Here, \(\text{ deg }(\mathbb Z ) = 2\). This case will be used in all the paper as an illustration example.
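As an illustration (not from the paper), here is a minimal numerical sketch of \(W^{(\mathbb Z )}\), truncated to a path on \(m\) vertices: the nonzero entries equal \(1/\text{ deg }(\mathbb Z ) = 1/2\), and the operator norm of the truncation stays below \(1\).

```python
import numpy as np

def adjacency_Z(m):
    """Renormalized adjacency operator of Z, truncated to a path on m vertices.

    Since deg(Z) = 2, the nonzero entries are W_{i,i+1} = W_{i+1,i} = 1/2."""
    W = np.zeros((m, m))
    idx = np.arange(m - 1)
    W[idx, idx + 1] = 0.5
    W[idx + 1, idx] = 0.5
    return W

W = adjacency_Z(200)
# Symmetric, and bounded in operator norm by deg(G) * (1/deg(G)) = 1.
assert np.allclose(W, W.T)
assert np.linalg.norm(W, 2) <= 1.0
```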
We denote by \(B_G\) the set of all bounded Hilbertian operators on \(l^2(G)\) (the set of square-summable real sequences indexed by \(G\)).
To introduce the spectral decomposition, consider the action of the adjacency operator on \(l^2(G)\) as
The operator space \(B_G\) will be endowed with the classical operator norm
where \(\left\| . \right\| _2\) stands for the usual norm on \(l^2(G)\).
Notice that, as the degree of \(\mathbf G \) and the entries of \(W\) are both bounded, \(W\) lies in \(B_{G}\), and we have
Recall that for any bounded Hilbertian operator \(A \in B_G\), the spectrum \(\text{ Sp }(A)\) is defined as the set of all complex numbers \(\lambda \) such that \(\lambda \text{ Id }- A\) is not invertible (here \(\text{ Id }\) stands for the identity on \(l^2(G)\)). Since \(W\) is bounded and symmetric, \(\text{ Sp }(W)\) is a non-empty compact subset of \(\mathbb R \) [22].
We aim now at providing a spectral representation of any bounded normal Hilbertian operator. For this, first recall the definition of a resolution of identity (see for example [22]):
Definition 2.1
Let \(\mathcal{M }\) be a \(\sigma \)-algebra over a set \(\Omega \). We call a resolution of identity (on \(\mathcal{M }\)) a map
such that,
1. \(E(\emptyset {}) = 0,\; E(\Omega )= I\).
2. For any \(\omega \in \mathcal{M }\), the operator \(E(\omega )\) is a projection operator.
3. For any \(\omega ,\omega ^{\prime } \in \mathcal{M }\), we have
$$\begin{aligned} E(\omega \cap \omega ^{\prime }) =E(\omega )E(\omega ^{\prime })=E(\omega ^{\prime })E(\omega ). \end{aligned}$$
4. For any \(\omega ,\omega ^{\prime } \in \mathcal{M }\) such that \(\omega \cap \omega ^{\prime } = \emptyset \), we have
$$\begin{aligned} E(\omega \cup \omega ^{\prime }) = E(\omega )+E(\omega ^{\prime }). \end{aligned}$$
We can now recall the fundamental decomposition theorem (see for example [22])
Theorem 2.1
(Spectral decomposition) If \(A \in B_G\) is symmetric, then there exists a unique identity resolution \(E\) over all Borelian subsets of \(\text{ Sp }(A)\), such that
From the last theorem, we obtain the spectral representation of the adjacency operator \(W\) thanks to an identity resolution \(E\) over the Borelians of \(\text{ Sp }(W)\)
Obviously, we have
Define now, for any \(i \in G\), the sequences \(\delta _i\) in \(l^2(G)\) by
For any \(i,j \in G\), the sequences \(\delta _i\) and \(\delta _j\) define the real measure \(\mu _{ij}\) by
Hence, we can write:
This family of measures \((\mu _{ij})_{i,j \in G}\) will be used throughout the paper. It conveys both spectral information on the adjacency operator and combinatorial information on the number of paths and loops in \(\mathbf G \). Indeed, the quantity \(\left( W^k\right) _{ij}\) is the number of paths (counted with their weights) going from \(i\) to \(j\) with length \(k\). Note also that all diagonal measures \(\mu _{ii}, i \in G\), are probability measures.
In the usual case of \(\mathbb Z \), an explicit expression for \(\mu _{ij}\) can be given. Denote by \(T_{k}(X)\) the \(k\text{ th }\) Chebychev polynomial (\(k \in \mathbb N \)). We can provide the spectral decomposition of \(W^{(\mathbb Z )}\) (\(W^{(\mathbb Z )}\) has been defined in Eq. 1).
We point out that the spectrum of \(W^{(\mathbb Z )}\) is \([-1,1].\) This shows that, in this case, and for any \(i,j \in \mathbb Z \), the measure \(\mathrm{d }\mu _{ij}\) is absolutely continuous with respect to the Lebesgue measure, and its density is given by
Notice that we recover the usual spectral decomposition by pushing forward \(\mu _{ij}\) by the cosine function:
We get
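To make the combinatorial reading of the measures \(\mu_{ij}\) concrete, the following sketch assumes the standard arcsine form \(\mathrm{d }\mu _{00}(\lambda ) = \mathrm{d }\lambda /(\pi \sqrt{1-\lambda ^2})\) of the diagonal measure on \(\mathbb Z \) (consistent with the density mentioned above), and checks numerically that its moments \(\int \lambda ^k \mathrm{d }\mu _{00}\) coincide with the weighted loop counts \((W^k)_{00}\).

```python
import numpy as np
from math import comb

# Truncated renormalized adjacency of Z: a path on 2K+1 vertices, center c = K.
K = 40
m = 2 * K + 1
W = np.diag(np.full(m - 1, 0.5), 1) + np.diag(np.full(m - 1, 0.5), -1)

# Chebyshev-Gauss quadrature nodes: the average of g(nodes) equals the
# integral of g against the (assumed) arcsine density dx/(pi*sqrt(1-x^2)),
# exactly for polynomials of degree < 100.
nodes = np.cos((2 * np.arange(1, 51) - 1) * np.pi / 100)

Wk = np.eye(m)
for k in range(1, 11):
    Wk = Wk @ W                                   # Wk = W^k
    moment = np.mean(nodes ** k)                  # integral of x^k d(mu_00)
    closed = comb(k, k // 2) / 2 ** k if k % 2 == 0 else 0.0
    # (W^k)_{00}: weighted number of loops of length k at the center vertex.
    assert abs(Wk[K, K] - closed) < 1e-12
    assert abs(moment - closed) < 1e-12
```

The truncation is harmless here: loops of length at most 10 based at the center never reach the boundary of the path.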
3 General definition of stationary processes on graphs
3.1 Spectral representation of time series
Our aim is to study a certain kind of stationary processes indexed by the vertices \(G\) of the graph \(\mathbf G \). To begin with, let us recall the usual case of \(\mathbb Z \). In particular, let us introduce the Toeplitz operators associated with stationary time series.
Let \(\mathbf X = (X_i)_{i \in \mathbb Z }\) be a strongly stationary Gaussian process indexed by \(\mathbb Z \). Since \(\mathbf X \) is Gaussian, strong stationarity is equivalent to second order stationarity, that is, \(\forall i,k \in \mathbb Z , \text{ Cov }(X_i, X_{i+k}) \) does not depend on \(i\). Thus, we can define
Assume further that \((r_k)_{k \in \mathbb Z } \in l^1(\mathbb Z )\). This leads to a particular form of the covariance operator \(\Gamma \) defined on \(l^2(\mathbb Z )\) by
Recall that \(B_\mathbb{Z }\) denotes here the set of bounded Hilbertian operators on \(l^2(\mathbb Z )\). Notice that, since \((r_k)_{k \in \mathbb Z } \in l^1(\mathbb Z )\), we have \(\Gamma \in B_\mathbb Z \) (see for instance [5] for more details). This bounded operator is constant along each diagonal, and is therefore called a Toeplitz operator (see also [4] for a general introduction to Toeplitz operators).
As \((r_k)_{k \in \mathbb Z } \in l^1(\mathbb Z )\), we have
where \(g\) is the spectral density of the process \(\mathbf X \), defined by
This expression can be written, using the Chebychev polynomials \((T_k)_{k \in \mathbb N }\),
Let, for \(\lambda \in [-1,1]\),
We get, using the family \((\hat{\mu }_{ij} )_{i,j \in \mathbb Z }\) defined above,
Notice that the last expression may also be written as \(\Gamma = f(W^{(\mathbb Z )}) \), and the convergence of the operator-valued series defined by Eq. 2 is ensured by the boundedness of \(W^{(\mathbb Z )}\) and of the Chebychev polynomials (\(T_k([-1,1]) \subset [-1,1], \forall k \in \mathbb N \)), together with the summability of the sequence \((r_k)_{k \in \mathbb Z }\). For \(p\le +\infty \), we will extend the usual \(MA_p\) processes to any graph, using the previous remark. This will be the purpose of Sect. 3.2.
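The identity \(\Gamma = f(W^{(\mathbb Z )})\) can be checked numerically on a truncated path. The sketch below assumes the Chebychev expansion reads \(f(\lambda ) = r_0 + 2\sum _{k\ge 1} r_k T_k(\lambda )\) (our reading of the display above) and uses toy covariances \(r_0 = 1\), \(r_1 = 0.4\), all other \(r_k = 0\), which are our choice rather than the paper's.

```python
import numpy as np

# Toy MA(1)-type covariances on Z: r_0 = 1, r_1 = 0.4, r_k = 0 otherwise.
# With T_1(x) = x, the spectral density (in the lambda variable) reads
# f(lambda) = r_0 + 2 * r_1 * T_1(lambda) = 1 + 0.8 * lambda.
m = 50
W = np.diag(np.full(m - 1, 0.5), 1) + np.diag(np.full(m - 1, 0.5), -1)

Gamma = np.eye(m) + 0.8 * W          # Gamma = f(W)

# f(W) reproduces the Toeplitz covariance operator: Gamma_{ij} = r_{|i-j|}.
assert np.allclose(np.diag(Gamma), 1.0)
assert np.allclose(np.diag(Gamma, 1), 0.4)
assert np.allclose(np.diag(Gamma, 2), 0.0)
```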
Let us recall some properties of the moving average representation \(MA_\infty \) of a process on \(\mathbb Z \). This representation exists as soon as the \(\log \) of the spectral density is integrable (see for instance [5]). In this case, there exist a sequence \((a_k)_{k \in \mathbb N }\), with \(a_0 = 1\), and a Gaussian white noise \(\mathbf \epsilon = (\epsilon _k)_{k \in \mathbb Z }\), such that the process \(\mathbf X \) may be written as
Defining the function \(h\) over the unit circle \(\mathcal{C }\) by
we recover, with a few computations, the spectral decomposition of the covariance operator \(\Gamma \) of \(\mathbf X \):
This implies the equality
Recall that when \(h\) is a polynomial of degree \(p\) (with nonzero leading coefficient), the process is said to be \(MA_p\). In this case, \(f\) is also a polynomial of degree \(p\). Conversely, if \(f\) is a real polynomial of degree \(p\), and as soon as \(f\left( \cos (t)\right) \) is even and non-negative for any \(t \in [0,2\pi ]\), the Fejér-Riesz theorem provides a factorization of \(f\left( \cos (t)\right) \) such that \(f\left( \cos (t)\right) = \left| h(e^{it})\right| ^2\) (see for instance [17]). This proves that \(\mathbf X \) is \(MA_p\) if, and only if, its covariance operator may be written \(f(W^{(\mathbb Z )})\), where \(f\) is a polynomial of degree \(p\). This remark is fundamental for the construction we provide in the following section (see Definition 3.1).
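As a worked instance of the Fejér-Riesz factorization (with illustrative numbers of our choosing, not from the paper), take \(f(\lambda ) = 1 + 0.8\lambda \), i.e. \(r_0 = 1, r_1 = 0.4\). Writing \(h(z) = a(1 + bz)\), the identity \(|h(e^{it})|^2 = a^2(1+b^2) + 2a^2 b \cos t\) reduces the factorization to a quadratic equation in \(b\):

```python
import numpy as np

# Fejer-Riesz factorization for the MA(1) example r_0 = 1, r_1 = 0.4:
# find h(z) = a(1 + b z) with f(cos t) = 1 + 0.8 cos t = |h(e^{it})|^2.
r0, r1 = 1.0, 0.4

# Matching a^2 (1 + b^2) = r0 and a^2 b = r1 gives r1 b^2 - r0 b + r1 = 0;
# keep the root inside the unit disk.
b = np.min(np.roots([r1, -r0, r1]).real)
a = np.sqrt(r0 / (1 + b ** 2))

assert abs(b - 0.5) < 1e-12
# The factor reproduces the covariances: r_0 = a^2 (1 + b^2), r_1 = a^2 b.
assert abs(a ** 2 * (1 + b ** 2) - r0) < 1e-12
assert abs(a ** 2 * b - r1) < 1e-12
```

Thus the process \(X_i = a(\epsilon _i + b\,\epsilon _{i-1})\) with \(a = \sqrt{0.8}\), \(b = 0.5\) realizes these covariances.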
3.2 Graph analytical type process
In this section, we will define moving average and autoregressive processes over the graph \(\mathbf G \).
As explained in the last section, since \(W\) is bounded and self-adjoint, \(\text{ Sp }(W)\) is a non-empty compact subset of \(\mathbb R \), and \(W\) admits a spectral decomposition thanks to an identity resolution \(E\), given by
We define here \(MA\) and \(AR\) Gaussian processes, with respect to the operator \(W\), by defining the corresponding classes of covariance operators, since the covariance operator fully characterizes any Gaussian process.
Definition 3.1
(Graph Analytical Model) Let \((X_i)_{i \in G}\) be a Gaussian process, indexed by the vertices \(G\) of the graph \(\mathbf G \), and \(\Gamma \) its covariance operator.
If there exists an analytic function \(f\) defined on the convex hull of \(\text{ Sp }(W)\), such that
we will say that \(X\) is
- \(MA_q\) if \(f\) is a polynomial of degree \(q\).
- \(AR_p\) if \(\frac{1}{f}\) is a polynomial of degree \(p\) which has no root in the convex hull of \(\text{ Sp }(W)\).
- \(ARMA_{p,q}\) if \(f = \frac{P}{Q}\) with \(P\) a polynomial of degree \(p\) and \(Q\) a polynomial of degree \(q\) with no roots in the convex hull of \(\text{ Sp }(W)\).
Otherwise, we will talk about the \(MA_\infty \) representation of the process \(\mathbf X \). We call \(f\) the spectral density of the process \(\mathbf X \), and denote its corresponding covariance operator by
Remark
Actually, this last construction may also be understood as
in the sense of normal convergence of the associated power series. However, the spectral representation will be useful in what follows. Even if we consider only very regular functions \(f\) in this work, the definition using the spectral representation allows weaker regularity than the definition using the normal convergence of the associated power series.
The notation \(\mathcal{K }(.)\) has to be understood by analogy with the notation \(\mathcal{T }(.)\) used for Toeplitz operators.
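On a finite graph standing in for \(\mathbf G \), the operator \(\mathcal{K }(f) = f(W)\) of Definition 3.1 can be computed directly, since the identity resolution reduces to an eigendecomposition. The sketch below uses an arbitrary toy graph (a 6-cycle) and an \(AR_1\)-type density of our choosing, and checks that \(f>0\) on \(\text{ Sp }(W)\) yields a valid covariance operator:

```python
import numpy as np

def K(f, W):
    """Covariance operator K(f) = f(W) via the spectral theorem.

    On a finite graph, the identity resolution reduces to the
    eigendecomposition W = U diag(lam) U^T, so f(W) = U diag(f(lam)) U^T."""
    lam, U = np.linalg.eigh(W)
    return (U * f(lam)) @ U.T

# A small toy graph: a 6-cycle, renormalized by its degree 2.
m = 6
W = np.zeros((m, m))
for i in range(m):
    W[i, (i + 1) % m] = W[(i + 1) % m, i] = 0.5

f = lambda x: 1.0 / (1.0 - 0.6 * x)     # an AR_1-type spectral density

Gamma = K(f, W)
# f > 0 on Sp(W), a subset of [-1, 1], so K(f) is a symmetric positive
# definite covariance operator.
assert np.allclose(Gamma, Gamma.T)
assert np.all(np.linalg.eigvalsh(Gamma) > 0)
```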
Notice that, in the usual case of \(\mathbb Z \), and for finite-order \(ARMA\) processes, we recover the usual definition, as shown in Sect. 3.1. So, the last definition may be seen as an extension of isotropic \(ARMA\) processes to any graph \(\mathbf G \). Besides, note that this extension is given by the equivalence, for any \(g \in \mathbb L ^2\left( [0, 2\pi ]\right) \) such that \(\int _{[0, 2\pi ]} \log g(\lambda )\,d\lambda <+ \infty \),
This means that, in the usual case \(\mathbf G =\mathbb Z \), the definition of the spectral density in our framework is the usual one, up to a change of variables \(\lambda = \cos (t)\) (see Sect. 3.1).
We now have a representation of moving average processes over any graph \(\mathbf G \). In Sect. 4 we give the main result of this paper, which deals with maximum likelihood identification.
3.3 Intuitions behind graph analytical type processes
We provided in the previous section a new framework to build processes indexed by graphs. Driven by our application to road traffic, our aim was twofold:
- First, we want to model a process on a graph that inherits its structure and enjoys some stationarity properties: since the behavior of the car-velocity process depends mainly on the local structure of the road network, and not on its position or orientation, we would like to consider stationary and isotropic models. For a regular lattice \(\mathcal{G } = (G,W)\), a Gaussian random field \((X_i)_{i \in G}\) is said to be stationary and isotropic if its covariance operator \(K\) satisfies \(K_{i,j} = K_{\sigma (i)\sigma (j)} \) for any \(i,j\) in \(G\) and any automorphism \(\sigma \). Let us recall that an automorphism is defined as a permutation \(\sigma \) on the vertex set \(G\) of \(\mathcal{G }\) that leaves the adjacency operator invariant: \(\forall i,j \in G, W_{ij} = W_{\sigma (i)\sigma (j)}\).
- Second, in order to deal with a large number of observations at a large scale, we looked for a model with low computational cost. Hence, we considered processes whose spectral density lies in a parametric model with few parameters. We would also like the model to provide a spectral representation of the process, so as to get an easy way to build positive definite operators.
Hereafter, we compare these processes with other models of processes on graphs that have been developed in the literature.
Random processes on graphs have been built by considering spatial correlations between observations at proximal locations; see for instance [6] for a review. For this, a structure (represented by a function \(\phi \)) is chosen for the covariance operator, such that
These models are quite close to our point of view. However, they do not in general provide a spectral representation, and conditions for positiveness of the operator may be difficult to obtain.
Modeling the dependency of observations on a graph is also the purpose of Gaussian Markov graphical models (see for example [23]). A Gaussian process \((X_i)_{i \in G}\) indexed by the vertices of a graph \(\mathcal{G }=(G,W)\) is such that the inverse of the associated covariance operator (the so-called precision operator), \(Q = \Gamma ^{-1}\), satisfies
In other words, for \(i,j\in G\), \(X_i\) and \(X_j\) are conditionally independent given \((X_k)_{ k \ne i,j}\), whenever \(i\) and \(j\) are not neighbors. Indeed, \(Q_{ij}\) is the conditional covariance between \(X_i\) and \(X_j\) given \((X_k)_{k \ne i,j}\). In our framework, we can provide in some cases an interpretation of the dependency structure that we have designed. For a Graph-\(AR_p\) process introduced in Definition 3.1, the precision operator is \(Q = P(W)\) for \(P\) a polynomial of degree \(p\). So the conditional independence of \(X_i\) and \(X_j\) given \((X_k)_{k \ne i,j}\) holds as soon as \(d(i,j) >p\). This means that this particular Graph-\(AR_p\) model may be seen as a Gaussian graphical model. The underlying graph is obtained by drawing an edge between two vertices \(i\) and \(j\) as soon as \(d(i,j) \le p\). Conversely, if the precision operator \(Q\) of a Gaussian graphical model \((X_i)_{i \in G}\) has entries lying in \([-1,1]\), then it is a Graph-\(AR_1\) process with underlying graph \((G,Q)\). In this case, the spectral density (see Definition 3.1) is given by \(x^{-1}\).
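The zero pattern of a Graph-\(AR_p\) precision operator can be checked numerically. The sketch below uses a path graph and an illustrative degree-2 polynomial \(P\) (the coefficients are our choice, not from the paper): entries of \(Q = P(W)\) vanish exactly where \(d(i,j) > p\).

```python
import numpy as np

# A Graph-AR_2 precision operator Q = P(W) on a path graph:
# Q_{ij} = 0 as soon as d(i,j) > 2, i.e. X_i and X_j are conditionally
# independent given the other coordinates.
m = 8
W = np.diag(np.full(m - 1, 0.5), 1) + np.diag(np.full(m - 1, 0.5), -1)

# P of degree p = 2, with illustrative coefficients chosen so that
# P > 0 on Sp(W), a subset of [-1, 1].
Q = np.eye(m) - 0.5 * W + 0.2 * (W @ W)

assert np.all(np.linalg.eigvalsh(Q) > 0)          # a valid precision operator
for i in range(m):
    for j in range(m):
        if abs(i - j) > 2:                        # d(i,j) > p on the path
            assert abs(Q[i, j]) < 1e-15
assert Q[0, 2] != 0                               # dependence up to distance p
```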
Note that \(ARMA\) processes have also been built on regular lattices, for example in [15]. On \(\mathbb Z \), \(MA_q\) (resp. \(AR_p\)) processes have a spectral density that is a polynomial of degree \(q\) (resp. the inverse of a polynomial of degree \(p\)), and conversely, processes with a rational spectral density have such an autoregressive representation. This property is no longer true on \(\mathbb Z ^d\), since a process with a rational spectral density may not have an \(ARMA\) representation. Hence, we provided a construction of graph analytical processes, and our terminology introduced in Definition 3.1 can be seen as a natural extension of the simple case \(\mathbf G = \mathbb Z \). The extension is performed using a natural spectral representation and operator theory. It allows us both to build general parametric models for Gaussian processes indexed by a graph and to give sharp results on their asymptotic statistical properties.
4 Convergence of maximum approximated likelihood estimators
4.1 Parametric maximum likelihood estimation of the density of a graph ARMA process
The aim of this section is the parametric inference for the processes introduced in Definition 3.1. For this, we will generalize the Whittle method used in the time series paradigm.
The data is observed on \((\mathbf G _n)_{ n \in \mathbb N }\), a growing sequence of finite nested subgraphs. This means that if \(\mathbf G _n = (G_n,W_n)\), we have \(G_n \subset G_{n+1} \subset G\) and that, for any \(i,j \in G_n\), it holds that \(W_n(i,j) = W(i,j)\). Let \(m_n = \text{ Card }(G_n)\). The sequence \((m_n)_{n \in \mathbb N }\) may actually be seen as the “volume” of the graph \(\mathbf G _n\).
We assume that the density of the process belongs to a parametric family of densities indexed by a parameter \( \theta \in \Theta \), a compact interval of \(\mathbb R \). We point out that, although for the sake of simplicity we choose a one-dimensional parameter space \(\Theta \), all the results could easily be extended to the case \(\Theta \subset \mathbb R ^k, k \ge 1\). Define \(\mathcal{F }\) as the set of positive analytic functions over the convex hull of \(\text{ Sp }(W)\).
In this framework, we consider a parametric family of functions \((f_\theta )_{ \theta \in \Theta }\) in \(\mathcal{F }\). They define a parametric set of covariances on \(G\) (see Sect. 3.2) by considering
Let \(\theta _0 \in \mathring{\Theta }\) be such that the process \(\mathbf X \) is a centered Gaussian process over \(\mathbf G \) with covariance operator \(\mathcal{K }(f_{\theta _0})\) (see Sect. 3.2). Our aim is to compute an estimator of \(\theta _0\) based on the observations \(X_n = (\mathbf X _i)_{i\in G_n}\).
In this frame, we get that
where \(\mathcal{K }_n(f_{\theta _0})\) is the covariance matrix of the vector \(X_n\). In the case \(G=\mathbb Z \), \((X_i)_{i \in \mathbb Z }\) is a Gaussian time series with spectral density \(f_{\theta _0}\). If we observe \(\mathbf X _n:= (X_i)_{i = 1, \ldots, n}, n>0\), we can define the maximum log-likelihood estimate \(\hat{\theta }_n\) of \(\theta _0\) by
where
Here, \(\mathcal{T }_{n}(f_\theta )\) denotes the Toeplitz matrix associated with the function \(f_\theta \). This estimator is consistent as soon as the spectral densities are regular enough, and under assumptions on the function \(\theta \mapsto f_\theta \) (see for instance [2]). However, in practical situations, it is hard to compute. For this reason, Whittle’s estimate is considered, maximizing an approximation of the likelihood instead of the likelihood itself
where
The Whittle estimate is also consistent, asymptotically normal, and efficient, as soon as the spectral densities are regular enough.
In this paper, we generalize this methodology to the graph ARMA processes. The corresponding log-likelihood at \(\theta \) is
Then we consider the following two approximations of the log-likelihood, which consist, first, in replacing
and, second, in replacing \(\left( \mathcal{K }_{n}(f_\theta )\right) ^{-1}\) by \( \mathcal{K }_{n}\left( \frac{1}{f_\theta }\right) \). This gives rise to the corresponding functions:
In Sect. 4.2, we prove the consistency of the estimators \((\hat{\theta }_n)_{n \in \mathbb N }\), \((\bar{\theta }_n)_{n \in \mathbb N }\), and \((\tilde{\theta }_n)_{n \in \mathbb N }\), defined as the maximizers of \(L_n (\theta )\), \(\bar{L}_n (\theta )\), and \(\tilde{L}_n (\theta )\), respectively.
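The rationale behind replacing \(\left( \mathcal{K }_{n}(f_\theta )\right) ^{-1}\) by \(\mathcal{K }_{n}\left( \frac{1}{f_\theta }\right) \) is that, on the whole graph, the spectral calculus gives \(\mathcal{K }(f)^{-1} = \mathcal{K }(1/f)\) exactly; only the restriction to \(G_n\) introduces an error, which sits near the boundary of the subgraph. A small numerical sketch (toy path graph and an \(AR_1\)-type density of our choosing):

```python
import numpy as np

def K(f, W):
    """K(f) = f(W) through the eigendecomposition of the finite W."""
    lam, U = np.linalg.eigh(W)
    return (U * f(lam)) @ U.T

# Full "graph": a path on m vertices; observed subgraph G_n: the first n.
m, n = 60, 30
W = np.diag(np.full(m - 1, 0.5), 1) + np.diag(np.full(m - 1, 0.5), -1)
f = lambda x: 1.0 / (1.0 - 0.6 * x)               # AR_1-type density

# On the whole graph, spectral calculus gives K(f)^{-1} = K(1/f) exactly.
assert np.allclose(np.linalg.inv(K(f, W)), K(lambda x: 1 / f(x), W))

# After restriction to G_n, K_n(1/f) only approximates (K_n(f))^{-1};
# the discrepancy concentrates near the boundary of the subgraph.
Kn = K(f, W)[:n, :n]
Qn = K(lambda x: 1 / f(x), W)[:n, :n]
err = np.abs(np.linalg.inv(Kn) - Qn)
interior = err[5:n - 5, 5:n - 5].max()
assert interior < err.max()                       # error lives at the boundary
```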
Notice that approximated maximum likelihood estimators are not asymptotically normal in general (see for instance [14] for \(\mathbb Z ^d\)). Indeed, the score associated with the approximated \(\log \)-likelihood has to be asymptotically unbiased at an adequate rate [2]. A solution to this problem in \(\mathbb Z ^d\) is to use the tapered periodogram (see [7, 14, 15]).
We provide in what follows a generalization to graph processes for the two following cases.
- The \(MA_P\) case: There exists \(P>0\) such that the true spectral density \(f_{\theta _0}\) is a polynomial of degree bounded by \(P\).
- The \(AR_P\) case: There exists \(P>0\) such that, for any \(\theta \in \Theta \), \(\frac{1}{f_\theta }\) is a polynomial of degree bounded by \(P\).
To achieve a good approximation of the \(\log \)-likelihood, we first introduce the unbiased periodogram in each of these cases. Now, let \(P>0\).
Define a subset \(V_P\) of signed measures on \(\mathbb R \) as
where \(d_\mathbf{G }(i,j), i,j \in G\) stands for the usual distance on the graph \(\mathbf G \), i.e. the length of the shortest path going from \(i\) to \(j\).
Define also the matrix \(B^{(n)}\) (the dependency on \(P\) is omitted, for clarity) by
The matrix \(B^{(n)}\) gives a boundary correction, comparing, for any \(v\in V_P\), the frequency of the interior couples of vertices with local measure \(v\) with that of the boundary couples of vertices with local measure \(v\). Actually, this way of dealing with the edge effect is very similar to the one used for \(\mathbf G =\mathbb Z ^d\) (see [7, 14]).
Now we can define the unbiased periodogram as \(X_n^T \mathcal{Q }_n(\frac{1}{f}) X_n, \) where \(\mathcal{Q }_n(f) := B^{(n)}\odot \mathcal{K }_n(f).\) Here, the operation \(\odot \) denotes the Hadamard product for matrices, that is
Notice that this is actually a way to extend the so-called tapered periodogram (see for instance [14]). Finally, we can define the unbiased empirical log-likelihood, for any \(\theta \in \Theta \),
We will finally consider \(\hat{\theta }^{(u)}_n\), the maximum likelihood estimator associated with \(L^{(u)}_n(\theta )\), which will be proved to be an efficient estimator of the true parameter \(\theta _0\).
4.2 Main result: convergence and asymptotic optimality
Consider the following assumptions
Assumption 4.1
Set \(\delta _n = \text{ Card }\left\{ i \in G_n, \exists j \in G \backslash G_n, W_{ij} \ne 0 \right\} \!.\) Assume that
Assumption 4.2
Define
where for \(f \in \mathcal{F }\), the following expansion holds \( f(x) = \sum _k f_k x^k \left( x \in \text{ Sp }(W)\right) .\) Let \(\rho >0\). We make the following assumptions:
- The map \(\theta \mapsto f_\theta \) is injective.
- For any \( \lambda \in \text{ Sp }(W)\), the map \(\theta \mapsto f_\theta (\lambda )\) is continuous.
- \(\forall \theta \in \Theta , f_\theta \in \mathcal{F }_\rho = \left\{ f \in \mathcal{F } ,\alpha (\log (f)) \le \rho \right\} \).
Assumption 4.3
There exists a positive measure \(\mu \), such that
Here, \(\mathcal{D }\) stands for the convergence in distribution
Assumption 4.4
The set \(V_P\) of possible local measures over \(G\) is finite, and \(n\) is large enough to ensure that
Assumption 4.5
There exists a positive sequence \((u_n)_{n \in \mathbb N }\) such that,
and
Assumption 4.6
Assume that
- There exists a positive sequence \((v_n)_{n\in \mathbb N } \) such that \(v_n = o(\frac{1}{\sqrt{m_n}})\) and
$$\begin{aligned} \forall f \in \mathcal{F }_{2\rho }, \left| \frac{1}{m_n} \text{ Tr }(\mathcal{K }_{G_n}(f)) - \int f \mathrm{d }\mu \right| \le \alpha (f) v_n. \end{aligned}$$
- For any \(\theta \in \Theta \), \(f_\theta \) is twice differentiable on \(\Theta \) and
$$\begin{aligned} \frac{\mathrm{d }}{\mathrm{d }\theta }f_\theta \in \mathcal{F }_\rho ,\quad \frac{\mathrm{d }^2}{\mathrm{d }\theta ^2}f_\theta \in \mathcal{F }_\rho . \end{aligned}$$
We can now state one of our main results:
Theorem 4.1
Under Assumptions 4.1, 4.2 and 4.3, the sequences \((\hat{\theta }_n)_{n \in \mathbb N }\), \((\bar{\theta }_n)_{n \in \mathbb N }\), \((\tilde{\theta }_n)_{n \in \mathbb N }\) converge, as \(n\) goes to infinity, \(P_{f_{\theta _0}}\)-a.s. to the true value \(\theta _0\). If moreover Assumption 4.5 holds, this is also true for \((\hat{\theta }^{(u)}_n)_{n \in \mathbb N }\).
Proof
The proof follows the guidelines of [2]. We highlight the main changes performed here. Denote the probability distribution of the process by \(\mathbb P _{f_{\theta _0}}\). First, we define the Kullback information on \(G_n\) of \(f_{\theta _0}\) with respect to \(f \in \mathcal{F }_\rho \) by
and the asymptotic Kullback information (on \(\mathbf G \)) by
whenever it is finite.
The convergence of the estimators of the maximum approximated likelihood is a direct consequence of the following lemmas:
Lemma 4.1
For any \(f\in \mathcal{F }_\rho \), and under Assumptions 4.1, 4.2 and 4.3, the asymptotic Kullback information exists and may be written as
Furthermore, if we set \(l_n(\theta ,X_n) = \frac{1}{m_n}L_n(\theta ,X_n)\), we have that \(P_{f_{\theta _0}}\)-a.s.,
uniformly in \(\theta \in \Theta \). This property also holds for \(\bar{l}_n := \frac{1}{m_n}\bar{L}_n\) and \(\tilde{l}_n:= \frac{1}{m_n}\tilde{L}_n\). Furthermore, for \(P>0\), and in both the \(AR_P\) and \(MA_P\) cases (see above), this also holds for \(l_n^{(u)} := \frac{1}{m_n}L^{(u)}_n\).
Lemma 4.2
Let \(f_{\theta _0}\) be the true spectral density, and \((\ell _n)_{n \in \mathbb N }\) be a deterministic sequence of continuous functions such that
uniformly as \(n\) tends to infinity. Then, if \(\theta _n = \arg \max _\theta \ell _n(\theta )\), we have
The proofs of these lemmas are postponed to the Appendix (Sect. 6.2).\(\square \)
The second main result of the paper provides the asymptotic behaviour of the unbiased estimator \(\hat{\theta }^{(u)}_n \).
Theorem 4.2
In both the \(AR_P\) and \(MA_P\) cases, and under Assumptions 4.1, 4.2, 4.3, 4.4, 4.5, and 4.6, the estimator \(\hat{\theta }^{(u)}_n\) of \(\theta _0\) is asymptotically normal:
Furthermore, the Fisher information of the model is
Hence, the previous estimator is asymptotically efficient.
To build the estimator \(\hat{\theta }_n^{(u)}\), stronger assumptions on the graph \(\mathbf G \) are needed, which is the price to pay to obtain its asymptotic distribution. With such results, we are able to estimate the parameters of a process indexed by a graph and to use the model to generate new data. A practical application would be data completion over a graph with missing data. Note that we are strongly convinced that Theorem 4.2 may be applied in the \(\mathbb Z ^d\) case with holes, provided that the holes remain few enough. Actually, Assumption 4.1 is required, so the boundary of the subgraphs (counting the holes) has to be small compared to the volume of these subgraphs. The holes must be independent of the data, but their distribution must satisfy a vanishing condition with respect to the total number of observed edges. Extending our results to this case and using the procedure for predicting road traffic velocities will be the subject of future work.
Proof
Here again, we mimic the usual proof by extending the result of [2] to the graph case.
Using a Taylor expansion, we get
where \(\breve{\theta }_n \in ]\hat{\theta }^{(u)}_n, \theta _0[.\) As \(\hat{\theta }_n^{(u)} = \arg \max l_n^{(u)}\), we have
So that,
The end of the proof relies on three lemmas:
Lemma 4.3 provides the asymptotic normality for \(\sqrt{m_n}(l_n^{(u)})^{\prime }(\theta _0) \). Combined with Lemma 4.4, we get the asymptotic normality for \(\sqrt{m_n}(\theta _0 - \hat{ \theta }_n^{(u)})\). Finally, Lemma 4.5 gives the Fisher information.
Lemma 4.3
Lemma 4.4
Lemma 4.5
The asymptotic Fisher information is:
The proofs of these lemmas are postponed to the Appendix (Sect. 6.3).\(\square \)
4.3 Comments on assumptions
-
Assumption 4.1 deals with the dimension of the graph. Indeed, recall that \(\delta _n\) is the size of the boundary of \(G_n\). For the special case \(G = \mathbb Z ^d\) and \(G_n = [-n,n]^d\), we get \(m_n = (2n+1)^d\) and \(\delta _n = 2d(2n+1)^{d-1}\). Hence, the ratio \(\frac{\delta _n}{m_n}\) is a natural quantity associated with the expansion of the graph, which also appears in isoperimetric [20] and graph expander issues. Assumption 4.1 is a non-expansion criterion stating that this ratio goes to \(0\) as the size of the graph goes to infinity. The graph has to be amenable; this is satisfied for the previous example \(G = \mathbb Z ^d\) with \(G_n = [-n,n]^d\), but not for a homogeneous tree, whatever the choice of the sequence of subgraphs \((\mathbf G _\mathbf{n })_{n \in \mathbb N }\).
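The two boundary-to-volume regimes above can be checked numerically. The following is an illustrative sketch (not from the paper): `grid_ratio` reproduces the \(\mathbb Z^d\) formulas quoted in the text, and `tree_ratio` is an assumed closed form for balls in the \((q+1)\)-regular tree.

```python
def grid_ratio(n, d):
    """delta_n / m_n for G_n = [-n, n]^d in Z^d, using the formulas in the text."""
    m_n = (2 * n + 1) ** d
    delta_n = 2 * d * (2 * n + 1) ** (d - 1)
    return delta_n / m_n

def tree_ratio(n, q):
    """Boundary-to-volume ratio for the ball of radius n in the (q+1)-regular tree."""
    # Volume: 1 + (q+1)(q^n - 1)/(q - 1); boundary sphere: (q+1) q^(n-1).
    m_n = 1 + (q + 1) * (q ** n - 1) // (q - 1)
    delta_n = (q + 1) * q ** (n - 1)
    return delta_n / m_n

for n in (5, 50, 500):
    print(n, grid_ratio(n, 2), tree_ratio(n, 2))
# The grid ratio vanishes as n grows, while the tree ratio stays bounded away
# from 0: the homogeneous tree is not amenable.
```

This is exactly the dichotomy of Assumption 4.1: the lattice satisfies the non-expansion criterion, the tree does not.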
-
Assumption 4.2 is a usual assumption which ensures the model to be identifiable. Note that the definition of the regularity factor \(\alpha \) is very close to the one used in [2].
-
The limit measure \(\mu \) that appears in Assumption 4.3 is classically called the spectral measure of \(\mathbf G \) with respect to the sequence of subgraphs \((\mathbf G _n)_{n \in \mathbb N }\) (see [19] for example). Actually, under Assumption 4.1, Assumption 4.3 is equivalent to the convergence of the empirical distribution of eigenvalues of \(W_{G_n}\) (here, \(W_{G_n}\) denotes the restriction of \(W\) to the subgraph \(G_n\)), as shown in the following lemma whose proof is given in Sect. 6.3.
Lemma 4.6
Let \(\lambda ^{(n)}_1,\ldots , \lambda ^{(n)}_{m_n}\) be the eigenvalues (written with their multiplicity orders) of \(W_{G_n}\). Define
and
Then, under Assumption 4.1, the convergence of \(\mu ^{[1]}_n\) to \(\mu \) and the convergence of \(\mu ^{[2]}_n\) to \(\mu \) are equivalent.
Note also that Assumption 4.3 holds as soon as there is some homogeneity in the graph. The simplest example is a quasi-transitive graph. Indeed, take for instance a finite graph (the pattern) and reproduce it at each vertex of an infinite (amenable) vertex-transitive graph. The resulting graph is then quasi-transitive, and all the previous assumptions hold. Note that if \(\mathbf G \) is “close” to being quasi-transitive, Assumption 4.3 remains true. We could also adapt notions of unimodularity [1] or stationarity [3] to our framework and prove the existence of a spectral measure.
-
Now, let us discuss Assumption 4.4. This assumption is quite strong, and holds for instance for quasi-transitive graphs (i.e. such that the quotient of the graph by its automorphism group is finite). Relaxing this assumption could be achieved with very technical modifications, which fall outside the scope of this paper. Yet, we can clarify the meaning of the operator \(B^{(n)}\) through an example. Let us describe the case \(G = \mathbb Z ^2\), for \(P=2\). In this case \(W^{(\mathbb Z ^2)}\) is
$$\begin{aligned} \forall i,j,k,l \in \mathbb Z , W^{(\mathbb Z ^2)}\left( (i,j),(k,l)\right) := \frac{1}{4} 1\!\!1_{\left| i-j\right| +\left| k-l\right| = 1}. \end{aligned}$$In this example, we set \(G_n = [1,n]^2\), and we can compute the matrix \(B^{(n)}\). Indeed, it remains to notice that
$$\begin{aligned} \mu _{(i_1,j_1),(i_1+k,j_1+l)} \!=\! \mu _{(i_2,j_2),(i_2 + \epsilon _1 k,j_2 + \epsilon _2 l)}, i_1,i_2,j_1,j_2,k,l \in \mathbb Z , \epsilon _1, \epsilon _2 \in \left\{ -1,1\right\} \!. \end{aligned}$$This means that the local measure of a couple of vertices depends only on their relative positions (stationarity and isotropy of this set of measures). So, we need to count the configurations given by Fig. 1, since we consider only couples of vertices \(u,v \in \mathbb Z ^2\) such that \(d_\mathbb{Z ^2}(u,v) \le 2\). We get, for any \(i,j \in \mathbb Z \),
-
\(B^{(n)}_{(i,j),(i,j)} = \frac{n^2}{n^2}= 1.\)
-
\(B^{(n)}_{(i,j),(i,j \pm 1)} = B^{(n)}_{(i,j),(i \pm 1,j)} = \frac{4n^2}{4n(n-1)}. \)
-
\(B^{(n)}_{(i,j),(i \pm 1,j \pm 1)}= \frac{4n^2}{4(n-1)^2}. \)
-
\(B^{(n)}_{(i,j),(i,j\pm 2)}= B^{(n)}_{(i,j),(i \pm 2,j)}= \frac{4n^2}{4n(n-2)}. \)
One can notice that
$$\begin{aligned} \sup _{ij} \left| B^{(n)}_{ij}-1\right| \underset{n \rightarrow \infty }{\rightarrow } 0. \end{aligned}$$Assumption 4.5 just ensures that this property holds for the graph we consider.
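The boundary-correction factors listed above can be recomputed by brute force. This sketch (illustrative only) counts, for each equivalence class of offsets, the ordered pairs \((u, u+\text{offset})\) with both endpoints in \(G_n = [1,n]^2\); the name `B_entry` is ours, not the paper's.

```python
from itertools import product

def B_entry(n, offsets):
    """Correction factor for one class of offsets in G_n = [1, n]^2:
    (number of offsets) * n^2 over the number of ordered pairs (u, u + offset)
    with both endpoints inside the square."""
    count = 0
    for dk, dl in offsets:
        for i, j in product(range(1, n + 1), repeat=2):
            if 1 <= i + dk <= n and 1 <= j + dl <= n:
                count += 1
    return len(offsets) * n * n / count

n = 30
print(B_entry(n, [(0, 1), (0, -1), (1, 0), (-1, 0)]))    # 4n^2 / (4n(n-1))
print(B_entry(n, [(1, 1), (1, -1), (-1, 1), (-1, -1)]))  # 4n^2 / (4(n-1)^2)
print(B_entry(n, [(0, 2), (0, -2), (2, 0), (-2, 0)]))    # 4n^2 / (4n(n-2))
```

All three ratios tend to \(1\) as \(n \rightarrow \infty\), which is the content of \(\sup_{ij} |B^{(n)}_{ij} - 1| \rightarrow 0\).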
-
Finally, Assumption 4.6 contains two points. The first one means that the convergence of the empirical distribution of eigenvalues of \(\mathcal K (f)\) to the spectral measure \(\mu \) is faster than \(\frac{1}{\sqrt{m_n}}\). It holds for instance for quasi-transitive graphs, with a suitable sequence of subgraphs. The second one concerns some regularity assumptions for the density function of the process. For instance, such smoothness assumptions are required in the case \(\mathbf G = \mathbb Z \) (see [2]).
5 Simulations
In this section, we present simulations in a very simple case, where the graph \(G\) is built by taking rhombi connected by a single edge on both the left and the right (see Fig. 2).
The sequence of nested subgraphs chosen here is the growing neighborhood sequence (we choose a point \(x\) and take \(G_n = \left\{ y \in G, d_\mathbf{G }(x,y)\le n\right\} \)). We study an \(\text{ AR }_2\) model, where,
Here, we take for \(W\) the adjacency operator of \(G\), normalized so that \(\sup _{i,j \in G} W_{ij} \le \frac{1}{\text{ deg }(G)}\). We choose \(\theta _0 = \frac{1}{2}\) and \(m_n = 724\). We approximate the spectral measure of \(G\) by the spectral measure of a very large graph (around \(10000\) vertices) built in the same way. Figure 3 shows the empirical spectrum of the graph \(G\) with respect to the sequence of subgraphs \((G_n)_{n \in \mathbb N }\).
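The whole estimation pipeline can be sketched on a toy example. This is a minimal illustration and not the paper's exact setup: we assume a cycle graph instead of the rhombus graph, an \(MA_1\)-type spectral density \(f_\theta(x) = (1+\theta x)^2\) chosen for simplicity, and exact functional calculus in the eigenbasis of \(W\) in place of the truncated power series.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy graph: a cycle on m vertices; W = adjacency normalized by 1/deg = 1/2
m = 400
W = np.zeros((m, m))
for i in range(m):
    W[i, (i + 1) % m] = W[i, (i - 1) % m] = 0.5

theta0 = 0.5
f = lambda x, th: (1.0 + th * x) ** 2   # illustrative MA_1-type spectral density

# Covariance operator K(f_theta) = f_theta(W), via the spectral decomposition of W
lam, U = np.linalg.eigh(W)

# Sample one field X ~ N(0, K(f_theta0)) and move to the eigenbasis of W
X = U @ (np.sqrt(f(lam, theta0)) * rng.standard_normal(m))
Y = U.T @ X

def neg_whittle(th):
    """Negative log-likelihood contrast (up to constants), diagonal in the eigenbasis."""
    lam_f = f(lam, th)
    return (np.sum(np.log(lam_f)) + np.sum(Y ** 2 / lam_f)) / m

grid = np.linspace(0.05, 0.95, 181)
theta_hat = grid[np.argmin([neg_whittle(t) for t in grid])]
print(theta_hat)  # lands near theta0 = 0.5
```

Diagonalizing \(W\) once makes each contrast evaluation \(O(m)\), which is why the paper's truncated power series (and the correction \(\mathcal Q_n\)) matter when only a finite window of a larger graph is observed.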
To compute \(\left( \mathcal K _n(f_\theta )\right) ^{-1}\), we use the power series representation of \(f_\theta \), truncated after the first \(15\) coefficients. This choice ensures that the simulation errors are negligible with respect to the theoretical ones. Figure 4 gives the empirical distribution of
References
Aldous, D., Lyons, R.: Processes on unimodular random networks. Electron. J. Probab. 12(54), 1454–1508 (2007)
Azencott, R., Dacunha-Castelle, D.: Series of irregular observations. Forecasting and model building. Applied Probability: A Series of the Applied Probability Trust. Springer, New York (1986)
Benjamini, I., Curien, N.: Ergodic theory on stationary random graphs. Technical, Report (2010). arXiv:1011.2526
Böttcher, A., Silbermann, B.: Analysis of Toeplitz operators. In: Springer Monographs in Mathematics, 2nd edn. Springer, Berlin (2006) (Prepared jointly with Alexei Karlovich)
Brockwell, P.J., Davis, R.A.: Introduction to time series and forecasting. In: Springer Texts in Statistics, 2nd edn. Springer, New York (2002) [With 1 CD-ROM (Windows)]
Cressie, N.A.C.: Statistics for spatial data. In: Wiley Series in Probability and Mathematical Statistics: Applied Probability and Statistics. Wiley, New York (1993). Revised reprint of the 1991 edition, A Wiley-Interscience Publication.
Dahlhaus, R., Künsch, H.: Edge effects and efficient parameter estimation for stationary random fields. Biometrika 74(4), 877–882 (1987)
Dahlhaus, R., Polonik, W.: Nonparametric quasi-maximum likelihood estimation for Gaussian locally stationary processes. Ann. Stat. 34(6), 2790–2824 (2006)
Darroch, J.N., Lauritzen, S.L., Speed, T.P.: Markov fields and log-linear interaction models for contingency tables. Ann. Stat. 8(3), 522–539 (1980)
Dembo, A., Zeitouni, O.: Large deviations techniques and applications. In: Stochastic Modelling and Applied Probability, vol. 38. Springer, Berlin (2010). Corrected reprint of the second (1998) edition
Gamboa, F., Loubes, J.-M., Maza, E.: Semi-parametric estimation of shifts. Electron. J. Stat. 1, 616–640 (2007)
Giraitis, L., Robinson, P.M.: Whittle estimation of arch models. Econom. Theory 17(03), 608–631 (2001)
Grenander, U., Szegő, G.: Toeplitz forms and their applications, 2nd edn. Chelsea Publishing Co., New York (1984)
Guyon, X.: Parameter estimation for a stationary process on a \(d\)-dimensional lattice. Biometrika 69(1), 95–105 (1982)
Guyon X.: Champs aléatoires sur un réseau. Masson, Paris (1992)
Kolaczyk, E.D.: Statistical analysis of network data. Methods and models. In: Springer Series in Statistics. Springer, New York (2009)
Kreĭn, M.G., Nudelman, A.A.: The Markov moment problem and extremal problems. American Mathematical Society, Providence (1977). Ideas and problems of P. L. Čebyšev and A. A. Markov and their further development, Translated from the Russian by D. Louvish, Translations of Mathematical Monographs, vol. 50
Loubes, J.-M., Maza, E., Lavielle, M., Rodríguez, L.: Road trafficking description and short term travel time forecasting, with a classification method. Can. J. Stat. 34(3), 475–491 (2006)
Mohar, B., Woess, W.: A survey on spectra of infinite graphs. Bull. Lond. Math. Soc. 21(3), 209–234 (1989)
Pittet, C.: On the isoperimetry of graphs with many ends. Colloq. Math. 78(2), 307–318 (1998)
Robinson, P.M.: Multiple local Whittle estimation in stationary systems. Ann. Stat. 36, 2508–2530 (2008)
Rudin, W.: Functional analysis, 2nd edn. In: International Series in Pure and Applied Mathematics. McGraw-Hill Inc., New York (1991)
Rue, H., Held, L.: Gaussian Markov random fields. Theory and applications. In: Monographs on Statistics and Applied Probability, vol. 104. Chapman & Hall, Boca Raton (2005)
Seber, G.A.F.: A matrix handbook for statisticians. In: Wiley Series in Probability and Statistics. Wiley-Interscience, Hoboken (2008)
Verzelen, N.: Adaptive estimation of stationary Gaussian fields. Ann. Stat. 38(3), 1363–1402 (2010)
Verzelen, N., Villers, F.: Tests for Gaussian graphical models. Comput. Stat. Data Anal. 53(5), 1894–1905 (2009)
Acknowledgments
We thank an anonymous referee for their very useful advice and careful proofreading of the whole paper.
Appendix
1.1 Szegő’s Lemmas
Szegő’s Lemmas [13] are useful in time series analysis. Indeed, they provide good approximations of the likelihood. As explained in Sect. 4, these approximations of the likelihood are easier to compute. In this section, we generalize a weak version of the Szegő Lemmas to a general graph, under Assumption 4.1 (the non-expansion criterion for \(G_n\)) and Assumption 4.3 (the existence of the spectral measure \(\mu \)).
For any matrix \((B_{ij})_{i,j \in G_n}\), we define the block norm
We can now state the graph analogue of the first Szegő lemma for time series:
Lemma 6.1
Asymptotic homomorphism
Let \(k,n\) be positive integers, and let \(g_1,\ldots , g_k\) be analytic functions over \(\left[ -1 ,1 \right] \) having finite regularity factors (i.e. \(\alpha (g_i) < +\infty , i = 1,\ldots , k\)). Then,
Corollary 6.1
For any \(g \in \mathcal F _\rho \) (see the first page of Sect. 4.1 for the definition), and under Assumptions 4.1 and 4.3,
Proof of Lemma 6.1 This proof again follows that of [2]. We prove the result by induction on \(k\).
First we deal with the case \(k = 2\). Let \(f\) and \(g\) be analytic functions over \(\left[ -1 ,1 \right] \) such that \(\alpha (f) < + \infty \) and \(\alpha (g) <+\infty \). We write
Using \(\mathcal K (g) = \sum _{h = 0}^\infty g_h W^h \), Fubini’s theorem gives, since all the previous sequences are in \(l^1(G)\),
Introducing
we get
The coefficient \(\Delta _h\) is a porosity factor. It measures the weight of the paths of length \(h\) going from the interior of \(G_n\) to the outside. The control of \(\Delta _h\) is at the core of the proof of our results. To provide an upper bound, first denote by \(\Lambda _{n,h}\) the set of all paths \((k_0,\ldots , k_h) \in G^{(h+1)}\) of length \(h\) such that \(k_0 \in G_n, k_h \in G \backslash G_n\), and \(W_{k_{i-1}k_i}\ne 0, i \in [1,h]\). We will also use the following notation, for any \(\gamma = (k_0,\ldots , k_h) \in \Lambda _{n,h}\):
Now, notice that
Now, all the paths \(\gamma = (k_0,\ldots , k_h) \in \Lambda _{n,h}\) go from the interior of \(G_n\) to the outside. Therefore, these paths have to cross the boundary of \(G_n\). This means that for any \(\gamma \in \Lambda _{n,h}\), there exists \(i \in [1,h]\) such that \(k_{i-1} \in G_n, k_i \in G \backslash G_n\), that is \(k_{i-1} \in \partial G_n\).
Finally, note that, for any \(g \in G, i \in [1,h]\), \(\sharp \{ \gamma \in \Lambda _{n,h}, k_i = g \} \le \text{ deg }(G)^h\). Since the entries of \(W\) lie in \([-\frac{1}{\text{ deg }(G)},\frac{1}{\text{ deg }(G)}]\), we can write
So, taking the maximum over \(N \in \mathbb N \), we get \(\Delta _h \le h \le h+1\).
Note also that \(\Delta _0 = 0\).
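The porosity factor admits a simple numerical illustration. This is a rough sketch under simplifying assumptions (not the paper's exact quantity): \(G = \mathbb Z\) approximated by a long path, \(W\) the \(1/2\)-normalized adjacency, and we measure the total weight of length-\(h\) paths starting in a window \(G_n\) and ending outside it.

```python
import numpy as np

N, n, h = 600, 200, 6
W = np.zeros((N, N))
for i in range(N - 1):
    W[i, i + 1] = W[i + 1, i] = 0.5

Wh = np.linalg.matrix_power(W, h)
inside = np.zeros(N, bool)
inside[200:400] = True            # the window G_n, with m_n = n = 200 vertices
# Total weight of length-h paths starting in G_n and ending outside it
crossing = Wh[inside][:, ~inside].sum()
print(crossing / n)  # small: only starts within distance h of the boundary contribute
```

Only the \(O(h \, \delta_n)\) vertices near the boundary can launch a crossing path, which is the mechanism behind the bound on \(\Delta_h\).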
Therefore, we obtain
Now, we define another norm on \(B_G\): \( \left\| B\right\| _{\infty ,op} := \sup _{k \in G } \sum _{i \in G} \left| B_{ik} \right| ,\left( B \in B_G\right) .\) We thus obtain
Finally, we get
To conclude the proof of the lemma, define, for \(f \in \mathcal F _\rho \),
Hence, by symmetrization of the last inequality, and since \(1 \le (h+1)\) and \(\Delta _0 = 0\), we have,
So, we get
To perform the inductive step, we need the following inequalities [24]:
Let \(k>1\), and assume that for all \(j\le k-1\), Lemma 6.1 holds. Under the previous assumptions, and the inductive hypothesis for \(k-1\) we get,
which completes the induction step and proves the result.\(\square \)
Proof of Corollary 6.1 Let \(g\in \mathcal F _\rho \), and \(k\) be a positive integer. Using the definition of \(b_n\), we have
Thus, we have, thanks to Assumption 4.1
Denote by \(\mu _g^{[1]}\) the real measure whose \(k\)th moment is given by
and by \(\mu _g^{[2]}\) the real measure whose \(k\)th moment is given by
Notice that both of these measures have support between \(\inf g \ge e^{-\rho }>0\) and \(\sup g\le e^\rho <+\infty \), since \(\alpha (\log (g))<\rho \) (see Sect. 4). Therefore, the existence of such measures is ensured by the compactness of their support; indeed, the \(k\)th moment is at most \(e^{k\rho }\). The equality of the moments given by Eq. 5 yields the equality of the measures \(\mu _g^{[1]}\) and \(\mu _g^{[2]}\).
Thus, we get
Assumption 4.3 completes the proof of the Corollary since it implies that
\(\square \)
The following lemma enables us to replace \(\mathcal K _n(g)\) by its unbiased version \(\mathcal Q _n(g)\) (see Sect. 4 for the definition).
Lemma 6.2
Under Assumptions 4.1, 4.3, 4.4 and 4.5, we have, for \(n\) large enough,
Proof
We define, for any \(f\),
Actually, the proof is based on the following idea: as soon as \(f\) or \(g\) is a polynomial of degree less than or equal to \(P\), we only have to control the number of paths of length less than or equal to \(P\) (counted with their weights).
Let \(p\) be a positive number. Recalling that \(\mathcal Q _n(\frac{1}{g}) = B^{(n)} \odot \mathcal K _n(\frac{1}{g})\) (see Sect. 4), we have,
Recall that we used the notation \(\left\| . \right\| _{2,op} \) to denote the largest singular value. We also used that
Using the last inequality and Assumption 4.5, we get,
This ends the proof of the Lemma.
Finally, the following lemma explains the choice of \(B^{(n)}\). The matrix \(\mathcal Q _n(g)\) is nothing more than a correction of the error between \(\mathcal K _n(f)\mathcal K _n(g)\) and \(\mathcal K _n(fg)\).
Lemma 6.3
(Exact correction) Let \(f,g \in \mathcal F _\rho \), and assume that either \(f\) or \(g\) is a polynomial of degree less than or equal to \(P\) (see Sect. 4). Then, the matrix \(\mathcal Q _n(g)\) verifies
Proof of Lemma 6.3 Note that we assume that \(f\) or \(g\) is a polynomial only to ensure that Assumption 4.4 may hold.
First, notice that
Since this expression is symmetric in \(f\) and \(g\), we may consider the case where \(f\) is a polynomial of degree less than or equal to \(P\). Since \(f\) is a polynomial, \(\mathcal K _n(f)_{ij} = 0\) as soon as \(d(i,j) > P\) (\(i,j \in G\)). Then, if \(i,j,k,l \in G\) are such that \(\mu _{ij}= \mu _{kl}\), we have
Thus, for convenience, we may write \(K(f)_{\mu _{ij}}\).
Using Assumption 4.4, this leads to
This ends the proof of Lemma 6.3.\(\square \)
1.2 Proofs of the lemmas of Theorem 4.1
Recall that the theorem relies on two lemmas. Lemma 4.2 states a condition on deterministic sequences to provide the convergence of the maximizer of these sequences.
Proof of Lemma 4.2 Recall that \(f_{\theta _0}\) denotes the true spectral density. Let \((\ell _n)_{n \in \mathbb N }\) be a deterministic sequence of continuous functions such that
uniformly as \(n\) tends to infinity. Denote moreover \(\theta _n = \arg \max _\theta \ell _n(\theta )\). We aim at proving that
Using the compactness of \(\Theta \), let \(\theta _\infty \) be an accumulation point of the sequence \((\theta _n)_{n \in \mathbb N }\), and \((\theta _{n_k})_{k \in \mathbb N }\) be a subsequence converging to \(\theta _\infty \). As the function
is continuous on \(\Theta \), and the convergence of \((\ell _n(\theta _0) - \ell _n(\theta ))_{n \in \mathbb N }\) is uniform in \(\theta \), we have
But notice that, by definition of \(\theta _n\), \( \ell _{n_k}(\theta _0) - \ell _{n_k}( \theta _{n_k}) \le 0 \). So, since the function \(x \mapsto -\log (x)+x-1 \) is non-negative and vanishes if, and only if, \(x=1\), we get that \(f_{\theta _0} = f_{\theta _{\infty }}\). By injectivity of the map \(\theta \rightarrow f_\theta \), we get \(\theta _\infty =\theta _0\) for any accumulation point \(\theta _\infty \) of the sequence \((\theta _n)_{n \in \mathbb N }\), which ends the proof of this first lemma.\(\square \)
Lemma 4.1 provides the uniform convergence of the contrasts of maximum likelihood and approximated maximum likelihood to the Kullback information. The proof may be split into several lemmas.
Proof of Lemma 4.1 First, notice that by construction, we have, for any \(\theta \in \Theta \),
when it exists. Then, we can compute
Corollary 6.1 of Lemma 6.1 provides the following convergence
To prove the existence of \(\mathbb{IK }(f_{\theta _0},f_\theta )\), it only remains to prove the \(\mathbb P _{f_{\theta _0}}\)-a.s. convergence of \(\frac{1}{m_n}X_n^T\mathcal K _{n}(f_\theta )^{-1}X_n\) to \(\int \frac{f_{\theta _0}}{f_\theta } \mathrm{d } \mu \) as \(n\) goes to infinity.
This is ensured by the following Lemma.
Lemma 6.4
(Convergence lemma) For respectively \(\Lambda = \mathcal K _{n}(\frac{1}{f_\theta })\), \(\Lambda = (\mathcal K _{n}(f_\theta ))^{-1}\) or \(\Lambda = \mathcal Q _n(\frac{1}{f_\theta })\), we have,
Lemma 6.4 combined with Corollary 6.1 ensures the \(\mathbb P _{f_{\theta _0}}-\text{ a.s. }\) convergence of \(\tilde{l}_n(f_{\theta _0})-\tilde{l}_n(f_\theta )\) and \( \bar{l}_n(f_{\theta _0})-\bar{l}_n(f_\theta )\) to \(\mathbb{IK }(f_{\theta _0},f_\theta )\). It also provides the \(\mathbb P _{f_{\theta _0}}-\text{ a.s. }\) convergence of \(l^{(u)}_n(f_{\theta _0})-l^{(u)}_n(f_\theta )\) to \(\mathbb{IK }(f_{\theta _0},f_\theta )\) in the \(AR_P\) or \(MA_P\) cases (see Sect. 4). To complete the proof of Lemma 4.1, it only remains to show the uniform convergence on \(\Theta \) of the last quantities. This will be done using an equicontinuity argument given by the following lemma.
Lemma 6.5
(Equicontinuity lemma) For all \(n \ge 0\), the sequences of functions
is \(\mathbb P _{f_{\theta _0}}\)-a.s. equicontinuous on \(\left( \left\{ f_\theta , \theta \in \Theta \right\} , \left\| .\right\| _\infty \right) \). This property also holds for \(\bar{l}_n,\,\tilde{l}_n\). Furthermore, the sequence \(\left( l^{(u)}_n(f_{\theta _0},X_n) -l_n^{(u)}(f_\theta ,X_n)\right) _{n \in \mathbb N }\) is also \(\mathbb P _{f_{\theta _0}}\)-a.s. equicontinuous on \(\left( \left\{ f_\theta , \theta \in \Theta \right\} \!, \left\| .\right\| _{1,pol}\right) \).
We can now end the proof of Lemma 4.1:
First, notice that the space \(\left\{ f_\theta , \theta \in \Theta \right\} \) is compact for the topology of uniform convergence. This also holds for \(\left( \left\{ f_\theta , \theta \in \Theta \right\} , \left\| .\right\| _{1,pol}\right) \). So, there exists a dense sequence \((f_{\theta _p})_{p \in \mathbb N }\). Then, using Lemma 6.1 and Corollary 6.1, the sequence \(\left( l_n(f_{\theta _0},X_n) - l_n(f_{\theta _p},X_n)\right) _{n \in \mathbb N }\) converges \(\mathbb P _{f_{\theta _0}}\)-a.s. to \(\mathbb{IK }(f_{\theta _0},f_{\theta _p})\). If a sequence of functions is equicontinuous and converges pointwise on a dense subset of its domain, and if its codomain is a complete space, then the sequence converges pointwise on the whole domain [22]. Using this well-known property, we obtain, \(\mathbb P _{f_{\theta _0}}\)-a.s., the pointwise convergence of \(\left( l_n(f_{\theta _0},X_n) - l_n(f_{\theta },X_n)\right) _{n \in \mathbb N }\) to \(\mathbb{IK }(f_{\theta _0},f_{\theta })\), for any \(\theta \in \Theta \).
Furthermore, if a sequence of functions is equicontinuous and converges pointwise on its domain, then this convergence is uniform on any compact subspace of the domain [22].
Thus, we get, \(\mathbb P _{f_{\theta _0}}\)-a.s., the uniform convergence on \(\Theta \) of the sequence
to \(\mathbb{IK }(f_{\theta _0},f_{\theta })\).
Using the same kind of arguments, this uniform convergence also holds for \(\bar{l}_n,\,\tilde{l}_n\) and \(l^{(u)}_n\). This concludes the proof of Lemma 4.1.\(\square \)
1.3 Proof of the technical lemmas
Proof of Lemma 4.6
Proof
Assumption 4.3 is about the convergence of \(\mu ^{[2]}_n\) to \(\mu \).
Note that
Hence Lemma 6.1 allows us to conclude that, as soon as one of the measures converges, the other converges as well.\(\square \)
Proof of Lemma 6.4 Let \(\theta \in \Theta \). First, consider the case \(\Lambda _n = \mathcal K _n\left( \frac{1}{f_\theta }\right) \). We aim at proving that
To do so, we make use of classical large deviations tools (see [10]). We compute the Laplace transform of \(X_n^T\Lambda _n X_n\):
These last equalities hold as soon as \(I_{G_n} -2 \lambda \mathcal K _n(f_{\theta _0})^{\frac{1}{2}}\mathcal K _n(\frac{1}{f_\theta })\mathcal K _n(f_{\theta _0})^{\frac{1}{2}} \) is positive. This is true whenever \(\lambda \ge 0\) and small enough. Indeed, we have
Recall that we use the notation \(\left\| . \right\| _{2,op} \) to denote the largest singular value. Now, for \(0 \le \lambda \le e^{-2\rho }\), define
This function verifies
Define also
Using Lemma 6.1, we have, for any \(k \ge 0\),
Then, we can use the same kind of argument as in Corollary 6.1 to compare \(\frac{1}{2m_n} \log \det \left( I_{G_n} -2 \lambda \mathcal K _n(f_{\theta _0})^{\frac{1}{2}}\mathcal K _n(\frac{1}{f_\theta })\mathcal K _n(f_{\theta _0})^{\frac{1}{2}} \right) \) and \(\log \det \left( \mathcal K _n(1-2\lambda \frac{f_{\theta _0}}{f_\theta }) \right) \).
We get,
We can also compute
As usual, we define the convex conjugate of \(\phi \) by
As soon as \(\phi \) is strictly convex, \(\phi ^*(t)>\phi (0)=0\), for any \(t > \phi ^{\prime }(0) = \int \frac{f_{\theta _0}}{f_\theta } \mathrm{d }\mu \). We can now write, for \(0 \le \lambda \le e^{-2\rho }\),
Then we get, \(\forall t > \int \frac{f_{\theta _0}}{f_\theta } \mathrm{d }\mu \),
So that, taking the infimum on \(\lambda \), we get
We can obtain the same kind of bound for \(t < \int \frac{f_{\theta _0}}{f_\theta }\mathrm{d }\mu \), and the Borel–Cantelli lemma then yields the \(\mathbb P _{f_{\theta _0}}\)-almost sure convergence of \(\frac{1}{m_n}X_n^T \Lambda _n X_n\) to \(\int \frac{f_{\theta _0}}{f_\theta } \mathrm{d }\mu \).
To prove the same convergence with \(\Lambda _n = (\mathcal K _n(f_\theta ))^{-1}\), we have to show that the difference between the spectral empirical measure of \( \mathcal K _n(f_{\theta _0})^{\frac{1}{2}}\mathcal K _n(\frac{1}{f_\theta })\mathcal K _n(f_{\theta _0})^{\frac{1}{2}} \) and \( \mathcal K _n(f_{\theta _0})^{\frac{1}{2}}\mathcal K _n(f_\theta )^{-1}\mathcal K _n(f_{\theta _0})^{\frac{1}{2}} \) converges weakly to zero. It is sufficient to control the convergence of every moment, because these two last measures both have compact support.
For this, we make use of the Schatten norms. For any \(A,B \) matrices of \(M_{m_n}(\mathbb R )\), we define
where \(s_k(A)\) are the singular values of \(A\).
Note that
Recall that since \(f_\theta \in \mathcal F _\rho \), we have \(e^{-\rho } \le f_\theta \le e^\rho \). Hence, for any \(p \ge 1\),
To obtain the same bound with \(\Lambda _n = \mathcal Q _{n}(\frac{1}{f_\theta })\), we have to prove that the difference between the spectral empirical measures of \( \mathcal K _n(f_{\theta _0})^{\frac{1}{2}}\mathcal K _n(\frac{1}{f_\theta })\mathcal K _n(f_{\theta _0})^{\frac{1}{2}} \) and \( \mathcal K _n(f_{\theta _0})^{\frac{1}{2}}\mathcal Q _n(\frac{1}{f_\theta })\mathcal K _n(f_{\theta _0})^{\frac{1}{2}} \) converges weakly to zero. This last assertion is a direct consequence of Lemma 6.2. So, we get
\(\square \)
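The Schatten norms used in the proof above are straightforward to compute numerically. A minimal sketch (the helper name `schatten` is ours): the Schatten \(p\)-norm is the \(\ell^p\) norm of the singular values, \(p=2\) recovers the Frobenius norm, and the Hölder-type inequality \(\Vert AB\Vert_1 \le \Vert A\Vert_2 \Vert B\Vert_2\) is the matrix analogue of Cauchy–Schwarz.

```python
import numpy as np

def schatten(A, p):
    """Schatten p-norm: the l^p norm of the vector of singular values of A."""
    s = np.linalg.svd(A, compute_uv=False)
    return float(np.sum(s ** p) ** (1.0 / p))

rng = np.random.default_rng(1)
A = rng.standard_normal((8, 8))
B = rng.standard_normal((8, 8))

# Hoelder-type inequality for Schatten norms: ||AB||_1 <= ||A||_2 ||B||_2
print(schatten(A @ B, 1) <= schatten(A, 2) * schatten(B, 2))  # True
# p = 2 recovers the Frobenius norm
print(abs(schatten(A, 2) - np.linalg.norm(A, 'fro')) < 1e-9)  # True
```

The trace-norm bound \(\Vert \cdot \Vert_1\) is what controls the difference of moments of the two spectral empirical measures in the proof.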
Proof of Lemma 6.5 Recall that we aim at proving that, \(\mathbb P _{f_{\theta _0}}\)-a.s., the sequence of functions
is equicontinuous on \(\left\{ f_\theta , \theta \in \Theta \right\} \), and that this property also holds for \(\bar{l}_n,\,\tilde{l}_n\) and \(l^{(u)}_n\).
First, we will prove the equicontinuity of the sequence
Let \(\theta , \theta ^{\prime } \in \Theta \).
Denote by \(\lambda _i\) the eigenvalues of \(\mathcal K _n(f_{\theta ^{\prime }})^{-1}\left( \mathcal K _n(f_{\theta ^{\prime }})- \mathcal K _n(f_\theta )\right) \). Since \(f_\theta \in \mathcal F _\rho \), we have \(e^{-\rho } \le f_\theta \le e^\rho \).
Notice that we have
So that, to prove the equicontinuity, we may assume that \(\theta \) is close enough to \(\theta ^{\prime }\) to ensure that \(\sup _{i = 1,\ldots , m_n}\left| \lambda _i\right| \le \frac{1}{2}\).
We have
Furthermore, the sequence \((\int \log (f_\theta ) \mathrm{d }\mu )_{n \in \mathbb N }\) is also equicontinuous since, using a Taylor formula,
Now we tackle the equicontinuity of the sequences \(\left( X_n^T \mathcal K _n(f_\theta )^{-1}X_n\right) _{n \in \mathbb N },\) \(\left( X_n^T \mathcal K _n(\frac{1}{f_\theta })X_n\right) _{n \in \mathbb N }\) and \(\left( X_n^T \mathcal Q _n(\frac{1}{f_\theta })X_n\right) _{n \in \mathbb N }.\)
Notice first that, for any matrix \(B \in M_n(\mathbb R )\),
It is thus sufficient to prove the equicontinuity of the sequences \((\mathcal K _n(f_\theta )^{-1})_{n \in \mathbb N },\) \((\mathcal K _n(\frac{1}{f_\theta }))_{n \in \mathbb N }\) and \((\mathcal Q _n(\frac{1}{f_\theta }))_{n \in \mathbb N }\), for the norm \(\left\| . \right\| _{2,op}\).
Note that
Then,
Then, recall that, for any symmetric matrix \(B \in M_n(\mathbb R )\), we have
Recall also that \(\mathcal Q _n(f_\theta ) = B^{(n)}\odot \mathcal K _n(f_\theta )\). Denote
Since the map \(f_\theta \mapsto \frac{1}{f_\theta }\) is continuous over \(\mathcal F _\rho \), which is compact, we get the uniform equicontinuity of the map \(f_\theta \mapsto X_n^T \mathcal Q _n(\frac{1}{f_\theta })X_n\) (for the norm \(\left\| . \right\| _{1,pol}\)).
This concludes the proof of Lemma 6.5. \(\square \)
Proof of Lemma 4.3 We aim at proving the asymptotic normality of \(\sqrt{m_n}(l_n^{(u)})^{\prime }(\theta _0)\). Using the Fourier transform, it is sufficient to prove that
Recall that we have
We can compute
If we define
and
the last equality means that
This holds only if \(f_{\theta _0}\) is a polynomial, or if all the \(f_\theta , \theta \in \Theta \), are polynomials. This explains why the second theorem holds in the \(AR_P\) or \(MA_P\) cases, and it also explains the term ’unbiased estimator’ used for \(\theta ^{(u)}\).
Then, it is sufficient to show
If \(\tau _k\) denotes the eigenvalues of the symmetric matrix \(M_n :=\frac{t}{2}\mathcal K _n(f_{\theta _0})^{\frac{1}{2}}\mathcal Q _n(\frac{f^{\prime }_{\theta _0}}{f_{\theta _0}^2})\mathcal K _n(f_{\theta _0})^{\frac{1}{2}},\) then we can write
where \((Y_k)_{k \in G_n}\) has the standard Gaussian distribution on \(\mathbb R ^{m_n}\).
The independence of the \(Y_k\) leads to
The \(\tau _k\) are bounded, thanks to the following inequality:
using that \(|t| \le e^\rho \) since \(t \in \mathrm{Supp}(\mu )\), that \( \Vert \mathcal K _n(f_{\theta _0})^{\frac{1}{2}}\Vert _{2,op} \le e^{\rho /2}\), and the bound \(\Vert \mathcal Q _n(\frac{f^{\prime }_{\theta _0}}{f_{\theta _0}^2})\Vert _{1,op} \le (1+u_n) \alpha (\frac{f^{\prime }_{\theta _0}}{f_{\theta _0}^2}) \le (1+u_n) e^{3 \rho }\).
The Taylor expansion of \(\log (1-2\frac{\tau _k}{\sqrt{m_n}})\) gives
with \(\left| R_n\right| \le C \frac{1}{m_n\sqrt{m_n}} \sum _{k = 1}^{m_n} \left| \tau _k\right| ^3.\)
Since the \(\tau _k\) are bounded, the assertion will be proved if we show that
This last convergence is a consequence of Lemmas 6.1 and 6.2.
This provides the asymptotic normality of \(\sqrt{m_n}(l_n^{(u)})^{\prime }(\theta _0)\) and concludes the proof of Lemma 4.3:
\(\square \)
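The variance computation at the heart of the proof above can be sanity-checked by Monte Carlo. This is an illustrative sketch, not the paper's exact quantities: we take arbitrary bounded stand-ins for the eigenvalues \(\tau_k\) and verify that \(S = \frac{1}{\sqrt{m}}\sum_k \tau_k (Y_k^2 - 1)\) has mean \(0\) and variance \(\frac{2}{m}\sum_k \tau_k^2\), since \(\mathrm{Var}(Y_k^2) = 2\) for standard Gaussians.

```python
import numpy as np

rng = np.random.default_rng(2)
m = 50
tau = rng.uniform(-1, 1, m)   # bounded stand-ins for the eigenvalues tau_k

# Simulate S = (1/sqrt(m)) sum_k tau_k (Y_k^2 - 1) with Y_k i.i.d. N(0, 1)
reps = 100000
Y = rng.standard_normal((reps, m))
S = ((Y ** 2 - 1) @ tau) / np.sqrt(m)

print(S.mean(), S.var(), 2 * np.sum(tau ** 2) / m)
# empirical mean near 0; empirical variance near the theoretical 2 * sum(tau^2) / m
```

This matches the limiting variance appearing in the Taylor expansion of \(\log (1-2\tau _k/\sqrt{m_n})\) in the proof of Lemma 4.3.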
Proof of Lemma 4.4 We now aim at proving the following \(P_{f_{\theta _0}}\)-a.s. convergence:
We have
which leads to
Since the sequence \(l_n^{(u)}\) is equicontinuous and \(\breve{\theta }_n \underset{n \rightarrow \infty }{\rightarrow } \theta _0\), we obtain the desired convergence:
\(\square \)
Proof of Lemma 4.5 We want to compute the asymptotic Fisher information. As usual, it is sufficient to compute
where \(M_n(\theta ) = \mathcal K _n(f_\theta )^{-1}\mathcal K _n(f^{\prime }_{\theta })\mathcal K _n(f_\theta )^{-1}\mathcal K _n(f_{\theta _0})\).
This leads, together with Lemma 6.1 and Assumption 4.3, to
This ends the proof of the last lemma.\(\square \)
Espinasse, T., Gamboa, F. & Loubes, JM. Parametric estimation for Gaussian fields indexed by graphs. Probab. Theory Relat. Fields 159, 117–155 (2014). https://doi.org/10.1007/s00440-013-0503-2
Mathematics Subject Classification
- 62M15
- 62-09