Likelihood theory for the Graph Ornstein-Uhlenbeck process

We consider the problem of modelling restricted interactions between continuously-observed time series as given by a known static graph (or network) structure. For this purpose, we define a parametric multivariate Graph Ornstein-Uhlenbeck (GrOU) process driven by a general Lévy process to study the momentum and network effects amongst nodes, effects that quantify the impact of a node on itself and that of its neighbours, respectively. We derive the maximum likelihood estimators (MLEs) and their usual properties (existence, uniqueness and efficiency) along with their asymptotic normality and consistency. Additionally, an Adaptive Lasso approach, or a penalised likelihood scheme, infers both the graph structure and the GrOU parameters concurrently and is shown to satisfy similar properties. Finally, we show that the asymptotic theory extends to the case when stochastic volatility modulation of the driving Lévy process is considered.


Introduction
Ornstein-Uhlenbeck (OU) models, driven by Brownian motion or Lévy processes, form a class of continuous-time models with a broad range of applications: in finance for pairs trading (Endres & Stübinger 2019, Holỳ & Tomanová 2018) and volatility modelling (Barndorff-Nielsen & Shephard 2001, Pigorsch & Stelzer 2009b), in neuroscience (Melanson & Longtin 2019), or even in electricity management (Longoria et al. 2018). In parallel, high-dimensional time series datasets fostered the development of sparse inference for OU-type processes (Boninsegna et al. 2018, Gaïffas & Matulewicz 2019) as a way to control interactions within complex systems. On the other hand, graphical time series models are usually restricted to discrete-time time series (Knight et al. 2020, 2016, Zhu et al. 2017).
From this observation, we introduce a mean-reverting process formulated as a graphical model and we derive its theoretical properties. This model contributes to the development of methodologies leveraging both the modelling flexibility of continuous-time models and the sparsity of graphical models. Furthermore, this article unlocks both hypothesis testing and uncertainty estimation in this setting.
In our framework, a graph is a set of nodes that interact together through a collection of links, or edges. Each node is described by a numerical value at any time and we observe a multivariate time series, where each component is the value of one node at a given time. To accommodate this graph structure, we introduce the Graph Ornstein-Uhlenbeck (GrOU) process, a parametric autoregressive model placed on a graph and driven by a Lévy process (Masuda 2004, 2007). This strictly stationary model quantifies the impact of a node value on its increments, the momentum effect, and the impact of the neighbouring nodes, the network effect. It is interpreted as a continuous-time extension to the discrete-time Network Vector Autoregression (NAR) model (Zhu et al. 2017), a model used for the modelling of social media, financial returns (Chen et al. 2020) or risk dynamics (Chen et al. 2019). We also introduce an alternative formulation with separate momentum and network effects for each node for a finer modelling of the graph dynamics.
For more details, the general multivariate Lévy-driven OU process and its ergodic properties are presented in Kevei (2018), Masuda (2004, 2007), Sandrić (2016). We refer to Sørensen (1991) for a general theory of the likelihood inference for continuously-observed jump diffusions.
This article focuses on providing the theoretical guarantees necessary for applications. First, we introduce the Lévy-driven GrOU model. Although the estimation of Lévy-driven OU processes has been considered extensively (Gushchin et al. 2020, Masuda 2010), to the best of the authors' knowledge, the formulation of the GrOU process is the first of its kind.
Second, under non-degeneracy conditions, we show that this model belongs to an exponential family (Küchler & Sørensen 1997, Mancini 2009) with a fully-explicit likelihood function. We derive corresponding maximum likelihood estimators (MLEs) to estimate the momentum and network effects. Those estimators are shown to be consistent and efficient in the sense of Hájek-Le Cam's convolution theorem (Hájek 1970, Le Cam & Lo Yang 1990). Those results extend the continuous-time asymptotic results given in the Lévy-driven univariate setting (Mai 2014) and the multivariate case with a Brownian noise (Basak et al. 2008, Höpfner 2014). Then, we prove that this family of models is locally asymptotically normal (LAN) (Le Cam & Lo Yang 1990) and provide a couple of central limit theorems (CLTs) under square integrability conditions. In addition, we extend the adaptive Lasso regularisation scheme from Gaïffas & Matulewicz (2019) to the Lévy-driven case and prove the asymptotic normality of the estimator. This approach can be used when the graph structure is not known a priori: our approach can be applied not only to graphical time series but also to general multivariate time series with sparse dependencies.
Finally, by assuming that the covariance matrix of the Lévy process is itself a matrix-valued OU process, i.e. both time-dependent and stochastic, the GrOU model is equipped with a stochastic covariance modulation term (Pigorsch & Stelzer 2009b) along with a pure-jump term. We show that the resulting process is still ergodic using the mixing criterion from Fuchs & Stelzer (2013). The asymptotic results are shown to hold conditional on the knowledge of this volatility process and open the door to a more complex modelling of graphical interactions of time series.
The GrOU process is presented in Section 2 where we describe formally the momentum and network effects along with non-degeneracy conditions. Then, Section 3 is devoted to setting up the likelihood framework and to proving the existence and uniqueness of the MLEs. In Section 4, both the LAN property of the model along with the Hájek-Le Cam efficiency of the MLEs are provided. We also present the MLE CLTs with an explicit limiting covariance matrix. We present an Adaptive Lasso scheme as well as extend known Brownian asymptotic properties to the Lévy-driven case in Section 5. Finally, in Section 6, we extend further our framework to include an OU-type stochastic volatility term and we show that, conditional on the knowledge of the stochastic volatility process, the central limit theorems from Sections 4 & 5 hold.

A Graph Ornstein-Uhlenbeck process
In this section, we define the Graph Ornstein-Uhlenbeck (GrOU) process to study either the aggregated or individual behaviour of nodes on the graph.

Notations
We consider a filtered probability space (Ω, F, (F t , t ∈ R), P 0 ) to which all stochastic processes are adapted. We consider two-sided Lévy processes (L t , t ∈ R) (i.e. stochastic processes with stationary and independent increments, continuous in probability, and with L 0 = 0, P 0 -a.s.) (Brockwell 2009, Remark 1) which are without loss of generality assumed to be càdlàg, and we write Y t− := lim s↑t Y s for any t ∈ R. For any probability measure P, we denote by P t its restriction to the σ-field F t for any t ∈ R.
We denote by det the matrix determinant, the space of {0, 1}-valued d × d matrices by M d ({0, 1}), the space of real-valued d × d (resp. n × d) matrices by M d (R) (resp. M n,d (R)), the linear subspace of d × d symmetric matrices by S d , the (closed in S d ) positive semidefinite cone (i.e. with the real parts of their eigenvalues non-negative) by S + d and the (open in S d ) positive definite cone (i.e. with the real parts of their eigenvalues positive) by S ++ d . We denote by λ leb the one-dimensional Lebesgue measure. For a non-empty topological space S, B(S) is the Borel σ-algebra on S and π is some probability measure on (S, B(S)). The collection of all Borel sets in S × R with finite π ⊗ λ leb -measure is written as B b (S × R). Also, the norms of vectors and matrices are denoted by ∥ · ∥. We usually take the Euclidean (or Frobenius) norm but, due to the equivalence between norms, our results are not norm-specific and are valid under any norm in R d or M d (R). In addition, for an invertible matrix M ∈ M d (R), we define ⟨x, y⟩ M := x ⊤ M −1 y for x, y ∈ R d .

In this article, ⊗ denotes the Kronecker matrix product, ⊙ is the Hadamard (elementwise) matrix product and vec is the vectorisation transformation where columns are stacked on one another. We denote the inverse vectorisation transformation by vec −1 : R d 2 → M d (R), which satisfies vec −1 (vec(M )) = M for any M ∈ M d (R).

The Lévy-driven Ornstein-Uhlenbeck process
We recall the construction of Ornstein-Uhlenbeck (OU) processes and introduce their graphical interpretation, namely the GrOU process. We give two parametrisations specific to the modelling of graph structures namely the θ-GrOU and the ψ-GrOU: the former encodes the momentum and network effects for the whole graph into two scalar parameters while the latter has separate parameters for each node and each neighbour.
The Lévy process L is defined by the Lévy-Khintchine characteristic triplet (b, Σ, ν) with respect to the truncation function τ (z) := I {x∈R d : ∥x∥≤1} (z), where I denotes the indicator function. More explicitly, the Lévy-Khintchine representation yields, for t ≥ 0 and u ∈ R d ,

E[ exp(iu ⊤ L t ) ] = exp ( t ( iu ⊤ b − (1/2) u ⊤ Σu + ∫ R d ( e iu ⊤ z − 1 − iu ⊤ z τ (z) ) ν(dz) ) ).

Again, without loss of generality, consider that P 0 is a probability measure under which (L t , t ∈ R) is a càdlàg Lévy process as mentioned above. Also, we denote by P t,0 the probability measure P 0 restricted to the σ-field F t as introduced in Section 2.1.

The OU process on a graph
The components of Y are interpreted as the nodes of a graph structure linked together through a collection of edges. Those are given in an adjacency (or graph topology) matrix A = (a ij ) ∈ M d ({0, 1}): a ij = 1 for an existing link between nodes i and j, 0 otherwise. We assume that a ii = 0 for all i ∈ {1, . . . , d}. Requiring the knowledge of the adjacency matrix is necessary for the graphical interpretation of the OU process. This limitation is alleviated in the sparse inference scheme introduced in Section 5.
Assumption 1. We assume that A is deterministic, static in time and known.
For i ∈ {1, . . . , d}, define n i := 1 ∨ Σ j≠i a ij , which counts the number of neighbouring nodes the i-th node is connected to, or node degree. We can now define the row-normalised adjacency matrix A := diag(n 1 −1 , . . . , n d −1 )A to normalise the parameter values representing the momentum and network effects which we introduce below.
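As a concrete illustration (the three-node path graph below is ours, not from the text), the node degrees n i and the row-normalised adjacency matrix can be computed as:

```python
import numpy as np

def row_normalised(A):
    """Row-normalised adjacency matrix diag(n_1^-1, ..., n_d^-1) A,
    where n_i = 1 v (number of neighbours of node i)."""
    A = np.asarray(A, dtype=float)
    n = np.maximum(1.0, A.sum(axis=1))   # node degrees n_i, floored at 1
    return A / n[:, None]

# 3-node path graph: 1 -- 2 -- 3
A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]])
Abar = row_normalised(A)
print(Abar[1])  # the middle node averages over its two neighbours
```

Each row of the resulting matrix sums to one (for connected nodes), which is what makes the network parameter comparable across nodes of different degrees.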
The θ-GrOU process. We introduce the two-dimensional parameter vector θ := (θ 1 , θ 2 ) ⊤ ∈ R 2 describing the network effect and the momentum effect respectively, and define the matrix

Q(θ) := θ 2 I + θ 1 A,    (2)

where I is the d × d identity matrix. The row-normalised adjacency matrix A makes the momentum parameter θ 2 and the network parameter θ 1 in (2) directly comparable with one another independently of the node degrees. This yields the SDE

dY t = −Q(θ)Y t− dt + dL t

(Equation (1) with Q = Q(θ)), whose i-th component satisfies the equation

dY t (i) = −( θ 2 Y t− (i) + θ 1 n i −1 Σ j≠i a ij Y t− (j) ) dt + dL t (i) .

The ψ-GrOU process. Recall that we denote by ⊙ the Hadamard product and by vec −1 the inverse vectorisation transformation. In general, for a d 2 -dimensional vector ψ ∈ R d 2 , we define

Q(ψ) := vec −1 ( vec(I + A) ⊙ ψ ),    (3)

such that the corresponding SDE is written as dY t = −Q(ψ)Y t− dt + dL t . This second parametrisation alleviates the scarcity of network interactions imposed by θ yet exposes the estimation to the curse of dimensionality as the number of nodes d grows. For simplicity, one may write Q for Q(θ) or Q(ψ) when the context is clear. With Q(θ), we restrict the interactions to the network and momentum effects. This extends the current framework to partially observable networks (e.g. too large for computations) where an exploration process needs to take place (Dereich & Mörters 2013, Section 5). This makes the estimation robust against the curse of dimensionality coming from the number of nodes. We now define the GrOU process as follows.
Definition 1. The Graph Ornstein-Uhlenbeck (GrOU) process is a càdlàg process (Y t , t ≥ 0) satisfying Equation (1) for some two-sided Lévy noise (L t , t ∈ R) where Q is given by either Equation (2) or by Equation (3) such that Q is positive definite. This process is then called a θ-GrOU process or a ψ-GrOU process, respectively.
We give sufficient conditions for Q to be positive definite in both cases in Section 2.4.
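A short simulation can make the definition concrete. The sketch below builds Q(θ) under the convention that its diagonal equals θ 2 (consistent with the Geršgorin argument of Section 2.4) and discretises the GrOU SDE by an Euler-Maruyama scheme with a Brownian driver, the simplest Lévy choice; the graph and parameter values are illustrative, not taken from the text.

```python
import numpy as np

rng = np.random.default_rng(0)

def q_theta(theta1, theta2, Abar):
    """Dynamics matrix Q(theta) = theta_2 I + theta_1 Abar (illustrative
    convention: diagonal equals the momentum parameter theta_2)."""
    d = Abar.shape[0]
    return theta2 * np.eye(d) + theta1 * Abar

def simulate_grou(Q, T=50.0, dt=1e-2, sigma=1.0):
    """Euler-Maruyama sketch of dY_t = -Q Y_t dt + sigma dW_t,
    i.e. a Brownian-driven GrOU path started at the origin."""
    d = Q.shape[0]
    n = int(T / dt)
    Y = np.zeros((n + 1, d))
    for k in range(n):
        dW = rng.normal(scale=np.sqrt(dt), size=d)
        Y[k + 1] = Y[k] - Q @ Y[k] * dt + sigma * dW
    return Y

# row-normalised adjacency of a 3-node path graph (illustrative)
Abar = np.array([[0., 1., 0.], [.5, 0., .5], [0., 1., 0.]])
Q = q_theta(theta1=0.3, theta2=1.0, Abar=Abar)  # theta_2 > |theta_1|
Y = simulate_grou(Q)
print(Y.mean(axis=0))  # mean reversion keeps the path centred near zero
```

Since θ 2 > |θ 1 |, the simulated dynamics matrix has eigenvalues with positive real parts, so the path mean-reverts as expected.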

Stationary solution
Recall that S ++ d is the set of d × d matrices such that the real parts of the eigenvalues are positive. Since we restrict ourselves to the study of strictly stationary time series, we give the known conditions under which the OU process is strictly stationary. Given the standard OU process theory from Brockwell et al. (2007), Masuda (2004), we assume the following:

Assumption 2. Suppose that Q ∈ S ++ d and that the Lévy measure ν(·) satisfies the log moment condition

∫ ∥z∥>1 log ∥z∥ ν(dz) < ∞.

Then, under Assumption 2, there is a unique strictly stationary solution of the above SDE given by

Y t = e −(t−s)Q Y s + ∫ s t e −(t−u)Q dL u ,

for any t ≥ s ≥ 0.
Recall that the Lévy-Khintchine characteristic triplet of L with respect to the truncation function τ is denoted (b, Σ, ν). Proposition 2.1, Masuda (2004) yields that the transition probability from x at time t, denoted P(t, x, ·), is characterised by an explicit triplet whose limit as t → ∞ gives the characteristic triplet of the invariant distribution.

Proposition 2.4.2. If θ 2 > 0 is such that θ 2 > |θ 1 |, then Q(θ) ∈ S ++ d .

Proof. Recall that Geršgorin's circle theorem states that any eigenvalue of Q(θ) is found in a closed disc of centre Q ii = θ 2 and radius equal to the absolute sum of the non-diagonal entries of the i-th row, Σ j≠i |Q ij |, for some i ∈ {1, . . . , d}. Since A is row-normalised and given that θ 2 > |θ 1 |, we conclude that the eigenvalues must lie in a closed disc with centre θ 2 and radius |θ 1 |. This disc is contained in the open right half-plane and does not contain the origin. This yields that all eigenvalues of Q(θ) have strictly positive real parts.
Proposition 2.4.2 makes practical sense as it requires that the auto-regressive part of the model is predominant in absolute terms. For the ψ-GrOU formulation, we obtain the following:

Proposition 2.4.3. If the diagonal elements of vec −1 (ψ) are positive and strictly dominate the absolute row sums of the corresponding off-diagonal elements of Q(ψ), then Q(ψ) ∈ S ++ d . In particular, vec −1 (ψ) ∈ M d (R) then has positive diagonal elements.

Proof. Compute the absolute sum of the off-diagonal elements of the i-th row of Q(ψ). Similarly to the proof of Proposition 2.4.2, applying Geršgorin's circle theorem yields the expected result.
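Both positivity conditions are easy to check numerically. The sketch below (illustrative graph and parameter values) compares the Geršgorin sufficient condition with the actual spectrum, under the convention that the diagonal of Q carries the momentum parameter.

```python
import numpy as np

def is_grou_stable(Q):
    """Check that all eigenvalues of Q have strictly positive real parts,
    i.e. Q lies in S++_d as required for strict stationarity."""
    return bool(np.all(np.linalg.eigvals(Q).real > 0))

def gershgorin_sufficient(Q):
    """Sufficient condition via Gersgorin's circle theorem: each diagonal
    entry strictly dominates the absolute row sum of the off-diagonals."""
    diag = np.diag(Q)
    off = np.abs(Q).sum(axis=1) - np.abs(diag)
    return bool(np.all(diag > off))

# illustrative example: Q = theta_2 I + theta_1 Abar with theta_2 > |theta_1|
Abar = np.array([[0., 1., 0.], [.5, 0., .5], [0., 1., 0.]])
theta1, theta2 = -0.4, 1.0
Q = theta2 * np.eye(3) + theta1 * Abar
print(gershgorin_sufficient(Q), is_grou_stable(Q))  # True True
```

Note that the Geršgorin check is only sufficient: a matrix can fail it while still having all eigenvalue real parts positive.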
Remark 2.4.4. The model defined in (1) is a generalisation of the (discrete-time) Vector Autoregressive (VAR) model (Sims 1980) with a depth of 1. For a step size ∆ > 0, we then denote the sampled process by X j := Y j∆ . Hence we have the VAR(1)-representation X j = Φ X j−1 + ε j , where the parameter matrix is given by Φ := e −∆Q and the i.i.d. noise sequence by ε j := ∫ (j−1)∆ j∆ e −(j∆−u)Q dL u . This VAR-like formulation with a fixed step size ∆ is studied in Fasen (2013) where one estimates e −∆Q as a d × d matrix directly. However, a strong identifiability issue hinders the estimation of Q from e −∆Q : the logarithm of a matrix (or log-matrix) is not necessarily itself a real matrix and, if so, it may not be unique (Culver 1966, p. 1146). The former holds if and only if e −∆Q is nonsingular and each elementary divisor (Jordan block) of e −∆Q belonging to a negative eigenvalue occurs an even number of times (Culver 1966, Th. 1). Since e −∆Q has only positive eigenvalues, a real-valued log-matrix always exists. On the other hand, the uniqueness is more difficult to ascertain. It requires that all the eigenvalues of e −∆Q are positive and real and that no elementary divisor (or Jordan block) of e −∆Q belonging to any eigenvalue appears more than once (Culver 1966, Th. 2). Those conditions are difficult to check in practice: the eigenvalues λ of Q have positive real parts but may have non-zero imaginary parts such that e −∆λ may be non-real. Also, the Jordan block assumption does not hold in general for all Q ∈ S ++ d . With this issue in mind, we introduce a fully-explicit likelihood function in Section 3, whose logarithm is a well-defined real-valued function.
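The identifiability discussion can be illustrated numerically: in the benign case where Q has real positive eigenvalues, e −∆Q has positive eigenvalues and the principal matrix logarithm recovers Q. SciPy's expm/logm are used here purely for illustration; the matrices chosen are ours.

```python
import numpy as np
from scipy.linalg import expm, logm

# Q with real, positive eigenvalues (illustrative choice)
Abar = np.array([[0., 1., 0.], [.5, 0., .5], [0., 1., 0.]])
Q = 1.0 * np.eye(3) + 0.3 * Abar
delta = 0.1

Phi = expm(-delta * Q)            # VAR(1) transition matrix e^{-Delta Q}
Q_rec = -logm(Phi) / delta        # principal matrix logarithm

print(np.max(np.abs(Q_rec - Q)))  # recovery succeeds in this benign case
```

When Q has complex eigenvalues, e −∆Q may have non-real or repeated eigenvalues and the round trip above can fail or become non-unique, which is exactly the issue raised in the remark.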

Likelihood and estimators
Suppose we observe the process Y continuously on [0, T ] for T ∈ R ∪ {∞} and let t ∈ [0, T ]. In this section, we present the likelihood framework of interest and derive closed-form formulas for the θ-GrOU and ψ-GrOU MLEs.

Ergodicity
Recall that the ergodic theorem for a process (Y t , t ∈ R) satisfying Equation (1) is given by:

Theorem (Masuda (2007)). Suppose that Assumptions 1 and 2 hold. Then (Y t , t ∈ R) admits a unique invariant distribution π for any choice of the law η of the initial value Y 0 . Moreover, for any measurable function g : R d → R k satisfying E[ ∥g(Y ∞ )∥ ] := ∫ R d ∥g(y)∥ π(dy) < ∞ and for any η, we have, P η -almost surely,

(1/t) ∫ 0 t g(Y s ) ds → E[ g(Y ∞ ) ],  as t → ∞,

where P η is the law of Y associated with the initial value Y 0 ∼ η.
This result is essential for the asymptotic properties of the model statistical inference and was further extended in Kevei (2018), Sandrić (2016). We present the general likelihood framework used for GrOU processes.
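The ergodic theorem can be illustrated numerically: for a one-dimensional Brownian-driven OU process with g(y) = y 2 , the time average (1/T) ∫ 0 T Y s 2 ds should approach the stationary second moment σ 2 /(2θ). The sketch below uses illustrative parameter values.

```python
import numpy as np

rng = np.random.default_rng(1)

# 1-d OU process dY = -theta Y dt + sigma dW, Euler discretisation
theta, sigma, dt, T = 2.0, 1.0, 1e-2, 500.0
n = int(T / dt)

Y = np.zeros(n + 1)
noise = rng.normal(scale=sigma * np.sqrt(dt), size=n)
for k in range(n):
    Y[k + 1] = Y[k] - theta * Y[k] * dt + noise[k]

time_avg = np.mean(Y ** 2)                  # (1/T) int_0^T Y_s^2 ds
stationary_var = sigma ** 2 / (2 * theta)   # E[Y_inf^2] under the invariant law
print(time_avg, stationary_var)
```

For a long horizon T the two quantities agree up to Monte Carlo and discretisation error, which is exactly the almost-sure convergence the theorem asserts.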

The fully-explicit likelihood function
To infer the model parameters, we set up an explicit likelihood function to be maximised. For a general positive definite dynamics matrix Q, the Radon-Nikodym derivative of the corresponding d-dimensional Ornstein-Uhlenbeck process (Eq. 1) is expressed as follows (Mai 2014, Pap & van Zuijlen 1996):

dP t,Y / dP t,0 (Y) = exp ( − ∫ 0 t ⟨QY s− , dY s c ⟩ Σ − (1/2) ∫ 0 t ⟨QY s− , QY s− ⟩ Σ ds ),    (6)

where Y s c is the continuous P 0 -martingale part of Y and P Y is a probability measure equivalent to P 0 such that the process in Equation (6) is a martingale. Similarly to P t,0 , P t,Y is the restriction of P Y to F t for any t ∈ R. We note that, since Σ ∈ S ++ d , Σ is invertible and hence Equation (6) is well-defined.
Remark 3.2.1. In the rest of the article, we write t = T (i.e. t is the time horizon itself ) and consider the likelihood at time t directly.

The case of the θ-GrOU process
In this section, we write the likelihood for θ-GrOU processes and obtain the corresponding maximum likelihood estimator along with its existence and uniqueness. Consider the sufficient statistics H t ∈ R 2 and [H] t ∈ M 2 (R) (Notation 3.3.1), obtained from the linear and quadratic integral functionals of Y in Equation (6) once Q(θ) is substituted for Q. We then set up a likelihood framework (in the sense of Section 2, Morales et al. (2000)) for the θ-GrOU process as follows:

Proposition 3.3.3. From Equation (6), we obtain the likelihood ratio

L t (θ; Y) = exp ( θ ⊤ H t − (1/2) θ ⊤ [H] t θ ).

Additionally, t → H t and t → θ ⊤ · [H] t · θ (for a fixed θ) are càdlàg, hence bounded and F t -measurable for any finite t ∈ R.
Proof. According to Lemma 3.3.2 and by continuity, H and [H] are finite for any finite t ∈ R. By the properties of the stochastic integral with respect to the continuous martingale part of Y, t → H t is indeed càdlàg.
From Proposition 2.4.2, we define a compact set Θ ⊂ {θ = (θ 1 , θ 2 ) ⊤ ∈ R 2 : θ 2 > |θ 1 |}. We state the main result on the existence and uniqueness of the continuous-time Maximum Likelihood Estimator (MLE) using Notation 3.3.1:

Theorem 3.3.4 (θ-GrOU MLE with continuous-time observations). Suppose that Assumptions 1 and 2 hold. Assume that the θ-GrOU process Y is observed in continuous time and has finite second moments. Then, the MLE θ t on the compact set Θ solves the score equation [H] t θ t = H t . Moreover, the MLE θ t exists almost surely and uniquely under P t,Y .
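Since the likelihood is of exponential-family form, the MLE solves a two-dimensional linear system in the sufficient statistics. The sketch below approximates those statistics by Itô/Riemann sums on a discretely sampled path, assuming Σ = I and a Brownian driver; the explicit integral forms and all names are illustrative reconstructions, not the paper's notation.

```python
import numpy as np

rng = np.random.default_rng(2)

# Simulate a theta-GrOU path with known parameters (illustrative values)
Abar = np.array([[0., 1., 0.], [.5, 0., .5], [0., 1., 0.]])
theta_true = np.array([0.3, 1.2])                 # (theta_1, theta_2)
Q = theta_true[1] * np.eye(3) + theta_true[0] * Abar

dt, T = 1e-2, 500.0
n = int(T / dt)
Y = np.zeros((n + 1, 3))
for k in range(n):
    Y[k + 1] = Y[k] - Q @ Y[k] * dt + rng.normal(scale=np.sqrt(dt), size=3)

# Discretised sufficient statistics (assuming Sigma = I):
# linear term from -int <.,dY>, quadratic term from int <.,.> ds
dY = np.diff(Y, axis=0)
AY = Y[:-1] @ Abar.T                              # Abar Y_s along the path
H = -np.array([np.sum(AY * dY), np.sum(Y[:-1] * dY)])
Hquad = np.array([[np.sum(AY * AY), np.sum(AY * Y[:-1])],
                  [np.sum(AY * Y[:-1]), np.sum(Y[:-1] * Y[:-1])]]) * dt
theta_hat = np.linalg.solve(Hquad, H)             # solves [H]_t theta = H_t
print(theta_hat)                                  # close to theta_true
```

For a long horizon, the estimate recovers both the network and momentum parameters, matching the t 1/2 -consistency discussed in Section 4.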
This concludes the presentation of the two-parameter estimator and the MLE for the ψ-GrOU process is defined next.

The case of the ψ-GrOU process
We focus on the ψ-GrOU processes and define their likelihood along with the corresponding MLE. From Basak et al. (2008), one deduces an intermediary result:

Lemma 3.4.1. Define K t := ∫ 0 t Y s Y s ⊤ ds. This matrix is P 0 -almost surely nonsingular for t large enough in the sense that lim inf t→∞ t −1 λ min (K t ) > 0, and we also have that λ max (K t ) = O(t), P 0 -a.s. Here, λ min and λ max are respectively the smallest and largest eigenvalues (which are real since K t is symmetric).

Proof. See Appendix A.3.
We define the equivalent to H t as follows:

Definition 2. We define the node-level integrated response vector as

A t Σ := − ∫ 0 t vec ( Σ −1 dY s c Y s− ⊤ ).

As hinted by Equation (3), we first derive the likelihood under unrestricted network interactions before applying the network topology (as a linear transformation of the former). The corresponding ψ-GrOU likelihood is formulated as follows:

Proposition 3.4.2. The likelihood with respect to ψ is given by

L t (ψ; Y) = exp ( ψ ⊤ A t Σ − (1/2) ψ ⊤ (K t ⊗ Σ −1 ) ψ ).

Proof. See Appendix A.4.
Remark 3.4.3. To obtain the MLE of vec(Q(ψ)), we factor in the network topology A by transforming linearly ψ into vec(I d×d + A) ⊙ ψ as given in Eq. (3). Therefore, there is no need to reformulate the likelihood given in Proposition 3.4.2.
We define a compact parameter set Ψ ⊂ R d 2 on which Q(ψ) is diagonally dominant (cf. Proposition 2.4.3): this ensures the well-definedness of the GrOU process. Similarly to the θ-GrOU case, we denote by {P ψ Y , ψ ∈ Ψ} the statistical models of Lévy-driven Ornstein-Uhlenbeck processes with respect to the likelihood from Proposition 3.4.2 indexed on Ψ.
We formulate an equivalent to Theorem 3.3.4 for the ψ-GrOU process:

Theorem 3.4.4 (ψ-GrOU MLE with continuous-time observations). Suppose that Assumptions 1 and 2 hold. Assume that the ψ-GrOU process Y is observed in continuous time and has finite second moments. Then, the MLE ψ t on the compact set Ψ solves the score equation (K t ⊗ Σ −1 ) ψ t = A t Σ . Moreover, the MLE ψ t exists almost surely and uniquely under P t,Y .

Asymptotic theory for the MLEs
Asymptotic properties of the MLEs are a necessary step towards formulating the hypothesis tests required for sound inference. In this section, we consider having access to a continuous flow of data and we derive the asymptotic normality of the aforementioned estimators as well as an augmented estimator on the whole Q matrix. We consider the estimators' efficiency in the sense of Hájek-Le Cam's convolution theorem (Hájek 1970, Section 2) under local asymptotic normality (Le Cam & Lo Yang 1990, Chapter 5, Section 6).

Asymptotics for θ-GrOU
We now prove the consistency of the MLE for the θ-GrOU process on a compact set Θ.
We denote by {P θ Y , θ ∈ Θ} the statistical models of Lévy-driven Ornstein-Uhlenbeck processes with respect to the likelihood from Proposition 3.3.3 indexed on Θ. We obtain the local asymptotic normality for this sequence (Lemma 4.1.2).

Proof. See Appendix A.6.

The central limit theorem for θ is given by:

Theorem 4.1.3. Suppose that Assumptions 1 and 2 hold, and that Y satisfies Equation (1) and has finite second moments for θ ∈ Θ. In addition, suppose that E exp(θ ⊤ H t ) < ∞ for t large enough. Then, the MLE θ t is asymptotically normal under P Y at the standard rate t 1/2 . Moreover, θ t is efficient in the sense of Hájek-Le Cam's convolution theorem.
Corollary 4.1.4. Under the same assumptions as in Theorem 4.1.3, one obtains an explicit limiting covariance matrix for θ t , which is positive definite by the Cauchy-Schwarz inequality.
Proof. Recall the result of Lemma 3.3.2; the conclusion then follows from the continuous mapping theorem.

Asymptotics for ψ-GrOU
We present a decomposition of the MLE ψ t in the following lemma:

Lemma 4.2.1. In the context of Theorem 3.4.4, the MLE ψ t decomposes into the true parameter ψ plus a remainder term driven by the martingale part of Y.

Similarly to Section 4.1, we denote by {P ψ Y : ψ ∈ Ψ} the statistical models of Lévy-driven Ornstein-Uhlenbeck processes with respect to the likelihood from Proposition 3.4.2 indexed on Ψ. We obtain the local asymptotic normality for this sequence (Lemma 4.2.2).

Proof. We can apply a similar argument to the proof of Lemma 4.1.2 in R d 2 .
As presented in Theorem 4.1.3, we obtain a central limit theorem for ψ-GrOU with continuous-time observations as follows:

Theorem 4.2.3. Assume that Assumptions 1 & 2 hold and that Proposition 2.4.3 applies. In addition, suppose that E exp(ψ ⊤ A t Σ ) < ∞ for t large enough and ψ ∈ Ψ. Then, ψ t is consistent and asymptotically normal at the rate t 1/2 . Moreover, ψ t is efficient in the sense of Hájek-Le Cam's convolution theorem.
We have derived an essential result for a general dynamics matrix (e.g. vec −1 (ψ)) with diagonal elements dominating the average off-diagonal parameters row-wise. We extend the result to include the network topology and derive a corollary for such a graph-constrained estimator as follows:

Corollary 4.2.4. In the same setting as in Theorem 4.2.3, the central limit theorem carries over to the graph-constrained estimator vec(Q( ψ t )) = D A · ψ t .

Proof. By a property of the Hadamard product, observe that vec(Q(ψ)) = vec(I d×d + A) ⊙ ψ = diag(vec(I d×d + A)) · ψ = D A · ψ. By Theorem 4.2.3, the result follows directly.
The term D A highlights the application of the network topology and yields a generalised form of the two-dimensional central limit theorem given in Corollary 4.1.4.
Remark 4.2.5. Fasen (2013) proved a similar central limit theorem but for the regression on e −∆Q itself, which remains an alternative to the MLE approach; however, the identifiability issues mentioned in Remark 2.4.4, Section 2.4, hinder the direct estimation of Q from e −∆Q .
Until this point, we have assumed the availability of the adjacency matrix. However, sparse stochastic processes have become increasingly influential in handling high-dimensional problems (Belomestny et al. 2019, Gaïffas & Matulewicz 2019, Ma et al. 2021). Regularisation through a penalty on the model parameters is an important component of this literature and we show in the next section that general Lévy-driven OU processes can be consistently transformed into GrOU processes in this context.

Asymptotic theory of the Adaptive Lasso regularisation
A key limitation of the MLEs is that the adjacency matrix should be fully specified: this limits the applicability of the GrOU process to datasets where the graph topology is known (as in Assumption 1). Also, regularisation techniques are a powerful tool to create sparse graph-like structures for high-dimensional problems (Chen et al. 2020, Ma et al. 2021). To prove that such tools can be used in our setting, we propose an Adaptive Lasso scheme and show its consistency and asymptotic normality.

Adaptive Lasso Regularisation
Adding an L 1 -penalty on the dynamics matrix Q to the log-likelihood, a so-called Lasso regression, is a common practice to introduce sparsity into Q. The regularised process can then be interpreted as a proper GrOU process. In addition, a tuning parameter allows practitioners to choose how sparse the estimated adjacency matrix should be.
Notation 5.1.1. The support of a vector or a matrix x is denoted supp(x) and is defined as the set of indices of non-null coordinates of x. Additionally, given a set of indices I, we denote by x |I the restriction of x to the indices in I, and by x |I×I the restriction to the indices in I × I.
Similarly, Adaptive Lasso (AL) regularisation schemes leverage a penalty that takes into account any t 1/2 -consistent estimator (the MLE herein) to provide better theoretical guarantees such as consistency in variable selection (Bühlmann & Van De Geer 2011, Section 2.6) or asymptotic normality. The former is defined as the support of the estimator converging to the support of the true parameter asymptotically.

Definition
An AL scheme (Gaïffas & Matulewicz 2019) applied on a Lévy-driven OU process Y with unknown dynamics matrix Q 0 yields a GrOU-like process with a non-trivial adjacency matrix A. It is defined by

Q AL,t := arg max Q { log L t (Q; Y) − λ ∥ Q ⊙ | Q t | −γ ∥ 1 },

for fixed parameters λ ≥ 0 and γ > 0. Also, ⊙ denotes the Hadamard product and the denominator of the penalty, | Q t | −γ , is evaluated elementwise on the MLE Q t . The log-likelihood log L t (Q; Y) is the logarithm of the likelihood ratio in Equation (6). The MLE components are almost surely non-zero and the adaptive weights penalise more heavily the entries that are expected to be zero. Conditional on the knowledge of Q 0 , we show two oracle properties: (a) the scheme is consistent in variable selection, i.e. the support of Q AL,t converges to the support of the true parameter Q 0 as t → ∞; (b) the estimator is asymptotically normal as t → ∞ over the support of the true parameter. By contrast, a standard Lasso regression is not consistent in variable selection, even with Gaussian noise (Zou 2006).
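As an illustration, with Σ = I and a Brownian driver the penalised problem becomes a weighted-lasso least-squares problem in Q, which can be solved by proximal gradient (ISTA) with elementwise soft-thresholding. The true matrix Q 0 , the tuning values λ and γ = 1, and all names below are illustrative choices, not prescribed by the text.

```python
import numpy as np

rng = np.random.default_rng(3)

# Sparse true dynamics matrix Q0 (upper triangular, hence stable here)
d, dt, T = 4, 1e-2, 400.0
Q0 = np.diag([1.0, 1.2, 0.9, 1.1])
Q0[0, 1] = Q0[2, 3] = 0.4

# Euler path of dY = -Q0 Y dt + dW
n = int(T / dt)
Y = np.zeros((n + 1, d))
for k in range(n):
    Y[k + 1] = Y[k] - Q0 @ Y[k] * dt + rng.normal(scale=np.sqrt(dt), size=d)

K = Y[:-1].T @ Y[:-1] * dt              # sum Y Y^T dt (quadratic statistic)
B = np.diff(Y, axis=0).T @ Y[:-1]       # sum dY Y^T (linear statistic)
Q_mle = -np.linalg.solve(K.T, B.T).T    # unpenalised MLE -B K^{-1}
w = 1.0 / np.abs(Q_mle)                 # adaptive weights, gamma = 1

lam, eta = 2.0, 1.0 / np.linalg.eigvalsh(K).max()
Q = Q_mle.copy()
for _ in range(500):                    # ISTA iterations
    G = B + Q @ K                       # gradient of the quadratic part
    Z = Q - eta * G
    Q = np.sign(Z) * np.maximum(np.abs(Z) - eta * lam * w, 0.0)

print(np.round(Q, 2))                   # entries outside supp(Q0) shrink to zero
```

The adaptive weights make the threshold large precisely on the entries whose MLE is already close to zero, which is the mechanism behind the variable-selection consistency discussed above.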

Proof. See Appendix A.10.
Note that the adjacency matrix can therefore be estimated by A ij := I{ ( Q AL,t ) ij ≠ 0 } for i ≠ j, and the θ-GrOU inference can be applied next as a simplification step for high-dimensional applications, although the impact of model misspecification is left for future research. Finally, the penalty parameter λ can be chosen to reach a given sparsity criterion (trial-and-error) or by cross-validation (Gaïffas & Matulewicz 2019, Section 4.1).
An extension to a volatility-modulated GrOU process

Stationary noise distributions are usually too simplistic to explain the intrinsic variability of the data. Volatility modulation adds a stochastic scaling factor (Belomestny et al. 2019, Cai et al. 2016) which follows its own dynamics to better represent exogenous sources of uncertainty (Pigorsch & Stelzer 2009b, Yang et al. 2020), whilst a jump component helps to model unforeseen perturbations or rare calendar events (Barndorff-Nielsen & Veraart 2012).
We extend the framework of Section 2 to include a stochastic volatility modulation through a positive semidefinite Ornstein-Uhlenbeck (PSOU hereafter) process (Pigorsch & Stelzer 2009a,b) and a time-changed jump term.
For the latter term, we adapt the univariate framework introduced in Barndorff-Nielsen & Veraart (2012) to the multivariate case. We find that the volatility modulation and the jump term preserve the core properties of the model-i.e. its stationarity and ergodicity-which in turn imply that extensions of the results from Sections 3 and 4 hold.
Notation 6.0.1. For a process (X t , t ≥ 0) ⊆ R d , we denote by ϕ X t (u) := E[ exp(iu ⊤ X t ) ] its characteristic function at time t. Similarly, for an M d (R)-valued process (X t ), we write ϕ X t (u) := E[ exp(i tr(u ⊤ X t )) ]. Finally, we denote by log ϕ(·) the distinguished logarithm of ϕ for an infinitely divisible distribution (Sato et al. 1999, Lemma 7.6).

Remark 6.0.2. For a two-sided Lévy process (L t , t ∈ R) ⊆ M d (R) and for adapted processes A and B with respect to L, we denote by ∫ 0 t A s dL s B s the matrix whose (i, j)-th element is given by Σ k,l ∫ 0 t A ik,s B lj,s dL kl,s .

Model extension
Consider the continuous-time process (Y t (v) , t ∈ R) satisfying

dY t (v) = −Q Y t− (v) dt + Σ t− 1/2 dW t + dJ T t ,    (11)

where (Σ t , t ∈ R) is a càdlàg stochastic volatility (SV) process. In addition, (W t , t ∈ R) is a d-dimensional Brownian motion process, (T t , t ∈ R) is an increasing continuous process with T t → ±∞ P 0 -a.s. as t → ±∞, and (J t , t ∈ R) is a two-sided pure-jump Lévy process with characteristic triplet (γ J , 0, ν J ) with respect to the truncation function τ (z) := I {x∈R d : ∥x∥≤1} (z) (see Section 2.2). Under standard regularity conditions (see Sections 6.3.3 & 6.4), we know that the unique candidate for a stationary solution to Equation (11) is

Y t (v) = ∫ −∞ t e −(t−s)Q Σ s− 1/2 dW s + ∫ −∞ t e −(t−s)Q dJ T s .    (12)

Regarding the volatility process, consider a positive definite matrix V ∈ S ++ d and a two-sided d × d matrix Lévy subordinator (L t , t ∈ R) (Barndorff-Nielsen & Pérez-Abreu 2008) such that (Σ t , t ∈ R) is a stationary positive semidefinite Ornstein-Uhlenbeck (PSOU) process (Pigorsch & Stelzer 2009b), i.e. given by

Σ t = ∫ −∞ t e −(t−s)V dL s e −(t−s)V ⊤ .    (13)

We recall the existence conditions of this stationary process in Section 6.3.1.
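To make the PSOU volatility term concrete, the following exploratory sketch simulates a path of Σ t under the assumed dynamics dΣ t = −(V Σ t + Σ t V ⊤ ) dt + dL t , with a compound Poisson matrix subordinator whose jumps are rank-one matrices z z ⊤ ; all numerical choices (V, jump rate, jump law) are illustrative. Between jumps, the drift is applied exactly via conjugation by e −V dt , which preserves positive semidefiniteness.

```python
import numpy as np

rng = np.random.default_rng(4)

d, dt, T, rate = 3, 1e-2, 20.0, 2.0
V = np.diag([0.8, 1.0, 1.2])                 # V in S++_d (illustrative)
E = np.diag(np.exp(-np.diag(V) * dt))        # exact drift flow e^{-V dt}

Sigma = np.eye(d)
min_eig = np.inf
for _ in range(int(T / dt)):
    Sigma = E @ Sigma @ E.T                  # conjugation preserves S+_d
    if rng.random() < rate * dt:             # subordinator jump arrives
        z = rng.normal(size=d)
        Sigma = Sigma + np.outer(z, z)       # positive semidefinite jump
    min_eig = min(min_eig, np.linalg.eigvalsh(Sigma).min())

print(min_eig >= -1e-10)                     # the path stays in S+_d
```

The simulation illustrates the defining property of the PSOU process: the mean-reverting drift and the S + d -valued jumps jointly keep Σ t positive semidefinite at all times.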
Assumption 3. We assume the independence between W, L, J and T .
In the following two subsections, we characterise both terms presented in the stationary solution in Eq. (12). We then prove that the resulting process (Y t (v) ) is mixing, hence ergodic, which requires additional definitions presented in the next section.

To prove the ergodicity of the model presented in Section 6.1, we augment our framework with another class of stochastic processes, mixed moving average (MMA) processes, which have well-studied asymptotic behaviour such as the mixing and ergodic properties (see Section 6.2).

Lévy bases, MMA processes and the mixing property
In this section, we recall the definitions of Lévy bases, the characteristic quadruplet and Lévy-driven MMA processes. A Lévy basis Λ on S ++ d × R satisfies, for any A ∈ B b (S ++ d × R), ϕ Λ(A) (z) = ϕ(z) Π(A) , where Π = π ⊗ λ leb is the product of a probability measure π on S ++ d and the Lebesgue measure on R, and z → ϕ(z) is the characteristic function of an infinitely divisible distribution (Section 2.2) characterised, say, by a triplet (γ, Σ, ν).
Let γ(A) := γ and Σ(A) := Σ be trivial maps from S ++ d to, respectively, R d and S + d , and let ν(dx, A) := ν(dx) be an extension of ν to R d × S ++ d . As per Section 3, Fuchs & Stelzer (2013) and p. 162, Barndorff-Nielsen et al. (2018), any such quadruplet (γ, Σ, ν, π) characterises completely in law a Lévy basis Λ in the sense of Definition 33, Barndorff-Nielsen et al. (2018), where π is then called the intensity measure (as an extension of the control measure from Rajput & Rosinski (1989)). Indeed, S → ∫ S γ(A)π(dA) = γπ(S) and S → ∫ S Σ(A)π(dA) = Σπ(S) are respectively signed and unsigned measures on (S ++ d , B(S ++ d )), while ∫ S ν(dx, A)π(dA) = ν(dx)π(S) is a Lévy measure on R d for a fixed S ∈ S ++ d . For the existence of integrals with respect to a Lévy basis, see Th. 3.2, Fuchs & Stelzer (2013) and Th. 2.7, Rajput & Rosinski (1989). We recall the definition of multivariate MMA processes as follows:

Definition 4. If a process (X t , t ∈ R) can be written as

X t = ∫ S ++ d ∫ R f (A, t − s) Λ(dA, ds),  for all t ∈ R,

for some Lévy basis Λ and some measurable kernel function f (Def. 3.3, Fuchs & Stelzer (2013)), it is called an n-dimensional mixed moving average process (MMA for short). The function f is said to be its kernel function.
Finally, we also recall the definition of mixing processes:

Definition 5. A stationary process (Y_t, t ∈ R) is mixing if and only if, for any events A and B in the σ-algebra generated by the process, P(Y ∈ A ∩ θ_{−t}B) → P(Y ∈ A) P(Y ∈ B) as t → ∞, where θ denotes the shift operator. It is straightforward to observe that this implies ergodicity. Fuchs & Stelzer (2013) adapt the mixing conditions given in Maruyama (1970) and Rosiński & Żak (1997) to the multivariate context and prove that Lévy-driven MMA processes are mixing (Theorem 3.5 therein). The ability to model specific marginal distributions whilst remaining tractable gives a flexible and powerful method to augment our original model (Pigorsch & Stelzer 2009b, Sections 4.2 and 5).
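As a numerical illustration of this ergodic behaviour (a sketch only, not part of the argument), one can simulate a one-dimensional OU process with an Euler-Maruyama scheme and check that its time averages approach the stationary moments; the mean-reversion speed, step size and horizon below are arbitrary illustrative choices.

```python
import numpy as np

# Euler-Maruyama simulation of dY = -q Y dt + dW. Ergodicity: the time
# average of Y converges to the stationary mean 0, and the time average
# of Y^2 to the stationary variance 1/(2q).
rng = np.random.default_rng(0)
q, dt, n = 1.0, 0.01, 200_000        # mean-reversion speed, step, number of steps
y = np.empty(n)
y[0] = 0.0
noise = rng.normal(scale=np.sqrt(dt), size=n - 1)
for i in range(n - 1):
    y[i + 1] = y[i] - q * y[i] * dt + noise[i]

time_mean = y.mean()
time_var = (y ** 2).mean()
print(time_mean, time_var)           # close to 0 and 1/(2q) = 0.5
```

The horizon n·dt = 2000 is long relative to the mean-reversion time 1/q, so the Monte Carlo error of the time averages is small.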

Stochastic volatility component
Denote by ρ : S_d → S_d the linear operator X ↦ V X + X V^⊤, such that e^{tρ}(S_d) = S_d (Pigorsch & Stelzer 2009b, Section 3).
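To make the action of ρ concrete, the sketch below (with an arbitrary illustrative matrix V) checks numerically that the vectorised matrix of ρ is I ⊗ V + V ⊗ I, that e^{tρ}(X) = e^{tV} X e^{tV^⊤}, and that e^{tρ} maps symmetric matrices to symmetric matrices; the eigendecomposition-based expm helper is a stand-in for a library matrix exponential.

```python
import numpy as np

def expm(M):
    """Matrix exponential via eigendecomposition (generic matrices only)."""
    w, P = np.linalg.eig(M)
    return (P @ np.diag(np.exp(w)) @ np.linalg.inv(P)).real

d = 3
rng = np.random.default_rng(1)
V = rng.normal(size=(d, d))
X = rng.normal(size=(d, d))
X = X + X.T                                    # a symmetric matrix in S_d

# vec(rho(X)) = (I kron V + V kron I) vec(X), with column-major vec
rho_mat = np.kron(np.eye(d), V) + np.kron(V, np.eye(d))
t = 0.7
lhs = (expm(t * rho_mat) @ X.reshape(-1, order="F")).reshape(d, d, order="F")
rhs = expm(t * V) @ X @ expm(t * V).T          # closed-form action of e^{t rho}

sym_error = np.abs(rhs - rhs.T).max()          # e^{t rho} preserves symmetry
print(np.allclose(lhs, rhs), sym_error)
```

The closed form follows since left-multiplication by V and right-multiplication by V^⊤ commute as operators, so the exponential of their sum factorises.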

Invariant distribution and operator self-decomposability
The literature focuses on the existence and uniqueness of the invariant distribution given in Equation (13) (Masuda 2004, Pigorsch & Stelzer 2009b). Suppose that Equation (14) holds: then, there exists a unique invariant distribution F^Σ (according to Prop. 2.2, Masuda (2004), and Th. 4.1 & 4.2, Sato & Yamazato (1984)) which we take in its matrix-valued form. The distribution F^Σ is operator self-decomposable with respect to the linear operator ρ (Pigorsch & Stelzer 2009b, Prop. 4.3). Hence, F^Σ is absolutely continuous with respect to the Lebesgue measure if the support of L is non-degenerate (i.e. ν_L(a + S) < 1 for any a ∈ S_d and S ⊆ S_d such that dim(S) ≤ dim(S_d) − 1, see Yamazato (1983)). In that case, note that this stationary distribution is almost surely concentrated on S_d^{++} (Pigorsch & Stelzer 2009b, Th. 4.4). According to Section 2.4, Pigorsch & Stelzer (2009b), one can write for t ≥ 0 that Σ_t = e^{−tρ}(Σ_0) + ∫_0^t e^{−(t−s)ρ}(dL_s), where Σ_0 ∈ S_d^+. We conclude that (Σ_t) is an MMA process (Fuchs & Stelzer 2013, Def. 3.3).
Proposition 6.3.1. Suppose the framework given in Sections 6.1 & 6.3 holds. Then, (Σ_t, t ∈ R) is an MMA process as given in Definition 4.

Characteristic function
Recall that (L_t, t ∈ R) is taken to be a two-sided matrix Lévy subordinator: a process that is S_d^+-increasing (such that L_t − L_s ∈ S_d^+ for any t > s) and of finite variation (Barndorff-Nielsen & Stelzer 2007). It is characterised by a triplet (γ_L, 0, ν_L) where γ_L ∈ S_d^+ and ν_L is a Lévy measure on the space of positive semidefinite matrices S_d^+ such that ∫_{S_d^+} (‖x‖ ∧ 1) ν_L(dx) < ∞. According to Part 1, Barndorff-Nielsen & Shiryaev (2015), given Equation (16), its characteristic function at time t ∈ R is given by E[exp(i tr(Z L_t))] = exp( t { i tr(Z γ_L) + ∫_{S_d^+} (e^{i tr(Z x)} − 1) ν_L(dx) } ), with respect to the truncation function τ(X) ≡ 0 on M_d(R) (Barndorff-Nielsen & Shiryaev 2015, Part 1). Theorem 4.9, Pigorsch & Stelzer (2009b) yields that if V ∈ S_d^{++} and Equations (14) & (16) hold, then the PSOU process (Σ_t, t ∈ R) is strictly stationary and its distribution is infinitely divisible, with a Lévy measure ν_Σ satisfying ν_Σ(S_d \ S_d^+) = 0. Following the characterisation of (Σ_t, t ∈ R), we present the second part of the noise in Equation (11), which is a pure-jump time-changed Lévy process.
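A minimal illustrative sketch of such a matrix subordinator (an arbitrary compound Poisson construction, not a specification from the article) builds rank-one positive semidefinite jumps x x^⊤, so that every increment lies in S_d^+:

```python
import numpy as np

# Compound Poisson matrix subordinator on S_d^+: jumps are rank-one PSD
# matrices x x^T, so any increment L_t - L_s is a sum of PSD matrices.
# Rate and jump law are illustrative choices.
rng = np.random.default_rng(2)
d, rate, horizon = 2, 5.0, 10.0
n_jumps = rng.poisson(rate * horizon)
jump_times = np.sort(rng.uniform(0.0, horizon, size=n_jumps))
jumps = [np.outer(x, x) for x in rng.normal(size=(n_jumps, d))]

def L(t):
    """Value of the subordinator at time t (sum of jumps up to t)."""
    out = np.zeros((d, d))
    for s, j in zip(jump_times, jumps):
        if s <= t:
            out += j
    return out

increment = L(8.0) - L(3.0)
print(np.linalg.eigvalsh(increment))   # nonnegative spectrum
```

The eigenvalues of any increment are nonnegative up to rounding error, which is the S_d^+-increasing property above.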

Stochastic volatility of a multivariate OU process
Suppose that (Σ_t) is strictly stationary. Let a < b ∈ R ∪ {±∞} and consider the process F^{(1)}_{ab} := ∫_a^b e^{−(b−s)Q} Σ_s^{1/2} dW_s. By Proposition 6.3.1, (Σ_t) is a Lévy-driven MMA process, hence locally uniformly bounded as given by Theorem 4.3, (ii), Barndorff-Nielsen & Stelzer (2011). Therefore, for any t ≥ 0, the integral ∫_{−∞}^t e^{−(t−s)Q} Σ_s e^{−(t−s)Q^⊤} ds is a Lebesgue integral of (Σ_t) ω-wise.
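A discretised sketch of the volatility path is given below: it iterates the exact mean-reverting flow Σ_{t+h} = e^{−hV} Σ_t e^{−hV^⊤} plus a positive semidefinite jump increment, so positive semidefiniteness is preserved at every step. The choice V = I (so e^{−hV} = e^{−h} I; a general V requires a matrix exponential), the jump intensity and the jump law are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)
d, h, n = 2, 0.01, 5000
E = np.exp(-h) * np.eye(d)         # e^{-hV} for V = I: mean-reverting flow

Sigma = np.eye(d)
min_eig = np.inf
for _ in range(n):
    dL = np.zeros((d, d))
    if rng.random() < 1.0 * h:     # compound-Poisson-style jump, intensity 1.0
        x = rng.normal(size=d)
        dL = np.outer(x, x)        # rank-one PSD jump
    Sigma = E @ Sigma @ E.T + dL   # exact flow + PSD jump stays in S_d^+
    min_eig = min(min_eig, np.linalg.eigvalsh(Sigma).min())

print(min_eig)                     # numerically nonnegative along the path
```

Using the exact flow rather than a naive Euler step is what guarantees the positivity of the discretised path.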
We obtain the following distributional property for F^{(1)}_{ab}:
Proposition 6.3.2. Suppose the framework given in Sections 6.1 & 6.3 holds. Then, F^{(1)}_{ab} is non-degenerate in the sense of Yamazato (1983) for any a < b ∈ R ∪ {±∞}.
Proof. See Appendix A.12.
In the case when a = −∞, we can prove the stationarity of the stochastic volatility term as follows:
Proposition 6.3.3. Suppose the framework given in Sections 6.1 & 6.3 holds. Then, (F^{(1)}_{−∞,t}, t ∈ R) is stationary.
Proof. See Appendix A.13.

Pure-jump component
Let us next consider the pure-jump process (J_{T_t}, t ∈ R). Suppose that Equations (19) and (20) hold.

Characteristic function
Since J is a pure-jump Lévy process and given Equation (20), its characteristic function satisfies E[exp(i⟨z, J_u⟩)] = exp(u ψ_J(z)) for u ≥ 0, where ψ_J denotes the Lévy exponent of J. We recall that a stochastic process X is adapted with respect to T if X is constant on any interval [T_{t−}, T_t] for any t ∈ R. According to Lemma 10.14, Jacod (1979), since T_{−t} → −∞ as t → −∞ and T is continuous, J is T-adapted. Similarly to Section 1.2.2, Barndorff-Nielsen & Veraart (2012), all the base properties of J carry over to the time-changed process. Therefore, conditionally on T_t, the characteristic function of (J_{T_t}) is given by E[exp(i⟨z, J_{T_t}⟩) | T_t] = exp(T_t ψ_J(z)). This implies that (J_{T_t}, t ∈ R) has a characteristic triplet (T γ_J, 0, T ⊗ ν_J).
Proof. The statement can be proved similarly to Proposition 6.3.3, since (T_t) is almost surely increasing; the argument is not repeated here for the sake of brevity.
Proposition 6.4.2 is important since the time-changed pure-jump component does not benefit from the Gaussian structure of F^{(1)}; MMA processes alleviate this complication.
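The time-change mechanism can be illustrated numerically: below, an increasing clock T_t (built here from an arbitrary positive stand-in for tr(Σ_s), not the article's specification) is applied to a compound Poisson process J, yielding a path of t ↦ J_{T_t}.

```python
import numpy as np

rng = np.random.default_rng(4)
h, n = 0.01, 2000
# illustrative positive integrand standing in for tr(Sigma_s)
trace_sigma = 1.0 + 0.5 * np.sin(np.linspace(0.0, 20.0, n))
T = np.concatenate([[0.0], np.cumsum(trace_sigma * h)])   # clock T_t on the grid

# compound Poisson J on the business-time horizon [0, T_end]
n_jumps = rng.poisson(2.0 * T[-1])
jump_times = np.sort(rng.uniform(0.0, T[-1], size=n_jumps))
jump_sizes = rng.normal(size=n_jumps)

def J(u):
    """Pure-jump process value at business time u."""
    return jump_sizes[jump_times <= u].sum()

J_timechanged = np.array([J(u) for u in T])   # path of t -> J_{T_t}
print(J_timechanged[-1])
```

Since the integrand is strictly positive, T is continuous and strictly increasing, which is the property used for the T-adaptedness argument above.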

Stationarity and ergodicity
If Equations (14), (16) and (19) hold, then the process (Y_t^{(v)}) admits the stationary solution given in Equation (12). Both terms have characteristic functions given in Sections 6.3.2 and 6.4.2. Similarly to the Lévy-driven case in Masuda (2004), Equation (21) yields solutions which have operator self-decomposable distributions, given the independence between L, W, J and T. We obtain the following:
Proposition 6.5.1. Suppose the framework given in Sections 6.1, 6.3 & 6.4 holds. Then, (Y_t^{(v)}, t ∈ R) is stationary.
Proof. From the stationarity of both right-hand side terms of Equation (21), given by Propositions 6.3.3 and 6.4.1, the result follows directly.
We prove that (Σ_t, t ∈ R) and (Y_t^{(v)}, t ∈ R) are ergodic in the following proposition:
Proposition 6.5.2. Suppose the framework given in Sections 6.1, 6.3 & 6.4 holds. Then, (Σ_t, t ∈ R) and (Y_t^{(v)}, t ∈ R) are mixing and hence ergodic.
Proof. See Appendix A.15.
The mixing and ergodicity properties of (Σ_t) are important for statistical inference on the stochastic volatility, which is outside the scope of this article. We have proved that the resulting process is well-defined, stationary and ergodic. Therefore, conditional on the stochastic volatility, the central limit theorems for the estimators from Section 4 hold under stochastic volatility modulation.

Conclusion
In this article, we tackle the problem of modelling sparse interactions between multiple time series. For this purpose, we have defined the Graph Ornstein-Uhlenbeck process, a Lévy-driven Ornstein-Uhlenbeck process adapted to graph structures, of which we propose two different configurations. We first consider a network-wide parametrisation where a single parameter characterises momentum across all nodes and another single parameter characterises the network effect. This estimator is robust against the curse of dimensionality whilst providing only limited feedback on network interactions. Then, we consider an augmented version with node-dependent momentum parameters and a different network effect for each of a node's neighbours. We derive the well-definedness, existence, uniqueness and efficiency of those estimators and we prove three novel central limit theorems (CLTs) as the time horizon goes to infinity for both the MLEs and an Adaptive Lasso scheme. The CLTs are a necessary step for both hypothesis tests (Morales et al. 2000) and quantifying uncertainty in the inference of graphical and/or high-dimensional time series. Finally, we extend the GrOU process to include both stochastic volatility and jump terms, which serves as an introduction towards flexible covariance structures for graphs. We also show that the aforementioned properties and asymptotic behaviours hold under standard regularity and independence assumptions.
This work is a first step towards understanding the behaviour of continuous-time stochastic processes on graph structures. A limitation of the current formulation is the necessity of a continuum of available data; recent theoretical studies and applications leverage high-frequency data sources to circumvent this issue (Brownlees et al. 2020, Kim et al. 2016). The asymptotic properties of the GrOU process in this context are left for future research. Next, the extensions to a time-dependent and stochastic modulation of the noise open up questions on the inference of sparse Lévy-driven high-dimensional volatility or covariance structures (Belomestny et al. 2019, Cai et al. 2016, Tao et al. 2013).
College London (under the grant EP/L015129/1).

A.2 Proof of Theorem 3.3.4
Consider the definition of a steep mapping from Section 2.2, Küchler & Sørensen (1997): with the notations from Notation A.1.1, for a fixed t ∈ (0, ∞), if θ ↦ C_t(θ) is a differentiable convex map such that for any θ_0 ∈ int Θ and θ_1 ∈ Θ \ int Θ we have lim_{α→1−} (θ_1 − θ_0)^⊤ · ∇C_t(θ_0 + α(θ_1 − θ_0)) = +∞, then C_t is said to be steep. Note that a sufficient condition for this property is Θ = R^k (Küchler & Sørensen 1997, Section 2.2). In this context, from Notation 3.3.1, recall that 2C_t(θ) = θ^⊤ · [H]_t · θ by Equation (7).
Proof of Theorem 3.3.4. According to Proposition 3.3.3, the likelihood is defined for any θ ∈ R^2 and C_t(θ) is the cumulant generating function of H_t. Theorem 8.2.1, Küchler & Sørensen (1997) requires that: (a) the said likelihood representation is minimal, which holds since H has almost surely affinely independent components (Küchler & Sørensen 1997, Section 4); (b) θ ↦ θ is injective, which is trivially true; (c) the cumulant function θ ↦ C_t(θ) is defined on a set Θ independent of t and is steep; again, this holds since, by Lemma 3.3.2, θ ↦ C_t(θ) is defined on R^2 with |[H]_t| < ∞ a.s. componentwise for a fixed t ≥ 0 large enough (Küchler & Sørensen 1997, criterion in Section 2.2).
In closed form, we obtain from Proposition 3.3.3 and Notation A.1.1 that, under the necessary conditions, the MLE up to time t ≥ 0 large enough is given by θ̂_t = [H]_t^{−1} · H_t. Regarding the existence and uniqueness of the estimator, by application of Theorem 8.2.1, Küchler & Sørensen (1997), the maximum likelihood estimator exists and is unique if and only if |H_t| < ∞ P_{t,Y}-a.s. componentwise and det([H]_t) > 0. The first condition is satisfied by Lemma 3.3.2 whilst the second remains to be checked.
More precisely, the almost sure positivity of the determinant of [H]_t is proved using Fubini's theorem along with the Cauchy-Schwarz inequality (in both its inner-product and integral forms), where the (in)equalities are almost sure with respect to the martingale measure P_{t,Y}. Observe that the Cauchy-Schwarz inequalities are in fact strict since the diagonal of A is zero: the i-th component of AY_u is not linearly dependent on the i-th component of Y_u since diag A = 0. We conclude since the determinant det([H]_t) is then almost surely positive for any t ≥ 0. We have thus proved that this estimator exists and is unique on R^2. By convexity of the likelihood, it is also unique on any compact set on which it exists, which concludes the proof.
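On a fine observation grid, the closed-form MLE can be approximated by Riemann sums; in the Brownian-driven case this coincides with the least-squares regression of the increments dY on the regressors −AY dt and −Y dt. The sketch below (the graph A, the parameter values, the step size and the horizon are all illustrative choices) recovers (θ_1, θ_2) from a simulated path:

```python
import numpy as np

rng = np.random.default_rng(5)
d, dt, n = 4, 0.01, 20_000
A = (np.ones((d, d)) - np.eye(d)) / (d - 1)   # row-normalised complete graph
theta1, theta2 = 0.2, 1.0                     # network and momentum effects
Q = theta2 * np.eye(d) + theta1 * A

# Euler scheme for dY = -(theta1 A + theta2 I) Y dt + dW
Y = np.zeros((n, d))
noise = rng.normal(scale=np.sqrt(dt), size=(n - 1, d))
for i in range(n - 1):
    Y[i + 1] = Y[i] - (Q @ Y[i]) * dt + noise[i]

# least-squares form of the discretised MLE theta_hat = [H]_t^{-1} H_t
dY = np.diff(Y, axis=0)
U = Y[:-1] @ A.T                              # rows are A Y_s
X = np.column_stack([-(U * dt).ravel(), -(Y[:-1] * dt).ravel()])
theta_hat, *_ = np.linalg.lstsq(X, dY.ravel(), rcond=None)
print(theta_hat)                              # approximately (0.2, 1.0)
```

The normal equations of this regression are exactly the Riemann approximations of [H]_t and H_t, so the estimator inherits the t^{1/2}-consistency discussed in Section 4 up to discretisation error.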
which can be written in its matrix form. Similarly to Masuda (2004) and Gaïffas & Matulewicz (2019), the stationary solution presented in Equation (4) shows that the limiting (ergodic) quantity is almost surely positive definite. The result follows immediately and, by extension, for t large enough, we deduce that K_t is almost surely positive definite.
A.4 Proof of Proposition 3.4.2
Proof of Proposition 3.4.2. Recall that Proposition 2.4.3 holds and write Q := vec^{−1}(ψ). The logarithm of the likelihood defined in Eq. (6) evaluated at Q has a first term given (up to its sign) by ∫_0^t ⟨QY_s, dY_s^c⟩_Σ, which can be reformulated accordingly, and the second term is obtained similarly. Finally, using that (AC) ⊗ (BD) = (A ⊗ B) · (C ⊗ D), with A, B, C, D four matrices with dimensions such that the products AC and BD are well-defined, one concludes the proof. Given the stationarity and square integrability of Y, I and [I] are a.s. finite for t < ∞ and we have that Ψ := {ψ ∈ R^{d²} : |ψ^⊤ · [I]_t · ψ| < ∞, ∀t ≥ 0} = R^{d²}.
since Q(θ) = θ_2 I_{d×d} + θ_1 A. According to Proposition 3.3.3, we define the log-likelihood ℓ_t(θ; Y) := ln L_t(θ; Y) for θ ∈ Θ. By the almost sure finiteness of H and [H], for any h ∈ R^2, we obtain the increment of the log-likelihood given in Equation (25).
Then, we observe that t^{−1}[H]_t → G_∞ almost surely (by Theorem 3.1.1), hence in probability, where G_∞ is a (deterministic) positive definite matrix. This also yields the convergence of the covariance matrix of H_t^{(s)} as t → ∞. Then, by Theorem A.7.7, Küchler & Sørensen (1997), we obtain the pairwise convergence to a 2-dimensional standard normal random variable η. We conclude that the family of statistical models (P_Y^θ, θ ∈ Θ) is locally asymptotically normal since G_∞ is deterministic.
A.7 Proof of Theorem 4.1.3
Proof of Theorem 4.1.3. Using the notations of Section A.5 (proof of Proposition 4.1.1), t ↦ ∇C_t(θ) = [H]_t · θ is continuous with bounded variation on compact intervals as a time integral of an L^1 integrand. Condition 8.3.2, Küchler & Sørensen (1997) holds since: (a) H_t is continuously valued in time; (b) t^{−1}∇²C_t(θ) = t^{−1}[H]_t → G_∞ under P_Y, which is positive definite; (c) by the Fubini-Tonelli theorem, we have the same limit for the expected information matrix since Y_s has finite second moments for any s ≥ 0. Hence, by Theorem 8.3.4, Küchler & Sørensen (1997), we obtain the stated convergence. Regarding the asymptotic efficiency, recall that θ̂ := [H]_t^{−1} H_t and, from Lemma 4.1.2 and its proof (Equations (25) & (26)), we obtain that t^{1/2}(θ̂ − θ) is asymptotically Gaussian with asymptotic covariance G_∞^{−1}, the inverse of the corresponding Fisher information matrix. According to the Hájek-Le Cam convolution theorem for locally asymptotically normal experiments (Le Cam & Lo Yang 1990), we obtain the Hájek-Le Cam asymptotic efficiency of the estimator. Following Küchler & Sørensen (1997), we notice that Equation (1), in particular for the continuous part of Y, yields dY_t^c = −QY_t dt + dW_t, and that ψ can be written as vec(Q).
A.9 Proof of Theorem 4.2.3
The proof is similar to that of Prop. 4.1.1 and we define Q := vec^{−1}(ψ). Observe that ∇²C_t(ψ) = [I]_t = K_t ⊗ Σ^{−1} and, by ergodicity, t^{−1}[I]_t → G_∞^ψ a.s. as t → ∞. Note that t^{−1}E([I]_t) converges to the same limit and that G_∞^ψ has almost surely finite components and is positive definite. Theorem 8.3.4, Küchler & Sørensen (1997) gives a central limit theorem with [I]_t^{1/2} as the scaling matrix.
Since the stochastic boundedness and finite variation over bounded intervals of {[I]_t^{−1}(I_t − ∇C_t(ψ)) : t ≥ 0} on Ψ can be proved similarly to the proof of Proposition 4.1.1, we can now apply Theorem 8.3.4, Küchler & Sørensen (1997) and conclude by Slutsky's lemma.
A.10 Proof of Theorem 5.3.1
Proof. We first show the asymptotic normality property before leveraging it to prove the consistency in variable selection of the Adaptive Lasso. Since dY_t^c = −Q_0 Y_{t−} dt + dW_t, we center the log-likelihood around Q_0. Without loss of generality, we change the penalty rate from λ to λt since it does not depend on Q. Then, writing Q = Q_0 + t^{−1/2}M for some M ∈ M_d(R), we expand the centred objective. Otherwise, for (i, j) ∉ G_0 := {(k, l) : Q_{0,kl} ≠ 0, 1 ≤ k, l ≤ d}, we proceed similarly but, from the t^{1/2}-consistency of the MLE, we have |t^{1/2} Q̂_{t,ij}| = o_p(1). Hence, since λ(t)t^{(γ+1)/2} → ∞ and γ > 0, we obtain that the corresponding penalty term diverges, whilst this quantity is zero if M_ij = 0. We summarise the resulting asymptotic properties as follows: suppose that the objective function is well-defined, i.e. supp(M) ∩ G_0^c = ∅, meaning that the entries of M with indices outside the support of Q_0 are all zero. In that case, we have tr(MZ) = vec(M^⊤)^⊤ · vec(Z) and we conclude that the objective function is quadratic in vec(M_{|G_0}). Its maximiser in the limit t → ∞, denoted M̂, is obtained in closed form. Given the covariance structure of Z, vec(M̂_{|G_0}) is a centred Gaussian random vector with covariance (K_∞^{−1} ⊗ Σ)_{|G_0×G_0}, which concludes the proof of asymptotic normality.

Consistency in variable selection
We first note that the asymptotic normality of Q̂_{AL,t} on G_0 yields that P((Q̂_{AL,t})_ij ≠ 0) → 1 if (i, j) ∈ G_0. It remains to show that P((Q̂_{AL,t})_ij = 0) → 1 if (i, j) ∉ G_0. Suppose that (Q̂_{AL,t})_ij ≠ 0. We first take the derivative of the objective function with respect to the (i, j)-th parameter of Q̂_{AL,t}; we multiply by t^{−1/2} and set it to zero before taking the absolute value on both sides. On the right-hand side, as mentioned above, we have t^{1/2}(Q̂_{AL,t})_{|G_0^c} = o_p(1). Since λt^{(γ+1)/2} → ∞ and γ > 0, this side diverges to ∞ in probability as t → ∞. The first term on the left-hand side is a martingale and is normally distributed as t → ∞. By the asymptotic normality of t^{1/2}(Q̂_{AL,t})_{|G_0} and the fact that t^{−1}K_t converges in probability to K_∞, the left-hand side is the absolute value of the sum of two Gaussian random variables, whose probability of being larger than or equal to the right-hand side (which diverges to ∞) tends to zero. Since the right-hand side was computed under the assumption that (Q̂_{AL,t})_ij ≠ 0, the probability of that event tends to zero, hence P((Q̂_{AL,t})_ij = 0) → 1 if (i, j) ∉ G_0. This shows the second and last property.
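The variable-selection mechanism can be illustrated on a discretised path: compute the least-squares MLE of Q, form adaptive weights |Q̂_ij|^{−γ}, and solve the penalised problem by proximal gradient (ISTA) with entrywise soft-thresholding. The graph Q_0, the rates λ and γ, and the discretisation below are illustrative choices, not those of the article.

```python
import numpy as np

rng = np.random.default_rng(6)
d, dt, n = 3, 0.01, 20_000
Q0 = np.array([[1.0, 0.5, 0.0],
               [0.0, 1.0, 0.0],
               [0.0, 0.0, 1.0]])   # sparse true dynamics matrix

# simulate the Brownian-driven path dY = -Q0 Y dt + dW (Euler scheme)
Y = np.zeros((n, d))
noise = rng.normal(scale=np.sqrt(dt), size=(n - 1, d))
for i in range(n - 1):
    Y[i + 1] = Y[i] - (Q0 @ Y[i]) * dt + noise[i]
dY = np.diff(Y, axis=0)

# discretised statistics: G ~ (1/t) int Y Y^T ds and C such that Q_mle G = C
T = (n - 1) * dt
G = (Y[:-1].T @ Y[:-1]) * dt / T
C = -(dY.T @ Y[:-1]) / T
Q_mle = np.linalg.solve(G.T, C.T).T          # unpenalised least-squares MLE

# adaptive weights and ISTA iterations on the penalised objective
gamma, lam = 1.0, 0.02
W = 1.0 / (np.abs(Q_mle) ** gamma + 1e-8)    # weights blow up on near-zero entries
step = 1.0 / np.linalg.eigvalsh(G).max()
Q_al = Q_mle.copy()
for _ in range(500):
    grad = Q_al @ G - C                      # gradient of the smooth part
    Z = Q_al - step * grad
    Q_al = np.sign(Z) * np.maximum(np.abs(Z) - step * lam * W, 0.0)

print(np.round(Q_al, 3))                     # near Q0, zero entries shrunk away
```

The large weights on entries whose MLE is close to zero are what drive those entries exactly to zero, mirroring the divergence argument in the proof above.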
A.11 Proof of Proposition 6.3.1
Proof of Proposition 6.3.1. We define π_V := δ_V, a probability measure on S_d^{++}, and there exists an S_d^+-valued Lévy basis Λ_L on S_d^{++} × R corresponding to the characteristic quadruplet (γ_L, 0, ν_L, π_V). We denote by vec(Λ_L) = {vec(Λ_L(B)) : B ∈ B_b(S_d^{++} × R)} the R^{d²}-valued Lévy basis corresponding to Λ_L and we rewrite (Σ_t, t ∈ R) as an integral against vec(Λ_L), which can be interpreted as a d²-dimensional Lévy-driven MMA process; such processes are mixing (Fuchs & Stelzer 2013, Theorem 3.5).