Abstract
Gaussian processes occupy a leading place in modern statistics and probability theory, owing to their importance and a wealth of strong results. They are commonly used in connection with problems related to estimation, detection, and many statistical or machine learning models. In this paper, we propose a precise definition of multivariate Gaussian processes based on Gaussian measures on vector-valued function spaces, and provide an existence proof. In addition, several fundamental properties of multivariate Gaussian processes, such as stationarity and independence, are introduced. We further derive two special cases of multivariate Gaussian processes, namely multivariate Gaussian white noise and multivariate Brownian motion, and present a brief introduction to multivariate Gaussian process regression as a useful statistical learning method for multi-output prediction problems.
1 Introduction
In probability theory and statistics, Gaussian processes are used in connection with problems such as estimation, detection, stochastic analysis, and modern statistical or machine learning. These problems are often effectively formulated in terms of Gaussian measures on an appropriate real-valued function space. Vector-valued Gaussian processes, an important extension, have not been adequately treated in the literature: restricting attention to real-valued function spaces limits the applications of classical Gaussian processes, since the correlation between multiple random values is then difficult to take into account. This paper therefore consolidates the fundamentals of vector-valued stochastic processes, extending the classical Gaussian process to the multivariate Gaussian process by introducing Gaussian measures on vector-valued function spaces. The main contribution of this paper is a concise proof of existence and a proper explanation of the multivariate Gaussian process, followed by examples and applications such as multivariate Brownian motion and multivariate Gaussian process regression.
The paper is organised as follows. Section 2 introduces some preliminaries, including classical Gaussian measures, the matrix-variate Gaussian distribution, stationary processes, and classical Gaussian processes. Section 3 defines the multivariate Gaussian process and proves its existence. Two examples and one application of multivariate Gaussian processes, which show their usefulness, are presented in Sects. 4 and 5. Conclusions and a discussion are given in Sect. 6.
2 Preliminaries
2.1 Gaussian measure
Definition 1
(Gaussian measure on \({\mathbb {R}}\) [1]) Let \({\mathcal {B}}({\mathbb {R}})\) denote the completion of the Borel \(\sigma \)-algebra on \({\mathbb {R}}\), and let \(\lambda : {\mathcal {B}}({\mathbb {R}}) \mapsto [0, + \infty ]\) denote the usual Lebesgue measure. Then the Borel probability measure \(\gamma : {\mathcal {B}}({\mathbb {R}}) \mapsto [0, 1] \) is Gaussian with mean \(\mu \in {\mathbb {R}}\) and variance \(\sigma ^2 >0\) if
$$\begin{aligned} \gamma (A) = \frac{1}{\sqrt{2 \pi \sigma ^2}} \int _A \exp \left( - \frac{(x - \mu )^2}{2 \sigma ^2} \right) \textrm{d}\lambda (x) \end{aligned}$$
for any measurable set \(A \in {\mathcal {B}}({\mathbb {R}})\).
A random variable X on a probability space \((\Omega , {\mathcal {B}}, {\mathcal {P}})\) is Gaussian with mean \(\mu \) and variance \(\sigma ^2\) if its distribution measure is Gaussian, i.e. \({\mathcal {P}}(X \in A) = \gamma (A)\) for any \(A \in {\mathcal {B}}({\mathbb {R}})\).
In terms of random vectors, we have the following definition of a Gaussian random vector.
Definition 2
An n-dimensional random vector \(\varvec{X} = (X_1, \cdots , X_n)\) is Gaussian if and only if \(\langle \varvec{a}, \varvec{X} \rangle : = \varvec{a}^{{\textsf{T}}}\varvec{X} = \sum a_i X_i\) is a Gaussian random variable for all \(\varvec{a} = (a_1, \cdots , a_n) \in {\mathbb {R}}^n\).
In terms of measure, we can naturally have the definition of Gaussian measure on \({\mathbb {R}}^n\).
Definition 3
(Gaussian measure on \({\mathbb {R}}^n\) [1]) Let \(\gamma \) be a Borel probability measure on \({\mathbb {R}}^n\). For each \(\varvec{a} \in {\mathbb {R}}^n\), denote a random variable \(Y(\varvec{x}), \varvec{x} \in {\mathbb {R}}^n \) as a mapping \(\varvec{x} \mapsto \langle \varvec{a}, \varvec{x} \rangle \in {\mathbb {R}}\) on the probability space \(({\mathbb {R}}^n, {\mathcal {B}}({\mathbb {R}}^n), \gamma )\). The Borel probability measure \(\gamma \) is a Gaussian measure on \({\mathbb {R}}^n\) if and only if the random variable Y is Gaussian for each \(\varvec{a}\).
2.2 Matrix-variate Gaussian distribution
In statistics, the matrix-variate Gaussian distribution is a probability distribution that is a generalization of the multivariate normal distribution to matrix-valued random variables.
Definition 4
(Matrix-variate Gaussian distribution [2]) The random matrix \(\varvec{X} \in {\mathbb {R}}^{n \times d}\) is said to be Gaussian with mean matrix \(M \in {\mathbb {R}}^{n \times d}\), column covariance matrix \(\Sigma \in {\mathbb {R}}^{n \times n}\) and row covariance matrix \(\Omega \in {\mathbb {R}}^{d \times d}\), denoted \(\varvec{X} \sim \mathcal{M}\mathcal{N}(M, \Sigma , \Omega )\), if and only if
$$\begin{aligned} \textrm{vec}(\varvec{X}) \sim {\mathcal {N}}(\textrm{vec}(M), \Omega \otimes \Sigma ), \end{aligned}$$
where \(\otimes \) denotes the Kronecker product and \(\textrm{vec}(\varvec{X})\) denotes the vectorisation of \(\varvec{X}\).
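This vec–Kronecker characterisation can be checked numerically. The sketch below (our illustration, not from the paper) samples \(X = M + A Z B^{\textsf{T}}\) with \(\Sigma = AA^{\textsf{T}}\) and \(\Omega = BB^{\textsf{T}}\), using the column-stacking vec convention, and compares the empirical covariance of \(\textrm{vec}(X)\) with \(\Omega \otimes \Sigma \):

```python
import numpy as np

# Sample X = M + A Z B^T, where Z has i.i.d. standard normal entries,
# Sigma = A A^T is the n x n column covariance and Omega = B B^T is the
# d x d row covariance; then cov(vec(X)) should equal Omega kron Sigma.
rng = np.random.default_rng(0)
n, d = 3, 2
M = np.zeros((n, d))
A = np.tril(rng.normal(size=(n, n))) + n * np.eye(n)
B = np.tril(rng.normal(size=(d, d))) + d * np.eye(d)
Sigma, Omega = A @ A.T, B @ B.T

N = 100_000
vecs = np.array([(M + A @ rng.normal(size=(n, d)) @ B.T).flatten(order="F")
                 for _ in range(N)])        # "F" order stacks columns, i.e. vec
emp_cov = vecs.T @ vecs / N                 # empirical covariance (mean is 0)
err = np.max(np.abs(emp_cov - np.kron(Omega, Sigma)))
print(err / np.max(np.abs(np.kron(Omega, Sigma))))  # small relative error
```

The identity \(\textrm{vec}(AZB^{\textsf{T}}) = (B \otimes A)\textrm{vec}(Z)\) is what makes the Kronecker structure appear.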
2.3 Stationary process
Definition 5
(Stationary process [3]) Let \(X = \{ X_n(\omega ) \}\) be a stochastic process on a probability space \((\Omega , {\mathcal {B}}, {\mathcal {P}})\). If for any integers n and h, the two random vectors
$$\begin{aligned} (X_1, \cdots , X_n) \quad \text {and} \quad (X_{1+h}, \cdots , X_{n+h}) \end{aligned}$$
have the same probability distribution, then X is said to be a stationary process.
2.4 Gaussian process
Let \({\mathbb {R}}_T\) be the space of all \({\mathbb {R}}\)-valued functions on T (Footnote 1). Consider the product topology on \({\mathbb {R}}_T\), i.e. the smallest topology that makes the projection maps \(\Pi _{t_{1}, \ldots , t_{n}}(f)=\left[ f\left( t_{1}\right) , \ldots , f\left( t_{n}\right) \right] \) from \({\mathbb {R}}_T\) to \({\mathbb {R}}^n\) measurable, and define \({\mathcal {F}}\) as the Borel \(\sigma \)-algebra of this topology.
Definition 6
(Gaussian measure on \(({\mathbb {R}}_T, {\mathcal {F}})\)) A measure \(\gamma \) on \(({\mathbb {R}}_T, {\mathcal {F}})\) is called a Gaussian measure if for any \(n \ge 1\) and \(t_1, \cdots , t_n \in T\), the push-forward measure \(\gamma \circ \Pi _{t_1, \cdots , t_n}^{-1}\) on \({\mathbb {R}}^n\) is a Gaussian measure.
Briefly speaking, (multivariate) Gaussian distributions are Gaussian measures on \({\mathbb {R}}^n\), and Gaussian processes are Gaussian measures on the function space \(({\mathbb {R}}_T, {\mathcal {F}})\) due to the following theorem.
Theorem 1
(Relationship between Gaussian process and Gaussian measure) If \(X=(X_t)_{t \in T}\) is a Gaussian process, then the push-forward measure \(\gamma = {\mathcal {P}} \circ X^{-1}\) with \(X: \Omega \mapsto {\mathbb {R}}_{T}\) is Gaussian on \({\mathbb {R}}_{T}\), namely, \(\gamma \) is a Gaussian measure on \(({\mathbb {R}}_T, {\mathcal {F}})\). Conversely, if \(\gamma \) is a Gaussian measure on \(({\mathbb {R}}_T, {\mathcal {F}})\), then on the probability space \(({\mathbb {R}}_T, {\mathcal {F}}, \gamma )\), the co-ordinate random variable \(\Pi = (\Pi _t)_{t \in T}\) forms a Gaussian process.
The proof of the relationship between Gaussian process and Gaussian measure can be found in [4]. Motivated by the relationship between Gaussian measures and Gaussian processes, we properly define multivariate Gaussian processes by extending Gaussian measures on real-valued function space to vector-valued function space.
3 Multivariate Gaussian process
Gaussian processes (GPs) have been proven to be an effective statistical learning method for nonlinear problems due to many desirable properties, such as a clear structure with a Bayesian interpretation, a simple integrated approach to obtaining and expressing uncertainty in predictions, and the capability of capturing a wide variety of data features through hyper-parameters [5, 6]. With the development of Gaussian processes in machine learning algorithms, Gaussian process applications face a conspicuous limitation. Classical GP models can only be used to deal with a single output or single response because the process itself is defined on \({\mathbb {R}}\); as a result, the correlation between multiple tasks or responses cannot be taken into consideration [6, 7]. In order to overcome this drawback, [8] proposed a framework to perform multi-output prediction using multivariate Gaussian processes (MV-GPs). Although [8] showed the usefulness of the proposed methods via data-driven examples, some theoretical issues of multivariate Gaussian processes, such as existence, the independence condition, and the stationarity condition, remained unclear.
3.1 Definitions
Following the classical theory of Gaussian measure, matrix-variate Gaussian distribution, and Gaussian process, we introduce Gaussian measure on matrix space and Gaussian measure on \({\mathbb {R}}^d\)-valued function space, and finally define the multivariate Gaussian process.
According to Definition 3 (equivalence between Gaussian measure on \({\mathbb {R}}^n\) and random variable on \({\mathbb {R}}^n\)) and Definition 4 (equivalence between random variable on \({\mathbb {R}}^{nd}\) and random matrix on \({\mathbb {R}}^{n \times d}\) ), we can have a definition of Gaussian measure on \({\mathbb {R}}^{n \times d}\).
Definition 7
(Gaussian measure on \({\mathbb {R}}^{n \times d}\)) Let \(\gamma \) be a Borel probability measure on \({\mathbb {R}}^{n \times d}\). For each \(\varvec{a} \in {\mathbb {R}}^{nd}\), denote a random variable \(Y(\varvec{x}), \varvec{x} \in {\mathbb {R}}^{n \times d}\) as a mapping \(\varvec{x} \mapsto \langle \varvec{a}, \textrm{vec}(\varvec{x}) \rangle \in {\mathbb {R}}\) on the probability space \(({\mathbb {R}}^{n \times d}, {\mathcal {B}}({\mathbb {R}}^{n \times d}), \gamma )\). The Borel probability measure \(\gamma \) is a Gaussian measure on \({\mathbb {R}}^{n \times d}\) if and only if the random variable Y is Gaussian for each \(\varvec{a}\).
Let \(({\mathbb {R}}^{d})_T\) be the space of all \({\mathbb {R}}^{d}\)-valued functions on T, and consider the smallest topology on \(({\mathbb {R}}^{d})_T\) that makes the projection mappings \(\Xi _{t_{1}, \ldots , t_{n}}(\varvec{f})=\left[ \varvec{f}\left( t_{1}\right) , \ldots , \varvec{f}\left( t_{n}\right) \right] \) from \(({\mathbb {R}}^{d})_T\) to \({\mathbb {R}}^{n \times d}\) measurable; define \({\mathcal {G}}\) as the Borel \(\sigma \)-algebra of this topology. We can then define a Gaussian measure on the \({\mathbb {R}}^d\)-valued function space.
Definition 8
(Gaussian measure on \((({\mathbb {R}}^d)_T, {\mathcal {G}})\)) A measure \(\gamma \) on \((({\mathbb {R}}^d)_T, {\mathcal {G}})\) is called a Gaussian measure if for any \(n \ge 1\) and \(t_1, \cdots , t_n \in T\), the push-forward measure \(\gamma \circ \Xi _{t_1, \cdots , t_n}^{-1}\) on \({\mathbb {R}}^{n \times d}\) is a Gaussian measure.
Inspired by the relationship between Gaussian process and Gaussian measure in Theorem 1, we can appropriately define multivariate Gaussian processes (MV-GPs).
Definition 9
(d-variate Gaussian process) Given a Gaussian measure on \((({\mathbb {R}}^d)_T, {\mathcal {G}})\), \(d \ge 1\), the co-ordinate random vector \(\Xi = (\Xi _t)_{t \in T}\) on the probability space \((({\mathbb {R}}^d)_T, {\mathcal {G}}, \gamma )\) is said to form a d-variate Gaussian process (Footnote 2).
Given the definition of MV-GPs, it is essential to establish their existence before proceeding with further research. Inspired by the fact that a GP can be specified by its mean function and covariance function (and hence denoted \(\mathcal{G}\mathcal{P}(\mu , k)\)), we prove the existence of MV-GPs by applying the Daniell-Kolmogorov extension theorem.
Theorem 2
(Existence of d-variate Gaussian process (Footnote 3)) For any index set T, any vector-valued mean function \(\varvec{u} : T \mapsto {\mathbb {R}}^{1 \times d}\), any covariance function \(k: T \times T \mapsto {\mathbb {R}}\) and any positive semi-definite parameter matrix \(\Lambda \in {\mathbb {R}}^{d \times d}\), there exists a probability space \((\Omega , {\mathcal {G}}, {\mathcal {P}})\) and a d-variate Gaussian process \(\varvec{f}(t) \in {\mathbb {R}}^{1 \times d}, t \in T\) on this space, whose mean function is \(\varvec{u}\), covariance function is k and parameter matrix is \(\Lambda \), such that
-
\(\mathbb {E}[\varvec{f}(t)] = \varvec{u}(t), \quad \forall t \in T\),
-
\(\mathbb {E}\left[ (\varvec{f}(t_s)- \varvec{u}(t_s))(\varvec{f}(t_l)- \varvec{u}(t_l))^{{\textsf{T}}}\right] = \textrm{tr}(\Lambda )k(t_s,t_l),\quad \forall t_s,t_l \in T\),
-
\(\mathbb {E}[({\textbf {F}}_{t_1, \cdots , t_n} - M_{t_1, \cdots , t_n})^{\textsf{T}}({\textbf {F}}_{t_1, \cdots , t_n} - M_{t_1, \cdots , t_n}) ] = \textrm{tr}(K_{t_1, \cdots , t_n}) \Lambda , \quad \forall n\ge 1, t_1, \cdots , t_n \in T\), where
$$\begin{aligned} M_{t_1, \cdots , t_n}&= [\varvec{u}(t_1)^{\textsf{T}}, \cdots , \varvec{u}(t_n)^{\textsf{T}}]^{\textsf{T}}\\ F_{t_1, \cdots , t_n}&= [\varvec{f}(t_1)^{\textsf{T}}, \cdots , \varvec{f}(t_n)^{\textsf{T}}]^{\textsf{T}}\\ K_{t_1, \cdots , t_n}&= \begin{bmatrix} k(t_1, t_1) &{} \cdots &{} k(t_1, t_n) \\ \vdots &{} \ddots &{} \vdots \\ k(t_n, t_1) &{} \cdots &{} k(t_n, t_n) \end{bmatrix}. \end{aligned}$$
We denote \(\varvec{f} \sim \mathcal {MGP}_d(\varvec{u}, k, \Lambda )\).
Proof
For every \(n \ge 1\) and \(t_1, \cdots , t_n \in T\), the Gaussian measures \(\gamma _{t_1, \cdots , t_n}\) on \({\mathbb {R}}^{n \times d}\) satisfy the consistency assumptions of the Daniell-Kolmogorov theorem: the projection of the matrix Gaussian distribution on \({\mathbb {R}}^{n \times d}\) with mean \(\left[ \varvec{u}(t_1)^{{\textsf{T}}}, \cdots , \varvec{u}(t_n)^{{\textsf{T}}}\right] ^{{\textsf{T}}} \in {\mathbb {R}}^{n \times d}\), \(n \times n\) column covariance matrix \(K = (k_{i,j}) \in {\mathbb {R}}^{n \times n}\) and \(d \times d\) row covariance matrix \(\Lambda \in {\mathbb {R}}^{d \times d}\), onto the first \(n-1\) co-ordinates, is precisely the matrix Gaussian distribution with mean \(\left[ \varvec{u}(t_1)^{{\textsf{T}}}, \cdots , \varvec{u}(t_{n-1})^{{\textsf{T}}} \right] ^{{\textsf{T}}} \in {\mathbb {R}}^{(n-1) \times d}\), \((n-1) \times (n-1)\) column covariance matrix \((k_{i,j}) \in {\mathbb {R}}^{(n-1) \times (n-1)}\) and row covariance matrix \(\Lambda \in {\mathbb {R}}^{d \times d}\), by the marginalisation property of the matrix Gaussian distribution [2, 8]. By the Daniell-Kolmogorov theorem, there exists a probability space \((\Omega , {\mathcal {G}}, {\mathcal {P}})\) as well as a d-variate Gaussian process \(X = (X_t)_{t \in T} \sim \mathcal {MGP}_d(\varvec{u}, k, \Lambda )\) defined on this space such that every finite-dimensional distribution of \([X_{t_1}, \cdots , X_{t_n}]\) is given by the measure \(\gamma _{t_1, \cdots , t_n}\). \(\square \)
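The finite-dimensional construction behind Theorem 2 can be sketched numerically: restricted to inputs \(t_1, \cdots , t_n\), a draw of \(F = [\varvec{f}(t_1)^{\textsf{T}}, \cdots , \varvec{f}(t_n)^{\textsf{T}}]^{\textsf{T}}\) is matrix Gaussian with column covariance \(K\) and row covariance \(\Lambda \), and can be generated via Cholesky factors as \(F = M + L_K Z L_\Lambda ^{\textsf{T}}\). The squared-exponential kernel below is our illustrative choice, not one prescribed by the paper:

```python
import numpy as np

# Draw F ~ MN(M, K, Lambda), the finite-dimensional law of MGP_d(u, k, Lambda)
# at inputs t_1, ..., t_n, using F = M + LK Z LL^T with K = LK LK^T and
# Lambda = LL LL^T.
rng = np.random.default_rng(1)
t = np.linspace(0.0, 1.0, 5)
Lam = np.array([[1.0, 0.6], [0.6, 2.0]])
k = lambda a, b: np.exp(-0.5 * (a - b) ** 2)   # illustrative SE kernel

M = np.zeros((len(t), 2))                                # mean function u = 0
K = np.array([[k(a, b) for b in t] for a in t])          # n x n column cov
LK = np.linalg.cholesky(K + 1e-9 * np.eye(len(t)))       # jitter for stability
LL = np.linalg.cholesky(Lam)

def sample_mvgp():
    # Z has i.i.d. standard normal entries, so F = M + LK Z LL^T ~ MN(M, K, Lam)
    return M + LK @ rng.normal(size=M.shape) @ LL.T

F = sample_mvgp()
print(F.shape)  # (5, 2)
```

Averaging \((F-M)^{\textsf{T}}(F-M)\) over many draws recovers \(\textrm{tr}(K)\Lambda \), matching the third property in the theorem.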
3.2 Properties
Following the existence of d-variate GPs, we establish some of their properties as follows.
Proposition 3
(Stationarity) A d-variate Gaussian process \(\mathcal {MGP}_d(\varvec{u}, k, \Lambda )\) is stationary if \(\varvec{u}(t) = \varvec{u}(t + h)\) and \(k(t_s + h, t_l + h) = k(t_s, t_l)\) for all \(t, t_s, t_l, h \in T\).
Proof
Let \(\varvec{f} \sim \mathcal {MGP}_d(\varvec{u}, k, \Lambda )\). Then for any \(n\ge 1\) and \(t_1, \cdots , t_n \in T\),
where \(\mathcal{M}\mathcal{N}(\cdot , \cdot , \cdot )\) is matrix-variate Gaussian distribution,
Given any time increment \(h \in T\), we also have
where
Since \(\varvec{u}(t) = \varvec{u}(t + h)\) and \(k(t_s + h, t_l + h) = k(t_s, t_l)\) for all \(t, t_s, t_l, h \in T\),
Therefore, \([\varvec{f}(t_1 + h)^{{\textsf{T}}},\ldots ,\varvec{f}(t_n + h)^{{\textsf{T}}}]^{{\textsf{T}}} \) has the same probability distribution as \([\varvec{f}(t_1)^{{\textsf{T}}},\ldots ,\varvec{f}(t_n)^{{\textsf{T}}}]^{{\textsf{T}}} \). Due to the arbitrary choice of \(n \ge 1\) and \(t_1, \cdots , t_n \in T\), \(\varvec{f} \sim \mathcal {MGP}_d(\varvec{u}, k, \Lambda )\) is a stationary process according to Definition 5. \(\square \)
Proposition 4
(Independence) A collection of d functions \(\{\varvec{f}_i\}_{i = 1,2, \cdots , d}\) independently and identically follows a Gaussian process \(\mathcal{G}\mathcal{P}(\mu , k)\) if and only if \(\varvec{f} = [\varvec{f}_1, \cdots , \varvec{f}_d] \sim \mathcal {MGP}_d(\varvec{u}, k, \Lambda )\),
where \(\varvec{u} = [\mu , \cdots , \mu ] \in {\mathbb {R}}^d\) and \(\Lambda \) is any diagonal positive semi-definite matrix.
Proof
Necessity: if \(\varvec{f} \sim \mathcal {MGP}_d(\varvec{u}, k, \Lambda )\), then for any \(n\ge 1\) and \(t_1, \cdots , t_n \in T\),
where,
Rewriting the left-hand side, we obtain
where \(\varvec{\xi }_i = [f_i(t_1), f_i(t_2), \cdots , f_i(t_n)]^{{\textsf{T}}}\). Since \(\Lambda \) is a diagonal matrix, for any \(i \ne j\)
Because \(\varvec{\xi }_i\) and \(\varvec{\xi }_j\) are arbitrary finite collections of realisations of \(\varvec{f}_i\) and \(\varvec{f}_j\), respectively, from the same Gaussian process, \(\varvec{f}_i\) and \(\varvec{f}_j\) are uncorrelated. Since joint finite realisations of \(\varvec{f}_i\) and \(\varvec{f}_j\) are jointly Gaussian, non-correlation implies independence.
Sufficiency: if \(\{\varvec{f}_i\}_{i = 1,2, \cdots , d} \sim \mathcal{G}\mathcal{P}(\mu , k)\) are independent, then for any \(n\ge 1\), \(t_1, \cdots , t_n \in T\) and any \(i \ne j\),
$$\begin{aligned} \mathbb {E}\left[ (\varvec{\xi }_i - \mathbb {E}\varvec{\xi }_i)^{{\textsf{T}}} (\varvec{\xi }_j - \mathbb {E}\varvec{\xi }_j)\right] = \textrm{tr}( K_{t_1, \cdots , t_n}) \Lambda _{ij} = 0. \end{aligned}$$
Since \(\textrm{tr}( K_{t_1, \cdots , t_n})\) is non-zero, \(\Lambda _{ij}\) must be 0. Due to the arbitrary choice of i and j, \(\Lambda \) must be diagonal. That is to say, the collection \(\varvec{\xi }_i = [f_i(t_1), \cdots , f_i(t_n)]^{{\textsf{T}}}\), \(i = 1, \cdots , d\), follows a matrix Gaussian distribution \(\mathcal{M}\mathcal{N}(\varvec{u}_{t_1, \cdots , t_n}, K_{t_1, \cdots , t_n}, \Lambda )\) where \(\Lambda \) is a diagonal positive semi-definite matrix. Since this result holds for any \(n\ge 1\) and \(t_1, \cdots , t_n \in T\), \(\{\varvec{f}_i\}_{i = 1, \cdots , d}\) are independent and identically distributed Gaussian processes \(\mathcal{G}\mathcal{P}(\mu , k)\). \(\square \)
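Proposition 4 can also be checked empirically. In this sketch (the kernel, inputs and \(\Lambda \) are our illustrative choices), sampling with a diagonal \(\Lambda \) yields vanishing cross-covariance between two coordinate processes:

```python
import numpy as np

# With a diagonal row covariance Lambda, the coordinate processes sampled at
# n points are uncorrelated -- and, being jointly Gaussian, independent.
rng = np.random.default_rng(2)
t = np.linspace(0.0, 1.0, 4)
K = np.exp(-0.5 * (t[:, None] - t[None, :]) ** 2)   # shared kernel matrix
LK = np.linalg.cholesky(K + 1e-9 * np.eye(len(t)))
Lam = np.diag([1.0, 2.0, 0.5])                      # diagonal => independence
LL = np.sqrt(Lam)                                   # Cholesky of a diagonal

N = 50_000
Z = rng.normal(size=(N, len(t), 3))
draws = np.einsum('ij,njd->nid', LK, Z) @ LL        # N draws of LK Z LL^T
# empirical cross-covariance between coordinate processes 0 and 1
cross = np.einsum('ni,nj->ij', draws[:, :, 0], draws[:, :, 1]) / N
print(np.max(np.abs(cross)))  # close to zero
```

Replacing `Lam` with a non-diagonal positive semi-definite matrix makes the cross-covariance converge to \(K \Lambda _{01}\) instead, illustrating the "only if" direction.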
4 Example: special cases
A natural special case is the centred multivariate Gaussian process, whose vector-valued mean function is \(\varvec{u} = \varvec{0}\). Fifty realisation samples generated from a centred multivariate Gaussian process are shown in Fig. 1 (left). Furthermore, we can derive multivariate Gaussian white noise and multivariate Brownian motion.
4.1 Multivariate Gaussian white noise
Let \(\varvec{f} = [\varvec{f}_1,\cdots , \varvec{f}_d ] \sim \mathcal {MGP}_d(\varvec{0}, \sigma ^2 \mathbbm {1}(t_s, t_l), \Lambda )\), where \(\mathbbm {1}(t_s, t_l)\) is an indicator function that equals 1 if \(t_s = t_l\) and 0 otherwise. Thus
Furthermore, for any \(n\ge 1\) and \(t_1, \cdots , t_n \in T\),
where \(F_{t_1, \cdots , t_n} = [\varvec{f}(t_1)^{\textsf{T}}, \cdots , \varvec{f}(t_n)^{\textsf{T}}]^{\textsf{T}}\).
Therefore, it is natural to define the d-variate Gaussian white noise as follows.
Definition 10
(d-variate Gaussian white noise) A d-variate Gaussian process \(\mathcal {MGP}_d (\varvec{u}, k, \Lambda )\) is said to be a d-variate Gaussian white noise if \(\varvec{u} = \varvec{0}\) and \(k(t_s, t_l) = \sigma ^2 \mathbbm {1}(t_s, t_l)\).
Remark 1
We observe that d-variate Gaussian white noise is independent across the index set T, like classical white noise, but correlated across the d variates. Therefore, d-variate Gaussian white noise is also called variate-dependent (or variate-correlated) Gaussian white noise, which is distinct from the traditional d-dimensional independent Gaussian white noise. Fifty realisation samples generated from a multivariate Gaussian white noise are shown in Fig. 1 (right).
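A d-variate Gaussian white noise is straightforward to simulate: rows (time points) are drawn independently, while the d variates at each time share the covariance \(\sigma ^2 \Lambda \). The values of \(\sigma ^2\) and \(\Lambda \) below are illustrative:

```python
import numpy as np

# Simulate d-variate Gaussian white noise: row i is the value f(t_i), drawn
# independently of all other rows with covariance sigma^2 * Lambda across
# the d variates.
rng = np.random.default_rng(3)
n, sigma2 = 20_000, 1.0
Lam = np.array([[1.0, 0.8], [0.8, 1.0]])
LL = np.linalg.cholesky(Lam)

noise = np.sqrt(sigma2) * rng.normal(size=(n, 2)) @ LL.T  # row i = f(t_i)
emp = noise.T @ noise / n                                 # ~ sigma2 * Lambda
print(np.round(emp, 2))
```

The empirical second moment recovers \(\sigma ^2 \Lambda \), the variate-correlation that distinguishes this process from d independent white noises.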
4.2 Multivariate Brownian motion
Chapter 2 of [9] defines Brownian motion as a Gaussian white noise whose intensity is the Lebesgue measure. Since Brownian motion is a special case of the Gaussian process with continuous sample paths, mean function \(u = 0\) and covariance function \(k(s,t) = \min (s,t)\), we propose the d-variate Brownian motion as a special case of the d-variate Gaussian process with vector-valued mean function \(\varvec{u} = \varvec{0}\), covariance function \(k(s,t) = \min (s,t)\) and parameter matrix \(\Lambda \). Based on Theorem 2, we extend some properties of traditional Brownian motion to this more general vector-valued case.
Definition 11
(d-variate Brownian motion) A d-variate Gaussian process \(\mathcal {MGP}_d(\varvec{u}, k, \Lambda )\) is said to be a d-variate Brownian motion (Footnote 4) if all sample paths are continuous, \(\varvec{u} = \varvec{0}\) and \(k(t_s, t_l) =\min (t_s, t_l)\).
Let \(B_{t}\) be a d-variate Brownian motion, which means for all \(0 \le t_1 \le \cdots \le t_n\) the random matrix \(Z = (B_{t_1}^{{\textsf{T}}}, \ldots , B_{t_n}^{{\textsf{T}}})^{{\textsf{T}}} \in {\mathbb {R}}^{n \times d}\) has a Gaussian distribution on the probability space \((\Omega , {\mathcal {G}}, {\mathcal {P}})\) mentioned in Theorem 2. There exists a matrix \(M \in {\mathbb {R}}^{n \times d}\) and two non-negative definite matrices \(C = [c]_{jm} \in {\mathbb {R}}^{n \times n}\) and \(\Lambda = [\lambda ]_{ab} \in {\mathbb {R}}^{d \times d}\) such that
where \(W = [w]_{ja} \in {\mathbb {R}}^{n \times d}\) and \(\text {i}\) is the imaginary unit. Moreover, we also have the mean value \(M = {\mathbb {E}}[Z]\) and two covariance matrices
Assume that the mean matrix M here is a zero matrix, i.e. \({\mathbb {E}}[Z] = {\mathbb {E}}[Z \vert t = 0] = 0\), \(I_d = \Lambda \), and
Hence, \({\mathbb {E}}[B_t]=0\) for all \(t \ge 0\) and
Moreover, we have
Note that this d-variate Brownian motion \(B_t\) still has independent increments since \({\mathbb {E}} [(B_{t_i} - B_{t_{i-1}}) (B_{t_j} - B_{t_{j-1}})^{{\textsf{T}}}] = 0\) and \({\mathbb {E}} [(B_{t_i} - B_{t_{i-1}})^{{\textsf{T}}} (B_{t_j} - B_{t_{j-1}})] = 0\) when \(t_i < t_j\) holds for all \(0< t_1< \cdots < t_n\).
Remark 2
Similar to d-variate Gaussian white noise, d-variate Brownian motion has independent increments along the index set T, but is correlated across the d variates. Therefore, d-variate Brownian motion is also called variate-dependent (or variate-correlated) Brownian motion, which is distinct from the "traditional" d-dimensional Brownian motion. In fact, the "traditional" d-dimensional Brownian motion is a special case of d-variate Brownian motion with a diagonal matrix \(\Lambda \).
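A discretised path of a d-variate Brownian motion can be simulated by accumulating Gaussian increments that are independent over time but correlated across variates through \(\Lambda \). The step count and the value of \(\Lambda \) here are illustrative choices:

```python
import numpy as np

# Euler-type simulation of a d-variate Brownian motion on [0, 1]:
# each increment is N(0, dt * Lambda), independent of all other increments.
rng = np.random.default_rng(4)
d, steps = 2, 1000
dt = 1.0 / steps
Lam = np.array([[1.0, -0.5], [-0.5, 2.0]])
LL = np.linalg.cholesky(Lam)

dB = np.sqrt(dt) * rng.normal(size=(steps, d)) @ LL.T  # increments ~ N(0, dt*Lam)
B = np.vstack([np.zeros(d), np.cumsum(dB, axis=0)])    # path with B_0 = 0
print(B.shape)  # (1001, 2)
```

Since the increments sum to \(B_1 \sim {\mathcal {N}}(0, \Lambda )\), the terminal values of many simulated paths recover \(\Lambda \) empirically, consistent with \(k(s, t) = \min (s, t)\) at \(s = t = 1\).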
We now introduce an Itô lemma for the d-variate Brownian motion. Let \(B_t = [B_1(t), \cdots , B_d(t)]\) be the d-variate Brownian motion derived in Sect. 4.2. Then we have the following lemma.
Lemma 5
(Itô lemma for the d-variate Brownian motion) Let F be a twice continuously differentiable real function on \({\mathbb {R}}^{d+1}\) and let \(\Lambda = [\lambda ]_{i,j} \in {\mathbb {R}}^{d \times d}\) be the covariance matrix for the d-variate dimension. Then,
$$\begin{aligned} F(t, B_t) = F(0, B_0) + \int _0^t \frac{\partial F}{\partial s}(s, B_s) \, \textrm{d}s + \sum _{i=1}^d \int _0^t \frac{\partial F}{\partial x_i}(s, B_s) \, \textrm{d}B_i(s) + \frac{1}{2} \sum _{i=1}^d \sum _{j=1}^d \lambda _{i,j} \int _0^t \frac{\partial ^2 F}{\partial x_i \partial x_j}(s, B_s) \, \textrm{d}s. \end{aligned}$$
Proof
By Itô lemma and the definition of the d-variate Brownian motion, we obtain
The proof is complete by \(d\langle B_i, B_j \rangle (s) = \lambda _{i,j} \textrm{d}s\). \(\square \)
5 Application: multivariate Gaussian process regression
Multi-output prediction is a good example of a practical application of multivariate Gaussian processes. Multivariate Gaussian process modelling provides a solid and unified framework for predicting multiple responses or tasks by exploiting their correlations. As a regression problem, multivariate Gaussian process regression (MV-GPR) has closed-form expressions for the marginal likelihood and predictive distribution, so parameter estimation can employ the same optimisation techniques as conventional Gaussian process modelling [8].
As a summary of MV-GPR in [8], the noise-free multi-output regression model is considered and the noise term is incorporated into the kernel function. Given n pairs of observations \(\{(x_i,\varvec{y}_i)\}_{i=1}^n, x_i \in {\mathbb {R}}^p, \varvec{y}_i \in {\mathbb {R}}^{d}\), we assume the following model
where \(\Lambda \) is an undetermined covariance matrix (capturing the relationship between different outputs), \(k'(x_i, x_j) = k(x_i,x_j) + \delta _{ij}\sigma _n^2\), and \(\delta _{ij}\) is the Kronecker delta. By the definition of the multivariate Gaussian process, the collection of functions \([\varvec{f}(x_1),\ldots ,\varvec{f}(x_n)]\) follows a matrix-variate Gaussian distribution
where \(K'\) is the \(n \times n\) covariance matrix whose (i, j)-th element is \([K']_{ij} = k'(x_i,x_j)\). Therefore, the predictive targets \(\varvec{f}_* = [f_{*1},\ldots ,f_{*m}]^{\textsf{T}}\) at the test locations \(X_* = [x_{n+1},\ldots ,x_{n+m}]^{\textsf{T}}\) are given by
where
and
Here \(K'(X,X)\) is an \(n \times n\) matrix of which the (i, j)-th element \([K'(X,X)]_{ij} = k'(x_{i},x_j)\), \(K'(X_*,X)\) is an \(m \times n\) matrix of which the (i, j)-th element \([K'(X_*,X)]_{ij} = k'(x_{n+i},x_j)\), and \(K'(X_*,X_*)\) is an \(m \times m\) matrix with the (i, j)-th element \([K'(X_*,X_*)]_{ij} = k'(x_{n+i},x_{n+j})\). In addition, the expectation and the covariance are obtained,
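The predictive equations above can be sketched in code. This is a minimal illustration rather than the reference implementation of [8]: the squared-exponential kernel, its length-scale, the noise level and the toy data are all our choices, and only the predictive mean and the column (input) covariance are computed.

```python
import numpy as np

def kprime(A, B, ell=0.2, sn2=0.01):
    # k'(x_i, x_j) = k(x_i, x_j) + delta_ij * sn2, with an SE kernel k;
    # the noise term is folded into the kernel, as in the model above.
    K = np.exp(-0.5 * (A[:, None] - B[None, :]) ** 2 / ell**2)
    if A is B:
        K = K + sn2 * np.eye(len(A))
    return K

def mvgpr_predict(X, Y, Xs):
    Kxx = kprime(X, X)                          # K'(X, X)
    Ksx = kprime(Xs, X)                         # K'(X*, X)
    mean = Ksx @ np.linalg.solve(Kxx, Y)        # K'(X*,X) K'(X,X)^{-1} Y
    cov = kprime(Xs, Xs) - Ksx @ np.linalg.solve(Kxx, Ksx.T)
    return mean, cov                            # m x d mean, m x m column cov

X = np.linspace(0.0, 1.0, 8)
Y = np.column_stack([np.sin(2 * np.pi * X), np.cos(2 * np.pi * X)])
mean, cov = mvgpr_predict(X, Y, np.array([0.25, 0.75]))
print(mean.shape, cov.shape)  # (2, 2) (2, 2)
```

Each row of `mean` is the predicted d-dimensional output at one test location; the row covariance \(\Lambda \) would scale the predictive uncertainty across outputs.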
From a data science perspective, the hyperparameters involved in the covariance function (kernel) \(k'( \cdot , \cdot )\) and the row covariance matrix of MV-GPR need to be estimated from the training data using a variety of methods [10], including maximum likelihood estimation, maximum a posteriori estimation and Markov chain Monte Carlo [11].
6 Conclusion
In this paper, we provide a formal definition of the multivariate Gaussian process as well as several of its related properties, including existence, stationarity and independence. Additionally, we present some special cases and examples of multivariate Gaussian processes, such as multivariate Brownian motion and multivariate Gaussian white noise. Finally, we present a useful application, multivariate Gaussian process regression, in statistical learning.
Notes
1. Usually, T is a time space; however, it can be any arbitrary space (one-dimensional or multi-dimensional, discrete or continuous) other than time.
2. The co-ordinate random vector can be either a column or a row. For simplicity, we use row random vectors only in our discussions of the d-variate Gaussian process.
3. In order to clarify the shape of a random vector, we use the notation \({\mathbb {R}}^{d}\) for column vectors only, and the notation \({\mathbb {R}}^{1 \times d}\) for row vectors only.
4. For clarification, we only use row random vectors in our discussions of the d-variate Brownian motion.
References
Eldredge, N.: Analysis and probability on infinite-dimensional spaces. arXiv preprint arXiv:1607.03591 (2016)
Gupta, A.K., Nagar, D.K.: Matrix variate distributions, vol. 104. Chapman and Hall/CRC, New York (1999)
Hida, T., Hitsuda, M.: Gaussian processes, vol. 120. American Mathematical Society, Providence, Rhode Island (1993)
Rajput, B.S., Cambanis, S.: Gaussian processes and Gaussian measures. The Annals of Mathematical Statistics 43, 1944–1952 (1972)
Rasmussen, C.E.: Evaluation of Gaussian processes and other methods for non-linear regression. PhD thesis, University of Toronto, Toronto, Canada (1997)
Boyle, P., Frean, M.: Dependent Gaussian processes. Adv. Neural Inf. Process. Syst. 17, 217–224 (2005)
Wang, B., Chen, T.: Gaussian process regression with multiple response variables. Chemom. Intell. Lab. Syst. 142, 159–165 (2015)
Chen, Z., Wang, B., Gorban, A.N.: Multivariate Gaussian and student-t process regression for multi-output prediction. Neural Comput. Appl. 32(8), 3005–3028 (2020)
Le Gall, J.-F.: Brownian motion, Martingales, and stochastic calculus, vol. 274. Springer, Berlin, Heidelberg (2016)
Williams, C.K., Barber, D.: Bayesian classification with Gaussian processes. IEEE Trans. Pattern Anal. Mach. Intell. 20(12), 1342–1351 (1998)
Rasmussen, C.E., Williams, C.K.: Gaussian Processes for Machine Learning, vol. 1. MIT Press, Cambridge, Massachusetts (2006)
Acknowledgements
The authors would like to thank Dr. Youssef El-Khatib for his comments and Dr. Gregory Markowsky for his kind proofreading and very helpful comments.
Chen, Z., Fan, J. & Wang, K. Multivariate Gaussian processes: definitions, examples and applications. METRON 81, 181–191 (2023). https://doi.org/10.1007/s40300-023-00238-3