In this section, we develop a theory for parameter estimation for a class of semilinear SPDE using a maximum likelihood ansatz. The application we are aiming at is an activator–inhibitor model as in Flemming et al. (2020). More precisely, we show under mild conditions that the diffusivity of such a system can be identified in finite time given high spatial resolution and observing only the activator component.
The Model and Basic Properties
Let us first introduce the abstract mathematical setting in which we are going to derive our main theoretical results. We work in spatial dimension \(d\ge 1\). Given a bounded domain \({\mathcal {D}}=[0,L_1] \times \dots \times [0,L_d] \subset {\mathbb {R}}^d\), \(L_1,\dots ,L_d>0\), we consider the following parameter estimation problem for the semilinear SPDE
$$\begin{aligned} \mathrm {d}X_t = \left( \theta _0\Delta X_t + F_{\theta _1,\dots ,\theta _K}(X)\right) \mathrm {d}t + B\mathrm {d}W_t \end{aligned}$$
(5)
with periodic boundary conditions for \(\Delta \) on the Hilbert space \(H={{\bar{L}}}^2({\mathcal {D}}) = \{u \in L^2 ({\mathcal {D}}) |\int _{\mathcal {D}} u\mathrm {d}x=0\}\), together with initial condition \(X_0\in H\). We allow the nonlinear term F to depend on additional (nuisance) parameters \(\theta _1,\dots ,\theta _K\) and write \(\theta =(\theta _0,\dots ,\theta _K)^T\), \(\theta _{1:K}=(\theta _1,\dots ,\theta _K)\) for short. Without further mentioning it, we assume that \(\theta \in \Theta \) for a fixed parameter space \(\Theta \), e.g., \(\Theta ={\mathbb {R}}_+^K\). Next, W is a cylindrical Wiener process modeling Gaussian space–time white noise, that is, \({\mathbb {E}}[\dot{W}(t, x)]=0\) and \({\mathbb {E}}[\dot{W}(t, x)\dot{W}(s, y)]=\delta (t-s)\delta (x-y)\). In order to introduce spatial correlation, we use a dispersion operator of the form \(B=\sigma (-\Delta )^{-\gamma }\) with \(\sigma >0\) and \(\gamma >d/4\), describing spectral decay of the noise intensity. Here, \(\sigma \) is the overall noise intensity, and \(\gamma \) quantifies the decay of the noise for large frequencies in Fourier space. In addition, \(\gamma \) determines the spatial smoothness of X, see Sect. 2.1.2. The condition \(\gamma >d/4\) ensures that the covariance operator \(BB^T\) is of trace class, which is a standard assumption for well-posedness of (5), cf. Liu and Röckner (2015).
Denote by \((\lambda _k)_{k\ge 0}\) the eigenvalues of \(-\Delta \), ordered increasingly, with corresponding eigenfunctions \((\Phi _k)_{k\ge 0}\). It is well known (Weyl 1911; Shubin 2001) that \(\lambda _k\asymp \Lambda k^{2/d}\) for a constant \(\Lambda >0\), i.e., \(\lim _{k\rightarrow \infty }\lambda _k/(\Lambda k^{2/d})=1\). The proportionality constant \(\Lambda \) is known explicitly [see, e.g., Shubin (2001, Proposition 13.1)] and depends on the domain \({\mathcal {D}}\). Let \(P_N:H\rightarrow H\) be the projection onto the span of the first N eigenfunctions, and set \(X^N:=P_NX\). For later use, we denote by I the identity operator acting on H. For \(s\in {\mathbb {R}}\), we write \(H^s:=D((-\Delta )^{s/2})\) for the domain of \((-\Delta )^{s/2}\), which is given by
$$\begin{aligned} (-\Delta )^{s/2}x=\sum _{k=1}^\infty \lambda _k^{s/2}\langle \Phi _k, x\rangle \Phi _k, \end{aligned}$$
and abbreviate \(|\cdot |_s:=|\cdot |_{H^s}\) for the norm on that space whenever convenient. We assume that the initial condition \(X_0\) is regular enough, i.e., it satisfies \({\mathbb {E}}[|X_0|_{s}^p]<\infty \) for any \(s\ge 0\), \(p\ge 1\), without further mentioning it in the forthcoming statements. We will use the following general class of conditions with \(s\ge 0\) in order to describe the regularity of X:
- \((A_s)\): For any \(p\ge 1\), it holds that
$$\begin{aligned} {\mathbb {E}}\left[ \sup _{0\le t\le T}|X_t|_{s}^p\right] < \infty . \end{aligned}$$
(6)
Our standing assumption is that X is well posed in the sense that there exists a probabilistically and analytically weak solution \(X\in C(0,T;H)\) to (5), unique in the sense of probability law, such that \((A_0)\) holds. This is a consequence, for example, of the assumptions from Liu and Röckner (2015, Theorem 5.1.3).
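The Weyl asymptotics can be made concrete in \(d=1\): on \([0,L]\) with periodic boundary conditions, the eigenvalues of \(-\Delta \) on the zero-mean space are \((2\pi m/L)^2\), each of multiplicity two, so that \(\Lambda =\pi ^2/L^2\). A minimal numerical sketch (the value \(L=2\) is an arbitrary illustration):

```python
import numpy as np

L = 2.0  # illustrative domain length
# Periodic eigenvalues of -Laplacian on [0, L], zero-mean subspace:
# each frequency 2*pi*m/L occurs twice (cosine and sine modes).
m = np.arange(1, 501)
lam = np.repeat((2 * np.pi * m / L) ** 2, 2)  # ordered increasingly
k = np.arange(1, lam.size + 1)
Lambda = (np.pi / L) ** 2  # Weyl constant in d = 1
ratio = lam / (Lambda * k ** 2)
print(ratio[-1])  # -> approaches 1 along the tail
```

The ratio deviates from 1 for small k but tends to 1, in line with \(\lim _{k\rightarrow \infty }\lambda _k/(\Lambda k^{2/d})=1\).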
An Activator–Inhibitor Model
An important example for our analysis is given by the following FitzHugh–Nagumo type system of equations in \(d\le 2\) [cf. Flemming et al. (2020)]:
$$\begin{aligned} \mathrm {d}U_t&= (D_U\Delta U_t + k_1f(|U_t|_{L^2}, U_t) - k_2V_t)\mathrm {d}t + B\mathrm {d}W_t, \end{aligned}$$
(7)
$$\begin{aligned} \mathrm {d}V_t&= (D_V\Delta V_t + \epsilon (bU_t - V_t))\mathrm {d}t, \end{aligned}$$
(8)
together with sufficiently smooth initial conditions. Here, f is a bistable third-order polynomial \(f(x, u) = u(u_0-u)(u - a(x)u_0)\), and \(a\in C^1_b ({\mathbb {R}},{\mathbb {R}})\) is a bounded and continuously differentiable function with bounded derivative. The boundedness condition for a is not essential to the dynamics of U and can be realized in practice by a suitable cutoff function.
The FitzHugh–Nagumo system (FitzHugh 1961; Nagumo et al. 1962) originated as a minimal model capable of generating excitable pulses that mimic action potentials in neuroscience. Its two components U and V are called activator and inhibitor, respectively.
The spatial extension of the FitzHugh–Nagumo system, obtained via diffusive coupling, is used to model the propagation of excitable pulses and two-phase dynamics. In the two-phase regime, low and high concentrations of the activator U correspond to the stable fixed points of the third-order polynomial f at 0 and \(u_0\). The unstable fixed point \(a u_0\), \(a\in (0,1)\), separates the domains of attraction of the two stable fixed points. The interplay between spatial diffusion, which smooths out concentration gradients at rate \(D_U\), and the local reaction forcing f, which drives the activator toward one of the stable phases at rate \(k_1\), leads to the formation of transition layers between regions of low and high concentration of U. The parameters determine the shape and velocity of these transition layers; e.g., low values of a enhance the growth of regions with high activator concentration. This corresponds to the excitable regime, as explained in Flemming et al. (2020).
Conversely, a high concentration of the inhibitor V causes the activator U to decay at rate \(k_2\). In the excitable regime, this mechanism leads to moving activator wave fronts. The inhibitor is generated at rate \(\epsilon b\) in the presence of U and decays at rate \(\epsilon \). Its spatial evolution is governed by diffusion with rate \(D_V\). Finally, choosing a as a functional of the total activator concentration introduces a feedback control that makes it possible to stabilize the dynamics.
A detailed discussion of the relevance for cell biology is given in Sect. 3. More information on the FitzHugh–Nagumo model and related models can be found in Ermentrout and Terman (2010).
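For concreteness, the dynamics described above can be explored numerically. The following sketch integrates a crude explicit discretization of (7), (8) in \(d=1\) with periodic finite differences and spectrally colored noise; all parameter values are purely illustrative, and a is taken constant for simplicity:

```python
import numpy as np

rng = np.random.default_rng(0)
# Crude explicit scheme for (7)-(8) on [0, 1], periodic boundary conditions.
# All parameter values are illustrative; a is taken constant.
n, dt, steps = 64, 1e-4, 200
dx = 1.0 / n
D_U, D_V, k1, k2, eps, b, u0, a = 0.1, 0.05, 1.0, 1.0, 0.1, 1.0, 1.0, 0.3
sigma, gamma = 0.1, 0.75
U = 0.5 + 0.1 * rng.standard_normal(n)
V = np.zeros(n)

def lap(u):
    # periodic second-order difference approximation of the Laplacian
    return (np.roll(u, -1) - 2 * u + np.roll(u, 1)) / dx ** 2

# Fourier multiplier of the dispersion operator B = sigma * (-Delta)^(-gamma);
# the zero mode is removed, matching the zero-mean state space.
freqs = 2 * np.pi * np.fft.fftfreq(n, d=dx)
mult = np.zeros(n)
nz = freqs != 0
mult[nz] = sigma * np.abs(freqs[nz]) ** (-2 * gamma)

for _ in range(steps):
    xi = rng.standard_normal(n) / np.sqrt(dx)           # space-time white noise
    noise = np.real(np.fft.ifft(mult * np.fft.fft(xi)))  # apply B spectrally
    U = U + dt * (D_U * lap(U) + k1 * U * (u0 - U) * (U - a * u0) - k2 * V) \
          + np.sqrt(dt) * noise
    V = V + dt * (D_V * lap(V) + eps * (b * U - V))
```

An explicit scheme of this kind requires \(D_U\,\mathrm {d}t/\mathrm {d}x^2\) to be small; the sketch is meant only to make the roles of \(D_U\), \(D_V\), \(k_1\), \(k_2\), \(\epsilon \), b concrete, not as a quantitative solver.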
For this model, we can find a representation of the above type (5) as follows: Using the variation of constants formula, the solution V to (8) with initial condition \(V_0=0\) can be written as \(V_t=\epsilon b\int _0^te^{(t-r)(D_V\Delta -\epsilon I)}U_r\mathrm {d}r\). Inserting this representation into (7) yields the following reformulation
$$\begin{aligned} \begin{aligned} \mathrm {d}U_t&= \Big ( D_U \Delta U_t + k_1 U_t (u_0 - U_t )(U_t - a u_0) \\&\quad - k_2 \epsilon b\int _0^t e^{(t-r)( D_V\Delta -\epsilon I)}U_r\mathrm {d}r\Big )\mathrm {d}t + B\mathrm {d}W_t \\&= \left( \theta _0\Delta U_t + \theta _1F_1(U_t) + \theta _2F_2(U_t) + \theta _3F_3(U)(t) \right) \mathrm {d}t + B\mathrm {d}W_t \end{aligned} \end{aligned}$$
(9)
of the activator–inhibitor model (7), (8) by setting \(\theta _0 = D_U\), \(\theta _1 = k_1u_0{{\bar{a}}}\), \(\theta _2=k_1\), \(\theta _3 = k_2\epsilon b\), \({{\overline{F}}}=0\) for some \({{\bar{a}}}>0\) and
$$\begin{aligned} F_1(U)&= -\frac{a(|U|_{L^2})}{{{\bar{a}}}} U (u_0-U) , \end{aligned}$$
(10)
$$\begin{aligned} F_2(U)&= U^2(u_0-U), \end{aligned}$$
(11)
$$\begin{aligned} F_3(U)(t)&= -\int _0^te^{(t-r) (D_V\Delta -\epsilon I)} U_r \mathrm {d}r. \end{aligned}$$
(12)
Here, \((e^{t(D_V\Delta -\epsilon I)})_{t\ge 0}\) denotes the semigroup generated by \(D_V\Delta -\epsilon I\). Note that \(F_3\) now depends on the whole trajectory of U, so that the resulting stochastic evolution Eq. (9) is no longer Markovian.
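The variation-of-constants step can be sanity-checked on a single Fourier mode, where the semigroup acts as the scalar factor \(e^{-t(D_V\lambda _k+\epsilon )}\). The following sketch (all values illustrative) compares a quadrature of the representation of V for one mode with the closed-form integral for a cosine input:

```python
import numpy as np

# Single-mode check of the variation-of-constants formula: for mode k,
# (8) reads dv/dt = -(D_V*lam + eps) v + eps*b*u(t), v(0) = 0, hence
# v(t) = eps*b * int_0^t exp(-(t - r)*(D_V*lam + eps)) u(r) dr.
D_V, eps, b = 0.05, 0.1, 1.0          # illustrative parameter values
lam = (2 * np.pi) ** 2                # an eigenvalue of -Laplacian
mu = D_V * lam + eps
t = 1.0
r = np.linspace(0.0, t, 20001)
u = np.cos(3 * r)                     # a smooth test input for this mode
f = eps * b * np.exp(-(t - r) * mu) * u
quad = np.sum((f[1:] + f[:-1]) / 2) * (r[1] - r[0])   # trapezoidal rule
# closed form of the same integral for u(r) = cos(3 r):
exact = eps * b * (mu * np.cos(3 * t) + 3 * np.sin(3 * t)
                   - mu * np.exp(-mu * t)) / (mu ** 2 + 9)
print(abs(quad - exact))  # -> numerically small
```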
For the activator–inhibitor system (7), (8), we can verify well-posedness directly. For completeness, we state the optimal regularity results for both U and V, but our main focus lies on the observed variable \(X=U\).
Proposition 1
Let \(\gamma >d/4+1/2\). Then there is a unique solution (U, V) to (7), (8). Furthermore, U satisfies \((A_s)\) for any \(s<2\gamma -d/2+1\), and V satisfies \((A_s)\) for any \(s<2\gamma -d/2+3\).
The proof is deferred to “Appendix A.1.”
Basic Regularity Results
In the semilinear SPDE model (5), the nonlinear term F is assumed to satisfy [cf. Altmeyer et al. (2020b)]:
- \((F_{s,\eta })\): There are \(b>0\) and \(\epsilon >0\) such that
$$\begin{aligned} |(-\Delta )^{\frac{s-2+\eta +\epsilon }{2}}F_{\theta _{1:K}}(Y)|_{C(0,T;H)} \le c(\theta _{1:K})(1+|(-\Delta )^{\frac{s}{2}}Y|_{C(0,T;H)})^b \end{aligned}$$
for \(Y\in C(0,T;H^s)\), where c depends continuously on \(\theta _{1:K}\).
In particular, if \(F(Y)(t) = F(Y_t)\), this simplifies to
$$\begin{aligned} |F_{\theta _{1:K}}(Y)|_{s-2+\eta +\epsilon } \le c(\theta _{1:K})(1+|Y|_{s})^b \end{aligned}$$
(13)
for \(Y\in H^s\). In order to control the regularity of X, we apply a splitting argument (see also Cialenco and Glatt-Holtz 2011; Pasemann and Stannat 2020; Altmeyer et al. 2020b) and write \(X={{\overline{X}}}+{{\widetilde{X}}}\), where \({{\overline{X}}}\) is the solution to the linear SPDE
$$\begin{aligned} \mathrm {d}{{\overline{X}}}_t = \theta _0\Delta {{\overline{X}}}_t\mathrm {d}t + B\mathrm {d}W_t,\quad {{\overline{X}}}_0 = 0, \end{aligned}$$
(14)
where W is the same cylindrical Wiener process as in (5), and \({{\widetilde{X}}}\) solves a random PDE of the form
$$\begin{aligned} \mathrm {d}{{\widetilde{X}}}_t = (\theta _0\Delta {{\widetilde{X}}}_t + F_{\theta _{1:K}}({{\overline{X}}}+\widetilde{X})(t))\mathrm {d}t,\quad {{\widetilde{X}}}_0 = X_0. \end{aligned}$$
(15)
Lemma 2
The process \({{\overline{X}}}\) is Gaussian, and for any \(p\ge 1\), \(s<2\gamma -d/2+1\):
$$\begin{aligned} {\mathbb {E}}\left[ \sup _{0\le t\le T}|{{\overline{X}}}_t|_s^p\right] < \infty . \end{aligned}$$
(16)
Proof
This is classical, see, e.g., Da Prato and Zabczyk (2014), Liu and Röckner (2015). \(\square \)
Proposition 3
(1) Let \((A_{s})\) and \((F_{s, \eta })\) hold. Then for any \(p\ge 1\):
$$\begin{aligned} {\mathbb {E}}\left[ \sup _{0\le t\le T}|{{\widetilde{X}}}_t|_{s+\eta }^p\right] <\infty . \end{aligned}$$
(17)
In particular, if \(s+\eta <2\gamma -d/2+1\), then \((A_{s+\eta })\) is true.
(2) Let \(G:C(0,T;H)\supset D(G)\rightarrow C(0,T;H)\) be any function such that \((F_{s,\eta })\) holds for G. Then for \(s < 2\gamma -d/2+1\) and \(p\ge 1\):
$$\begin{aligned} {\mathbb {E}}\left[ \sup _{0\le t\le T}|G(X)(t)|_{s+\eta -2}^p\right] < \infty . \end{aligned}$$
(18)
In particular,
$$\begin{aligned} {\mathbb {E}}\int _0^T|G(X)(t)|_{s+\eta -2}^2\mathrm {d}t < \infty . \end{aligned}$$
(19)
Proof
(1) For \(t\in [0,T]\) and \(\epsilon >0\),
$$\begin{aligned} |{{\widetilde{X}}}_t|_{s+\eta }&\le |S(t)X_0|_{s+\eta } + \int _0^t|S(t-r)F_{\theta _{1:K}}(X)(r)|_{s+\eta }\mathrm {d}r \\&\le |X_0|_{s+\eta } + \int _0^t(t-r)^{-1+\epsilon /2}|F_{\theta _{1:K}}(X)(r)|_{s-2+\eta +\epsilon }\mathrm {d}r \\&\le |X_0|_{s+\eta } + \frac{2}{\epsilon }T^\frac{\epsilon }{2}\sup _{0\le t\le T}|F_{\theta _{1:K}}(X)(t)|_{s-2+\eta +\epsilon } \\&\le |X_0|_{s+\eta } + \frac{2}{\epsilon }T^\frac{\epsilon }{2}c(\theta _{1:K})(1+|X|_{C(0,T;H^s)})^b, \end{aligned}$$
where \(S(t)=e^{\theta _0t\Delta }\), \(t\ge 0\), denotes the semigroup generated by \(\theta _0\Delta \), and \(\theta _1,\dots ,\theta _K\) are the true parameters. This implies (17). If \(s+\eta < 2\gamma -d/2+1\), then a bound as in (17) holds for \({{\overline{X}}}\) by Lemma 2, and the claim follows.
(2) This follows from
$$\begin{aligned} {\mathbb {E}}\left[ \sup _{0\le t\le T}|G(X)(t)|_{s+\eta -2}^p\right]&\le c{\mathbb {E}}\left[ \left( 1+\sup _{0\le t\le T}|X_t|_s\right) ^{bp}\right] < \infty . \end{aligned}$$
(20)
\(\square \)
These regularity results form the basis for the asymptotic analysis of diffusivity estimation, as explained in the next section.
Statistical Inference: The General Model
The projected process \(P_NX\) induces a measure \({\mathbb {P}}^N_\theta \) on \(C(0,T;{\mathbb {R}}^N)\). Heuristically [see Liptser and Shiryayev (1977, Section 7.6.4)], we have the following representation for the density with respect to \({\mathbb {P}}_{{{\overline{\theta }}}}^N\) for an arbitrary reference parameter \({{\overline{\theta }}}\in \Theta \):
$$\begin{aligned} \frac{\mathrm {d}{\mathbb {P}}^N_\theta }{\mathrm {d}{\mathbb {P}}^N_{{{\overline{\theta }}}}}(X^N)&= \exp \left( \frac{1}{\sigma ^2}\int _0^T\left\langle (\theta _0-{{\overline{\theta }}}_0)\Delta X^N_t,(-\Delta )^{2\gamma }\mathrm {d}X^N_t\right\rangle \right. \\&\quad +\frac{1}{\sigma ^2}\int _0^T\left\langle P_N (F_{\theta _{1:K}}- F_{{{\overline{\theta }}}_{1:K}})(X),(-\Delta )^{2\gamma }\mathrm {d}X^N_t\right\rangle \\&\quad \left. -\frac{1}{2\sigma ^2}\int _0^T\left\langle (\theta _0-{{\overline{\theta }}}_0)\Delta X^N_t + P_N(F_{\theta _{1:K}}-F_{{{\overline{\theta }}}_{1:K}})(X), \right. \right. \\&\quad \left. \left. (-\Delta )^{2\gamma }\left[ (\theta _0+{{\overline{\theta }}}_0)\Delta X^N_t + P_N (F_{\theta _{1:K}}+F_{{{\overline{\theta }}}_{1:K}})(X)\right] \right\rangle \mathrm {d}t\right) . \end{aligned}$$
By setting the score (i.e., the gradient with respect to \(\theta \) of the log likelihood) to zero, and by formally substituting the (fixed) parameter \(\gamma \) by a (free) parameter \(\alpha \), we get the following maximum likelihood equations:
$$\begin{aligned} {{\hat{\theta }}}_0^N\int _0^T|(-\Delta )^{1+\alpha }X^N_t|_H^2\mathrm {d}t&= \int _0^T\langle (-\Delta )^{1+2\alpha }X^N_t,P_NF_{{{\hat{\theta }}}_{1:K}^N}(X)\rangle \mathrm {d}t \\&\quad -\int _0^T\langle (-\Delta )^{1+2\alpha }X^N_t,\mathrm {d}X^N_t\rangle , \\ -{{\hat{\theta }}}_0^N\int _0^T\langle (-\Delta )^{1+2\alpha }X^N_t,\partial _{\theta _i}P_NF_{{{\hat{\theta }}}_{1:K}^N}(X)\rangle \mathrm {d}t&= - \int _0^T\langle (-\Delta )^{2\alpha }P_NF_{{{\hat{\theta }}}_{1:K}^N}(X),\partial _{\theta _i}P_NF_{{{\hat{\theta }}}_{1:K}^N}(X)\rangle \mathrm {d}t \\&\quad + \int _0^T\langle (-\Delta )^{2\alpha }\partial _{\theta _i}P_NF_{{{\hat{\theta }}}_{1:K}^N}(X),\mathrm {d}X^N_t\rangle \end{aligned}$$
for \(i=1,\dots ,K\).
Any solution \(({{\hat{\theta }}}^N_0,\dots ,{{\hat{\theta }}}^N_K)\) to these equations is a (joint) maximum likelihood estimator (MLE) for \((\theta _0,\dots ,\theta _K)\). We assume w.l.o.g. that the MLE is unique; otherwise, fix any solution. We are interested in the asymptotic behavior of this estimator as \(N\rightarrow \infty \), i.e., as more and more spatial information (for fixed \(T>0\)) becomes available. While identifiability of \(\theta _1,\dots ,\theta _K\) in finite time depends in general on additional structural assumptions on F, the diffusivity \(\theta _0\) is expected to be identifiable in finite time under mild assumptions. Indeed, the argument is similar to Cialenco and Glatt-Holtz (2011), Pasemann and Stannat (2020), but we have to take into account the dependence of \({{\hat{\theta }}}^N_0\) on the other estimators \({{\hat{\theta }}}^N_1,\dots ,{{\hat{\theta }}}^N_K\). Note that the likelihood equations give the following useful representation for \({{\hat{\theta }}}^N_0\):
$$\begin{aligned} {{\hat{\theta }}}^N_0 = \frac{-\int _0^T\langle (-\Delta )^{1+2\alpha }X^N_t,\mathrm {d}X^N_t\rangle + \int _0^T\langle (-\Delta )^{1+2\alpha }X^N_t,P_NF_{{{\hat{\theta }}}_{1:K}^N}(X)\rangle \mathrm {d}t}{\int _0^T|(-\Delta )^{1+\alpha }X^N_t|_H^2\mathrm {d}t}. \end{aligned}$$
(21)
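In the simplest setting \(F\equiv 0\) and \(d=1\), the Fourier modes of (5) decouple into independent Ornstein–Uhlenbeck processes, and (21) can be evaluated directly on simulated data. In the sketch below, all numerical values are illustrative, \(\lambda _k=k^2\) (absorbing the constant \(\Lambda \)), the modes are started in stationarity for convenience, and the stochastic integral is computed via the pathwise identity \(\int _0^Tx\,\mathrm {d}x=(x_T^2-x_0^2-[x]_T)/2\):

```python
import numpy as np

rng = np.random.default_rng(1)
# Evaluate (21) for F = 0 in d = 1: the k-th Fourier mode of (5) is an
# Ornstein-Uhlenbeck process dx_k = -theta0*lam_k*x_k dt + sigma*lam_k^(-gamma) dw_k.
theta0, sigma, gamma, alpha = 1.0, 1.0, 1.0, 0.5    # alpha > gamma - 3/4
N, T, n = 200, 1.0, 1000
dt = T / n
lam = np.arange(1, N + 1) ** 2.0        # lam_k ~ k^2 (constant absorbed)
mu = theta0 * lam
var_stat = sigma ** 2 * lam ** (-2 * gamma) / (2 * mu)
x = rng.standard_normal(N) * np.sqrt(var_stat)      # stationary start
x0 = x.copy()
int_x2 = np.zeros(N)
a_ = np.exp(-mu * dt)
s_ = np.sqrt(var_stat * (1 - a_ ** 2))
for _ in range(n):
    int_x2 += x ** 2 * dt               # Riemann sum for int_0^T x_k^2 dt
    x = a_ * x + s_ * rng.standard_normal(N)        # exact OU transition
# Ito integral per mode via int_0^T x dx = (x_T^2 - x_0^2 - [x]_T)/2,
# with quadratic variation [x]_T = sigma^2 * lam^(-2*gamma) * T.
qv = sigma ** 2 * lam ** (-2 * gamma) * T
int_xdx = (x ** 2 - x0 ** 2 - qv) / 2
num = -np.sum(lam ** (1 + 2 * alpha) * int_xdx)     # F = 0 in (21)
den = np.sum(lam ** (2 + 2 * alpha) * int_x2)
theta_hat = num / den
print(theta_hat)   # -> close to theta0 = 1
```

Even at moderate N and with a crude Riemann sum in the denominator, the estimate is close to \(\theta _0\), reflecting that the diffusivity is identifiable in finite time from high spatial resolution.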
By plugging in the dynamics of X according to (5), we obtain the following decomposition:
$$\begin{aligned} {{\hat{\theta }}}^N_0 - \theta _0&= \frac{\int _0^T\langle (-\Delta )^{1+2\alpha }X^N_t,P_NF_{{{\hat{\theta }}}^N_{1:K}}(X)\rangle \mathrm {d}t}{\int _0^T|(-\Delta )^{1+\alpha }X^N_t|_H^2\mathrm {d}t} \nonumber \\&\quad - \frac{\int _0^T\langle (-\Delta )^{1+2\alpha }X^N_t,P_NF_{\theta _{1:K}}(X)\rangle \mathrm {d}t}{\int _0^T|(-\Delta )^{1+\alpha }X^N_t|_H^2\mathrm {d}t} - \frac{\int _0^T\langle (-\Delta )^{1+2\alpha }X^N_t,B\mathrm {d}W^N_t\rangle }{\int _0^T|(-\Delta )^{1+\alpha }X^N_t|_H^2\mathrm {d}t}. \end{aligned}$$
(22)
The right-hand side vanishes whenever, for large N, the denominator grows faster than the numerator in each of the three fractions. In principle, strong oscillation of the reaction parameter estimates \({{\hat{\theta }}}^N_{1:K}\) may influence the convergence rate of the first term, so in order to exclude undesirable behavior, we assume that \({{\hat{\theta }}}^N_{1:K}\) is bounded in probability. This is a mild assumption, which is in particular satisfied if the estimators for the reaction parameters are consistent. In Sect. 2.3, we verify this condition in the case that F depends linearly on \(\theta _{1:K}\). Regarding the third term, we exploit the martingale structure of the noise in order to capture the growth in N. Different noise models may be used in (5) without changing the result, as long as the numerator grows more slowly than the denominator. For example, the present argument generalizes directly to noise of martingale type. Now, the growth of the denominator can be quantified as follows:
Lemma 4
Let \(\alpha >\gamma -d/4-1/2\), and let \(\eta ,s_0>0\) be such that \((A_s)\) and \((F_{s,\eta })\) hold for \(s_0\le s < 2\gamma +1-d/2\). Then
$$\begin{aligned} \int _0^T|(-\Delta )^{1+\alpha }X^N_t|_H^2\mathrm {d}t&\asymp {\mathbb {E}}\int _0^T|(-\Delta )^{1+\alpha }{{\overline{X}}}^N_t|_H^2\mathrm {d}t \end{aligned}$$
(23)
$$\begin{aligned}&\asymp C_\alpha N^{\frac{2}{d}(2\alpha -2\gamma +1)+1} \end{aligned}$$
(24)
in probability, with
$$\begin{aligned} C_\alpha = \frac{T\Lambda ^{2\alpha -2\gamma +1}d}{2\theta _0(4\alpha -4\gamma +2+d)}. \end{aligned}$$
(25)
Proof
Using Proposition 3 (i), the proof is exactly as in Pasemann and Stannat (2020, Proposition 4.6). \(\square \)
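The asymptotics (24), (25) can be checked against the exact second moments of the linear process (14): mode k of \({{\overline{X}}}\), started at 0, has variance \(\sigma ^2\lambda _k^{-2\gamma }(1-e^{-2\theta _0\lambda _kt})/(2\theta _0\lambda _k)\). A sketch in \(d=1\) with \(\sigma =1\) and \(\lambda _k=\Lambda k^2\) (all other values illustrative):

```python
import numpy as np

# Check (24)-(25) in d = 1 against exact second moments of (14), with
# sigma = 1 and lam_k = Lambda * k^2, Lambda = pi^2 (illustrative choice).
theta0, T, gamma, alpha, Lambda = 1.0, 1.0, 1.0, 0.75, np.pi ** 2
d, N = 1, 2000
k = np.arange(1, N + 1)
lam = Lambda * k ** 2.0
# E[x_k(t)^2] = lam^(-2*gamma) * (1 - exp(-2*theta0*lam*t)) / (2*theta0*lam)
# for the k-th mode of (14) started at 0; integrated in t over [0, T]:
Ex2 = lam ** (-2 * gamma) / (2 * theta0 * lam) \
    * (T - (1 - np.exp(-2 * theta0 * lam * T)) / (2 * theta0 * lam))
lhs = np.sum(lam ** (2 + 2 * alpha) * Ex2)   # E int |(-Delta)^(1+alpha) X^N|^2 dt
rhs = T * Lambda ** (2 * alpha - 2 * gamma + 1) * d \
    / (2 * theta0 * (4 * alpha - 4 * gamma + 2 + d)) \
    * N ** ((2 / d) * (2 * alpha - 2 * gamma + 1) + 1)  # C_alpha * N^(...)
print(lhs / rhs)   # -> approaches 1 for large N
```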
Theorem 5
Assume that the likelihood equations are solvable for \(N\ge N_0\) and that \(({{\hat{\theta }}}^N_i)_{N\ge N_0}\) is bounded in probability for \(i=1,\dots ,K\). Let \(\alpha >\gamma -d/4-1/2\) and \(\eta , s_0>0\) be such that \((A_s)\) and \((F_{s, \eta })\) hold for any \(s_0\le s < 2\gamma +1-d/2\). Then the following is true:
(1) \({{\hat{\theta }}}^N_0\) is a consistent estimator for \(\theta _0\), i.e., \({{\hat{\theta }}}^N_0\xrightarrow {{\mathbb {P}}}\theta _0\).
(2) If \(\eta \le 1 + d/2\), then \(N^{r}({{\hat{\theta }}}^N_0-\theta _0)\xrightarrow {{\mathbb {P}}}0\) for any \(r<\eta /d\).
(3) If \(\eta > 1+d/2\), then
$$\begin{aligned} N^{\frac{1}{2}+\frac{1}{d}}({{\hat{\theta }}}^N_0-\theta _0)\xrightarrow {d}{\mathcal {N}}(0, V), \end{aligned}$$
(26)
with \(V = 2\theta _0(4\alpha -4\gamma +d+2)^2 / (Td\Lambda ^{2\alpha -2\gamma +1}(8\alpha -8\gamma +d+2))\).
Proof
By means of the decomposition (22), we proceed as in Pasemann and Stannat (2020). Denote by \({{\hat{\theta }}}^{\mathrm {full},N}_0\) the estimator given by (21) with \({{\hat{\theta }}}^N_1,\dots ,{{\hat{\theta }}}^N_K\) replaced by the true values \(\theta _1,\dots ,\theta _K\). In this case, the estimation error simplifies to
$$\begin{aligned} {{\hat{\theta }}}^{\mathrm {full},N}_0 - \theta _0 = -c_N \frac{\int _0^T\langle (-\Delta )^{1+2\alpha -\gamma }X^N_t,\mathrm {d}W^N_t\rangle }{\sqrt{\int _0^T|(-\Delta )^{1+2\alpha -\gamma }X^N_t|_H^2\mathrm {d}t}} \end{aligned}$$
(27)
with
$$\begin{aligned} c_N = \frac{\sqrt{\int _0^T |(-\Delta )^{1+2\alpha -\gamma } X^N_t|_H^2\mathrm {d}t}}{\int _0^T|(-\Delta )^{1+\alpha }X^N_t|_H^2\mathrm {d}t}. \end{aligned}$$
By Lemma 4, the rescaled prefactor \(c_N N^{1/2+1/d}\) converges in probability to \(\sqrt{C_{2\alpha -\gamma }} / C_\alpha \). The second factor converges in distribution to a standard normal distribution \({\mathcal {N}}(0,1)\) by the central limit theorem for local martingales (see Liptser and Shiryayev 1989, Theorem 5.5.4 (I); Jacod and Shiryaev 2003, Theorem VIII.4.17). This proves (26) for \({{\hat{\theta }}}^{\mathrm {full},N}_0\). To conclude, we bound the bias term depending on \({{\hat{\theta }}}^N_1,\dots ,{{\hat{\theta }}}^N_K\) as follows, using \(|P_NY|_{s_2}\le \lambda _N^{(s_2-s_1)/2}|P_NY|_{s_1}\) for \(s_1<s_2\): Let \(\delta >0\). Then
$$\begin{aligned}&\int _0^T\langle (-\Delta )^{1+2\alpha }X^N_t,P_NF_{{{\hat{\theta }}}_{1:K}^N}(X)\rangle \mathrm {d}t \\&\quad \le \left( \int _0^T|(-\Delta )^{1+\alpha }X^N_t|_H^2\mathrm {d}t\right) ^\frac{1}{2}\left( \int _0^T|(-\Delta )^{\alpha }P_NF_{{{\hat{\theta }}}_{1:K}^N}(X)|_H^2\mathrm {d}t\right) ^\frac{1}{2} \\&\quad \lesssim N^{\frac{1}{d}(2\alpha -2\gamma +1)+\frac{1}{2}}\left( \int _0^T|(-\Delta )^{\alpha }P_NF_{{{\hat{\theta }}}_{1:K}^N}(X)|_H^2\mathrm {d}t\right) ^\frac{1}{2} \\&\quad \lesssim N^{\frac{2}{d}(2\alpha -2\gamma +1)+1 - \frac{\eta -\delta }{d}}\left( \int _0^T|(-\Delta )^{\gamma + \frac{1}{2}-\frac{d}{4}-1+\frac{\eta -\delta }{2}}P_NF_{{{\hat{\theta }}}_{1:K}^N}(X)|_H^2\mathrm {d}t\right) ^\frac{1}{2}, \end{aligned}$$
so using \((F_{2\gamma +1-d/2-\delta ,\eta })\),
$$\begin{aligned} \frac{\int _0^T\langle (-\Delta )^{1+2\alpha }X^N_t,P_NF_{{{\hat{\theta }}}_{1:K}^N}(X)\rangle \mathrm {d}t}{\int _0^T|(-\Delta )^{1+\alpha }X^N_t|_H^2\mathrm {d}t} \lesssim c({{\hat{\theta }}}^N_{1:K})N^{-(\eta -\delta ) / d}. \end{aligned}$$
As \(c({{\hat{\theta }}}^N_{1:K})\) is bounded in probability and \(\delta >0\) is arbitrarily small, the claim follows. The remaining term involving the true parameters \(\theta _1,\dots ,\theta _K\) is similar. This concludes the proof. \(\square \)
It is clear that a Lipschitz condition on F with respect to \(\theta _{1:K}\) makes it possible to bound \({{\hat{\theta }}}^N_0-{{\hat{\theta }}}^{\mathrm {full},N}_0\) in terms of \(|{{\hat{\theta }}}^N_{1:K}-\theta _{1:K}|N^{-(\eta -\delta )/d}\) for \(\delta >0\), using the notation from the previous proof. In this case, consistency of the \({{\hat{\theta }}}^N_i\), \(i=1,\dots ,K\), may improve the rate of convergence of \({{\hat{\theta }}}^N_0\). However, as noted before, in general we cannot expect the \({{\hat{\theta }}}^N_i\), \(i=1,\dots ,K\), to be consistent as \(N\rightarrow \infty \).
Statistical Inference: The Linear Model
We put particular emphasis on the case that the nonlinearity F depends linearly on its parameters:
$$\begin{aligned} \mathrm {d}X_t = \left( \theta _0\Delta X_t + \sum _{i=1}^K\theta _iF_i(X) + {{\overline{F}}}(X)\right) \mathrm {d}t + B\mathrm {d}W_t. \end{aligned}$$
(28)
This model includes the FitzHugh–Nagumo system in the form (9). We state an additional verifiable condition, depending on the contrast parameter \(\alpha \in {\mathbb {R}}\), which guarantees, among other things, that the likelihood equations are well posed.
- \((L_\alpha )\): The terms \(F_1(Y),\dots ,F_K(Y)\) are well defined and linearly independent in \(L^2(0,T;H^{2\alpha })\) for every non-constant \(Y\in C(0,T;C({\mathcal {D}}))\).
In particular, condition \((L_\alpha )\) implies for \(i=1,\dots ,K\) that
$$\begin{aligned} \int _0^T|(-\Delta )^\alpha F_i(X)|_H^2\mathrm {d}t > 0. \end{aligned}$$
(29)
For linear SPDEs, similar considerations were first made in Huebner (1993, Chapter 3). The maximum likelihood equations for the linear model (28) simplify to
$$\begin{aligned} A_N(X){{\hat{\theta }}}^N(X) = b_N(X), \end{aligned}$$
(30)
where
$$\begin{aligned} A_N(X)_{0,0}&= \int _0^T|(-\Delta )^{1+\alpha }X^N_t|_H^2\mathrm {d}t, \\ A_N(X)_{0,i} = A_N(X)_{i,0}&= -\int _0^T\langle (-\Delta )^{1+2\alpha }X^N_t,P_NF_i(X)\rangle \mathrm {d}t, \\ A_N(X)_{i,j}&= \int _0^T\langle (-\Delta )^{2\alpha }P_NF_i(X),P_NF_j(X)\rangle \mathrm {d}t \end{aligned}$$
for \(i,j=1,\dots ,K\), and
$$\begin{aligned} b_N(X)_0&= -\int _0^T\langle (-\Delta )^{1+2\alpha }X^N_t,\mathrm {d}X^N_t\rangle \\&\quad + \int _0^T\langle (-\Delta )^{1+2\alpha }X^N_t,P_N\overline{F}(X)\rangle \mathrm {d}t, \\ b_N(X)_i&= \int _0^T\langle (-\Delta )^{2\alpha }P_NF_i(X),\mathrm {d}X^N_t\rangle \\&\quad - \int _0^T\langle (-\Delta )^{2\alpha }P_NF_i(X),P_N\overline{F}(X)\rangle \mathrm {d}t \end{aligned}$$
for \(i=1,\dots ,K\).
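For a single linear reaction term, (30) reduces to a \(2\times 2\) linear solve. The sketch below takes \(F_1(X)=X\) and \({{\overline{F}}}=0\) in \(d=1\), so that mode k is an Ornstein–Uhlenbeck process with rate \(\theta _0\lambda _k-\theta _1\); all numerical values are illustrative, the stochastic integrals are again evaluated via the quadratic-variation identity, and the modes are started in stationarity:

```python
import numpy as np

rng = np.random.default_rng(2)
# Joint MLE (30) for dX = (theta0*Delta X + theta1*X) dt + B dW in d = 1:
# mode k is OU with rate mu_k = theta0*lam_k - theta1 (illustrative values).
theta0, theta1, sigma, gamma, alpha = 1.0, 0.5, 1.0, 1.0, 0.5
N, T, n = 200, 1.0, 1000
dt = T / n
lam = np.arange(1, N + 1) ** 2.0
mu = theta0 * lam - theta1                        # positive for all modes
var_stat = sigma ** 2 * lam ** (-2 * gamma) / (2 * mu)
x = rng.standard_normal(N) * np.sqrt(var_stat)    # stationary start
x0 = x.copy()
int_x2 = np.zeros(N)
a_ = np.exp(-mu * dt)
s_ = np.sqrt(var_stat * (1 - np.exp(-2 * mu * dt)))
for _ in range(n):
    int_x2 += x ** 2 * dt
    x = a_ * x + s_ * rng.standard_normal(N)      # exact OU transition
qv = sigma ** 2 * lam ** (-2 * gamma) * T         # quadratic variation [x]_T
int_xdx = (x ** 2 - x0 ** 2 - qv) / 2             # = int_0^T x dx per mode
# Normal equations (30) for K = 1, F_1(X) = X, bar F = 0:
A = np.array([
    [np.sum(lam ** (2 + 2 * alpha) * int_x2), -np.sum(lam ** (1 + 2 * alpha) * int_x2)],
    [-np.sum(lam ** (1 + 2 * alpha) * int_x2), np.sum(lam ** (2 * alpha) * int_x2)],
])
b = np.array([-np.sum(lam ** (1 + 2 * alpha) * int_xdx),
              np.sum(lam ** (2 * alpha) * int_xdx)])
theta_hat = np.linalg.solve(A, b)
print(theta_hat[0])   # -> close to theta0
```

Consistent with the theory, \({{\hat{\theta }}}^N_0\) is recovered accurately, while \({{\hat{\theta }}}^N_1\) is only bounded in probability in finite time.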
In order to apply Theorem 5, we need that the estimators \({{\hat{\theta }}}^N_1,\dots ,{{\hat{\theta }}}^N_K\) are bounded in probability.
Proposition 6
In the setting of this section, let \(\eta ,s_0>0\) be such that \((A_s)\) and \((F_{s,\eta })\) hold for \(s_0\le s < 2\gamma +1-d/2\). For \(\gamma -d/4-1/2<\alpha \le \gamma \wedge (\gamma -d/4-1/2+\eta /2)\wedge (\gamma -d/8-1/4+\eta /4)\), let \((L_\alpha )\) be true. Then the \({{\hat{\theta }}}^N_i\), \(i=0,\dots ,K\), are bounded in probability.
The proof of Proposition 6 is given in “Appendix A.2.” We note that the upper bound on \(\alpha \) can be relaxed in general, depending on the exact asymptotic behavior of \(A_N(X)_{i,i}\), \(i=1,\dots ,K\). Proposition 6 together with Theorem 5 gives conditions for \({{\hat{\theta }}}^N_0\) to be consistent and asymptotically normal in the linear model (28). In particular, we immediately obtain the following for the activator–inhibitor model (7), (8), as the linear independence condition \((L_\alpha )\) is trivially satisfied and \(\eta \) can be chosen arbitrarily close to 2:
Theorem 7
Let \(\gamma >d/4\). Then \({{\hat{\theta }}}^N_0\) has the following properties in the activator–inhibitor model (7), (8):
(1) In \(d=1\), let \(\gamma -3/4<\alpha \le \gamma \). Then \({{\hat{\theta }}}^N_0\) is a consistent estimator for \(\theta _0\), which is asymptotically normal as in (26).
(2) In \(d=2\), let \(\gamma -1<\alpha < \gamma \). Then \({{\hat{\theta }}}^N_0\) is a consistent estimator for \(\theta _0\) with optimal convergence rate, i.e., \(N^r({{\hat{\theta }}}^N_0-\theta _0)\xrightarrow {{\mathbb {P}}}0\) for any \(r<1\).
So far, we have presented a theory of parameter estimation for stochastic reaction–diffusion models, with special emphasis on activator–inhibitor systems. In the next section, we discuss this class of models in the context of intracellular actin dynamics.