Diffusivity Estimation for Activator-Inhibitor Models: Theory and Application to Intracellular Dynamics of the Actin Cytoskeleton

A theory for diffusivity estimation for spatially extended activator-inhibitor dynamics modelling the evolution of intracellular signaling networks is developed in the mathematical framework of stochastic reaction-diffusion systems. In order to account for model uncertainties, we extend the results for parameter estimation for semilinear stochastic partial differential equations, as developed in [PS20], to the problem of joint estimation of diffusivity and parametrized reaction terms. Our theoretical findings are applied to the estimation of effective diffusivity of signaling components contributing to intracellular dynamics of the actin cytoskeleton in the model organism Dictyostelium discoideum.


Introduction
The purpose of this paper is to develop the mathematical theory for statistical inference methods for the parameter estimation of stochastic reaction-diffusion systems modelling spatially extended signaling networks in cellular systems. Such signaling networks are one of the central topics in cell biology and biophysics as they provide the basis for essential processes including cell division, cell differentiation, and cell motility [DBE + 17]. Nonlinearities in these network may cause rich spatiotemporal behavior including the emergence of oscillations and waves [BK17]. Furthermore, alterations and deficiencies in the network topology can explain many pathologies and play a key role in diseases such as cancer [CSS05]. Here we present a method to estimate both diffusivity and reaction terms of a stochastic reaction-diffusion system, given space-time structured data of local concentrations of signaling components. We mainly focus on the estimation of diffusivity, whose precision can be increased by simultaneous calibration of the reaction terms. To test this approach, we use fluorescence microscopy recordings of the actin dynamics in the cortex of cells of the social amoeba Dictyostelium discoideum, a well-established model organism for the study of a wide range of actin-dependent processes [AF09]. A recently introduced stochastic reaction-diffusion model could reproduce many features of the dynamical patterns observed in the cortex of these cells including excitable and bistable states [ASB18, FFAB20, MFF + 20]. In combination with the experimental data, this model will serve as a specific test case to exemplify our mathematical approach. Since in real-world applications the available data will not allow for calibrating and validating detailed mathematical models, in this paper we will be primarily interested in minimal models that are still capable of generating all observed dynamical features at correct physical magnitudes. The developed estimation techniques should in practice be as robust as possible w.r.t. uncertainty and even misspecification of the unknown real dynamics.
The impact of diffusion and reaction in a given model will be of fundamentally different structure and it is one of the main mathematical challenges to separate these impacts in the data in order to come to valid parameter estimations. On the more mathematical side, diffusion corresponds to a second order partial differential operator -resulting in a strong spatial coupling in the given data, whereas the reaction corresponds to a lower order, in fact 0-order, in general resulting in highly nonlinear local interactions in the data. For introductory purposes, let us assume that our data is given in terms of a space and time-continuous field X(t, x) on [0, T ] × D, where T is the terminal time of our observations and D ⊂ R 2 a rectangular domain that corresponds to a chosen data-segment in a given experiment. Although in practice the given data will be discrete w.r.t. both space and time, we will be interested in applications where the resolution is high enough in order to approximate the data by such a continuous field. Our standing assumption is that X(t, x) is generated by a dynamical system of the form where ∆ is the Laplacian, given by ∆X(t, x) = ∂ 2 x 1 X(t, x) + ∂ 2 x 2 X(t, x), x = (x 1 , x 2 ), which captures the diffusive spreading in the dynamics of X(t, x). The intensity of the diffusion is given by the diffusivity θ 0 . Finally, F is a generic term, depending on the solution field X(t, x), which describes all non-diffusive effects present in X(t, x), whether they are known or unknown. A natural approach to extract θ 0 from the data is to use a "cutting-out estimator" of the form where Y (t, x) is a suitable test function. In the second fraction of (2), we use the functional form for readability. In particular, we write X = X(t, x) for the solution field. We will also write X t = X(t, ·) for the (spatially varying) solution field at a fixed time t. In order to ease notation, we will use this functional form from now on throughout the paper. It is possible to derive (2) from a least squares approach by minimizing θ → ∂ t X − θ∆X 2 with a suitably chosen norm. If the non-diffusive effects described by F are negligible, we see by plugging (1) into (2) thatθ 0 is close to θ 0 . If a sound approximation F to F is known, the estimator can be made more precise by substituting ∂ t X by ∂ t X − F X in (2). A usual choice for Y is a reweighted spectral cutoff of X.
Under additional model assumptions, e.g. if (1) is in fact a stochastic partial differential equation (SPDE) driven by Gaussian white noise, a rather developed parameter estimation theory for θ 0 has been established in [PS20] on the basis of maximum likelihood estimation (MLE).
In this paper, we are interested in further taking into account also those parts of F X corresponding to local nonlinear reactions. As a particular example, we will focus on a recently introduced stochastic reaction-diffusion system of FitzHugh-Nagumo type that captures many aspects of the dynamical wave patterns observed in the cortex of motile amoeboid cells [FFAB20], Here, we identify θ 0 = D U and the only observed data is the activator variable, i.e. X = U . Therefore, in this example the non-diffusive part of the dynamics will be further decomposed as F = F + ξ, where ξ is Gaussian white noise and F = F (U ) encodes the non-Markovian reaction dynamics of the activator. The inhibitor component V in the above reaction-diffusion system is then incorporated for minimal modelling purposes to allow the formation of travelling waves in the activator variable U that are indeed observed in the time evolution of the actin concentration. This model and its dynamical features is explained in detail in Section 2.1.1.
As noted before, it is desirable to include this additional knowledge into the estimation procedure (2) by subtracting a suitable approximation F of the -in practice -unknown F. Although (3), (4) suggest an explicit parametric form for F, it is a priori not clear how to quantify the nuisance parameters appearing in the system. Thus an (approximate) model for the data is known qualitatively, based on the observed dynamics, but not quantitatively. In order to resolve this issue, we extend (2) and adopt a joint maximum likelihood estimation of θ 0 and various nuisance parameters.
The field of statistical inference for SPDEs is rapidly growing, see [Cia18] for a recent survey. The spectral approach to drift estimation was pioneered by [HKR93,HR95] and subsequently extended by various works, see e.g. [HLR97,LR99,LR00] for the case of nondiagonalizable linear evolution equations. In [CGH11], the stochastic Navier-Stokes equations have been analyzed as a first example of a nonlinear evolution equation. This has been generalized by [PS20] to semilinear SPDEs. Joint parameter estimation for linear evolution equations is treated in [Hue93,Lot03]. Besides the spectral approach, other measurement schemes have been studied. See e.g. [PT07, BT20, BT19, Cho19a, Cho19b, KT19, CH19, CDVK20, CK20, KU19] for the case of discrete observations in space and time. Recently, the so-called local approach has been worked out in [AR20] for linear equations, was subsequently generalized in [ACP20] to the semilinear case and applied to a stochastic cell repolarization model in [ABJR20].
The paper is structured as follows: In Section 2 we give a theory for joint diffusivity and reaction parameter estimation for a class of semilinear SPDEs and study the spatial high-frequency asymptotics. In Section 3, the biophysical context for these models is discussed. The performance of our method on simulated and real data is evaluated in Section 4.

Maximum Likelihood Estimation for Activator-Inhibitor Models
In this section we develop a theory for parameter estimation for a class of semilinear SPDE using a maximum likelihood ansatz. The application we are aiming at is an activator-inhibitor model as in [FFAB20]. More precisely, we show under mild conditions that the diffusivity of such a system can be identified in finite time given high spatial resolution and observing only the activator component.
2.1. The Model and Basic Properties. Let us first introduce the abstract mathematical setting in which we are going to derive our main theoretical results. We work in spatial dimension d ≥ 1. Given a bounded domain D = [0, L 1 ] × · · · × [0, L d ] ⊂ R d , L 1 , . . . , L d > 0, we consider the following parameter estimation problem for the semilinear SPDE dX t = (θ 0 ∆X t + F θ 1 ,...,θ K (X)) dt + BdW t (5) with periodic boundary conditions for ∆ on the Hilbert space H = L 2 (D) = {u ∈ L 2 (D)| D udx = 0}, together with initial condition X 0 ∈ H. We allow the nonlinear term F to depend on additional (nuisance) parameters θ 1 , . . . , θ K and write θ = (θ 0 , . . . , θ K ) T , θ 1:K = (θ 1 , . . . , θ K ) for short. Without further mentioning it, we assume that θ ∈ Θ for a fixed parameter space Θ, e.g. Θ = R K + . Next, W is a cylindrical Wiener process modelling Gaussian space-time white noise, that is, . In order to introduce spatial correlation, we use a dispersion operator of the form B = σ(−∆) −γ with σ > 0 and γ > d/4. Here, σ is the noise intensity, and γ quantifies the decay of the noise for large frequencies in Fourier space. In addition, γ determines the spatial smoothness of X, see Section 2.1.2. The condition γ > d/4 ensures that the covariance operator BB T is of trace class, which is a standard assumption for well-posedness of (5), cf. [LR15]. Denote by (λ k ) k≥0 the eigenvalues of −∆, ordered increasingly, with corresponding eigenfunctions (Φ k ) k≥0 . It is well-known [Wey11,Shu01] that λ k Λk 2/d for a constant Λ > 0, i.e. lim k→∞ λ k /(Λk 2/d ) = 1. The proportionality constant Λ is known explicitly (see e.g. [Shu01, Proposition 13.1]) and depends on the domain D. Let P N : H → H be the projection onto the span of the first N eigenfunctions, and set X N := P N X. For later use, we denote by I the identity operator acting on H. For s ∈ R, we write H s := D((−∆) s/2 ) for the domain of (−∆) s/2 , which is given by and abbreviate | · | s := | · | H s for the norm on that space whenever convenient. We assume that the initial condition X 0 is regular enough, i.e. it satisfies E[|X 0 | p s ] < ∞ for any s ≥ 0, p ≥ 1, without further mentioning it in the forthcoming statements. We will use the following general class of conditions with s ≥ 0 in order to describe the regularity of X: Our standing assumption is that there is a unique solution X to (5) such that (A 0 ) holds. This is a general statement and can be derived e.g. under the assumptions from [LR15, Theorem 3.1.5].
2.1.1. An Activator-Inhibitor Model. An important example for our analysis is given by the following FitzHugh-Nagumo type system of equations in d ≤ 2 (cf. [FFAB20]): together with sufficiently smooth initial conditions. Here, f is a bistable third-order polynomial f (x, u) = u(u 0 − u)(u − a(x)u 0 ), and a ∈ C 1 b (R, R) is a bounded and continuously differentiable function with bounded derivative. The boundedness condition for a is not essential to the dynamics of U and can be realized in practice by a suitable cutoff function.
The FitzHugh-Nagumo system [Fit61,NAY62] originated as a minimal model capable of generating excitable pulses mimicking action potentials in neuroscience. Its two components U and V are called activator and inhibitor, resp.
The spatial extension of the Fitzhugh-Nagumo system, obtained via diffusive coupling, is used to model propagation of excitable pulses and two-phase dynamics. In the case of the two phases, low and high concentration of the activator U are realized as the stable fixed points of the third order polynomial f at 0 and u 0 . The unstable fixed point au 0 , a ∈ (0, 1), separates the domains of attraction of the two stable fixed points. The interplay between spatial diffusion, causing a smoothing out of concentration gradients with rate D U , and the local reaction forcing f , causing convergence of the activator to one of the stable phases with rate k 1 , leads to the formation of transition phases between regions with low or high concentration of U . The parameters determine shape and velocity of the transition phases, e.g. low values of a enhance the growth of regions with high activator concentration. This corresponds to the excitable regime, as explained in [FFAB20].
Conversely, a high concentration of the inhibitor V leads to a decay in the activator U , with rate k 2 . In the excitable regime, this mechanism leads to moving activator wave fronts. The inhibitor is generated with rate b in the presence of U and decays at rate . Finally, its spatial evolution is determined by diffusion with rate D V .
Finally, choosing a as a functional depending on the total activator concentration introduces a feedback control that allows to stabilize the dynamics.
A detailed discussion of the relevance for cell biology is given in Section 3. More information on the FitzHugh-Nagumo model and related models can be found in [ET10].
For this model, we can find a representation of the above type (5) as follows: Using the variation of constants formula, the solution V to (8) with initial condition V 0 = 0 can be written as V t = b t 0 e (t−r)(D V ∆− I) U r dr. Inserting this representation into (7) yields the following reformulation of the activator-inhibitor model (7), (8) by setting θ 0 = D U , θ 1 = k 1 u 0ā , θ 2 = k 1 , θ 3 = k 2 b, F = 0 for someā > 0 and Here e D V ∆− I is the semigroup generated by D V ∆ − I. Note that F 3 now depends on the whole trajectory of U , so that the resulting stochastic evolution equation (9) is no longer Markovian.
For the activator-inhibitor system (7), (8), we can verify well-posedness directly. For completeness, we state the optimal regularity results for both U and V , but our main focus lies on the observed variable X = U .
The proof is deferred to Appendix A.1.
2.1.2. Basic Regularity Results. The nonlinear term F is assumed to satisfy (cf. [ACP20]): (F s,η ) There is b > 0 and > 0 such that for Y ∈ H s . In order to control the regularity of X, we apply a splitting argument (see also [CGH11,PS20,ACP20]) and write X = X + X, where X is the solution to the linear SPDE where W is the same cylindrical Wiener process as in (5), and X solves a random PDE of the form d X = (θ 0 ∆ X t + F θ 1:K (X + X)(t))dt, X 0 = X 0 .
(2) This follows from 2.2. Statistical Inference: The General Model. The projected process P N X induces a measure P θ on C(0, T ; R N ). Heuristically (see [LS77,Section 7.6.4]), we have the following representation for the density with respect to P N θ for an arbitrary reference parameter θ ∈ Θ: By setting the score (i.e. the gradient with respect to θ of the loglikelihood) to zero, and by formally substituting the (fixed) parameter γ by a (free) parameter α, we get the following maximum likelihood equations: Any solution (θ N 0 , . . . ,θ N K ) to these equations is a (joint) maximum likelihood estimator (MLE) for (θ 0 , . . . , θ K ). W.l.o.g. we assume that the MLE is unique, otherwise fix any solution. We are interested in the asymptotic behavior of this estimator as N → ∞, i.e. as more and more spatial information (for fixed T > 0) is available. While identifiability of θ 1 , . . . , θ K in finite time depends in general on additional structural assumptions on F , the diffusivity θ 0 is expected to be identifiable in finite time under mild assumptions. Indeed, the argument is similar to [CGH11,PS20], but we have to take into account the dependence ofθ N 0 on the other estimatorsθ N 1 , . . . ,θ N K . Note that the likelihood equations give the following useful representation forθ N 0 : By plugging in the dynamics of X according to (5), we obtain the following decomposition: The right-hand side vanishes whenever for large N the denominator grows faster than the numerator in each of the three fractions. In principle, strong oscillation of the reaction parameter estimatesθ N 1:K may influence the convergence rate for the first term, so in order to exclude undesirable behaviour, we assume thatθ N 1:K is bounded in probability 1 . This is a mild assumption which is in particular satisfied if the estimators for the reaction parameters are consistent. In Section 2.3 we verify this condition for the case that F depends linearly on θ 1:K . Regarding the third term, we exploit the martingale structure of the noise in order to capture the growth in N . Different noise models may be used in (5) without changing the result, as long as the numerator grows slower than the denominator. For example, the present argument directly generalizes to noise of martingale type 2 . Now, the growth of the denominator can be quantified as follows: 2 The generalization of our results to the case of multiplicative noise depends crucially on the noise model, see e.g. [CL09,Cia10]. This is beyond the scope of the present work.
Theorem 5. Assume that the likelihood equations are solvable for N ≥ N 0 , assume that (θ N i ) N ≥N 0 is bounded in probability for i = 1, . . . , K. Let α > γ − d/4 − 1/2 and η, s 0 > 0 such that (A s ) and (F s,η ) hold for any s 0 ≤ s < 2γ + 1 − d/2. Then the following is true: Proof. By means of the decomposition (22), we proceed as in [PS20]. Denote byθ full,N 0 the estimator which is given by (21) if theθ N 1 , . . . ,θ N K are substituted by the true values θ 1 , . . . , θ K . In this case, the estimation error simplifies tô By Lemma 4, the rescaled prefactor c N N 1/2+1/d converges in probability to C 2α−γ /C α . The second factor converges in distribution to a standard normal distribution N (0, 1) by the central limit theorem for local martingales (see [LS89,Theorem 5 . To conclude, we bound the bias term de- As c(θ N 1:K ) is bounded in probability and δ > 0 is arbitrarily small, the claim follows. The remaining term involving the true parameters θ 1 , . . . , θ K is similar. This concludes the proof.
It is clear that a Lipschitz condition on F with respect to θ 1:K allows to boundθ N 0 −θ full,N 0 in terms of |θ N 1:K − θ 1:K |N −(η−δ)/d for δ > 0, using the notation from the previous proof. In this case, consistency ofθ N i , i = 1, . . . , K, may improve the rate of convergence ofθ N 0 . However, as noted before, in general we cannot expectθ N i , i = 1, . . . , K, to be consistent as N → ∞.
2.3. Statistical Inference: The Linear Model. We put particular emphasis on the case that the nonlinearity F depends linearly on its parameters: This model includes the FitzHugh-Nagumo system in the form (9). We state an additional verifiable condition, depending on the contrast parameter α ∈ R, which guarantees that the likelihood equations are well-posed, amongst others.
In particular, condition (L α ) implies for i = 1, . . . , K that where In order to apply Theorem 5, we need that the estimatorsθ N 1 , . . . ,θ N K are bounded in probability.
Proposition 6. In the setting of this section, let η, s 0 > 0 such that Then theθ N i , i = 0, . . . , K, are bounded in probability. The proof of Proposition 6 is given in Appendix A.2. We note that the upper bound on α can be relaxed in general, depending on the exact asymptotic behaviour of A N (X) i,i , i = 1, . . . , K. Proposition 6 together with Theorem 5 gives conditions forθ N 0 to be consistent and asymptotically normal in the linear model (28). In particular, we immediately get for the activator-inhibitor model (7), (8), as the linear independency condition (L α ) is trivially satisfied and η can be chosen arbitrarily close to 2: Theorem 7. Let γ > d/4. Thenθ N 0 has the following properties in the activator-inhibitor model (7), (8): (1) In d = 1, let γ−3/4 < α ≤ γ. Thenθ N 0 is a consistent estimator for θ 0 , which is asymptotically normal as in (26).

Application to Activator-Inhibitor Models of Actin Dynamics
The actin cytoskeleton is a dense polymer meshwork at the inner face of the plasma membrane that determines the shape and mechanical stability of a cell. Due to the continuous polymerization and depolymerization of the actin filaments, it displays a dynamic network structure that generates complex spatiotemporal patterns. These patterns are the basis of many essential cellular functions, such as endocytic processes, cell shape changes, and cell motility [BBPSP14]. The dynamics of the actin cytoskeleton is controlled and guided by upstream signaling pathways, which are known to display typical features of nonequilibrium systems, such as oscillatory instabilities and the emergence of traveling wave patterns [DBE + 17, BK17]. Here we use giant cells of the social amoeba D. discoideum that allow us to observe these cytoskeletal patterns over larger spatial domains [GEW + 14]. Depending on the genetic background and the developmental state of the cells, different types of patterns emerge in the cell cortex. In particular, pronounced actin wave formation is observed as the consequence of a mutation in the upstream signaling pathway -a knockout of the RasG-inactivating RasGAP NF1 -which is present for instance in the commonly used laboratory strain AX2 [VWB + 16]. Giant cells of NF1-deficient strains thus provide a well-suited setting to study the dynamics of actin waves and their impact on cell shape and division [FFAB20].
In Figure 1A and B, we show a normal-sized and a giant D. discoideum cell in the wave-forming regime for comparison. Images were recorded by confocal laser scanning microscopy and display the distribution of mRFP-LimE∆, a fluorescent marker for filamentous actin, in the cortex at the substrate-attached bottom membrane of the cell. As individual actin filaments are not resolved by this method, the intensity of the fluorescence signal reflects the local cortical density of filamentous actin. Rectangular subsections of the inner part of the cortex of giant cells as displayed in panel (C) were used for data analysis in Section 4.
Many aspects of subcellular dynamical patterns have been addressed by reaction-diffusion models. While some models rely on detailed modular approaches [BAB08, DBE + 17], others have focused on specific parts of the upstream signaling pathways, such as the phosphatidylinositol lipid signaling system [ASM + 10] or Ras signaling [FMU19]. To describe wave patterns in the actin cortex of giant D. discoideum cells, the noisy FitzHugh-Nagumo type reaction-diffusion system (3), (4), combined with a dynamic phase field, has been recently proposed [FFAB20]. In contrast to the more detailed biochemical models, the structure of  this model is rather simple. Waves are generated by noisy bistable/excitable kinetics with an additional control of the total amount of activator U . This constraint dynamically regulates the amount of U around a constant level in agreement with the corresponding biological restrictions. Elevated levels of the activator represent typical cell front markers, such as active Ras, PIP3, Arp2/3, and freshly polymerized actin that are also concentrated in the inner part of actin waves. On the other hand, markers of the cell back, such as PIP2, myosin II, and cortexillin, correspond to low values of U and are found outside the wave area [SDGE + 09]. Tuning of the parameter b allows to continuously shift from bistable to excitable dynamics, both of which are observed in experiments with D. discoideum cells. In Figure 1D-F, the results of numerical simulations of this model displaying excitable dynamics are shown. Examples for bounded domains that correspond to normal-sized and giant cells are shown, as well as results with periodic boundary conditions that were used in the subsequent analysis.
Model parameters, such as the diffusivities, are typically chosen in an ad hoc fashion to match the speed of intracellular waves with the experimental observations. The approach introduced in Section 2 now allows us to estimate diffusivities from data in a more rigorous manner. On the one hand, we may test the validity of our method on in silico data of model simulations, where all parameters are predefined. On the other hand, we can apply our method to experimental data, such as the recordings of cortical actin waves displayed in Figure 1C. This will yield an estimate of the diffusivity of the activator U , as dense areas of filamentous actin reflect increased concentrations of activatory signaling components. Note, however, that the estimated value of D U should not be confused with the molecular diffusivity of a specific signaling molecule. It rather reflects an effective value that includes the diffusivities of many activatory species of the signaling network and is furthermore affected by the specific two-dimensional setting of the model that neither includes the kinetics of membrane attachment/detachment nor the three-dimensional cytosolic volume.

Diffusivity Estimation on Simulated and Real Data
In this section we apply the methods from Section 2 to synthetic data obtained from a numerical simulation and to cell data stemming from experiments as described in Section 3. We follow the formalism from Theorem 5 and perform a Fourier decomposition on each data set. Set φ k (x) = cos(2πkx) for k ≤ 0 and φ k (x) = sin(2πkx) for k > 0, then Φ k,l (x, y) = φ k (x/L 1 )φ l (y/L 2 ), k, l ∈ Z, form an eigenbasis for −∆ on the rectangular domain D = [0, L 1 ] × [0, L 2 ]. The corresponding eigenvalues are given by λ k,l = 4π 2 ((k/L 1 ) 2 + (l/L 2 ) 2 ). As before, we choose an ordering ((k N , l N )) N ∈N of the eigenvalues (excluding λ 0,0 = 0) such that λ N = λ k N ,l N is increasing.
In the sequel, we will use different versions ofθ N 0 which correspond to different model assumptions on the reaction term F , concerning both the effects included in the model and a priori knowledge on the parametrization. While all of these estimators enjoy the same asymptotic properties as N → ∞, it is reasonable to expect that they exhibit huge qualitative differences for fixed N ∈ N, depending on how much knowledge on the generating dynamics is incorporated. In order to describe the model nonlinearities that we presume, we use the notation F 1 , F 2 , F 3 as in (10), (11), (12). As a first simplification, we substitute F 1 by F 1 given by in all estimators below. This corresponds to an approximation of the function a by an effective average valueā > 0. While this clearly does not match the full model, we will see that it does not pose a severe restriction as a(U ) tends to stabilize in the simulation. Recall the explicit representation (21) ofθ N 0 . As before, K is the number of nuisance parameters appearing in the nonlinear term F . We construct the following estimators which capture qualitatively different model assumptions: (1) The linear estimatorθ lin,N 0 results from presuming K = 0 and F = 0.
(3) The full or FitzHugh-Nagumo estimatorθ full,N 0 , where K = 0 and with θ 1 = k 1 u 0ā , θ 2 = k 1 and θ 3 = k 2 b, where v is given by v t = t 0 e (t−r)(D V ∆− I) u r dr. As before, k 1 , k 2 , u 0 ,ā, D v , > 0 are known. Furthermore, we modifyθ full,N 0 in order to estimate different subsets of model parameters at the same time. We use the notationθ i,N 0 , where i is the number of simultaneously estimated parameters. More precisely, we setθ 1,N 0 =θ full,N 0 , and additionally: (1) The estimatorθ 2,N 0 results from K = 1 and F θ 1 given by (32) for known θ 2 , θ 3 > 0. This corresponds to an unknownā.
In all estimators in this section we set the regularity adjustment α = 0. This is a reasonable choice if the driving noise in (7), (8) is close to white noise.
It is worthwhile to note thatθ lin,N 0 is invariant under rescaling the intensity of the data, i.e. substituting X by cX, c > 0. This has the advantage that we do not need to know the physical units of the data. In fact, the intensity of fluorescence microscopy data may vary due to different expression levels of reporter proteins within a cell population, or fluctuations in the illumination. While invariance under intensity rescaling is a desirable property, the fact that nonlinear reaction terms are not taken into account may outweigh this advantage, especially if the SPDE model is close to the true generating process of the data. This is the case for synthetic data. The discussion in Section 4.1 shows that even if the model specific correction terms in (21) vanish asymptotically, their effect on the estimator may be huge in the nonasymptotic regime, especially at low resolution level. However, real data may behave differently, and a detailed nonlinear model may not reveal additional information on the underlying diffusivity, see Section 4.4.
4.1. Performance on Synthetic Data. First, we study the performance of the mentioned estimators on simulations. The numerical scheme is specified in Appendix B. While we have perfect knowledge on the dynamical system which generates the data in this setting, it is revelatory to compare the different versions ofθ N 0 which correspond to varying levels of model misspecification. The simulation shows that a(|U t | L 2 ) fluctuates around a value slightly larger than 0.15. We demonstrate the effect of qualitatively different model assumptions on our method in Figure 2  does not see any information on the wave fronts, the steep gradient at the transition phase leads to a low diffusivity estimate. On the other hand,θ pol,N 0 incorporates knowledge on the wave fronts as they appear in the Schlögl model, but the decay in concentration due to the presence of the inhibitor is mistaken as additional diffusion. Finally,θ full,N 0 contains sufficient information on the dynamics to give a precise estimate. In Figure 2  works very well on the full domain of 200×200 pixels with periodic boundary conditions, it decays rapidly if we just use a square section of 75 × 75 pixels. In fact, the boundary conditions are not satisfied on that square section. This leads to the presence of discontinuities at the boundary. These discontinuities, if interpreted as steep gradients, lower the observed diffusivity. Hence, a first guess to improve the quality is to mirror the square section along each axis and glue the results together. In this manner we construct a domain with 150 × 150 pixels, on whichθ 2,N 0 performs well. We emphasize that, while this periodification procedure is a natural approach, its performance will depend on the specific situation, because the dynamics at the transition edges will still not obey the true underlying dynamics. Furthermore, by modifying the data set as explained, we change its resolution, and consequently, a different amount of spectral information may be included intoθ 2,N 0 for interpretable results.  andθ 4,N 0 , which impose very specific model assumptions. This pattern can be observed across different cell data sets. In particular, this is notably different from the performance of these three estimators on synthetic data. This discrepancy seems to indicate that the lower order reaction terms in the activator-inhibitor model are not fully consistent with the information contained in the experimental data. This can have several reasons, for example, it is possible that a more detailed model reduction of the known signalling pathway inside the cell is needed. On the contrary, θ 2,N 0 seems to be comparatively rigid due to its a priori choices for θ 2  and θ 3 , but it eventually approaches the other estimators. Variations in the value of u 0 have an impact on the results for small N but not on the asymptotic behaviour. In Figure 4 (top right), the cell from Figure  4 (top left) is periodified before evaluating the estimators. As expected from the discussion in Section 4.2, the estimates rise, but the order of magnitude does not change drastically.

Invariance under Convolution. Given a function
where k and u are identified with their periodic continuation. It is well-known that T k commutes with ∆, i.e. T k • ∆ = ∆ • T k . Thus, if X is a solution to a semilinear stochastic PDE with diffusivity θ, the same is true for T k X: While the nonlinearity and the dispersion operator may be changed by T k , the diffusive part is left invariant, in particular the diffusivity of X and T k X is the same. Based on this observation, a comparison between the effective diffusivity of X and T k X for different choices of k may serve as an indicator if the assumption with (left) N = N const or (right) N = N stop is plotted against N stop . The least squares fit is shown in red. The p-value in each plot corresponds to a t-test whose null hypothesis states that the slope of the regression line is zero. Clearly, the slope is more notable in the case N = N const . that a data set X is generated by a semilinear SPDE (5) is reasonable in the first place, and if the diffusion indeed can be considered to be homogeneous and anisotropic. We use a family of periodic kernels k = kσ,σ > 0, which are normed in L 1 (D) and coincide on the reference rectangle [−L 1 /2, L 1 /2] × [−L 2 /2, L 2 /2] with a Gaussian density with standard deviationσ. In Figure 4 (bottom), the effects of applying T kσ for different bandwidthsσ are shown for one data set and its periodification. While the diffusivity of the data without periodification on the left-hand panel is slightly affected by the kernel, more precisely, its tendency to fall is enlarged, the graphs for the effective diffusivity of the periodified data are virtually indistinguishable. Periodification seems to be compatible with the expected invariance under convolution, even if the periodified data is not generated by a semilinear SPDE but instead by joining smaller patches of that form. In total, these observations are in accordance with the previous sections and suggest that the statistical analysis of the data based on a semilinear SPDE model is reasonable. 4.6. The Effective Diffusivity of a Cell Population. We compare the estimated diffusivity for a cell population consisting of 36 cells. The boundaries in space and time of all samples are selected in order to capture only the interior dynamics within a cell and, consequently, the data sets differ in their size. On the one hand, the estimated diffusivity tends to stabilize in time, i.e. the number of frames in a sample, corresponding to the final time T , does not affect the result much. On the other hand, the size of each frame, measured in pixels, determines the number of eigenfrequencies that carry meaningful information on the process. Thus, we expect that N can be chosen larger for samples with high spatial resolution. We formalize this intuition with the following heuristic: Let r x and r y denote the number of pixels in each row and column, resp., of every frame in a sample. Let N stop = 4r x r y /M 2 , where M ∈ N is a parameter representing the number of pixels needed for a sine or cosine to extract meaningful information. That is, a square of dimensions r x × r y = M × M leads to N stop = 4, so in this case only the eigenfunctions Φ k,l , k, l ∈ {−1, 1} are taken into account, whose period length is M in both dimensions. In our evaluations, we choose M = 12 for the data without periodification and M = 24 for the periodified data sets. We mention that the cells are also heterogeneous with respect to their characteristic length dx and time dt, given in meters per pixel and seconds per frame for each data set. 3 However, a detailed quantitative analysis of the resulting discretization error is beyond the scope of our work. In Figure 5, we compare the results forθ 3,Nstop 0 witĥ θ 3,Nconst 0 within the cell population, where N const = 899 is independent of the sample resolution. Evaluating the estimator at N stop decorrelates the estimated diffusivity from the spatial extension of the sample. The results for different cells have the same order of magnitude, which indicates that the effective diffusivity can be used for statistical inference for cells within a population or between populations in future research. 4.7. Estimating θ 1 . When solving the linear MLE equations in order to obtainθ 2,N 0 , we simultaneously get an estimateθ 2,N 1 for θ 1 = k 1 u 0ā . Note that u 0 = 1 by convention, and θ 2 = k 1 = 1 is treated as known quantity in this case. Thus, θ 1 can be identified withā. In Figure 6 we show the results forθ 1,N 1 on simulated data (left) and on a cell data sample (right). Note that in general, we cannot expect to observe increased precision forθ 2,N 1 as N grows, because the reaction term is of 3 Further tests indicate that the estimated diffusivity correlates with the characteristic diffusivity dx 2 /dt. A more detailed examination of the discretization effects and other dependencies which are not directly related to the underlying process is left for future research. Also, it would be interesting to understand the extent to which the effective diffusivity in the data is scale dependent.  Figure 6. Estimation of the reaction parameter θ 1 on (left) simulated data and (right) cell data. All times are given in seconds, relative to the first frame. As before, we restrict to N ≥ 25. order zero. 4 However, it is informative to consider also the large time regime T → ∞, i.e. to include more and more frames to the estimation procedure. In the case of simulated data, a oscillates around an average value close to 0.15, which should be considered to be the ground truth forā. This effective valueā is recovered well, even for small T , with increasing precision as T grows. Clearly, this depends heavily on the model assumptions. In the case of cell data, the results are rather stable. This indicates that it may be reasonable to use the concept of an "effective unstable fixed pointā of the reaction dynamics, conditioned on the model assumptions included inθ 2,N 0 ", when evaluating cell data statistically.
4.8. The Case of Pure Noise Outside the Cell. If the data set does not contain parts of the cell but rather mere noise, the estimation procedure still returns a value. This "observed diffusivity" (see Figure 7) originates completely from white measurement noise. More precisely, the appearance and vanishing of singular pixels is interpreted as instantaneous (i.e. within the time between two frames) diffusion to the steady state. Thus, the observed diffusivity in this case can be expected to be even larger than the diffusivity inside the cell. In this section, we give a heuristical explanation for the order of magnitude of the effective diffusivity outside the cell.
We work in dimension d = 2. Assume that a pixel has width x > 0. This value is determined by the spatial resolution of the data. For simplicity, we approximate it by a Gaussian density φ 0 (y) with standard deviation σ 0 = x 2 . This way, the inflection points of the (onedimensional marginal) density match the sharp edges of the pixel. Now, using φ 0 as an initial condition for the heat equation on the whole space R 2 , the density φ t after time t is also a Gaussian density, with standard deviation σ t = σ 2 0 + 2θt, obtained by convolution with the heat kernel. The maximal value f t max of φ t is attained at y = 0 with f t max = (2πσ 2 t ) −1 = (2π(σ 2 0 + 2θt)) −1 . Now, if we observe after time t > 0 at the given pixel an intensity decay by a factor b > 0, i.e.

(33)
bf t max ≤ f 0 max , this leads to an estimate for the diffusivity of the form For example, set t = 0.97s and x = 2.08 × 10 −7 m, as in the data set from Figure 7 (left). The intensity decay factor varies between different pixels in the data set, reasonable values are given for b ≤ 30. If b = 30, we get θ ≥ 1.6 × 10 −13 m 2 /s, for b = 20, we get θ ≥ 1 × 10 −13 m 2 /s, and for b = 15, we have θ ≥ 7.8 × 10 −14 m 2 /s. This matches the observed diffusivity outside the cell from Figure 7, which is indeed of order 1 × 10 −13 m 2 /s: For example, with N = N stop = 4r x r y /M 2 and M = 12, as in Section 4.6, we get N stop = 165 andθ lin,Nstop 0 = 1.36 × 10 −13 m 2 /s for this data set consisting of pure noise. In total, this gives a heuristical explanation for the larger effective diffusivity outside the cell compared to the estimated values inside the cell.
It is important to note that even if the effective diffusivity outside the cell is larger, this has almost no effect on the estimation procedure inside the cell. This is because the total energy A N (X) 0,0 of the noise outside the cell is several orders of magnitude smaller than the total energy of the signal inside the cell, see Figure 7 (right).

Discussion and Further Research
In this paper we have extended the mathematical theory of parameter estimation of stochastic reaction-diffusion system to the joint estimation problem of diffusivity and parametrized reaction terms within the variational theory of stochastic partial differential equations. We have in particular applied our theory to the estimation of effective diffusivity of intracellular actin cytoskeleton dynamics.
Traditionally, biochemical signaling pathways were studied in a purely temporal manner, focusing on the reaction kinetics of the individual components and the sequential order of the pathway, possibly including feedback loops. Relying on well-established biochemical methods, many of these temporal interaction networks could be characterized. However, with the recent progress in the in vivo expression of fluorescent probes and the development of advanced live cell imaging techniques, the research focus has increasingly shifted to studying the full spatiotemporal dynamics of signaling processes at the subcellular scale. To complement these experiments with modeling studies, stochastic reaction-diffusion systems are the natural candidate class of models that incorporate the relevant degrees of freedom of intracellular signaling processes. Many variants of this reaction-diffusion framework have been proposed in an empirical manner to account for the rich plethora of spatiotemporal signaling patterns that are observed in cells. However, the model parameters in such studies are oftentimes chosen in an ad hoc fashion and tuned based on visual inspection, so that the patterns produced in model simulations agree with the experimental observations. A rigorous framework that allows to estimate the parameters of stochastic reaction-diffusion systems from experimental data will provide an indispensable basis to refine existing models, to test how well they perform, and to eventually establish a new generation of more quantitative mathematical models of intracellular signaling patterns.
The question of robustness of the parameter estimation problem w.r.t. specific modelling assumptions of the underlying stochastic evolution equation is an important problem in applications that needs to be further investigated in future research. In particular, this applies to the dependence of diffusivity estimation on the domain and its boundary. In this work, we based our analysis on a Fourier decomposition on a rectangular domain with periodic boundary conditions. A natural, boundary-free approach is using local estimation techniques as they have been developed and used in [AR20, ACP20, ABJR20]. An additional approach aiming in the same direction is the application of a wavelet transform.
It is a crucial task to gather further information on the reaction term from the data. Principally, this cannot be achieved in a satisfactory way on a finite time horizon, so the long-time behaviour of maximum likelihood-based estimators needs to be studied in the context of stochastic reaction-diffusion systems. We will address this issue in detail in future work.

Appendix A. Additional Proofs
A.1. Proof of Proposition 1. We prove Proposition 1 by a series of lemmas. First, we show that (7), (8) is well-posed in H = L 2 × L 2 . First note that for any u, v ∈ L 2 (D) and x ∈ R: as ∂f (x, u) is bounded from above uniformly in x and u.
In particular, for any p ≥ 1.
As B = (−∆) −γ is constant and of Hilbert-Schmidt type, we can neglect it in the estimates. In order to check the conditions , we have to look separately at A 1 and A 2 . The statements for A 2 is trivial by linearity, so we test the parts of the conditions corresponding to A 1 . (H1) is clear as f (x, u) is a polynomial in u and continuous in x. (H2) follows from (35) via |U 2 | 2 H 1 , we see that local monotonicity as in (H2) is satisfied by taking ρ((u, v)) = c|u| 2 H 1 ≤ c|(u, v)| 2 V . Now, for (H3), for some V : D → R, again using (35). Finally, so it remains to control |f (|U | L 2 , U )| 2 Thus (H4) is true. Putting things together, we get that [LR15] is applicable for sufficiently large p ≥ 1, and the claim follows.
Lemma 10. The process (U, V ) satisfies (A s ) for some s > 1.
Proof. Let U = U + U be the decomposition of U into its linear and nonlinear part as in (14),(15). We will prove for any p, q ≥ 1 and s < 2. As γ > d/4 in d ≤ 2, there is s > 1 such that (42) is true for U with q = 2, and the claim follows. Let us now prove (42). First, note that by Lemma 9 and the Sobolev embedding theorem in dimension d ≤ 2, (42) is true for U t with any p, q ≥ 1 and s = 0. For k ∈ N, by |U k t | p L q = |U t | kp L kq , we see that polynomials in U satisfy (42) with s = 0, too. In particular, using that a is bounded, for i = 1, 2 and p, q ≥ 1. A simple calculation shows that the same is true for F 3 . As in the proof of Proposition 3, we see that with η < 2, q ≥ 1 and = 2 − η, and the proof is complete.
Proof of Proposition 1. In d ≤ 2, H s is an algebra for s > 1, i.e. uv ∈ H s for u, v ∈ H s [AF03]. Together with the assumption that a is bounded, it follows immediately that (F s,η ) holds for F 1 , F 2 separately (with K = 0) for any s > 1 and η < 2. For F 3 , we have for any δ > 0: so F 3 satisfies (F s,η ) even with s ∈ R, η < 4. It is now clear that the (F s,η ) holds for F = θ 1 F 1 + θ 2 F 2 + θ 3 F 3 for any s > 1 and η < 2 (with K = 3). The claim follows from Proposition 3 (i) for U and (ii) for V , noting that V = bF 3 (U ).
A.2. Proof of Proposition 6. This section is devoted to proving Proposition 6. A similar argument can be found in [Hue93, Chapter 3] for linear SPDEs. We start with an auxiliary statement, which characterizes the rate of det(A N (X)). For simplicity of notation, we abbreviate a N i := A N (X) i,i for i = 0, . . . , K. While there is a trival upper bound det(A N (X)) a N 0 . . . a N K obtained by the Cauchy-Schwarz inequality, the corresponding lower bound requires more work. Our argument is geometric in nature: A Gramian matrix measures the (squared) volume of a parallelepiped spanned by the vectors defining the matrix. The argument takes place in the separable Hilbert space L 2 (0, T ; H 2α ), and we find a (uniform in N ) lower bound on the (K + 1)-dimensional volume of the parallelepiped spanned by the vectors P N (−∆)X, P N F 1 (X), . . . , P N F K (X). While the latter K components (and thus their K-dimensional volume) converge, the first component gets eventually orthogonal to the others as N grows. This is formalized as follows: Lemma 11. Let η, s 0 > 0 such that (A s ) and (F s,η ) are true for s 0 ≤ s < 2γ + 1 − d/2. Let γ − d/4 − 1/2 < α ≤ γ − d/4 − 1/2 + η/2. Under assumption (L α ), we have det(A N (X)) a N 0 . . . a N K .
In particular, A N (X) is invertible if N is large enough.
Proof of Proposition 6. As before, we formally write F 0 (X) = ∆X in order to unify notation. By plugging in the dynamics of X, we see that for i = 0, . . . , K, Thus, withb N (X) i = T 0 (−∆) 2α−γ P N F i (X), dW N t and θ = (θ 0 , . . . , θ K ) T , this read as i.e.θ Using the explicit representation for the inverse matrix A N (X) −1 , we haveθ where A (i,j) N (X) results from A N (X) by erasing the ith row and the jth column. Remember that by Lemma 4, a N 0 C α N 2 d (2α−2γ+1)+1 . Further, due to α ≤ γ − d/4 − 1/2 + η/2, a N i /a N 0 → 0 in probability for all i = 1, . . . , K. Now, by the Cauchy-Schwarz inequality, each summand in the numerator can be bounded by terms of the form for i, j = 0, . . . , K. Thus, by Lemma 11, in order to prove the claim it remains to find a uniform bound in probability for the terms of the form |b N (X) i |/ a N i a N j . Now, for i = 1, . . . , K, uniformly in N as α < γ −d/8−1/4+η/4. Further, (a N 0 ) −1/2 converges to zero in probability, while (a N i ) −1/2 converges to a positive constant for i = 1, . . . , K. For i = 0, the reasoning is similar: By Propositon 3 and Lemma 4, we have Together with a N 0 ∼ N 2 d (2α−2γ+1)+1 and α ≤ γ, this yields thatb N (X) 2 0 /a N 0 (and consequently |b N (X) 0 |/ a N 0 a N j for j = 0, . . . , K) is bounded in probability. The claim follows.
For single and giant cell simulations in Figure 1 (D), (E) we solve (7), (8) on a square with side L = 75, with spatial increment dx = 0.15, for t ∈ [0, T ], T = 1000, with temporal increment dt = 0.002, using an explicit finite difference scheme together with a phase field model [FFAB20] to account for the interior of the single and giant cells, corresponding, respectively, to an area of A 0 =113 µm 2 and A 0 =2290 µm 2 . We used b = 0.4 and a(x) = 0.5 − b + 0.5(x/(0.25u 0 A 0 ) − 1). The noise terms are chosen as in [FFAB20], with σ = 0.1 and k η = 0.1 in the notation therein.
Experimental settings. For experiments D. discoideum AX2 cells expressing mRFP-LimE∆ as a marker for F-actin were used. Cell culture and electric pulse induced fusion to create giant cells were performed as described in [GEW + 14, FFAB20]. For live cell imaging an LSM 780 (Zeiss, Jena) with a 63x objective lens (alpha Plan-Apochromat, NA 1.46, Oil Korr M27, Zeiss Germany) were used. The spatial and temporal resolution was adjusted for each individual experiment to acquire image series with the best possible resolution, while protecting the cells from photo damage. Images series were saved as 8-bit tiff files. For image series where the full range of 256 pixel values was not utilized, the image histograms were optimized in such a way that the brightest pixels corresponded to a pixel value of 255.