Efficient numerical approximation of a non-regular Fokker–Planck equation associated with first-passage time distributions

In neuroscience, the distribution of a decision time is modelled by means of a one-dimensional Fokker–Planck equation with time-dependent boundaries and space-time-dependent drift. Efficient approximation of the solution to this equation is required, e.g., for model evaluation and parameter fitting. However, the prescribed boundary conditions lead to a strong singularity and thus to slow convergence of numerical approximations. In this article we demonstrate that the solution can be related to the solution of a parabolic PDE on a rectangular space-time domain with homogeneous initial and boundary conditions by transformation and subtraction of a known function. We verify that the solution of the new PDE is indeed more regular than the solution of the original PDE and proceed to discretize the new PDE using a space-time minimal residual method. We also demonstrate that the solution depends analytically on the parameters determining the boundaries as well as the drift. This justifies the use of a sparse tensor product interpolation method to approximate the PDE solution for various parameter ranges. The predicted convergence rates of the minimal residual method and that of the interpolation method are supported by numerical simulations.


Introduction
In 1978, Ratcliff [Rat78] introduced a model for binary decision processes based on diffusion processes. This model turned out to agree well with experimental data; Gold and Shadlen [GS01] provide a neurophysiological explanation for its success. Indeed, the solution (X_t)_{t≥0} of a one-dimensional stochastic differential equation is assumed to describe the difference in activity of two competing neuron populations. At time t = 0, the value X_0 = x_0 ∈ R represents the resting-state activity of the neuron populations. A decision is triggered when (X_t)_{t≥0} first reaches one of two (possibly time-dependent) critical values α or β, each reflecting an outcome of the decision process.
In a typical decision experiment, scientists can only measure the decision time and outcome. Parameter fitting thus requires access to the decision time distributions, which are rarely known explicitly. Ad hoc numerical simulations are costly, whence efficient simulation methods are much sought after [HFW+15, FFGC21].
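Before turning to PDE-based methods, it is instructive to see the naive alternative. The following sketch (all names and parameter values are illustrative, not taken from this article) simulates the drifted diffusion with constant boundaries by an Euler–Maruyama scheme and records first-passage times and outcomes:

```python
import numpy as np

def simulate_decision_times(mu=0.5, sigma=1.0, x0=0.0, alpha=-1.0, beta=1.0,
                            dt=1e-3, t_max=10.0, n_paths=10_000, seed=0):
    """Euler-Maruyama simulation of dX_t = mu dt + sigma dW_t until X_t
    first leaves (alpha, beta); returns decision times and outcomes."""
    rng = np.random.default_rng(seed)
    x = np.full(n_paths, float(x0))
    t = np.zeros(n_paths)
    times = np.full(n_paths, np.nan)         # NaN = not yet decided at t_max
    upper = np.zeros(n_paths, dtype=bool)    # True = upper boundary hit
    active = np.ones(n_paths, dtype=bool)
    for _ in range(int(t_max / dt)):
        if not active.any():
            break
        idx = np.flatnonzero(active)
        x[idx] += mu * dt + sigma * np.sqrt(dt) * rng.standard_normal(idx.size)
        t[idx] += dt
        hit_up = active & (x >= beta)
        hit_lo = active & (x <= alpha)
        times[hit_up | hit_lo] = t[hit_up | hit_lo]
        upper[hit_up] = True
        active &= ~(hit_up | hit_lo)
    return times, upper
```

Such direct sampling suffers from O(n^{-1/2}) statistical error plus a time-discretization bias (a path can cross and re-cross a boundary within one step), which is one reason efficient Fokker–Planck solvers are sought.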
In this article we extend and improve a simulation method introduced in [VV08], which is based on the Fokker–Planck equation associated with the decision time. In particular, this article may be viewed as the theoretical counterpart of our publication [BCGS21], which is aimed at the neuroscientific community.
Linking the first hitting time of a stochastic differential equation to a Fokker–Planck equation is a well-known approach that has also been applied in, e.g., astrophysics [Cha43] and cell biology [HS15]; for an overview see [AKTSM18].
In particular, although we only consider examples arising from neuroscience, the simulation method we introduce is also relevant for other applications.
In [VV08], a Crank–Nicolson method is used to approximate solutions to (1.3) in the case that α, β, and µ are constant. One advantage of this setting is that one only needs to solve a single PDE of type (1.3) in order to obtain the first hitting time probabilities P[α y ≤ min(t, βy )] for all t ∈ [0, τ], y ∈ [α(0), β(0)]. However, since F is discontinuous at (t, x) = (0, α(τ)), no proof of convergence of the Crank–Nicolson method for decreasing step-sizes seems available. At best, reduced rates are to be expected. Moreover, various authors have argued that time-dependent boundaries α and β and space-time-dependent drift µ provide a more realistic model for decision processes; for an overview see [HFW+15, SK13].
In this article we extend [VV08] to include diffusion models with time-dependent boundaries and non-constant drift. We improve the efficiency of the numerical simulation by not approximating the solution F to (1.3) directly; instead, we approximate the solution to a parabolic PDE on a rectangular domain with homogeneous initial and boundary conditions, constructed such that its difference with F (transformed to the same rectangular domain) is a function for which a rapidly converging series expansion is known.
More specifically, in Section 2 we demonstrate that if α, β are once continuously differentiable, then (1.3) can be transformed into a parabolic PDE on a rectangular domain with a space-time-dependent drift. Next, in Section 3 we demonstrate that by subtracting a known, discontinuous function, we obtain a parabolic PDE with homogeneous boundary conditions, see (3.1) below. We analyze the regularity of the solution e to this equation and verify that it is indeed smoother than F, see Corollary 3.1 and Theorem 3.1.
In Section 4 we apply a minimal residual method [And13, SW21b, SW21a] to approximate the solution e to (3.1). This method is known to give quasi-best approximations from the selected trial space in the norm of a natural solution space, namely the intersection of two Bochner spaces. Taking as trial space the space of continuous piecewise bilinears with respect to a uniform partition of the space-time cylinder into rectangles with mesh width h, in Theorem 4.1 the optimal error bound of order h is shown for the solution e to (3.1).
In Section 5 we consider the situation that µ, α, and β can be parametrized analytically, and verify that in this case the corresponding solution e to (3.1) (transformed onto the unit square) depends analytically on these parameters as well as on the final time τ, see Theorem 5.1. This justifies the use of a sparse tensor-product interpolation [NTW08] to determine the solution e to (3.1) efficiently for multiple end-time and parameter values. Finally, in Section 6 we provide numerical simulations for three different decision models taken from the neurophysiological literature.
In our parallel publication [BCGS21] mentioned above, we provide further numerical experiments and code. There, we apply the Crank–Nicolson method (without giving any error analysis) to approximate the solution e to (3.1). In the examples we consider, the Crank–Nicolson method appears to lead to similar convergence as the minimal residual method. Although we only provide a rigorous error analysis for the minimal residual method, Crank–Nicolson may be preferred in practice as it is easier to implement. We refer to [BCGS21] for further details. For normed linear spaces E and F, by L(E, F) we denote the normed linear space of bounded linear mappings E → F, and by L_iso(E, F) its subset of boundedly invertible linear mappings E → F.

Transforming the Fokker-Planck equation to a rectangular space-time domain
In this section we demonstrate that (1.3) can be transformed into a PDE on a rectangular space-time domain, see (2.3) below. The PDE in (2.3) forms the starting point for the remainder of this article, which is why we use tildes in (2.1) below to distinguish the variables and coefficients of the non-transformed equation from those in (2.3). Indeed, let and consider the following parabolic initial- and boundary value problem: (2.1) In particular, from t = θ^{−1}(θ(t)) we obtain that θ satisfies the following ODE. With Ω := (0, 1), is a bijection with inverse , we have u(t, 0) = 1, u(t, 1) = 0 (t ∈ (0, T)), and u(0, x) = 0 (x ∈ Ω). Moreover, for (t, x) ∈ (0, T) × Ω, one has and . In other words, with (2.3). For its numerical solution, we always consider system (2.3) for T < ∞.
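The displayed formulas are omitted in the extraction above; for orientation, a typical transformation of this kind (an illustrative sketch, not necessarily the exact normalization used here) maps the moving interval (α(t̃), β(t̃)) onto Ω = (0, 1):

```latex
x = \frac{\tilde x - \alpha(\tilde t)}{\beta(\tilde t) - \alpha(\tilde t)},
\qquad u(t, x) := \tilde u(\tilde t, \tilde x),
\qquad t = \theta(\tilde t),
```

and by the chain rule

```latex
\partial_{\tilde x}\tilde u = \frac{1}{\beta - \alpha}\,\partial_x u,
\qquad
\partial_{\tilde t}\tilde u
= \theta'(\tilde t)\,\partial_t u
- \frac{\alpha'(\tilde t) + x\,\bigl(\beta'(\tilde t) - \alpha'(\tilde t)\bigr)}
       {\beta(\tilde t) - \alpha(\tilde t)}\,\partial_x u,
```

so the moving boundaries contribute an additional space-time-dependent term to the drift, as claimed in the text.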

Regularity of the Fokker-Planck equation
Let u(v) denote the solution to (2.3) for some given drift function v. Due to the discontinuity between boundary and initial data, it is clear that u(v) is discontinuous at the corner (t, x) = (0, 0). This reduces the rate of convergence of standard numerical methods and makes it difficult to provide a theoretical bound on the convergence rate. However, for constant drift v, a rapidly converging series expansion of u(v) is known ([GBK14]), which allows one to efficiently approximate u(v) within any given positive tolerance. Knowing this, our approach is to approximate not u(v) itself but the difference e(v) := u(v) − u(v_0), where v_0 := v(0, 0) and u(v_0) is available via the series expansion; this difference satisfies the parabolic problem (3.1) with homogeneous initial and boundary conditions, which we solve approximately with a numerical method. To derive a priori bounds for the approximation error, we analyze the smoothness of e(v), see Section 3.3. In particular, under additional smoothness conditions on v, and using that (v − v_0)(0, 0) = 0, we show that e(v) is smoother than u(v_0), and thus than u(v), which shows the benefit of applying the numerical method to (3.1) instead of directly to (2.3). It turns out that for any v the smoothness of u(v) is determined by that of the solution u_H of the heat equation on (0, ∞) × R that is 0 at t = 0 and 1 at x = 0. Its smoothness is the topic of the next subsection.

The heat kernel
The function is the heat kernel. It satisfies , the latter being the space of test functions. Following [Cos90, Ex. 2.14] and [FF03], for (t, . Knowing that , the following lemma turns out to be handy to analyze the smoothness of u_H restricted to I × Ω. The integral over x is finite if and only if p(2α + β) > −3, and if so, the expression is equal to , with the first integral being finite if and only if pβ > −1.
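The displayed formulas were lost in the extraction above; as a concrete anchor, for the normalized heat equation ∂_t u = ∂_xx u on (0, ∞) × (0, ∞) with u(0, x) = 0 and u(t, 0) = 1, the solution is the complementary error function profile u_H(t, x) = erfc(x/(2√t)) (the convention in the paper, e.g. a diffusion coefficient ½, may differ). A quick numerical sanity check:

```python
from math import erfc, sqrt

def u_heat(t, x):
    """Self-similar solution of u_t = u_xx with u(0, x) = 0, u(t, 0) = 1."""
    return erfc(x / (2.0 * sqrt(t)))

def pde_residual(t, x, h=1e-4):
    """Finite-difference check that u_t - u_xx vanishes away from the corner (0, 0)."""
    ut = (u_heat(t + h, x) - u_heat(t - h, x)) / (2 * h)
    uxx = (u_heat(t, x + h) - 2 * u_heat(t, x) + u_heat(t, x - h)) / h**2
    return ut - uxx
```

The singularity at the corner (0, 0), where the boundary value 1 meets the initial value 0, is visible in the fact that u_H has no limit there: u_H(t, x) depends only on the ratio x/√t.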
Following [Wlo87], we analyze the regularity of the solutions u(v) and e(v) of the parabolic problems (2.3) and (3.1), respectively, in (intersections of) Bochner spaces. In particular, the space L^2(I; plays an important role in this and the following sections. For the precise definition of this space and some properties we refer to [Wlo87, Chapter 25]. With H^1_{0,{0}}(I) denoting the closure in H^1(I) of the functions in C^∞(I) ∩ H^1(I) that vanish at 0, we have the following result concerning the smoothness of u_H restricted to I × Ω.
Proof By applications of Lemma 3.1, we infer that ∂ We have Finally in this subsection, notice that from

Regularity of the parabolic problem with homogeneous initial and boundary conditions
Knowing that e(v) is the solution of the parabolic problem (3.1) that has homogeneous initial and boundary conditions, we study the regularity of such a problem.
where the spatial differential operators at the right-hand side should be interpreted in a weak sense, i.e., ((∂ (3.4) (see, e.g., [Wlo87, Thm. 26.1]). Under additional smoothness conditions on the right-hand side f beyond being in L^2(I; H^{−1}(Ω)), additional smoothness of the solution w can be demonstrated ([Wlo87, Thm. 25.5]). As shown in [Wlo87, Thm. 27.2 and its proof], from the last two properties of f , and . To show the spatial regularity, i.e., w ∈ L^2(I; (3.5), where, as before, the spatial differential operators should be interpreted in a weak sense. Using that and Young's inequality, one infers that for λ > ¼‖v‖²_{L^∞(I×Ω)} the bilinear form defined by the left-hand side of (3.5) is bounded and coercive on L^2(I; . An induction and tensor product argument shows A(0, 0 , and using that v∂_x ∈ L(L^2(I; , one infers that w_λ and thus w ∈ L^2(I; H^3(Ω)), and moreover ‖w‖_{L^2(I; and the maximal regularity result from, e.g., [dS64, DHP03].

3.3 The regularity of e(v)

Recall that u_H denotes the solution of the heat equation studied in Section 3.1, that u(v) denotes the solution to (2.3) for given v ∈ C(I × Ω), and v_0 := v(0, 0). Since e(v) solves (3.1), i.e., e(v) is the solution w of (3.3) for the forcing function f given by (3.6), in view of the regularity results proven in Proposition 3.1, we establish smoothness of e(v) by demonstrating smoothness of each of the three terms at the right-hand side of (3.6).
Lemma 3.2 It holds that Because, again by (3.2), (t, x) → xu H (t, 1) is in the same space, the proof is completed.
Proof The function w := u(v_0) − u(0) satisfies the homogeneous initial- and boundary conditions from (3.3), and . Throughout the proof, we use the estimates for u_H proven in Corollary 3.1. We start with proving ; the first term is even in L^2(I × Ω). Writing the second term as , and the proof is completed.
By combining the results of the preceding three lemmas with the regularity result proven in Proposition 3.1, we obtain the following.
from Lemmas 3.2 and 3.3, whereas Lemma 3.4 implies that (v − v_0)∂_x u(v_0) has the required smoothness, so that an application of Proposition 3.1(a) completes the proof.
Notice that, as a consequence of Corollary 3.1 and Lemmas 3.2 and 3.3, e(v) is smoother than u(v_0), and thus than u(v), confirming the claim we made at the beginning of Section 3.

Minimal residual method
For solving (3.3) (specifically for the forcing function f as in (3.6), i.e., for computing e(v)), we write it in variational form: we multiply it by test functions z : I × Ω → R from a suitable collection, integrate over I × Ω, and apply integration by parts with respect to x. We thus arrive at for all those test functions. With , where γ_0 := w ↦ w(0, ·) denotes the initial trace operator, see, e.g., [Wlo87, Chapter IV] or [SS09].
Already because X ≠ Y′ × L^2(Ω), the well-posed system (B, γ_0)w = (f, 0) cannot be discretized by simple Galerkin discretizations. Given a family (X_h)_{h∈∆} of finite dimensional subspaces of X, as discrete approximations to w one may consider the minimizers argmin_{w̄∈X_h} ‖f − Bw̄‖²_{Y′} + ‖γ_0 w̄‖²_{L²(Ω)}. Since the dual norm ‖·‖_{Y′} cannot be evaluated, this approach is not immediately feasible either. Therefore, with (Y_h)_{h∈∆} being a second family of finite dimensional subspaces, now of Y, for h ∈ ∆ as a discrete approximation from X_h we consider (4.1). This minimal residual approach has been studied for general parabolic PDEs in, e.g., [And13, SW21b, SW21a], where Ω can be a d-dimensional spatial domain for arbitrary d ≥ 1.
For parabolic differential operators with a possibly asymmetric spatial part, in our setting caused by a non-zero drift function v, it has been shown in [SW21a, Thm. 3.1] that if X_h ⊂ Y_h and (4.2) holds, then (4.3) holds, where the implied constant in (4.3) depends only on ̺ and an upper bound for ‖v‖_{L^∞(I×Ω)}; i.e., w_h is a quasi-best approximation from X_h with respect to the norm on X.
Remark 4.1 This quasi-optimality result has been demonstrated under the condition that the spatial part of the parabolic differential operator is coercive on < 1, but which might be violated otherwise.
Although this coercivity condition might not be necessary, it can always be enforced by considering w_λ(t, ·) := w(t, ·)e^{−λt}, f_λ(t, ·) := f(t, ·)e^{−λt} instead of w and f, with λ sufficiently large, see also the proof of Proposition 3.1. By approximating w_λ by the minimal residual method, and by multiplying the obtained approximation by e^{λt}, an approximation for w is obtained. Since qualitatively the transformations with e^{±λt} do not affect the smoothness of the solution or the right-hand side, for convenience in the following we pretend that coercivity holds true for (3.3).
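The standard computation behind this shift can be made explicit. Assuming the parabolic problem has the abstract form ∂_t w + A(t)w = f (consistent with (3.3), whose precise form is omitted above):

```latex
w_\lambda(t,\cdot) := e^{-\lambda t} w(t,\cdot)
\;\Longrightarrow\;
\partial_t w_\lambda
= e^{-\lambda t}\bigl(\partial_t w - \lambda w\bigr)
= f_\lambda - A(t)\,w_\lambda - \lambda\, w_\lambda,
```

so that w_λ solves the same problem with spatial operator A(t) + λ and right-hand side f_λ. Since adding λ·Id shifts the coercivity constant of the spatial bilinear form by λ, coercivity holds once λ is large enough, in line with the threshold λ > ¼‖v‖²_{L^∞(I×Ω)} from the proof of Proposition 3.1.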
As in [SW21a, SW21b], we equip Y_h in (4.1) with the energy norm , where A_s denotes the symmetric part of the spatial differential operator. Equipping Y_h and X_h with bases Φ_h = {φ_h^i} and Ψ_h = {ψ_h^j}, respectively, and denoting by w_h the coefficient vector of the minimizer w_h with respect to Ψ_h, w_h is found as the second component of the solution of (4.4). The operator A_s can be replaced by any other spectrally equivalent operator on Y_h without compromising the quasi-optimality result (4.3). We refer to [SW21a, SW21b] for details.
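The linear-algebra structure behind this system can be illustrated generically. Minimizing ‖B_h w − f_h‖ in the discrete dual norm induced by an SPD Gram matrix A_h yields the normal equations B_hᵀ A_h^{−1} B_h w = B_hᵀ A_h^{−1} f_h, which are equivalent to a saddle-point system whose second solution component is the sought coefficient vector. The matrices below are random placeholders, not an actual discretization of (3.3):

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 12, 8                        # dim Y_h, dim X_h (toy sizes, n <= m)
M = rng.standard_normal((m, m))
A = M @ M.T + m * np.eye(m)         # SPD Gram matrix of the Y_h energy norm
B = rng.standard_normal((m, n))     # discretized parabolic operator
f = rng.standard_normal(m)          # discretized forcing

# normal equations: w minimizes ||B w - f|| in the A^{-1}-norm
w_normal = np.linalg.solve(B.T @ np.linalg.solve(A, B),
                           B.T @ np.linalg.solve(A, f))

# equivalent saddle-point form: [[A, B], [B^T, 0]] [mu; w] = [f; 0];
# eliminating mu = A^{-1}(f - B w) recovers the normal equations
K = np.block([[A, B], [B.T, np.zeros((n, n))]])
sol = np.linalg.solve(K, np.concatenate([f, np.zeros(n)]))
w_saddle = sol[m:]
```

The saddle-point form avoids assembling A_h^{−1} explicitly, which is why the minimizer is computed as the second component of a coupled system, as described above.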
Let P_1 be the set of polynomials of degree one. Taking, for n := 1/h ∈ N, X_h as in (4.5), it is known, cf. [SW21b, Sect. 4], that condition (4.2) is satisfied for Y_h as in (4.6), where obviously also X_h ⊂ Y_h. Applying this approach for f = (v − v_0)∂_x u(v_0), in view of (4.3) the error of the obtained approximation for e(v) with respect to the X-norm can be bounded by the error of the best approximation from X_h. To bound the latter error we recall from Theorem 3.1 that , and using standard interpolation estimates and the uniform H^1-boundedness of these L^2-orthogonal projectors, see e.g. [BX91, §3], one infers an error bound of order h.
Similarly using that Our findings are summarized in the following theorem.
Theorem 4.1 With X_h, Y_h as defined in (4.5) and (4.6), the numerical approximation e_h = e_h(v) ∈ X_h to e = e(v), obtained by the application of the minimal residual method to (3.1), satisfies ‖e − e_h‖_X ≲ h.
Notice that for this space X_h of continuous piecewise bilinears, this linear decay of the error ‖e − e_h‖_X as a function of h is generally the best that can be expected. In view of the order of the space X_h, one may hope that ‖e − e_h‖_{L²(I×Ω)} is O(h²), but on the basis of the smoothness demonstrated for e, this cannot be shown even for inf_{ē∈X_h} ‖e − ē‖_{L²(I×Ω)}.

Interpolation for parametrized drift, boundaries, and final time
In this section we consider the case that v and T in (2.3) depend on a number of parameters (ρ_1, . . ., ρ_N) ∈ [−1, 1]^N, and that one is interested in the solution u(v) to (2.3) for multiple values of these parameters. As explained in Section 3, in order to find u(v) it suffices to obtain the solution e(v) to (3.1). Instead of simply solving for e(v) at each of the desired parameter values, under the provision that v and T depend smoothly on the parameters, one may attempt to interpolate e(v) from its a priori computed approximations for a carefully selected set of parameters in [−1, 1]^N.
In order to be able to do so, first of all we need to get rid of the parameter dependence of the domain I × Ω = (0, T) × (0, 1). With Î := (0, 1), the function û on Î × Ω is defined by û(t, x) := u(tT, x), where analogously v̂(t, x) := v(tT, x). Denoting this û as û(v̂, T), the difference ê satisfies (5.2). By simply replacing I = (0, T) by Î = (0, 1), and in particular X as well as Y by their counterparts on Î, in a number of places, it is clear that the results that we obtained about the smoothness of e and its numerical approximation e_h by the minimal residual method apply equally well to ê and its minimal residual approximation, which we denote by ê_h.
Since the domain of ê is independent of the parameters, we can apply the idea of interpolation. One option is to perform a 'full' tensor product interpolation. In this case, the number of interpolation points required for a fixed polynomial degree, i.e., the number of parameter values for which a numerical approximation for ê ∈ X has to be computed, grows exponentially with the number N of parameters. As this is undesirable, we instead apply a sparse tensor product interpolation. More specifically, we choose the Smolyak construction, based on Clenshaw–Curtis abscissae in each parameter direction, see [NTW08]: for i ∈ N let I_{i+1} denote the univariate interpolation operator with abscissae cos(j2^{−i}π), j = 0, . . ., 2^i, onto the space of polynomials of degree 2^i, let I_1 be the interpolation operator with abscissa 0, and let I_0 := 0. Then, for an integer q ≥ N, we apply the sparse interpolator I_q. It is known that the resulting interpolation error in C([−1, 1]^N; X) (for an arbitrary Banach space X), equipped with ‖·‖_{L^∞([−1,1]^N; X)}, decays subexponentially in the number of interpolation points when ê, as a function of each of the parameters ρ_n, has an extension to a differentiable mapping on a neighbourhood Σ of [−1, 1] in C. For details about this statement we refer to [NTW08, Thm. 3.11].
[NTW08] also mentions that this result requires relatively large values of q. Thus, the authors additionally prove algebraic convergence under the same assumptions but for arbitrary q [NTW08, Thm. 3.10].
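A key practical property of the Clenshaw–Curtis abscissae cos(j2^{−i}π) is that they are nested across levels, so function evaluations can be reused as q grows. A small sketch (function names are ours) verifying this:

```python
import numpy as np

def cc_nodes(i):
    """Clenshaw-Curtis abscissae cos(j*pi/2**i), j = 0..2**i, on [-1, 1];
    level 0 is the single node {0}, matching the operator I_1 above."""
    if i == 0:
        return np.array([0.0])
    j = np.arange(2**i + 1)
    return np.cos(j * np.pi / 2**i)

def is_nested(i):
    """Check that every node of level i reappears at level i + 1."""
    coarse, fine = cc_nodes(i), cc_nodes(i + 1)
    return all(np.isclose(fine, x).any() for x in coarse)
```

Nestedness follows from cos(jπ/2^i) = cos(2jπ/2^{i+1}): doubling the index keeps the old nodes among the new ones, which is what makes the Smolyak combination of these operators economical.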
Instead of ê, we interpolate a numerical approximation ê_h, specifically the one obtained by the minimal residual method described in Section 4. For the additional error we have . In [Chk14, Sect. 5.3] it has been shown that this factor is only of polylogarithmic order as a function of the number of interpolation points.
Concerning the factor ‖ê − ê_h‖_{L^∞([−1,1]^N; X)}, in our derivation of Theorem 4.1 we have seen that for each parameter value (ρ_1, . . ., ρ_N) ∈ [−1, 1]^N the expression h^{−1}‖ê − ê_h‖_X can be bounded by a constant multiple, dependent only on an upper bound for ‖v̂‖_{L^∞(Î×Ω)} and for a certain norm of ê. For uniformly bounded T and T^{−1}, and v̂ that varies over a bounded set, inspection of the estimates from Section 3 reveals that the latter norm of ê is uniformly bounded. So, assuming that these conditions on T, T^{−1}, and v̂ hold true for (ρ_1, . . ., ρ_N) ∈ [−1, 1]^N, what remains is to establish the differentiability of the solution ê as a function of each of the parameters, which is done in the following theorem.

Theorem 5.1 For an open
Proof The proof is based on the fact that ê is the solution of a well-posed PDE with coefficients and a forcing term that are differentiable functions of ρ.

Numerical results
We consider three relevant examples of the form (1.3) (or its equivalent reformulation (2.1)) with σ = 1 from the literature. We transform the solution ũ of (2.1), which may live on a time-dependent spatial domain, to u, which satisfies (2.3) on the domain (0, T) × (0, 1). In each example the resulting drift function v as well as the end time point T depend on an up to N = 5 dimensional parameter ρ ∈ [−1, 1]^N.
For all sparse interpolation points, by applying the minimal residual method from Section 4 we approximate ê by the continuous piecewise affine function êh on a uniform tensor mesh with mesh-size h, where û(v 0 ) inside the forcing term can be efficiently approximated at high accuracy as a truncated series.
Finally, for all parameter values ρ of interest, we apply the sparse tensor product interpolation analyzed in Section 5, giving rise to an overall error ‖ê − I_q ê_h‖_X ≤ ‖ê − ê_h‖_X + ‖ê_h − I_q ê_h‖_X ≈ ‖ê_{h/2} − ê_h‖_X + ‖ê_h − I_q ê_h‖_X, with q the parameter that steers the accuracy of the sparse interpolation. For each of the three examples considered, we compute the latter two errors for different h and q and the parameter test set ρ ∈ {−1, −0.5, 0.5, 1}^N. (6.1) By Theorem 4.1, we expect ‖ê_{h/2} − ê_h‖_X = O(h) for the first term. Section 5 suggests subexponential convergence of the second term ‖ê_h − I_q ê_h‖_X as a function of the number of interpolation points (this was shown for ‖ê − I_q ê‖_X). However, we already mentioned there that subexponential convergence is only observed for very high q, and in practice one should rather expect algebraic convergence.
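The expected O(h) behaviour of the first term can be read off from successive errors in the usual way, via observed rates log₂(e_{2h}/e_h). A small sketch with hypothetical error values (not data from the experiments in this section):

```python
import numpy as np

def observed_rates(errors):
    """Estimated convergence orders log2(e_{2h}/e_h) for errors on a
    mesh sequence with halving mesh width h."""
    e = np.asarray(errors, dtype=float)
    return np.log2(e[:-1] / e[1:])

# hypothetical errors behaving like C*h, the O(h) rate of Theorem 4.1
hs = [2.0**-k for k in range(3, 8)]
errs = [0.9 * h for h in hs]
rates = observed_rates(errs)
```

An observed rate levelling off near 1 in such a table is what Figures 6.1 and 6.3 are meant to exhibit for ‖ê_{h/2} − ê_h‖_X.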
Notice that ‖·‖_X involves a negative order Sobolev norm. Thus, we compute an equivalent version of ‖·‖_X for functions w in the discrete trial space X_h ⊂ X (similarly for w ∈ Y_h). Here, w_h is the coefficient vector of w in the standard nodal basis Ψ_h = {ψ_h^i}, B_h and C_h are defined as in (4.4) with the standard nodal basis Φ_h = {φ_h^i}, and A_h
In Figure 6.1, we plot the maximal error ‖ê_{h/2} − ê_h‖ ≈ ‖ê − ê_h‖, measured in the (equivalent) X-norm (6.2), over the test set (6.1) for different values of h. Figure 6.2 depicts the maximal interpolation error ‖ê_h − I_q ê_h‖ over the test set (6.1) for different values of h and q.
In Figure 6.3, we plot the maximal error ‖ê_{h/2} − ê_h‖ ≈ ‖ê − ê_h‖, measured in the (equivalent) X-norm (6.2), over the test set (6.1). Figure 6.4 depicts the maximal interpolation error ‖ê_h − I_q ê_h‖ over the test set (6.1) for different values of h and q.

Conclusion
We have developed a numerical solution method for the Fokker–Planck equation on a one-dimensional spatial domain with a discontinuity between initial and boundary data and time-dependent boundaries. We first transformed the equation to an equation on a rectangular time-space domain. We then demonstrated that the solution of a corresponding equation with a suitable constant drift function, which is explicitly available as a fast converging series expansion, captures the main singularity present in the solution for a variable drift function. The equation for the difference of these two solutions, which is thus more regular than both terms, is solved with the space-time minimal residual method.

Fig. 6.8: Maximal error ‖ê_{h/2}(v̂(ρ), T(ρ)) − ê_h(v̂(ρ), T(ρ))‖ measured in the (equivalent) X-norm over all ρ ∈ {−1, −0.5, 0, 0.5, 1}^4 for the constant drift function with time-dependent linear spatial domain from Section 6.3.
Finally, in order to efficiently solve Fokker–Planck equations that depend on multiple parameters, we demonstrated that the solution is a holomorphic function of these parameters. Consequently, a sparse tensor product interpolation method can be shown to converge at a subexponential rate as a function of the number of interpolation points. In one test example this interpolation method works very satisfactorily, but the results are less convincing in the two other cases. We envisage that in those cases better results can be obtained by an adaptive sparse interpolation method such as the one proposed in [CCS14].

1.1 Notation

In this work, by C ≲ D we mean that C can be bounded by a multiple of D, independently of parameters on which C and D may depend. Obviously, C ≳ D is defined as D ≲ C, and C ≂ D as C ≲ D and C ≳ D.

Fig. 6.7: An approximation of the solution ũ to (2.1) obtained by transforming the approximation of û back to the original domain.