Spectral methods for nonlinear functionals and functional differential equations

We develop a rigorous convergence analysis for finite-dimensional approximations of nonlinear functionals, functional derivatives, and functional differential equations (FDEs) defined on compact subsets of real separable Hilbert spaces. The purpose of this analysis is twofold: first, we prove that under rather mild conditions, nonlinear functionals, functional derivatives and FDEs can be approximated uniformly by high-dimensional multivariate functions and high-dimensional partial differential equations (PDEs), respectively. Second, we prove that such functional approximations can converge exponentially fast, depending on the regularity of the functional (in particular its Fréchet differentiability), and its domain. We also provide sufficient conditions for consistency, stability and convergence of functional approximation schemes to compute the solution of FDEs, thus extending the Lax-Richtmyer theorem from PDEs to FDEs. Numerical applications are presented and discussed for prototype nonlinear functional approximation problems, and for linear FDEs.


Introduction
A nonlinear functional is a map from a space of functions into a field, e.g., the real line or the complex plane [69]. Such a map, which seems a rather abstract mathematical concept, plays a fundamental role in many areas of physics and applied sciences. For instance, functionals were used by Wiener for the description of Brownian motion [76], by Hohenberg and Kohn [30] to reduce the dimensionality of the Schrödinger equation in many-body quantum systems (density functional theory [46]), by Hopf to describe the statistical properties of turbulence [31,1,43,41], and by Bogoliubov to model systems of interacting bosons in superfluid liquid helium [8,61]. Other applications of nonlinear functionals can be found in [68,15,47,46,36,64,48,75,35,40].
Nonlinear functionals have also appeared in evolution equations known as functional differential equations (FDEs) [69]. A classical example in the context of fluid dynamics is the Hopf equation [31,45,56] which was deemed by Monin and Yaglom ([43,Ch. 10]) to be "the most compact formulation of the general turbulence problem", i.e., the problem of determining the statistical properties of the velocity and the pressure fields of the Navier-Stokes equations given statistical information on the initial state. In equation (1), V ⊆ R 3 is a periodic box, θ(x) = (θ 1 (x), θ 2 (x), θ 3 (x)) is a vector-valued (divergence-free) function, δ/δθ j (x) denotes the first-order functional derivative [27], and Φ is a complex-valued functional defined as $\Phi([\theta], t) = E\{\exp(i \int_V u(x,t) \cdot \theta(x)\, dx)\}$ (2). In this equation, u(x, t) represents a stochastic solution to the Navier-Stokes equations corresponding to a random initial state, and E{·} is an expectation over the probability measure of such random initial state. Remarkably, the Hopf functional (2) allows us to compute all statistical properties of the velocity field that solves the Navier-Stokes equations.
Another well-known example of a functional differential equation is the Schwinger-Dyson equation of quantum field theory [47,80]. Such an equation describes the dynamics of the Green functions of a quantum field theory, and it allows us to propagate field interactions in a perturbation setting (e.g., with Feynman diagrams) or in a strong coupling regime. The Schwinger-Dyson functional formalism is also useful in studying the statistical dynamics of classical systems described in terms of stochastic ordinary or partial differential equations [35,40,48]. More recently, FDEs appeared in mean field games [12] and mean field optimal control [20]. Mean field games are optimization problems involving a very large (potentially infinite) number of players interacting via certain quantities.
In some cases, it is possible to reformulate such optimization problems in terms of a nonlinear Hamilton-Jacobi FDE in probability density space. The standard form of such equation is [15] ∂F ([ρ], t) ∂t where ρ(x) is a d-dimensional density function supported on V ⊆ R d , δF/δρ(x) represents the first-order functional derivative of F relative to ρ(x), and H is the Hamilton's functional Here, H is a Hamilton's function on X, and G([ρ]) is an interaction potential. More general FDEs of the type (3) have been recently derived in the context of unnormalized optimal PDF transport [26]. Mean field theory is also useful in optimal feedback control of nonlinear stochastic dynamical systems, and in deep learning. Recent work of W. E and collaborators [20] laid the mathematical foundations of the population risk minimization problem in deep learning as a mean-field optimal control problem. Such mean-field optimal control problem yields a generalized version of the Hamilton-Jacobi-Bellman equation in a Wasserstein space, which is a nonlinear functional differential equation (see Eq. (20) in [20]). A fundamental question that has been left unanswered for many years is whether it is possible to develop efficient numerical methods to approximate nonlinear functionals, and compute the solution to FDEs such as (1) or (3). This is indeed a long-standing open problem in mathematical physics, and a rather unexplored area in computational mathematics. In a recent Physics Report [69] we reviewed state-of-the-art computational techniques that can be used to address this problem. The main objective of this paper is to provide rigorous mathematical foundations for such computational techniques, and to develop approximation theory and convergence analysis for nonlinear functionals, functional derivatives, and FDEs defined in compact subsets of real separable Hilbert spaces. 
The purpose of this analysis is twofold: first, we prove that under rather mild conditions nonlinear functionals and FDEs can be approximated uniformly by high-dimensional multivariate functions and high-dimensional partial differential equations (PDEs), respectively. Second, we prove that such functional approximations can converge exponentially fast in compact subsets of real separable Hilbert spaces. We also provide sufficient conditions for consistency, stability and convergence of functional approximation schemes to compute the solution of FDEs, thus extending the Lax-Richtmyer theorem from PDEs to FDEs.
This paper is organized as follows. In section 2 we briefly review nonlinear functionals defined on Banach spaces, and recall the notions of continuity, compactness and differentiability. In section 3 we specialize these concepts to nonlinear functionals defined on compact subsets of Hilbert spaces, and show that the Fréchet derivative on such compact sets is a compact linear operator, which therefore admits a Riesz representation. In section 4 we introduce cylindrical approximations of nonlinear functionals and functional derivatives in compact subsets of Hilbert spaces. Uniform convergence of such representations is established in section 5 and section 6 for nonlinear functionals and functional derivatives, respectively. We also show that cylindrical approximations can converge exponentially fast for Fréchet differentiable functionals. Section 7 deals with the approximation of functional integrals in real separable Hilbert spaces. In section 8 we provide an example of convergence analysis for cylindrical approximations of a prototype nonlinear functional. In section 9 we develop sufficient conditions for consistency and convergence of cylindrical approximations to FDEs, thus extending the Lax-Richtmyer theorem from PDEs to FDEs.
In section 10 we demonstrate the approximation theory we developed by providing numerical examples and applications to prototype nonlinear functionals and FDEs. Finally, the main findings are summarized in section 11. We also include a brief Appendix in which we review two notions of distance between function spaces, namely the deviation and the Kolmogorov width.

Nonlinear functionals defined on Banach spaces
Let X be a Banach space. A nonlinear functional on X is a map F from X into a field F (for the purpose of this paper F will either be the real line denoted by R or the complex plane denoted by C). The functional F usually does not operate on the entire space X but rather on a subset of X, which we denote as D(F ) ⊆ X (domain of the functional). As an example, consider the real-valued functional where C (1) ([0, 1]) is the space of continuously differentiable functions in [0, 1]. Analysis of nonlinear functionals in Banach spaces is a well developed subject [66,44,60,32,22]. In particular, classical definitions of continuity and differentiability that hold for real-valued functions can be extended to functionals. Note that the definition of continuity and uniform continuity of a functional depends on how we measure the distance between elements of the function space D(F ).
in the Banach space of Lipschitz continuous periodic functions in [0, 2π], As is well known , the Fourier series of any element θ ∈ D(F ) defines a sequence of partial sums {θ m } ∈ D(F ) that converge uniformly to θ (see [34]). Thanks to such uniform convergence result, we have where the last equality follows from [5,Theorem 10]. Hence, the functional (9) is continuous on C Lip ([0, 2π]). Moreover, F sends any bounded subset of such function space into a pre-compact subset of the real line, and therefore F is completely continuous.

Differentials and derivatives of nonlinear functionals
We say that the functional F is Gâteaux differentiable at a point θ ∈ D(F ) if the limit $dF_\eta([\theta]) = \lim_{\epsilon \to 0} \frac{F([\theta + \epsilon\eta]) - F([\theta])}{\epsilon}$ exists and is finite for all η ∈ D(F ). The quantity $dF_\eta([\theta])$ is known as the Gâteaux differential of F in the direction of η [66,60]. Under rather mild conditions (see, e.g., [66, p. 37]) such differential can be represented as a linear operator acting on η [44,68], i.e., $dF_\eta([\theta]) = F'([\theta])\eta$. Such linear operator is known as the Gâteaux derivative of F and it will be denoted by $F'([\theta])$. The Fréchet differential, on the other hand, is defined as the term dF ([θ], η) in the series expansion $F([\theta + \eta]) = F([\theta]) + dF([\theta], \eta) + o(\|\eta\|)$. It is well-known that if F has a continuous Gâteaux derivative on D(F ), then F is Fréchet differentiable on D(F ), and these two derivatives agree [66, p. 41]. In this paper, we consider nonlinear functionals F that are continuously Gâteaux differentiable in D(F ). Hence, we will not need to distinguish between Fréchet and Gâteaux derivatives.
There has been significant research activity on obtaining the minimal conditions under which a nonlinear functional is Gâteaux or Fréchet differentiable. It turns out that there are reasonably satisfactory results on Gâteaux differentiability of Lipschitz functionals. Results on Fréchet differentiability are more rare, and usually harder to prove [38]. As we will see in section 3, continuously differentiable nonlinear functionals on compact metric spaces are also compact, and have compact Fréchet derivative.
It is also convenient to define another type of functional derivative, namely $\frac{\delta F([\theta])}{\delta\theta(x)} = \lim_{\epsilon \to 0} \frac{F([\theta(y) + \epsilon\delta(x - y)]) - F([\theta(y)])}{\epsilon}$ (15), provided the limit exists. The quantity δF ([θ])/δθ(x) is known as the first-order functional derivative of F ([θ]) with respect to θ(x) (see [28, p. 309] and [27,69]). If F is defined on a Hilbert space H and its Fréchet derivative is bounded, then the Riesz representation $F'([\theta])\eta = (\delta F([\theta])/\delta\theta, \eta)_H$ (16) holds, where (·, ·) H is the inner product in H.
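The Gâteaux differential defined above can be checked numerically with a difference quotient. The following Python sketch is only an illustration under stated assumptions: the test functional F([θ]) = ∫ sin(θ(x)) dx on [0, 2π], the grid size, and the helper names `F`, `gateaux_fd` and `gateaux_exact` are hypothetical choices of ours and do not come from the paper. A central difference quotient is used in place of the one-sided limit for better accuracy.

```python
import numpy as np

# Uniform periodic grid on [0, 2*pi); the rectangle rule is very accurate
# for smooth periodic integrands.
N = 256
x = 2 * np.pi * np.arange(N) / N
dx = 2 * np.pi / N

def F(theta):
    """Hypothetical test functional F([theta]) = int_0^{2pi} sin(theta(x)) dx."""
    return np.sum(np.sin(theta)) * dx

def gateaux_fd(theta, eta, eps=1e-6):
    """Central difference estimate of the Gateaux differential dF_eta([theta])."""
    return (F(theta + eps * eta) - F(theta - eps * eta)) / (2 * eps)

def gateaux_exact(theta, eta):
    """Analytic differential for this F: int_0^{2pi} cos(theta(x)) eta(x) dx."""
    return np.sum(np.cos(theta) * eta) * dx

theta = np.cos(x) + 0.5 * np.sin(2 * x)   # point at which we differentiate
eta = np.sin(x) + 0.3 * np.cos(3 * x)     # direction of differentiation

fd, exact = gateaux_fd(theta, eta), gateaux_exact(theta, eta)
print(f"finite difference: {fd:.10f}   analytic: {exact:.10f}")
```

For this particular functional the differential is linear in η, with kernel cos(θ(x)), which is what the Gâteaux derivative representation predicts.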
As we will see in section 3, the Fréchet derivative of a continuous nonlinear functional defined on a compact subset of a real separable Hilbert space is a completely continuous linear operator (i.e., continuous and compact). In this case, the Riesz representation (16) can be written in terms of a signed measure µ([θ], x) as $F'([\theta])\eta = \int \eta(x)\, d\mu([\theta], x)$ (17) (see [18, p. 4] or [78,22]). Moreover, if µ([θ], x) is absolutely continuous with respect to x then there exists a Radon-Nikodym derivative, i.e., a density $\delta F([\theta])/\delta\theta(x) = d\mu([\theta], x)/dx$. Under these assumptions, the Fréchet derivative of F admits the Lebesgue integral representation $F'([\theta])\eta = \int \frac{\delta F([\theta])}{\delta\theta(x)}\, \eta(x)\, dx$ (19). We emphasize that the expression (19) (or (16)) represents an infinite-dimensional generalization of the concept of directional derivative in multivariate calculus. By analogy, the quantity δF ([θ])/δθ(x) in (19) (or (16)) can be thought of as an infinite-dimensional gradient. Such gradient is a nonlinear functional of θ and a function of x. Higher-order Fréchet and functional derivatives can be defined similarly [69,27].
3 A Gauss-null set is a Borel set A ⊆ X such that µ(A) = 0 for every non-degenerate Gaussian measure µ on X.
4 Conditions under which linear operators between spaces of functions admit an integral representation were investigated in [10,57,58,16].
Example 1: The Fréchet derivative of the nonlinear functional (9) is easily obtained as $F'([\theta])\eta = \cos(\theta(\pi))\, \eta(\pi)$ (20), where both θ and η are in the space of Lipschitz continuous functions (10). Clearly, equation (20) can be written as $F'([\theta])\eta = \int_0^{2\pi} \cos(\theta(\pi))\, \delta(x - \pi)\, \eta(x)\, dx$. From this expression we see that the signed measure dµ([θ], x) appearing in equation (17) in this case has a density, which coincides with the first-order functional derivative $\delta F([\theta])/\delta\theta(x) = \cos(\theta(\pi))\, \delta(x - \pi)$. Such derivative is a distribution in x and a functional of θ. It is important to emphasize that the Fréchet differential (20) is a linear functional in η that is bounded in the standard C (0) norm. In fact, $|\cos(\theta(\pi))\eta(\pi)| \le |\cos(\theta(\pi))| \sup_{x \in [0, 2\pi]} |\eta(x)|$. However, such functional is unbounded in L 2 ([0, 2π]). Hence, the Riesz representation (16) does not apply to this case. To show this, let us expand η relative to any orthonormal trigonometric basis {ϕ 0 , ϕ 1 , . . .} [29] as $\eta(x) = \sum_{k=0}^{\infty} a_k \varphi_k(x)$, with $a_k = (\eta, \varphi_k)_{L^2}$ (24). The series (24) converges in the L 2 sense, and also pointwise since η is continuous [34]. A substitution of (24) into (20) yields $F'([\theta])\eta = \cos(\theta(\pi)) \sum_{k=0}^{\infty} a_k \varphi_k(\pi) = \cos(\theta(\pi))\, (v, \eta)_{L^2}$, where $v(x) = \sum_{k=0}^{\infty} \varphi_k(\pi) \varphi_k(x)$ (25). It is straightforward to show that $\sum_{k=0}^{\infty} \varphi_k(\pi)^2 = \infty$, and therefore the function v(x) defined in (25) is not in L 2 ([0, 2π]). Indeed, v(x) is the trigonometric series expansion of the Dirac delta function δ(x − π), which is not an element of L 2 ([0, 2π]).
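The claim that the trigonometric expansion of δ(x − π) is not square integrable can be illustrated by computing the squared L 2 norm of its partial sums, which by Parseval's identity is the sum of the squared coefficients ϕ k (π). The Python sketch below assumes the standard orthonormal basis 1/√(2π), cos(kx)/√π, sin(kx)/√π on [0, 2π]; the function name `delta_partial_sum_norm_sq` is a hypothetical helper of ours.

```python
import numpy as np

def delta_partial_sum_norm_sq(m, x0=np.pi):
    """Squared L2 norm of v_m(x) = sum_k phi_k(x0) phi_k(x), frequencies <= m.

    Assumes the orthonormal basis 1/sqrt(2 pi), cos(kx)/sqrt(pi),
    sin(kx)/sqrt(pi) on [0, 2 pi]; by Parseval's identity the squared
    norm equals the sum of the squared coefficients phi_k(x0).
    """
    k = np.arange(1, m + 1)
    coeffs = np.concatenate((
        [1.0 / np.sqrt(2 * np.pi)],
        np.cos(k * x0) / np.sqrt(np.pi),
        np.sin(k * x0) / np.sqrt(np.pi),
    ))
    return float(np.sum(coeffs ** 2))

cutoffs = (1, 10, 100, 1000)
norms = [delta_partial_sum_norm_sq(m) for m in cutoffs]
for m, n2 in zip(cutoffs, norms):
    print(f"m = {m:5d}   ||v_m||^2 = {n2:.6f}")
```

With this basis the squared norm equals 1/(2π) + m/π, so it grows linearly in the cutoff frequency m and the partial sums cannot converge in L 2 .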

Nonlinear functionals defined on compact subsets of real separable Hilbert spaces
Let us consider a metric space of functions X, e.g., a Hilbert or a Banach space, and a subset K ⊆ X. We say that K is bounded if for all θ ∈ K we have ‖θ‖ ≤ M , where M is a finite real number, and ‖·‖ is the norm in X. The set K is said to be closed if any convergent sequence in K has a limit in K. An example of a closed and bounded subset of the Hilbert space L 2 is the unit sphere. We say that the subset K ⊆ X is compact if every open cover of K has a finite subcover. There are several equivalent characterizations of compact subsets in metric spaces. For instance, Theorem 3.1. A subset K of a metric space X is compact if and only if every sequence in K has a convergent sub-sequence whose limit is in K.
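The proof can be found, e.g., in [32, §1.7]. Closed and bounded function spaces are not necessarily compact, since we can define sequences that do not have convergent sub-sequences. An example is the unit sphere in L 2 mentioned above. On the other hand, a compact set is always closed and bounded. Another useful characterization of compactness in real Hilbert spaces is the following: a subset K of a real separable Hilbert space H with orthonormal basis {ϕ 1 , ϕ 2 , . . .} is compact if and only if it is closed, bounded, and has equi-small tails, i.e., for every ε > 0 there exists a natural number N such that $\sum_{k=N+1}^{\infty} |(\theta, \varphi_k)_H|^2 \le \epsilon$ (27) for all θ ∈ K.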
where L 2 p ([0, 2π]) is the Lebesgue space of periodic functions in [0, 2π] and ρ ∈ R is the radius of the Sobolev sphere, is a compact subset of L 2 p ([0, 2π]). Indeed, by expanding an arbitrary element θ ∈ K q in a Fourier series we obtain, for any 1 ≤ q ≤ s (see [29, p. 35]). At this point it is clear that for any given ε > 0 there exists a natural number N such that the right hand side of (29) can be made smaller than ε for any θ ∈ K q . In other words, the equi-small tail condition (27) is satisfied, which proves that K q is a compact subset of L 2 p ([0, 2π]).
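The equi-small tail property of a Sobolev sphere can be illustrated numerically. The Python sketch below rests on assumptions that are ours and not the paper's: coefficients c k are taken with respect to a hypothetical orthonormal basis indexed by frequency k = 0, 1, ..., the Sobolev norm is taken as the weighted sum of (1 + k²)^s c k ², and elements are sampled on the sphere of radius ρ. Under these assumptions the tail energy beyond index n is bounded by ρ² (1 + (n + 1)²)^(−s) uniformly over the sphere.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_sobolev_element(kmax, s, rho):
    """Random coefficients c_0..c_kmax scaled so that the (assumed) Sobolev
    norm sqrt(sum_k (1 + k^2)^s c_k^2) equals rho."""
    k = np.arange(kmax + 1)
    c = rng.standard_normal(kmax + 1)
    weights = (1.0 + k ** 2) ** s
    c *= rho / np.sqrt(np.sum(weights * c ** 2))
    return c

def tail_energy(c, n):
    """Energy of the coefficients with index larger than n."""
    return float(np.sum(c[n + 1:] ** 2))

s, rho, kmax = 2, 1.0, 512
worst = {}
for n in (4, 16, 64):
    # Uniform tail bound: sum_{k>n} c_k^2 <= rho^2 (1 + (n+1)^2)^(-s).
    bound = rho ** 2 * (1.0 + (n + 1) ** 2) ** (-s)
    tails = [tail_energy(random_sobolev_element(kmax, s, rho), n)
             for _ in range(200)]
    worst[n] = (max(tails), bound)
    print(f"n = {n:3d}   worst sampled tail = {worst[n][0]:.3e}   bound = {bound:.3e}")
```

The bound does not depend on the sampled element, which is exactly the uniformity that the equi-small tail condition requires.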
In fact, we have the following well-known spectral convergence result [29, p. 109] where ϕ k here are ultra-spherical polynomials. Hence, the equi-small tail condition holds for all θ in the set (30). This proves that K is a compact subset of L 2 w ([−1, 1]).
Continuous functionals defined on compact metric spaces enjoy several good properties. First of all, they are bounded since they map compact sets into a closed and bounded subset of R or C. Moreover, by the Heine-Cantor Theorem we have that any continuous functional defined on a compact set is necessarily uniformly continuous. In other words, continuous nonlinear functionals on compact metric spaces are uniformly continuous and bounded [32]. If the functional is real-valued this means that the maximum and the minimum are attained at points within K. We also recall that closed subsets of compact sets are necessarily compact. Hence, a continuous functional on a compact space K maps any closed subset of K into a closed and bounded subset of R or C. Therefore such functional is necessarily compact, i.e., completely continuous (see Definition 2.3). Next, we show that the Fréchet derivative of a continuous nonlinear functional defined on a compact subset of a real separable Hilbert space is a compact linear operator.
Proof: Continuous functionals on compact metric spaces are necessarily completely continuous (see Definition 2.3). To prove the theorem we proceed by contradiction. To this end, suppose that F'([θ * ]) is not compact. Then it is possible to find ε > 0 and a sequence {θ k } ∈ K ⊆ H such that ‖θ k ‖ H ≤ 1 and for all k ≠ j. By definition of Fréchet derivative at θ * we have for all η ∈ K with reasonably small norm, say ‖η‖ H ≤ δ. In particular, we can choose δ such that Next, choose τ small enough so that (θ * + τ θ k ) ∈ K and ‖τ θ k ‖ ≤ δ for all k ∈ N. For such functions we have This means that the functional F is not completely continuous.
In fact, the inequality (36) implies that it is not possible to extract a convergent sub-sequence from the sequence This proves the Lemma.
Proof: We have seen in Theorem 3.1 that the Fréchet derivative of a continuous functional defined on a compact subset of a real separable Hilbert space is a compact linear operator. Therefore it is bounded.
is a linear operator from K into R or C, i.e., it defines a bounded linear functional. Hence, we can apply the Riesz representation theorem and conclude that there exists a unique element δF ([θ * ])/δθ(x) ∈ H such that (37) holds true. This proves the Lemma.
Example 1: Consider the nonlinear functional where θ is in the Sobolev sphere (28). The Fréchet derivative of (38) is This functional is bounded in L 2 , and therefore in H s . In fact, by the Cauchy-Schwarz inequality Therefore, for each given θ we have that Lemma 3.1 holds. In other words, there exists a unique first-order functional derivative which is an element of L 2 p ([0, 2π]) (as a function of x).

Approximation of nonlinear functionals in real separable Hilbert spaces
Let H be a real separable Hilbert space with inner product (·, ·) H . Any element θ of H can then be represented in terms of an orthonormal basis {ϕ 1 , ϕ 2 , . . .} as $\theta = \sum_{k=1}^{\infty} a_k \varphi_k$, with $a_k = (\theta, \varphi_k)_H$ (42). We introduce the projection operator P m , which truncates the series expansion (42) to m terms, i.e., $P_m \theta = \sum_{k=1}^{m} a_k \varphi_k$ (43). Clearly, P m is an operator from H into the finite-dimensional space $D_m = \mathrm{span}\{\varphi_1, \ldots, \varphi_m\}$ (44). With this notation we can represent any nonlinear functional F ([θ]) in H as a function depending on an infinite (countable) number of variables. To this end, we substitute (42) into F ([θ]) to obtain $F([\theta]) = F([\sum_{k=1}^{\infty} a_k \varphi_k]) = f(a_1, a_2, \ldots)$ (45). A simple way to approximate this functional is to restrict its domain to the subspace P m H = D m . This reduces F ([θ]) to a multivariate function f (a 1 , . . . , a m ), which depends on as many variables as the number of basis elements of D m , i.e., $F([P_m \theta]) = f(a_1, \ldots, a_m)$ (48). Next, we study the representation of the Fréchet and the first-order functional derivatives.
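As we pointed out in Lemma 3.2, δF ([θ])/δθ(x) coincides with the first-order functional derivative (15). Since such derivative is an element of the Hilbert space H, it can be represented in terms of the orthonormal basis as $\frac{\delta F([\theta])}{\delta\theta(x)} = \sum_{k=1}^{\infty} \left(\frac{\delta F([\theta])}{\delta\theta}, \varphi_k\right)_H \varphi_k(x)$ (50). A differentiation of (45) with respect to a k yields $\frac{\partial f}{\partial a_k} = F'([\theta])\varphi_k = \left(\frac{\delta F([\theta])}{\delta\theta}, \varphi_k\right)_H$ (52). In other words, the partial derivative of f with respect to a k = (θ, ϕ k ) H is the projection of the first-order functional derivative of F onto the basis element ϕ k . By substituting (52) into (50) we obtain $\frac{\delta F([\theta])}{\delta\theta(x)} = \sum_{k=1}^{\infty} \frac{\partial f}{\partial a_k} \varphi_k(x)$ (53). This expression emphasizes that the first-order functional derivative (53) is essentially a "dot product" between the (infinite-dimensional) gradient of f and the (infinite-dimensional) vector of basis elements. Evaluating (53) on the finite-dimensional function subspace D m = P m H yields $\frac{\delta F([P_m\theta])}{\delta\theta(x)} = \sum_{k=1}^{m} \frac{\partial f}{\partial a_k} \varphi_k(x) + \sum_{k=m+1}^{\infty} \left(\frac{\delta F([P_m\theta])}{\delta\theta}, \varphi_k\right)_H \varphi_k(x)$ (54). Here f depends solely on the variables (a 1 , . . . , a m ), and it is defined in equation (48). If the functional derivative δF ([P m θ])/δθ(x) is an element of D m (as a function of x) then the second term on the right hand side of (54) is clearly equal to zero.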
) be a continuously differentiable functional on a real separable Hilbert space H, and let P m be the projection operator (43). Then from equations (49)- (52) it follows that where f is the multivariate function defined in (48).
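The relation between the partial derivatives of the multivariate function f and the projections of the first-order functional derivative onto the basis elements can be verified numerically. The Python sketch below is an illustration under our own assumptions: the hypothetical test functional F([θ]) = ∫ sin(θ(x)) dx on [0, 2π] (whose functional derivative is cos(θ(x))), the orthonormal trigonometric basis, and the names `trig_basis`, `f`, `grad_proj` and `grad_fd` are not taken from the paper.

```python
import numpy as np

N = 512
x = 2 * np.pi * np.arange(N) / N
dx = 2 * np.pi / N

def trig_basis(m):
    """First m orthonormal trigonometric functions on [0, 2 pi], one per row."""
    rows = [np.full(N, 1.0 / np.sqrt(2 * np.pi))]
    k = 1
    while len(rows) < m:
        rows.append(np.cos(k * x) / np.sqrt(np.pi))
        if len(rows) < m:
            rows.append(np.sin(k * x) / np.sqrt(np.pi))
        k += 1
    return np.array(rows)

def F(theta):
    """Hypothetical test functional F([theta]) = int sin(theta(x)) dx."""
    return np.sum(np.sin(theta)) * dx

def functional_derivative(theta):
    """First-order functional derivative of this F, as a function of x."""
    return np.cos(theta)

m = 7
Phi = trig_basis(m)

def f(a):
    """Cylindrical approximation f(a_1, ..., a_m) = F([sum_k a_k phi_k])."""
    return F(a @ Phi)

a = np.array([0.3, 1.0, -0.5, 0.2, 0.0, 0.1, -0.4])
theta_m = a @ Phi

# Projections of the functional derivative onto each basis element.
grad_proj = Phi @ functional_derivative(theta_m) * dx

# Central finite differences of f with respect to each coefficient a_k.
eps = 1e-6
grad_fd = np.array([(f(a + eps * e) - f(a - eps * e)) / (2 * eps)
                    for e in np.eye(m)])

print(np.max(np.abs(grad_fd - grad_proj)))
```

The two gradients agree up to the finite-difference error, so in this example the gradient of f carries the same information as the projection of the functional derivative onto D m .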

Convergence analysis for nonlinear functional approximation in compact Hilbert spaces
In this section we perform a convergence analysis for nonlinear functional approximation in compact subsets of real separable Hilbert spaces. We begin with the following
Lemma 5.1. (Pointwise convergence [51]) Let H be a real separable Hilbert space, K a compact subset of H, and P m : H → D m the projection operator (43). If F is a continuous functional on K then for all ε > 0 there exists m 0 ∈ N such that
Proof: Since F is continuous on a compact subset K of H, we have by the Heine-Cantor Theorem that F is uniformly continuous on K. This means that F maps compact subsets K ⊆ H into compact subsets of the real line or complex plane. Hence, the maximum and the minimum of
Next we establish uniform convergence of P m θ to θ in compact metric spaces by using Dini's theorem, which we recall hereafter.
Proof: Fix ε > 0. Since F is continuous on a compact subset K of H, the Heine-Cantor Theorem applies and F is uniformly continuous on K. Thus there exists δ > 0 such that
Remark. The compactness hypothesis on the subset K that we leveraged in all convergence theorems can be replaced by the weaker assumption that K ⊆ H is bounded (e.g., a sphere) and F is uniformly continuous with respect to the so-called S-topology [6].

Convergence rate
Now that we have established uniform convergence of the functional F ([P m θ]) to F ([θ]), the subsequent question is: how fast does F ([P m θ]) converge to F ([θ])? For continuously differentiable functionals (functionals with continuous Fréchet derivative) such convergence rate is the same as the rate at which P m θ converges to θ. This result follows from the following Theorem 5.3. (Mean value theorem) Let F be a real-valued continuously differentiable functional on a compact subset K of a separable real Hilbert space H. Then for all θ 1 , θ 2 ∈ K the following estimate holds $|F([\theta_1]) - F([\theta_2])| \le \sup_{\eta \in K} \|F'([\eta])\| \, \|\theta_1 - \theta_2\|_H$, where F'([η]) denotes the first-order Fréchet derivative of F .
The proof of the theorem is trivial and therefore we omit it. We simply recall that since F'([θ]) is the Fréchet derivative of a continuously differentiable functional on a compact metric space we have that F'([θ]) is a compact linear operator (Theorem 3.1) and therefore it is bounded. We also emphasize that it is possible to relax the assumptions in Theorem 5.3 significantly. For instance, it is possible to drop the requirement that F is continuously differentiable and leverage the fact that for each ε > 0 and any pair of points θ 1 , θ 2 ∈ K there exists a point θ * ∈ G in which F is Fréchet differentiable, and provided the line For the purpose of the present paper, however, we shall simply restrict the class of nonlinear functionals we study to continuously differentiable nonlinear functionals. This allows us to easily obtain the following convergence rate result using the mean value Theorem 5.3.

Lemma 5.3. (Convergence rate)
Let F be a real-valued, continuously differentiable functional on a compact subset K of a real separable Hilbert space H. Then for all θ ∈ K and for any finite-dimensional projection P m of the form (43) we have $|F([\theta]) - F([P_m\theta])| \le \sup_{\eta \in K} \|F'([\eta])\| \, \|\theta - P_m\theta\|_H$. In particular, F ([P m θ]) converges uniformly to F ([θ]) at the same rate as P m θ converges (uniformly) to θ.
The proof follows directly from the mean value Theorem 5.3 by setting θ 1 = θ and θ 2 = P m θ.
Example 1: As a simple example of application of Lemma 5.3, consider the Sobolev sphere (28) of weakly differentiable periodic functions in [0, 2π]. We know that K q is a compact subset of L 2 p . Hence, given any continuously differentiable functional F on K q we have that where G 1 is a constant that depends on the operator norm of the Fréchet derivative F'([θ]), and other quantities related to the Fourier series expansion. In particular, if P m θ converges to θ exponentially fast (e.g., if θ is analytic), then we have an exponentially convergent nonlinear functional approximation method.
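The convergence rate statement can be illustrated with a short numerical experiment. The following Python sketch is a hedged illustration only: the test functional F([θ]) = ∫ sin(θ(x)) dx, the analytic periodic function θ(x) = 3/(5 − 4 cos x) (whose Fourier coefficients decay like 2^(−k)), and the FFT-based projection onto trigonometric polynomials of degree at most m are all assumptions of ours. For this F the operator norm of the Fréchet derivative is at most √(2π), since its kernel cos(θ(x)) is bounded by one.

```python
import numpy as np

N = 256
x = 2 * np.pi * np.arange(N) / N
dx = 2 * np.pi / N

def F(theta):
    """Hypothetical test functional F([theta]) = int sin(theta(x)) dx."""
    return np.sum(np.sin(theta)) * dx

def project(theta, m):
    """L2 projection onto trigonometric polynomials of degree <= m (via FFT)."""
    c = np.fft.rfft(theta)
    c[m + 1:] = 0.0
    return np.fft.irfft(c, n=N)

def l2_norm(u):
    return np.sqrt(np.sum(u ** 2) * dx)

# Analytic periodic function with Fourier coefficients decaying like 2^(-k).
theta = 3.0 / (5.0 - 4.0 * np.cos(x))

# Bound on the operator norm of F'([eta]) for this particular functional.
deriv_bound = np.sqrt(2 * np.pi)

rows = []
for m in range(2, 13):
    pm = project(theta, m)
    rows.append((m, abs(F(theta) - F(pm)), l2_norm(theta - pm)))

for m, func_err, proj_err in rows:
    print(f"m = {m:2d}   |F - F_m| = {func_err:.3e}   "
          f"bound = {deriv_bound * proj_err:.3e}")
```

In this experiment the projection error is halved each time m increases by one, and the functional error stays below the mean value bound, so the functional approximation inherits the exponential rate of the Fourier projection.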

Approximation of Fréchet and functional derivatives in compact Hilbert spaces
In this section we study convergence of functional approximations to F'([θ]) and δF ([θ])/δθ(x) in a compact subset K of a separable real Hilbert space H. We begin with the following Theorem 6.1. (Uniform approximation of Fréchet derivatives) Let H be a real separable Hilbert space, K a compact subset of H, θ ∈ K and P m : H → D m the projection operator (43). If F is continuously differentiable on K with Fréchet derivative F'([θ]), then the sequence F'([P m θ]) converges uniformly to F'([θ]) for all θ in K. In other words, for all ε > 0 there exists m 0 ∈ N such that $\|F'([\theta]) - F'([P_m\theta])\| \le \epsilon$ for all m ≥ m 0 , and for all θ ∈ K.
Proof: Since F is a continuous functional on a compact metric space, the Fréchet derivative F'([θ]) is a compact linear operator (Theorem 3.1). By using (62) we immediately obtain uniformly in θ ∈ K. This proves the theorem.
Next, we study convergence of the first-order functional derivative (15). This is relatively straightforward given the convergence result we just obtained in Theorem 6.1. In fact, the linear functional F'([θ])η is bounded on the compact set K ⊆ H and therefore it admits the Riesz integral representation $F'([\theta])\eta = (\delta F([\theta])/\delta\theta, \eta)_H$, where (·, ·) H is the inner product in H. Uniform convergence of F'([P m θ]) to F'([θ]) for all θ in the compact set K ⊆ H implies that for every ε > 0 there exists m 0 ∈ N such that and for all m ≥ m 0 . This yields the following The Fréchet differential of G η ([θ]) can be written as If we take the supremum of (67) over the set of η ∈ K with ‖η‖ = 1 we obtain This means that if the functional F is twice continuously differentiable then the rate at which the linear operator F'([P m θ]) converges to F'([θ]) is the same as the convergence rate of P m θ to θ. Note that F'' is a compact operator from K × K into R or C.
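The statement that the Fréchet derivative converges at the same rate as P m θ can also be checked on a simple example. In the Python sketch below we again assume the hypothetical test functional F([θ]) = ∫ sin(θ(x)) dx, for which the Fréchet derivative has kernel cos(θ(x)); by the Riesz representation, the operator norm of the difference of two derivatives equals the L 2 norm of the difference of their kernels. The function θ and the FFT-based projection are illustrative choices of ours.

```python
import numpy as np

N = 256
x = 2 * np.pi * np.arange(N) / N
dx = 2 * np.pi / N

def project(theta, m):
    """L2 projection onto trigonometric polynomials of degree <= m (via FFT)."""
    c = np.fft.rfft(theta)
    c[m + 1:] = 0.0
    return np.fft.irfft(c, n=N)

def l2_norm(u):
    return np.sqrt(np.sum(u ** 2) * dx)

def derivative_kernel(theta):
    """Kernel of F'([theta]) for the hypothetical F([theta]) = int sin(theta) dx."""
    return np.cos(theta)

theta = 3.0 / (5.0 - 4.0 * np.cos(x))

errors = []
for m in range(2, 11):
    pm = project(theta, m)
    # Operator norm of F'([theta]) - F'([P_m theta]) via the Riesz kernel.
    op_err = l2_norm(derivative_kernel(theta) - derivative_kernel(pm))
    errors.append((m, op_err, l2_norm(theta - pm)))

for m, op_err, proj_err in errors:
    print(f"m = {m:2d}   ||F'(theta) - F'(P_m theta)|| = {op_err:.3e}   "
          f"||theta - P_m theta|| = {proj_err:.3e}")
```

Since the cosine is Lipschitz continuous with constant one, the derivative error is bounded by the projection error for this functional, which is consistent with the rate statement above.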

Approximation of functional integrals in real separable Hilbert spaces
In this section we study approximation of functional integrals defined on a real separable Hilbert space H, with particular emphasis on integrals involving cylindrical functionals, i.e., functionals of the form (47). This topic was first investigated by Friedrichs and Shapiro [25,24], and it fits the framework of functional approximations we discussed in section 4. To describe the method, we first recall that {P m } is a hierarchical and complete sequence of orthogonal projections, i.e., P m ⊆ P m+1 (the range of P m is a subset of the range of P m+1 ). Following Friedrichs, Shapiro and Skorohod [25,24,62] (see also [ We emphasized in section 4 that F ([P m θ]) = f (a 1 , · · · , a m ) is an m-dimensional function depending on the variables a k = (θ, ϕ k ) H , which are the coordinates of θ relative to the orthonormal basis {ϕ 1 , ϕ 2 , . . . , ϕ m }.
The finite-dimensional measure in each subspace P m H can be taken, e.g., as a Gaussian product measure We know that the cylindrical functional This guarantees that the limit in (70) exists, and that the functional integral is well-defined. This allows us to define the inner product between two cylindrical functionals F and G as (see [69, §5.1]) where F(H) is the vector space of functionals defined on the Hilbert space H. The inner product (73) induces the norm
Example 1: Consider the nonlinear functional in the space of square-integrable periodic functions in [0, 2π], i.e., L 2 p ([0, 2π]). We are interested in computing the functional integral where the measure dµ([θ]) is the limit of the product measure (71) as m → ∞. To this end, we first project θ onto the orthonormal Fourier basis where $a_k = (\theta, \sin(kx))_{L^2_p}/\sqrt{\pi}$ and $b_k = (\theta, \cos(kx))_{L^2_p}/\sqrt{\pi}$ (k = 1, . . . , m). A substitution of (77) into (75) yields independently of m. Hence, the functional integral (76) is easily computed as
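To make the finite-dimensional integrals concrete, the following Python sketch integrates a cylindrical functional against a standard Gaussian product measure using tensor-product Gauss-Hermite quadrature. Everything specific here is an assumption of ours rather than the paper's: the measure is taken as the product of standard normal densities in the coordinates a k , and the test function f(a) = cos(a 1 ) + a 2 ² is a hypothetical cylindrical functional that depends only on the first two coordinates.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss

def gaussian_integral(f, m, q=16):
    """Integrate f(a_1, ..., a_m) against the standard Gaussian product
    measure with a tensor-product Gauss-Hermite rule (q nodes per variable)."""
    nodes, weights = hermegauss(q)          # weight function exp(-a^2 / 2)
    weights = weights / weights.sum()       # normalise to a probability measure
    A = np.meshgrid(*([nodes] * m), indexing="ij")
    W = np.meshgrid(*([weights] * m), indexing="ij")
    return float(np.sum(f(A) * np.prod(W, axis=0)))

def f(a):
    """Hypothetical cylindrical functional: depends on a_1 and a_2 only."""
    return np.cos(a[0]) + a[1] ** 2

# Closed form for this f: E[cos(a_1)] + E[a_2^2] = exp(-1/2) + 1.
exact = np.exp(-0.5) + 1.0
values = {m: gaussian_integral(f, m) for m in (2, 3, 4)}
for m, val in values.items():
    print(f"m = {m}   integral = {val:.12f}   exact = {exact:.12f}")
```

Because f depends on finitely many coordinates and the measure is a normalised product measure, the computed value does not change with m, which mirrors the m-independence noted in the example above. The tensor-product rule costs q^m function evaluations, so this sketch is only practical for small m.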

An example of convergence analysis
In this section we provide a simple example of convergence analysis for a nonlinear functional defined on a compact subset of a real separable Hilbert space. To this end, consider on the following compact subset of L 2 p ([0, 2π]) where H s p ([0, 2π]) is the degree s Sobolev space of periodic functions in [0, 2π], and ρ is the radius of the Sobolev sphere K. We have seen that the Fréchet differential of F ([θ]) is given by which is clearly a linear operator in η for all θ ∈ K. Moreover, we have shown in section 3 that such operator is compact in the function space (81), and therefore it is continuous. A more direct proof of continuity for (82) relies on showing that for any sequence {θ j } ∈ K converging to θ ∈ K in the L 2 p sense, we have that F'([θ j ])η converges to F'([θ])η for all η ∈ K. By using the Cauchy-Schwarz inequality we have As is well known, continuous and bounded functions such as sin(·) preserve L q p convergence under composition (see [5,Theorem 7]). Therefore F'([θ])η is continuous in θ for all θ ∈ K. Moreover, since K is a compact subset of L 2 ([0, 2π]), we have that F'([θ])η is bounded and Plugging this result into the mean value Theorem 5.3 yields the uniform convergence result The last two inequalities follow from well-known Fourier series approximation theory [29].
Next, we determine the convergence rate of the first-order Fréchet and functional derivative approximations. To this end, we notice that the second-order Fréchet derivative of (82), i.e., is a continuous bilinear operator on the compact set K × K. Therefore, by equation (69), the first-order Fréchet derivative must converge at the same rate as (85). The first-order functional derivative of F , i.e., the kernel of the integral operator (82) is As easily seen, if we evaluate (87) at P m θ we obtain the approximated functional derivative which is an element of L 2 p that converges to (87) uniformly in θ ∈ K in the L 2 p norm. This is in agreement with Lemma 6.1.

Approximation of functional differential equations in real separable Hilbert spaces
In this section we provide a brief analysis of finite-dimensional approximations to linear functional differential equations (FDEs). In particular, we show that approximation schemes obtained by evaluating the FDE on an m-dimensional subset of a compact subset of a Hilbert space converge uniformly to the FDE as we send m to infinity, provided the residual is continuous. Such consistency result implies that the m-dimensional approximation to the FDE, i.e., a PDE in m independent variables, converges to the full FDE as the number of independent variables goes to infinity. To describe the main idea, let us begin with the following prototype linear initial value problem where θ here is chosen in a compact subset K of a real separable Hilbert space H. Next, define Clearly, if F is a solution to (89) then R([θ], t) = 0 for all θ ∈ K and for all t ≥ 0. On the other hand, if we project θ onto the finite-dimensional space D m defined in equation (44), then in general R([P m θ], t) ≠ 0. If R is continuous in θ for all t ≥ 0, then the uniform approximation results we obtained in section 5 and section 6 can be used to show that |R([P m θ], t)| converges to zero uniformly in θ as we send m to infinity. This means that the m-dimensional linear PDE which, by equation (55), is equivalent to , t) as m goes to infinity. For that, we need a stability result, e.g., uniform boundedness of the operator that pushes forward the solution to (91). By "stability" here we mean that it is possible to control some norm of the solution to the PDE (91) by the initial condition in a way that is independent of the discretization parameter m, i.e., the number of terms in the series expansion of P m θ. In other words, a suitable norm of the PDE solution, e.g. (74), needs to be bounded by a constant multiple of a suitable norm of the initial condition, where neither the norms involved nor the constant depend on m.
The consistency theorem for general linear FDEs can be formulated as follows: [...] is a consistent approximation of the FDE (93). Moreover, if (93) is continuously Fréchet differentiable, then the order of consistency is the same as the rate at which $P_m\theta$ converges to $\theta$ in $K$.

[...] is Fréchet differentiable, then we can apply Theorem 5.3 to conclude that the order of consistency is the same as the order at which $P_m\theta$ converges to $\theta$ in the $H$ norm.

A Lax-Richtmyer equivalence theorem for functional differential equations
The Lax-Richtmyer equivalence theorem states that a consistent method for a well-posed linear PDE initial value problem is convergent if and only if it is stable. Such a result can be extended to functional differential equations, and it allows us to show that the solution to the multivariate PDE (94) converges to the solution to the FDE (93) as we send $m$ to infinity.$^{12}$ To this end, we can proceed as follows:
a) Construct the multivariate PDE approximation (94) and show that such PDE is a consistent approximation to the FDE (93).
b) Derive a suitable energy estimate/stability condition for the solution of (94). This is a PDE-specific stability result stating that it is possible to control some norm of the solution of (94) by a constant multiple of a suitable norm of the initial condition, where neither the norms involved nor the constant depend on $m$. As is well known, the simplest stability condition arises from an energy inequality, e.g., for PDEs with continuous and coercive linear operators.
c) Use a) and b) to show that the solution of the PDE (94) converges to a solution of the FDE (93) as the number of independent variables is sent to infinity.
The implication a) + b) ⇒ c) represents a generalization of the well-known Lax-Richtmyer theorem to functional differential equations. Hereafter we provide a simple example of consistency and stability analysis.
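The implication a) + b) ⇒ c) can be sketched with the classical variation-of-constants argument. The notation below ($L_m$, $S_m(t)$, $r_m$, $e_m$, $C_T$) is introduced only for illustration, and the sketch assumes that the approximating PDE is linear and autonomous; it is not the proof given in this paper.

```latex
% Assumed setting: the approximate solution satisfies \partial_t f_m = L_m f_m with solution
% operator S_m(t); the exact solution, evaluated on D_m, satisfies the same equation up to a
% residual r_m(t). The error e_m is the difference between the two.
\begin{align*}
\partial_t e_m &= L_m e_m + r_m(t), \qquad e_m(0) = e_m^0, \\
e_m(t) &= S_m(t)\, e_m^0 + \int_0^t S_m(t-\tau)\, r_m(\tau)\, d\tau, \\
\|e_m(t)\| &\le C_T\, \|e_m^0\| + C_T\, t \sup_{0 \le \tau \le t} \|r_m(\tau)\|,
\qquad 0 \le t \le T.
\end{align*}
% Stability:   \|S_m(t)\| \le C_T for 0 \le t \le T, with C_T independent of m.
% Consistency: \sup_\tau \|r_m(\tau)\| \to 0 and \|e_m^0\| \to 0 as m \to \infty.
```

Under these two assumptions the right-hand side vanishes as $m \to \infty$, which is the convergence statement c).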
Example 1. Consider the FDE (95), where $\theta$ is an element of the Sobolev sphere defined in equation (28). The FDE (95) can be written in the form (93) provided we set the operators as in (96). We first show that the multivariate PDE obtained by evaluating (95) in $P_m L^2_p([0, 2\pi])$, where $P_m$ is the projection (43) onto trigonometric polynomials, is a consistent approximation of the FDE (95). To this end, we substitute (48) and (54) into the FDE, which yields (97), where $a_j = (\theta, \varphi_j)_{L^2_p}$. The residual $R([P_m\theta])$ converges to zero uniformly in $\theta$ as $m \to \infty$. In fact, we have (98). At this point we recall that the functional $\delta F([P_m\theta], t)/\delta\theta(x)$ satisfies the equi-small tail condition (27), since by the Riesz representation Theorem 3.2, it is an element of $L^2_p$ (as a function of $x$). This is sufficient to conclude that (98) goes to zero uniformly in $\theta$ as $m$ goes to infinity. This, in turn, implies that the PDE (99) converges to the FDE (95) uniformly in $\theta$ as $m$ goes to infinity.$^{13}$

Next, we show that the solution to (99) is stable, i.e., that it is possible to bound its norm by a constant multiple of a suitable norm of the initial condition, where none of the norms involved depend on $m$. To this end, it is convenient to define the vector field components $A_k$ as in (100) and write (99) in the form (101). Clearly, the vector field $(A_1, \ldots, A_m)$ is divergence-free, i.e., (102) holds, since the basis functions $\varphi_k$ and their derivatives are periodic in $[0, 2\pi]$. Moreover, by using the method of characteristics [54] it is easy to show that the solution of (99) can be bounded as in (103), where $f_0$ is the initial condition. Hence, if the $L^\infty$ norm of $f_0$ is bounded by a constant $\kappa$ that is independent of $m$, then the problem (101) is stable in the $L^\infty(\mathbb{R}^m)$ norm.$^{14}$ Such a strong bound implies that the solution of (101) is also bounded in the $L^2_\mu$ norm, where $d\mu$ is the measure (71). In fact, we have (104)-(105). Note that this also implies that the functional integral (70)-(72) converges, as it is bounded by the same constant $\kappa$ independently of $m$.

13 As we shall see in section 10, if the initial condition functional $F_0([\theta])$ associated with (95) is continuously Fréchet differentiable, then the solution to (95) is Fréchet differentiable as well. In this case, the residual defined in (97) is Fréchet differentiable. By using Theorem 9.1, we then conclude that the order of consistency of the PDE approximation (99) is the same as the rate at which $\|P_m\theta - \theta\|_{L^2_p([0, 2\pi])}$ converges to zero.

14 An example of a cylindrical functional that is bounded in the $L^\infty(\mathbb{R}^m)$ norm is …

Numerical examples
In this section we provide numerical demonstrations of the approximation theorems we developed for nonlinear functionals and functional differential equations. To this end, we consider the function space defined by the Sobolev sphere of radius $\rho$ in (106). We have seen in section 3 that $K$ is a compact subset of $L^2_p([0, 2\pi])$. Hence, any real-valued continuous functional $F([\theta])$ defined on $K$ can be represented as the limit of a uniformly convergent sequence of functionals of the form $F([P_m\theta])$, where $P_m$ is the projection operator (43). We can sample elements from (106) by taking truncated Fourier series of the form (107), where $c^*_{-k}$ denotes the complex conjugate, and $\{c_k\}$ are within a bounded subset of the complex plane. In fact, we have (108), which can be easily bounded if $\{c_k\}$ are within a bounded subset of the complex plane.

Generation of test functions with prescribed Fourier spectrum
As is well known, the decay rate of the modulus of the Fourier coefficients $|c_k|$ in the series expansion (107) is related to the degree of smoothness of $\theta$, i.e., the value of $s$ in (106) (see [29, §2]). Hence, by sampling $\theta$ from a space of periodic functions with a prescribed spectral decay $|c_k|$, we can study the effects of the regularity parameter $s$ in (106) on the rate of convergence of the nonlinear functional approximations we developed in sections 4, 5 and 6. To sample test functions from (106), we represent $c_k$ in (107) in polar form, prescribe the decay of the amplitudes $|c_k|$ ($k \geq 0$), and introduce a uniformly distributed random phase shift $\vartheta_k \in [0, 2\pi]$ subject to the constraint $\vartheta_k = -\vartheta_{-k}$. This yields the representation (109).

14 (continued) In section 10.3.2 we show that the solution of (101) corresponding to such initial condition converges uniformly to the solution of (95). Clearly, the assumption $\|f_0\|_{L^\infty} \leq \kappa$ can be relaxed to, e.g., $\|f_0\|_{L^2_\mu} \leq \kappa_1$, where $\kappa_1$ is independent of $m$. This still guarantees stability of (101), just in a different norm.
We study two types of decay rates of the Fourier spectrum. The first is a power-law decay of the form (110), where $\alpha \geq 1$ and $k = 1, \ldots, N$. In equation (110), $\{a(0), a(1), \ldots, a(N)\}$ is a random sequence. The algebraic decay (110) corresponds to a Sobolev sphere (106) with index $s = \alpha$. The radius of such sphere can be computed by substituting (110) into (108), and then evaluating the supremum. The second power spectrum we consider has an exponential decay of the form (111), where $\beta > 1$ and $k = 1, \ldots, N$. The random sequence $\{b(0), b(1), \ldots, b(N)\}$ has the same properties as the sequence $\{a(0), a(1), \ldots, a(N)\}$ in (110). The spectrum (111) corresponds to a Sobolev sphere (106) with index $s = \infty$. In Figure 1 we plot one sample of the random spectra (110) and (111), together with the corresponding sample functions (109) for $N = 500$, $\alpha \in \{1.5, 2, 2.5, 3\}$ and $\beta \in \{1.2, 1.5, 2, 3\}$. In the numerical applications presented hereafter we choose $N$ large enough so that the contribution of the tail of the spectrum is negligible in the series expansion (109). This allows us to generate highly accurate approximations of $\theta$ in the space (106), which will then be projected onto a lower-dimensional subspace generated by another trigonometric basis. Such basis is the one that defines the projection operator (43). Specifically, we chose the following orthonormal basis consisting of discrete trigonometric polynomials [29, p. 29] ϕ k (x) = 1 2π(m + 1)
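The sampling procedure described above can be sketched as follows. This is a minimal illustration, assuming deterministic amplitudes $|c_k| = k^{-\alpha}$ and $|c_k| = \beta^{-k}$ (the random factors $a(k)$ and $b(k)$ of (110)-(111) are omitted), $c_0 = 0$, and independent uniform phases; the function name `sample_theta` is hypothetical.

```python
import numpy as np

def sample_theta(x, amplitudes, rng):
    """Real periodic function with |c_k| = amplitudes[k-1] for k = 1..N, c_0 = 0, c_{-k} = conj(c_k)."""
    N = amplitudes.size
    k = np.arange(1, N + 1)
    phase = rng.uniform(0.0, 2.0 * np.pi, N)          # random phase shifts
    # c_k e^{ikx} + c_{-k} e^{-ikx} = 2 |c_k| cos(k x + phase_k), so the sum is real-valued.
    return 2.0 * np.sum(amplitudes[:, None] * np.cos(np.outer(k, x) + phase[:, None]), axis=0)

N, n = 500, 2048                                       # number of modes, grid size (n > 2N)
x = 2.0 * np.pi * np.arange(n) / n
k = np.arange(1, N + 1, dtype=float)
rng = np.random.default_rng(0)

power_law = k ** -2.0                                  # assumed form of (110) with alpha = 2
exponential = 1.5 ** -k                                # assumed form of (111) with beta = 1.5

theta_pow = sample_theta(x, power_law, rng)
theta_exp = sample_theta(x, exponential, rng)

# Recover the spectrum with the FFT to confirm the prescribed decay.
c_pow = np.fft.fft(theta_pow) / n
c_exp = np.fft.fft(theta_exp) / n
print(np.abs(c_pow[1:6]), np.abs(c_exp[1:6]))
```

Choosing the grid size larger than $2N$ avoids aliasing, so the discrete Fourier coefficients of the sample coincide with the prescribed $c_k$ up to rounding.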

Approximation of nonlinear functionals
Consider the nonlinear functional analyzed in section 8 and its Fréchet derivative [...] (see Figure 1). The error in the Fréchet derivative involves a supremum over $\eta$, and is computed as follows: for each given $\theta$, we determine $P_m\theta$ and then approximate the supremum over $\eta$ using $10^3$ sample functions $\eta$. This is done for $10^3$ functions $\theta$ sampled from $K$ as before. Notice that $\eta \in K$ has the same form as $\theta$, and is therefore drawn from the same ensemble. The results of our calculations are shown in Figure 3. The functional (114) is continuously Fréchet differentiable,$^{15}$ and therefore the result (69) holds.
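The sampling-based estimate of the supremum over $\eta$ can be sketched as follows. The kernel $\sin(x)\sin(2\theta(x))$ is a hypothetical stand-in for the Fréchet derivative of the functional studied here, $P_m$ is taken to be a Fourier truncation, and only 200 samples of $\eta$ are used (rather than $10^3$) to keep the example light; all names are illustrative.

```python
import numpy as np

n = 1024
x = 2.0 * np.pi * np.arange(n) / n
dx = 2.0 * np.pi / n
rng = np.random.default_rng(1)

def sample(alpha, N=100):
    """Random real periodic function with |c_k| = k**(-alpha) and uniform random phases."""
    k = np.arange(1, N + 1, dtype=float)
    phase = rng.uniform(0.0, 2.0 * np.pi, N)
    return 2.0 * np.sum(k[:, None] ** -alpha * np.cos(np.outer(k, x) + phase[:, None]), axis=0)

def project(theta, m):
    """Fourier truncation standing in for P_m: keep wavenumbers |k| <= m."""
    c = np.fft.rfft(theta)
    c[m + 1:] = 0.0
    return np.fft.irfft(c, n=n)

def l2_norm(u):
    return np.sqrt(np.sum(u ** 2) * dx)

def kernel(theta):
    """Hypothetical first-order functional derivative: sin(x) * sin(2 * theta(x))."""
    return np.sin(x) * np.sin(2.0 * theta)

def derivative_error(theta, m, etas):
    """Sample-based estimate of sup over eta of |F'([theta])eta - F'([P_m theta])eta|."""
    diff = kernel(theta) - kernel(project(theta, m))
    return np.max(np.abs(etas @ diff)) * dx           # one inner product per sample eta

theta = sample(2.0)
etas = np.array([sample(2.0) for _ in range(200)])    # eta drawn from the same ensemble as theta
ms = [2, 4, 8, 16, 32]
errs = [derivative_error(theta, m, etas) for m in ms]
for m, e in zip(ms, errs):
    print(f"m={m:3d}  estimated derivative error = {e:.3e}")
```

Because the maximum is taken over a finite sample, the estimate is a lower bound for the true supremum over the ensemble; for this stand-in kernel it is also bounded above by $2\,\|\theta - P_m\theta\|_{L^2_p}\,\max_\eta \|\eta\|_{L^2_p}$.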

Approximation of functional differential equations
In this section we provide an example of convergence analysis that shows in what sense the solution of the multivariate PDE (94) converges to the solution of the FDE (93) as we send $m$ to infinity. To this end, we consider the initial value problem (118), where $F_0$ is a given initial condition, and $\delta F([\theta], t)/\delta\theta(x)$ is the first-order functional derivative (15). Equation (118) is a linear FDE which can be written in the form (93) provided we define an appropriate linear operator in the vector space of functionals.

15 The functional (114) admits continuous Fréchet derivatives to any desired order. In particular, we have [...]. This implies that equation (69) can be applied to any of the Fréchet derivatives, by simply redefining the operator norm appearing on the right-hand side of the inequality.

Figure 3: [...] (113), and for test functions $\theta$ with spectra (110) (power-law decay) and (111) (exponential decay). Note that, as expected, the approximated Fréchet derivative $F'([P_m\theta])$ converges to $F'([\theta])$ at the same rate as $P_m\theta$ converges to $\theta$. The reason is that the functional $F([\theta])$ is continuously Fréchet differentiable to any desired order. Hence, the analytic result (69) holds. This is also the reason why the convergence plots are nearly identical to those in Figure 2 (compare (69) with (59)).
On the other hand, the initial condition is not translation-invariant. The solution to the initial value problem (118), with $F_0$ given in (122), is (123), which is periodic in $t$ with period $2\pi$. It is easy to verify by direct calculation that (123) is indeed a solution to (118). To this end, let us define $\partial_x = \partial/\partial x$. We begin by noting that (124) holds. The first-order functional derivative of (123) is obtained by analyzing its Fréchet differential (125). Here we utilized the fact that the adjoint of the semigroup $e^{-t\partial_x}$ relative to the standard $L^2_p([0, 2\pi])$ inner product is $e^{t\partial_x}$. Hence, the first-order functional derivative of (123) is
$$\frac{\delta F([\theta], t)}{\delta\theta(x)} = e^{t\partial_x}\left[\sin(x)\sin\left(2e^{-t\partial_x}\theta\right)\right].$$
Using again the fact that $\partial_x$ is skew-symmetric relative to the $L^2_p([0, 2\pi])$ inner product we obtain (127). On the other hand, a temporal differentiation of (123) yields (128). By equating (127) and (128) we obtain a relation which is clearly an identity, given (124). This proof can obviously be generalized to arbitrary Fréchet differentiable initial conditions $F_0$.

Finite-dimensional approximation of the functional differential equation and convergence analysis
We have seen in section 9 that the finite-dimensional approximation of the FDE (118) in the range of the Fourier projection (113) yields the multivariate PDE (130). The matrix with entries $C_{ij}$ is skew-symmetric since the basis functions $\varphi_j$ are periodic (this follows from integration by parts). Similarly, evaluating (122) on the range of $P_m$ yields the cylindrical functional (131). The solution to the initial value problem (130)-(131) is easily obtained as
$$f(a, t) = f_0\left(e^{tC}a\right), \qquad a = [a_0, \ldots, a_m]^T.$$
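The structure of this solution can be illustrated with a short numerical sketch. It rests on several assumptions that are not taken from the equations above: a real orthonormal Fourier basis, the sign convention $C_{ij} = (\varphi_i, -\partial_x\varphi_j)$ (chosen so that $e^{tC}a$ contains the coefficients of $\theta(x - t) = e^{-t\partial_x}\theta$), and the hypothetical initial functional $F_0([\theta]) = \int_0^{2\pi}\sin(x)\sin^2(\theta(x))\,dx$. All function names are illustrative.

```python
import numpy as np

def basis(m, x):
    """Orthonormal real Fourier basis on [0, 2*pi]; rows: constant, then cos(kx), sin(kx), k = 1..m."""
    rows = [np.full_like(x, 1.0 / np.sqrt(2.0 * np.pi))]
    for k in range(1, m + 1):
        rows += [np.cos(k * x) / np.sqrt(np.pi), np.sin(k * x) / np.sqrt(np.pi)]
    return np.array(rows)

def generator(m):
    """Assumed convention C[i, j] = (phi_i, -d/dx phi_j); skew-symmetric by integration by parts."""
    C = np.zeros((2 * m + 1, 2 * m + 1))
    for k in range(1, m + 1):
        C[2 * k, 2 * k - 1] = k       # -d/dx cos(kx) =  k sin(kx)
        C[2 * k - 1, 2 * k] = -k      # -d/dx sin(kx) = -k cos(kx)
    return C

def expm_normal(A):
    """Matrix exponential via eigendecomposition (adequate here: A is skew-symmetric, hence normal)."""
    w, V = np.linalg.eig(A)
    return np.real((V * np.exp(w)) @ np.linalg.inv(V))

m, n = 6, 512
x = 2.0 * np.pi * np.arange(n) / n
dx = 2.0 * np.pi / n
Phi = basis(m, x)
C = generator(m)

def f0(a):
    """Cylindrical functional f0(a) = F0([sum_j a_j phi_j]) for the hypothetical F0."""
    theta = a @ Phi
    return np.sum(np.sin(x) * np.sin(theta) ** 2) * dx

rng = np.random.default_rng(2)
a = rng.normal(size=2 * m + 1)
t = 0.7
E = expm_normal(t * C)

f_at_t = f0(E @ a)                                   # f(a, t) = f0(exp(tC) a)
theta_shifted = a @ basis(m, x - t)                  # theta(x - t) evaluated directly
F_direct = np.sum(np.sin(x) * np.sin(theta_shifted) ** 2) * dx
print(f_at_t, F_direct)
```

Since $C$ is skew-symmetric, $e^{tC}$ is orthogonal, so the solution is $f_0$ evaluated at a rotated point and its supremum norm cannot exceed that of $f_0$; this is the mechanism behind the $L^\infty$ stability bound. With this basis $e^{2\pi C}$ is the identity, consistent with periodicity in $t$ with period $2\pi$.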
We know that $F_0([P_m\theta])$ converges uniformly to $F_0([\theta])$ as $m$ goes to infinity at the same rate as $\|\theta - P_m\theta\|_{L^2_p}$ goes to zero. We also know that the residual of the finite-dimensional PDE approximation (130) goes to zero as we send $m$ to infinity (see Example 1 in section 9), and that (130)-(131) is stable in the $L^\infty$ norm. This is sufficient to claim that the solution of (130)-(131), i.e., the field (132), converges uniformly in $\theta$ to the FDE solution (123) as we increase $m$. To show this, we first perform a simple numerical study where we compute the error at time $t = \pi$ in the case where $\theta$ has power-law or exponentially decaying Fourier coefficients. As shown in Figure 4, the functional approximation $f(a_0, \ldots, a_m, t)$ indeed converges to $F([\theta], t)$ at the same rate at which $P_m\theta$ converges to $\theta$. This is in agreement with our results on approximation of Fréchet differentiable functionals. It is worth noting that the convergence plots in Figures 2-4 are essentially rescaled versions of the same plot. The reason is that the FDE solution has continuous Fréchet derivatives up to any desired order. Hence, by the mean value theorem, the convergence slopes are determined only by $\|\theta - P_m\theta\|_{L^2_p}$.

Next, we prove that (132) indeed converges to (123) uniformly in $\theta$ as $m$ goes to infinity. To this end, [...] It is well known that the Galerkin scheme (137) is stable in the $L^2_p([0, 2\pi])$ norm (see, e.g., [11, §6.1.1]), and that the solution $\theta_m(x, t)$ converges to $\theta(x, t)$ at a rate that depends only on the regularity of $\theta(x, 0)$. This implies a convergence estimate in which $s$ measures the regularity of $\theta$. In other words, if we choose $\theta$ in the function space (106) (Sobolev sphere), and we approximate the FDE (118) in terms of the high-dimensional PDE (130), then the solution to such PDE converges uniformly to the solution of the FDE as $m$ goes to infinity.

We emphasize that this result follows from the consistency (98) and stability (103)-(105) of the cylindrical approximation to the FDE. If $\theta$ is infinitely differentiable in $[0, 2\pi]$, then $f(a_0, \ldots, a_m, t)$ converges to $F([\theta], t)$ exponentially fast in $m$, in agreement with the numerical results shown in Figure 4.

Conclusions
We established rigorous convergence results for cylindrical approximations of nonlinear functionals, functional derivatives, and functional differential equations (FDEs) defined on compact subsets of real separable Hilbert spaces. Such approximations are constructed by restricting the domain of the functionals to the range of a finite-dimensional orthogonal projection. In this setting, we proved that continuous functionals and FDEs can be approximated uniformly by multivariate functions and multidimensional partial differential equations (PDEs), respectively. The convergence rate of such approximations can be exponential, depending on the regularity of the functional (in particular its Fréchet differentiability) and its domain. We also provided sufficient conditions for consistency, stability and convergence of functional approximation schemes to compute the solution of FDEs, thus extending the well-known Lax-Richtmyer theorem from PDEs to FDEs. As we suggested in [69], these results open the possibility of using techniques for high-dimensional model representation, such as deep neural networks [52,53,79] and numerical tensor methods [17,3,55,7,59,37], to represent nonlinear functionals and compute approximate solutions to functional differential equations. We conclude by emphasizing that the results we obtained in this paper can be extended to real- or complex-valued functionals in compact Banach spaces (see, e.g., [33,65]). Also, the compactness assumption can be relaxed at the price of introducing additional conditions on the functionals [6].
Acknowledgements This research has been supported by the U.S. Army Research Office (ARO), grant number W911NF1810309. The paper was completed while Daniele Venturi was in residence at the Institute for Computational and Experimental Research in Mathematics (ICERM) in Providence, RI, during the semester program "Model and dimension reduction in uncertain and dynamic systems" -NSF grant DMS-1439786.

Appendix A. Distance between function spaces and approximability of nonlinear functionals
A key concept when approximating a nonlinear functional $F([\theta])$ by restricting its domain $D(F)$ to a finite-dimensional space of functions $D_m$ is the distance between $D_m$ and $D(F)$. Such distance can be quantified in different ways (see, e.g., [49]). For example, we can consider the deviation $E(D(F), D_m)$ of $D_m$ from $D(F)$, which quantifies the error of the best approximation to the elements of $D(F)$ by elements in a vector subspace $D_m \subseteq H$ of dimension at most $m$. The Kolmogorov $m$-width can be rigorously defined, e.g., for nonlinear functionals in Hilbert spaces ([49], Ch. 4). It should be emphasized that for a given domain of interest $D(F)$, finding the optimal basis spanning $D_m$ and minimizing the deviation $E(D(F), D_m)$ is not an easy task. In some cases, however, asymptotic results are available, e.g., in the case of periodic Sobolev spaces [59]. It is important to emphasize that the approximation error and the computational complexity of approximating a nonlinear functional depend on the domain $D(F)$ and the choice of basis $\{\varphi_1, \varphi_2, \ldots\}$ spanning $D_m$. In particular, an accurate functional approximation may be low-dimensional in one function space (i.e., for small $\epsilon$, $m$ is also small) and high-dimensional in another (i.e., for small $\epsilon$, $m$ must be taken very large); see §3.1.2 in [69] for examples.
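For reference, the deviation and the Kolmogorov $m$-width are usually written as follows. This is a sketch of the standard textbook definitions, assumed here to match the ones intended above; the symbol $d_m$ is introduced only for illustration.

```latex
\begin{align*}
E(D(F), D_m) &= \sup_{\theta \in D(F)} \; \inf_{\theta_m \in D_m} \|\theta - \theta_m\|_H, \\
d_m(D(F), H) &= \inf_{\substack{D_m \subseteq H \\ \dim D_m \le m}} E(D(F), D_m).
\end{align*}
```

The inner infimum is the best-approximation error for a single $\theta$, the supremum takes the worst case over the domain, and the outer infimum in $d_m$ optimizes over all subspaces of dimension at most $m$.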