Optimal linear response for Markov Hilbert-Schmidt integral operators and stochastic dynamical systems

We consider optimal control problems for discrete-time random dynamical systems, finding unique perturbations that provoke maximal responses of statistical properties of the system. We treat systems whose transfer operator has an $L^2$ kernel, and we consider the problems of finding (i) the infinitesimal perturbation maximising the expectation of a given observable and (ii) the infinitesimal perturbation maximising the spectral gap, and hence the exponential mixing rate of the system. Our perturbations are either (a) perturbations of the kernel or (b) perturbations of a deterministic map subjected to additive noise. We develop a general setting in which these optimisation problems have a unique solution and construct explicit formulae for the unique optimal perturbations. We apply our results to a Pomeau-Manneville map and an interval exchange map, both subjected to additive noise, to explicitly compute the perturbations provoking maximal responses.


Introduction
The statistical properties of the long-term behaviour of deterministic or stochastic dynamical systems are strongly related to the properties of invariant or stationary measures and to the spectral properties of the associated transfer operator. When the dynamical system is perturbed it is useful to understand and predict the response of the statistical properties of the system through these objects. When such responses are differentiable, we say that the system exhibits a linear response to the class of perturbations. To first order, this response can be described by a suitable derivative expressing the infinitesimal rate of change in e.g. the natural invariant measure or in the spectrum.
Understanding the response of statistical properties to perturbation has particular importance in applications, including to climate science (see e.g. [25], [27] and the references therein).
In the present paper we go beyond quantifying responses and address natural problems concerning the optimal response, namely which perturbations elicit a maximal response. For example, given an observation function, which perturbation produces the greatest change in the expectation of this observation, and which perturbation produces the greatest change in the rate of convergence to equilibrium. Continuing the climate science application, one may wish to know which small climate action (which perturbation) would produce the greatest reduction in the average temperature (the expected observation value). We note that by considering trajectories of a perturbed map and using ergodicity, one may view the problem of maximising the response in the expectation of an observation as an infinite-horizon optimal control problem, averaging an observation along trajectories.
The linear response of dynamical systems is an area of intense research and we present a brief overview of the literature that is related to the present work. Early results concerning the response of invariant measures to the perturbation of a deterministic system have been obtained by Ruelle [42] in the uniformly hyperbolic case. More recently, these results have been extended to several other situations in which one has some hyperbolicity and sufficient regularity of the system and its perturbations. We refer the reader to the survey [5] for an extended discussion of the literature about linear response (and its failure) for deterministic systems.
The mathematical literature on linear response of invariant measures of stochastic or random dynamical systems is more recent. In the framework of continuous-time random processes and stochastic differential equations, linear response results were proved in [27,32]. Results related to the linear response of the stationary measure for diffusion in random media appear in [33,24,23,13,40]. In the discrete-time case, examples of linear response for small random perturbations of uniformly hyperbolic deterministic systems appeared in [26]. In [4] linear response results are given for random compositions of expanding or non-uniformly expanding maps. In the paper [49] the smoothness of the invariant measure response under suitable perturbations is proved for a class of random diffeomorphisms, but no explicit formula is given for the derivatives; an application to the smoothness of the rotation number of Arnold circle maps with additive noise is presented. Systems generated by the iteration of a deterministic map subjected to i.i.d. additive random perturbations are one class of stochastic systems studied in the present paper (see Section 6). The linear response of such systems is considered systematically in [20] and linear response results are proved for perturbations to the deterministic map or to the additive noise. These results are used by [39] to extend some results of [49] outside the diffeomorphism case and applied to an idealized model of the El Niño-Southern Oscillation, given by a noninvertible circle map with additive noise. Higher derivative results for the response of systems with additive noise are presented in [22]. Response results for random systems in the so-called quenched point of view appeared recently in [43], [44], where the random composition of expanding maps is considered using Hilbert cone techniques, and in [11], where the random composition of hyperbolic maps is considered by a transfer operator based approach.
We remark that the addition of random perturbations is not necessarily sufficient to guarantee a linear response. An i.i.d. composition of the identity map and a rotation on the circle is considered in [19], where it is shown that, even using observables with square-integrable first derivative, one only has Hölder continuity of the response with respect to $C^0$ perturbations of the circle rotation.
One can similarly consider the linear response of the dominant eigenvalues of the transfer operator under perturbation. In the literature there are several results describing the way eigenvalues and eigenvectors of suitable classes of operators change when those operators are perturbed in some way, for example classical results concerning compact operators subjected to analytic perturbations [29], and quasi-compact Markov operators subjected to $C^k$ perturbations [28]. In specific classes of dynamics, differentiability of isolated spectral data is demonstrated in [26] for transfer operators of Anosov maps where the map is subjected to $C^k$ perturbations, and in [32] for transfer operators arising from SDEs subjected to $C^k$ perturbations of the drift. Optimal linear response questions have been considered in the dynamical setting of homogeneous (and inhomogeneous) finite-state Markov chains [1], where explicit formulae are provided for the unique maximising perturbations that (i) maximise the norm of the response, (ii) maximise the expectation of a given observable, and (iii) maximise the spectral gap. The efficient Lagrange multiplier approach developed in [1] for questions (ii) and (iii) will be extended to the infinite-dimensional setting in the present paper. In continuous time, [18] maximised the spectral gap of a numerical discretisation of a periodically forced Fokker-Planck equation (perturbing the velocity field to maximally speed up or slow down the exponential mixing rate). The same problem is considered by [16], but for general aperiodic forcing over a finite time, using the Lagrange multiplier approach of [1]. A non-spectral approach to increasing mixing rates by optimal kernel perturbations in discrete time is [15].
Related optimal control problems have been considered in [21], where the goal was to find a minimal perturbation realising a specific response of the invariant measure of a deterministic system (see also [30] on the problem of finding an infinitesimal perturbation realising a given response). These kinds of questions and other similar ones were also briefly considered in [20] for random dynamical systems consisting of deterministic maps perturbed by additive noise. Similar problems in the case of probabilistic cellular automata were considered in [38].
The present work takes the point of view of [1], extending the theory to the infinite-dimensional setting of stochastic integral operators, proving the existence of unique optimal perturbations, deriving explicit formulae for these optimal perturbations, and illustrating the formulae and their conclusions via two topical examples. We consider the class of stochastic dynamical systems with transfer operators representable by a compact integral operator on $L^2$, which includes deterministic systems perturbed by additive noise. The transfer operator $L$ has the form $Lf(x) = \int k(x,y) f(y)\,dy$, where $k$ is a stochastic kernel; in the case of deterministic systems $T$ with additive noise, $k(x,y) = \rho(x - T(y))$, with $\rho$ a probability density representing the distribution of the noise (see Section 6). We consider perturbations of two types: firstly, perturbations to the kernel $k$, and secondly, perturbations to the map $T$. An outline of the paper is as follows. In Section 2 we consider general compact, integral-preserving operators $L : L^2 \to L^2$ (see (3)) and state general linear response statements for the normalised fixed points and the leading eigenvalues of these operators (Theorem 2.2 and Proposition 2.6). In Section 3, we derive response formulae for the normalised fixed points (Corollary 3.5) and spectral values (Corollary 3.6) of operators of the form (1), under perturbation of the kernel $k$. In Section 4 we consider the problem of finding the perturbation that provokes a maximal response in the average of a given observable (General Problem 1) and the spectral gap (General Problem 2). We show that if the feasible set of perturbations is convex, an optimal solution exists, and that this optimum is unique if the feasible set is strictly convex. In Section 5.1, using Lagrange multipliers, we derive an explicit formula for the unique optimal kernel perturbation that maximises the expectation of an observable (Theorem 5.4).
In Section 5.2 we prove an explicit formula for the perturbation that maximises the change in spectral gap (and therefore the rate of mixing) of the system (Theorem 5.6).
In Section 6 we specialise our integral operators to annealed transfer operators corresponding to deterministic maps T with additive noise. For these systems the kernel k has the form k(x, y) = ρ(x − T (y)) for some nonsingular transformation T , and we consider perturbations of the map T directly. Response formulas for these perturbations are developed in Proposition 6.3 and Proposition 6.6 for the invariant measure and the dominating eigenvalues, respectively. In this framework we again prove existence and uniqueness of the map perturbation maximising the derivative of the expectation of an observation (Proposition 7.3) and then derive an explicit formula for the extremiser (Theorem 7.4). Proposition 7.6 and Theorem 7.7 state results analogous to Proposition 7.3 and Theorem 7.4 for the optimization of the spectral gap and mixing rate.
In Section 8 we apply and illustrate the theoretical findings of this work on the Pomeau-Manneville map and a weakly mixing interval exchange, each perturbed by additive noise. For each map we numerically estimate (i) the optimal stochastic perturbation (perturbing the kernel $k$) and (ii) the optimal deterministic perturbation (perturbing the map $T$) that maximise the derivatives of the expectation of an observable and the mixing rate. One of the interesting lessons is that to maximally increase the mixing rate of the noisy Pomeau-Manneville map, one should perturb the kernel (stochastic perturbation) to move mass away from the indifferent fixed point, or deform the map to transport mass away from the fixed point (deterministic perturbation); see Figures 4 and 7, respectively. Further numerical outcomes are discussed and explained in Section 8.
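As a concrete illustration of the objects involved (this is a sketch, not the paper's actual numerics), the annealed transfer operator of the Pomeau-Manneville map with additive noise can be discretised as a column-stochastic matrix; the grid size, exponent alpha and noise width sigma below are arbitrary illustrative choices.

```python
import numpy as np

# Illustrative discretisation of the annealed transfer operator
# L f(x) = \int rho(x - T(y)) f(y) dy for the Pomeau-Manneville map
# T(y) = y + y^{1+alpha} mod 1 with wrapped Gaussian noise.

def pm_map(y, alpha=0.5):
    return (y + y ** (1.0 + alpha)) % 1.0

n, sigma = 200, 0.1
x = (np.arange(n) + 0.5) / n                 # grid midpoints in [0, 1]
d = x[:, None] - pm_map(x)[None, :]          # x_i - T(y_j)
d = (d + 0.5) % 1.0 - 0.5                    # wrap differences to [-1/2, 1/2)
k = np.exp(-d ** 2 / (2 * sigma ** 2))       # unnormalised noise kernel
k /= k.sum(axis=0, keepdims=True) / n        # now (1/n) * sum_i k[i, j] = 1

M = k / n                                    # column-stochastic matrix
evals, evecs = np.linalg.eig(M)
i = np.argmin(np.abs(evals - 1.0))           # eigenvalue 1 of the operator
f0 = np.real(evecs[:, i])
f0 /= f0.mean()                              # invariant density, mean 1
```

The strictly positive kernel makes the discretised operator mixing, so the invariant density is unique and positive.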

Linear response for compact integral-preserving operators
In this section we introduce general response results for integral-preserving compact operators. We consider both the response of the invariant measure to the perturbations and the response of the dominant eigenvalues.

Existence of linear response for the invariant measure
In the following, we consider integral-preserving compact operators acting on $L^2$, which are not necessarily positive. We will give a general linear response statement for their invariant measures. In Section 3 we show how these results can be applied to Hilbert-Schmidt integral operators, which will later be transfer operators of suitable random dynamical systems. Let $L^2([0,1])$ be the space of square-integrable functions over the unit interval (considered with the Lebesgue measure $m$); for brevity, we will denote it simply as $L^2$. We remark that the analysis in the rest of the paper can be extended to manifolds, but we keep the setting simple so as not to obscure the main ideas. Let us consider the space of zero-average functions $V := \{ f \in L^2 : \int f \, dm = 0 \}$.

Definition 2.1. We say that an operator $L : L^2 \to L^2$ has exponential contraction of the zero-average space $V$ if there are $C \ge 0$ and $\lambda < 0$ such that for all $g \in V$
$$\|L^n g\|_2 \le C e^{\lambda n} \|g\|_2 \qquad (2)$$
for all $n \ge 0$.
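In finite dimensions, Definition 2.1 can be illustrated with a strictly positive column-stochastic matrix, which is a discrete integral-preserving operator and contracts the zero-average subspace geometrically; the matrix below is an arbitrary illustrative example.

```python
import numpy as np

# Finite-dimensional sketch of Definition 2.1: a strictly positive
# column-stochastic matrix contracts the zero-average subspace V
# exponentially fast.
rng = np.random.default_rng(0)
n = 50
L = rng.random((n, n)) + 0.1
L /= L.sum(axis=0, keepdims=True)   # columns sum to 1: integral preserving

g = rng.standard_normal(n)
g -= g.mean()                       # g lies in V (zero average)

norms = []
v = g.copy()
for _ in range(20):
    v = L @ v                       # iterating preserves V and shrinks norms
    norms.append(np.linalg.norm(v))
```

The decay rate of `norms` is governed by the subdominant eigenvalue modulus, anticipating the spectral-gap discussion below.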
For $\bar\delta > 0$ and $\delta \in [0, \bar\delta)$, we consider a family of integral-preserving, compact operators $L_\delta : L^2 \to L^2$; we think of $L_\delta$ as perturbations of $L_0$. We say that $f_\delta \in L^2$ is an invariant function of $L_\delta$ if $L_\delta f_\delta = f_\delta$. We will see that under natural assumptions, the operators $L_\delta$, $\delta \in [0, \bar\delta)$, have a family of normalized invariant functions $f_\delta \in L^2$. Furthermore, for suitable perturbations the invariant functions vary smoothly in $L^2$ and we get an explicit formula for the resulting derivative $\frac{df_\delta}{d\delta}$. We remark that since the operators we consider are not necessarily positive, the invariant functions are not necessarily positive.
Theorem 2.2 (Linear response for integral-preserving compact operators). Let us consider a family of compact operators $L_\delta : L^2 \to L^2$, with $\delta \in [0, \bar\delta)$, preserving the integral: for each $g \in L^2$,
$$\int L_\delta g \, dm = \int g \, dm. \qquad (3)$$
Then:
(I) The operators have invariant functions in $L^2$: for each $\delta$ there is $g_\delta \ne 0$ such that $L_\delta g_\delta = g_\delta$.
(II) Suppose further that $L_0$ satisfies the mixing assumption (A1): for each $g \in V$, $\lim_{n \to \infty} \|L_0^n g\|_2 = 0$. Under this assumption, the unperturbed operator $L_0$ has a unique normalized invariant function $f_0$ such that $\int f_0 \, dm = 1$. Furthermore, $L_0$ has exponential contraction of the zero-average space $V$.
(III) Suppose the family of operators $L_\delta$ also satisfies the following: (A2) ($L_\delta$ are small perturbations, and existence of the derivative operator at $f_0$) there is $K \ge 0$ such that $\|L_\delta - L_0\|_{L^2 \to L^2} \le K\delta$, and the limit $\dot f := \lim_{\delta \to 0} \frac{(L_\delta - L_0) f_0}{\delta}$ exists in $L^2$. Under these assumptions, the following hold: (a) There exists a $\delta_2 > 0$ such that for each $0 \le \delta < \delta_2$, the operators $L_\delta$ have unique invariant functions $f_\delta$ such that $\int f_\delta \, dm = 1$.
(b) The resolvent $(\mathrm{Id} - L_0)^{-1} : V \to V$ is a bounded operator. (c) We have the linear response formula
$$\lim_{\delta \to 0} \frac{f_\delta - f_0}{\delta} = (\mathrm{Id} - L_0)^{-1} \dot f;$$
thus, $(\mathrm{Id} - L_0)^{-1} \dot f$ represents the first-order term in the perturbation of the invariant function for the family of systems $L_\delta$.

Proof.

Claim (I):
We start by proving the existence of the invariant functions g δ for the operators L δ .
Since the operators are compact and integral-preserving, $L_\delta$ has an eigenvalue 1 for each $\delta$. Indeed, let us consider the adjoint operators $L_\delta^* : L^2 \to L^2$ defined by the duality relation $\langle L_\delta f, g \rangle = \langle f, L_\delta^* g \rangle$ for all $f, g \in L^2$. Because of the integral-preserving assumption, we have $\langle f, L_\delta^* 1 \rangle = \langle L_\delta f, 1 \rangle = \int L_\delta f \, dm = \int f \, dm = \langle f, 1 \rangle$ for every $f \in L^2$. This implies $L_\delta^* 1 = 1$ and thus, 1 is in the spectrum of $L_\delta^*$ and of $L_\delta$. Since $L_\delta$ is compact, every nonzero element of its spectrum is an eigenvalue, and we have nontrivial fixed points for the operators $L_\delta$.

Claim (III)(a) for $\delta = 0$: Now we prove the uniqueness of the normalized invariant function of $L_0$. Above we proved that $L_0$ has some invariant function $g_0 \ne 0$. The mixing assumption (A1) implies that $\int g_0 \, dm \ne 0$; to see this, we note that if $\int g_0 \, dm = 0$, then $g_0 \in V$, and, by (A1), $g_0$ cannot be a nontrivial fixed point of $L_0$. We claim that $f_0 = g_0 / \int g_0 \, dm$ is the unique normalized invariant function for $L_0$. To see this, suppose there was a second normalized invariant function $\tilde f_0$; then, $f_0 - \tilde f_0$ would be a nontrivial invariant function in $V$, which is a contradiction.
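The duality argument of Claim (I) has a transparent finite-dimensional analogue: for a column-stochastic matrix (the discrete integral-preserving operator), the constant vector is fixed by the adjoint, so 1 is an eigenvalue. The matrix below is an arbitrary example.

```python
import numpy as np

# Finite-dimensional sketch of the duality argument in Claim (I): if the
# matrix L preserves the integral (its columns sum to 1), the constant
# vector is a fixed point of the adjoint L^T, hence 1 is an eigenvalue of
# L and L has a nontrivial fixed point.
rng = np.random.default_rng(1)
n = 30
L = rng.random((n, n))
L /= L.sum(axis=0, keepdims=True)

ones = np.ones(n)
adj_fixed = L.T @ ones               # the adjoint fixes the constant vector
evals = np.linalg.eigvals(L)         # ... so 1 lies in the spectrum of L
```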

Claim (II):
To show that $L_0$ has exponential contraction on $V$, we first note that, for $f \in L^2$, assumption (A1) implies that the orbit $\{L_0^n f\}_{n \ge 0}$ is bounded in $L^2$; thus, the spectrum of $L_0$ is contained in the closed unit disk by the spectral radius theorem. Now suppose $\lambda$ is in the spectrum of $L_0$ and $|\lambda| = 1$. By the compactness assumption, there is an eigenvector $f_\lambda$ for $\lambda$, and then we have $\|L_0^n f_\lambda\|_2 = \|f_\lambda\|_2$ for all $n$. However, $L_0^n f_\lambda \to f_0 \int f_\lambda \, dm$ in $L^2$, which is not possible unless $\lambda = 1$. Hence, the spectrum of $L_0|_V$ is strictly contained in the unit disk. Thus, by the spectral radius theorem, there is an $n > 0$ such that $\|L_0^n|_V\|_{L^2 \to L^2} \le \frac{1}{2}$, and we have exponential contraction of $L_0$ on $V$.
Claim (III)(a) for $\delta \in [0, \bar\delta)$: From the assumption $\|L_0 - L_\delta\|_{L^2 \to L^2} \le K\delta$, we have for small enough $\delta$ that $\|L_\delta^n|_V\|_{L^2 \to L^2} \le \frac{2}{3}$, and therefore $L_\delta$ is also mixing. We can apply the argument above to the operators $L_\delta$ and obtain, for each small enough $\delta$, a unique normalized invariant function $f_\delta$.

Claim (III)(b): Using the exponential contraction of $L_0$ on $V$, we now show that $(\mathrm{Id} - L_0)^{-1} : V \to V$ is well defined and bounded, with $(\mathrm{Id} - L_0)^{-1} f = \sum_{n=0}^{\infty} L_0^n f$ for $f \in V$. Since $L_0$ is exponentially contracting on $V$, and $\sum_{n=1}^{\infty} C e^{\lambda n} =: M < \infty$, the sum $\sum_{n=0}^{\infty} L_0^n f$ converges in $V$ with respect to the $L^2$ norm. The resolvent $(\mathrm{Id} - L_0)^{-1} : V \to V$ is then a continuous operator and $\|(\mathrm{Id} - L_0)^{-1}\|_{V \to V} \le 1 + M$. We remark that since $\dot f \in V$, the resolvent can be applied to $\dot f$.
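The Neumann series construction of the resolvent in Claim (III)(b) can be checked directly in finite dimensions; the matrix and zero-average vector below are arbitrary examples.

```python
import numpy as np

# Sketch of Claim (III)(b): on the zero-average subspace V, the resolvent
# (Id - L0)^{-1} is given by the Neumann series sum_{m>=0} L0^m f, which
# converges because L0 contracts V.
rng = np.random.default_rng(2)
n = 40
L0 = rng.random((n, n)) + 0.05
L0 /= L0.sum(axis=0, keepdims=True)

f = rng.standard_normal(n)
f -= f.mean()                        # f lies in V

s = np.zeros(n)
term = f.copy()
for _ in range(200):                 # partial sums of the Neumann series
    s += term
    term = L0 @ term

resid = (np.eye(n) - L0) @ s - f     # should vanish: (Id - L0) s = f
```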
Claim (III)(c): Now we are ready to prove the linear response formula. Since $f_0$ and $f_\delta$ are the invariant functions of $L_0$ and $L_\delta$, we have
$$(\mathrm{Id} - L_0)(f_\delta - f_0) = (L_\delta - L_0) f_\delta,$$
and both sides lie in $V$. By applying the resolvent to both sides we obtain
$$\frac{f_\delta - f_0}{\delta} = (\mathrm{Id} - L_0)^{-1} \frac{(L_\delta - L_0) f_\delta}{\delta}.$$
Moreover, from assumption (A2), we have for sufficiently small $\delta$ that
$$\left\| \frac{(L_\delta - L_0)(f_\delta - f_0)}{\delta} \right\|_2 \le K \|f_\delta - f_0\|_2.$$
Since we already proved that $\lim_{\delta \to 0} \|f_\delta - f_0\|_2 = 0$, we are left with
$$\lim_{\delta \to 0} \frac{f_\delta - f_0}{\delta} = (\mathrm{Id} - L_0)^{-1} \lim_{\delta \to 0} \frac{(L_\delta - L_0) f_0}{\delta} = (\mathrm{Id} - L_0)^{-1} \dot f,$$
converging in the $L^2$ norm.
We remark that the strategy of proof of Theorem 2.2 is similar to that of Theorem 3 of [20], although the assumptions made are quite different: here we consider a compact integral-preserving operator on $L^2$, while in [20] several norms are considered to allow low-regularity perturbations, and the operator is required to be positive.
It is worth remarking that the above proof gives a description of the spectral picture of $L_0$. By Theorem 2.2, if $L_0$ satisfies (A1) then the invariant function is unique, up to normalization; this shows that 1 is a simple eigenvalue. Furthermore, $L_0$ preserves the direct sum $L^2 = \mathrm{span}\{f_0\} \oplus V$, and the spectrum of $L_0$ restricted to $V$ is strictly inside the unit disk. Hence, the spectrum of $L_0$ is contained in the unit disk and there is a spectral gap. Remark 2.3. The mixing assumption in (A1) is required only for the unperturbed operator $L_0$. This assumption is satisfied, for example, if $L_0$ is an integral operator and an iterate of this operator has a strictly positive kernel, see Corollary 5.7.1 of [35]. Later, in Remark 6.4, we show this assumption is verified for a wide range of examples of stochastic dynamical systems.

Existence of linear response of the dominant eigenvalues
In this section, we consider the existence of linear response for the second largest eigenvalues (in magnitude) and provide a formula for the linear response. An important object needed to quantify linear response statements is a "derivative" of the transfer operator with respect to the perturbation.

Definition 2.4. We define $\dot L : L^2 \to V$ as the unique linear operator satisfying $\dot L f = \lim_{\delta \to 0} \frac{(L_\delta - L_0) f}{\delta}$ in $L^2$, for each $f \in L^2$.

Let $B(L^2)$ denote the space of bounded linear operators from the Banach space $L^2$ to itself and $r(L)$ denote the spectral radius of an operator $L$. Adapting Theorem III.8 and Corollary III.11 of [28] to our situation, we can now state a linear response result for these eigenvalues.

Proposition 2.6. Let $L_\delta : L^2([0,1], \mathbb{C}) \to L^2([0,1], \mathbb{C})$, $\delta \in I_0 := [0, \bar\delta)$, be a family of integral-preserving (satisfying (3)) compact operators. Assume that the map $\delta \mapsto L_\delta$ is in $C^1(I_0, B(L^2([0,1], \mathbb{C})))$ and $L_0$ is mixing (see (A1) in Theorem 2.2). Then, $\lambda_{1,0} := 1 \in \sigma(L_0)$ and $r(L_0) = 1$. Let $I \subset \sigma(L_0) \setminus \{1\}$ be the eigenvalue(s) of maximal modulus strictly inside the unit disk; assume they are geometrically simple and let $s := |I| + 1$. Then there exists an interval $I_1 := [0, \delta_1)$, $I_1 \subset I_0$, such that for $\delta \in I_1$, $L_\delta$ has $s$ dominating simple eigenvalues. Thus, there exist functions $e_{i,(\cdot)}, \hat e_{i,(\cdot)} \in C^1(I_1, L^2([0,1], \mathbb{C}))$ and $\lambda_{i,(\cdot)} \in C^1(I_1, \mathbb{C})$ such that for $\delta \in I_1$ and $i, j = 2, \ldots, s$, $L_\delta e_{i,\delta} = \lambda_{i,\delta} e_{i,\delta}$ and
$$\frac{d\lambda_{i,\delta}}{d\delta}\Big|_{\delta = 0} = \langle \hat e_{i,0}, \dot L e_{i,0} \rangle, \qquad (15)$$
where $\dot L$ is as in Definition 2.4.
Proof. We verify hypothesis (H1) of Theorem III.8 of [28]. Since $r(L_0) = 1$, we just need to show that $L_0$ has $s$ dominating eigenvalues. Since $L_0$ is a compact operator, the eigenvalues $\lambda_{i,0} \in I$ are isolated. Let $\Pi_i$ be the eigenprojection onto the eigenspace $E_i$ of $\lambda_{i,0}$, let $E := \mathrm{span}\{f_0\} \oplus E_2 \oplus \cdots \oplus E_s$, and let $\tilde E$ be the complementary $L_0$-invariant subspace. We thus have: (1) $L^2([0,1], \mathbb{C}) = E \oplus \tilde E$; (2) $L_0(E) \subset E$ and $L_0(\tilde E) \subset \tilde E$.
(3) $\dim(E) = s$ and $L_0|_E$ has $s$ simple eigenvalues $\{\lambda_{1,0}\} \cup I$. This point follows from the assumption that the eigenvalues in $I$ are geometrically simple and the fact that $\lambda_{1,0}$ is simple (see Theorem 2.2).
We can now apply the argument in Corollary III.11 of [28] for $\lambda_{i,0}$ to obtain (15) (the result and proof of Corollary III.11 of [28] are stated for the top eigenvalue; however, the argument still holds for any eigenvalue $\lambda_{i,0} \in I$ by changing the index value in the proof of the corollary).
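The eigenvalue response formula has a standard finite-dimensional counterpart that can be verified against finite differences; the matrices below are arbitrary examples, and the left/right eigenvectors are paired bilinearly as in classical perturbation theory.

```python
import numpy as np

# Finite-dimensional sketch of the eigenvalue response formula: for a
# simple eigenvalue lam0 of L0 with right eigenvector e and left
# eigenvector ehat, the perturbed family L0 + delta*D has an eigenvalue
# moving at first-order rate <ehat, D e> / <ehat, e>.
rng = np.random.default_rng(6)
n = 20
L0 = rng.random((n, n))
L0 /= L0.sum(axis=0, keepdims=True)
D = rng.standard_normal((n, n))

evals, evecs = np.linalg.eig(L0)
i = np.argsort(-np.abs(evals))[1]                   # a subdominant eigenvalue
lam0, e = evals[i], evecs[:, i]
evalsT, evecsT = np.linalg.eig(L0.T)
ehat = evecsT[:, np.argmin(np.abs(evalsT - lam0))]  # left eigenvector for lam0

lamdot = (ehat @ (D @ e)) / (ehat @ e)              # predicted response

delta = 1e-7                                        # finite-difference check
evals1 = np.linalg.eigvals(L0 + delta * D)
lam1 = evals1[np.argmin(np.abs(evals1 - (lam0 + delta * lamdot)))]
fd = (lam1 - lam0) / delta
```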

Application to Hilbert-Schmidt integral operators
In this section we apply the results of the previous section to Hilbert-Schmidt integral operators and suitable perturbations. The operators we consider are compact operators on $L^2([0,1], \mathbb{R})$ (or $L^2([0,1], \mathbb{C})$); for brevity we will denote $L^2 := L^2([0,1], \mathbb{R})$. To avoid confusion, we point out that in the following we will also consider the space $L^2([0,1]^2)$ of square-integrable real functions on the unit square; this space contains the kernels of the operators we consider.
Let $k \in L^2([0,1]^2)$ and consider the operator $L : L^2 \to L^2$ defined in the following way: for $f \in L^2$,
$$Lf(x) = \int k(x, y) f(y) \, dy; \qquad (5)$$
such an operator is called a Hilbert-Schmidt integral operator. Such operators may represent the annealed transfer operators of systems perturbed by additive noise (see Section 6). We now list some well-known and basic facts about Hilbert-Schmidt integral operators with kernels in $L^2([0,1]^2)$:
• The operator $L : L^2 \to L^2$ is bounded, with $\|L\|_{L^2 \to L^2} \le \|k\|_{L^2([0,1]^2)}$ (6) (see Proposition 4.7 in II.§4 [8]).
• The operator $L : L^2 \to L^2$ is compact.
• If for almost every $y \in [0,1]$ we have $\int k(x, y) \, dx = 1$, then the Hilbert-Schmidt integral operator associated to the kernel $k$ is integral-preserving (satisfies (3)).
Combining the last two points, we have from Theorem 2.2 that such an operator has an invariant function in $L^2$. Furthermore, for $k \in L^\infty([0,1]^2)$ (and $L_0$ satisfying the assumptions of Theorem 2.2), we have an analogous result: the invariant function lies in $L^\infty$, and if $k$ is also nonnegative, it is a probability density. Proof. Since $k$ is an integral-preserving kernel, $L_0$ satisfies (3). Thus, we can apply Theorem 2.2 to conclude that there exists a unique $f \in L^2$, $\int f \, dm = 1$, such that $Lf = f$. Noting that $k \in L^\infty([0,1]^2)$, we have from inequality (7) that $f \in L^\infty$. We now assume $k$ is nonnegative. Let $k_j$ be the kernel of the operator $L^j$. Since $k$ is an integral-preserving kernel, so is each $k_j$, and $\|k_j\|_{L^\infty([0,1]^2)} \le \|k\|_{L^\infty([0,1]^2)}$. Thus, for any probability density $g \in L^1$, we have $\|L^j g\|_\infty \le \|k\|_{L^\infty([0,1]^2)}$; thus, by Corollary 5.2.2 in [35], there exists a probability density $\hat f \in L^1$ such that $L\hat f = \hat f$. Since $f$ is the unique invariant function with integral 1, we have $\hat f = f$; thus, $f$ is a probability density.
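The Hilbert-Schmidt norm bound on the operator can be checked numerically for a discretised kernel; the sampled kernel values below are arbitrary.

```python
import numpy as np

# Numerical sketch of the Hilbert-Schmidt bound: for the discretised operator
# (Lf)(x_i) = (1/n) sum_j k(x_i, y_j) f(y_j), the induced L^2 operator norm
# is at most the discrete L^2([0,1]^2) norm of the kernel.
rng = np.random.default_rng(3)
n = 100
k = rng.random((n, n))                     # sampled kernel values k(x_i, y_j)

opnorm = np.linalg.norm(k, 2) / n          # induced operator norm on L^2
hsnorm = np.linalg.norm(k, 'fro') / n      # discrete ||k||_{L^2([0,1]^2)}
```

In matrix terms this is just the largest singular value being bounded by the Frobenius norm.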

Characterising valid perturbations and the derivative of the transfer operator
In this subsection we consider perturbations of integral-preserving Hilbert-Schmidt integral operators such that assumption (A2) of Theorem 2.2 can be verified and the derivative operator $\dot L$ computed. We begin, however, by first characterizing the set of perturbations for which the integral-preserving property of the operators is preserved. Consider the set $V_{\ker}$ of kernels having zero average in the $x$ direction, defined as
$$V_{\ker} := \left\{ k \in L^2([0,1]^2) : \int k(x, y) \, dx = 0 \text{ for a.e. } y \right\}.$$
Then, for a kernel $k \in L^2([0,1]^2)$ with associated integral operator $A$, the following are equivalent: (1) $A(L^2) \subseteq V$; (2) $k \in V_{\ker}$. Proof. Clearly, the second condition implies the first. For the other direction we prove the contrapositive. If $\int k(x, y) \, dx \ne 0$ on a set of positive measure, then for a small $\epsilon > 0$ there is a set $S$ of positive measure $m(S) > 0$ such that $\int k(x, y) \, dx \ge \epsilon$ or $\int k(x, y) \, dx \le -\epsilon$ for each $y \in S$. Suppose $\int k(x, y) \, dx \ge \epsilon$ on this set; consider $f := 1_S$ and $g := Af$. Then, $g(x) = \int k(x, y) 1_S(y) \, dy$ and we have $\int g(x) \, dx = \int_S \int k(x, y) \, dx \, dy \ge \epsilon \, m(S)$, so $g \notin V$. The other case, $\int k(x, y) \, dx \le -\epsilon$, is analogous.
We now prove that $V_{\ker}$ is closed. Proof. The fact that $V_{\ker}$ is a vector space is trivial. For fixed $f \in L^2([0,1])$, the set of $k \in L^2([0,1]^2)$ such that $\int k(\cdot, y) f(y) \, dy \in V$ is closed. To see this, define the function $K_f$ sending $k$ to $\int k(\cdot, y) f(y) \, dy$; by (6) this map is continuous from $L^2([0,1]^2)$ to $L^2$, and $V$ is closed in $L^2$. Intersecting over a countable dense set of $f$ yields that $V_{\ker}$ is closed. We now introduce the type of perturbations which we will investigate throughout the paper. Let $L_\delta : L^2 \to L^2$ be a family of integral operators, with kernels $k_\delta \in L^2([0,1]^2)$, given by
$$k_\delta = k_0 + \delta \dot k + r_\delta, \qquad (9)$$
where $\dot k, r_\delta \in L^2([0,1]^2)$ and $\|r_\delta\|_{L^2([0,1]^2)} = o(\delta)$. The bounded linear operator $\dot L$ associated to the kernel $\dot k$ then plays the role of the derivative operator of Definition 2.4. If additionally the derivative of the map $\delta \mapsto k_\delta$ with respect to $\delta$ varies continuously in a neighborhood of $\delta = 0$, then $\delta \mapsto L_\delta$ has a continuous derivative in a neighborhood of $\delta = 0$.
Proceeding similarly, one shows that if the map $\delta \mapsto k_\delta$ has a continuous derivative with respect to $\delta$ in a neighborhood of $\delta = 0$, then $\delta \mapsto L_\delta$ has a continuous derivative. Indeed, we are supposing that for each $\delta \in [0, \bar\delta)$ there is $\dot k_\delta$ such that $\|k_{\delta + h} - k_\delta - h \dot k_\delta\|_{L^2([0,1]^2)} = o(h)$ for small enough $h$, and furthermore $\delta \mapsto \dot k_\delta$ is continuous. We have then by (6) that the associated operators $\dot L_\delta$, defined as the integral operators with kernels $\dot k_\delta$, also vary in a continuous way as $\delta$ increases.
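The role of $V_{\ker}$ can be seen concretely in a discretisation: a perturbation kernel with zero average in the $x$ direction (zero column means for the matrix) keeps every perturbed kernel stochastic. All entries below are arbitrary illustrative values.

```python
import numpy as np

# Sketch: a perturbation kernel in the discrete analogue of V_ker (zero
# column means) keeps k_delta = k_0 + delta * kdot integral preserving.
rng = np.random.default_rng(4)
n = 60
k0 = rng.random((n, n)) + 0.5
k0 /= k0.sum(axis=0, keepdims=True) / n    # (1/n) sum_i k0[i, j] = 1

kdot = rng.standard_normal((n, n))
kdot -= kdot.mean(axis=0, keepdims=True)   # project onto (discrete) V_ker

delta = 0.05
kdelta = k0 + delta * kdot                 # still a stochastic kernel
```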

A formula for the linear response of the invariant measure and its continuity
Now we apply Theorem 2.2 to Hilbert-Schmidt integral operators to obtain a linear response formula for L 2 perturbations.
Corollary 3.5 (Linear response formula for kernel operators). Suppose $L_\delta : L^2 \to L^2$ are integral-preserving (satisfying (3)) integral operators with stochastic kernels $k_\delta \in L^2([0,1]^2)$ as in (9). Suppose $L_0$ satisfies assumption (A1) of Theorem 2.2. Then $\dot k \in V_{\ker}$, the system has linear response for this perturbation, and an explicit formula for it is given by
$$\lim_{\delta \to 0} \frac{f_\delta - f_0}{\delta} = (\mathrm{Id} - L_0)^{-1} \dot L f_0,$$
where $\dot L$ is the integral operator with kernel $\dot k$. Proof. Since the kernels $k_\delta$ and $k_0$ are stochastic, $\int (k_\delta - k_0)(x, y) \, dx = 0$ for a.e. $y$. Then, for a.e. $y \in [0,1]$ and $\delta \ne 0$, we have
$$\int \dot k(x, y) \, dx = -\frac{1}{\delta} \int r_\delta(x, y) \, dx.$$
As $\delta \to 0$, the right-hand side approaches 0 and, since $\int \dot k(x, y) \, dx$ is independent of $\delta$, we have $\int \dot k(x, y) \, dx = 0$ for a.e. $y \in [0,1]$, i.e. $\dot k \in V_{\ker}$. Furthermore, by (9) there is a $K \ge 0$ such that
$$\|L_\delta - L_0\|_{L^2 \to L^2} \le K\delta.$$
Hence the family of operators satisfies the first part of assumption (A2). The second part of this assumption is established by the results of Lemma 3.4.
Since the operators $L_\delta$ are compact, integral-preserving, and satisfy assumptions (A1) and (A2), we can conclude by applying Theorem 2.2 to this family of operators, obtaining the claimed response formula. Now we show that the linear response of the invariant measure is continuous with respect to the kernel perturbation. This will be used in Section 4 for the proof of the existence of solutions of our main optimization problems.
Consider the transfer operator $L_0$, having a kernel $k_0 \in L^2([0,1]^2)$, and a set of infinitesimal perturbations $P \subset V_{\ker}$ of $k_0$. We will endow $P$ with the topology induced by its inclusion in $L^2([0,1]^2)$. Suppose $L_\delta$ is a perturbation of $L_0$ satisfying the assumptions of Lemma 3.4. By Corollary 3.5, the linear response depends only on the first-order term of the perturbation, $\dot k \in P$, allowing us to define the response function $R : P \to V$ by
$$R(\dot k) := (\mathrm{Id} - L_0)^{-1} \dot L f_0,$$
where $\dot L$ is the integral operator with kernel $\dot k$. By (6) and the continuity of the resolvent operator, it follows directly that the response function $R$ is continuous.
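The response formula can be sanity-checked in finite dimensions against a finite difference of invariant densities; the stochastic matrix $L_0$ and the matrix $D$ (standing in for $\dot L$) below are arbitrary examples.

```python
import numpy as np

# Finite-dimensional check of the response formula of Corollary 3.5:
# lim_{delta->0} (f_delta - f_0)/delta = (Id - L0)^{-1} Ldot f_0,
# with stochastic matrices standing in for discretised kernel operators.
rng = np.random.default_rng(5)
n = 40
L0 = rng.random((n, n)) + 0.2
L0 /= L0.sum(axis=0, keepdims=True)

D = rng.standard_normal((n, n))
D -= D.mean(axis=0, keepdims=True)       # "Ldot" maps into the zero-average space

def inv_density(L):
    evals, evecs = np.linalg.eig(L)
    v = np.real(evecs[:, np.argmin(np.abs(evals - 1.0))])
    return v / v.mean()                  # normalise the invariant function to mean 1

f0 = inv_density(L0)
# predicted response: the zero-average solution r of (Id - L0) r = D f0
r = np.linalg.lstsq(np.eye(n) - L0, D @ f0, rcond=None)[0]
r = r - r.mean() * f0                    # remove the component along f0

delta = 1e-5                             # finite-difference comparison
fd = (inv_density(L0 + delta * D) - f0) / delta
```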

A formula for the linear response of the dominant eigenvalues and its continuity
We apply Proposition 2.6 to Hilbert-Schmidt integral operators and obtain a linear response formula for the dominant eigenvalues in the case of L 2 perturbations.
Let $\lambda_0 \in \mathbb{C}$ be an eigenvalue of $L_0$ of largest magnitude strictly inside the unit circle, and assume that $\lambda_0$ is geometrically simple. Then, there exists $\dot\lambda \in \mathbb{C}$ such that $\lambda_\delta = \lambda_0 + \delta \dot\lambda + o(\delta)$, with
$$\dot\lambda = \langle \hat e, \dot L e \rangle_{L^2([0,1], \mathbb{C})},$$
where $e \in L^2([0,1], \mathbb{C})$ is the eigenvector of $L_0$ associated to the eigenvalue $\lambda_0$, $\hat e \in L^2([0,1], \mathbb{C})$ is the eigenvector of $L_0^*$ associated to the eigenvalue $\lambda_0$, and $\dot L$ is the operator in Lemma 3.4.
Proof. The operator $L_\delta : L^2([0,1], \mathbb{C}) \to L^2([0,1], \mathbb{C})$ is compact; by assumption, it also satisfies (3). From Lemma 3.4, the map $\delta \mapsto L_\delta$ is $C^1$. Hence, by Proposition 2.6, we have $\dot\lambda = \langle \hat e, \dot L e \rangle_{L^2([0,1], \mathbb{C})}$. From this expression it is clear that, if we consider $\dot\lambda$ as a function of $\dot k$, the map $\dot\lambda : V_{\ker} \to \mathbb{C}$ is continuous.

Optimal perturbations: existence and uniqueness
In the remainder of the paper we study optimisation problems of the form
$$\max_{k \in P} J(k), \qquad (16)$$
where $J : H \to \mathbb{R}$ is a continuous linear function, $H$ is a separable Hilbert space and $P \subset H$. Proposition 4.1 (Existence of the optimal solution). Let $P$ be bounded, convex, and closed in $H$. Then, the problem considered at (16) has at least one solution.
Proof. Since $P$ is bounded and $J$ is continuous, we have that $\sup_{k \in P} J(k) < \infty$. Consider a maximizing sequence $k_n$ such that $\lim_{n \to \infty} J(k_n) = \sup_{k \in P} J(k)$. Since $P$ is bounded in the separable Hilbert space $H$, the sequence $k_n$ has a subsequence $k_{n_j}$ converging in the weak topology. Since $P$ is strongly closed and convex in $H$, it is weakly closed. This implies that $\bar k := \lim_{j \to \infty} k_{n_j} \in P$. Also, since $J$ is continuous and linear, it is continuous in the weak topology. Then we have that $J(\bar k) = \lim_{j \to \infty} J(k_{n_j}) = \sup_{k \in P} J(k)$, and the maximum is attained.
Uniqueness of the optimal solution will be provided by strict convexity of the feasible set.
Definition 4.2. We say that a convex closed set $A \subseteq H$ is strictly convex if for each pair $x, y \in A$ and for all $0 < \gamma < 1$, the points $\gamma x + (1 - \gamma) y \in \mathrm{int}(A)$, where the relative interior is meant. Proposition 4.3 (Uniqueness of the optimal solution). Suppose $P$ is a closed, bounded, and strictly convex subset of $H$, and that $P$ contains the zero vector in its relative interior. If $J$ is not uniformly vanishing on $P$, then the optimal solution to (16) is unique.
Proof. Suppose that there are two distinct maxima $k_1, k_2 \in P$ with $J(k_1) = J(k_2) = \alpha$. Let $0 < \gamma < 1$ and set $z = \gamma k_1 + (1 - \gamma) k_2$. By strict convexity of $P$, $z \in \mathrm{int}(P)$, and by linearity of $J$, $J(z) = \alpha$. Let $B_r(z)$ denote a (relative in $P$) open ball of radius $r$ centred at $z$, with $r > 0$ chosen small enough so that $B_r(z) \subset \mathrm{int}(P)$. Because the zero vector lies in the relative interior of $P$, and $J$ does not uniformly vanish on $P$, there exists a vector $v$ with $J(v) > 0$. Now $z + \frac{r}{2\|v\|} v \in B_r(z) \subset \mathrm{int}(P)$ and $J(z + \frac{r}{2\|v\|} v) > \alpha$, contradicting maximality of $k_1$. In the following subsections we apply the general results of this section to our specific optimisation problems.

Optimising the response of the expectation of an observable
Let $c \in L^2$ be a given observable. We consider the problem of finding an infinitesimal perturbation that maximises the response of the expectation of $c$. The perturbations we consider are perturbations to the kernels of Hilbert-Schmidt integral operators, of the form (9). If we denote the average of $c$ with respect to the perturbed invariant density $f_\delta$ by $m_c(\delta) := \int c f_\delta \, dm$, then
$$\frac{d m_c(\delta)}{d\delta}\Big|_{\delta = 0} = \left\langle c, \lim_{\delta \to 0} \frac{f_\delta - f_0}{\delta} \right\rangle = \langle c, R(\dot k) \rangle,$$
where the last equality follows from Corollary 3.5. The function $J(\dot k) = \langle c, R(\dot k) \rangle$ is clearly continuous as a map from $(V_{\ker}, \|\cdot\|_{L^2([0,1]^2)})$ to $\mathbb{R}$. Suppose that $P$ is a closed, bounded, convex subset of $V_{\ker}$ containing the zero perturbation, and that $J$ is not uniformly vanishing on $P$. We wish to solve the following problem:
$$\max_{\dot k \in P} \langle c, R(\dot k) \rangle. \qquad (17)$$
We may immediately apply Proposition 4.1 to obtain that there exists a solution to (17). If, in addition, $P$ is strictly convex, then by Proposition 4.3 the solution to (17) is unique.
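Because the objective is linear, its maximiser over the unit ball of the feasible subspace has a closed form: the normalised projection of the "gradient" kernel onto the subspace. The sketch below uses an arbitrary array E; in problem (17) the corresponding kernel would be assembled from c, L0 and f0.

```python
import numpy as np

# Sketch of problem (17) in finite dimensions: the linear objective
# J(kdot) = <kdot, E> over the unit ball of the zero-column-sum subspace
# (the discrete analogue of V_ker) is maximised by the normalised
# projection of E onto that subspace.
rng = np.random.default_rng(7)
n = 30
E = rng.standard_normal((n, n))

PE = E - E.mean(axis=0, keepdims=True)   # orthogonal projection onto the subspace
kopt = PE / np.linalg.norm(PE)           # candidate optimal perturbation

def J(k):
    return float(np.sum(k * E))          # Frobenius inner product <k, E>

best_rand = -np.inf                      # compare with random feasible points
for _ in range(100):
    k = rng.standard_normal((n, n))
    k -= k.mean(axis=0, keepdims=True)
    k /= np.linalg.norm(k)
    best_rand = max(best_rand, J(k))
```

By Cauchy-Schwarz, no feasible competitor can beat `kopt`, which is the finite-dimensional shadow of the uniqueness statement.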
To end this subsection we note that, without loss of generality, we may assume that $c \in \mathrm{span}\{f_0\}^\perp$: for a general $c \in L^2$, one may replace $c$ by its orthogonal projection onto $\mathrm{span}\{f_0\}^\perp$ without altering the solution of the optimisation problem.

Optimising the response of the rate of mixing
We now consider the linear response problem of optimising the rate of mixing. Let $\lambda_0 \in \mathbb{C}$ denote an eigenvalue of $L_0$ strictly inside the unit circle with largest magnitude. From now on, whenever discussing the linear response of eigenvalues to kernel perturbations, we assume the conditions of Corollary 3.6. We recall that $e$ and $\hat e$ are the eigenfunctions of $L_0$ and $L_0^*$, respectively, corresponding to the eigenvalue $\lambda_0$.
To find the kernel perturbations that enhance mixing, we follow the approach taken in [1] (see also [18,16] in the continuous-time setting), namely perturbing our original dynamics $L_0$ in such a way that the modulus of the second eigenvalue of the perturbed dynamics decreases. Equivalently, we want to decrease the real part of the logarithm of the perturbed second eigenvalue. The following result provides an explicit formula for this instantaneous rate of change: there is a function $E \in L^2([0,1]^2)$, defined in (18) in terms of $e$, $\hat e$ and $\lambda_0$, such that
$$\frac{d}{d\delta} \operatorname{Re} \log \lambda_\delta \Big|_{\delta = 0} = \langle \dot k, E \rangle_{L^2([0,1]^2)}.$$
Proof. From (15) we obtain an expression for $\dot\lambda$; combining it with the expressions (19)-(21) for the derivative of $\operatorname{Re} \log \lambda_\delta$ yields the claimed formula. The function $J(\dot k) = \langle \dot k, E \rangle$ is clearly continuous as a map from $(V_{\ker}, \|\cdot\|_{L^2([0,1]^2)})$ to $\mathbb{R}$. As in Subsection 4.2, suppose that $P$ is a closed, bounded, strictly convex subset of $V_{\ker}$ containing the zero element, and that $J$ is not uniformly vanishing on $P$. We wish to solve the following problem:
$$\max_{\dot k \in P} \langle \dot k, E \rangle. \qquad (22)$$
We may immediately apply Proposition 4.1 to obtain that there exists a solution to (22). Since $P$ is strictly convex, by Proposition 4.3 the solution to (22) is unique.
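The mixing-rate objective can be illustrated numerically: build the gradient of $\operatorname{Re}\log\lambda_2$ in the kernel entries from the eigenvalue perturbation formula, project it onto admissible (zero-column-sum) perturbations, and check that a small step against the gradient shrinks the subdominant eigenvalue modulus. Matrix size, seed and step size below are arbitrary choices.

```python
import numpy as np

# Sketch of problem (22) in finite dimensions: stepping against the
# projected gradient of Re log(lambda_2) speeds up mixing.
rng = np.random.default_rng(8)
n = 25
L0 = rng.random((n, n)) + 0.05
L0 /= L0.sum(axis=0, keepdims=True)

evals, evecs = np.linalg.eig(L0)
i = np.argsort(-np.abs(evals))[1]             # subdominant eigenvalue
lam0, e = evals[i], evecs[:, i]
evalsT, evecsT = np.linalg.eig(L0.T)
ehat = evecsT[:, np.argmin(np.abs(evalsT - lam0))]

# gradient of Re log(lambda) in the entries: Re( ehat_a e_b / (lam0 <ehat, e>) )
E = np.real(np.outer(ehat, e) / (lam0 * (ehat @ e)))
E -= E.mean(axis=0, keepdims=True)            # keep column sums equal to zero

delta = 1e-3
L1 = L0 - delta * E / np.linalg.norm(E)       # step against the gradient
mod2_before = np.abs(lam0)
mod2_after = np.sort(np.abs(np.linalg.eigvals(L1)))[-2]
```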

Explicit formulae for the optimal perturbations
Thus far we have not been specific about the feasible set P; we take up this issue in this and the succeeding subsections to provide explicit formulae for the optimal responses in both problems (17) and (22). First, we have not required that the perturbed kernel k_δ in (9) be nonnegative for δ > 0; however, this is a natural assumption. To facilitate this, for 0 < l < 1, define

F_l := {(x, y) ∈ [0,1]² : k_0(x, y) ≥ l},

and let S_{k_0,l} denote the subspace of kernels in L²([0,1]²) vanishing a.e. outside F_l. The set of allowable perturbations that we will consider in the sequel is

P_l := B_1 ∩ V_ker ∩ S_{k_0,l},

where B_1 is the closed unit ball in L²([0,1]²). We now begin verifying the conditions on P_l and J required by Proposition 4.3. First, P_l is clearly bounded in L²([0,1]²). Second, we note that as long as F_l has positive Lebesgue measure, the zero kernel is in the relative interior of P_l. Third, the following lemma handles closedness of P_l. Fourth, since V_ker and S_{k_0,l} are closed subspaces, V_ker ∩ S_{k_0,l} is itself a Hilbert space, and hence P_l is strictly convex. Finally, sufficient conditions for the objective function to not uniformly vanish are given in Lemma 5.2.
and the result immediately follows. If ∫_{{k_0 < l}} k(x, y)² dx dy > 0 then we obtain a contradiction; thus ∫_{{k_0 < l}} k(x, y)² dx dy = 0 and therefore k = 0 a.e. on {(x, y) ∈ [0,1]² : k_0(x, y) < l}. Hence S_{k_0,l} is closed. The following lemma provides sufficient conditions for a functional of the general form we wish to optimise to not uniformly vanish. The general objective has the form J(k̇) = ∫∫ k̇(x, y) E(x, y) dy dx; in our first specific objective (optimising the response of expectations) we put E(x, y) = ((Id − L_0^*)^{-1} c)(x) · f_0(y), and in our second specific objective (optimising mixing) we take E as in (18). Let E^+ and E^- denote the positive and negative parts of E.
To show k̇ ∈ P_l we need to check that (i) the support of k̇ is contained in F_l and (ii) ∫_{F_l^y} k̇(x, y) dx = 0 for a.e. y ∈ Ξ(F_l); these points show k̇ ∈ S_{k_0,l} ∩ V_ker, and by trivial scaling we may obtain k̇ ∈ B_1. Item (i) is obvious from the definition of k̇. For item (ii) we compute directly. The final expression in the computation of J(k̇) is positive by the hypotheses of the lemma.
When E(x, y) = h_1(x) h_2(y), as in the case of optimising the derivative of the expectation of an observable c and in the case of optimising the derivative of a real eigenvalue, the conditions of Lemma 5.2 are relatively easy to satisfy: h_2 = f_0 and h_2 = e are not the zero function, and h_1 = (Id − L_0^*)^{-1} c and h_1 = ê are both nontrivial signed functions.

Maximising the expectation of an observable
In this section we provide an explicit formula for the optimal kernel perturbation that increases the expectation of an observation function c by the greatest amount. Since the objective function in (17) is linear in k̇, a maximum will occur on ∂B_1 ∩ V_ker ∩ S_{k_0,l} (i.e. we only need to consider the optimization over the unit sphere and not the unit ball). Thus, we consider the following reformulation of the general problem (17), which we call Problem A. Our first main result, Theorem 5.4, provides the explicit solution (28); for the proof see Appendix A.
Note that the expression for the optimal perturbation k̇ in (28) depends only on k_0 and c. This is in part a consequence of the fact that the linear response formula (12) depends only on the first-order term k̇ (the "direction" of the perturbation) in the expansion of k_δ. Thus, in order to find the unique perturbation that optimises our linear response, we seek the best "direction" for the perturbation. Similar comments hold for our other three optimal linear perturbation results in later sections.
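Each of our optimisation problems has the same structure: maximise a continuous linear functional over the unit ball of a closed subspace of a Hilbert space. In this situation the maximiser is the normalised orthogonal projection, onto the subspace, of the Riesz representative of the functional; this is the mechanism behind the explicit formulae. A minimal finite-dimensional sketch of the principle (the subspace and objective below are random stand-ins, not objects from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 20, 5

# Orthonormal basis Q of a subspace V of R^n (stand-in for V_ker ∩ S_{k0,l}).
Q, _ = np.linalg.qr(rng.standard_normal((n, m)))

# Riesz representative E of the linear objective J(k) = <k, E>.
E = rng.standard_normal(n)

# Maximiser of <k, E> over the unit sphere of V: normalised projection of E.
PE = Q @ (Q.T @ E)                  # orthogonal projection of E onto V
k_opt = PE / np.linalg.norm(PE)     # unique optimal unit-norm perturbation
J_max = k_opt @ E                   # attained maximum, equals ||PE||
```

Uniqueness of `k_opt` corresponds to the strict convexity of the unit ball exploited in Proposition 4.3.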
The compactness condition on L_0 : L¹ → L¹ required for essential boundedness of k̇ can be addressed as follows. A criterion for L_0 to be compact on L¹([0,1]) is the following (see [12]): given ε > 0 there exists β > 0 such that for a.e. y ∈ [0,1] and γ ∈ ℝ with |γ| < β,

∫₀¹ |k_0(x + γ, y) − k_0(x, y)| dx < ε.

A class of kernels that satisfies this criterion is the class of essentially bounded kernels k_0 : [0,1] × [0,1] → ℝ that are uniformly continuous in the first coordinate. Such a class naturally arises in our dynamical systems settings.

Maximally increasing the mixing rate
Let λ_0 ∈ ℂ denote a geometrically simple eigenvalue of L_0 strictly inside the unit circle, and let e and ê denote the corresponding eigenvectors of L_0 and L_0^*, respectively. Our results concerning the optimal rate of movement of λ_0 under system perturbation work for any λ_0 as above, but eigenvalues of largest magnitude inside the unit circle have the additional significance of controlling the exponential rate of mixing. We therefore primarily focus on these eigenvalues, and in this section we consider again the linear response problem for enhancing the rate of mixing, now providing explicit formulae for optimal perturbations and the response.
Since we are again interested in kernel perturbations that will ensure that the perturbed kernel k_δ is nonnegative, we consider the constraint set P_l, as in Section 4.1, where 0 < l < 1. The objective function of (22) is linear and therefore we only need to consider the optimization problem on V_ker ∩ S_{k_0,l} ∩ ∂B_1. Thus, to obtain the perturbation k̇ that will enhance the mixing rate, we solve the corresponding reformulation of (22) over this set (Problem B), whose objective involves the function E defined in (18).
The unique solution is given by (31), where E is given in (18) and α > 0 is selected so that the normalisation condition (32) holds. Proof. See Appendix B.

Linear response for map perturbations
In this section we consider random dynamics governed by the composition of a deterministic map T_δ, δ ∈ [0, δ̄), and additive i.i.d. perturbations, or "additive noise". We will assume that the noise is distributed according to a certain Lipschitz kernel ρ and impose a reflecting boundary condition that ensures that the dynamics remain in the interval [0,1]. More precisely, we consider a random dynamical system whose trajectories are given by (33), where ⊕ is the "boundary-reflecting" sum, defined by a ⊕ b := π(a + b), and π : ℝ → [0,1] is the piecewise linear map π(x) = min_{i∈ℤ} |x − 2i|. We assume throughout that ρ is a Lipschitz probability density supported in [−1, 1].
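The folding map π and the boundary-reflecting sum can be implemented in a few lines; the following sketch (function names are ours) illustrates the reflecting behaviour at the endpoints:

```python
import math

def pi_reflect(z):
    """pi(z) = min_i |z - 2i|: fold the real line onto [0, 1] by reflecting
    at the endpoints 0 and 1 (pi has period 2 and is even)."""
    z = abs(math.fmod(z, 2.0))
    return 2.0 - z if z > 1.0 else z

def reflecting_sum(a, b):
    """Boundary-reflecting sum a (+) b := pi(a + b), which keeps the noisy
    trajectory x_{n+1} = T(x_n) (+) omega_n inside [0, 1]."""
    return pi_reflect(a + b)
```

For example, a point at 0.9 kicked to the right by 0.25 lands at 1.15 and is reflected back to 0.85.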

Expressing the map perturbation as a kernel perturbation
In this subsection we describe precisely the kernel of the transfer operator of the system (33). Associated with the process (33) is an integral-type transfer operator L_δ, which we will derive (following the method of §10.5 in [35]). Noting that |π′(z)| = 1 for all z ∈ ℝ, the Perron-Frobenius operator P_π : L¹(ℝ) → L¹([0,1]) associated to the map π is given by

(P_π g)(x) = Σ_{i∈ℤ} [g(x + 2i) + g(2i − x)],  x ∈ [0,1].

For b ∈ ℝ consider the shift operator τ_b defined by (τ_b g)(y) := g(y + b) for g ∈ Lip(ℝ). For the process (33), suppose that x_n has the distribution f_n : [0,1] → ℝ_+ (i.e. f_n ∈ L¹, f_n ≥ 0 and ∫ f_n dm = 1). We note that T_δ(x_n) and ω_n are independent, and thus the joint density of (x_n, ω_n) ∈ [0,1] × [−1,1] is f_n · ρ. Let h : [0,1] → ℝ be a bounded, measurable function and let E denote expectation with respect to Lebesgue measure; we then compute E(h(x_{n+1})), where the last equality in the computation follows from the duality of the Perron-Frobenius and Koopman operators for π. Since E(h(x_{n+1})) = ∫₀¹ h(x) f_{n+1}(x) dx, and h is arbitrary, the map f_n ↦ f_{n+1} is determined for all z ∈ [0,1]. Thus, for δ ∈ [0, δ̄) the integral operator L_δ : L²([0,1]) → L²([0,1]) associated to the process (33) is given by (35) with kernel (36). Proof. Stochasticity and nonnegativity of k_δ follow from stochasticity and nonnegativity of ρ and the fact that Perron-Frobenius operators preserve these properties. Essential boundedness of k_δ follows from the facts that ρ is Lipschitz (thus essentially bounded), τ is a shift, and P_π is constructed from a finite sum because ρ has compact support.
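Since the preimages of x ∈ [0,1] under π are the points 2i ± x, i ∈ ℤ, the kernel can be evaluated as a finite sum of reflected copies of the shifted noise density. The sketch below (with an illustrative triangular ρ and constant toy maps, both our stand-ins) checks numerically that each column of the resulting kernel integrates to 1, i.e. that the kernel is stochastic even when the shifted density overhangs an endpoint:

```python
import numpy as np

def rho(u, eps=0.1):
    """Triangular (Lipschitz, compactly supported) noise density on
    [-eps, eps]; an illustrative stand-in for the paper's rho."""
    return np.maximum(0.0, 1.0 / eps - np.abs(u) / eps**2)

def kernel(x, y, T, eps=0.1):
    """k(x, y) = (P_pi tau_{-T(y)} rho)(x): sum the shifted density over the
    preimages 2i + x and 2i - x of x under pi (|i| <= 1 suffices here)."""
    x = np.asarray(x, dtype=float)
    total = np.zeros_like(x)
    for i in (-1, 0, 1):
        total += rho(x + 2 * i - T(y), eps) + rho(-x + 2 * i - T(y), eps)
    return total

# Midpoint rule for the column integral: the kernel remains stochastic even
# when the shifted density overhangs x = 0 and is folded back.
xs = (np.arange(4000) + 0.5) / 4000
col_boundary = kernel(xs, 0.0, lambda y: 0.02).mean()  # image near 0
col_interior = kernel(xs, 0.0, lambda y: 0.50).mean()  # image far from 0
```

Both column integrals are 1 up to quadrature error, matching the stochasticity argument in the proof above.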
where k̇ is as in (37).
We begin by showing that the first term on the right-hand side of (38) is o(δ). Since ρ is Lipschitz with constant K, one has (39). Because the support of τ_{−T_δ(y)} ρ − τ_{−(T_0(y)+δ·Ṫ(y))} ρ is contained in 2 intervals, each of length 2, by (39) and Lemma C.1 we therefore see that this term is o(δ). Next we show that the second term on the right-hand side of (38) is o(δ). Using the definition of the derivative and the fact that ρ is differentiable a.e., we see that (40) holds for a.e. x, y. Since |ρ(x − T_0(y) − δ·Ṫ(y)) − ρ(x − T_0(y))|/δ ≤ K|Ṫ(y)|, by dominated convergence the limit (40) also converges in L². Hence, applying Lemma C.1 to the second term on the right-hand side of (38), noting that D(δ) in (40) is square-integrable and supported in at most 3 intervals of length at most 2, we obtain the claim. Regarding the final statement, suppose that δ ↦ T_δ has a continuous derivative with respect to δ in a neighborhood of δ = 0. This implies that Ṫ exists and varies continuously on a small interval [0, δ*], with 0 < δ* ≤ δ̄. Denote the derivative dT_δ/dδ at δ by Ṫ_δ, and similarly for k̇. One has a chain of estimates in which the final inequality follows from Lemma C.1 applied to each term in the previous line, noting that ρ is supported in a single interval of length 2. The first term in the final inequality goes to zero as δ → 0 by continuity of Ṫ, and the second term goes to zero as δ → 0 since ‖r_δ‖₂ → 0.
6.2 A formula for the linear response of the invariant measure and continuity with respect to map perturbations

By considering the kernel form of map perturbations, we can apply Corollary 3.5 to obtain the following.
Let L_δ, δ ∈ [0, δ̄), be the integral operators in (35) with the kernels k_δ as in (36). Suppose that L_0 satisfies (A1) of Theorem 2.2. Then the kernel k̇ in (37) is in V_ker and the response formula holds. Proof. The result is a direct application of Corollary 3.5; we verify its assumptions. From Lemma 6.1, k_δ ∈ L²([0,1]²) is a stochastic kernel and so L_δ is an integral-preserving compact operator. From Proposition 6.2, k_δ has the form (9). Thus, we can apply Corollary 3.5 to obtain the result. By the covering condition there is some n ∈ ℕ such that supp(L_0^n(f_1^+)) = [0,1]. It is then standard to deduce that there is an n_0 ≥ n such that ‖L_0^n(f_1)‖₁ < ε for n ≥ n_0. Since the transfer operator contracts the L¹ norm, ‖L_0^n f‖₁ ≤ 2ε for n ≥ n_0, and since ε was arbitrary, this implies that L_0 satisfies (A1).
Let the linear response R : L² → L² of the invariant density be defined as in (41). Lemma 6.5. The function R : L² → L² is continuous.
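In a finite-dimensional (Ulam-type) setting, the response of the invariant density can be checked against finite differences. The sketch below assumes the standard first-order formula, solving (I − L_0) ḟ = L̇ f_0 on the zero-sum subspace; the matrices are random stand-ins rather than objects from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 6

# L0: strictly positive column-stochastic matrix (transfer operator analogue).
L0 = rng.random((n, n)) + 0.1
L0 /= L0.sum(axis=0)

# Perturbation direction with zero column sums, so L0 + d*Ldot stays
# integral-preserving to first order.
Ldot = rng.standard_normal((n, n))
Ldot -= Ldot.mean(axis=0)

def invariant(L):
    """Perron eigenvector, normalised to sum 1 (invariant density)."""
    w, V = np.linalg.eig(L)
    f = np.real(V[:, np.argmax(np.real(w))])
    return f / f.sum()

f0 = invariant(L0)

# Linear response: solve (I - L0) fdot = Ldot f0 subject to sum(fdot) = 0,
# by appending the normalisation constraint as an extra row.
A = np.vstack([np.eye(n) - L0, np.ones((1, n))])
b = np.concatenate([Ldot @ f0, [0.0]])
fdot = np.linalg.lstsq(A, b, rcond=None)[0]
```

The vector `fdot` agrees with the finite-difference derivative of the invariant density, which is the finite-dimensional content of the continuity and response statements.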

A formula for the linear response of the dominant eigenvalues and continuity with respect to map perturbations
We are also able to express the linear response of the dominant eigenvalues as a function of the perturbing map Ṫ. Let L_δ, δ ∈ [0, δ̄), be integral operators generated by the kernels k_δ as in (36), assume that dρ/dx is Lipschitz and δ ↦ T_δ is C¹. Let λ_δ be an eigenvalue of L_δ with second largest magnitude strictly inside the unit disk. Suppose that L_0 satisfies (A1) of Theorem 2.2 and λ_0 is geometrically simple. Then λ̇_0 is given explicitly in terms of Ṫ, where e is the eigenvector of L_0 associated to the eigenvalue λ_0 and ê is the eigenvector of L_0^* associated to the eigenvalue λ_0.
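Behind results of this type is the classical first-order perturbation formula for a simple eigenvalue; in the matrix setting it reads λ̇ = (êᵀ L̇ e)/(êᵀ e), with ê the left eigenvector and the pairing bilinear (no conjugation). A finite-dimensional sanity check with random stand-in matrices:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 6
L0 = rng.random((n, n)) + 0.1
L0 /= L0.sum(axis=0)                   # column-stochastic stand-in for L0
Ldot = rng.standard_normal((n, n))     # arbitrary perturbation direction

# Second-largest eigenvalue (in magnitude) and its right eigenvector e.
w, V = np.linalg.eig(L0)
idx = np.argsort(-np.abs(w))[1]
lam0, e = w[idx], V[:, idx]

# Left eigenvector ehat: L0^T ehat = lam0 * ehat.
wl, U = np.linalg.eig(L0.T)
ehat = U[:, np.argmin(np.abs(wl - lam0))]

# First-order response of the simple eigenvalue (bilinear pairing).
lam_dot = (ehat @ (Ldot @ e)) / (ehat @ e)
```

The value `lam_dot` matches the finite-difference movement of the corresponding eigenvalue of L0 + d·L̇ for small d.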

Optimal linear response for map perturbations
In this section we derive formulae for the map perturbations that maximise our two types of linear response. We begin by formalising the set of allowable map perturbations then state the formulae.

The feasible set of perturbations
Before we formulate the optimization problem, we note that in this setting we require some restriction on the space of allowable perturbations to T_0 if we are to interpret T_0 + δṪ as a map of the unit interval for some δ strictly greater than 0 (a non-infinitesimal perturbation). With this in mind, let η > 0 and F_η := {x ∈ [0,1] : η ≤ T_0(x) ≤ 1 − η}; it will turn out that we obtain for free that Ṫ ∈ L^∞. Note that in principle η > 0 can be taken as small as one likes, and indeed if one wishes to consider only infinitesimal perturbations Ṫ then one may set F_η = F_0 = [0,1]. Of course if T : S¹ → S¹ then one may use F_η = F_0 = [0,1] even for non-infinitesimal perturbations. Recalling that in Proposition 6.2 we are considering L² perturbations Ṫ of the map T_0, we define S_{T_0,η} := {Ṫ ∈ L² : Ṫ = 0 a.e. on F_η^c}. Lemma 7.1. S_{T_0,η} is a closed subspace of L².
Proof. It is clear that S_{T_0,η} is a subspace. To show it is closed, let {f_n} ⊂ S_{T_0,η} and suppose that f_n → f in L². Suppose further that F_η is not [0,1] up to measure zero; otherwise S_{T_0,η} = L², which is closed. Since each f_n vanishes a.e. on F_η^c, we have ‖f_n − f‖²_{L²} ≥ ∫_{F_η^c} f(x)² dx. If ∫_{F_η^c} f(x)² dx > 0 we obtain a contradiction with f_n → f; thus ∫_{F_η^c} f(x)² dx = 0 and so f = 0 a.e. on F_η^c. Hence S_{T_0,η} is closed.
For the remainder of this section, the set of allowable perturbations that we consider is P := B_1 ∩ S_{T_0,η}, where B_1 is the unit ball in L². Since S_{T_0,η} is a closed subspace of L², it is itself a Hilbert space and so P is strictly convex. The following lemma concerns the existence of a perturbation Ṫ for which our objectives will be nonzero; that is, our objective J is not uniformly vanishing. Denote P(x, y) := (P_π τ_{−T_0(y)} dρ/dx)(x) and let J(Ṫ) = ∫∫ P(x, y) Ṫ(y) E(x, y) dx dy be our objective. In our first specific objective (optimising the response of expectations) we will insert E(x, y) = ((Id − L_0^*)^{-1} c)(x) f_0(y), and in our second specific objective (optimising mixing) we will insert the function E from (18). Lemma 7.2. Assume that there is F′ ⊂ F_η such that m(F′) > 0 and E(·, y) ∉ span{P(·, y)}^⊥ for all y ∈ F′. Then there is a Ṫ ∈ P such that J(Ṫ) > 0.
We expect the hypotheses of Lemma 7.2 to be satisfied "generically".

Explicit formula for the optimal map perturbation that maximally increases the expectation of an observable
In this section we consider the problem of finding the optimal map perturbation that maximizes the expectation of some observable c ∈ L². We first present a result that ensures a unique solution exists and then derive an explicit expression for the optimal perturbation. We begin by noting that R(Ṫ) ∈ V; this follows from the fact that (P_π τ_{−T_0(y)} dρ/dx)(x) f_0(y) ∈ V_ker (since k̇ ∈ V_ker, see Proposition 6.3) and therefore ∫₀¹ (P_π τ_{−T_0(y)} dρ/dx)(x) f_0(y) g(y) dy ∈ V for g ∈ L² (see Lemma 3.2). Hence, we only need to consider c ∈ span{f_0}^⊥ (see the discussion at the end of Section 4.2).
where R is as in (41), has a unique solution Ṫ ∈ L².
Proof. Let H = L², P = P and J(ḣ) = ⟨c, R(ḣ)⟩_{L²([0,1],ℝ)}. Using Lemma 7.1 we note that P is closed, as well as bounded and strictly convex, and that it contains the zero element of H. From Lemma 6.5 it follows that ⟨c, R(ḣ)⟩_{L²([0,1],ℝ)} is continuous as a function of ḣ; note that it is also linear in ḣ. By hypothesis, J is not uniformly vanishing on P. We can therefore apply Propositions 4.1 and 4.3 to conclude that (45) has a unique solution.
Before we present the explicit formula for the optimal solution, we reformulate the optimization problem (45) to simplify the analysis. We first note that since the objective function in (45) is linear in Ṫ, the maximum will occur on S_{T_0,η} ∩ ∂B_1. Combining this with the fact that we only need c ∈ span{f_0}^⊥, we consider the corresponding reformulation of (45) (Problem C). Theorem 7.4. Suppose the transfer operator L_0 associated with the system (T_0, ρ) has a kernel k_0 as in (36), which satisfies (A1) of Theorem 2.2, and there is an F′ ⊂ F_η such that m(F′) > 0, and f_0(y) > 0 and (Id − L_0^*)^{-1} c ∉ span{P(·, y)}^⊥ for all y ∈ F′. Let G : L² → L² be the operator appearing in the explicit solution formula (49). Then the unique solution to Problem C is given by (49). Furthermore, Ṫ ∈ L^∞.
Proof. See Appendix D.

Explicit formula for the optimal map perturbation that maximally increases the mixing rate
In this section we set up the optimisation problem for mixing enhancement and derive a formula for the optimal map perturbation. We remark that related spectral approaches to mixing enhancement for continuous-time flows were developed in [18,16].
Recall that to enhance mixing in Section 5.2, we perturbed k_0 so that the real part of the logarithm of the second eigenvalue decreases. From Lemma 4.4, we have (50), where λ_δ denotes the second largest eigenvalue in magnitude (assumed to be simple) of the integral operator L_δ with kernel k_δ = k_0 + δ·k̇ + o(δ), where δ ↦ k_δ is C¹ at δ = 0. Since we want to perturb T_0 by Ṫ, we reformulate the above inner product. Define E(y) := ∫₀¹ P(x, y) E(x, y) dx, where E(x, y) is as in (18). Proof. We first show that E ∈ L^∞([0,1], ℝ), by writing out the definition of E. From equation (50), in order to maximally increase the spectral gap, by Proposition 7.5 we should choose the map perturbation Ṫ to minimise ⟨Ṫ, E⟩. We first show this optimisation problem has a unique solution.
Proposition 7.6. Let P be the set in (44) and assume that J(Ṫ) = ⟨Ṫ, E⟩ does not uniformly vanish on P. Then the problem (52) of finding Ṫ ∈ P minimising ⟨Ṫ, E⟩ has a unique solution.
Proof. Note that P is closed (by Lemma 7.1), bounded, strictly convex and contains the zero element of L². Now, since J(ḣ) := ⟨ḣ, E⟩_{L²([0,1],ℝ)} is linear and continuous and by hypothesis does not vanish everywhere on P, we may apply Propositions 4.1 and 4.3 to obtain the result.
Since the objective function in (52) is linear, all optima will lie in S_{T_0,η} ∩ ∂B_1. Hence, we equivalently consider the corresponding optimization problem over this set (Problem D). We now state a formula for the unique optimum.
Proof. See Appendix E.

Applications and numerical experiments
In this section we will consider two stochastically perturbed deterministic systems, namely the Pomeau-Manneville map and a weakly mixing interval exchange map. For each of these maps we numerically estimate: 1. The unique kernel perturbation that maximises the change in expectation of a prescribed observation function (see Problem A). An expression for this optimal kernel is given by (28).

2. The unique kernel perturbation that maximally increases the mixing rate (see Problem B). An expression for this optimal kernel is given by (31) and (32).
3. The unique map perturbation that maximises the change in expectation of a prescribed observation function (see Problem C). An expression for this optimal map perturbation is given by (49).
4. The unique map perturbation that maximally increases the mixing rate (see Problem D). An expression for this optimal map perturbation is given by (55) and (56).
The numerics will be explained as we proceed through these four optimisation problems. We refer the reader to [1] for additional details on the implementation and related experiments.

Pomeau-Manneville map
We consider the Pomeau-Manneville map [36]

T_0(x) = x(1 + (2x)^α) for 0 ≤ x < 1/2,  T_0(x) = 2x − 1 for 1/2 ≤ x ≤ 1,

with parameter value α = 1/2. For this parameter choice it is known that the map T_0 admits a unique absolutely continuous invariant probability measure, but only algebraic decay of correlations [36]. With the addition of noise as per (33), the transfer operator defined by (35) and (36) for δ = 0 becomes compact as an operator on L². In our numerical experiments we will use a smooth noise kernel ρ_ε. We now begin to set up our numerical procedure for estimating L_0, which is a standard application of Ulam's method [47]. Let {I_1, …, I_n} denote an equipartition of [0,1] into n subintervals, and set B_n = span{1_{I_1}, …, 1_{I_n}}. We define the (Ulam) projection π_n : L²([0,1]) → B_n by

π_n(g) = Σ_{i=1}^n (1/m(I_i)) (∫_{I_i} g(x) dx) 1_{I_i}.

The finite-rank transfer operator L_n := π_n L_0 : L²([0,1]) → B_n can be computed numerically. We use MATLAB's built-in functions integral.m and integral2.m to perform the ρ_ε-convolution (using an explicit form of ρ_ε) and the Ulam projections, respectively. Figure 1 displays the nonzero entries in the column-stochastic matrix corresponding to L_n for ε = 0.1.
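A compact version of this Ulam procedure can be written in a few lines; the sketch below uses the Pomeau-Manneville map, a triangular stand-in for the smooth noise density, and midpoint quadrature in place of MATLAB's integral2.m (all implementation choices here are illustrative, not the paper's):

```python
import numpy as np

def pm_map(x, alpha=0.5):
    """Pomeau-Manneville map T0 (Liverani-Saussol-Vaienti form, alpha = 1/2)."""
    return np.where(x < 0.5, x * (1.0 + (2.0 * x)**alpha), 2.0 * x - 1.0)

def hat(u, eps):
    """Triangular noise density on [-eps, eps], a stand-in for rho_eps."""
    return np.maximum(0.0, 1.0 / eps - np.abs(u) / eps**2)

def ulam_matrix(T, eps, n, q=8):
    """Column-stochastic Ulam matrix L_n for the noisy map (33), using
    midpoint quadrature with q points per cell and reflecting boundaries."""
    fine = (np.arange(n * q) + 0.5) / (n * q)
    Ty = T(fine)[None, :]
    X = fine[:, None]
    # kernel k0(x, y): fold the shifted noise density into [0,1] by reflection
    K = sum(hat(s * X + 2 * i - Ty, eps) for s in (1, -1) for i in (-1, 0, 1))
    K = K.reshape(n, q, n * q).sum(axis=1) / (n * q)   # integrate over x-cells
    L = K.reshape(n, n, q).mean(axis=2)                # average over y-cells
    return L / L.sum(axis=0)                           # enforce stochasticity

L = ulam_matrix(pm_map, eps=0.1, n=50)
w, V = np.linalg.eig(L)
f = np.real(V[:, np.argmax(np.real(w))])
f = f / f.sum() * 50          # approximate invariant density on 50 cells
```

The leading eigenvalue of L is 1 and the corresponding eigenvector gives a nonnegative approximate invariant density.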

Kernel perturbations
In the framework of Problems A and B we use the (arbitrarily chosen) monotonically increasing observation function c(x) = −cos(x). In order to estimate k̇ as in (28) we use the code from Algorithm 3 of [1]; the inputs are the Ulam matrix L_n and c_n (obtained as π_n(c)). Equivalently, directly using (28) one may substitute f_n (obtained as the leading eigenvector of L_n) for f, L_n for L, c_n as above for c, and compute (Id − L_n^*)^{-1} c_n (obtained as a vector y ∈ ℝⁿ by numerically solving the linear system (Id − L_n^*) y = c_n subject to ⟨f_n, y⟩ = 0). Figure 3 shows the optimal kernel perturbations k̇_n for n = 500. Because c is an increasing function, intuitively one might expect the kernel perturbation to try to shift mass in the invariant density from left to right. Broadly speaking, this is what one sees in the high-noise case in Figure 3 (left): vertical strips typically have red above blue, corresponding to a shift of mass to the right in [0,1]. The main exception to this is around the y-axis value of 1/2, where red is strongly below blue along vertical strips. This is because at the next iteration, these red regions will be mapped near x = 1 and achieve the highest value of c, while the blue regions will be mapped near to x = 0 with the least value of c. In the low-noise case of Figure 3 (right), we see a similar solution with higher spatial frequencies and stronger localised perturbations. To investigate the optimal kernel perturbation to maximally increase the rate of mixing in the stochastic system, we use the expression for k̇ in (31). A natural approximation of (31) requires estimates of the left and right eigenfunctions of L_0 corresponding to the second largest eigenvalue λ_2; these are obtained directly as eigenvectors of L_n. Figure 4 shows the resulting optimal kernel perturbations, computed using the code from Algorithm 4 of [1] with input L_n.
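The constrained linear solve mentioned above, (Id − L_n^*) y = c_n with ⟨f_n, y⟩ = 0, can be carried out by appending the constraint as an extra row of the system; a sketch with a random column-stochastic stand-in for the Ulam matrix:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 40
L = rng.random((n, n)) + 0.1
L /= L.sum(axis=0)                        # stand-in for the Ulam matrix L_n

# Invariant density f_n: Perron eigenvector, normalised to sum 1.
w, V = np.linalg.eig(L)
f = np.real(V[:, np.argmax(np.real(w))])
f /= f.sum()

# Observable c_n, reduced to span{f}^perp so that the system is solvable.
x = (np.arange(n) + 0.5) / n
c = -np.cos(x)
c = c - (c @ f) / (f @ f) * f

# Solve (I - L^T) y = c subject to <f, y> = 0 via an appended constraint row.
A = np.vstack([np.eye(n) - L.T, f[None, :]])
b = np.concatenate([c, [0.0]])
y = np.linalg.lstsq(A, b, rcond=None)[0]
```

The constraint row removes the one-dimensional kernel of (I − Lᵀ) (spanned by the constant vector), so the stacked system has a unique exact solution.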
Because the fixed point at x = 0 is responsible for the slow algebraic decay of correlations for the deterministic dynamics of T 0 , the fixed point will also play a dominant role in the mixing rate of the stochastic system for low to moderate levels of noise. Indeed, Figure 4 shows that the optimal perturbation concentrates its effort in a neighbourhood of the fixed point, and pushes mass away from the fixed point as much as possible. This is particularly extreme in the low noise case of Figure 4 (right) with the perturbation almost exclusively concentrated in a small neighbourhood of x = 0.

Map perturbations
We now turn to the problem of finding the unique map perturbation Ṫ that maximises the change in expectation of the observation c(x) = −cos(x) (see Problem C for a precise formulation) and maximises the speed of mixing (see Problem D). We use the natural Ulam discretisation of the expression (49).⁵ The objects f_n and (Id − L_n^*)^{-1} c_n are computed exactly as before in Section 8.1.1. The action of the operator G in (49) is computed using MATLAB's built-in function integral.m, using an explicit form of dρ_ε/dx for dρ/dx in (49). Figure 5 (left) shows the optimal Ṫ for the two noise amplitudes ε = 1/10 and ε = √6/100. Note that for the noise amplitude ε = 0.1 (blue curve in Figure 5) the map perturbation Ṫ is mostly positive, corresponding to moving probability mass to the right, as expected because we are maximising the change in expectation of an increasing observation function c. The blue curve is most negative in neighbourhoods of the two preimages of x = 1/2, corresponding to moving probability mass to the left. The reason for this is identical to the discussion of the "blue above red" effect in Figure 3, namely moving mass to the left creates a very large increase in the objective function value at the next iterate. This "look ahead" effect is even more pronounced in the low noise case (red curve of Figure 5), where Ṫ is mostly positive, but has deep negative perturbations at multiple preimages of x = 1/2 reaching further into the past. Figure 5 (right) illustrates the Pomeau-Manneville map (black) with perturbed maps T_0 + Ṫ/100. We have chosen a scale factor of 1/100 for visualisation purposes; one should keep in mind we have optimised for an infinitesimal change in the map. Figure 6 shows the kernel derivatives k̇ corresponding to the optimal map derivatives Ṫ for the two noise levels.

⁵ Note that since T_0^{-1}({0,1}) is a finite set, we may take η > 0 as small as we like. In the computations we set η = 0, so that F_η = [0,1] mod m.
Figure 6: Kernel perturbations corresponding to the optimal map perturbations in Figure 5. Left: ε = 1/10; Right: ε = √6/100.

These kernel derivatives have a restricted form because they arise purely from a derivative in the map. One may compare Figure 6 with Figure 3 and note that the kernel derivative in Figure 6 (left) attempts to follow the general structure of the kernel derivative in Figure 3 (left), while obeying its structural restrictions arising from the less flexible map perturbation. Broadly speaking, in Figure 6 (left), red lies above blue (mass is shifted to the right). Exceptions are near y = 1/2, because at the next iteration these red points will land near x = 1, achieving a very high objective value, while the blue region will get mapped to near x = 0, encountering the lowest value of c. Note that the perturbation decreases from a peak to very close to zero near x = 0. This is because in a small neighbourhood of x = 0 there is already some stochastic perturbation away from x = 0 "for free" due to the reflecting boundary conditions. Thus, the map perturbation Ṫ does not need to invest energy in large perturbations very close to x = 0. The map perturbation that maximally increases the rate of mixing is a particularly interesting question. Our computations use the natural Ulam discretisation of (56). The computations follow as in Section 8.1.1, with the action of G computed as above. Figure 7 (left) shows the optimal Ṫ for the two noise amplitudes ε = 1/10 and ε = √6/100. A sharp map perturbation away from x = 0 is seen for both noise levels, with the perturbation sharper for the lower noise case. In both cases, the perturbations far from x = 0 are weak (low magnitude values of Ṫ). This result corresponds well with the results seen for the optimal kernel perturbations in Figure 4, where mass was primarily moved away from x = 0. As in the optimal solution shown in Figure 5 (left), the optimal perturbation in Figure 7 decreases from a sharp peak down to zero near x = 0.
This is again because in a small neighbourhood of x = 0 the system experiences "free" stochastic perturbations away from x = 0 due to the reflecting boundary conditions, and thus the map perturbation Ṫ need not invest energy in large perturbations very close to x = 0. Figure 7 (right) illustrates the Pomeau-Manneville map (black) with perturbed maps T_0 + Ṫ/100, where again the factor 1/100 is just for illustrative purposes. When inspecting the kernel derivatives k̇ (Figure 8) corresponding to the optimal map perturbations Ṫ, we see similar behaviour to that in Figure 7.

Interval exchange map
In our second example, we consider a weak-mixing interval exchange map. We choose this class because there is an existing literature on mixing optimisation for such maps with the addition of noise. Avila and Forni [3] prove that a typical interval exchange map is either weak mixing or an irrational rotation. We use a specific weak-mixing interval exchange map T_0, with interval permutation and lengths given by (51) in [45]. We again form a stochastic system using the same noise kernels as for the Pomeau-Manneville map in Section 8.1. The mixing properties of this map have been studied in [15]. Figure 9 shows the column-stochastic matrix corresponding to L_n for n = 500 and ε = 0.1.

Figure 9: Transition matrix for the system (33) for δ = 0 and T_0 given by the interval exchange map above, using n = 500 subintervals. The additive noise is drawn from the density ρ_ε with ε = 1/10.

Kernel perturbations
In the framework of Problem A, we use the same observation function c(x) = −cos(x) as in the Pomeau-Manneville case study, and estimate the optimal kernel perturbation k̇ that maximally increases the expectation of c in an identical fashion. In broad terms, one again sees that k̇ attempts to shift invariant probability mass to the right in [0,1]. In Figure 10 (left), in each smooth part of the support of k̇, red is "above" blue, meaning mass is pushed to the right. Clear exceptions to the "red above blue" scheme are seen as three sharp horizontal lines. The y-coordinates of these three sharp horizontal lines coincide with the three points of discontinuity in the domain of the interval exchange, at approximately x = 0.43, 0.77, 0.89. Consider the sharp horizontal "blue above red" line at y ≈ 0.43. According to Figure 9, under the action of the kernel k_0, mass in the vicinity of x = 0.6 will be transported near to x = 0.43. The perturbation k̇ shown in Figure 10 will then tend to push this mass to the left of x = 0.43. Thus, on the next iteration there will be a bias for mass to be mapped near to x = 1 rather than near x = 0.25, achieving a much larger objective value at this iterate. A similar reasoning applies to the "blue above red" horizontal lines at y ≈ 0.77 and 0.89; the contrast is a little weaker because the potential gain at the next iterate is also weaker. The low noise case, Figure 10 (right), displays similar behaviour to the higher noise case of Figure 10 (left). With lower noise, the deterministic dynamics plays a greater role and additional preimages are taken into account, leading to a more oscillatory optimal k̇. To investigate the optimal kernel perturbation to maximally increase the rate of mixing in the stochastic system (in the framework of Problem B), we use the expression for k̇ in (31). The method of numerical approximation is identical to that used for the Pomeau-Manneville map.
Figure 11 shows the signed distribution of mass that is responsible for the slowest real exponential rate of decay in the stochastic system. This eigenfunction becomes more oscillatory as the level of noise decreases, and, as must be the case, the magnitude of the corresponding eigenvalue increases from λ ≈ −0.7476 (ε = 1/10) to λ ≈ −0.9574 (ε = √6/100). Because the sign of these eigenvalues is negative, one expects a pair of almost-2-cyclic sets [10], consisting of three subintervals each, given by the positive and negative supports of the eigenfunctions. Figure 12 shows the approximate optimal kernel perturbations. In the high-noise situation of Figure 12 (left), the sharp horizontal changes are present at preimages of the deterministic dynamics, as they were in Figure 10 (left). The importance of the break points to the overall mixing rate is thus clearly borne out in the optimal k̇; a precise interpretation of the optimal k̇ is not very straightforward.

Figure 11: Approximate second eigenfunctions of the transfer operator L_0 of the system (33) with T_0 given by the interval exchange map above. The additive noise in (33) is drawn from the density ρ_ε with ε taking the values 1/10 (blue) and √6/100 (red).

For the low noise case (Figure 12 (right)) it appears that there is an alternating shifting of mass left and right with alternating "red above blue" and "blue above red". This leads to greater mixing at smaller spatial scales than is possible in a single iteration of the deterministic interval exchange. We anticipate that decreasing the noise amplitude further will result in more rapid alternation of "red above blue" and "blue above red". As the diffusion amplitude decreases, the efficient large-scale diffusive mixing is no longer possible and so a transition is made to small-scale mixing, accessed by increasing oscillation in the kernel.

Map perturbations
The computations in this section follow those of Section 8.1.2. Figure 13 (left) shows the optimal map perturbations Ṫ at two different noise levels. Figure 13 (right) illustrates T_0 + Ṫ/100 for the two different levels of noise. The kernel perturbations generated by these optimal map perturbations are displayed in Figure 14. If one compares the kernel perturbations in Figure 14 with the more flexible kernel perturbations in Figure 10, one sees that the two sets of kernel perturbations broadly agree in terms of the relative positions of the positive and negative (red and blue) perturbations. Note that the more restrictive kernel derivative in Figure 14 by construction cannot replicate the sharp horizontal red-blue switches in Figure 10. It turns out that the strongest of these red-blue switches, namely the one at y ≈ 0.43 in Figure 10 (left), is approximated as well as a map perturbation allows (see Figure 14 (left)), while the other two (weaker) horizontal red/blue switches seen in Figure 10 are ignored. We now turn to optimal map perturbations for the mixing rate. The combined effect of the "cutting and shuffling" of interval exchanges with diffusion on mixing rates has been widely studied, e.g. [2,46,15,34,48], including investigations of the impact of changing the diffusion or the interval exchange on mixing. The very general type of formal map optimisation we consider here has not been attempted before, and we hope that our novel techniques will stimulate interesting new research questions and motivate more sophisticated experiments in the field of mixing optimisation.
Under repeated iteration, the original interval exchange T₀ cuts and shuffles the unit interval into an increasing number of smaller pieces, assisting the small-scale mixing of diffusion. The oscillations of the optimal Ṫ in Figure 16 suggest that the optimisation attempts to include some additional mixing by rapid local warping of the phase space. It is plausible that this additional warping effect enhances mixing beyond the rigid shuffling of the interval exchange. An illustration of T₀ + Ṫ/100 is given in Figure 15. We emphasise that the factor 1/100 is only for visualisation purposes.

A Proof of Theorem 5.4

First we need a technical lemma. We note that the statement of the lemma is analogous to the continuity of (Id − L₀)⁻¹, which was treated in the proof of Theorem 2.2.
To prove that (Id − L₀*)⁻¹ : span{f₀}⊥ → span{f₀}⊥ is bounded, we will use the Inverse Mapping Theorem (Theorem III.11, [41]). Since the integral operator L₀* has an L² kernel, by (6) and the triangle inequality it follows that Id − L₀* is bounded. Also, from the Fredholm alternative argument above, Id − L₀* : span{f₀}⊥ → span{f₀}⊥ is surjective. Thus, to apply the Inverse Mapping Theorem, we need only show that Id − L₀* is injective on span{f₀}⊥, and the result follows.
Proof of Theorem 5.4. We will use the method of Lagrange multipliers to derive the expression (28) from the first-order necessary conditions for optimality, and then show that such a k̇ satisfies the second-order sufficient conditions. To this end, we consider the Lagrangian function L(k̇, μ) = f(k̇) + μ g(k̇), with f the (linear) objective and g the function encoding the norm constraint.

Necessary conditions:
We verify the conditions in Theorem 2, §7.7, [37]. We want to find k̇ and μ that satisfy the first-order necessary conditions

D_k̇ L(k̇, μ) = D_k̇ f(k̇) + μ D_k̇ g(k̇) = 0 and g(k̇) = 0,

where D_k̇ L(k̇, μ) ∈ B(L²([0,1]²), ℝ) is the Fréchet derivative with respect to the variable k̇. Since f is linear, we have (D_k̇ f)k̃ = f(k̃). Also, (D_k̇ g)k̃ = 2⟨k̇, k̃⟩_{L²([0,1]²)}. Thus, for the necessary conditions of the Lagrange multiplier method to be satisfied, we need that

f(k̃) + 2μ⟨k̇, k̃⟩_{L²([0,1]²)} = 0 for all k̃ ∈ V_ker ∩ S_{k₀,l}, and g(k̇) = 0. (58)
Noting Lemma A.1 and the fact that c ∈ span{f₀}⊥, we claim that the candidate k̇ given by (28) satisfies the necessary condition (58) and lies in V_ker ∩ S_{k₀,l}. Before we verify this, we show that the Lagrange multiplier μ is nonzero, and that therefore the constraint g(k̇) = 0 can be satisfied.
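The structure of this Lagrange-multiplier argument can be checked in a finite-dimensional analogue. In the sketch below (an illustration only; the subspace V and the vector c are stand-ins for V_ker ∩ S_{k₀,l} and the gradient of the linear objective), the stationarity condition ⟨c, h⟩ + 2μ⟨k, h⟩ = 0 for all h ∈ V, together with ‖k‖ = 1, forces k* = Pc/‖Pc‖, where P is the orthogonal projection onto V; this k* then dominates every other feasible unit vector.

```python
import numpy as np

# Finite-dimensional analogue of the constrained problem:
# maximise f(k) = <c, k> over k in a subspace V with ||k|| = 1.
# Stationarity <c,h> + 2*mu*<k,h> = 0 for h in V gives k = -Pc/(2*mu);
# the constraint ||k|| = 1 fixes |mu| = ||Pc||/2, so k* = Pc/||Pc||.
rng = np.random.default_rng(0)
n = 50
c = rng.standard_normal(n)

# V: mean-zero vectors, a discrete stand-in for the zero-integral
# constraint defining V_ker; P is the orthogonal projection onto V.
u = np.ones(n) / np.sqrt(n)
P = np.eye(n) - np.outer(u, u)

k_opt = P @ c
k_opt /= np.linalg.norm(k_opt)

# k_opt is feasible and beats random feasible competitors
best = c @ k_opt
for _ in range(200):
    h = P @ rng.standard_normal(n)
    h /= np.linalg.norm(h)
    assert c @ h <= best + 1e-12
```

The loop is a brute-force check of optimality; in the paper this role is played by the strict convexity of the feasible set together with Propositions 4.1 and 4.3.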
Uniqueness of the solution: The set P_l = V_ker ∩ S_{k₀,l} ∩ B₁ is a closed (Lemma 5.1), bounded, strictly convex set containing k̇ = 0. The objective J(k̇) = ⟨c, R(k̇)⟩ is continuous (since J is linear and R is continuous; see the comment following (14)) and not uniformly vanishing (Lemma 5.2). Therefore, by Propositions 4.1 and 4.3, k̇ is the unique optimum.
L∞ boundedness of the solution: Suppose that c ∈ W and k₀ ∈ L∞([0,1]²). From L₀f₀ = f₀ and k₀ ∈ L∞([0,1]²), we have by (7) that f₀ ∈ L∞. Let V₁ := {f ∈ L¹ : ∫ f dm = 0}. We would like to show that (Id − L₀)⁻¹ : V₁ → V₁ is bounded. To obtain this, we first need the exponential contraction of L₀ on V₁. Since L₀ is integral-preserving and compact on L¹, from the argument in the proof of Theorem 2.2 we only need to verify the L¹ version of assumption (A1) on V₁. To verify this, we note that for h ∈ V₁ we have ‖L₀h‖₂ ≤ ‖L₀h‖∞ ≤ ‖k₀‖_{L∞([0,1]²)}‖h‖₁, and therefore L₀h ∈ V since L₀ preserves the integral. Thus, for any h ∈ V₁,

lim_{n→∞} ‖L₀ⁿh‖₁ ≤ lim_{n→∞} ‖L₀^{n−1}(L₀h)‖₂ = 0,

since L₀ satisfies (A1) on V. Hence, the L¹ version of (A1) holds and L₀ has exponential contraction on V₁, say ‖L₀ⁿh‖₁ ≤ C e^{λn}‖h‖₁ for some C > 0 and λ < 0. We then have

‖(Id − L₀)⁻¹h‖₁ = ‖Σ_{n≥0} L₀ⁿh‖₁ ≤ C Σ_{n≥0} e^{λn} ‖h‖₁ = (C/(1 − e^λ)) ‖h‖₁,

where the convergence of the geometric series follows from λ < 0; thus, (Id − L₀)⁻¹ : V₁ → V₁ is bounded.

B Proof of Theorem 5.6
Proof. The optimisation problem is very similar to that considered in Theorem 5.4; we therefore refer to the proof of that theorem, with the following modifications. Consider the Lagrangian function L(k̇, μ) = f(k̇) + μ g(k̇) where, in this setting, we have f(k̇) = ⟨k̇, E⟩_{L²([0,1]²,ℝ)} and g(k̇) = ‖k̇‖²_{L²([0,1]²,ℝ)} − 1. Thus, for the necessary conditions of the Lagrange multiplier method to be satisfied, we need that

⟨k̃, E⟩_{L²([0,1]²,ℝ)} + 2μ⟨k̇, k̃⟩_{L²([0,1]²,ℝ)} = 0 for all k̃ ∈ V_ker ∩ S_{k₀,l}, and g(k̇) = 0.
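With f and g as above, the necessary conditions can be solved explicitly. The following short computation is a sketch: it ignores the projection onto V_ker ∩ S_{k₀,l} that the full proof must carry along, and treats the stationarity condition as holding against all test functions k̃:

```latex
\[
\langle \tilde{k}, E\rangle_{L^2} + 2\mu\,\langle \dot{k}, \tilde{k}\rangle_{L^2} = 0
\quad \forall\, \tilde{k}
\;\Longrightarrow\;
\dot{k} = -\frac{E}{2\mu},
\]
\[
g(\dot{k}) = 0 \;\Longleftrightarrow\; \|\dot{k}\|_{L^2} = 1
\;\Longrightarrow\;
|\mu| = \tfrac{1}{2}\,\|E\|_{L^2}
\;\Longrightarrow\;
\dot{k} = \mp \frac{E}{\|E\|_{L^2}}.
\]
```

The sign of μ, and hence of k̇, is fixed by the second-order sufficient conditions: since the objective is linear, the maximiser corresponds to μ > 0.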

C Upper bound for the norm of the reflection operator
Lemma C.1. Let P_π be as in (34) and assume that the support of f ∈ L²(ℝ) is contained in N intervals of lengths a_j, j = 1, …, N. Then

‖P_π f‖_{L²([0,1])} ≤ ( Σ_{j=1}^{N} (⌈a_j⌉ + 1) ) ‖f‖_{L²(ℝ)},

where ⌈x⌉ denotes the smallest integer greater than or equal to x.
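A bound of this type can be sanity-checked numerically. The sketch below is illustrative only: the precise definition of P_π in (34) is not reproduced here, so we assume the standard "reflected folding" of ℝ onto [0,1] via the sawtooth map π(x) = |((x+1) mod 2) − 1|, and we check that folding a function supported on a single interval of length a inflates its L² norm by at most the factor ⌈a⌉ + 1 from the lemma.

```python
import math
import numpy as np

# Reflected folding of a function on R onto [0,1]: grid-based sketch.
# f is sampled on [0, n_blocks) with step 1/m; block j covers [j, j+1)
# and is mapped onto [0,1] by t -> t (j even) or t -> 1 - t (j odd).
def fold(samples, m):
    blocks = samples.reshape(-1, m)
    out = np.zeros(m)
    for j, block in enumerate(blocks):
        out += block if j % 2 == 0 else block[::-1]
    return out

m = 1000                                  # grid points per unit interval
a = 2.3                                   # support length of f
rng = np.random.default_rng(1)
x = (np.arange(3 * m) + 0.5) / m          # grid on [0, 3)
f = np.where(x < a, rng.standard_normal(3 * m), 0.0)   # support in [0, a]

h = 1.0 / m
norm_f = np.sqrt(h * np.sum(f**2))        # discrete L2 norm of f
norm_Pf = np.sqrt(h * np.sum(fold(f, m)**2))
assert norm_Pf <= (math.ceil(a) + 1) * norm_f
```

The bound is crude but captures the mechanism: folding stacks at most ⌈a⌉ + 1 unit-length pieces of f on top of one another, and the triangle inequality does the rest.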
Thus, from (74) and the nondegeneracy of the inner product we have that Ṫ = −F⁻¹E/(2μ). To conclude that Ṫ satisfies the necessary condition (74), we need to check that μ ≠ 0. Since E ∈ L² (as it is essentially bounded, see Proposition 7.5), the necessary condition (75) yields μ = ±½‖E‖₂. Thus, to finish the proof that Ṫ satisfies both necessary conditions (74)-(75), we will show that ‖E‖₂ ≠ 0. From the hypotheses on E, and recalling that P(x, y) = (P_π τ_{−T₀(y)} dρ/dx)(x), we conclude that ‖E‖₂ > 0. Hence μ = ±½‖E‖₂ ≠ 0 and Ṫ = ∓F⁻¹E/‖E‖₂; the sign of μ is determined by checking the sufficient conditions. Clearly Ṫ ∈ L² and has support contained in F_l, thus Ṫ ∈ S_{T₀,l}. For the sufficient conditions, as in the proof of Theorem 7.4, since the objective is linear, we require that μ > 0. Using Lemma 7.2 and Proposition 7.6 we conclude that (55) is the unique solution. The essential boundedness of Ṫ follows from the essential boundedness of E (see Proposition 7.5).