1 Introduction

The statistical properties of the long-term behaviour of deterministic or stochastic dynamical systems are strongly related to the properties of invariant or stationary measures and to the spectral properties of the associated transfer operator. When the dynamical system is perturbed it is useful to understand and predict the response of the statistical properties of the system through these objects. When such responses are differentiable, we say that the system exhibits a linear response to the class of perturbations. To first order, this response can be described by a suitable derivative expressing the infinitesimal rate of change in e.g. the natural invariant measure or in the spectrum. Understanding the response of statistical properties to perturbation has particular importance in applications, including climate science (see e.g. Ghil and Lucarini 2020; Hairer and Majda 2010 and the references therein).

In the present paper we go beyond quantifying responses and address natural problems concerning the optimal response, namely which perturbations elicit a maximal response. For example, given an observation function, which perturbation produces the greatest change in the expectation of this observation, and which perturbation produces the greatest change in the rate of convergence to equilibrium? Continuing the climate science application, one may wish to know which small climate action (which perturbation) would produce the greatest reduction in the average temperature (the expected observation value). We note that by considering trajectories of a perturbed map and using ergodicity, one may view the problem of maximising the response in the expectation of an observation as an infinite-horizon optimal control problem, averaging an observation along trajectories.

The linear response of dynamical systems is an area of intense research and we present a brief overview of the literature that is related to the present work. Early results concerning the response of invariant measures to the perturbation of a deterministic system have been obtained by Ruelle (1997) in the uniformly hyperbolic case. More recently, these results have been extended to several other situations in which one has some hyperbolicity and sufficient regularity of the system and its perturbations. We refer the reader to the survey (Baladi 2014) for an extended discussion of the literature about linear response (and its failure) for deterministic systems.

The mathematical literature on linear response of invariant measures of stochastic or random dynamical systems is more recent. In the framework of continuous-time random processes and stochastic differential equations, linear response results were proved in Hairer and Majda (2010) and Koltai et al. (2019). Results related to the linear response of the stationary measure for diffusion in random media appear in Komorowski and Olla (2005), Gantert et al. (2012), Gantert et al. (2017), Faggionato et al. (2019) and Mathieu and Piatnitski (2018). In the discrete-time case, examples of linear response for small random perturbations of uniformly hyperbolic deterministic systems appeared in Gouëzel and Liverani (2006). In Bahsoun et al. (2020), linear response results are given for random compositions of expanding or non-uniformly expanding maps. In Zmarrou and Homburg (2007) the smoothness of the invariant measure response under suitable perturbations is proved for a class of random diffeomorphisms, but no explicit formula is given for the derivatives; an application to the smoothness of the rotation number of Arnold circle maps with additive noise is presented. Systems generated by the iteration of a deterministic map subjected to i.i.d. additive random perturbations are one class of stochastic systems studied in the present paper (see Sect. 6). The linear response of such systems is considered systematically in Galatolo and Giulietti (2019) and linear response results are proved for perturbations to the deterministic map or to the additive noise. These results are used by Marangio et al. (2019) to extend some results of Zmarrou and Homburg (2007) outside the diffeomorphism case and applied to an idealised model of El Niño-Southern Oscillation, given by a noninvertible circle map with additive noise. Higher derivative results for the response of systems with additive noise are presented in Galatolo and Sedro (2020).
Response results for random systems from the so-called quenched point of view appeared recently in Sedro (2019) and Sedro and Rugh (2020), where the random composition of expanding maps is considered using Hilbert cone techniques, and in Dragicevic and Sedro (2020), where the random composition of hyperbolic maps is considered by a transfer operator based approach.

We remark that the addition of random perturbations is not necessarily sufficient to guarantee a linear response. An i.i.d. composition of the identity map and a rotation on the circle is considered in Galatolo (2018), and it is shown that using observables with square-integrable first derivative, one only has Hölder continuity of the response with respect to \(C^0\) perturbations of the circle rotation.

One can similarly consider the linear response of the dominant eigenvalues of the transfer operator under perturbation. In the literature, there are several results describing the way eigenvalues and eigenvectors of suitable classes of operators change when those operators are perturbed in some way, for example classical results concerning compact operators subjected to analytic perturbations (Kato 1995), and quasi-compact Markov operators subjected to \(C^k\) perturbations (Hennion and Hervé 2001). In specific classes of dynamics, differentiability of isolated spectral data is demonstrated in Gouëzel and Liverani (2006) for transfer operators of Anosov maps where the map is subjected to \(C^k\) perturbations and in Koltai et al. (2019) for transfer operators arising from SDEs subjected to \(C^k\) perturbations of the drift.

Optimal linear response questions have been considered in the dynamical setting of homogeneous (and inhomogeneous) finite-state Markov chains (Antown et al. 2018), where explicit formulae are provided for the unique maximising perturbations that (i) maximise the norm of the response, (ii) maximise the expectation of a given observable, and (iii) maximise the spectral gap. The efficient Lagrange multiplier approach created in Antown et al. (2018) for questions (ii) and (iii) will be developed for the infinite-dimensional setting of stochastic integral operators in the present paper. In continuous time, Froyland and Santitissadeekorn (2017) maximised the spectral gap of a numerical discretisation of a periodically forced Fokker-Planck equation (perturbing the velocity field to maximally speed up or slow down the exponential mixing rate). The same problem is considered by Froyland et al. (2020), but for general aperiodic forcing over a finite time, using the Lagrange multiplier approach of Antown et al. (2018). A non-spectral approach to increasing mixing rates by optimal kernel perturbations in discrete time is Froyland et al. (2016).

Related optimal control problems have been considered in Galatolo and Pollicott (2017) where the goal was to find a minimal perturbation realising a specific response to the invariant measure of a deterministic system (see also Kloeckner 2018 on the problem of finding an infinitesimal perturbation realising a given response). These kinds of questions and other similar ones were also briefly considered in Galatolo and Giulietti (2019) for random dynamical systems consisting of deterministic maps perturbed by additive noise. Similar problems in the case of probabilistic cellular automata were considered in MacKay (2018).

The present work takes the point of view of Antown et al. (2018), but seeks to treat stochastic dynamical systems on smooth domains, instead of Markov chains on domains consisting of a finite number of states. We prove the existence of unique optimal perturbations, derive explicit formulae for these optimal perturbations, and illustrate the formulae and their conclusions via two topical examples. The move from stochastic matrices in Antown et al. (2018) to stochastic integral operators creates considerable additional technical challenges for the existence of the linear responses, as well as for posing and solving the infinite-dimensional optimisation problems that now arise. We consider the class of stochastic dynamical systems with transfer operators representable by an \(L^2\)-compact, integral operator, which includes deterministic systems perturbed by additive noise. The transfer operator L has the form

$$\begin{aligned} Lf(x)=\int k(x,y)f(y)\ \mathrm{d}y, \end{aligned}$$
(1)

where k is a stochastic kernel; in the case of deterministic systems T with additive noise, \(k(x,y)=\rho (x-T(y))\), with \(\rho \) a probability density representing the distribution of the noise intensity (see Sect. 6). We consider perturbations of two types: firstly, perturbations to the kernel k, and secondly, perturbations to the map T.
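To fix ideas, the following minimal numerical sketch (our own illustration: the doubling map, the Gaussian noise level and the grid resolution are assumptions, not choices made in the paper) discretises an operator of the form (1) with \(k(x,y)=\rho (x-T(y))\), checks integral preservation, and computes an invariant density.

```python
import numpy as np

# Discretise Lf(x) = \int rho(x - T(y)) f(y) dy on midpoints of [0,1].
# T, sigma and n below are illustrative assumptions, not from the paper.
n = 200
h = 1.0 / n
x = (np.arange(n) + 0.5) * h            # midpoint grid

def T(y):
    return (2.0 * y) % 1.0              # doubling map (example choice)

sigma = 0.1
def rho(u):
    # Wrapped Gaussian density on the circle (truncated sum of shifts).
    return sum(np.exp(-((u + k) ** 2) / (2 * sigma ** 2))
               for k in range(-3, 4)) / np.sqrt(2 * np.pi * sigma ** 2)

# Matrix approximation of the stochastic kernel k(x,y) = rho(x - T(y)).
K = h * rho(x[:, None] - T(x)[None, :])

# Integral preservation: each column integrates (approximately) to 1.
col_sums = K.sum(axis=0)

# Invariant density: eigenvector for the eigenvalue of largest modulus.
w, V = np.linalg.eig(K)
i0 = np.argmax(np.abs(w))
f0 = np.real(V[:, i0])
f0 = f0 / (f0.sum() * h)                # normalise to integral 1
```

On the midpoint grid each column of K approximates \(\int k(x,y)\,\mathrm{d}x=1\), so K is (numerically) column stochastic and its leading eigenvalue is close to 1.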

An outline of the paper is as follows. In Sect. 2 we consider general compact, integral-preserving operators \(L:L^2 \rightarrow L^2\) (see (3)) and state general linear response statements for the normalised fixed points and the leading eigenvalues of these operators (Theorem 2.2 and Proposition 2.6). In Sect. 3, we derive response formulae for the normalised fixed points (Corollary 3.5) and spectral values (Corollary 3.6) of operators of the form (1), under perturbation of the kernel k. In Sect. 4 we consider the problem of finding the perturbation that provokes a maximal response in the average of a given observable (General Problem 1) and the spectral gap (General Problem 2). We show that if the feasible set of perturbations is convex, an optimal solution exists, and that this optimum is unique if the feasible set is strictly convex. In Sect. 5.1, using Lagrange multipliers we derive an explicit formula for the unique optimal kernel perturbation that maximises the expectation of an observable (Theorem 5.4). In Sect. 5.2 we prove an explicit formula for the perturbation that maximises the change in spectral gap (and therefore the rate of mixing) of the system (Theorem 5.6).

In Sect. 6, we specialise our integral operators to annealed transfer operators corresponding to deterministic maps T with additive noise. For these systems, the kernel k has the form \(k(x,y)=\rho (x-T(y))\) for some nonsingular transformation T, and we consider perturbations of the map T directly. Response formulae for these perturbations are developed in Proposition 6.3 and Proposition 6.6 for the invariant measure and the dominating eigenvalues, respectively. In this framework we again prove existence and uniqueness of the map perturbation maximising the derivative of the expectation of an observation (Proposition 7.3) and then derive an explicit formula for the extremiser (Theorem 7.4). Proposition 7.6 and Theorem 7.7 state results analogous to Proposition 7.3 and Theorem 7.4 for the optimisation of the spectral gap and mixing rate.

In Sect. 8 we apply and illustrate the theoretical findings of this work on the Pomeau–Manneville map and a weakly mixing interval exchange, each perturbed by additive noise. For each map we numerically estimate (i) the optimal stochastic perturbation (perturbing the kernel k) and (ii) the optimal deterministic perturbation (perturbing the map T) that maximise the derivatives of the expectation of an observable and the mixing rate. One of the interesting lessons is that to maximally increase the mixing rate of the noisy Pomeau–Manneville map, one should perturb the kernel (stochastic perturbation) to move mass away from the indifferent fixed point or deform the map to transport mass away from the fixed point (deterministic perturbation); see Figs. 4 and 7, respectively. Further numerical outcomes are discussed and explained in Sect. 8.

2 Linear Response for Compact Integral-Preserving Operators

In this section, we introduce general response results for integral-preserving compact operators. We consider both the response of the invariant function to the perturbations and the response of the dominant eigenvalues.

2.1 Existence of Linear Response for the Invariant Function

In the following, we consider integral-preserving compact operators acting on \(L^{2}\), which are not necessarily positive. We will give a general linear response statement for their invariant functions. In Sect. 3 we show how these results can be applied to Hilbert–Schmidt integral operators, which will later be transfer operators of suitable random dynamical systems.

Let \(L^{2}([0,1])\) be the space of square-integrable functions over the unit interval (considered with the Lebesgue measure m); for brevity, we will denote it as simply \(L^{2}\). We remark that the analysis in the rest of the paper can be extended to manifolds, but we keep the setting simple so as not to obscure the main ideas.

Let us consider the space of zero-average functions

$$\begin{aligned} V:=\bigg \{f\in L^{2}~s.t.~~\int f\,\mathrm{d}m=0\bigg \}. \end{aligned}$$

Definition 2.1

We say that an operator \(L:L^{2}\rightarrow L^{2}\) has exponential contraction of the zero average space V if there are \(C\ge 0\) and \( \lambda <0\) such that \(\forall g\in V\)

$$\begin{aligned} \Vert L^{n}g\Vert _{2}\le Ce^{\lambda n}\Vert g\Vert _{2} \end{aligned}$$
(2)

for all \(n\ge 0\).
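The contraction of Definition 2.1 can be estimated numerically by iterating a discretised operator on a zero-average vector and recording the decay of the \(L^2\) norms. The sketch below (our own illustration; the noisy doubling-map kernel, grid and noise level are assumptions) does this: the successive norm ratios approximate \(e^{\lambda }\) in (2).

```python
import numpy as np

# Estimate the contraction rate of Definition 2.1 for a discretised
# noisy doubling-map operator (an assumed example, not from the paper).
n, sigma = 200, 0.1
h = 1.0 / n
x = (np.arange(n) + 0.5) * h
rho = lambda u: sum(np.exp(-((u + k) ** 2) / (2 * sigma ** 2))
                    for k in range(-3, 4)) / np.sqrt(2 * np.pi * sigma ** 2)
K = h * rho(x[:, None] - ((2.0 * x) % 1.0)[None, :])

rng = np.random.default_rng(0)
g = rng.standard_normal(n)
g -= g.mean()                           # project onto the zero-average space V

norms = []
v = g.copy()
for _ in range(8):
    v = K @ v                           # v stays (numerically) in V
    norms.append(np.sqrt(h) * np.linalg.norm(v))

# Successive ratios approximate e^{lambda} < 1 in (2).
ratios = [norms[i + 1] / norms[i] for i in range(len(norms) - 1)]
```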

For \({\bar{\delta }}>0\) and \(\delta \in [0,{\bar{\delta }})\), we consider a family of integral-preserving, compact operators \(L_{\delta }:L^{2}\rightarrow L^{2}\); we think of \(L_{\delta }\) as perturbations of \(L_{0}\). We say that \(f_\delta \in L^2\) is an invariant function of \(L_\delta \) if \(L_\delta f_\delta =f_\delta \). We will see that under natural assumptions, the operators \(L_\delta \), \(\delta \in [0,{\bar{\delta }})\), have a family of normalised invariant functions \(f_{\delta } \in L^{2}\). Furthermore, for suitable perturbations the invariant functions vary smoothly in \(L^{2}\) and we get an explicit formula for the resulting derivative \(\frac{df_\delta }{d \delta }\). We remark that since the operators we consider are not necessarily positive, the invariant functions are not necessarily positive.

Theorem 2.2

(Linear response for integral-preserving compact operators) Let us consider a family of compact operators \(L_{\delta }:L^{2}\rightarrow L^{2}\), with \(\delta \in \left[ 0,{\overline{\delta }}\right) \), preserving the integral: for each \(g\in L^2\)

$$\begin{aligned} \int L_\delta g~\mathrm{d}m=\int g~\mathrm{d}m. \end{aligned}$$
(3)

Then,

  1. (I)

    The operators have invariant functions in \(L^2\): for each \(\delta \in [0,{\bar{\delta }})\) there is \(g_\delta \ne 0\) such that \(L_\delta g_\delta =g_\delta \).

  2. (II)

    Suppose \(L_0\) also satisfies the following:

    1. (A1)

      (mixing of the unperturbed operator) For every \(g\in V\),

      $$\begin{aligned} \lim _{n\rightarrow \infty }\Vert L_{0}^{n}g\Vert _{2}=0. \end{aligned}$$

      Under this assumption, the unperturbed operator \(L_0 \) has a unique normalised invariant function \(f_0\) such that \(\int {f}_0\ \mathrm{d}m=1\). Furthermore, \(L_0\) has exponential contraction of the zero average space V.

  3. (III)

    Suppose the operators \(L_\delta \) also satisfy the following:

    1. (A2)

      (\(L_\delta \) are small perturbations and existence of derivative operator at \(f_0\)) Suppose there is a \( K\ge 0\) such that \(\Vert L_{\delta }-L_{0}\Vert _{L^{2}\rightarrow L^{2}}\le K\delta \) for small \(\delta \). Furthermore, suppose there exists \({\hat{f}} \in V\) such that

      $$\begin{aligned} \underset{\delta \rightarrow 0}{\lim } \frac{(L_{\delta }-L_{0})}{ \delta }f_0 ={\hat{f}}. \end{aligned}$$

    Under these assumptions, the following hold:

    1. (a)

      There exists a \(\delta _2>0\) such that for each \(0\le \delta <\delta _2 \), the operators \(L_\delta \) have unique invariant functions \({f}_\delta \) such that \(\int {f}_\delta \ \mathrm{d}m=1.\) Furthermore, \(L_\delta \) has exponential contraction on V for \(0<\delta <\delta _2\).

    2. (b)

      The resolvent operator \(({Id}-L_0)^{-1}:V\rightarrow \) V is continuous.

    3. (c)

      \(\qquad \qquad \quad \displaystyle \lim _{\delta \rightarrow 0}\left\| \frac{f_{\delta }-f_{0}}{\delta }-({Id} -L_0)^{-1}{\hat{f}}\right\| _{2}=0;\) thus, \(({Id}-L_0)^{-1}{\hat{f}}\) represents the first-order term in the perturbation of the invariant function for the family of systems \(L_{\delta }\).

Proof

Claim (I): We start by proving the existence of the invariant functions \(g_\delta \) for the operators \(L_\delta \). Since the operators are compact and integral preserving, \(L_\delta \) has an eigenvalue 1 for each \(\delta \). Indeed, let us consider the adjoint operators \(L^*_\delta :L^2\rightarrow L^2\) defined by the duality relation \(\langle L_\delta f,g\rangle =\langle f,L^*_\delta g \rangle \) for all \(f,g\in L^2.\) Because of the integral-preserving assumption, we have \(\langle f, L^*_\delta {\mathbf {1}}\rangle = \langle L_\delta f,{\mathbf {1}}\rangle = \int L_\delta f\ \mathrm{d}m = \int f\ \mathrm{d}m = \langle f,{\mathbf {1}}\rangle \). This implies \(L^*_\delta {\mathbf {1}}={\mathbf {1}}\) and thus, 1 is in the spectrum of \(L^*_\delta \) and hence also of \(L_\delta \). Since \(L_\delta \) is compact, every nonzero element of its spectrum is an eigenvalue; hence 1 is an eigenvalue of \(L_\delta \) and the operators \(L_\delta \) have nontrivial fixed points.

Claim (III)(a) for \(\delta =0\): Now we prove the uniqueness of the normalised invariant function of \(L_0\). Above we proved that \(L_0 \) has some invariant function \(g_0\ne 0\). The mixing assumption (A1) implies that \(\int g_{0}\ \mathrm{d}m\ne 0\); to see this, we note that if \(\int g_{0}\ \mathrm{d}m=0\), then \(g_0\in V\), and, by (A1), \(g_0\) cannot be a nontrivial fixed point of \(L_0\). We claim that \(f_0=\frac{g_0}{\int g_{0}\ \mathrm{d}m}\) is the unique normalised invariant function for \(L_0\). To see this, suppose there was a second normalised invariant function \(f'_0\); then, \(f'_0-f_0\) would be an invariant function in V, which is a contradiction.

Claim (II): To show that \(L_0\) has exponential contraction on V, we first note that for \(f\in L^2\), we can write \(f=f_0\int f\ \mathrm{d}m+[f-f_0\int f\ \mathrm{d}m]\). Since \([f-f_0\int f\ \mathrm{d}m]\in V\), it follows from (A1) that \(L_0^n f\rightarrow _{L^2} f_0\int f\ \mathrm{d}m\). Thus, the spectrum of \(L_0\) is contained in the unit disk by the spectral radius theorem. Now suppose \(\lambda \) is in the spectrum of \(L_0\) and \(|\lambda |=1\). By the compactness assumption, there is an eigenvector \(f_{\lambda }\) for \(\lambda \) and then we have \(||L_0^n(f_{\lambda })||_2=||f_{\lambda }||_2\). However, \(L_0^n(f_{\lambda })\rightarrow _{L^2} f_0\int f_\lambda \ \mathrm{d}m\), which is not possible unless \(\lambda =1\). Hence, the spectrum of \(L_0|_V\) is strictly contained in the unit disk. Thus, by the spectral radius theorem, there is an \(n>0\) such that \(||L_0^n|_V||_{L^2\rightarrow L^2}\le \frac{1}{2}\) and we have exponential contraction of \(L_0\) on V.

Claim (III)(a) for small \(\delta >0\): From the assumptions we have \(||L_\delta -L_0||_{L^2\rightarrow L^2}\le K\delta \), and by Part (II) there is an n such that \(||L_0^n|_V||_{L^2\rightarrow L^2}\le \frac{1}{2}\). These facts imply that for small enough \(\delta \) one has \(||L_\delta ^n|_V||_{L^2\rightarrow L^2}\le \frac{2}{3}\) and therefore, \(L_\delta \) is exponentially contracting (and also mixing).

We can apply the argument in Part (II) to the operators \(L_\delta \) and obtain, for each small enough \(\delta \), a unique normalised invariant function \(f_\delta \).

Claim (III)(b): Using the exponential contraction of \(L_0\) on V, we now show that \((\text {Id}-L_{0})^{-1}:V\rightarrow V\) is continuous. Indeed, for \(f\in V\), we get \((\text {Id}-L_0)^{-1}f=f+\sum _{n=1}^{\infty }L_{0}^{n}f\). Since \(L_{0}\) is exponentially contracting on V, and \(M:=\sum _{n=1}^{\infty }Ce^{\lambda n}<\infty ,\) the sum \(\sum _{n=1}^{\infty }L_{0}^{n}f\) converges in V with respect to the \(L^{2}\) norm. The resolvent \((\text {Id}-L_{0})^{-1}:V\rightarrow V\) is then a continuous operator and \(||(\text {Id}-L_{0})^{-1}||_{V\rightarrow V}\le 1+M.\) We remark that since \({\hat{f}}\in V,\) the resolvent can be computed at \({\hat{f}}\).

Claim (III)(c): Now we are ready to prove the linear response formula. First we show that \(\Vert f_{\delta }-f_{0}\Vert _{2}\rightarrow 0\) as \(\delta \rightarrow 0\). With n as in the proof of (III)(a), we have

$$\begin{aligned} \Vert f_{\delta }-f_{0}\Vert _{2}\le & {} \Vert L_{\delta }^{n}f_{\delta }-L_{0}^{n}f_{0}\Vert _{2} \\\le & {} \Vert L_{\delta }^{n}f_{0}-L_{0}^{n}f_{0}\Vert _{2}+\Vert L_{\delta }^{n}f_{\delta }-L_{\delta }^{n}f_{0}\Vert _{2} \\\le & {} \Vert L_{\delta }^{n}-L_{0}^{n}\Vert _{L^2\rightarrow L^2}\Vert f_{0}\Vert _{2}+\Vert L_{\delta }^{n}|_{V}\Vert _{L^2\rightarrow L^2}\Vert f_{\delta }-f_{0}\Vert _{2}\\\le & {} \Vert L_{\delta }^{n}-L_{0}^{n}\Vert _{L^2\rightarrow L^2}\Vert f_{0}\Vert _{2}+ \frac{2}{3}\Vert f_{\delta }-f_{0}\Vert _{2}, \end{aligned}$$

from which we obtain \(\Vert f_{\delta }-f_{0}\Vert _{2}\le 3 \Vert L_{\delta }^{n}-L_{0}^{n}\Vert _{L^2\rightarrow L^2}\Vert f_{0}\Vert _{2}\). Since \(\Vert L_\delta -L_0\Vert _{L^2\rightarrow L^2}\le K\delta \) and \(\Vert L_{\delta }^{n}-L_{0}^{n}\Vert _{L^2\rightarrow L^2}\le \sum _{i=1}^n \Vert L_\delta ^{n-i}(L_\delta -L_0)L_0^{i-1}\Vert _{L^2\rightarrow L^2}\), we see that \(\Vert L_\delta ^n-L_0^n \Vert _{L^2\rightarrow L^2}\rightarrow 0 \) as \(\delta \rightarrow 0\) and thus \(\Vert f_{\delta }-f_{0}\Vert _{2} \rightarrow 0\) as \(\delta \rightarrow 0.\)

Since \(f_{0}\) and \(f_{\delta }\) are the invariant functions of \(L_0\) and \(L_\delta \), we have

$$\begin{aligned} (\text {Id}-L_{0})\frac{f_{\delta }-f_{0}}{\delta }=\frac{1}{\delta }(L_{\delta }-L_{0})f_{\delta }. \end{aligned}$$

By applying the resolvent to both sides we obtain

$$\begin{aligned} \begin{aligned} \frac{f_{\delta }-f_{0}}{\delta }&= (\text {Id}-L_0)^{-1}\frac{L_{\delta }-L_{0}}{\delta }f_{\delta } \\&=(\text {Id}-L_0)^{-1}\frac{L_{\delta }-L_{0}}{\delta }f_{0}+(\text {Id}-L_0)^{-1}\frac{L_{\delta }-L_{0}}{\delta }(f_{\delta }-f_{0}). \end{aligned} \end{aligned}$$

Moreover, from assumption (A2), we have for sufficiently small \(\delta \) that

$$\begin{aligned} \left\| (\text {Id}-L_{0})^{-1}\frac{L_{\delta }-L_{0}}{\delta }(f_{\delta }-f_{0})\right\| _{2}\le \Vert (\text {Id}-L_{0})^{-1}\Vert _{V\rightarrow V}K\Vert f_{\delta }-f_{0}\Vert _{2}. \end{aligned}$$

Since we already proved that \(\lim _{\delta \rightarrow 0}\Vert f_{\delta }-f_{0}\Vert _{2}=0\), we are left with

$$\begin{aligned} \lim _{\delta \rightarrow 0}\frac{f_{\delta }-f_{0}}{\delta }=(\text {Id} -L_{0})^{-1}{\hat{f}} \end{aligned}$$

converging in the \(L^{2}\) norm. \(\square \)

We remark that the strategy of proof of Theorem 2.2 is similar to that of Theorem 3 of Galatolo and Giulietti (2019), although the assumptions are quite different: here we consider a compact, integral-preserving operator on \(L^2\), while in Galatolo and Giulietti (2019) several norms are considered to allow low-regularity perturbations and the operator is required to be positive.

It is worth remarking that the above proof gives a description of the spectral picture of \(L_0\). By Theorem 2.2, if \(L_0\) satisfies (A1) then the invariant function is unique, up to normalisation; this shows that 1 is a simple eigenvalue. Furthermore, \(L_0\) preserves the direct sum \(L^2=\) span\(\{f_0\} \oplus V\) and the spectrum of \(L_0\) is strictly inside the unit disk when \(L_0\) is restricted to V. Hence, the spectrum of \(L_0\) is contained in the unit disk and there is a spectral gap.

Remark 2.3

The mixing assumption in (A1) is required only for the unperturbed operator \(L_{0}\). This assumption is satisfied, for example, if \(L_0\) is an integral operator and an iterate of this operator has a strictly positive kernel, see Corollary 5.7.1 of Lasota and Mackey (1985). Later in Remark 6.4 we show this assumption is verified for a wide range of examples of stochastic dynamical systems.

2.2 Existence of Linear Response of the Dominant Eigenvalues

In this section, we consider the existence of linear response for the second largest eigenvalues (in magnitude) and provide a formula for the linear response. An important object needed to quantify linear response statements is a “derivative” of the operator \(L_\delta \) with respect to the perturbation.

Definition 2.4

We define \({\dot{L}}:L^2\rightarrow V\) as the unique linear operator satisfying

$$\begin{aligned} \lim _{\delta \rightarrow 0}\left\| \frac{(L_\delta -L_0)}{\delta }-{\dot{L}}\right\| _{L^2\rightarrow V}=0. \end{aligned}$$

Let \({\mathcal {B}}(L^2)\) denote the space of bounded linear operators from the Banach space \(L^2\) to itself and r(L) denote the spectral radius of an operator L; we begin with the following definition.

Definition 2.5

(Hennion and Hervé 2001, Definition III.7) Let \(s\in {\mathbb {N}}, s\ge 1\). We say that \(L\in {\mathcal {B}}(L^2([0,1],{\mathbb {C}}))\) has s dominating simple eigenvalues if there exist closed subspaces E and \({\tilde{E}}\) such that

  1. \(L^2([0,1],{\mathbb {C}}) = E\oplus {\tilde{E}}\),

  2. \(L(E)\subset E\), \(L({\tilde{E}})\subset {\tilde{E}}\),

  3. \(\dim (E)=s\) and \(L|_{E}\) has s geometrically simple eigenvalues \(\lambda _i\), \( i=1,\dots , s\),

  4. \(r(L|_{{\tilde{E}}})<\min \{|\lambda _i|:i=1,\dots ,s\}\).

Adapting Theorem III.8 and Corollary III.11 of Hennion and Hervé (2001) to our situation, we can now state a linear response result for these eigenvalues.

Proposition 2.6

Let \(L_\delta :L^2([0,1],{{\mathbb {C}}} )\rightarrow L^2([0,1],{{\mathbb {C}}})\), where \(\delta \in [0,{{\bar{\delta }}})=:I_0\), be integral-preserving (see equation (3)) compact operators. Assume that the map \(\delta \mapsto L_\delta \) is in \(C^1(I_0,{\mathcal {B}}(L^2([0,1],{{\mathbb {C}}})))\) and \(L_0\) is mixing (see (A1) in Theorem 2.2). Then, \(\lambda _{1,0}:= 1\in \sigma (L_0)\) and \(r(L_0)=1\). Let \({\mathcal {I}}\subset \sigma (L_0)\setminus \{1\}\) be the eigenvalue(s) of maximal modulus strictly inside the unit disk; assume they are geometrically simple and let \(s:=|{\mathcal {I}}|+1\). Then there exists an interval \(I_1:=[0,\delta _1) \), \(I_1\subset I_0\) such that for \(\delta \in I_1\), \(L_\delta \) has s dominating simple eigenvalues. Thus, there exist functions \(e_{i, (\cdot )},\ {\hat{e}}_{i,(\cdot )}\in C^1(I_1,L^2([0,1],{{\mathbb {C}}}))\) and \(\lambda _{i,(\cdot )}\in C^1(I_1,{{\mathbb {C}}})\) such that for \(\delta \in I_1\) and \(i,j = 2,\dots , s\)

  1. (i)

    \(L_\delta e_{i,\delta } = \lambda _{i,\delta } e_{i,\delta }\), \(L^*_\delta {\hat{e}}_{i,\delta } = \lambda _{i,\delta }{\hat{e}}_{i,\delta }\),

  2. (ii)

    \(\langle e_{i,\delta },{\hat{e}}_{j,\delta }\rangle _{L^2([0,1],{{\mathbb {C}}})} = \delta _{i,j}\), where \(\delta _{i,j}\) is the Kronecker delta.

Furthermore, let \({\dot{\lambda }}_i\in {{\mathbb {C}}}\) satisfy

$$\begin{aligned} \lim _{\delta \rightarrow 0}\bigg |\frac{\lambda _{i,\delta }-\lambda _{i,0}}{\delta } -{\dot{\lambda }}_{i}\bigg | = 0, \end{aligned}$$

then

$$\begin{aligned} \begin{aligned} {\dot{\lambda }}_i = \langle {\hat{e}}_{i,0},{\dot{L}} e_{i,0} \rangle _{L^2([0,1],{{\mathbb {C}}})}, \end{aligned} \end{aligned}$$
(4)

where \({\dot{L}}\) is as in Definition 2.4.

Proof

From Theorem 2.2 and the discussion following it, \(1\in \sigma (L_0)\) and \(r(L_0)=1\).

We now use Theorem III.8 in Hennion and Hervé (2001) to obtain the existence of linear response and Corollary III.11 (Hennion and Hervé 2001) to obtain the formula. We begin by verifying the two hypotheses of Theorem III.8 (Hennion and Hervé 2001). We remark that our map \(\delta \mapsto L_\delta \), which belongs to \(C^1([0,{{\bar{\delta }}}),{\mathcal {B}}(L^2([0,1],{{\mathbb {C}}})))\), can be extended to a map in \(C^1((-{{\bar{\delta }}},{{\bar{\delta }}}),{\mathcal {B}}(L^2([0,1],{{\mathbb {C}}})))\).

With this extension, hypothesis (H1) of Theorem III.8 (Hennion and Hervé 2001) is satisfied. Since \(r(L_0)=1\), we just need to show that \(L_0\) has s dominating eigenvalues. Since \(L_0 \) is a compact operator, the eigenvalues \(\lambda _{i,0}\in {\mathcal {I}}\) are isolated. Let \(\Pi _i\) be the eigenprojection onto the eigenspace of \(\lambda _{i,0}\) and \(E_i:=\Pi _i (L^2([0,1],{{\mathbb {C}}}))\). Define the eigenspaces \(E:=\bigoplus _{i=1}^s E_i\) and \({\widetilde{E}}: = (\text {Id} -\sum _{i=1}^s\Pi _i)(L^2([0,1],{{\mathbb {C}}}))\). We thus have:

  1. (1)

    \(L^2([0,1],{{\mathbb {C}}}) = E\oplus {\widetilde{E}}\).

  2. (2)

    \(L_0\left( E\right) \subset E\) and \(L_0({\widetilde{E}})\subset {\widetilde{E}}\).

  3. (3)

    \(\dim \left( E\right) =s\) and \(L_0|_{E}\) has s simple eigenvalues \(\{\lambda _{1,0}\}\cup {\mathcal {I}}\). This point follows from the assumption that the eigenvalues in \({\mathcal {I}}\) are geometrically simple and the fact that \(\lambda _{1,0}\) is simple (see Theorem 2.2).

  4. (4)

    \(r(L_0|_{{\widetilde{E}}}) <|\lambda _{i,0}|\) for all \(\lambda _{i,0}\in {\mathcal {I}}\).

Thus, \(L_0\) satisfies hypothesis (H2) of Theorem III.8 since it has s dominating simple eigenvalues and \(r(L_0)=1\). Hence, from Theorem III.8 (Hennion and Hervé 2001), the map \( \delta \mapsto \lambda _{i,\delta }\) is differentiable at \(\delta =0\).

We can now apply the argument in Corollary III.11 (Hennion and Hervé 2001) for \(\lambda _{i,0}\) to obtain (4) (the result and proof of Corollary III.11 (Hennion and Hervé 2001) is for the top eigenvalue, however the argument still holds for any eigenvalue \(\lambda _{i,0}\in {\mathcal {I}}\) by changing the index value in the proof of the corollary). \(\square \)

3 Application to Hilbert–Schmidt Integral Operators

In this section, we apply the results of the previous section to Hilbert–Schmidt integral operators and suitable perturbations. The operators we consider are compact operators on \(L^{2}([0,1],{{\mathbb {R}}})\) (or \(L^{2}([0,1],{{\mathbb {C}}})\)); for brevity we will denote \(L^{2}:=L^{2}([0,1],{{\mathbb {R}}})\). To avoid confusion we point out that in the following we will also consider the space \(L^{2}([0,1]^{2})\) of square-integrable real functions on the unit square; this space contains the kernels of the operators we consider.

Let \(k\in L^{2}([0,1]^{2})\) and consider the operator \(L:L^2 \rightarrow L^2\) defined in the following way: for \(f\in L^{2}\)

$$\begin{aligned} Lf(x)=\int k(x,y)f(y)\mathrm{d}y; \end{aligned}$$
(5)

such an operator is called a Hilbert–Schmidt integral operator. Operators of this type arise, for example, as the annealed transfer operators of systems perturbed by additive noise (see Sect. 6).

We now list some well-known and basic facts about Hilbert–Schmidt integral operators with kernels in \(L^{2}([0,1]^{2})\):

  • The operator \(L:L^{2}\rightarrow L^{2}\) is bounded and

    $$\begin{aligned} ||Lf||_{2}\le ||k||_{L^{2}([0,1]^{2})}||f||_{2} \end{aligned}$$
    (6)

    (see Proposition 4.7 in II.§4 Conway 2013).

  • If \(k\in L^{\infty }([0,1]^{2})\), then

    $$\begin{aligned} ||Lf||_{\infty }\le ||k||_{L^{\infty }([0,1]^{2})}||f||_{1} \end{aligned}$$
    (7)

    and the operator \(L:L^1\rightarrow L^{\infty }\) is bounded. Furthermore, \(\Vert L\Vert _{L^p\rightarrow L^\infty }\le \Vert k\Vert _{L^\infty ([0,1]^2)}\) for \(1\le p\le \infty \).

  • If for almost every \(y\in [0,1]\) we have

    $$\begin{aligned} \int k(x,y) \mathrm{d}x=1, \end{aligned}$$

    then the Hilbert–Schmidt integral operator associated to the kernel k is integral preserving (satisfies (3)).

  • The operator \(L:L^2\rightarrow L^2\) is compact (see Kolmogorov and Fomin 1961).

Combining the last two points, we have from Theorem 2.2 that such an operator has an invariant function in \(L^{2}\). Furthermore, for \(k\in L^\infty ([0,1]^2)\) we have an analogous result.

Lemma 3.1

Let \(L:L^2\rightarrow L^2\) be an integral operator, with integral-preserving kernel \(k\in L^\infty ([0,1]^2)\), that is mixing (satisfies (A1) of Theorem 2.2). Then, there exists a unique fixed point \(f\in L^\infty \) of L satisfying \(\int f\ \mathrm{d}m =1\). Furthermore, if the kernel is nonnegative, then f is nonnegative.

Proof

Since k is an integral-preserving kernel, L satisfies (3). Thus, we can apply Theorem 2.2 to conclude that there exists a unique \(f\in L^2\), \(\int f\ \mathrm{d}m=1\), such that \(Lf=f\). Noting that \(k\in L^\infty ([0,1]^2)\), we have from inequality (7) that \(f\in L^\infty \).

We now assume k is nonnegative. Let \(k^j\) be the kernel of the operator \(L^{j}\). Since k is an integral-preserving kernel, we have

$$\begin{aligned} \begin{aligned} |k^2(x,y)|&= \bigg |\int k(x,z)k(z,y)\mathrm{d}z\bigg |\le \int |k(x,z)k(z,y)|\mathrm{d}z\\&\le \Vert k\Vert _{L^\infty ([0,1]^2)}\int k(z,y)\mathrm{d}z = \Vert k\Vert _{L^\infty ([0,1]^2)}; \end{aligned} \end{aligned}$$

it easily follows that \(\Vert k^j\Vert _{L^\infty ([0,1]^2)}\le \Vert k\Vert _{L^\infty ([0,1]^2)}\). Thus, for any probability density \(g\in L^1\), we have \(\Vert L^jg\Vert _\infty \le \Vert k\Vert _{L^\infty ([0,1]^2)}\); thus, by Corollary 5.2.2 in Lasota and Mackey (1985), there exists a probability density \({\hat{f}}\in L^1\) such that \(L{\hat{f}} = {\hat{f}}\). Since f is the unique invariant function with integral 1, we have \({\hat{f}}=f\); thus, f is a probability density. \(\square \)
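The fixed point guaranteed by Lemma 3.1 can be approximated by iterating L on the uniform density. A minimal sketch, assuming a strictly positive Gaussian-type kernel (our choice, not the paper's), normalised so that \(\int k(x,y)\mathrm{d}x=1\):

```python
import numpy as np

# Numerical illustration of Lemma 3.1: for a strictly positive,
# integral-preserving kernel, iterating L from the uniform density
# converges to a nonnegative fixed density of integral 1.
n = 200
x = (np.arange(n) + 0.5) / n
X, Y = np.meshgrid(x, x, indexing="ij")

k = np.exp(-5 * (X - Y) ** 2)
k /= k.sum(axis=0, keepdims=True) / n   # enforce (1/n) sum_i k_ij = 1, the discrete form of (3)

f = np.ones(n)                          # start from the uniform density
for _ in range(500):
    f = k @ f / n                       # one application of L

integral_f = f.mean()                   # discrete value of int f dm
residual = np.max(np.abs(k @ f / n - f))
```

Each step preserves the integral exactly in this discretisation, and positivity of the kernel keeps the iterates nonnegative, mirroring the two conclusions of the lemma.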

3.1 Characterising Valid Perturbations and the Derivative Operator

In this subsection we consider perturbations of integral-preserving Hilbert–Schmidt integral operators such that assumption (A2) of Theorem 2.2 can be verified and the derivative operator \( {\dot{L}}\) computed. We begin, however, by characterising the set of perturbations under which the integral-preserving property of the operators is maintained.

Consider the set \(V_{\ker }\) of kernels having zero average in the x direction, defined as

$$\begin{aligned} V_{\ker }:=\bigg \{k\in L^{2}([0,1]^{2}): \int k(x,y)\mathrm{d}x=0\ \text { for a.e. } y\bigg \}. \end{aligned}$$

Lemma 3.2

Consider a kernel operator \(A:L^{2}([0,1]) \rightarrow L^{2}([0,1])\) defined by \(Af(x)=\int k(x,y)f(y)\mathrm{d}y\). Then, the following are equivalent:

  1. \(A(L^{2}([0,1]))\subseteq V\),

  2. \(k\in V_{\ker }\).

Proof

Clearly, the second condition implies the first. For the other direction we prove the contrapositive. If \(\int k(x,y)\mathrm{d}x\ne 0\) on a set of positive measure, then for some \(\epsilon >0\) there is a set S with \(m(S) >0\) such that \(\int k(x,y)\mathrm{d}x\ge \epsilon \) or \(\int k(x,y)\mathrm{d}x\le -\epsilon \) for each \(y\in S\). Suppose \(\int k(x,y)\mathrm{d}x\ge \epsilon \) on this set, and consider \(f:={\mathbf {1}}_{S}\) and \( g:=Af.\) Then, \(g(x)=\int k(x,y){\mathbf {1}}_{S}(y)\mathrm{d}y\) and we have \(\int g(x)\mathrm{d}x= \int _S \int k(x,y) \mathrm{d}x \mathrm{d}y \ge \epsilon \ m(S)>0 \), so \( g\notin V\). The case \(\int k(x,y)\mathrm{d}x\le -\epsilon \) is analogous. \(\square \)

We now prove that \(V_{\ker }\) is closed.

Lemma 3.3

The set \(V_{\ker }\) is a closed vector subspace of \( L^{2}([0,1]^{2}).\)

Proof

The fact that \(V_{\ker }\) is a vector space is trivial. For fixed \(f\in L^{2}([0,1])\), the set of \(k\in L^2([0,1]^2)\) such that \(\int k(x,y)f(y)\mathrm{d}y\in V\) is closed. To see this, define the function \(K_{f}:L^{2}([0,1]^{2})\rightarrow L^{2}([0,1])\) as

$$\begin{aligned} K_{f}(k)=\int k(x,y)f(y)\mathrm{d}y. \end{aligned}$$
(8)

By (6), \(K_{f}\) is continuous. Since V is closed in \(L^{2}([0,1])\), this implies that \(K_{f}^{-1}(V)\) is closed in \(L^{2}([0,1]^{2}).\) Finally, \(V_{\ker }\) is closed in \(L^2([0,1]^2)\) because \(V_{\ker }=\cap _{f\in L^{2}([0,1])}K_{f}^{-1}(V)\). \(\square \)

We now introduce the type of perturbations which we will investigate throughout the paper. Let \(L_{\delta }:L^{2}\rightarrow L^{2}\) be a family of integral operators, with kernels \(k_{\delta }\in L^{2}([0,1]^{2})\), given by

$$\begin{aligned} L_{\delta }f(x)=\int k_{\delta }(x,y)f(y)\mathrm{d}y. \end{aligned}$$

Lemma 3.4

Let \(k_{\delta }\in L^{2}([0,1]^{2})\) for each \(\delta \in [0,{\bar{\delta }}).\) Suppose that

$$\begin{aligned} k_{\delta }=k_{0}+\delta \cdot {\dot{k}}+r_\delta \end{aligned}$$
(9)

where \({\dot{k}},\ r_\delta \in L^{2}([0,1]^{2})\) and \( ||r_\delta ||_{L^{2}([0,1]^{2})} = o(\delta ).\) The bounded linear operator \({\dot{L}}:L^2\rightarrow V\) defined by

$$\begin{aligned} {\dot{L}}f(x):=\int {\dot{k}}(x,y)f(y)\mathrm{d}y \end{aligned}$$
(10)

satisfies

$$\begin{aligned} \lim _{\delta \rightarrow 0}\bigg \Vert \frac{L_{\delta }-L_{0}}{\delta }-{\dot{L}}\bigg \Vert _{L^2\rightarrow V}=0. \end{aligned}$$

If additionally the derivative of the map \(\delta \mapsto k_\delta \) with respect to \(\delta \) varies continuously in a neighbourhood of \(\delta =0\), then \(\delta \mapsto L_\delta \) has a continuous derivative in a neighbourhood of \(\delta =0\).

Proof

By integral preservation of \(L_\delta \) and the fact that \({\dot{k}}\in L^2([0,1]^2)\), one sees that \({\dot{L}}:L^2\rightarrow V\) and is bounded. By (9),

$$\begin{aligned} \left\| \frac{L_{\delta }-L_{0}}{\delta }-{\dot{L}}\right\| _{L^2\rightarrow V}= & {} \sup _{\Vert f\Vert _{L^2}=1}\left\| \int \frac{k_{\delta }(x,y)-k_{0}(x,y) }{\delta }f(y)\ \mathrm{d}y - \int {\dot{k}}(x,y)f(y)\ \mathrm{d}y\right\| _{L^2} \\= & {} \sup _{\Vert f\Vert _{L^2}=1}\left\| \int \frac{r_\delta (x,y)}{\delta }f(y)\ \mathrm{d}y\right\| _{L^2}\\\le & {} \frac{\Vert r_\delta \Vert _{L^2([0,1]^2)}}{\delta }=o(1). \end{aligned}$$

Proceeding similarly, one shows that if the map \(\delta \mapsto k_\delta \) has a continuous derivative with respect to \(\delta \) in a neighbourhood of \(\delta =0\), then \(\delta \mapsto L_\delta \) has a continuous derivative. Indeed, we are supposing that for each \(\delta \in [0,{\overline{\delta }})\) there is \({\dot{k}}_\delta \) such that, for small enough h,

$$\begin{aligned} k_{\delta +h}=k_\delta +h\cdot {\dot{k}}_\delta +r_{\delta ,h} \end{aligned}$$

where \({\dot{k}}_\delta ,\ r_{\delta ,h} \in L^{2}([0,1]^{2})\), \( ||r_{\delta ,h} ||_{L^{2}([0,1]^{2})} = o(h)\) and furthermore \(\delta \mapsto {\dot{k}}_\delta \) is continuous. We have then by (6) that the associated operators \({\dot{L}}_\delta \) defined as

$$\begin{aligned} {\dot{L}}_\delta f(x):=\int {\dot{k}}_\delta (x,y)f(y)\mathrm{d}y \end{aligned}$$
(11)

also vary continuously in \(\delta \). \(\square \)

3.2 A Formula for the Linear Response of the Invariant Function and Its Continuity

Now we apply Theorem 2.2 to Hilbert–Schmidt integral operators to obtain a linear response formula for \(L^2\) perturbations.

Corollary 3.5

(Linear response formula for kernel operators) Suppose \(L_{\delta }:L^2\rightarrow L^2\) are integral-preserving (satisfying (3)) integral operators with stochastic kernels \(k_{\delta }\in L^2([0,1]^{2})\) as in (9). Suppose \(L_0\) satisfies assumption (A1) of Theorem 2.2. Then \({\dot{k}}\in V_{\ker }\), the system has linear response for this perturbation and an explicit formula for it is given by

$$\begin{aligned} \lim _{\delta \rightarrow 0}\frac{f_{\delta }-f_{0}}{\delta }=(\text {Id} -L_{0})^{-1}\int {\dot{k}}(x,y)f_{0}(y)\mathrm{d}y \end{aligned}$$
(12)

with convergence in \(L^{2}.\)

Proof

Since \(L_\delta \), \(\delta \in [0,{{\bar{\delta }}})\), is integral preserving, we have \((L_\delta -L_0)(L^2)\subset V\) and therefore, \(k_\delta -k_0\in V_{\ker }\) by Lemma 3.2, i.e. \(\delta \cdot {\dot{k}}+r_\delta \in V_{\ker }\). Then \({\dot{k}}+\frac{r_\delta }{\delta }\in V_{\ker }\) for each \(\delta \). Since \(\frac{r_\delta }{\delta }\rightarrow 0\) in \(L^2\) and \(V_{\ker }\) is a closed subspace we have \({\dot{k}}\in V_{\ker }\). Furthermore by (9) there is a \(K\ge 0\) such that

$$\begin{aligned} \left| |L_{0}-L_{\delta }|\right| _{L^{2}\rightarrow L^{2}}\le K\delta . \end{aligned}$$
(13)

Hence the family of operators satisfy the first part of assumption (A2). The second part of this assumption is established by the first result of Lemma 3.4.

Since the operators \(L_\delta \) are compact, integral preserving, and satisfy assumptions (A1) and (A2) we can conclude by applying Theorem 2.2 to this family of operators, obtaining

$$\begin{aligned} \lim _{\delta \rightarrow 0}\left\| \frac{f_{\delta }-f_{0}}{\delta }-( \text {Id}-L_{0})^{-1}\int {\dot{k}}(x,y)f_{0}(y)\mathrm{d}y\right\| _{2}=0. \end{aligned}$$

\(\square \)

Now we show that the linear response of the invariant function is continuous with respect to the kernel perturbation. This will be used in Sect. 4 for the proof of the existence of solutions of our main optimisation problems.

Consider the operator \(L_{0}\), having a kernel \(k_{0}\in L^{2}([0,1]^2)\), and a set of infinitesimal perturbations \(P\subset V_{\ker }\) of \(k_{0}\). We will endow P with the topology induced by its inclusion in \(L^2([0,1]^2)\). Suppose \(L_{\delta }\) is a perturbation of \(L_0\) satisfying the assumptions of Lemma 3.4. By Corollary 3.5, the linear response will depend on the first-order term of the perturbation, \(\dot{k} \in P\), allowing us to define the function \(R:P\rightarrow V\) by

$$\begin{aligned} R({\dot{k}}):=(\text {Id} -L_{0})^{-1}\int {\dot{k}}(x,y)f_{0}(y)\mathrm{d}y. \end{aligned}$$
(14)

By (6) and the continuity of the resolvent operator it follows directly that the response function \(R:(P,\Vert \cdot \Vert _{L^2([0,1]^2)})\rightarrow (V,\Vert \cdot \Vert _{L^2})\) is continuous.
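The response formula (14) can be sanity-checked on a discretisation: approximate \(f_0\) by power iteration, solve \((\text {Id}-L_{0})u={\dot{L}}f_0\) in the mean-zero subspace, and compare with the finite difference \((f_{\delta }-f_{0})/\delta \). In the sketch below the kernel \(k_0\) and the direction \({\dot{k}}\) are illustrative choices; the mean-zero solution is selected by adding a multiple of \(f_0\), which spans the kernel of \(\text {Id}-L_{0}\).

```python
import numpy as np

# Numerical sanity check of the response formula (14); all kernels and
# grid sizes are illustrative choices, not taken from the paper.
n = 150
x = (np.arange(n) + 0.5) / n
X, Y = np.meshgrid(x, x, indexing="ij")

K0 = np.exp(-8 * (X - Y) ** 2)
K0 /= K0.sum(axis=0, keepdims=True) / n        # stochastic: int k0(x,y) dx = 1

def fixed_density(K, iters=2000):
    # approximate the invariant density by power iteration
    f = np.ones(n)
    for _ in range(iters):
        f = K @ f / n
    return f

f0 = fixed_density(K0)

Kdot = np.sin(2 * np.pi * X) * np.cos(2 * np.pi * Y)
Kdot -= Kdot.mean(axis=0, keepdims=True)       # put the perturbation in V_ker

b = Kdot @ f0 / n                              # Ldot f0, an element of V
u, *_ = np.linalg.lstsq(np.eye(n) - K0 / n, b, rcond=None)
R = u - u.mean() * f0                          # the mean-zero solution of (Id - L0) R = Ldot f0

delta = 1e-4
fd = (fixed_density(K0 + delta * Kdot) - f0) / delta   # finite-difference response

err = np.max(np.abs(R - fd))
```

With these choices `err` should be dominated by the O(\(\delta \)) finite-difference truncation, well below the size of the response itself.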

3.3 A Formula for the Linear Response of the Dominant Eigenvalues and Its Continuity

We apply Proposition 2.6 to Hilbert–Schmidt integral operators and obtain a linear response formula for the dominant eigenvalues in the case of \(L^2\) perturbations. Denote by \(\Re (\cdot )\) and \(\Im (\cdot )\) the functions that return the real and imaginary parts of complex arguments.

Corollary 3.6

Suppose \(L_{\delta }:L^2([0,1],{{\mathbb {C}}})\rightarrow L^2([0,1],{{\mathbb {C}}})\) are integral-preserving (satisfying (3)) integral operators with kernels \(k_{\delta }\in L^2([0,1]^{2})\) satisfying \(\delta \mapsto k_\delta \in C^1([0,{\bar{\delta }}),L^2([0,1]^{2}))\). Suppose \(L_0\) satisfies (A1) of Theorem 2.2. Let \(\lambda _0\in {{\mathbb {C}}}\) be an eigenvalue of \(L_0\) with the largest magnitude strictly inside the unit circle and assume that \(\lambda _0\) is geometrically simple. Then, there exists \({\dot{\lambda }}\in {\mathbb {C}}\) such that

$$\begin{aligned} \lim _{\delta \rightarrow 0}\bigg |\frac{\lambda _{\delta }-\lambda _{0}}{\delta } -{\dot{\lambda }}\bigg | = 0. \end{aligned}$$

Furthermore,

$$\begin{aligned} \begin{aligned} {\dot{\lambda }}&= \int _0^1\int _0^1{\dot{k}}(x,y)\left( \Re ({\hat{e}})(x)\Re (e)(y) + \Im ({\hat{e}})(x)\Im (e)(y) \right) \mathrm{d}y\mathrm{d}x\\&\quad + i\int _0^1\int _0^1{\dot{k}}(x,y)\left( \Im ({\hat{e}})(x)\Re (e)(y) - \Re ({\hat{e}})(x)\Im (e)(y) \right) \mathrm{d}y\mathrm{d}x, \end{aligned} \end{aligned}$$
(15)

where \(e\in L^2([0,1],{\mathbb {C}})\) is the eigenvector of \(L_0\) associated to the eigenvalue \(\lambda _0\), \({\hat{e}}\in L^2([0,1],{\mathbb {C}})\) is the eigenvector of \(L_0^*\) associated to the eigenvalue \(\lambda _{0}\) and \({\dot{L}}\) is the operator in Lemma 3.4.

Proof

Since \(k_\delta \in L^2([0,1]^2)\), the operator \(L_\delta :L^2([0,1],{{\mathbb {C}}})\rightarrow L^2([0,1],{{\mathbb {C}}})\) is compact; by assumption, it also satisfies (3). From Lemma 3.4, the map \(\delta \mapsto L_\delta \) is \(C^1\). Hence, by Proposition 2.6, we have \({\dot{\lambda }} = \langle {\hat{e}},{\dot{L}} e \rangle _{L^2([0,1],{\mathbb {C}})}\). Finally, we compute

$$\begin{aligned} \begin{aligned} {\dot{\lambda }}=\langle {\hat{e}},{\dot{L}} e \rangle _{L^2([0,1],{\mathbb {C}})}&= \int _0^1{\hat{e}}(x) \overline{{\dot{L}}e}(x) \mathrm{d}x \\&= \int _0^1{\hat{e}}(x)\overline{\int _0^1 {\dot{k}}(x,y)e(y)\mathrm{d}y}\mathrm{d}x\\&= \int _0^1\int _0^1{\dot{k}}(x,y){\hat{e}}(x){\bar{e}}(y) \mathrm{d}y\mathrm{d}x\\&= \int _0^1\int _0^1{\dot{k}}(x,y)\left( \Re ({\hat{e}})(x)\Re (e)(y) + \Im ({\hat{e}})(x)\Im (e)(y) \right) \mathrm{d}y\mathrm{d}x\\&\quad + i\int _0^1\int _0^1{\dot{k}}(x,y)\left( \Im ({\hat{e}})(x)\Re (e)(y) - \Re ({\hat{e}})(x)\Im (e)(y) \right) \mathrm{d}y\mathrm{d}x. \end{aligned} \end{aligned}$$

\(\square \)

From the expression in the final line of the proof above, it is clear that if we consider \({\dot{\lambda }}\) as a function of \({\dot{k}}\), the map \({\dot{\lambda }}:(V_{\ker },\Vert \cdot \Vert _{L^2([0,1]^2)})\rightarrow {\mathbb {C}}\) is continuous.
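Formula (15) can likewise be tested numerically. The sketch below uses the standard first-order perturbation formula \({\dot{\lambda }}=\psi ^{\top }{\dot{L}}e/\psi ^{\top }e\), with \(\psi \) a left eigenvector (this agrees with \(\langle {\hat{e}},{\dot{L}}e\rangle \) up to conjugation conventions and the normalisation assumed in Proposition 2.6), and compares it with a finite difference of the second-largest eigenvalue; all kernels are illustrative choices.

```python
import numpy as np

# Check the eigenvalue response against a finite difference of the
# second eigenvalue of a discretised stochastic kernel (illustrative).
n = 120
x = (np.arange(n) + 0.5) / n
X, Y = np.meshgrid(x, x, indexing="ij")

K0 = np.exp(-3 * (X - Y) ** 2) * (1.2 + np.sin(2 * np.pi * X))
K0 /= K0.sum(axis=0, keepdims=True) / n

Kdot = np.cos(2 * np.pi * X) * np.sin(4 * np.pi * Y)
Kdot -= Kdot.mean(axis=0, keepdims=True)       # perturbation in V_ker

L0, Ldot = K0 / n, Kdot / n                    # discretised operators

w, Vr = np.linalg.eig(L0)
i = np.argsort(-np.abs(w))[1]                  # second-largest modulus
lam0, e = w[i], Vr[:, i]

wl, Vl = np.linalg.eig(L0.T)
psi = Vl[:, np.argmin(np.abs(wl - lam0))]      # left eigenvector at lam0

lam_dot = (psi @ Ldot @ e) / (psi @ e)         # first-order perturbation formula

delta = 1e-6
wd = np.linalg.eigvals(L0 + delta * Ldot)
lam_delta = wd[np.argmin(np.abs(wd - lam0))]   # match the perturbed eigenvalue by proximity
fd = (lam_delta - lam0) / delta

err = abs(lam_dot - fd)
```

Matching the perturbed eigenvalue by proximity to \(\lambda _0\) avoids branch-tracking issues when the subdominant spectrum contains complex-conjugate pairs.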

4 Optimal Response: Optimising the Expectation of Observables and Mixing Rate

Having described the responses of our dynamical systems to perturbations, it is natural to consider the optimisation problem of finding perturbations that provoke maximal responses. We consider the problems of finding the infinitesimal perturbation that maximises the expectation of a given observable and the infinitesimal perturbation that maximally enhances mixing. In doing so, we extend the approach in Antown et al. (2018) from the setting of finite-state Markov chains to the integral operators considered in the present paper. We are now in the realm of infinite-dimensional optimisation, which is considerably more challenging than the finite-dimensional optimisation in Antown et al. (2018).

We show that at an abstract level these problems reduce to the optimisation of a linear continuous functional \({\mathcal {J}}\) on a convex set P of feasible perturbations; this problem has a solution and the solution is unique if the set P of allowed infinitesimal perturbations is strictly convex. The convexity assumption on P is natural because if two different perturbations of the system are possible, then their convex combination (applying the two perturbations with different intensities) will also be possible. After introducing the abstract setting, we construct the objective functions for our two optimal response problems and state general existence and uniqueness results for the optima. Later, in Sect. 5 we focus on the construction of the set of feasible perturbations and provide explicit formulae for the maximising perturbations.

4.1 General Optimisation Setting, Existence and Uniqueness

We recall some general results (adapted for our purposes) on optimising a linear continuous function on convex sets; see also Lemma 6.2 (Froyland et al. 2020). The abstract problem is to find \({\dot{k}}\) such that

$$\begin{aligned} {\mathcal {J}}({\dot{k}})=\max _{{\dot{h}}\in P}{\mathcal {J}}({\dot{h}}), \end{aligned}$$
(16)

where \({\mathcal {J}}:{\mathcal {H}}\rightarrow {{\mathbb {R}}}\) is a continuous linear function, \({\mathcal {H}}\) is a separable Hilbert space and \(P\subset \mathcal {H }\).

Proposition 4.1

(Existence of the optimal solution) Let P be bounded, convex, and closed in \({\mathcal {H}}\). Then, the problem (16) has at least one solution.

Proof

Since P is bounded and \({\mathcal {J}}\) is continuous, we have that \(\sup _{k\in P}{\mathcal {J}}(k)<\infty \). Consider a maximising sequence \(k_{n}\), i.e. one with \( \lim _{n\rightarrow \infty }{\mathcal {J}}(k_{n})=\sup _{k\in P}{\mathcal {J}}(k)\). Since P is bounded, \(k_{n}\) has a subsequence \(k_{n_{j}}\) converging in the weak topology. Since P is strongly closed and convex in \({\mathcal {H}}\), it is weakly closed, so the weak limit \(\overline{k}:=\lim _{j\rightarrow \infty }k_{n_{j}}\) belongs to P. Since \({\mathcal {J}}\) is continuous and linear, it is continuous in the weak topology. Hence \({\mathcal {J}}(\overline{k})=\lim _{j\rightarrow \infty }{\mathcal {J}}(k_{n_{j}})=\sup _{k\in P}{\mathcal {J}}(k)\), so the supremum is attained at \(\overline{k}\). \(\square \)

Uniqueness of the optimal solution will be provided by strict convexity of the feasible set.

Definition 4.2

We say that a convex closed set \(A\subseteq {\mathcal {H}}\) is strictly convex if for each pair of distinct points \(x,y\in A\) and for all \(0<\gamma <1\), the points \(\gamma x+(1-\gamma )y\in \mathrm {int}(A)\), where the relative interior is meant.

Proposition 4.3

(Uniqueness of the optimal solution) Suppose P is a closed, bounded, and strictly convex subset of \({{\mathcal {H}}}\) that contains the zero vector in its relative interior. If \({\mathcal {J}}\) is not uniformly vanishing on P, then the optimal solution to (16) is unique.

Proof

Suppose that there are two distinct maxima \({\dot{k}}_1,{\dot{k}}_2\in P\) with \({\mathcal {J}}({\dot{k}}_1)={\mathcal {J}}({\dot{k}}_2)=\alpha \). Let \(0<\gamma <1\) and set \(z=\gamma {\dot{k}}_1+(1-\gamma ){\dot{k}}_2\). By strict convexity of P, \(z\in \mathrm {int}(P)\), and by linearity of \({\mathcal {J}}\), \({\mathcal {J}}(z)=\alpha \). Let \(B_r(z)\) denote a (relative in P) open ball of radius r centred at z, with \(r>0\) chosen small enough so that \(B_r(z)\subset \mathrm {int}(P)\). Because the zero vector lies in the relative interior of P, and \({\mathcal {J}}\) does not uniformly vanish on P, there exists a vector \(v\in B_r(z)\) such that \({\mathcal {J}}(v)>0\). Now \(z+\frac{rv}{2\Vert v\Vert }\in \mathrm {int}(P)\) and \({\mathcal {J}}(z+\frac{rv}{2\Vert v\Vert })>\alpha \), contradicting maximality of \({\dot{k}}_1\). \(\square \)

In the following subsections we apply the general results of this section to our specific optimisation problems.

4.2 Optimising the Response of the Expectation of an Observable

Let \(c\in L^2\) be a given observable. We consider the problem of finding an infinitesimal perturbation that maximises the expectation of c. The perturbations we consider are perturbations to the kernels of Hilbert–Schmidt integral operators, of the form (9). If we denote the average of c with respect to the perturbed invariant density \(f_{\delta }\) by

$$\begin{aligned} {\mathbb {E}}_{c,\delta }:=\int c~f_{\delta }~\mathrm{d}m, \end{aligned}$$

we have

$$\begin{aligned} \frac{d{\mathbb {E}}_{c,\delta }}{\mathrm{d}\delta }\bigg |_{\delta =0}=\lim _{\delta \rightarrow 0} \frac{{\mathbb {E}}_{c,\delta }-{\mathbb {E}}_{c,0}}{\delta }=\lim _{\delta \rightarrow 0}\int c~ \frac{f_{\delta }-f_{0}}{\delta }~\mathrm{d}m=\int c~R({\dot{k}})~\mathrm{d}m, \end{aligned}$$

where the last equality follows from Corollary 3.5 and (14).

The function \({\mathcal {J}}({\dot{k}})=\langle c,R({\dot{k}})\rangle \) is clearly continuous as a map from \((V_{\ker },\Vert \cdot \Vert _{L^2([0,1]^2)})\) to \({\mathbb {R}}\). Suppose that P is a closed, bounded, convex subset of \(V_{\ker }\) containing the zero perturbation, and that \({\mathcal {J}}\) is not uniformly vanishing on P. We wish to solve the following problem:

General Problem 1

Find \({\dot{k}}\in P\) such that

$$\begin{aligned} \big \langle c,R({\dot{k}})\big \rangle _{L^{2}([0,1],{{\mathbb {R}}})}=\max _{{\dot{h}}\in P} \big \langle c,R({\dot{h}})\big \rangle _{L^{2}([0,1],{{\mathbb {R}}})}. \end{aligned}$$
(17)

We may immediately apply Proposition 4.1 to obtain that there exists a solution to (17). If, in addition, P is strictly convex, then by Proposition 4.3 the solution to (17) is unique.

To end this subsection we note that without loss of generality, we may assume that \(c\in \) span\(\{f_0\}^\perp \). This is because for \(c\in L^{2}\), we have

$$\begin{aligned} \langle c,R({\dot{k}})\rangle _{L^{2}([0,1],{{\mathbb {R}}})} =\langle c- \langle c,f_0\rangle _{L^{2}([0,1],{{\mathbb {R}}})}{\mathbf {1}},R({\dot{k}} )\rangle _{L^{2}([0,1],{{\mathbb {R}}})}, \end{aligned}$$

since \(R({\dot{k}})\in V\). From \(\int f_0(x) \mathrm{d}x = 1,\) we have that \(f\mapsto \langle f,f_0\rangle _{L^{2}([0,1],{{\mathbb {R}}})}{\mathbf {1}}\) is a projection onto span\(\{{\mathbf {1}}\}\) and so \(f\mapsto f- \langle f,f_0\rangle _{L^{2}([0,1],{{\mathbb {R}}})}{\mathbf {1}}\) is a projection onto span\( \{f_0\}^\perp \).
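This reduction is immediate to verify numerically: for any mean-zero R, pairing against c and against its projection gives the same value. A tiny sketch with hypothetical grid functions:

```python
import numpy as np

# Check that <c, R> is unchanged by the projection c -> c - <c, f0> 1
# whenever R has zero integral.  All vectors are illustrative.
rng = np.random.default_rng(2)
n = 1000
f0 = 1 + 0.3 * np.sin(2 * np.pi * (np.arange(n) + 0.5) / n)
f0 /= f0.mean()                       # a density with int f0 dm = 1

c = rng.standard_normal(n)
R = rng.standard_normal(n)
R -= R.mean()                         # put R in V (zero integral)

c_proj = c - (c * f0).mean()          # c - <c, f0> 1
gap = abs((c * R).mean() - (c_proj * R).mean())
```

The two pairings differ by \(\langle c,f_0\rangle \int R\,\mathrm{d}m\), which vanishes by construction.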

4.3 Optimising the Response of the Rate of Mixing

We now consider the linear response problem of optimising the rate of mixing. Let \(\lambda _0\in {{\mathbb {C}}}\) denote an eigenvalue of \(L_0\) strictly inside the unit circle with largest magnitude. From now on, whenever discussing the linear response of eigenvalues to kernel perturbations we assume the conditions of Corollary 3.6. We recall that e and \({\hat{e}}\) are the eigenfunctions of \(L_0\) and \(L_0^*\), respectively, corresponding to the eigenvalue \(\lambda _0\).

To find the kernel perturbations that enhance mixing, we follow the general approach taken in Antown et al. (2018) (see also Froyland and Santitissadeekorn 2017; Froyland et al. 2020 in the continuous time setting), namely perturbing our original dynamics \(L_0\) in such a way that the modulus of the second eigenvalue of the perturbed dynamics decreases. Equivalently, we want to decrease the real part of the logarithm of the perturbed second eigenvalue. The following result provides an explicit formula for this instantaneous rate of change. Define

$$\begin{aligned} E(x,y)&:= \left( \Re ({\hat{e}})(x)\Re (e)(y) + \Im ({\hat{e}})(x)\Im (e)(y)\right) \Re (\lambda _{0}) \nonumber \\&\quad + \left( \Im ({\hat{e}})(x)\Re (e)(y) - \Re ({\hat{e}})(x)\Im (e)(y)\right) \Im (\lambda _{0}). \end{aligned}$$
(18)

Lemma 4.4

One has

$$\begin{aligned} \frac{\mathrm{d}}{\mathrm{d}\delta }\Re \left( \log \lambda _{\delta }\right) \bigg |_{\delta =0} = \frac{ \big \langle {\dot{k}}, E\big \rangle _{L^2([0,1]^2,{{\mathbb {R}}})}}{|\lambda _{0}|^2}. \end{aligned}$$

Proof

From (15), we have that

$$\begin{aligned} \Re ({\dot{\lambda }}_{0}) = \int _0^1\int _0^1{\dot{k}}(x,y)\left( \Re ({\hat{e}})(x)\Re (e)(y) + \Im ({\hat{e}})(x)\Im (e)(y) \right) \mathrm{d}y\mathrm{d}x \end{aligned}$$
(19)

and

$$\begin{aligned} \Im ({\dot{\lambda }}_{0}) = \int _0^1\int _0^1{\dot{k}}(x,y)\left( \Im ({\hat{e}})(x)\Re (e)(y) - \Re ({\hat{e}})(x)\Im (e)(y) \right) \mathrm{d}y\mathrm{d}x. \end{aligned}$$
(20)

Next, we note that

$$\begin{aligned} \frac{\mathrm{d}}{\mathrm{d}\delta } \Re (\log \lambda _{\delta }) = \Re \left( \frac{\mathrm{d}}{\mathrm{d}\delta } \log \lambda _{\delta }\right) = \Re \left( \frac{\mathrm{d}\lambda _{\delta }}{\mathrm{d}\delta }\frac{1}{ \lambda _{\delta }}\right) . \end{aligned}$$
(21)

From (19)-(21), we obtain

$$\begin{aligned} \frac{\mathrm{d}}{\mathrm{d}\delta }\Re \left( \log \lambda _{\delta }\right) \bigg |_{\delta =0}= & {} \Re \left( \frac{{\dot{\lambda }}_{0}}{\lambda _{0}}\right) = \Re \left( \frac{{\dot{\lambda }}_{0}}{\lambda _{0}}\frac{\overline{\lambda _{0}}}{\overline{ \lambda _{0}}}\right) = \frac{\Re ({\dot{\lambda }}_{0})\Re (\lambda _{0})+\Im ({\dot{\lambda }}_{0})\Im (\lambda _{0})}{| \lambda _{0}|^2}\\= & {} \frac{\big \langle {\dot{k}}, E\big \rangle _{L^2([0,1]^2,{{\mathbb {R}}})}}{|\lambda _{0}|^2}. \end{aligned}$$

\(\square \)

The function \({\mathcal {J}}({\dot{k}})=\langle {\dot{k}},E\rangle \) is clearly continuous as a map from \((V_{\ker },\Vert \cdot \Vert _{L^2([0,1]^2)})\) to \({\mathbb {R}}\). As in Sect. 4.2, suppose that P is a closed, bounded, strictly convex subset of \(V_{\ker }\) containing the zero element, and that \({\mathcal {J}}\) is not uniformly vanishing on P. We wish to solve the following problem:

General Problem 2

Find \({\dot{k}}\in P\) such that

$$\begin{aligned} \langle {\dot{k}},E\rangle _{L^2([0,1]^2,{{\mathbb {R}}})}=\min _{{\dot{h}}\in P} \langle {\dot{h}},E\rangle _{L^2([0,1]^2,{{\mathbb {R}}})}. \end{aligned}$$
(22)

We may immediately apply Proposition 4.1 to obtain that there exists a solution to (22). Since P is strictly convex, by Proposition 4.3 the solution to (22) is unique.

5 Explicit Formulae for the Optimal Perturbations

Thus far we have not been specific about the feasible set P; we take up this issue in this and the succeeding subsections to provide explicit formulae for the optimal responses in both problems (17) and (22). First, we have not required that the perturbed kernel \(k_\delta \) in (9) be nonnegative for \(\delta >0\); however, this is a natural assumption. To facilitate this, for \(0<l<1\), define

$$\begin{aligned}&F_l:=\{(x,y)\in [0,1]^2:k_0(x,y)\ge l\}\quad \text{ and } \nonumber \\&\quad S_{k_0,l}:= \{k\in L^2([0,1]^2): \text {supp}(k)\subseteq F_l\}. \end{aligned}$$
(23)

The set of allowable perturbations that we will consider in the sequel is

$$\begin{aligned} P_l := V_{\ker }\cap S_{k_0,l}\cap B_1, \end{aligned}$$
(24)

where \(B_1\) is the closed unit ball in \(L^2([0,1]^2)\). For modelling purposes, one may also use the parameter l to restrict the class of allowed perturbations to those that are more likely to occur according to \(k_0\). Note that in the particular situation where the support of \(k_0\) is sparse—for example when significant determinism is present—this sparsity will be respected by the perturbations in \(P_l\).

We now begin verifying the conditions on \(P_l\) and \({\mathcal {J}}\) required by Proposition 4.3. First, \(P_l\) is clearly bounded in \(L^2([0,1]^2)\). Second, we note that as long as \(F_l\) has positive Lebesgue measure, the zero kernel is in the relative interior of \(P_l\). Third, the following lemma handles closedness of \(P_l\). Fourth, since \(V_{\ker }\) and \(S_{k_0,l}\) are closed subspaces, \(V_{\ker }\cap S_{k_0,l}\) is itself a Hilbert space, and hence \(P_l\), the intersection of this space with the closed unit ball \(B_1\), is strictly convex. Finally, sufficient conditions for the objective function to not uniformly vanish are given in Lemma 5.2.

Lemma 5.1

The set \(S_{k_0,l}\) is a closed subspace of \(L^2([0,1]^2)\).

Proof

The fact that \(S_{k_0,l}\) is a subspace is trivial. Let \(\{k_n\}\subset S_{k_0,l}\) and suppose \(k_n\rightarrow _{L^2} k\in L^2([0,1]^2)\). Further suppose \(\{(x,y)\in [0,1]^2: k_0(x,y)< l\}\) is not a null set; otherwise \(S_{k_0,l}=L^2([0,1]^2)\) and the result immediately follows. Then, we have

$$\begin{aligned} \int _{\{k_0\ge l\}} (k_n(x,y)-k(x,y))^2\mathrm{d}y\mathrm{d}x + \int _{\{k_0 < l\}} k(x,y)^2 \mathrm{d}x\mathrm{d}y\rightarrow 0. \end{aligned}$$

Since \(\int _{\{k_0\ge l\}} (k_n(x,y)-k(x,y))^2\mathrm{d}y\mathrm{d}x\ge 0\) and the second term does not depend on n, we must have \( \int _{\{k_0< l\}} k(x,y)^2 \mathrm{d}x\mathrm{d}y=0\); therefore \(k=0\) a.e. on \(\{(x,y)\in [0,1]^2: k_0(x,y)< l\}\), that is, \(k\in S_{k_0,l}\). Hence, \(S_{k_0,l}\) is closed. \(\square \)

Let

$$\begin{aligned} F_l^y:=\{x\in [0,1]:(x,y)\in F_l\}, \end{aligned}$$
(25)

and for \(F_l\subset [0,1]^2\), define

$$\begin{aligned} \Xi (F_l)=\{y\in [0,1]: m(F_l^y)>0\}. \end{aligned}$$

The following lemma provides sufficient conditions for a functional of the general form we wish to optimise to not uniformly vanish. The general objective has the form \({\mathcal {J}}({\dot{k}})=\int \int {\dot{k}}(x,y){\mathcal {E}}(x,y)\ \mathrm{d}y\ \mathrm{d}x\); in our first specific objective (optimising response of expectations) we put \({\mathcal {E}}(x,y)=((\text {Id}-L_0^*)^{-1}c)(x)\cdot f_0(y)\) and in our second specific objective (optimising mixing) we put \({\mathcal {E}}(x,y)=E(x,y)\) from (18). Let \({\mathcal {E}}^+\) and \({\mathcal {E}}^-\) denote the positive and negative parts of \({\mathcal {E}}\). For \(y\in \Xi (F_l)\), let \(A(y)=\int _{F_l^y} {\mathcal {E}}^+(x,y)\ \mathrm{d}x\) and \(a(y)=\int _{F_l^y} {\mathcal {E}}^-(x,y)\ \mathrm{d}x\).

Lemma 5.2

Assume that there is \(\Xi '\subset \Xi (F_l)\) such that \(m(\Xi ')>0\) and \(A(y),a(y)>0\) for \(y\in \Xi '\). Then there is a \({\dot{k}}\in P_l\) such that \({\mathcal {J}}({\dot{k}})>0\).

Proof

For \(y\in \Xi (F_l)\), set \({\dot{k}}(x,y)={\mathbf {1}}_{F_l^y}(x)\left( a(y){\mathcal {E}}^+(x,y)-A(y){\mathcal {E}}^-(x,y)\right) \). To show \({\dot{k}}\in P_l\) we need to check that (i) the support of \({\dot{k}}\) is contained in \(F_l\) and (ii) \(\int _{F_l^y} {\dot{k}}(x,y)\ \mathrm{d}x=0\) for a.e. \(y\in \Xi (F_l)\); these points show \({\dot{k}}\in S_{k_0,l}\cap V_{\ker }\) and by trivial scaling we may obtain \({\dot{k}}\in B_1\). Item (i) is obvious from the definition of \({\dot{k}}\). For item (ii) we compute

$$\begin{aligned}&\int _{F_l^y} {\dot{k}}(x,y)\ \mathrm{d}x=\int _{F_l^y}(a(y){\mathcal {E}}^+(x,y)-A(y){\mathcal {E}}^-(x,y))\ \mathrm{d}x\\&=a(y)A(y)-A(y)a(y)=0. \end{aligned}$$

Finally, we check that \({\mathcal {J}}({\dot{k}})>0\). One has

$$\begin{aligned}&\int _{F_l} {\dot{k}}(x,y){\mathcal {E}}(x,y)\ \mathrm{d}x\ \mathrm{d}y\\&\quad =\int _{F_l} \left( a(y){\mathcal {E}}^+(x,y)-A(y){\mathcal {E}}^-(x,y)\right) \cdot {\mathcal {E}}(x,y)\ \mathrm{d}x\ \mathrm{d}y\\&\quad =\int _{F_l} a(y)({\mathcal {E}}^+(x,y))^2+A(y)({\mathcal {E}}^-(x,y))^2\ \mathrm{d}x\ \mathrm{d}y\\&\quad =\int _{\Xi (F_l)}\left[ \left( \int _{F_l^y} {\mathcal {E}}^-(x,y)\ \mathrm{d}x\right) \cdot \left( \int _{F_l^y} ({\mathcal {E}}^+(x,y))^2\ \mathrm{d}x\right) \right. \\&\qquad \left. +\left( \int _{F_l^y} {\mathcal {E}}^+(x,y)\ \mathrm{d}x\right) \cdot \left( \int _{F_l^y} ({\mathcal {E}}^-(x,y))^2\ \mathrm{d}x\right) \right] \ \mathrm{d}y. \end{aligned}$$

This final expression is positive by the hypotheses of the lemma. \(\square \)

Remark 5.3

We note that in the situation where \({\mathcal {E}}(x,y)\) is in separable form \({\mathcal {E}}(x,y)=h_1(x)h_2(y)\)—as in the case of optimising the derivative of the expectation of an observable c, and in the case of optimising the derivative of a real eigenvalue—then \(A(y)=h_2(y)\int _{F_l^y} h_1^+(x)\ \mathrm{d}x\) and \(a(y)=h_2(y)\int _{F_l^y} h_1^-(x)\ \mathrm{d}x\). Because \(h_2=f_0\) and \(h_2=e\) are not the zero function, and \(h_1=(\text {Id}-L_0^*)^{-1}c\) and \(h_1={\hat{e}}\) are both nontrivial signed functions, the conditions of Lemma 5.2 are relatively easy to satisfy.

5.1 Maximising the Expectation of an Observable

In this subsection, we provide an explicit formula for the optimal kernel perturbation to increase the expectation of an observation function c by the greatest amount. Since the objective function in (17) is linear in \({\dot{k}}\), a maximum will occur on \(\partial B_1\cap V_{\ker }\cap S_{k_0,l}\) (i.e. we only need to consider the optimisation over the unit sphere and not the unit ball). Thus, we consider the following reformulation of General Problem 1:

Problem A

Given \(l > 0 \) and \(c\in \) span\(\{f_0\}^\perp \), solve

$$\begin{aligned} \min _{{\dot{k}}\in V_{\ker }\cap S_{k_0,l}}&-\big \langle c,R({\dot{k}}) \big \rangle _{L^{2}([0,1],{{\mathbb {R}}})} \end{aligned}$$
(26)
$$\begin{aligned} \text{ subject } \text{ to }&\Vert {\dot{k}}\Vert _{L^{2}([0,1]^{2})}^{2}-1=0. \end{aligned}$$
(27)

Our first main result is:

Theorem 5.4

Let \(L_0:L^2\rightarrow L^2\) be an integral operator with the stochastic kernel \(k_0\in L^2([0,1]^2)\). Suppose that \(L_0\) satisfies (A1) of Theorem 2.2 and that there is a \(\Xi '\subset \Xi (F_l)\) with \(m(\Xi ')>0\) and \(f_0(y)>0, \int _{F_l^y} ((\text {Id}-L_0^*)^{-1}c)^+(x)\ \mathrm{d}x>0\), and \(\int _{F_l^y} ((\text {Id}-L_0^*)^{-1}c)^-(x)\ \mathrm{d}x>0\) for \(y\in \Xi '\). Then the unique solution to Problem A is

$$\begin{aligned} {\dot{k}}(x,y)= {\left\{ \begin{array}{ll} \frac{f_0(y)}{\alpha }\left( ((\text {Id}-L_{0}^{*})^{-1}c)(x)-\frac{ \int _{F_l^y}((\text {Id}-L_{0}^{*})^{-1}c)(z)\mathrm{d}z}{m(F_l^y)} \right) &{} \quad (x,y)\in F_l, \\ 0 &{} \quad \text {otherwise}, \end{array}\right. } \end{aligned}$$
(28)

where \(\alpha >0\) is selected so that \(\Vert {\dot{k}} \Vert _{L^2([0,1]^2)}=1\). Furthermore, if \(c\in W:=\) span\(\{f_0\}^\perp \cap L^\infty \), \(k_0\in L^\infty ([0,1]^2)\), and \(k_0\) is such that \( L_0:L^1\rightarrow L^1\) is compact, then \({\dot{k}}\in L^\infty ([0,1]^2)\).

Proof

See Appendix A. \(\square \)

Note that the expression for the optimal perturbation \({\dot{k}}\) in (28) depends only on \(k_0\) and c. This is in part a consequence of the fact that the linear response formula (12) depends only on the first-order term \({\dot{k}}\) (the “direction” of the perturbation) in the expansion of \(k_\delta \). Thus, in order to find the unique perturbation that optimises our linear response, we seek the best “direction” for the perturbation. Similar comments hold for our other three optimal linear perturbation results in later sections.
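The expression (28) is directly implementable. The following purely illustrative numerical sketch (not part of the paper's development: a hypothetical toy stochastic kernel, a midpoint-rule discretisation, NumPy, and a pseudo-inverse standing in for the resolvent \((\text {Id}-L_0^*)^{-1}\) on span\(\{f_0\}^\perp \)) builds the optimal perturbation and verifies that it lies in \(V_{\ker }\), has unit norm, and yields a positive response:

```python
import numpy as np

n = 200
dx = 1.0 / n
x = (np.arange(n) + 0.5) * dx                       # midpoint grid on [0, 1]
X, Y = np.meshgrid(x, x, indexing="ij")             # K[i, j] ~ k(x_i, y_j)

# Hypothetical stochastic kernel: each column integrates to 1 in x.
K0 = 1.0 + 0.3 * np.cos(2 * np.pi * X) * np.cos(2 * np.pi * Y)
L0T = K0.T * dx                                     # discretised adjoint L0^*

f0 = np.ones(n)                                     # invariant density of this kernel
c = np.cos(2 * np.pi * x)                           # observable with <c, f0> = 0

# h1 = (Id - L0^*)^{-1} c via a pseudo-inverse (the operator is singular
# along constants; c lies in its range, so this is well defined)
h1 = np.linalg.pinv(np.eye(n) - L0T) @ c

# optimal perturbation (28) on F_l = {k0 >= l} (here: all of [0,1]^2)
Fl = K0 >= 0.5
colmass = Fl.sum(axis=0) * dx                       # m(F_l^y)
colmean = np.where(Fl, h1[:, None], 0.0).sum(axis=0) * dx / colmass
kdot = np.where(Fl, f0[None, :] * (h1[:, None] - colmean[None, :]), 0.0)
kdot /= np.sqrt((kdot ** 2).sum() * dx * dx)        # ||kdot||_{L^2} = 1
```

For this separable toy example the response reduces to a variance-type term in \(h_1\), so its positivity is immediate.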

Remark 5.5

In certain situations we may wish to make non-infinitesimal perturbations \(k_\delta := k_0 + \delta \cdot {\dot{k}}\) that remain stochastic for small \(\delta >0\). If \({\dot{k}}\in L^\infty ([0,1]^2)\cap V_{\ker }\cap S_{k_0,l}\), then clearly \(k_\delta \) satisfies \(\int k_\delta (x,y) \mathrm{d}x =1\) for a.e. y. Also, as we only perturb at values where \(k_0\ge l>0 \), and since \({\dot{k}}\) is essentially bounded, there exists a \({\bar{\delta }}>0\) such that \(k_\delta \ge 0\) a.e. for all \( \delta \in (0,{{\bar{\delta }}})\). In summary, for \(\delta \in (0,{\bar{\delta }})\), \(k_\delta \) is a stochastic kernel.

The compactness condition on \(L_0:L^1\rightarrow L^1\) required for essential boundedness of \({\dot{k}}\) can be addressed as follows. A criterion for \(L_0\) to be compact on \(L^1([0,1])\) is the following (see Eveson 1995): Given \( \varepsilon >0\) there exists \(\beta >0\) such that for a.e. \(y\in [0,1]\) and \( \gamma \in {{\mathbb {R}}}\) with \(|\gamma |<\beta \),

$$\begin{aligned} \int _{{{\mathbb {R}}}}\big |{\tilde{k}}(x+\gamma ,y)-{\tilde{k}}(x,y)\big | \mathrm{d}x<\varepsilon , \end{aligned}$$

where \({\tilde{k}}:{\mathbb {R}}\times [0,1]\rightarrow {\mathbb {R}}\) is defined by

$$\begin{aligned} {\tilde{k}}(x,y) = {\left\{ \begin{array}{ll} k_0(x,y) &{} \quad x\in [0,1], \\ 0 &{} \quad \text {otherwise}. \end{array}\right. } \end{aligned}$$

A class of kernels satisfying this criterion consists of the essentially bounded kernels \(k_0:[0,1]\times [0,1]\rightarrow {{\mathbb {R}}}\) that are uniformly continuous in the first coordinate; this class arises naturally in our dynamical systems setting.
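Eveson's translation-continuity criterion is easy to probe numerically. The following sketch (a hypothetical kernel, Lipschitz in its first coordinate, extended by zero outside [0, 1]) computes the \(L^1\) translation modulus of \({\tilde{k}}\); it vanishes at \(\gamma =0\) and shrinks linearly with the shift size:

```python
import numpy as np

n = 1000
xs = np.linspace(-1.0, 2.0, 3 * n)          # covers supp(k_tilde) and its shifts
h = xs[1] - xs[0]

def k_tilde(x, y):
    """Hypothetical kernel, Lipschitz in x, extended by zero outside [0, 1]."""
    return np.where((x >= 0) & (x <= 1),
                    1.0 + 0.5 * np.cos(2 * np.pi * (x - y)), 0.0)

def shift_modulus(gamma, y):
    """The L^1 translation modulus appearing in Eveson's criterion."""
    return np.abs(k_tilde(xs + gamma, y) - k_tilde(xs, y)).sum() * h
```

The jump of \({\tilde{k}}\) at the endpoints of [0, 1] contributes only a term of order \(\gamma \), so the criterion holds for such kernels as well.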

5.2 Maximally Increasing the Mixing Rate

Let \(\lambda _0\in {{\mathbb {C}}}\) denote a geometrically simple eigenvalue of \(L_0\) strictly inside the unit circle and e and \({\hat{e}}\) denote the corresponding eigenvectors of \(L_0\) and \(L_0^*\), respectively. Our results concerning optimal rate of movement of \(\lambda _0\) under system perturbation work for any \(\lambda _0\) as above, but eigenvalues of largest magnitude inside the unit circle have the additional significance of controlling the exponential rate of mixing. We therefore primarily focus on these eigenvalues, and in this section, we consider again the linear response problem for enhancing the rate of mixing, now providing explicit formulae for optimal perturbations and the response.

Since we are again interested in kernel perturbations that will ensure that the perturbed kernel \(k_\delta \) is nonnegative, we consider the constraint set \(P_l\), as in Sect. 4.1, where \(0<l<1\). The objective function of (22) is linear and therefore, we only need to consider the optimisation problem on \(V_{\ker }\cap S_{k_0,l}\cap \partial B_1\). Thus, to obtain the perturbation \({\dot{k}}\) that will enhance the mixing rate, we solve the following optimisation problem:

Problem B

Given \(l > 0\), solve

$$\begin{aligned} \min _{{\dot{k}}\in V_{\ker }\cap S_{k_0,l}}&\big \langle {\dot{k}},E\big \rangle _{L^{2}([0,1]^{2},{{\mathbb {R}}})} \end{aligned}$$
(29)
$$\begin{aligned} \text{ such } \text{ that }&\Vert {\dot{k}}\Vert _{L^{2}([0,1]^{2},{{\mathbb {R}}})}^{2}-1=0, \end{aligned}$$
(30)

where E is defined in (18).

Theorem 5.6

Let \(L_0:L^2([0,1],{{\mathbb {C}}})\rightarrow L^2([0,1],{{\mathbb {C}}})\) be an integral operator with the stochastic kernel \(k_0\in L^2([0,1]^2,{{\mathbb {R}}})\). Suppose that \(L_0\) satisfies (A1) of Theorem 2.2 and that there is a \(\Xi '\subset \Xi (F_l)\) with \(m(\Xi ')>0\), and \(\int _{F_l^y} E(x,y)^+\ \mathrm{d}x>0\) and \(\int _{F_l^y} E(x,y)^-\ \mathrm{d}x>0\) for \(y\in \Xi '\). Then, the unique solution to Problem B is

$$\begin{aligned} {\dot{k}}(x,y)= {\left\{ \begin{array}{ll} \frac{1}{\alpha }\left( \frac{1}{m(F_l^y)}\int _{F_l^y} E(x,y)\mathrm{d}x - E(x,y)\right) &{}\quad (x,y)\in F_l\\ 0 &{}\quad \text {otherwise}, \end{array}\right. } \end{aligned}$$
(31)

where E is given in (18) and \(\alpha >0\) is selected so that \(\Vert {\dot{k}}\Vert _{L^2([0,1]^2,{{\mathbb {R}}})}=1\). Furthermore, if \(k_0\in L^\infty ([0,1]^2,{{\mathbb {R}}})\) then \({\dot{k}}\in L^\infty ([0,1]^2,{{\mathbb {R}}})\).

Proof

See Appendix B. \(\square \)

If \(\lambda _0\) is real, the optimal kernel has a simpler form:

Corollary 5.7

If \(\lambda _{0}\) is real and \( k_0\ge l\), then the solution to Problem B is

$$\begin{aligned} {\dot{k}}(x,y) = sgn(\lambda _0) \frac{e(y)}{\Vert e\Vert _2}\left( \frac{\langle {\hat{e}},{\mathbf {1}}\rangle _{L^2([0,1],{{\mathbb {R}}})}{\mathbf {1}} - {\hat{e}}(x)}{\Vert \langle {\hat{e}},{\mathbf {1}}\rangle _{L^2([0,1],{{\mathbb {R}}})} {\mathbf {1}}-{\hat{e}}\Vert _2}\right) . \end{aligned}$$
(32)

Proof

We have \(E(x,y) = \lambda _0{\hat{e}}(x)e(y)\); thus, the solution to the optimisation problem (29)–(30) is

$$\begin{aligned} {\dot{k}}(x,y) = (\lambda _{0}/\alpha )\left( \int _0^1 {\hat{e}}(x)\mathrm{d}x - {\hat{e}} (x)\right) e(y), \end{aligned}$$

where \(\alpha >0\) is the normalisation constant such that \(\Vert {\dot{k}} \Vert _{L^2([0,1]^2,{{\mathbb {R}}})}^2=1\). \(\square \)

6 Linear Response for Map Perturbations

In this section, we consider random dynamics governed by the composition of a deterministic map \( T_{\delta }\), \(\delta \in [0,{\bar{\delta }})\), and additive i.i.d. stochastic perturbations, or “additive noise”. We will assume that the noise is distributed according to a certain Lipschitz kernel \(\rho \) and impose a reflecting boundary condition that ensures that the dynamics remain in the interval [0, 1]. More precisely, we consider a random dynamical system whose trajectories are given by

$$\begin{aligned} x_{n+1}=T_{\delta }(x_{n})\ \hat{+}\ \omega _{n}, \end{aligned}$$
(33)

where \(\hat{+}\) is the “boundary reflecting” sum, defined by \(a\hat{+}b:=\pi (a+b)\), and \(\pi :{\mathbb {R}}\rightarrow [0,1]\) is the piecewise linear map \(\pi (x)=\min _{i\in {\mathbb {Z}}}|x-2i|\). We assume throughout that

(T1) \(T_{\delta }:[0,1]\rightarrow [0,1]\) is a Borel-measurable map for each \(\delta \in [0,{{\bar{\delta }}})\);

(T2) \(\omega _{n}\) is an i.i.d. process distributed according to a probability density \(\rho \in Lip({\mathbb {R}})\), supported on \([-1,1]\), with Lipschitz constant K.
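The folding map \(\pi \) and the reflecting sum \(\hat{+}\) admit a one-line implementation, using the equivalent formula \(\pi (x)=\min (x\bmod 2,\,2-(x\bmod 2))\) (the distance from x to the nearest even integer); a minimal sketch:

```python
import numpy as np

def reflect(x):
    """pi(x) = min_{i in Z} |x - 2i|: distance from x to the nearest even
    integer, which folds the real line onto [0, 1] by reflection."""
    r = np.mod(x, 2.0)
    return np.minimum(r, 2.0 - r)

def reflected_sum(a, b):
    """The boundary-reflecting sum a +^ b appearing in (33)."""
    return reflect(a + b)
```

For example, a noise kick past the right endpoint is reflected back: \(0.9\ \hat{+}\ 0.3 = \pi (1.2) = 0.8\).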

6.1 Expressing the Map Perturbation as a Kernel Perturbation

In this subsection we describe precisely the kernel of the transfer operator of the system (33). Associated with the process (33) is an integral-type transfer operator \(L_\delta \), which we will derive (following the method of §10.5 in Lasota and Mackey 1985). Noting that \(|\pi '(z)|=1\) for all \(z\in {\mathbb {R}}\), the Perron-Frobenius operator \(P_\pi :L^1({\mathbb {R}})\rightarrow L^1([0,1])\) associated to the map \(\pi \) is given by

$$\begin{aligned} P_\pi f(x) = \sum _{z\in \pi ^{-1}(x)}f(z)=\sum _{i\in 2{\mathbb {Z}}}(f(i+x)+f(i-x)). \end{aligned}$$
(34)

For \(b\in {{\mathbb {R}}}\) consider the shift operator \(\tau _b\) defined by \((\tau _b g)(y):=g(y+b)\) for \(g\in Lip({\mathbb {R}})\). For the process (33), suppose that \(x_n\) has the distribution \(f_n:[0,1]\rightarrow {\mathbb {R}}^+\) (i.e. \(f_n\in L^1,\ f_n\ge 0\) and \(\int f_n\ \mathrm{d}m =1\)). We note that \(T_\delta (x_n)\) and \(\omega _{n}\) are independent and thus the joint density of \((x_n,\omega _n)\in [0,1]\times [-1,1]\) is \(f_n\cdot \rho \). Let \(h:[0,1]\rightarrow {{\mathbb {R}}}\) be a bounded, measurable function and let \({\mathbb {E}}\) denote expectation with respect to Lebesgue measure; we then compute

$$\begin{aligned} \begin{aligned} {\mathbb {E}}(h(x_{n+1}))&= \int _{-\infty }^{\infty }\int _0^1 h(\pi (T_\delta (y)+z))f_n(y)\rho (z)\mathrm{d}y\mathrm{d}z\\&= \int _0^1\int _{-\infty }^{\infty } h(\pi (z'))f_n(y)\rho (z'-T_\delta (y))\mathrm{d}z'\mathrm{d}y\\&= \int _0^1 f_n(y)\int _{-\infty }^{\infty } h(\pi (z'))(\tau _{-T_\delta (y)}\rho )(z')\mathrm{d}z'\mathrm{d}y\\&= \int _0^1 f_n(y)\int _{0}^{1} h(z') (P_\pi \tau _{-T_\delta (y)}\rho )(z')\mathrm{d}z' \mathrm{d}y, \end{aligned} \end{aligned}$$

where the last equality follows from the duality of the Perron-Frobenius and the Koopman operators for \(\pi \). Since \({\mathbb {E}}(h(x_{n+1})) = \int _{0}^1 h(x) f_{n+1}(x)\mathrm{d}x\), and h is arbitrary, the map \(f_n\mapsto f_{n+1}\) is given by

$$\begin{aligned} f_{n+1}(z')=\int _0^1 (P_\pi \tau _{-T_\delta (y)}\rho )(z') f_n(y)\mathrm{d}y \end{aligned}$$

for all \(z'\in [0,1]\). Thus, for \(\delta \in [0,{\bar{\delta }})\) the integral operator \(L_\delta :L^2([0,1])\rightarrow L^2([0,1])\) associated to the process (33) is given by

$$\begin{aligned} L_{\delta }f(x)=\int k_{\delta }(x,y)f(y)\mathrm{d}y, \end{aligned}$$
(35)

where

$$\begin{aligned} k_{\delta }(x,y)=(P_\pi \tau _{-T_\delta (y)}\rho )(x) \end{aligned}$$
(36)

and \(x,y\in [0,1]\).

Lemma 6.1

The kernel (36) is a stochastic kernel in \(L^\infty ([0,1]^2)\).

Proof

Nonnegativity of \(k_\delta \) and the normalisation \(\int k_\delta (x,y)\ \mathrm{d}x=1\) follow from the corresponding properties of the density \(\rho \), because Perron-Frobenius operators preserve nonnegativity and integrals. Essential boundedness of \(k_\delta \) follows from the facts that \(\rho \) is Lipschitz (hence essentially bounded), \(\tau \) is a shift, and \(P_\pi \) reduces to a finite sum because \(\rho \) has compact support. \(\square \)
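The kernel (36) can be assembled numerically: because \(\rho \) is supported on \([-1,1]\) and \(x,T_\delta (y)\in [0,1]\), only the reflections with \(i\in \{-2,0,2\}\) can contribute to the sum defining \(P_\pi \). The sketch below (a hypothetical tent-shaped density and a toy map) checks stochasticity \(\int k_\delta (x,y)\,\mathrm{d}x=1\) and nonnegativity on a grid:

```python
import numpy as np

def rho(z):
    """Hypothetical tent-shaped noise density on [-1, 1]."""
    return np.maximum(0.0, 1.0 - np.abs(z))

def kernel(T, x):
    """k(x, y) = (P_pi tau_{-T(y)} rho)(x): sum over the reflections i in 2Z
    whose images can meet supp(rho)."""
    Ty = T(x)                                  # map evaluated on the y-grid
    k = np.zeros((x.size, x.size))
    for i in (-2, 0, 2):
        k += rho(i + x[:, None] - Ty[None, :]) + rho(i - x[:, None] - Ty[None, :])
    return k

n = 400
dx = 1.0 / n
x = (np.arange(n) + 0.5) * dx
K = kernel(lambda y: np.mod(2 * y, 1.0), x)    # toy doubling map
```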

Proposition 6.2

Assume that \(k_{\delta }\) arising from the system \((T_{\delta },\rho )\) is given by (36). Suppose that the family of interval maps \(\{T_\delta \}_{\delta \in [0,{{\bar{\delta }}})}\) satisfies

$$\begin{aligned} T_{\delta }=T_{0}+\delta \cdot {\dot{T}} +t_\delta , \end{aligned}$$

where \({\dot{T}},t_\delta \in L^{2}\) and \(\Vert t_\delta \Vert _{2} =o(\delta )\). Then

$$\begin{aligned} k_{\delta }=k_{0}+\delta \cdot {\dot{k}}+r_\delta \end{aligned}$$

where \({\dot{k}}\in L^2([0,1]^2)\) is given by

$$\begin{aligned} {\dot{k}}(x,y)=-\left( P_\pi \left( \tau _{-T_0(y)}\frac{\mathrm{d}\rho }{\mathrm{d}x}\right) \right) (x)\cdot {\dot{T}}(y) \end{aligned}$$
(37)

and \(r_\delta \in L^{2}([0,1]^{2})\) satisfies \(\Vert r_\delta \Vert _{L^2([0,1]^2)}=o(\delta )\).

If additionally, \(\mathrm{d}\rho /\mathrm{d}x\) is Lipschitz and the derivative of the map \(\delta \mapsto T_\delta \) with respect to \(\delta \) varies continuously in \(L^2\) in a neighborhood of \(\delta =0\), then \(\delta \mapsto k_\delta \) has a continuous derivative with respect to \(\delta \) in a neighborhood of \(\delta =0\).

Proof

We show that \(\Vert k_{\delta }(x,y)-k_{0}(x,y) - \delta \cdot {\dot{k}}(x,y)\Vert _{L^2([0,1]^2)}=o(\delta )\), where \({\dot{k}}\) is as in (37). We have

$$\begin{aligned}&\left\| k_{\delta }(x,y)-k_{0}(x,y) - \delta \cdot {\dot{k}}(x,y)\right\| _{L^2([0,1]^2)}\nonumber \\&\quad \le \left\| (P_\pi \tau _{-T_\delta (y)}\rho )(x) - (P_\pi \tau _{-(T_0(y) +\delta \cdot {\dot{T}}(y))}\rho )(x)\right\| _{L^2([0,1]^2)}\nonumber \\&\qquad +\left\| (P_\pi \tau _{-(T_0(y)+\delta \cdot {\dot{T}}(y))}\rho )(x)\right. \nonumber \\&\left. - (P_\pi \tau _{-T_0(y)}\rho )(x) - \delta \left( -\left( P_\pi \left( \tau _{-T_0(y)}\frac{\mathrm{d}\rho }{\mathrm{d}x}\right) \right) (x)\cdot {\dot{T}}(y)\right) \right\| _{L^2([0,1]^2)}. \end{aligned}$$
(38)

We begin by showing that the first term on the right hand side of (38) is \(o(\delta )\). Since \(\rho \) is Lipschitz with constant K, one has

$$\begin{aligned}&\big |(\tau _{-(T_\delta (y))}\rho )(x)-(\tau _{-(T_0(y)+\delta \cdot {\dot{T}}(y))}\rho )(x)\big |\nonumber \\&\quad = \big |\rho (x-T_\delta (y))-\rho (x-T_0(y)-\delta \cdot {\dot{T}}(y))\big |\le K|t_\delta (y)|. \end{aligned}$$
(39)

Because the support of \(\tau _{-(T_\delta (y))}\rho -\tau _{-(T_0(y)+\delta \cdot {\dot{T}}(y))}\rho \) is contained in 2 intervals, each of length 2, by (39) and Lemma C.1, we therefore see that

$$\begin{aligned} \left\| (P_\pi \tau _{-T_\delta (y)}\rho )(x) - (P_\pi \tau _{-(T_0(y)+\delta \cdot {\dot{T}}(y))}\rho )(x)\right\| _{L^2([0,1]^2)}\le 6K\Vert t_\delta \Vert _{L^2}=o(\delta ). \end{aligned}$$

Next we show that the second term on the right hand side of (38) is \(o(\delta )\). Using the definition of the derivative and the fact that \(\rho \) is differentiable a.e. we see that

$$\begin{aligned} \lim _{\delta \rightarrow 0}D(\delta )&:=\lim _{\delta \rightarrow 0}\left[ \frac{\rho (x-T_{0}(y)-\delta \cdot {\dot{T}} (y))-\rho (x-T_{0}(y))}{\delta }\right. \nonumber \\&\quad \left. -\left( -\frac{\mathrm{d}\rho }{\mathrm{d}x}(x-T_{0}(y))\dot{T }(y)\right) \right] =0 \end{aligned}$$
(40)

for a.e. \((x,y)\). Since \(\bigg |\frac{\rho (x-T_{0}(y)-\delta \cdot {\dot{T}} (y))-\rho (x-T_{0}(y))}{\delta }\bigg |\le K|{\dot{T}}(y)|\), by dominated convergence the limit (40) also converges in \(L^{2}.\) Hence, applying Lemma C.1 to the second term on the right hand side of (38), noting that \(D(\delta )\) in (40) is square-integrable and supported in at most 3 intervals of length at most 2, we obtain

$$\begin{aligned}&\left\| (P_\pi \tau _{-(T_0(y)+\delta \cdot {\dot{T}}(y))}\rho )(x) - (P_\pi \tau _{-T_0(y)}\rho )(x)\right. \\&\left. - \delta \left( -\left( P_\pi \left( \tau _{-T_0(y)}\frac{\mathrm{d}\rho }{\mathrm{d}x}\right) \right) (x)\cdot {\dot{T}}(y)\right) \right\| _{L^2([0,1]^2)}\\&\quad \le 9\delta \Vert D(\delta )\Vert _{L^2}=o(\delta ). \end{aligned}$$

Regarding the final statement, suppose that \(\delta \mapsto T_\delta \) has a continuous derivative with respect to \(\delta \) in a neighborhood of \(\delta =0\). This implies that \({\dot{T}}\) exists and varies continuously on a small interval \([0,\delta ^*]\), with \(0<\delta ^*\le {\bar{\delta }}\). Denote the derivative \(\mathrm{d}T_\delta /\mathrm{d}\delta \) at \(\delta \) by \({\dot{T}}_\delta \), and similarly for \({\dot{k}}\). One has

$$\begin{aligned}&\Vert {\dot{k}}_\delta -{\dot{k}}_0\Vert _{L^2([0,1]^2)}\\&\quad =\left\| \left( P_\pi \left( \tau _{-T_\delta (y)}\frac{\mathrm{d}\rho }{\mathrm{d}x}\right) \right) (x) \cdot {\dot{T}}_\delta (y)-\left( P_\pi \left( \tau _{-T_0(y)}\frac{\mathrm{d}\rho }{\mathrm{d}x} \right) \right) (x)\cdot {\dot{T}}_0(y)\right\| _{L^2([0,1]^2)}\\&\quad \le \left\| \left( P_\pi \left( \tau _{-T_\delta (y)}\frac{\mathrm{d}\rho }{\mathrm{d}x}\right) \right) (x)\cdot ({\dot{T}}_\delta (y)-{\dot{T}}_0(y))\right\| _{L^2([0,1]^2)}\\&\qquad +\left\| \left[ \left( P_\pi \left( \tau _{-T_\delta (y)}\frac{\mathrm{d}\rho }{\mathrm{d}x}\right) \right) (x)-\left( P_\pi \left( \tau _{-T_0(y)}\frac{\mathrm{d}\rho }{\mathrm{d}x}\right) \right) (x) \right] \cdot {\dot{T}}_0(y)\right\| _{L^2([0,1]^2)}\\&\quad \le 3\Vert \mathrm{d}\rho /\mathrm{d}x\Vert _2\Vert {\dot{T}}_\delta -{\dot{T}}_0\Vert _2+6\mathrm{Lip}(\mathrm{d}\rho /\mathrm{d}x) \Vert \delta \cdot {\dot{T}}_0+r_\delta \Vert _2\Vert {\dot{T}}_0\Vert _2, \end{aligned}$$

where the final inequality follows from Lemma C.1 applied to each term in the previous line, noting that \(\rho \) is supported in a single interval of length 2. The first term in the final inequality goes to zero as \(\delta \rightarrow 0\) by continuity of \({\dot{T}}\), and the second term goes to zero as \(\delta \rightarrow 0\) since \(\Vert r_\delta \Vert _2\rightarrow 0\). \(\square \)
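The second-order smallness asserted in Proposition 6.2 can be observed directly. In the sketch below (a hypothetical \(C^1\) density \(\rho (z)=(1+\cos \pi z)/2\) on \([-1,1]\), a toy map \(T_0\) and direction \({\dot{T}}\), with \(t_\delta =0\)), the \(L^2\) remainder \(\Vert k_\delta -k_0-\delta \cdot {\dot{k}}\Vert \) with \({\dot{k}}\) from (37) scales quadratically in \(\delta \), hence is \(o(\delta )\); halving \(\delta \) divides it by roughly four:

```python
import numpy as np

def rho(z):
    """Hypothetical C^1 noise density on [-1, 1]: (1 + cos(pi z)) / 2."""
    return np.where(np.abs(z) <= 1, 0.5 * (1 + np.cos(np.pi * z)), 0.0)

def drho(z):
    return np.where(np.abs(z) <= 1, -0.5 * np.pi * np.sin(np.pi * z), 0.0)

def P_pi(g, x, Ty):
    """(P_pi tau_{-T(y)} g)(x) on the grid, over the relevant reflections."""
    out = np.zeros((x.size, Ty.size))
    for i in (-2, 0, 2):
        out += g(i + x[:, None] - Ty[None, :]) + g(i - x[:, None] - Ty[None, :])
    return out

n = 300
dx = 1.0 / n
x = (np.arange(n) + 0.5) * dx
T0 = 0.5 + 0.4 * np.sin(2 * np.pi * x)          # toy base map, values in [0.1, 0.9]
Tdot = np.cos(2 * np.pi * x)                    # perturbation direction

K0 = P_pi(rho, x, T0)
kdot = -P_pi(drho, x, T0) * Tdot[None, :]       # formula (37)

def remainder(delta):
    Kd = P_pi(rho, x, T0 + delta * Tdot)
    return np.sqrt(((Kd - K0 - delta * kdot) ** 2).sum() * dx * dx)

r1, r2 = remainder(0.02), remainder(0.01)
```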

6.2 A Formula for the Linear Response of the Invariant Probability Density and Continuity with Respect to Map Perturbations

By considering the kernel form of map perturbations, we can apply Corollary 3.5 to obtain the following.

Proposition 6.3

Let \(L_\delta :L^2\rightarrow L^2\), \(\delta \in [0,{\bar{\delta }})\), be the integral operators in (35) with the kernels \(k_\delta \) as in (36). Suppose that \(L_0\) satisfies (A1) of Theorem 2.2. Then the kernel \({\dot{k}}\) in (37) is in \(V_{\ker }\) and

$$\begin{aligned} \lim _{\delta \rightarrow 0}\frac{f_{\delta }-f_{0}}{\delta } = -(\text {Id}-L_{0})^{-1}\int _0^1 \left( P_\pi \left( \tau _{-T_0(y)}\frac{\mathrm{d}\rho }{\mathrm{d}x}\right) \right) (x) {\dot{T}}(y)f_{0}(y)\mathrm{d}y, \end{aligned}$$

with convergence in \(L^{2}.\)

Proof

The result is a direct application of Corollary 3.5; we verify its assumptions. From Lemma 6.1, \(k_\delta \in L^2([0,1]^2)\) is a stochastic kernel and so \(L_\delta \) is an integral-preserving compact operator. From Proposition 6.2, \(k_\delta \) has the form (9). Thus, we can apply Corollary 3.5 to obtain the result. \(\square \)

Remark 6.4

If \(T\) is covering and \(\rho \) is strictly positive in a neighbourhood of zero, one can show that the corresponding transfer operator \(L_0\) satisfies assumption (A1) of Theorem 2.2, using arguments similar to e.g. Zmarrou and Homburg (2007), Proposition 8.1, Froyland (2013), Lemmas 3 and 10, or Galatolo and Giulietti (2019), Lemma 41. Let \(f\in L^1\) have zero average: \(\int _{[0,1]} f=0\). If f is 0 almost everywhere, \(L_0^n(f)= 0\) and we are done. Otherwise, given \(\epsilon >0\), we can find an \(f_1\) such that \(\Vert f-f_1\Vert _1<\epsilon \) and \(f_1\) is positive in some small interval \(I\subset [0,1]\). Since \(\rho \) is positive in a neighbourhood of zero, \({\mathrm {supp}}(L_0(f_1^+))\supset T(I)\). By the covering condition there is some \(n'\in {{\mathbb {N}}}\) such that \({\mathrm {supp}}(L_0^{n'}(f_1^+))=[0,1]\). It is then standard to deduce that there is an \(n_0\ge n'\) such that \(\Vert L_0^n(f_1)\Vert _1<\epsilon \) for \(n\ge n_0\). Since the transfer operator contracts the \(L^1\) norm, \(\Vert L_0^n f \Vert _1 \le 2 \epsilon \) for \(n\ge n_0\), and since \(\epsilon \) was arbitrary, this implies that \(L_0\) satisfies (A1).

Let the linear response \({\widehat{R}}:L^2\rightarrow L^2\) of the invariant density be defined as

$$\begin{aligned} {\widehat{R}}({\dot{T}}):= -(\text {Id}-L_{0})^{-1}\int _0^1 \left( P_\pi \left( \tau _{-T_0(y)}\frac{\mathrm{d}\rho }{\mathrm{d}x}\right) \right) (x) {\dot{T}}(y)f_{0}(y)\mathrm{d}y. \end{aligned}$$
(41)

Lemma 6.5

The function \({\widehat{R}}:L^2\rightarrow L^2\) is continuous.

Proof

We have

$$\begin{aligned} {\widehat{R}}({\dot{T}}_1)-{\widehat{R}}({\dot{T}}_2)= -(\text {Id}-L_0)^{-1}\int _0^1 {\tilde{k}}(x,y)\left( {\dot{T}}_1(y)-{\dot{T}}_2(y)\right) \mathrm{d}y, \end{aligned}$$

where \({\tilde{k}}(x,y) := \left( P_\pi \left( \tau _{-T_0(y)}\frac{\mathrm{d}\rho }{\mathrm{d}x}\right) \right) (x)f_0(y)\). Since \(\frac{\mathrm{d}\rho }{\mathrm{d}x}\in L^\infty \), we have \(\left( P_\pi \left( \tau _{-T_0(y)}\frac{\mathrm{d}\rho }{\mathrm{d}x}\right) \right) (x)\in L^\infty ([0,1]^2)\). From inequality (7), we then have \(f_0\in L^\infty \) and so \({\tilde{k}}\in L^\infty ([0,1]^2)\). We finally have

$$\begin{aligned} \Vert {\widehat{R}}({\dot{T}}_1)-{\widehat{R}}({\dot{T}}_2)\Vert _2 \le l \Vert (\text {Id} -L_0)^{-1}\Vert _{V\rightarrow V} \Vert {\tilde{k}}\Vert _{L^2([0,1]^2)}\cdot \Vert {\dot{T}}_1-{\dot{T}}_2\Vert _2. \end{aligned}$$

\(\square \)

6.3 A Formula for the Linear Response of the Dominant Eigenvalues and Continuity with Respect to Map Perturbations

We are also able to express the linear response of the dominant eigenvalues as a function of the perturbing map \({\dot{T}}\). Define

$$\begin{aligned} H(y)=-{\bar{e}}(y)\int _0^1\left( P_\pi \left( \tau _{-T_0(y)}\frac{\mathrm{d}\rho }{\mathrm{d}x} \right) \right) (x){\hat{e}}(x)\mathrm{d}x. \end{aligned}$$

Proposition 6.6

Let \(L_\delta :L^2([0,1],{{\mathbb {C}}})\rightarrow L^2([0,1],{{\mathbb {C}}})\), \(\delta \in [0,{\bar{\delta }})\), be integral operators generated by the kernels \(k_\delta \) as in (36), assume that \(\mathrm{d}\rho /\mathrm{d}x\) is Lipschitz and \(\delta \mapsto T_\delta \) is \(C^1\). Let \(\lambda _{\delta }\) be an eigenvalue of \(L_\delta \) with second largest magnitude strictly inside the unit disk. Suppose that \(L_{0}\) satisfies (A1) of Theorem 2.2 and \(\lambda _{0}\) is geometrically simple. Then

$$\begin{aligned} \frac{\mathrm{d}\lambda _{\delta }}{\mathrm{d}\delta }\bigg |_{\delta =0} = \langle H, {\dot{T}} \rangle _{L^2([0,1],{\mathbb {C}})}, \end{aligned}$$
(42)

where e is the eigenvector of \(L_0\) associated to the eigenvalue \( \lambda _{0} \) and \({\hat{e}}\) is the eigenvector of \(L_0^*\) associated to the eigenvalue \(\lambda _{0}\).

Proof

Since \(k_\delta \in L^2([0,1]^2,{{\mathbb {R}}})\), \(L_\delta :L^2([0,1],{{\mathbb {C}}})\rightarrow L^2([0,1],{{\mathbb {C}}})\) is compact. From Lemma 6.1 we have that \(k_\delta \) is a stochastic kernel and so \(L_\delta \) preserves the integral (i.e. it satisfies (3)). By Proposition 6.2 the kernel \(k_\delta \) is in the form (9) and the map \(\delta \mapsto k_\delta \) is \(C^1\). By Lemma 3.4 we see that \(\delta \mapsto L_\delta \) is \(C^1\), where the derivative operator \({\dot{L}}\) is the integral operator with the kernel \({\dot{k}}\). Using the assumption that \(L_0\) is mixing and \(\lambda _{0}\) is geometrically simple, we apply Proposition 2.6 to obtain \(\frac{\mathrm{d}\lambda _{\delta }}{\mathrm{d}\delta }\big |_{\delta =0} = \langle {\hat{e}}, {\dot{L}} e\rangle _{L^2([0,1],{\mathbb {C}})}\). Finally, we compute

$$\begin{aligned} \begin{aligned} \langle {\hat{e}}, {\dot{L}}e\rangle _{L^2([0,1],{\mathbb {C}})}&= \int _0^1{\hat{e}}(x)\overline{\int _0^1 {\dot{k}}(x,y)e(y)\mathrm{d}y}\mathrm{d}x\\&= \int _0^1\int _0^1{\hat{e}}(x){\dot{k}}(x,y){\bar{e}}(y)\mathrm{d}x\mathrm{d}y\\&= -\int _0^1 {\bar{e}}(y)\int _0^1\left( P_\pi \left( \tau _{-T_0(y)}\frac{\mathrm{d}\rho }{\mathrm{d}x}\right) \right) (x){\hat{e}}(x)\mathrm{d}x\ {\dot{T}}(y)\mathrm{d}y\\&= \langle H, {\dot{T}}\rangle _{L^2([0,1],{\mathbb {C}})}. \end{aligned} \end{aligned}$$

\(\square \)
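Formula (42) also agrees with a brute-force finite difference. The sketch below is a hypothetical setup: a narrow \(C^1\) noise bump (supported well inside \([-1,1]\)) and the identity base map, a reflected random walk whose subleading eigenvalues are real, simple and well separated. The response \(\langle H,{\dot{T}}\rangle \) is evaluated in its equivalent left/right-eigenvector form \(\langle {\hat{e}},{\dot{L}}e\rangle /\langle {\hat{e}},e\rangle \) and compared with a Richardson-extrapolated finite difference of \(\lambda _\delta \):

```python
import numpy as np

EPS = 0.3                                        # noise bump half-width

def rho(z):
    """Hypothetical C^1 density supported on [-EPS, EPS], inside [-1, 1]."""
    return np.where(np.abs(z) <= EPS, (1 + np.cos(np.pi * z / EPS)) / (2 * EPS), 0.0)

def drho(z):
    return np.where(np.abs(z) <= EPS,
                    -np.pi * np.sin(np.pi * z / EPS) / (2 * EPS ** 2), 0.0)

def P_pi(g, x, Ty):
    out = np.zeros((x.size, Ty.size))
    for i in (-2, 0, 2):
        out += g(i + x[:, None] - Ty[None, :]) + g(i - x[:, None] - Ty[None, :])
    return out

n = 200
dx = 1.0 / n
x = (np.arange(n) + 0.5) * dx
T0 = x.copy()                                    # base map: identity
Tdot = np.sin(np.pi * x)                         # perturbation direction

A0 = P_pi(rho, x, T0) * dx
Adot = -P_pi(drho, x, T0) * Tdot[None, :] * dx   # derivative of the matrix family

ev, V = np.linalg.eig(A0)
order = np.argsort(-np.abs(ev))
lam0, e = ev[order[1]], V[:, order[1]]           # second eigenvalue, eigenvector
evL, W = np.linalg.eig(A0.conj().T)
ehat = W[:, np.argmin(np.abs(evL - np.conj(lam0)))]

lam_dot = (ehat.conj() @ Adot @ e) / (ehat.conj() @ e)   # predicted response (42)

def lam(delta):
    evd = np.linalg.eigvals(P_pi(rho, x, T0 + delta * Tdot) * dx)
    return evd[np.argmin(np.abs(evd - lam0))]

d = 5e-5
fd1 = (lam(d) - lam0) / d
fd2 = (lam(d / 2) - lam0) / (d / 2)
```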

From (42), the linear response of the dominant eigenvalues is continuous with respect to map perturbations.

Lemma 6.7

The eigenvalue response function \({\check{R}}:L^2\rightarrow {\mathbb {C}}\) given by \({\check{R}}({\dot{T}})=\langle H,{\dot{T}}\rangle \) is continuous.

Proof

This follows from Cauchy-Schwarz and the fact that \(H\in L^2([0,1],{{\mathbb {C}}})\); the latter claim follows from the fact that \(\left( P_\pi \left( \tau _{-T_0(y)}\frac{\mathrm{d}\rho }{\mathrm{d}x}\right) \right) (x)\in L^\infty ([0,1]^2,{{\mathbb {R}}})\) (see proof of Lemma 6.5) and that \(e,{\hat{e}}\in L^\infty ([0,1],{{\mathbb {C}}})\) (which follows from (7) and the fact that \(k_0\in L^\infty ([0,1]^2,{{\mathbb {R}}})\), see Lemma 6.1). \(\square \)

7 Optimal Linear Response for Map Perturbations

In this section, we derive formulae for the map perturbations that maximise our two types of linear response. We begin by formalising the set of allowable map perturbations then state the formulae.

7.1 The Feasible Set of Map Perturbations

Before we formulate the optimisation problem, we note that in this setting, we require some restriction on the space of allowable perturbations to \(T_0\) if we are to interpret \(T_0+\delta {\dot{T}}\) as a map of the unit interval for some \(\delta \) strictly greater than 0 (a non-infinitesimal map perturbation). With this in mind, let \(\ell >0\) and \({\widetilde{F}}_\ell :=\{x\in [0,1]:\ell \le T_0(x)\le 1-\ell \}\); it will turn out that we obtain \({\dot{T}}\in L^\infty \) for free. Note that in principle, \(\ell >0\) can be taken as small as one likes, and indeed if one wishes to consider only infinitesimal map perturbations \({\dot{T}}\) then one may set \({\widetilde{F}}_\ell ={\widetilde{F}}_0=[0,1]\). Of course if \(T:S^1\rightarrow S^1\) then one may use \({\widetilde{F}}_\ell ={\widetilde{F}}_0=[0,1]\) even for non-infinitesimal perturbations. Recalling that in Proposition 6.2 we are considering \(L^2\) perturbations \({\dot{T}}\) of the map \(T_0\), we define

$$\begin{aligned} S_{T_0,\ell }:= \{T\in L^2: \text {supp}(T)\subseteq {\widetilde{F}}_\ell \}. \end{aligned}$$
(43)

Lemma 7.1

\(S_{T_0,\ell }\) is a closed subspace of \(L^2\).

Proof

It is clear that \(S_{T_0,\ell }\) is a subspace. To show it is closed, let \(\{f_n\}\subset S_{T_0,\ell }\) and suppose that \(f_n\rightarrow _{L^2} f\in L^2\). Further, suppose that \({\widetilde{F}}_\ell \) is not [0, 1] up to measure zero; otherwise \(S_{T_0,\ell }=L^2\), which is closed. Then, we have

$$\begin{aligned} \Vert f_n-f\Vert _2^2=\int _{{\widetilde{F}}_\ell }(f_n(x)-f(x))^2\mathrm{d}x + \int _{{\widetilde{F}}_\ell ^c}f(x)^2\ \mathrm{d}x\rightarrow 0. \end{aligned}$$

If \(\int _{{\widetilde{F}}_\ell ^c}f(x)^2\mathrm{d}x>0\), we obtain a contradiction since \(\int _{{\widetilde{F}}_\ell }(f_n(x)-f(x))^2\mathrm{d}x\ge 0\); thus, \(\int _{{\widetilde{F}}_\ell ^c}f(x)^2\mathrm{d}x=0\) and so \(f=0\) a.e. on \({\widetilde{F}}_\ell ^c\). Hence, \(S_{T_0,\ell }\) is closed. \(\square \)

For the remainder of this section, the set of allowable map perturbations that we consider is

$$\begin{aligned} P_\ell := S_{T_0,\ell }\cap B_1, \end{aligned}$$
(44)

where \(B_1\) is the unit ball in \(L^2\). Since \(S_{T_0,\ell }\) is a closed subspace of \(L^2\), it is itself a Hilbert space and so \(P_\ell \) is strictly convex. The following lemma concerns the existence of a perturbation \({\dot{T}}\) for which our objectives will be nonzero; that is, our objective \({\mathcal {J}}\) is not uniformly vanishing. Denote \({\mathcal {P}}(x,y):=P_\pi \left( \tau _{-T_0(y)}\frac{\mathrm{d}\rho }{\mathrm{d}x}\right) (x)\) and let

$$\begin{aligned} {\mathcal {J}}({\dot{T}}):=\int _{\Xi ({\widetilde{F}}_\ell )}\int _0^1 {\mathcal {P}}(x,y){\dot{T}}(y){\mathcal {E}}(x,y)\ \mathrm{d}x\ \mathrm{d}y \end{aligned}$$

be our objective. In our first specific objective (optimising response of expectations) we will insert \({\mathcal {E}}(x,y)=((\text {Id}-L_0^*)^{-1}c)(x)f_0(y)\) and in our second specific objective (optimising mixing) we will insert \({\mathcal {E}}(x,y)=E(x,y)\) from (18).

Lemma 7.2

Assume that there is an \(F'\subset {\widetilde{F}}_\ell \) such that \(m(F')>0\) and \({\mathcal {E}}(\cdot ,y)\notin {\mathrm {span}}\{{\mathcal {P}}(\cdot ,y)\}^\perp \) for all \(y\in F'\). Then there is a \({\dot{T}}\in P_\ell \) such that \({\mathcal {J}}({\dot{T}})>0\).

Proof

Because

$$\begin{aligned} {\mathcal {J}}({\dot{T}})=\int _{\Xi ({\widetilde{F}}_\ell )}{\dot{T}}(y) \left( \int _0^1{\mathcal {P}}(x,y){\mathcal {E}}(x,y)\ \mathrm{d}x\right) \ \mathrm{d}y, \end{aligned}$$

we may set \({\dot{T}}(y)=\int _0^1{\mathcal {P}}(x,y){\mathcal {E}}(x,y)\ \mathrm{d}x\) for \(y\in F'\) and \({\dot{T}}(y)=0\) otherwise to obtain \({\mathcal {J}}({\dot{T}})>0\). Trivial scaling yields \({\dot{T}}\in B_1\). \(\square \)

We expect the hypotheses of Lemma 7.2 to be satisfied “generically”.

7.2 Explicit Formula for the Optimal Map Perturbation that Maximally Increases the Expectation of an Observable

In this section, we consider the problem of finding the optimal map perturbation that maximises the expectation of some observable \(c\in L^2\). We first present a result that ensures a unique solution exists and then derive an explicit expression for the optimal map perturbation.

We begin by noting that \({\widehat{R}}({\dot{T}})\in V\); this follows from the fact that \(\left( P_\pi \left( \tau _{-T_0(y)}\frac{\mathrm{d}\rho }{\mathrm{d}x}\right) \right) (x)f_0(y)\in V_{\ker }\) (since \({\dot{k}}\in V_{\ker }\), see Proposition 6.3) and therefore \(\int _0^1 \left( P_\pi \left( \tau _{-T_0(y)}\frac{\mathrm{d}\rho }{\mathrm{d}x}\right) \right) (x)f_0(y) g(y)\mathrm{d}y\in V\) for \(g\in L^2\) (see Lemma 3.2). Hence, we only need to consider \(c\in \) span\(\{f_0\}^\perp \) (see the discussion at the end of Sect. 4.2).

Proposition 7.3

Let \(c\in \) span\(\{f_0\}^\perp \) and \(P_{\ell }\) be the set in (44). Assume that the function \({\mathcal {J}}({\dot{T}}):=\big \langle c,{\widehat{R}}({\dot{T}})\big \rangle _{L^{2}([0,1],{{\mathbb {R}}})}\) is not uniformly vanishing on \(P_\ell \). Then the optimisation problem

$$\begin{aligned} \big \langle c,{\widehat{R}}({\dot{T}})\big \rangle _{L^{2}([0,1],{{\mathbb {R}}})}=\max _{{\dot{h}} \in P_{\ell }}\big \langle c,{\widehat{R}}({\dot{h}})\big \rangle _{L^{2}([0,1],{{\mathbb {R}}})}, \end{aligned}$$
(45)

where \({\widehat{R}}\) is as in (41), has a unique solution \({\dot{T}}\in L^2\).

Proof

Let \({\mathcal {H}} = L^2\), \(P=P_\ell \) and \({\mathcal {J}}({\dot{h}}) = \langle c,{\widehat{R}}({\dot{h}})\rangle _{L^2([0,1],{{\mathbb {R}}})}\). Using Lemma 7.1 we note that \(P_\ell \) is closed, as well as bounded, strictly convex and that it contains the zero element of \({\mathcal {H}}\). From Lemma 6.5, it follows that \(\langle c,{\widehat{R}}({\dot{h}} )\rangle _{L^2([0,1],{{\mathbb {R}}})}\) is continuous as a function of \({\dot{h}}\); note that it is also linear in \({\dot{h}}\). By hypothesis, \({\mathcal {J}}\) is not uniformly vanishing on \(P_\ell \). We can therefore apply Propositions 4.1 and 4.3 to conclude that (45) has a unique solution. \(\square \)

Before we present the explicit formula for the optimal solution, we will reformulate the optimisation problem (45) to simplify the analysis. We first note that since the objective function in (45) is linear in \({\dot{T}}\), the maximum will occur on \(S_{T_0,\ell }\cap \partial B_1\). Combining this with the fact that we only need \(c\in \) span\(\{f_0\}^\perp \), we consider the following reformulation of (45):

Problem C

Given \(\ell \ge 0 \) and \(c\in \) span\(\{f_0\}^\perp \), solve

$$\begin{aligned} \min _{{\dot{T}}\in S_{T_0,\ell }}&-\big \langle c,{\widehat{R}}({\dot{T}})\big \rangle _{L^2([0,1],{{\mathbb {R}}})} \end{aligned}$$
(46)
$$\begin{aligned} \text{ subject } \text{ to }&\Vert {\dot{T}}\Vert ^2_2-1=0. \end{aligned}$$
(47)

Theorem 7.4

Suppose the transfer operator \(L_0\) associated with the system \((T_0,\rho )\) has a kernel \(k_0\) as in (36), which satisfies (A1) of Theorem 2.2, and there is an \(F'\subset {\widetilde{F}}_\ell \) such that \(m(F')>0\), and \(f_0(y)>0\) and \((\text {Id}-L_0^*)^{-1}c\notin {\mathrm {span}}\{{\mathcal {P}}(\cdot ,y)\}^\perp \) for all \(y\in F'\). Let \({\mathcal {G}}:L^2\rightarrow L^2\) be defined as

$$\begin{aligned} {\mathcal {G}}f(y) := \int _0^1 \left( P_\pi \left( \tau _{-T_0(y)}\frac{\mathrm{d}\rho }{\mathrm{d}x}\right) \right) (x)f(x)\mathrm{d}x. \end{aligned}$$
(48)

Then, the unique solution to Problem C is

$$\begin{aligned} {\dot{T}} (y) = {\left\{ \begin{array}{ll} -f_0(y){\mathcal {G}}((\text {Id}-L_0^*)^{-1}c)(y)/\Vert f_0{\mathcal {G}} ((\text {Id}-L_0^*)^{-1}c){\mathbf {1}}_{{\widetilde{F}}_\ell }\Vert _2 &{}\quad y\in {\widetilde{F}}_\ell , \\ 0 &{}\quad \text {otherwise}. \end{array}\right. } \end{aligned}$$
(49)

Furthermore, \({\dot{T}}\in L^\infty \).

Proof

See Appendix D. \(\square \)
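The optimal map perturbation (49) can be assembled numerically. The sketch below is a hypothetical setup (a \(C^1\) density \(\rho (z)=(1+\cos \pi z)/2\), toy map \(T_0(y)=0.5+0.4\sin (2\pi y)\), and \(\ell =0.2\)); the adjoint resolvent is applied through a rank-one deflation, and \({\dot{T}}\) is checked to have unit norm, support in \({\widetilde{F}}_\ell \), and positive response \(\big \langle c,{\widehat{R}}({\dot{T}})\big \rangle =-\int ({\mathcal {G}}h_1)\,{\dot{T}}f_0\), where \(h_1=(\text {Id}-L_0^*)^{-1}c\):

```python
import numpy as np

def rho(z):
    """Hypothetical C^1 noise density on [-1, 1]."""
    return np.where(np.abs(z) <= 1, 0.5 * (1 + np.cos(np.pi * z)), 0.0)

def drho(z):
    return np.where(np.abs(z) <= 1, -0.5 * np.pi * np.sin(np.pi * z), 0.0)

def P_pi(g, x, Ty):
    out = np.zeros((x.size, Ty.size))
    for i in (-2, 0, 2):
        out += g(i + x[:, None] - Ty[None, :]) + g(i - x[:, None] - Ty[None, :])
    return out

n = 200
dx = 1.0 / n
x = (np.arange(n) + 0.5) * dx
T0 = 0.5 + 0.4 * np.sin(2 * np.pi * x)            # toy base map

A0 = P_pi(rho, x, T0) * dx

ev, V = np.linalg.eig(A0)                         # invariant density f0
f0 = np.abs(np.real(V[:, np.argmax(np.abs(ev))]))
f0 /= f0.sum() * dx

c = np.cos(2 * np.pi * x)                         # observable, projected onto
c -= (c @ f0) / (f0 @ f0) * f0                    # span{f0}^perp

# h1 = (Id - L0^*)^{-1} c, the constant ambiguity fixed by a rank-one deflation
M = np.eye(n) - A0.T + np.ones((n, n)) * dx
h1 = np.linalg.solve(M, c)

Gh1 = (P_pi(drho, x, T0) * h1[:, None]).sum(axis=0) * dx   # G h1, formula (48)

ell = 0.2
Ftilde = (T0 >= ell) & (T0 <= 1 - ell)            # feasible support F_tilde_ell
Tdot = np.where(Ftilde, -f0 * Gh1, 0.0)           # formula (49), then normalise
Tdot /= np.sqrt((Tdot ** 2).sum() * dx)

J = -(Gh1 * Tdot * f0).sum() * dx                 # response <c, R_hat(Tdot)>
```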

7.3 Explicit Formula for the Optimal Map Perturbation that Maximally Increases the Mixing Rate

In this section, we set up the optimisation problem for mixing enhancement and derive a formula for the optimal map perturbation. We remark that related spectral approaches to mixing enhancement for continuous-time flows were developed in Froyland and Santitissadeekorn (2017), Froyland et al. (2020).

Recall that to enhance mixing in Sect. 5.2, we perturbed \(k_0\) so that the real part of the logarithm of the second eigenvalue (equivalently, \(\log |\lambda _\delta |\)) decreases. From Lemma 4.4, we have

$$\begin{aligned} \frac{\mathrm{d}}{\mathrm{d}\delta }\Re (\log \lambda _\delta )\bigg |_{\delta =0}= \frac{\langle {\dot{k}}, E\rangle _{L^2([0,1]^2,{{\mathbb {R}}})}}{|\lambda _{0}|^2}, \end{aligned}$$
(50)

where \(\lambda _\delta \) denotes the second largest eigenvalue in magnitude (assumed to be simple) of the integral operator \(L_\delta \) with the kernel \(k_\delta = k_0+\delta \cdot {\dot{k}} + o(\delta )\), where \(\delta \mapsto k_\delta \) is \(C^1\) at \(\delta =0\). Since we want to perturb \(T_0\) by \({\dot{T}}\), we reformulate the above inner product. Define

$$\begin{aligned} {\widehat{E}}(y)= -\int _0^1 \left( P_\pi \left( \tau _{-T_0(y)}\frac{\mathrm{d}\rho }{\mathrm{d}x}\right) \right) (x)E(x,y) \mathrm{d}x, \end{aligned}$$
(51)

where \(E(x,y)\) is as in (18).

Proposition 7.5

Let \(L_\delta :L^2([0,1],{{\mathbb {C}}})\rightarrow L^2([0,1],{{\mathbb {C}}})\), \(\delta \in [0,{\bar{\delta }})\), be integral operators generated by the kernels \(k_\delta \) as in (36), assume that \(\mathrm{d}\rho /\mathrm{d}x\) is Lipschitz and \(\delta \mapsto T_\delta \) is \(C^1\). Let \(\lambda _{\delta }\) be an eigenvalue of \(L_\delta \) with second largest magnitude strictly inside the unit disk. Suppose that \(L_{0}\) satisfies (A1) of Theorem 2.2 and \(\lambda _{0}\) is geometrically simple. Let e and \({\hat{e}}\) be the eigenvectors of \(L_0\) and \(L_0^*\), respectively, corresponding to the eigenvalue \(\lambda _{0}\). Then \({\widehat{E}}\in L^\infty ([0,1],{{\mathbb {R}}})\) and

$$\begin{aligned} \big \langle {\dot{k}},E\big \rangle _{L^2([0,1]^2,{{\mathbb {R}}})} = \big \langle {\dot{T}}, {\widehat{E}}\big \rangle _{L^2([0,1],{{\mathbb {R}}})}. \end{aligned}$$

Proof

We first show that \({\widehat{E}}\in L^\infty ([0,1],{{\mathbb {R}}})\). We can write

$$\begin{aligned} \begin{aligned}&-\int _0^1 \left( P_\pi \left( \tau _{-T_0(y)}\frac{\mathrm{d}\rho }{\mathrm{d}x}\right) \right) (x)E(x,y) \mathrm{d}x \\&\quad = -\sum _{i=1}^4\beta _ih_i(y)\int _0^1 \left( P_\pi \left( \tau _{-T_0(y)}\frac{\mathrm{d}\rho }{\mathrm{d}x}\right) \right) (x)g_i(x)\mathrm{d}x\\&\quad = -\sum _{i=1}^4\beta _ih_i(y)({\mathcal {G}}g_i)(y), \end{aligned} \end{aligned}$$

where \(\beta _1=\beta _2 = \Re (\lambda _{0})\), \(\beta _3=-\beta _4 = \Im (\lambda _{0})\), \(g_1=g_4 = \Re ({\hat{e}})\), \(g_2=g_3=\Im ({\hat{e}})\), \(h_1=h_3 = \Re (e)\), \(h_2=h_4 = \Im (e)\). From the proof of Theorem 7.4, we have \({\mathcal {G}}g_i \in L^\infty ([0,1],{{\mathbb {R}}})\). Also, from Lemma 6.1, we have that \(k_0\in L^\infty ([0,1]^2)\) and therefore \(h_i\in L^\infty ([0,1],{{\mathbb {R}}})\); thus, \({\widehat{E}}\in L^\infty ([0,1],{{\mathbb {R}}})\).

Finally, we compute

$$\begin{aligned} \langle {\dot{k}},E\rangle _{L^2([0,1]^2,{{\mathbb {R}}})}= & {} \int _0^1\int _0^1 {\dot{k}}(x,y)E(x,y) \mathrm{d}x \mathrm{d}y\\= & {} -\int _0^1\int _0^1 \left( P_\pi \left( \tau _{-T_0(y)}\frac{\mathrm{d}\rho }{\mathrm{d}x}\right) \right) (x){\dot{T}}(y)E(x,y)\mathrm{d}x\mathrm{d}y\\= & {} \int _0^1 {\dot{T}}(y) {\widehat{E}}(y) \mathrm{d}y = \big \langle {\dot{T}}, {\widehat{E}}\big \rangle _{L^2([0,1],{{\mathbb {R}}})}. \end{aligned}$$

\(\square \)

By equation (50) and Proposition 7.5, in order to maximally increase the spectral gap we should choose the map perturbation \({\dot{T}}\) to minimise \(\langle {\dot{T}},{\widehat{E}}\rangle \). We first show that this optimisation problem has a unique solution.

Proposition 7.6

Let \(P_\ell \) be the set in (44) and assume that \({\mathcal {J}}({\dot{T}})=\langle {\dot{T}},{\widehat{E}}\rangle \) does not vanish identically on \(P_\ell \). Then, the problem of finding \({\dot{T}}\in P_\ell \) such that

$$\begin{aligned} \big \langle {\dot{T}}, {\widehat{E}}\big \rangle _{L^2([0,1],{{\mathbb {R}}})}=\min _{{\dot{h}}\in P_\ell } \big \langle {\dot{h}},{\widehat{E}}\big \rangle _{L^2([0,1],{{\mathbb {R}}})} \end{aligned}$$
(52)

has a unique solution.

Proof

Note that \(P_\ell \) is closed (by Lemma 7.1), bounded, strictly convex and contains the zero element of \(L^2\). Now, since \({\mathcal {J}}({\dot{h}}) := \langle {\dot{h}},{\widehat{E}}\rangle _{L^2([0,1],{{\mathbb {R}}})}\) is linear and continuous and by hypothesis does not vanish everywhere on \(P_\ell \), we may apply Propositions 4.1 and 4.3 to obtain the result. \(\square \)

Since the objective function in (52) is linear, all optima will lie in \(S_{T_0,\ell }\cap \partial B_1\). Hence, we equivalently consider the following optimisation problem:

Problem D

Given \(\ell \ge 0\), solve

$$\begin{aligned} \min _{{\dot{T}}\in S_{T_0,\ell }}&\big \langle {\dot{T}}, {\widehat{E}}\big \rangle _{L^2([0,1],{{\mathbb {R}}})} \end{aligned}$$
(53)
$$\begin{aligned} \text{ such } \text{ that }&\Vert {\dot{T}}\Vert _{2}^{2}-1= 0. \end{aligned}$$
(54)

We now state a formula for the unique optimum.

Theorem 7.7

Let \((T_0,\rho )\) be a deterministic system with additive noise satisfying (T1) and (T2). Suppose the associated transfer operator \(L_0:L^2([0,1],{{\mathbb {C}}})\rightarrow L^2([0,1],{{\mathbb {C}}})\), with the kernel \(k_0\) as in (36), satisfies (A1) of Theorem 2.2, and that there is an \(F'\subset {\widetilde{F}}_\ell \) with \(m(F')>0\) and \(E(\cdot ,y)\notin {\mathrm {span}}\{{\mathcal {P}}(\cdot ,y)\}^\perp \) for all \(y\in F'\). Suppose \(\lambda _0\) is geometrically simple. Then, the unique solution to Problem D is

$$\begin{aligned} {\dot{T}}(y) = {\left\{ \begin{array}{ll} \frac{1}{\alpha }\int _0^1 \left( P_\pi \left( \tau _{-T_0(y)}\frac{\mathrm{d}\rho }{\mathrm{d}x}\right) \right) (x)E(x,y) \mathrm{d}x &{} \quad y\in {\widetilde{F}}_\ell ,\\ 0 &{} \quad \text {otherwise}, \end{array}\right. } \end{aligned}$$
(55)

where \(E(x,y)\) is as in (18) and \(\alpha >0\) is selected so that \(\Vert {\dot{T}}\Vert _2 =1\). Furthermore, \({\dot{T}}\in L^\infty \).

Proof

See Appendix E. \(\square \)

Corollary 7.8

If \(\lambda _{0}\) is real, then

$$\begin{aligned} {\dot{T}}(y) = {\left\{ \begin{array}{ll} \text {sgn}(\lambda _0)\frac{e(y)({\mathcal {G}}{\hat{e}})(y)}{\Vert e{\mathcal {G}}{\hat{e}}{\mathbf {1}}_{{\widetilde{F}}_\ell }\Vert _2} &{} \quad y\in {\widetilde{F}}_\ell ,\\ 0 &{} \quad \text {otherwise}, \end{array}\right. } \end{aligned}$$

where \({\mathcal {G}}\) is the operator in (48). Furthermore, if there exists an \(\ell >0\) such that \(\ell \le T_0(x)\le 1-\ell \) for \(x\in [0,1]\), then

$$\begin{aligned} {\dot{T}} = \text {sgn}(\lambda _0)\frac{e\cdot {\mathcal {G}}{\hat{e}}}{\Vert e\cdot {\mathcal {G}}{\hat{e}}\Vert _2}. \end{aligned}$$
(56)

Proof

Since \(e, {\hat{e}}\) and \(\lambda _0\) are real, we have \(E(x,y) = {\hat{e}}(x)e(y)\lambda _0\) and the expression for \({\dot{T}}\) follows from (55). Finally, if \(\ell \le T_0(x)\le 1-\ell \), then \({\widetilde{F}}_\ell = [0,1]\) and we have (56). \(\square \)

8 Applications and Numerical Experiments

In this section, we will consider two stochastically perturbed deterministic systems, namely the Pomeau–Manneville map and a weakly mixing interval exchange map. For each of these maps we numerically estimate:

  1. The unique kernel perturbation that maximises the change in expectation of a prescribed observation function (see Problem A). An expression for this optimal kernel is given by (28).

  2. The unique kernel perturbation that maximally increases the mixing rate (see Problem B). An expression for this optimal kernel is given by (31) and (32).

  3. The unique map perturbation that maximises the change in expectation of a prescribed observation function (see Problem C). An expression for this optimal map perturbation is given by (49).

  4. The unique map perturbation that maximally increases the mixing rate (see Problem D). An expression for this optimal map perturbation is given by (55) and (56).

The numerics will be explained as we proceed through these four optimisation problems. We refer the reader to Antown et al. (2018) for additional details on the implementation and related experiments.

8.1 Pomeau-Manneville Map

We consider the Pomeau-Manneville map (Liverani et al. 1999)

$$\begin{aligned} T_0(x)=\left\{ \begin{array}{ll} x(1+(2x)^\alpha ),&{}\quad x\in [0,1/2);\\ 2x-1,&{}\quad x\in [1/2,1] \end{array} \right. , \end{aligned}$$
(57)

with parameter value \(\alpha =1/2\). For this parameter choice it is known that the map \(T_0\) admits a unique absolutely continuous invariant probability measure, but only algebraic decay of correlations (Liverani et al. 1999). With the addition of noise as per (33), the transfer operator defined by (35) and (36) for \(\delta =0\) becomes compact as an operator on \(L^2\). In our numerical experiments we will use the smooth noise kernel \(\rho _\epsilon :[-\epsilon ,\epsilon ]\rightarrow {\mathbb {R}}\), defined by \(\rho _\epsilon (x)=N(\epsilon )\exp (-\epsilon ^2/(\epsilon ^2-x^2))\), where \(N(\epsilon )\) is a normalisation factor ensuring \(\int \rho _\epsilon (x)\ \mathrm{d}x=1\).

We now begin to set up our numerical procedure for estimating \(L_0\), which is a standard application of Ulam’s method (Ulam 1960). Let \(B_n = \{I_1,\dots , I_n\}\) denote an equipartition of [0, 1] into n subintervals, and set \({\mathcal {B}}_n = \) span\(\{{\mathbf {1}} _{I_1},\dots ,{\mathbf {1}}_{I_n}\}\). We define the (Ulam) projection \(\pi _n:L^2([0,1]) \rightarrow {\mathcal {B}}_n\) by \(\pi _n(g) = \sum _{i=1}^n\left( \frac{1}{m(I_i)} \int _{I_i}g(x)\mathrm{d}x\right) {\mathbf {1}}_{I_i}\). The finite-rank transfer operator \(L_{n}:=\pi _n L_0:L^2([0,1])\rightarrow {\mathcal {B}}_n\) can be computed numerically. We use MATLAB’s built-in functions integral.m and integral2.m to perform the \(\rho \)-convolution (using an explicit form of \(\rho _\epsilon \)) and the Ulam projections, respectively. Figure 1 displays the nonzero entries in the column-stochastic matrix corresponding to \(L_n\) for \(\epsilon =0.1\).
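The Ulam construction above can be sketched compactly. The paper evaluates the required integrals by quadrature; the Python sketch below uses a Monte Carlo variant instead (sampling the noise by rejection against the bump shape), with reflecting boundary conditions as in (33). The function names and the sample counts are illustrative choices, not the paper's implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

def T0(x, alpha=0.5):
    """Pomeau-Manneville map (57)."""
    return np.where(x < 0.5, x * (1.0 + (2.0 * x) ** alpha), 2.0 * x - 1.0)

def reflect(x):
    """Reflecting boundary conditions on [0, 1]."""
    x = np.abs(x)
    return np.where(x > 1.0, 2.0 - x, x)

def sample_noise(eps, size):
    """Rejection sampling from the bump density rho_eps."""
    out = np.empty(0)
    while out.size < size:
        xi = eps * (2.0 * rng.random(size) - 1.0)
        accept = rng.random(size) < np.e * np.exp(-eps**2 / (eps**2 - xi**2 + 1e-300))
        out = np.concatenate([out, xi[accept]])
    return out[:size]

def ulam_matrix(n=100, eps=0.1, samples=400):
    """Monte Carlo estimate of the column-stochastic Ulam matrix L_n."""
    L = np.zeros((n, n))
    for j in range(n):
        x = (j + rng.random(samples)) / n          # points in subinterval I_j
        y = reflect(T0(x) + sample_noise(eps, samples))
        idx = np.minimum((y * n).astype(int), n - 1)
        np.add.at(L[:, j], idx, 1.0)
        L[:, j] /= samples
    return L

L_n = ulam_matrix()
```

Each column of `L_n` records where mass starting in one subinterval lands after one noisy iterate, so the columns sum to one by construction.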

Fig. 1

Transition matrix \(L_n\) for the system (33) generated by the Pomeau-Manneville map \(T_0\) (57) using \(n=500\) subintervals of equal length. The matrix entries are located according to the subinterval positions in the domain [0, 1], so that the image appears as a “blurred” version of the graph of \(T_0\). The additive noise in (33) is drawn according to \(\rho _\epsilon \) with \(\epsilon =1/10\)

Fig. 2

Approximate invariant densities (left) and eigenfunctions corresponding to the 2nd largest eigenvalue of \(L_0\) (right) for the system (33) with \(T_0\) given by the Pomeau-Manneville map (57). The additive noise in (33) is drawn according to \(\rho _\epsilon \) with \(\epsilon \) taking the values 1/10 (blue) and \(\sqrt{6}/100\) (red). The Ulam matrix \(L_n\) is constructed with 500 subintervals (Color figure online)

Approximations to the invariant probability densities for our stochastic dynamics are displayed in Fig. 2 (left) for large and small noise supports. A lower level of noise permits greater concentration of invariant probability mass near the fixed point \(x=0\) of the map \(T_0\). Also shown in Fig. 2 (right) are the estimated eigenfunctions corresponding to the second-largest eigenvalue of \(L_n\). The signs of these second eigenfunctions split the interval [0, 1] into left and right hand portions, broadly indicating that the slow mixing is due to positive mass near \(x=0\) and negative mass away from \(x=0\) (Dellnitz et al. 2000); see Froyland et al. (2011) for further discussion of this point in the Pomeau-Manneville setting.

8.1.1 Kernel Perturbations

In the framework of Problems A and B we use the (arbitrarily chosen) monotonically increasing observation function \(c(x)=-\cos (x)\). In order to estimate \({\dot{k}}\) as in (28) we use the code from Algorithm 3 (Antown et al. 2018); the inputs are the Ulam matrix \(L_n\) and \(c_n\) (obtained as \(\pi _n(c)\)). Equivalently, directly using (28) one may substitute \(f_n\) (obtained as the leading eigenvector of \(L_n\)) for f, \(L_n\) for L, \(c_n\) as above for c, and compute \((Id-L_n^*)^{-1}c_n\) (obtained as a vector \(y\in {\mathbb {R}}^n\) by numerically solving the linear system \((Id-L_n^*)y=c_n, f_n^\top y=0\)). Figure 3 shows the optimal kernel perturbations \({\dot{k}}_n\) for \(n=500\). Because c is an increasing function, intuitively one might expect the kernel perturbation to try to shift mass in the invariant density from left to right. Broadly speaking, this is what one sees in the high-noise case in Fig. 3 (left): vertical strips typically have red above blue, corresponding to a shift of mass to the right in [0, 1]. The main exception to this is around the y-axis value of 1/2, where red is strongly below blue along vertical strips. This is because at the next iteration, these red regions will be mapped near \(x=1\) and achieve the highest value of c, while the blue regions will be mapped near to \(x=0\) with the least value of c. In the low-noise case of Fig. 3 (right), we see a similar solution with higher spatial frequencies, and strong kernel perturbations near the critical values of \(x=0\) and \(T_0(x)=1/2\).
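The constrained linear solve \((Id-L_n^*)y=c_n\), \(f_n^\top y=0\) can be sketched in a few lines. In the Python sketch below, a random column-stochastic matrix is an illustrative stand-in for the Ulam matrix \(L_n\), and the observation is centred (its invariant mean subtracted) so that the singular system is solvable:

```python
import numpy as np

rng = np.random.default_rng(1)

# a random column-stochastic matrix stands in for the Ulam matrix L_n
n = 50
L = rng.random((n, n))
L /= L.sum(axis=0)

# invariant density estimate f_n: leading eigenvector of L, normalised to sum 1
w, V = np.linalg.eig(L)
f = np.real(V[:, np.argmax(np.abs(w))])
f /= f.sum()

# observation on subinterval midpoints, centred so the singular system is solvable
xs = (np.arange(n) + 0.5) / n
c = -np.cos(xs)
c = c - f @ c                  # subtract the invariant mean of c

# solve (Id - L^*) y = c subject to the normalisation f^T y = 0
A = np.vstack([np.eye(n) - L.T, f[None, :]])
b = np.concatenate([c, [0.0]])
y, *_ = np.linalg.lstsq(A, b, rcond=None)

residual = np.linalg.norm((np.eye(n) - L.T) @ y - c)
```

The augmented row \(f^\top y=0\) selects one solution from the one-parameter family \(y + t\mathbf{1}\) of the singular system, exactly as in the procedure described above.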

Fig. 3

Optimal kernel perturbations for the Pomeau-Manneville map to maximise the change in expectation of \(c(x)=-\cos (x)\), based on an Ulam approximation of (28) with \(n=500\) subintervals. Left: \(\epsilon =1/10\), right: \(\epsilon =\sqrt{6}/100\)

Fig. 4

Optimal kernel perturbation for the Pomeau-Manneville map to maximally increase the mixing rate, computed with \(n=500\) subintervals. Left: \(\epsilon =1/10\), right: \(\epsilon =\sqrt{6}/100\)

To investigate the optimal kernel perturbation to maximally increase the rate of mixing in the stochastic system, we use the expression for \({\dot{k}}\) in (31). A natural approximate version of (31) requires estimates of the left and right eigenfunctions of \(L_0\) corresponding to the second largest eigenvalue \(\lambda _2\); these are obtained directly as eigenvectors of \(L_n\). Figure 4 shows the resulting optimal kernel perturbations, computed using the code from Algorithm 4 (Antown et al. 2018) with input \(L_n\). Because the fixed point at \(x=0\) is responsible for the slow algebraic decay of correlations for the deterministic dynamics of \(T_0\), the fixed point will also play a dominant role in the mixing rate of the stochastic system for low to moderate levels of noise. Indeed, Fig. 4 shows that the optimal kernel perturbation concentrates its effort in a neighbourhood of the fixed point, and pushes mass away from the fixed point as much as possible. This is particularly extreme in the low noise case of Fig. 4 (right) with the perturbation almost exclusively concentrated in a small neighbourhood of \(x=0\).
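Extracting the matched left and right second eigenvectors from the matrix is the only numerical step here. A minimal Python sketch, again with a random column-stochastic stand-in for \(L_n\) (an assumption for illustration only):

```python
import numpy as np

rng = np.random.default_rng(2)

# random column-stochastic stand-in for the Ulam matrix L_n
n = 60
L = rng.random((n, n))
L /= L.sum(axis=0)

# eigenvectors of L and of L^* (i.e. left eigenvectors of L)
w, V = np.linalg.eig(L)
wl, U = np.linalg.eig(L.T)

order = np.argsort(-np.abs(w))
lam2 = w[order[1]]            # second largest eigenvalue in magnitude
e = V[:, order[1]]            # right eigenfunction estimate for lam2

i = np.argmin(np.abs(wl - lam2))
e_hat = U[:, i]               # eigenfunction of L^* matched to the same eigenvalue

res_r = np.linalg.norm(L @ e - lam2 * e)
res_l = np.linalg.norm(L.T @ e_hat - lam2 * e_hat)
```

Matching `e_hat` by eigenvalue (rather than by sort position) avoids picking the complex conjugate partner when \(\lambda _2\) is part of a complex pair.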

8.1.2 Map Perturbations

We now turn to the problem of finding the unique map perturbation \({\dot{T}}\) that maximises the change in expectation of the observation \(c(x)=-\cos (x)\) (see Problem C for a precise formulation) and maximises the speed of mixing (see Problem D). We use the natural Ulam discretisation of the expression (49). The objects \(f_n\) and \((Id-L_n^*)^{-1}c_n\) are computed exactly as before in Sect. 8.1.1. The action of the operator \({\mathcal {G}}\) in (49) is computed with MATLAB’s built-in function integral.m, using an explicit form of \(\mathrm{d}\rho _\epsilon /\mathrm{d}x\) in place of \(\mathrm{d}\rho /\mathrm{d}x\) in (49).

Fig. 5

Left: Optimal map perturbation \({\dot{T}}\) for the Pomeau-Manneville map to maximise the change in expectation of \(c(x)=-\cos (x)\), computed using (49) with \(n=500\). Right: Illustration of \(T_0+{\dot{T}}/100\)

Figure 5 (left) shows the optimal \({\dot{T}}\) for the two noise amplitudes \(\epsilon =1/10\) and \(\epsilon =\sqrt{6}/100\). Note that for the noise amplitude \(\epsilon =0.1\) (blue curve in Fig. 5) the map perturbation \({\dot{T}}\) is mostly positive, corresponding to moving probability mass to the right, as expected because we are maximising the change in expectation of an increasing observation function c. The blue curve is most negative in neighbourhoods of the two preimages of \(x=1/2\), corresponding to moving probability mass to the left. The reason for this is identical to the discussion of the “blue above red” effect in Fig. 3, namely moving mass to the left creates a very large increase in the objective function value at the next iterate. This “look ahead” effect is even more pronounced in the low noise case (red curve of Fig. 5), where \({\dot{T}}\) is mostly positive, but has deep negative map perturbations at multiple preimages of \(x=1/2\) reaching further into the past.

Fig. 6

Kernel perturbations corresponding to the optimal map perturbations in Fig. 5. Left: \(\epsilon =1/10\), right: \(\epsilon =\sqrt{6}/100\)

Figure 5 (right) illustrates the Pomeau-Manneville map (black) with perturbed maps \(T_0+{\dot{T}}/100\). We have chosen a scale factor of 1/100 for visualisation purposes; one should keep in mind we have optimised for an infinitesimal change in the map. Figure 6 shows the kernel derivatives \({\dot{k}}\) corresponding to the optimal map derivatives \({\dot{T}}\) for the two noise levels. These kernel derivatives have a restricted form because they arise purely from a derivative in the map. One may compare Fig. 6 with Fig. 3 and note that the kernel derivative in Fig. 6 (left) attempts to follow the general structure of the kernel derivative in Fig. 3 (left), while obeying its structural restrictions arising from the less flexible map perturbation. Broadly speaking, in Fig. 6 (left), red lies above blue (mass is shifted to the right). Exceptions are near \(y=1/2\) because at the next iteration these red points will land near \(x=1\), achieving very high objective value, while the blue region will get mapped to near \(x=0\), encountering the lowest value of c. Note that the map perturbation decreases from a peak to very close to zero near \(x=0\). This is because in a small neighbourhood of \(x=0\) there is already some stochastic perturbation away from \(x=0\) “for free” due to the reflecting boundary conditions imposed by \(\pi \). Thus, the map perturbation \({\dot{T}}\) does not need to invest energy in large perturbations very close to \(x=0\).

Fig. 7

Left: Optimal map perturbation \({\dot{T}}\) for the Pomeau-Manneville map to maximise the change in the mixing rate, computed using (56) with \(n=500\). Right: illustration of \(T_0+{\dot{T}}/100\)

Finding the map perturbation that maximally increases the rate of mixing is a particularly interesting question. Our computations use the natural Ulam discretisation of (56). The computations follow as in Sect. 8.1.1 with the action of \({\mathcal {G}}\) computed as above. Figure 7 (left) shows the optimal \({\dot{T}}\) for the two noise amplitudes \(\epsilon =1/10\) and \(\epsilon =\sqrt{6}/100\). A sharp map perturbation away from \(x=0\) is seen for both noise levels, with the perturbation sharper for the lower noise case. In both cases, the map perturbations far from \(x=0\) are weak (low magnitude values of \({\dot{T}}\)). This result corresponds well with the results seen for the optimal kernel perturbations in Fig. 4, where mass was primarily moved away from \(x=0\). As in the optimal solution shown in Fig. 5 (left), the optimal map perturbation in Fig. 7 decreases from a sharp peak down to zero near \(x=0\). This is again because in a small neighbourhood of \(x=0\) the system experiences “free” stochastic perturbations away from \(x=0\) due to the reflecting boundary conditions, and thus the map perturbation \({\dot{T}}\) need not invest energy in large perturbations very close to \(x=0\). Figure 7 (right) illustrates the Pomeau-Manneville map (black) with perturbed maps \(T_0+{\dot{T}}/100\), where again the factor 1/100 is just for illustrative purposes; we are optimising an infinitesimal map perturbation. When inspecting the kernel derivatives \({\dot{k}}\) corresponding to the optimal map perturbations \({\dot{T}}\) in Fig. 8, we see similar behaviour to that in Fig. 7.

Fig. 8

Kernel perturbations corresponding to the optimal map perturbations in Fig. 7. Left: \(\epsilon =1/10\), right: \(\epsilon =\sqrt{6}/100\)

8.2 Interval Exchange Map

In our second example, we consider a weak-mixing interval exchange map. This choice is motivated by an existing literature on mixing optimisation for this class of maps with added noise. Avila and Forni (2007) prove that a typical interval exchange is either weak mixing or an irrational rotation. We use a specific weak-mixing (Sinai and Ulcigrai 2005) interval exchange map \(T_0\) with interval permutation \((1234)\mapsto (4321)\) and interval lengths given by the normalised entries of the leading eigenvector of the matrix \(\left( \begin{array}{cccc} 13&{}37&{}77&{}47\\ 10&{}30&{}60&{}37\\ 3&{}10&{}24&{}14\\ 4&{}10&{}19&{}12 \end{array} \right) \); see equation (51) in Sinai and Ulcigrai (2005). We again form a stochastic system using the same noise kernels as for the Pomeau-Manneville map in Sect. 8.1. The mixing properties of this map have been studied in Froyland et al. (2016). Figure 9 shows the column-stochastic matrix corresponding to \(L_n\) for \(n=500\) and \(\epsilon =0.1\).
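The interval lengths and the exchange itself are easy to reproduce numerically. A minimal Python sketch (function names are illustrative; the translation structure below encodes the permutation \((1234)\mapsto (4321)\)):

```python
import numpy as np

# matrix from equation (51) of Sinai and Ulcigrai (2005)
M = np.array([[13.0, 37.0, 77.0, 47.0],
              [10.0, 30.0, 60.0, 37.0],
              [ 3.0, 10.0, 24.0, 14.0],
              [ 4.0, 10.0, 19.0, 12.0]])

# interval lengths: normalised entries of the leading eigenvector of M
w, V = np.linalg.eig(M)
v = np.real(V[:, np.argmax(np.abs(w))])
lengths = v / v.sum()

right = np.cumsum(lengths)          # right endpoints of I_1, ..., I_4
left = right - lengths              # left endpoints
# permutation (1234) -> (4321): the new left endpoint of I_i is the total
# length of the intervals that follow it in the original order
new_left = np.cumsum(lengths[::-1])[::-1] - lengths

def T0(x):
    """Interval exchange: translate each interval to its permuted position."""
    i = np.minimum(np.searchsorted(right, x, side="right"), 3)
    return x - left[i] + new_left[i]
```

The discontinuity points `right[:3]` land near \(x=0.43, 0.77, 0.89\), which matches the break points visible in the transition matrix of Fig. 9.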

Fig. 9

Transition matrix for the system (33) for \(\delta =0\) and \(T_0\) given by the interval exchange map above using \(n=500\) subintervals. The additive noise is drawn from the density \(\rho _\epsilon \) with \(\epsilon =1/10\)

8.2.1 Kernel Perturbations

In the framework of Problem A, we use the same observation function \(c(x)=-\cos (x)\) as in the Pomeau-Manneville case study, and estimate the optimal kernel perturbation \({\dot{k}}\) that maximally increases the expectation of c in an identical fashion. In broad terms, one again sees that \({\dot{k}}\) attempts to shift invariant probability mass to the right in [0, 1]. In Fig. 10 (left), in each smooth part of the support of \({\dot{k}}\), red is “above” blue, meaning mass is pushed to the right.

Fig. 10
figure 10

Optimal kernel perturbation for the interval exchange map to maximise the change in expectation of \(c(x)=-\cos (x)\), computed with \(n=500\) Ulam subintervals. Left: \(\epsilon =1/10\), right: \(\epsilon =\sqrt{6}/100\)

Clear exceptions to the “red above blue” scheme are seen as three sharp horizontal lines. The y-coordinates of these three sharp horizontal lines coincide with the three points of discontinuity in the domain of the interval exchange at approximately \(x=0.43, 0.77, 0.89\). Consider the sharp horizontal “blue above red” line at \(y\approx 0.43\). According to Fig. 9, under the action of the kernel \(k_0\), mass in the vicinity of \(x=0.6\) will be transported near to \(x=0.43\). The perturbation \({\dot{k}}\) shown in Fig. 10 will then tend to push this mass to the left of \(x=0.43\). Thus, on the next iteration there will be a bias for mass to be mapped near to \(x=1\) rather than near \(x=0.25\), achieving a much larger objective value at this iterate. A similar reasoning applies to the “blue above red” horizontal lines at \(y\approx 0.77\) and 0.89; the contrast is a little weaker because the potential gain at the next iterate is also weaker. The low noise case, Fig. 10 (right), displays similar behaviour to the higher noise case of Fig. 10 (left). With lower noise, the deterministic dynamics plays a greater role and additional preimages are taken into account, leading to a more oscillatory optimal \({\dot{k}}\).

To investigate the optimal kernel perturbation to maximally increase the rate of mixing in the stochastic system (in the framework of Problem B) we use the expression for \({\dot{k}}\) in (31). The method of numerical approximation is identical to that used for the Pomeau-Manneville map. Figure 11 shows the signed distribution of mass that is responsible for the slowest real exponential rate of decay in the stochastic system. This eigenfunction becomes more oscillatory as the level of noise decreases, and as must be the case, the magnitude of the corresponding eigenvalue increases from \(\lambda \approx 0.7476\) (\(\epsilon =1/10\)) to \(\lambda \approx 0.9574\) (\(\epsilon =\sqrt{6}/100\)). Because the sign of these eigenvalues is negative, one expects a pair of almost-2-cyclic sets (Dellnitz and Junge 1999), consisting of three subintervals each, given by the positive and negative supports of the eigenfunctions.
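The connection between a negative second eigenvalue and almost-2-cyclic sets can be seen in a minimal toy example (not from the paper): a two-state chain that switches state with probability \(p\) has second eigenvalue \(1-2p\), which is negative for \(p>1/2\), and the positive and negative parts of the corresponding eigenvector mark the two almost-cyclic sets.

```python
import numpy as np

p = 0.9                          # probability of switching sets each step
L = np.array([[1 - p, p],
              [p, 1 - p]])       # column-stochastic toy transfer operator

w, V = np.linalg.eig(L)
order = np.argsort(-np.abs(w))
lam2 = w[order[1]]               # second eigenvalue: 1 - 2p = -0.8
v2 = V[:, order[1]]              # sign pattern (+, -) marks the two cyclic sets
```

Iterates of this chain nearly swap the two sets each step, which is the almost-cyclic behaviour signalled by the negative eigenvalue.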

Fig. 11

Approximate second eigenfunctions of the transfer operator \(L_0\) of the system (33) with \(T_0\) given by the interval exchange map above. The additive noise in (33) is drawn from the density \(\rho _\epsilon \) with \(\epsilon \) taking the values 1/10 (blue) and \(\sqrt{6}/100\) (red) (Color figure online)

Fig. 12

Optimal kernel perturbation for the interval exchange map to maximally increase the mixing rate, computed with \(n=500\) Ulam subintervals. Left: \(\epsilon =1/10\), right: \(\epsilon =\sqrt{6}/100\)

Figure 12 shows the approximate optimal kernel perturbations. In the high-noise situation of Fig. 12 (left), the sharp horizontal changes are present at preimages of the deterministic dynamics, as they were in Fig. 10 (left). The importance of the break points to the overall mixing rate is thus clearly borne out in the optimal \({\dot{k}}\); a precise interpretation of the optimal \({\dot{k}}\) is not straightforward. For the low noise case (Fig. 12 (right)) it appears that there is an alternating shifting of mass left and right with alternating “red above blue” and “blue above red”. This leads to greater mixing at smaller spatial scales than is possible in a single iteration of the deterministic interval exchange. We anticipate that decreasing the noise amplitude further will result in more rapid alternation of “red above blue” and “blue above red”. As the diffusion amplitude decreases, the efficient large-scale diffusive mixing is no longer possible and so a transition is made to small-scale mixing, accessed by increasing oscillation in the kernel.

8.2.2 Map Perturbations

The computations in this section follow those of Sect. 8.1.2. Figure 13 (left) shows the optimal map perturbations \({\dot{T}}\) at two different noise levels. Figure 13 (right) illustrates \(T_0+{\dot{T}}/100\) for the two different levels of noise. The kernel perturbations generated by these optimal map perturbations are displayed in Fig. 14. If one compares the kernel perturbations in Fig. 14 with the more flexible kernel perturbations in Fig. 10, one sees that the two sets of kernel perturbations are broadly consistent with one another in terms of the relative positions of the positive and negative (red and blue) perturbations. Note that the more restrictive kernel derivative in Fig. 14 by construction cannot replicate the sharp horizontal red-blue switches in Fig. 10. It turns out that the strongest of these red-blue switches, namely the one at \(y\approx 0.43\) in Fig. 10 (left), is approximated as well as is allowed by a map perturbation, see Fig. 14 (left), while the other two (weaker) horizontal red/blue switches seen in Fig. 10 are ignored.

Fig. 13

Left: optimal map perturbation \({\dot{T}}\) for the interval exchange map to maximise the change in expectation of \(c(x)=-\cos (x)\), computed using (49) with \(n=500\). Right: illustration of \(T_0+{\dot{T}}/100\)

Fig. 14

Kernel perturbations corresponding to the optimal map perturbations in Fig. 13. Left: \(\epsilon =1/10\), right: \(\epsilon =\sqrt{6}/100\)

We now turn to optimal map perturbations for the mixing rate. The combined effect of the “cutting and shuffling” of interval exchanges with diffusion on mixing rates has been widely studied, e.g. Ashwin et al. (2002), Sturman (2012), Froyland et al. (2016), Kreczak et al. (2017), Wang and Christov (2018), including investigations of the impact of changing the diffusion or the interval exchange on mixing. The very general type of formal map optimisation we consider here has not been attempted before, and we hope that our novel techniques will stimulate interesting new research questions and motivate more sophisticated experiments in the field of mixing optimisation.

Fig. 15

Left: Optimal map perturbation \({\dot{T}}\) for the interval exchange map to maximise the change in the mixing rate, computed using (56) with \(n=500\). Right: illustration of \(T_0+{\dot{T}}/100\)

Fig. 16

Kernel perturbations corresponding to the optimal map perturbations in Fig. 15. Left: \(\epsilon =1/10\), right: \(\epsilon =\sqrt{6}/100\)

Under repeated iteration, the original interval exchange \(T_0\) cuts and shuffles the unit interval into an increasing number of smaller pieces, assisting the small scale mixing of diffusion. Our results in Fig. 15 (left) show an oscillatory \({\dot{T}}\), with increasing oscillations as the noise amplitude decreases. This increased oscillation effect is also seen when comparing the left and right panes of Fig. 16. Thus, the optimisation attempts to include some additional mixing by rapid local warping of the phase space. It is plausible that this additional warping effect enhances mixing beyond the rigid shuffling of the interval exchange. An illustration of \(T_0+{\dot{T}}/100\) is given in Fig. 15. We emphasise that the factor 1/100 is only for visualisation purposes and for smaller factors, the perturbed map would remain a piecewise homeomorphism (modulo small overshoots at the boundaries, which are taken care of by the reflecting boundary conditions on the noise).