1 Introduction

Response theory is the scientific research area at the boundary between mathematics and physics dealing with the understanding of how complex systems react to perturbations affecting their dynamics. Even if it addresses a very classical problem, the mathematical framework needed to develop such a theory is still a matter of research.

Although the construction of response theory can be approached from many different scientific points of view, it has been mostly driven by statistical mechanics [1]. In this context, by considering a complex system in a steady state (equilibrium or nonequilibrium) and applying some sort of forcing to the dynamics, e.g., a change in the governing parameters, one is interested in analysing the resulting deviation from the steady state and, in some cases, its relaxation to the new one.

We can express these ideas formally as follows. Let \(\{\phi ^t\}_{t\ge 0}\) be a dynamical system on a compact domain \(X \subset {\mathbb {R}}^d\) generated by the evolution equation \(\dot{\mathbf {x}}={\mathbf {F}}({\mathbf {x}})\). We take here the continuous time point of view, but one can equivalently formulate the problem for discrete dynamics. Furthermore, let \({\mathcal {J}}: X \longrightarrow {\mathbb {R}} \) denote a generic observable. Assuming that the system is in a steady state, we want to analyse how the values of the observable \({\mathcal {J}}\) change when the dynamics are subject to a perturbation of the form:

$$\begin{aligned} \dot{\mathbf {x}}={\mathbf {F}}({\mathbf {x}}) + \epsilon {\mathbf {G}}({\mathbf {x}}), \end{aligned}$$
(1)

where \(\epsilon \in {\mathbb {R}}\) is the perturbation parameter and \({\mathbf {G}}\) is the perturbation vector-field, which is assumed to generate (together with \({\mathbf {F}}\)) the perturbed dynamical system \(\{\phi ^t_{\epsilon } \}_{t\ge 0}\). Thus, the value of the observable \({\mathcal {J}}\) will change as a result of the perturbation in the following fashion:

$$\begin{aligned} \frac{d}{dt}{\mathcal {J}}({\mathbf {x}})={\mathbf {F}}({\mathbf {x}})\cdot \nabla {\mathcal {J}}({\mathbf {x}}) + \epsilon {\mathbf {G}}({\mathbf {x}})\cdot \nabla {\mathcal {J}}({\mathbf {x}}). \end{aligned}$$
(2)

This equation describes how the observable changes with time, but only upon integration of the perturbed system can we know its expected value. The question we thus want to address is that of understanding and predicting the mean behaviour of quantities of interest upon modification of the governing dynamics.

In a complex system evolving with time, the mathematical object that accounts for the statistical description of its asymptotic regime is the invariant measure [2]. This measure is called invariant because it does not change under the action of the system. Roughly speaking, it tells us how mass is distributed on phase-space in the far future. Thus, if a system is undergoing perturbations, its invariant measure will change and, consequently, so will the expected value of the observables of interest. If the perturbation parameter \(\epsilon \) is small enough, one can ask about the effect of the perturbation on the system at a given order of nonlinearity, that is, the response.

Let \(\rho \) and \(\rho _{\epsilon }\) denote the unperturbed and perturbed invariant measures, respectively. Using the notation introduced earlier, we would like to compute the expectation value of the observable \({\mathcal {J}}\) in the perturbed steady state, \(\left\langle \rho _{\epsilon } , {\mathcal {J}} \right\rangle \), in terms of its former expectation value \(\left\langle \rho , {\mathcal {J}} \right\rangle \) and suitable linear and nonlinear correction terms:

$$\begin{aligned} \left\langle \rho _{\epsilon } , {\mathcal {J}} \right\rangle = \left\langle \rho , {\mathcal {J}} \right\rangle + \sum _{k=1}^{\infty }\epsilon ^k \delta [{\mathcal {J}}]_k. \end{aligned}$$
(3)

If one has access to \(\delta [{\mathcal {J}}]_k\), one can effectively predict the expected value in the perturbed system and describe its sensitivity to a given perturbation.

The goal of this paper is to provide evidence of a practically applicable methodology for computing the linear and nonlinear corrections to the invariant measure given in Eq. (3) in the case of two simple yet important numerical models. Specifically, we want to achieve predictive power: we want to predict the impact of the applied perturbation by using only information available from the background, unperturbed simulations, i.e. without resorting to additional simulations where the extra forcing is added. We will elaborate on this later in the paper.

1.1 Response Theory

For an isolated system in equilibrium governed by a Hamiltonian, Kubo [3] found a closed set of formulas for the response describing \(\delta [{\mathcal {J}}]_k\) and establishing the connection with the fluctuation-dissipation theorem (FDT). This theorem can be seen as a dictionary allowing one to predict the response of a system from its free fluctuations. However, the nature of the problem tackled in Kubo’s work did not allow its extension to the more general case of nonequilibrium complex systems, and the validity of the formulas was not fully addressed. In fact, deterministic dynamical systems featuring contraction of the phase-space possess invariant measures that are singular with respect to the Lebesgue measure. The absence of a regular density is a major barrier that makes the straightforward application of the FDT impossible, thus breaking the one-to-one connection between forced and free fluctuations of a system.

In the work by Ruelle [4, 5], this framework was clarified at a mathematical level, making it possible to apply response theory to nonequilibrium systems. Ruelle’s results are based on the use of Markov partitions and provide a rigorous response theory for Axiom A [6] systems. Essential ingredients for the theory are the fact that the unperturbed system is structurally stable and that one can split the response operator into two parts. One refers to the contribution coming from the unstable and central manifolds and can be framed as an FDT result. The second contribution comes from the stable manifold and gives an additional term that cannot be described using the free fluctuations of the system. Hence, a suitable notion of differentiability of singular measures was established, allowing one to consider perturbative expansions as in Eq. (3) for a general class of deterministic dynamical systems.

The Ruelle response theory has provided a key framework for constructing algorithms aimed at practically computing the response of nonequilibrium systems to perturbations, see e.g. Ref. [7], and, specifically, for successfully performing climate change predictions using simple, see e.g. Ref. [8], and comprehensive climate models, see e.g. Refs. [9, 10].

Despite a good degree of success in the above-mentioned studies, the presence of the two distinct contributions described above makes the construction of accurate response algorithms very challenging, see discussion in Ref. [8]. A different point of view, based on the study of the evolution of probabilities rather than of individual trajectories, seems necessary; see below.

1.2 The Transfer Operator Approach

Indeed, one can understand response theory by studying the effect of perturbations on the evolution of measures on phase-space as opposed to trajectories. Formulating response theory from this point of view uses essentially different machinery, mostly hovering around the so-called transfer operator [11].

Let us suppose that \({\mathbf {F}}: X \subseteq {\mathbb {R}}^d \longrightarrow {\mathbb {R}}^d\) is a vector-field that generates the dynamical system or flow \(\{\phi ^t\}_{t\in {\mathbb {R}}}\), with \(\phi ^t :X \longrightarrow X\). Then, the transfer operator semigroup \(\{\mathcal {L}^{t}\}_{t\ge 0}\) can be defined as the solution of the Liouville equation [2]:

$$\begin{aligned} \partial _t \rho ({\mathbf {x}},t) = -\nabla \cdot \left( {\mathbf {F}}\rho ({\mathbf {x}},t) \right) , \end{aligned}$$
(4)

so that \(\rho ({\mathbf {x}},t)=\mathcal {L}^t\rho _0({\mathbf {x}})\) for some initial condition \(\rho _0 \in L^1(X)\). In the language of probability, the transfer operator describes the pushforward of an integrable function under the action of the dynamical system after t time-units. It turns out that \(\mathcal {L}^t\) is a contraction and the set \(\{ \mathcal {L}^t \}_{t\ge 0}\) is a \(C_0\)-semigroup, or a group if the dynamical system is defined globally in time [12].

Equation (4) symbolises a process where mass is only advected along the flow. However, in many areas and applications, the governing dynamics are stochastically perturbed, introducing uncertainty into the problem. Such perturbations can be modelled by stochastic processes of the form:

$$\begin{aligned} d{\mathbf {x}} = {\mathbf {F}}({\mathbf {x}})\,dt + \Sigma ({\mathbf {x}})\,dW_t, \end{aligned}$$
(5)

where \(W_t\) denotes a standard d-dimensional Wiener process and \(\Sigma ({\mathbf {x}}) \in {\mathbb {R}}^{d\times d}\) is the noise matrix, so that \(\Sigma \Sigma ^{*}\) is the covariance matrix of the noise. In terms of measures, stochasticity translates into the fact that their evolution is driven not only by advection: diffusion is also present. The Liouville equation is thus transformed into the Fokker-Planck equation [13]:

$$\begin{aligned} \partial _t\rho ({\mathbf {x}},t)={\mathcal {A}}\rho ({\mathbf {x}},t):=-\nabla \cdot \left( {\mathbf {F}}\rho ({\mathbf {x}},t) \right) + \frac{1}{2}\sum _{i=1}^d\sum _{j=1}^d\partial ^2_{x_i,x_j}\left( \left( \Sigma \Sigma ^{*}\right) _{i,j}({\mathbf {x}})\rho ({\mathbf {x}},t)\right) . \end{aligned}$$
(6)

We have defined the differential operator \({\mathcal {A}}\), which can be shown to generate a \(C_0\)-semigroup, just as the Liouville operator does [12]. Notice that the differential operator on the right-hand-side of Eq. (4) coincides with \({\mathcal {A}}\) when no noise is present.

The transfer operator is therefore a statistical tool rather than a dynamical one. It globally describes how densities evolve with time as opposed to giving a trajectory-wise description of the system. In fact, the spectral properties of the transfer operator carry information about the statistical features of the system of interest. For instance, one observes that the fixed points of \(\mathcal {L}^t\) are nothing else than the measures that remain fixed under the action of the dynamics, namely, the invariant measures. Moreover, the ergodic and mixing character of a dynamical system can be characterised in terms of the nature of the leading eigenvalues of \(\mathcal {L}^t\) (e.g. Ref. [11]).

Introducing perturbations in the system affects not only the way trajectories evolve on phase-space but also measures. This is reflected in the Fokker–Planck evolution equation: by considering a perturbation of the vector-field of the kind \({\mathbf {F}} \mapsto {\mathbf {F}}+\sum _k \epsilon _k {\mathbf {G}}_k\), we obtain a perturbed evolution law for the density:

$$\begin{aligned} \partial _t \rho ({\mathbf {x}},t) = {\mathcal {A}}\rho ({\mathbf {x}},t) + \sum _{k=1}^n \epsilon _k {\mathcal {B}}_{k}\rho ({\mathbf {x}},t), \end{aligned}$$
(7)

where \({\mathcal {B}}_k=-\nabla \cdot \left( {\mathbf {G}}_k\circ \right) \). Under some conditions [12] the previous equation also generates a \(C_0\)-semigroup with the same functional properties as the unperturbed one. One can also introduce perturbations on the covariance matrix, which would lead to a similar equation as the one above with different perturbation operators.

In this paper, we show how to compute the result of applying the operators \({\mathcal {B}}_k\) on the right hand side of Eq. (7) via finite differencing, and then construct the invariant measure of the perturbed system. The quality of such an estimate is then tested against direct numerical simulation.

1.2.1 Projected Transfer Operators and Markov Chains

The transfer operator has been used to solve problems in different areas of science, see e.g. examples in the geosciences in Refs. [14,15,16]. In these applications phase-space is not treated as a continuum but as a finite collection of regions. As a result, each time the transfer operator is applied, a finite probability mass is shifted from one region to another. For this reason, in applications, the transfer operator has to be understood in a finite dimensional setting. Here, we follow this route and employ Ulam’s method, which is surveyed in e.g. Refs. [17,18,19] and revisited below.

We consider a finite subdivision of phase-space X into N regions or boxes \(\{B_i\}_{i=1}^{N}\) and define \({\mathbf {1}}_{B_i}\) as the characteristic function on box \(B_i \subset X\). Thus, we define the projection \(P_N: L^1\left( X\right) \longrightarrow U_N : = {\mathrm {Span}}\left( \{\mathbf {1}_{B_i}\}_{i=1}^N \right) \) as

$$\begin{aligned} P_N\rho = \sum _{i=1}^N\frac{{\mathbf {1}}_{B_i}}{\eta \left( B_i\right) }\int _{B_i}\rho {\mathbf {1}}_{B_i} \eta (dx), \end{aligned}$$
(8)

where \(\eta \) indicates some notion of volume that can depend on the nature of the problem. It follows that the operator \(P_N{\mathcal {L}}^t: U_N \longrightarrow U_N\) admits a matrix representation:

$$\begin{aligned} {\mathcal {M}}^{t}_{i,j}:=\left( P_N\mathcal {L}^t\right) _{i,j}=\frac{1}{\eta (B_i)}\int _{B_i}\mathcal {L}^t\mathbf {1}_{B_j}\eta (dx), \end{aligned}$$
(9)

which happens to be a Markov or stochastic matrix. The way we practically construct \({\mathcal {M}} ^t\) depends on the choice of measure \(\eta \). Generally, if one wants to study the asymptotic properties of the system, one will take \(\eta \) as the invariant measure. If, on the other hand, the focus is put on the effects of the flow on the whole domain X, and especially if one is interested in problems of relaxation to the steady state, the correct choice is to consider the Lebesgue measure instead [20]. It is worth highlighting that, regardless of the choice of \(\eta \), the properties of the dynamical system \(\phi ^t\) are encoded in \(\mathcal {L}^t\).

Perturbations of the dynamics lead to perturbations of the transfer operator \(\mathcal {L}^t\) and, consequently, of the matrix \({\mathcal {M}} ^t\). This fact motivates reversing the question: by considering perturbations of \({\mathcal {M}} ^t\), what can we say about the dynamics? This question has been tackled in previous work [21, 22] by suitably constructing the response operators. Alongside the development of the probabilistic theory, it was possible to establish a connection between the perturbation theory of Markov chains and linear response theory for dynamical systems, applying it to a number of low-dimensional stochastic and deterministic dynamical systems.

Constructing the operator that accounts for the linear response can be a difficult and costly task [7]. Linear response can be inferred by examining how the system of interest responds to small perturbations in the governing equations. However, this can be an expensive procedure if one deals with high-dimensional models of physical relevance. Therefore, one desires predictive power. In this paper we demonstrate that it is possible to calculate, using finite representations of the transfer operator, the response of a system by sampling only its unperturbed dynamics, given prior knowledge of the forcings applied to it. The overall goal is to provide practically usable tools for studying the response of complex nonequilibrium systems, like the climate, to perturbations. It must be highlighted that climate models are high-dimensional, making Ulam’s method intractable; see e.g. [23]. Still, previous work shows that Markov modelling can be of use in reduced phase-spaces [15, 24].

The structure of the rest of the document is as follows. In Sect. 2 we present the perturbation theory for Markov matrices, which is key to constructing the response operator at a coarse-grained level, and establish the link to the study of dynamical systems. In Sect. 3 we gather our results, showing how numerically constructed perturbation operators can be used to predict the response of the system of interest. Finally, in Sect. 4, we give a summary and discuss future work along these lines.

2 Perturbations of Finite Markov Chains

In this section, we consider a mixing (therefore, ergodic) Markov process with a finite number of states \(N\in \mathbb {N}\); see e.g. the approaches taken in Refs. [21, 22]. A positive vector \({\mathbf {u}} _0\in {\mathbb {R}} ^N\) with \(\sum _{i=1}^N\left( {\mathbf {u}} _0\right) _i=1\) indicates an initial ensemble of states. The finite Markov process is determined by a stochastic matrix \({\mathcal {M}} \in {\mathbb {R}}^{N\times N}\), so that the sequence \(\{{\mathcal {M}}^n {\mathbf {u}}_0 \}_{n=0}^{\infty }\) describes the evolution of the ensemble.

The stochastic matrix \({\mathcal {M}}\) enjoys some properties worth highlighting. First, we note that \({\mathcal {M}}_{i,j}\) is the probability of jumping into the ith state conditioned on being at the jth. Thus, it follows that

$$\begin{aligned} \sum _{i=1}^N{\mathcal {M}}_{i,j}=1, \end{aligned}$$
(10)

for any \(j=1,\ldots , N\). Moreover, all the entries of \({\mathcal {M}}\) are greater than or equal to zero. Further, since the process is mixing, there exists \(p\in \mathbb {N}\) such that the entries of the matrix \({\mathcal {M}}^p\) are strictly greater than zero [25]. Matrices satisfying this condition are called irreducible and aperiodic [11] or primitive [25], although we shall refer to them as mixing by analogy with the Markov process they determine. Consequently, the Perron-Frobenius theorem [25] holds for this matrix, meaning that there exists a leading eigenvalue \(\lambda _1\) with positive eigenvector \({\mathbf {u}}\). By virtue of the spectral properties of stochastic matrices, it turns out that \(\lambda _1 = 1\) and \({\mathbf {u}}\) solves

$$\begin{aligned} {\mathcal {M}}{\mathbf {u}}={\mathbf {u}}. \end{aligned}$$
(11)

This means that \({\mathbf {u}}\) remains invariant under the action of \({\mathcal {M}}\). The rest of the eigenvalues lie strictly inside the unit circle. If, in addition, \({\mathbf {u}}\) is normalised so that its entries sum up to one, we call \({\mathbf {u}}\) the invariant measure of the process and it follows that \(\lim _{p\rightarrow \infty }{\mathcal {M}}^p {\mathbf {u}}_0={\mathbf {u}}\) for any probability vector \({\mathbf {u}} _0 \in {\mathbb {R}} ^N\).
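To make this concrete, the invariant measure of a small stochastic matrix can be computed as the Perron eigenvector associated with \(\lambda _1=1\). The following minimal Python sketch illustrates this (the example matrix and function names are ours, purely for illustration):

```python
import numpy as np

def invariant_measure(M):
    """Perron vector of a column-stochastic matrix M, normalised
    so that its entries sum to one (cf. Eq. (11))."""
    eigvals, eigvecs = np.linalg.eig(M)
    k = np.argmin(np.abs(eigvals - 1.0))   # eigenvalue closest to 1
    u = np.real(eigvecs[:, k])
    return u / u.sum()                     # fixes sign and normalises

# Example: a small mixing chain (columns sum to one).
M = np.array([[0.8, 0.3],
              [0.2, 0.7]])
u = invariant_measure(M)
print(u)                       # approx [0.6, 0.4]
print(np.allclose(M @ u, u))   # True: M u = u
```

Iterating \({\mathcal {M}}^p {\mathbf {u}}_0\) for any probability vector \({\mathbf {u}}_0\) converges to the same vector, in agreement with the mixing property.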

The next step now is to perturb the stochastic matrix and express the resulting perturbed invariant measure as a perturbative expansion.

In what follows, we present a generalisation of what is reported in Ref. [21]. For that, we consider a perturbation of \({\mathcal {M}}\) of the form:

$$\begin{aligned} {\mathcal {M}} \longrightarrow {\mathcal {M}} + \sum _{k=1}^n\epsilon _k m_k, \end{aligned}$$
(12)

where \(\epsilon _1,\ldots ,\epsilon _n \in {\mathbb {R}}\) and \(m_1,\ldots ,m_n\in {\mathbb {R}}^{N\times N}\). The matrices \(m_k\) are what we will call the perturbation matrices. Note that the perturbed matrix \({\mathcal {M}} + \sum _{k=1}^n\epsilon _k m_k\) must also be a stochastic matrix if we want it to describe a Markov process. This requirement corresponds to having

$$\begin{aligned} \sum _{i=1}^N\left( m_k\right) _{i,j}=0, \end{aligned}$$
(13)

for any k and j. This ensures that the columns of \({\mathcal {M}} + \sum _{k=1}^n\epsilon _k m_k\) add up to one. Moreover, non-negativity must be preserved, so not all choices of \(\epsilon _k\) are valid. For this, we define

$$\begin{aligned} \epsilon ^-=\min \{ \epsilon \in {\mathbb {R}} : \forall i, j \in \{1,\ldots , N\}, {\mathcal {M}} _{i,j} +\epsilon \sum _{k=1}^n\left( m_k\right) _{i,j} \ge 0 \}, \end{aligned}$$
(14)

and

$$\begin{aligned} \epsilon ^+=\max \{ \epsilon \in {\mathbb {R}} : \forall i, j \in \{1,\ldots , N\}, {\mathcal {M}} _{i,j} +\epsilon \sum _{k=1}^n\left( m_k\right) _{i,j} \ge 0 \}. \end{aligned}$$
(15)

Hence, to ensure non-negativity of the perturbed Markov process, we must have that \(\max _k \epsilon _k \in \left[ \epsilon ^- , \epsilon ^+ \right] \). To guarantee that the latter interval is non-degenerate, we need to be certain that \(\epsilon ^- < 0\) and \(\epsilon ^+ >0\). This can fail: suppose that \({\mathcal {M}}_{i_1,j_1}={\mathcal {M}}_{i_2,j_2}=0\), \(\left( \sum _{k}m_k \right) _{i_1,j_1}<0\) and \(\left( \sum _{k}m_k \right) _{i_2,j_2}>0\). This would imply that \(\epsilon ^-=\epsilon ^+=0.\) In this case, we have non-admissible perturbations.
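As an illustration (our own sketch, not the authors' code), the interval \([\epsilon ^-,\epsilon ^+]\) of Eqs. (14)–(15) can be computed entrywise: each pair of entries with \(\left( \sum _k m_k\right) _{i,j}\ne 0\) imposes a one-sided constraint on \(\epsilon \):

```python
import numpy as np

def admissible_interval(M, m):
    """Largest interval [eps_minus, eps_plus] such that M + eps*m
    has non-negative entries, cf. Eqs. (14)-(15); m plays the role
    of the summed perturbation sum_k m_k."""
    eps_minus, eps_plus = -np.inf, np.inf
    for Mij, mij in zip(M.ravel(), m.ravel()):
        if mij > 0:                          # requires eps >= -Mij/mij
            eps_minus = max(eps_minus, -Mij / mij)
        elif mij < 0:                        # requires eps <= -Mij/mij
            eps_plus = min(eps_plus, -Mij / mij)
    return eps_minus, eps_plus

M = np.array([[0.8, 0.3],
              [0.2, 0.7]])
m = np.array([[-1.0, 1.0],
              [ 1.0, -1.0]])                 # columns sum to zero, Eq. (13)
print(admissible_interval(M, m))             # (-0.2, 0.7)
```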

Again, by virtue of the Perron-Frobenius theorem, such a perturbed matrix will have a dominant eigenvalue equal to 1, whose associated eigenvector \({\mathbf {v}}={\mathbf {v}}(\epsilon _1,\ldots ,\epsilon _n)\) is strictly positive. In other words, \({\mathbf {v}}\) solves:

$$\begin{aligned} \left( {\mathcal {M}} + \sum _{k=1}^n\epsilon _k m_k \right) {\mathbf {v}}= {\mathbf {v}}. \end{aligned}$$
(16)

The goal is to express the perturbed invariant measure \({\mathbf {v}}\) in terms of \({\mathcal {M}}, m_1,\ldots , m_n\) and \(\epsilon _1,\ldots , \epsilon _n\). Not only do we want to calculate \({\mathbf {v}}\), but also to describe how the unperturbed measure \({\mathbf {u}}\) responds at a given power of \(\epsilon _k\).

Using multiindex notation, we suppose a formal expansion in powers of \(\epsilon _1, \ldots , \epsilon _n\):

$$\begin{aligned} {\mathbf {v}} = {\mathbf {u}} +\sum _{|\alpha |=1}^{\infty }(\epsilon _1,\ldots ,\epsilon _n)^{\alpha } \mathbf {w} _{\alpha }, \end{aligned}$$
(17)

where \(\mathbf {w}_{\alpha }=\frac{1}{\alpha !}\left( \partial _{\epsilon _1},\ldots ,\partial _{\epsilon _n}\right) ^{\alpha }{\mathbf {v}}\). Substituting this expression in Eq. (16) we obtain:

$$\begin{aligned} \left( {\mathcal {M}} + \sum _{k=1}^{n}\epsilon _k m_k \right) \left( {\mathbf {u}} +\sum _{|\alpha |=1}^{\infty }(\epsilon _1,\ldots ,\epsilon _n)^{\alpha } \mathbf {w} _{\alpha }\right) = {\mathbf {u}} +\sum _{|\alpha |=1}^{\infty }(\epsilon _1,\ldots ,\epsilon _n)^{\alpha } \mathbf {w} _{\alpha }. \end{aligned}$$
(18)

Gathering the terms for \(| \alpha | =1\) we get:

$$\begin{aligned} \mathcal {O}(\epsilon _k): \left( 1 - {\mathcal {M}} \right) \partial _{\epsilon _k}{\mathbf {v}} = m_k {\mathbf {u}}, \end{aligned}$$

for \(k=1,\ldots ,n\). The matrix \(1-{\mathcal {M}}\) cannot be inverted since 1 is an eigenvalue of \({\mathcal {M}}\); we shall discuss this issue in the next section. For the moment, we proceed formally and apply the inverse to find that

$$\begin{aligned} \partial _{\epsilon _k}{\mathbf {v}} = \left( 1 - {\mathcal {M}} \right) ^{-1}m_k {\mathbf {u}}. \end{aligned}$$
(19)

We define \(\Psi _k = \left( 1- {\mathcal {M}} \right) ^{-1}m_k\) as the linear response operator. This matrix has also been named a differential matrix [26]. If we repeat the process for the second order terms (\(|\alpha | =2\)) we get:

$$\begin{aligned} \mathcal {O}\left( \epsilon _k^2\right) : \frac{1}{2!}\partial ^2_{\epsilon _k}{\mathbf {v}}&= \left( 1 - {\mathcal {M}} \right) ^{-1}m_k \partial _{\epsilon _k}{\mathbf {v}} = \left( \Psi _k \right) ^2{\mathbf {u}} \\ \mathcal {O}\left( \epsilon _k\epsilon _l\right) : \partial ^2_{\epsilon _k,\epsilon _l}{\mathbf {v}}&= \left( 1 - {\mathcal {M}} \right) ^{-1}m_l \partial _{\epsilon _k}{\mathbf {v}} + \left( 1 - {\mathcal {M}} \right) ^{-1}m_k \partial _{\epsilon _l}{\mathbf {v}} \\ {}&= \Psi _k \Psi _l {\mathbf {u}} +\Psi _l \Psi _k {\mathbf {u}}. \end{aligned}$$

Thus, we inductively construct the whole expansion as:

$$\begin{aligned} {\mathbf {v}}&= {\mathbf {u}} + \sum _{i=1}^{\infty } \left( \sum _{k=1}^{n}\epsilon _k \Psi _k \right) ^i{\mathbf {u}} =\left( 1 - \sum _{k=1}^n\epsilon _k \Psi _k \right) ^{-1}{\mathbf {u}}. \end{aligned}$$
(20)

This formula provides a tool to predict the perturbed invariant measure as long as the series converges. It also generalises formulas previously found in the probability literature [26] and later revisited when linking them to the study of the response of dynamical systems [21, 22].

Moreover, by simply considering the adjoint matrices, it is possible to develop a response theory for observables. Indeed, let \({\mathcal {J}}\in {\mathbb {R}}^{N}\) represent a generic coarse-grained observable. Then, its expectation value with respect to the perturbed stationary vector \({\mathbf {v}}\) satisfies:

$$\begin{aligned} \left\langle {\mathcal {J}} , {\mathbf {v}} \right\rangle= & {} \left\langle {\mathcal {J}} ,{\mathbf {u}} + \sum _{i=1}^{\infty } \left( \sum _{k=1}^n\epsilon _k \Psi _k \right) ^i{\mathbf {u}} \right\rangle =\left\langle \left( 1 + \sum _{i=1}^{\infty } \left( \sum _{k=1}^n\epsilon _k \Psi _k \right) ^i\right) ^{\top }{\mathcal {J}} ,{\mathbf {u}}\right\rangle \nonumber \\= & {} \left\langle \left( 1 + \sum _{i=1}^{\infty } \left( \sum _{k=1}^n\epsilon _k \Psi _k^{\top } \right) ^i \right) {\mathcal {J}} ,{\mathbf {u}} \right\rangle , \end{aligned}$$
(21)

where \(\left\langle \cdot , \cdot \right\rangle \) denotes the pairing between measures and observables. The advantage of formulas (20) and (21) is that they allow us to identify the response to perturbations at an arbitrary order of nonlinearity, including the linear case, which has a special relevance in the physical literature. As a consequence, if one is interested in calculating the sensitivity of the expectation value of some observable \({\mathcal {J}}\) with respect to changes caused by \(\epsilon _k m_k\), one has to truncate the series expansion at the first order term. However, these formulas are still purely formal. We need a deeper understanding of what we mean by \(\left( 1-{\mathcal {M}} \right) ^{-1}\) and for what values of \(\epsilon _k\) the formulas are useful.

2.1 Well-Posedness and Invertibility of \(1-{\mathcal {M}}\)

In this section we clarify the framework in which the response formulas presented above work. First of all, we revisit the problem of the non-invertibility of \(1-{\mathcal {M}}\), which has been previously discussed in e.g. Refs. [21, 22, 26]. Secondly, we assess the convergence of the power expansion in Eq. (20) and include a comment on the stability of the leading eigenvalues of \({\mathcal {M}}\).

The linear response operator is not well defined a priori, as 1 is an eigenvalue of the matrix \({\mathcal {M}}\) and hence \(1-{\mathcal {M}}\) is not invertible. However, we can define a more suitable normed space on which the norm of \( {\mathcal {M}}\) is less than one, making \(1-{\mathcal {M}}\) invertible. The idea relies on the fact that the vector space \({\mathbb {R}}^{N}\) on which \({\mathcal {M}}\) is defined admits a splitting of the form:

$$\begin{aligned} {\mathbb {R}}^{N}={\mathrm {Span}}\left( {\mathbf {u}} \right) \oplus V, \end{aligned}$$
(22)

where V is the invariant subspace generated by the generalised eigenvectors of \({\mathcal {M}}\) associated with the eigenvalues distinct from 1, and \({\mathrm {Span}}\left( {\mathbf {u}} \right) \) is the vector subspace spanned by \({\mathbf {u}}\). The space V can also be regarded as the kernel of the functional \(\iota :{\mathbb {R}}^N\longrightarrow {\mathbb {R}}\) given by \(\iota \left( {\mathbf {x}} \right) =\sum _i({\mathbf {x}})_i\). Indeed, one observes that if \({\mathbf {u}}_k\) is a generalised eigenvector of \({\mathcal {M}}\) associated with \(\lambda _k\), the identity \(\left( {\mathcal {M}} - \lambda _k \right) ^p{\mathbf {u}}_k=\mathbf {0}\) holds for some \(p\in \mathbb {N}\). Hence, using that \(\iota \left( {\mathcal {M}}^{p-j}{\mathbf {u}}_k\right) =\iota \left( {\mathbf {u}}_k\right) \),

$$\begin{aligned} \iota \left( \left( {\mathcal {M}} - \lambda _k \right) ^p{\mathbf {u}}_k \right)&= \sum _{j=0}^p \begin{pmatrix}p \\ j\end{pmatrix}\left( -\lambda _{k}\right) ^j\iota \left( {\mathcal {M}}^{p-j}{\mathbf {u}}_k\right) \\ {}&=\left( 1-\lambda _k \right) ^p \iota ({\mathbf {u}}_k) = 0. \end{aligned}$$

Since \(\lambda _k \ne 1\) by the mixing hypothesis, this implies that \(\iota ({\mathbf {u}}_k)=0\). Furthermore, noting that \(\iota \left( m_k {\mathbf {x}} \right) =0\) for any \({\mathbf {x}}\in {\mathbb {R}}^N\), we can summarise the idea in the following statement:

Proposition 1

Let \({\mathcal {M}}\in {\mathbb {R}}^{N\times N}\) be a mixing stochastic matrix with invariant measure \({\mathbf {u}}\). Then, for any \({\mathbf {x}}\in {\mathbb {R}} ^N\), \(m_k{\mathbf {x}}\in V\) and \(1-{\mathcal {M}}\) is invertible on V.

The proposition above is more general than the lemma presented in Ref. [21] because we do not assume the existence of a complete set of eigenpairs.

Recall that the linear response operator in Eq. (19) only requires the evaluation of \(\left( 1 - {\mathcal {M}}\right) ^{-1}\) after having calculated \(m_k {\mathbf {u}}\), so writing the inverse explicitly is not that big an abuse, by virtue of the previous proposition. However, we must underline that, numerically, we cannot directly invert \(1-{\mathcal {M}}\). To overcome this problem, we deflate the matrix, removing the dependence on the dominant eigenspace. Concretely, we consider the group inverse [27] \(\mathcal {Z}\) of the Markov chain, which is defined as follows:

$$\begin{aligned} \mathcal {Z}=\left( 1 - {\mathcal {M}} + {\mathcal {M}} ^{\infty } \right) ^{-1}, \end{aligned}$$
(23)

where the matrix \({\mathcal {M}} ^{\infty }\) is defined as \(\lim _{p\rightarrow \infty }{\mathcal {M}} ^p\). This limit can be shown to be the matrix whose columns are all equal to the invariant measure \({\mathbf {u}}\). Intuitively, \({\mathcal {M}} ^{\infty }\) can be seen as a rank-one projector onto \({\mathrm {Span}}\left( {\mathbf {u}} \right) \). In fact, 1 is not an eigenvalue of \({\mathcal {M}} - {\mathcal {M}}^{\infty }\), making \(1 - {\mathcal {M}} + {\mathcal {M}} ^{\infty }\) invertible. The linear response operator is, thus, given by

$$\begin{aligned} \partial _{\epsilon _k}{\mathbf {v}}=\Psi _k{\mathbf {u}}=\left( 1 - {\mathcal {M}} \right) ^{-1}m_k{\mathbf {u}} = \mathcal {Z}m_k{\mathbf {u}}. \end{aligned}$$
(24)

This yields a practical way of computing the linear response operator.
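The whole construction can be condensed into a few lines of Python (a minimal sketch with a hypothetical 2×2 chain; we assume NumPy's dense linear algebra, which is adequate only at small N):

```python
import numpy as np

M = np.array([[0.8, 0.3],
              [0.2, 0.7]])                 # mixing stochastic matrix
m = np.array([[-1.0, 1.0],
              [ 1.0, -1.0]])               # admissible perturbation, Eq. (13)
eps = 0.05

# Unperturbed invariant measure: Perron eigenvector of M.
vals, vecs = np.linalg.eig(M)
u = np.real(vecs[:, np.argmin(np.abs(vals - 1.0))])
u /= u.sum()

N = M.shape[0]
M_inf = np.tile(u[:, None], (1, N))        # columns all equal to u
Z = np.linalg.inv(np.eye(N) - M + M_inf)   # group inverse, Eq. (23)
Psi = Z @ m                                # linear response operator

v_lin = u + eps * (Psi @ u)                          # first order, Eq. (24)
v_full = np.linalg.solve(np.eye(N) - eps * Psi, u)   # resummed series, Eq. (20)

# Direct computation for comparison.
vals_p, vecs_p = np.linalg.eig(M + eps * m)
v_true = np.real(vecs_p[:, np.argmin(np.abs(vals_p - 1.0))])
v_true /= v_true.sum()
print(np.abs(v_lin - v_true).max())        # O(eps^2) error
print(np.abs(v_full - v_true).max())       # numerically zero
```

Within the convergence region discussed below, the resummed formula of Eq. (20) reproduces the perturbed Perron vector exactly, while the first-order truncation carries an \(\mathcal {O}(\epsilon ^2)\) error.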

To assess convergence, we want to be certain that the \(L^1\) norm of the series \(\sum _{i=1}^{\infty }\left( \epsilon _1\Psi _1 + \cdots +\epsilon _n\Psi _n \right) ^i\) does not blow up. For this purpose we introduce the matrix norm \(\Vert \cdot \Vert _{1^{*}}\), defined as the norm \(\Vert \cdot \Vert _1\) restricted to V. We are now in a position to apply the ratio test to Eq. (20):

$$\begin{aligned}&\quad \frac{\left\| \left( \sum _k\epsilon _k\Psi _k \right) ^{i+1}{\mathbf {u}}\right\| _1}{\left\| \left( \sum _k\epsilon _k\Psi _k \right) ^{i}{\mathbf {u}}\right\| _1}\\&\quad = \frac{\left\| \left( \sum _k\epsilon _k\Psi _k\right) \left( \sum _k\epsilon _k\Psi _k \right) ^i{\mathbf {u}}\right\| _1}{\left\| \left( \sum _k\epsilon _k\Psi _k \right) ^i{\mathbf {u}}\right\| _1} \le \left\| \sum _{k=1}^n\epsilon _k\Psi _k \right\| _1\\&\quad = \left\| \left( 1 - {\mathcal {M}} \right) ^{-1} \sum _{k=1}^n \epsilon _k m_k \right\| _1 \le \left\| \left( 1 - {\mathcal {M}} \right) ^{-1} \right\| _{1^{*}} \left\| \sum _{k=1}^n \epsilon _km_k \right\| _1 \\&\quad \le \left( 1 - \Vert {\mathcal {M}} \Vert _{1^{*}} \right) ^{-1} \left\| \sum _{k=1}^n\epsilon _km_k \right\| _1 \le \left( 1 - \Vert {\mathcal {M}} \Vert _{1^{*}} \right) ^{-1} \max _k\{|\epsilon _k|\} \, n \max _k\{ \Vert m_k \Vert _1\}. \end{aligned}$$

Since we want that the ratio remains lower than 1 to ensure (absolute) convergence, we choose \(\epsilon _1,\ldots , \epsilon _n\) so that

$$\begin{aligned} \max _k\{ \vert \epsilon _k \vert \} <\epsilon _{max} :=\frac{1 - \Vert {\mathcal {M}} \Vert _{1^{*}}}{ n \max _k\{ \Vert m_k \Vert _1\}}. \end{aligned}$$
(25)

By referring to the mixing character of \({\mathcal {M}}\), we conclude that \(\epsilon _{max}\) is finite and positive. Hence, \(\epsilon _{max}\) establishes a tolerance on the size of the perturbation. We would like to underline that the condition for these response formulas to work translates into the requirement that the system mixes mass sufficiently quickly. In other words, the difference in magnitude between the first and second largest eigenvalues should be sufficiently large, i.e., there should be a spectral gap.

The number \(\left\| {\mathcal {M}} \right\| _{1^{*}}\) is known as the ergodicity coefficient [28] and can be used as a condition number for the Markov chain, in the sense that this quantity serves to estimate the norm of the difference between the perturbed and unperturbed invariant measures [29]. See Ref. [30] for a practical use of the ergodicity coefficient. However, the ergodicity coefficient does not tell us how perturbations affect the localisation of the eigenvalues of the matrix \({\mathcal {M}}\), something crucial if one wants to control the spectral gap. This problem is tackled by analysing the stability of the leading non-unit eigenvalues, as summarised in the following result from Ref. [31]:
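Assuming the classical characterisation of \(\Vert {\mathcal {M}} \Vert _{1^{*}}\) as \(\frac{1}{2}\max _{j,k}\sum _i \vert {\mathcal {M}}_{i,j}-{\mathcal {M}}_{i,k} \vert \) (see Ref. [28]), the bound of Eq. (25) is straightforward to evaluate; the sketch below (our own) does so for the toy chain used above:

```python
import numpy as np

def ergodicity_coefficient(M):
    """1-norm of M restricted to the zero-sum subspace V, via the
    classical formula tau(M) = (1/2) max_{j,k} sum_i |M_ij - M_ik|."""
    N = M.shape[1]
    return max(0.5 * np.abs(M[:, j] - M[:, k]).sum()
               for j in range(N) for k in range(N))

def eps_max(M, ms):
    """Convergence tolerance of Eq. (25) for perturbations m_1..m_n."""
    tau = ergodicity_coefficient(M)
    norms = [np.abs(m).sum(axis=0).max() for m in ms]   # ||m_k||_1
    return (1.0 - tau) / (len(ms) * max(norms))

M = np.array([[0.8, 0.3],
              [0.2, 0.7]])
m = np.array([[-1.0, 1.0],
              [ 1.0, -1.0]])
print(ergodicity_coefficient(M), eps_max(M, [m]))       # 0.5, 0.25
```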

Proposition 2

Let \({\mathcal {M}}\in {\mathbb {R}} ^{N\times N}\) be a diagonalisable, irreducible and aperiodic stochastic matrix. Let \(X\in {\mathbb {R}} ^{N\times N}\) be a non-singular matrix such that \({\mathcal {M}}=X^{-1}\Lambda X\) with \(\Lambda \) being the diagonal matrix containing the eigenvalues \(1,\lambda _2,\ldots ,\lambda _N\) of \({\mathcal {M}}\). Suppose that

$$\begin{aligned} \kappa (X)\max _k\{|\epsilon _k|\}n\max _k\{\Vert m_k\Vert \} < \frac{1}{2}\min _{2\le j \le N}\{ 1 - | \lambda _j|\}. \end{aligned}$$
(26)

Then, the perturbed chain \({\mathcal {M}}+\sum _{k=1}^n\epsilon _km_k\) has a unique invariant measure.

The proof of this result relies on classical perturbation theory, in particular on the Bauer–Fike theorem [32]. With a bit more work, one can deduce a bound on the rate of convergence to equilibrium of the perturbed chain, also presented in the work cited above.

The condition shown in Eq. (26) can be very restrictive if the process is governed by a highly non-normal Markov matrix, as in this case the condition number \(\kappa \left( X\right) \) can be very large [33]. However, in order to preserve the spectral gap, the only thing needed is a good conditioning of the eigenvalues closest to the unit circle. Hence, in order to find a sharper stability bound than Eq. (26), the eigenvalue condition number [34] might be the object to look at.

2.2 Link to Continuous Time Dynamical Systems

In this section, we investigate how to obtain Markov chains from continuous time dynamical systems up to finite precision by focusing on the evolution of densities given by the Fokker–Planck equation. The ultimate target is to apply the theory for Markov chains presented earlier to assess the response to perturbations of two simple dynamical systems featuring different characteristics.

Let \(dt>0\), and let us express the Fokker–Planck equation (6) to first order as:

$$\begin{aligned} \rho ({\mathbf {x}},t+dt) = \rho ({\mathbf {x}},t) + dt {\mathcal {A}} \rho ({\mathbf {x}},t) + \mathcal {O}(dt^2). \end{aligned}$$
(27)

The idea is to view the right hand side of the previous equation as the pushforward of the measure \(\rho ({\mathbf {x}},t)\) to \(\rho ({\mathbf {x}},t+dt)\), namely, \(\mathcal {L}^{dt}\rho ({\mathbf {x}},t)\). Therefore, when considering the projection of \(\mathcal {L}^{dt}\) we are introducing finite Markov chains, as clarified earlier.

Furthermore, a perturbation of the driving vector-field \({\mathbf {F}}\rightarrow {\mathbf {F}}+\sum _k \epsilon _k {\mathbf {G}}_k\) induces a modification of the Fokker-Planck equation. Again, to first order:

$$\begin{aligned} \rho ({\mathbf {x}},t+dt)&= \rho ({\mathbf {x}},t) + dt{\mathcal {A}}\rho ({\mathbf {x}},t) \\&\quad + dt\sum _k \epsilon _k {\mathcal {B}}_k\rho ({\mathbf {x}},t) + \mathcal {O}(dt^2). \end{aligned}$$

It is natural then to regard \({\mathcal {B}}_k\) as the perturbation operators. In what follows we will frame this perturbation problem in a finite-precision setting by means of the finite representation of the transfer operator.

We observed that the matrices introduced in Eq. (9) can be understood as Markov processes describing the probability of mass transitioning between regions of phase-space. We also noted that the measure \(\eta \) employed in the projection of the transfer operator determines the algorithms used. Generally, in dynamical systems, one is interested in the properties of the system in its asymptotic regime. This means that the projection of the transfer operator in this case has to be done with respect to the invariant measure.

In order to sample the invariant measure, long integrations of the system are needed to explore the region of phase-space where the long-term dynamics occur. Due to finite precision, integrations translate into time-series with equal time-step dt, creating a cloud of points on the domain. After transients have died out, sample points will populate a certain region of phase-space, possibly shadowing [35] the dynamics on an attractor. Such a region can be subdivided into N boxes \(\{B_i\}_{i=1}^N\) with Lebesgue-zero measure intersections. If \(\mathcal {S}\) denotes all the sample points living in \(\cup _i B_i\), the matrix \({\mathcal {M}}\) in Eq. (9) is constructed as:

$$\begin{aligned} {\mathcal {M}}^{dt}_{i,j}=\frac{\# \{ \mathcal {S} \cap B_i \cap \phi ^{-dt}B_j \}}{\#\{\mathcal {S}\cap B_j\}}, \end{aligned}$$
(28)

where \(\# \) is the counting measure. Essentially, this formula counts the transitions from box to box that sample points of the time-series make after a lag of dt time-units. By construction, it immediately follows that \({\mathcal {M}}^{dt}\) is a stochastic matrix. A matrix constructed this way is called a transition matrix.
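A minimal sketch of how Eq. (28) can be implemented for a one-dimensional time series follows (our own illustration; the experiments below use the GAIO box subdivision instead of the uniform binning assumed here):

```python
import numpy as np

def transition_matrix(series, edges):
    """Ulam-type transition matrix, Eq. (28): count one-step transitions
    of a time series between bins, column-normalised so that
    M[i, j] = P(x_{t+dt} in B_i | x_t in B_j)."""
    idx = np.clip(np.digitize(series, edges) - 1, 0, len(edges) - 2)
    N = len(edges) - 1
    counts = np.zeros((N, N))
    for j, i in zip(idx[:-1], idx[1:]):
        counts[i, j] += 1                 # from box j into box i
    col_sums = counts.sum(axis=0)
    col_sums[col_sums == 0] = 1.0         # unvisited boxes give zero columns
    return counts / col_sums

# Example: a long sample of a stable AR(1) process.
rng = np.random.default_rng(0)
x = np.zeros(10**5)
for t in range(1, len(x)):
    x[t] = 0.9 * x[t - 1] + 0.1 * rng.standard_normal()
M = transition_matrix(x, np.linspace(-1.0, 1.0, 33))    # 32 boxes
print(M.sum(axis=0))   # ones on visited boxes: M is stochastic there
```

In practice, unvisited boxes are pruned from the partition so that the resulting matrix is genuinely stochastic.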

This representation of the transfer operator is what we are going to use to approximate the right-hand-side of Eq. (27) to first order. If instead one considers the perturbed problem, the Fokker-Planck equation is modified (see Eq. (7)), and a suitable matrix approximation of the perturbation operator \({\mathcal {B}}_k=-\nabla \cdot \left( {\mathbf {G}}_k\circ \right) \) is needed. Given the discrete setting, we can use finite difference schemes to approximate the differential operator \({\mathcal {B}}_k\). Of course, the way we do this depends on the particular problem. In the next sections we put this methodology into practice with two examples.

3 Results

The main contribution of this paper is to elaborate a method that allows one to predict the statistics of a system subject to forcing. In this section, by examining the Liouville/Fokker–Planck equation of two dynamical systems (one stochastic and one deterministic), we construct suitable perturbation operators using finite differences that permit calculating the response without integrating the forced systems. These outputs are used to evaluate the statistics of the perturbed system and tested against direct simulations.

3.1 A Stochastic Dynamical System

To illustrate the applicability of the methodology described above, we consider the Ornstein–Uhlenbeck (O–U) process as a test case. The O–U process in \({\mathbb {R}}^d\) is a stochastic process \(\{X(t)\}_{t\ge 0}\) of the form:

$$\begin{aligned} dX = AXdt + \Sigma dW_t, \end{aligned}$$
(29)

where \(A \in {\mathbb {R}}^{d\times d}\) models the linear and deterministic component of the process. The stochastic part of the process is given by \(\Sigma dW_t\), where \(\Sigma \in {\mathbb {R}}^{d\times d}\) indicates the correlations in the d-dimensional Wiener process \(W_t\). In order for this process to possess an invariant measure, the matrix A is required to have eigenvalues with strictly negative real part [36]. If this condition is met, the process will be stable and the statistics converge to a normal distribution.

The O–U process can also be investigated by considering its associated Fokker–Planck equation:

$$\begin{aligned} \partial _t \rho ({\mathbf {x}},t)&= -\sum _{k=1}^d\partial _{x_k}\left( \sum _{l=1}^d A_{k,l}x_l\rho ({\mathbf {x}},t) \right) \\&\quad + \frac{1}{2}\sum _{k=1}^d\sum _{l=1}^d \partial ^2_{x_k,x_l}\left( \left( \Sigma \Sigma ^{*}\right) _{k,l}\rho ({\mathbf {x}},t) \right) , \end{aligned}$$

where \(\rho \in L^1(X)\) is a density for each value of time t.

We consider the two dimensional O–U process given by:

$$\begin{aligned} A=\begin{bmatrix} -1 &{} 0 \\ 0 &{} -1 \end{bmatrix}, \text { and } \Sigma =\begin{bmatrix} 1 &{} 0 \\ 0 &{} 1 \end{bmatrix}. \end{aligned}$$
(30)

The variables in this process are uncorrelated and we shall investigate the response of the system when its mean is shifted and correlations are introduced in the noise. Thus, the process we are going to study is given by:

$$\begin{aligned} dX^{\epsilon _1 , \epsilon _2} = A\left( X^{\epsilon _1,\epsilon _2}-\epsilon _1 \mu \right) dt + \sqrt{\Sigma \Sigma ^{*} + \epsilon _2 E}dW_t, \end{aligned}$$
(31)

where

$$\begin{aligned} \mu =\begin{bmatrix} 1 \\ 0 \end{bmatrix}, \text { and } E=\begin{bmatrix} 0 &{} 1 \\ 1 &{} 0 \end{bmatrix}, \end{aligned}$$
(32)

and \(\epsilon _1, \epsilon _2 \in {\mathbb {R}}\) have to satisfy the conditions for the perturbed process to be well defined, namely, \(\Sigma \Sigma ^{*} + \epsilon _2 E\) has to be symmetric positive definite. For our choice, \(\Sigma \Sigma ^{*}\) is the identity and E has eigenvalues \(\pm 1\), so this amounts to \(|\epsilon _2|<1\). The modified Fokker–Planck equation therefore is,

$$\begin{aligned} \partial _t \rho ({\mathbf {x}},t)&= - \sum _{k=1}^2\partial _{x_k}\left( \sum _{l=1}^2 A_{k,l}x_l\rho ({\mathbf {x}},t) \right) \\&\quad +\frac{1}{2}\sum _{k=1}^2\sum _{l=1}^2 \partial ^2_{x_k,x_l}\left( \left( \Sigma \Sigma ^{*}\right) _{k,l}\rho ({\mathbf {x}},t) \right) \\&\quad +\epsilon _1 \sum _{k=1}^2\partial _{x_k}\left( \left( A\mu \right) _{k}\rho ({\mathbf {x}},t) \right) \\&\quad + \frac{\epsilon _2}{2}\sum _{k=1}^2\sum _{l=1}^2 \partial ^2_{x_k,x_l}\left( E_{k,l}\rho ({\mathbf {x}},t) \right) . \end{aligned}$$

In a concise manner, it can be written as

$$\begin{aligned} \partial _t \rho ({\mathbf {x}},t) = {\mathcal {A}}\rho ({\mathbf {x}},t) + \epsilon _1 {\mathcal {B}}_1\rho ({\mathbf {x}},t) + \epsilon _2{\mathcal {B}}_2\rho ({\mathbf {x}},t), \end{aligned}$$
(33)

where the differential operators \({\mathcal {B}}_1\) and \({\mathcal {B}}_2\) are defined by analogy.

As observed in the previous section, by discretising the time-variable with a time-step of \(dt>0\), we immediately obtain a discrete time evolution of densities to first order. Thus, the right-hand-side of the evolution equation pushes forward the measures at time t to the resulting measures at time \(t + dt\). Recall that this is precisely what the transfer operator \(\mathcal {L}^{dt}\) does.

3.1.1 Numerics: Transition and Perturbation Matrices

In order to construct a transition matrix, one has to have in hand a compact region of phase-space where all the long-term dynamics occur. Unfortunately, in the case of the O–U process, phase-space is unbounded because of the presence of white noise. However, one can practically restrict the phase-space to a compact rectangle that with high probability encloses the whole integration of the process.

Let \(\{B_i\}_{i=1}^{2^N}\) be a collection of boxes covering four standard deviations of the process in each direction. In the experiments performed, we have considered \(2^{N}\) boxes by discretising each axis into \(2^{N/2}\) equally sized segments. We have examined the values \(N=10, 12\) and 14, trying to keep a balance between numerical tractability and precision. The box subdivision of phase-space is done using the Matlab package GAIO [19]. We then construct the transition matrix \({\mathcal {M}}^{dt}\) as in Eq. (28), where \(\phi ^{\circ }\) is now replaced by the process \(X(\circ )\), so that \(X \left( {-dt}\right) B_j\) denotes the set of points on phase-space that will end up in \(B_j\) after waiting dt time-units. To obtain the long time-series, we integrate the process using an Euler–Maruyama scheme for \(10^{6}\) time-units with a time-step of \(dt=10^{-2}\) time-units.
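A minimal Euler–Maruyama integrator for the unperturbed process of Eq. (30) might look as follows (our own sketch; a shorter trajectory than in the actual experiments is used to keep the example light):

```python
import numpy as np

def euler_maruyama(A, Sigma, x0, dt, n_steps, rng):
    """Sample dX = A X dt + Sigma dW with the Euler-Maruyama scheme;
    returns an (n_steps + 1, d) array containing the trajectory."""
    d = len(x0)
    x = np.empty((n_steps + 1, d))
    x[0] = x0
    sqrt_dt = np.sqrt(dt)
    for n in range(n_steps):
        dW = sqrt_dt * rng.standard_normal(d)
        x[n + 1] = x[n] + (A @ x[n]) * dt + Sigma @ dW
    return x

A = -np.eye(2)            # Eq. (30)
Sigma = np.eye(2)
rng = np.random.default_rng(0)
traj = euler_maruyama(A, Sigma, np.zeros(2), dt=1e-2, n_steps=10**6, rng=rng)
print(traj.mean(axis=0))  # approx (0, 0)
print(traj.std(axis=0))   # approx 1/sqrt(2) in each coordinate
```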

Regarding the perturbation operators present in Eq. (33), we examine how to implement them so that they are compatible with the domain discretisation carried out for the transition matrices. Suppose that the box \(B_i\) has centre \(c^i_{k,l}\) so that \(c^i_{k \pm 1,l}=c^i_{k,l} \pm [\delta _{x_1} , 0]\), where \(\delta _{x_1}\) is the distance between consecutive box centres along the \(x_1\)-direction (and similarly for the \(x_2\)-direction). Then, the derivative of \(\rho \) with respect to \(x_1\) at \(c^i_{k,l}\) is given by

$$\begin{aligned} \partial _{x_1} \rho (c^i_{k,l}) = \frac{\rho (c^i_{k+1,l})-\rho (c^i_{k-1,l})}{2\delta _{x_1}} + \mathcal {O}\left( \delta _{x_1} ^2\right) . \end{aligned}$$
(34)

The same scheme is used for the \(x_2\)-direction. For the second and cross derivatives, we implemented the usual second order discretisation:

$$\begin{aligned} \partial ^2 _{x_1} \rho (c^i_{k,l})&=\frac{\rho (c^i_{k+1,l}) - 2\rho (c^i_{k,l})+ \rho (c^i_{k-1,l})}{\delta _{x_1}^2} + \mathcal {O}(\delta _{x_1}^2), \end{aligned}$$
(35)
$$\begin{aligned} \partial ^2 _{x_1,x_2} \rho (c^i_{k,l})&=\frac{\rho (c^i_{k+1,l+1}) - \rho (c^i_{k+1,l-1})- \rho (c^i_{k-1,l+1}) + \rho (c^i_{k-1,l-1})}{4\delta _{x_1}\delta _{x_2}} + \mathcal {O}(\delta _{x_1}\delta _{x_2}). \end{aligned}$$
(36)

This stencil is completed with the same schemes in the \(x_2\)-direction. These numerical derivatives can be arranged into matrices so that their multiplication with vectors approximates the respective differentiation. Thus we obtain matrix representations \(m_k\) of the operators \({\mathcal {B}}_k\).
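As an illustration of how such matrices can be assembled (a sketch under our own lexicographic grid ordering, which differs from GAIO's box numbering), the perturbation matrices for Eq. (33) are sparse combinations of one-dimensional difference operators:

```python
import numpy as np
import scipy.sparse as sp

def difference_operators(n, delta):
    """Sparse central-difference operators on an n x n uniform grid,
    lexicographic ordering with the x2 index varying fastest.
    Rows touching the boundary are simply truncated, mirroring the
    text's choice of not enforcing boundary conditions."""
    D = sp.diags([-1.0, 1.0], [-1, 1], shape=(n, n)) / (2 * delta)  # Eq. (34)
    I = sp.identity(n)
    Dx1 = sp.kron(D, I).tocsr()     # d/dx1
    Dx2 = sp.kron(I, D).tocsr()     # d/dx2
    return Dx1, Dx2

n, delta = 64, 8.0 / 64             # 64 boxes per axis over [-4, 4]^2
Dx1, Dx2 = difference_operators(n, delta)

# Perturbation matrices for Eq. (33) with A = -Id and mu = (1, 0):
# B_1 rho = div(A mu rho) = -d(rho)/dx1      (mean shift),
# B_2 rho = d^2(rho)/(dx1 dx2) for E = [[0, 1], [1, 0]]  (noise correlation).
m1 = -Dx1
m2 = Dx1 @ Dx2                      # 4-point cross stencil, cf. Eq. (36)
```

The product of the two central differences reproduces the cross stencil of Eq. (36) with denominator \(4\delta _{x_1}\delta _{x_2}\).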

Of course, the schemes presented above only work for the interior boxes of the domain. We have not implemented explicit boundary conditions since the values of the derivatives of the invariant measure at the boundary of the compact domain are almost zero. In any case, boundary conditions should not inject or deplete mass, so in other cases where the dynamics populate the boundaries, Neumann boundary conditions are the ones to choose [37].

Good agreement was found provided that the resolution was high enough. In Fig. 1, we see that the predicted response approximates the true one well. This is confirmed in Table 1, where we show that the error made when approximating the perturbed invariant measure using the response formulas in Eq. (21) is small. The linear response (see Fig. 2) does precisely what one expects: mass is pumped to the right as a consequence of the mean being shifted, and the introduction of correlations induces a rotation. Higher order correction terms (see Fig. 2) also give insight into how the measure is gradually modified. As a safety check, the components of the response are verified to sum to (almost) zero, meaning that mass is neither introduced nor depleted.

Fig. 1

Response of the O–U to the perturbations considered in Eq. (32). The left-hand figure shows the true response calculated by subtracting the unperturbed invariant measure from the perturbed one. The figure on the right is the predicted response calculated from Eq. (20)

Table 1 The first row refers to the values obtained from integrating the O–U process
Fig. 2

The top-left figure shows the linear response of the O–U process with respect to the perturbations considered in Eq. (32) calculated by truncating Eq. (21) at the first order. The top-right and bottom-left show the linear response to changes in \(\epsilon _1\) and \(\epsilon _2\) respectively. The bottom-right figure contains the second order correction obtained via Eq. (21)

3.2 A Deterministic Dynamical System

A model of importance in dynamical systems and the geophysical sciences is the Lorenz 63 system [38]. It is a low-dimensional model of atmospheric convection that, even if far from being realistic, possesses chaotic behaviour and exhibits different regimes, just as the atmosphere does. This system has been the toy model used to illustrate the difficulty of calculating the sensitivity and response in dynamical systems, and in the climate system by extension (e.g. Refs. [39, 40]).

From the point of view of the transfer operator, the response of the Lorenz 63 system was computed in Ref. [21] by applying the perturbation theory for finite-state space Markov chains. However, the perturbed dynamics needed to be sampled. In the case of low-dimensional models, this is not a major handicap, but in physically relevant systems it can be an expensive procedure. Therefore, we need more predictive skill.

The Lorenz 63 system is a deterministic dynamical system that is given by the following set of ordinary differential equations:

$$\begin{aligned} \dot{\mathbf {x}}(t)={\mathbf {F}}({\mathbf {x}})=\left\{ \begin{array}{l} s (y - x) \\ x(r - z) - y \\ xy - b z \end{array}\right. , \end{aligned}$$
(37)

for the classical parameter values \(s =10\), \(b = 8/3\) and control parameter \(r =28\). For this choice of parameters, the Lorenz 63 system displays chaotic dynamics on a singular hyperbolic attractor that supports an SRB measure [41]. The lack of uniform hyperbolicity implies that the Lorenz 63 system is not Axiom A, and therefore one cannot expect response theory to hold. Nevertheless, numerical evidence supports the idea of linear [42] and nonlinear [43] response being valid in this system.

The perturbation problem we tackle here is that of changing the value of the control parameter \(r \rightarrow r + \epsilon _1\) for \(\epsilon _1 \in {\mathbb {R}}\). This parameter is known as the Rayleigh number and is proportional to the temperature difference between the convecting layers. The bifurcations associated with this parameter are numerically surveyed in Ref. [44]. We will also study the additive perturbation on the z-variable obtained by adding \(\epsilon _2 \in {\mathbb {R}}\). These perturbations induce a modification of the vector field of the form:

$$\begin{aligned} \dot{\mathbf {x}}(t)={\mathbf {F}}({\mathbf {x}})+\epsilon _1 {\mathbf {G}}_1({\mathbf {x}})+\epsilon _2 {\mathbf {G}}_2({\mathbf {x}}), \end{aligned}$$
(38)

where \({\mathbf {G}}_1({\mathbf {x}})=[0,x,0]'\) and \({\mathbf {G}}_2({\mathbf {x}})=[0,0,1]'\). Consequently, the Liouville equation changes as in Eq. (7), with perturbation operators:

$$\begin{aligned} {\mathcal {B}}_1\rho ({\mathbf {x}},t)=-\nabla \cdot \left( {\mathbf {G}}_1({\mathbf {x}})\rho ({\mathbf {x}},t) \right) \end{aligned}$$
(39)

and

$$\begin{aligned} {\mathcal {B}}_2\rho ({\mathbf {x}},t)=-\nabla \cdot \left( {\mathbf {G}}_2({\mathbf {x}})\rho ({\mathbf {x}},t) \right) . \end{aligned}$$
(40)

Introducing forcing leads to a change in the statistics of the system, as shown in Fig. 3, where we considered the example observable z and computed its mean value for equispaced values of \(\epsilon _1 \in [-5,5]\) and \(\epsilon _2\in [-1,1]\). When \(\epsilon _1 \approx -4\), a bifurcation is traversed, producing a non-smooth change in the mean value of the observable. Far away from this bifurcation point, the statistics change smoothly with respect to \(\epsilon _1\) and \(\epsilon _2\). For the calculation of these means, long integrations were considered so that the outcome is uniquely determined. This is possible thanks to the existence of an SRB measure [39].
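For reference, a minimal integration routine producing such statistics might read as follows (our own Python sketch using a standard fourth-order Runge–Kutta step; integration lengths are reduced for illustration):

```python
import numpy as np

def lorenz(x, s=10.0, r=28.0, b=8.0 / 3.0, eps1=0.0, eps2=0.0):
    """Vector field of Eq. (38): Lorenz 63 plus the forcings
    eps1*G1 = (0, eps1*x, 0) and eps2*G2 = (0, 0, eps2)."""
    return np.array([s * (x[1] - x[0]),
                     x[0] * (r - x[2]) - x[1] + eps1 * x[0],
                     x[0] * x[1] - b * x[2] + eps2])

def rk4_trajectory(f, x0, dt, n_steps):
    x = np.empty((n_steps + 1, len(x0)))
    x[0] = x0
    for n in range(n_steps):
        k1 = f(x[n])
        k2 = f(x[n] + 0.5 * dt * k1)
        k3 = f(x[n] + 0.5 * dt * k2)
        k4 = f(x[n] + dt * k3)
        x[n + 1] = x[n] + (dt / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)
    return x

# Perturbed mean of the observable z, after discarding a transient.
traj = rk4_trajectory(lambda x: lorenz(x, eps1=1.0),
                      np.array([1.0, 1.0, 20.0]), dt=1e-3, n_steps=10**5)
print(traj[10**4:, 2].mean())
```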

Notice that differentiating Eq. (7) with respect to \(\epsilon _k\) (\(k=1,2\)) gives \({\mathcal {B}}_k\). So a way of obtaining a matrix representation of \({\mathcal {B}}_k\) is

$$\begin{aligned} \epsilon _k m_k = {\mathcal {M}}_{\epsilon _k}^{dt} - {\mathcal {M}}^{dt}, \end{aligned}$$
(41)

where \({\mathcal {M}}_{\epsilon _k}^{dt}\) is the transition matrix empirically constructed from the perturbed equations. This approach was followed previously [21] and served to calculate the linear response of the Lorenz 63 system; however, it involves two (or more) long integrations of the equations, something we want to avoid. We explain the methodology carried out here in the next section.

Fig. 3

Deviation of perturbed expected values of the observable z for \(\epsilon _1 \in [-5,5]\) and \(\epsilon _2\in [-1,1]\). The expectation values were computed by integrating the equations for \(10^3\) time-units with 20 ensemble members for each value of \(\epsilon _1\) and \(\epsilon _2\)

3.2.1 Numerics: Transition and Perturbation Matrices

The invariant measure of the Lorenz 63 system is supported on an attractor, which by definition is compact. Since we know that the Lorenz attractor is contained in an absorbing closed ellipsoid in phase-space [44], we are certain that the box defined by the Cartesian product \([-20,20]\times [-30,30] \times [0, 50]\) will contain the attractor. Then, to obtain the box partition, we divide each axis into two and repeat the procedure in each resulting segment. This way, a total of \(2^N\) boxes \(\{B_i\}_{i=1}^{2^N}\) were constructed for the values \(N=12, 15\) and 18. As mentioned earlier, the box discretisation was carried out using GAIO [19].

Fig. 4

Coarse-grained Lorenz 63 invariant measure (left) and the discretised operator \({\mathcal {B}}_1\) applied onto the invariant measure (right)

Table 2 Expectation values with respect to the unperturbed and perturbed (\(\epsilon _1= 0.1\) and \(\epsilon _2 =0.1\)) invariant measure

To obtain the sample points of the dynamics, we integrated the model for \(10^5\) time-units with a time-step of \(dt=10^{-3}\) time-units. After discarding a spinup of ten percent of the points, we count the transitions between the boxes with a time-lag of dt time-units. This gives the transition matrix \({\mathcal {M}}^{dt}\). A coarse-grained estimate of the invariant measure can then be readily obtained by solving the eigenvalue problem in Eq. (11); it is plotted in Fig. 4.

The quality of the approximation can depend on the choice of time-lag [14]. Indeed, a short time-lag can make Ulam’s method introduce artificial diffusion by analogy to the upwind scheme [37]. In our experiments, the approximation of the invariant measure was carried out using a time-lag of dt, which served to estimate the expectation values of certain observables consistently with the resolution, as shown in the first, second and third columns of Table 2.

Regarding the perturbation operators, the same techniques as in the O–U process were employed. The difference here is that we are dealing with an invariant measure that is singular with respect to the Lebesgue measure and is therefore supported on a complicated set, as illustrated by Fig. 4. Hence, one needs to take care of the boundary boxes by considering forward/backward approximations of the derivatives at those boxes. In this way we estimated \({\mathcal {B}}_1\) applied to the invariant measure, depicted on the right of Fig. 4.

Fig. 5

Predictions for perturbations on r. Mean values of the observables indicated in the panels are predicted using Eq. (21) and plotted against the true means observed empirically via integrations of the system

The results of applying this methodology are presented in Fig. 5, where we show that Eq. (21) can indeed approximate the expectation values of certain observables for a wide range of values of \(\epsilon _1\). We underline that these plots demonstrate the validity of the formulae not only for computing the linear response but for predicting the statistics of the system. As is natural, the formulas cannot be expected to work for large values of the perturbation parameter, let alone beyond the bifurcation point. When considering the simultaneous forcings \(\epsilon _1{\mathbf {G}}_1\) and \(\epsilon _2 {\mathbf {G}}_2\), we examine the efficiency of the formulas (Table 2). We calculate the expectation value of certain observables and check that we can approximate their mean for small values of \(\epsilon _1\) and \(\epsilon _2\). The linear response is also calculated. As we see in the last row, refining the resolution does not always improve the results. This is due to the fact that the length of the integration has to be severely extended in order to sample the boxes covering the domain. In general, the skill of the methodology is shown in Fig. 6, where the expectation values of some observables are predicted using the response formula Eq. (21). Of course, the validity was only tested within the convergence interval.

The Ulam method is prone to errors when estimating the invariant measure, and such errors are larger (in relative terms) where the empirical occupation rate is smaller. This is made worse when one considers finite differences. Nonetheless, we expect that the errors we introduce are small in absolute terms and localised in phase space. Therefore, if one considers smooth observables, the overall contribution coming from those regions might end up being small. The relevance of using smooth observables should not come as a surprise: Ruelle’s [5] response theory only works for \(C^3\) observables. Considering less regular observables can lead to fundamental modifications of the theory; see discussion in Ref. [45].

Fig. 6

Relative error \((\%)\) of the prediction of the expected values of the observables indicated in the plots. The prediction was done by evaluating Eq. (21) and the true values were obtained empirically by integrating the equations for each value of \(\epsilon _1\) and \(\epsilon _2\)

4 Discussion

In this work, we take the statistical point of view to understand the effect of perturbations on dynamical systems. In this setting, the transfer operator is the essential object of study. By projecting this operator onto a finite dimensional vector space, it is possible to obtain empirically constructed matrix representations of the transfer operator via Markov modelling. Further, the properties of these matrices allow us to describe, up to finite precision, the dynamical system of interest.

Exploiting the stochastic (or Markovian) structure of the projected transfer operator, we consider multiple perturbations and express the resulting perturbed invariant measure as a series expansion describing the response at all orders of nonlinearity. In line with previous work [21, 22, 26], it is possible to link the probabilistic concepts of finite Markov chains to dynamical systems. Namely, the mixing rate of the unperturbed chain, indicated by its second eigenvalue or, more generally, its ergodicity coefficient, determines the validity and controls the practical applicability of the perturbative expansions.

The linear component in the perturbative expansion is the so-called linear response, and it is of physical interest [1]. This quantity is accessed by constructing suitable response operators and indicates the sensitivity of a dynamical system to prescribed perturbations. Here we emphasise the need to gain predictive skill: given a perturbation to the vector field (possibly induced by tuning several parameters), can we calculate the sensitivity of the system a priori? For this purpose, we examined the Fokker–Planck/Liouville equation to identify the operators through which the perturbations act on the evolution of densities. Then, using simple finite difference methods, we were able to model (to finite precision) matrix perturbations that allowed us to exploit the perturbative formulas, giving us access to the linear and nonlinear response of the systems. Notice that using this method, we only need one integration of the (unforced) model to determine the response and sensitivity.

The two numerical experiments performed were intrinsically different. The Ornstein-Uhlenbeck process is a stochastic dynamical system whose invariant measure has a smooth density with respect to the Lebesgue measure. Therefore, differential operators (discretised using finite differences) work correctly. On the other hand, the Lorenz 63 model is a dissipative deterministic system with a singular hyperbolic attractor of zero Lebesgue measure. This translates into the fact that linear response theory is not theoretically proven [5]. Moreover, the invariant measure of such a system is not smooth, so the usual notion of differentiation does not hold. In the experiments, we coarse-grained the invariant measure to obtain a vector estimate. The corresponding differential operators were applied, giving good agreement and suggesting that, at a coarse-grained level, the usual methods for numerical differentiation should work. Previous investigations [37] illustrate that coarse-graining the domain inherently introduces numerical diffusion, thus smoothing the invariant measure.

The partitioning of phase-space is an arbitrary decision of the modeller. In these numerical experiments, the partitioning is uniform: boxes are of equal size. This is not optimal, since there might be regions of phase-space that need more refining than others [18, 46]. The reason for choosing equally sized boxes is that the invariant measures obtained gave a good approximation of the expectation values of observables. In any case, comparing box discretisations is not the target of this paper.

The low dimensionality of the problems considered in this paper allows us to perform the box subdivision on the whole phase-space. Unfortunately, many physically relevant models possess a domain which is high-dimensional, if not infinite-dimensional. In these cases, it appears intractable to work on the complete domain. Because of this dimensionality barrier, reduced phase-spaces are considered. Results along this line support the applicability of transfer operator methods in a climatological context, see, e.g., Refs. [14, 15, 24]. However, the inherent loss of Markovianity in the dimensionality reduction requires control of the memory effects introduced by the hidden variables (see, e.g., Ref. [47]), something that complicates the study of the response, as pointed out in Ref. [48], where the robustness of reduced systems in the presence of forcing is assessed. Further research should be oriented towards adapting this transfer-operator-based methodology to higher-dimensional systems, with a view to assessing and predicting the sensitivity of physically relevant systems.