Response and Sensitivity Using Markov Chains

Dynamical systems are often subject to forcing or changes in their governing parameters and it is of interest to study how this affects their statistical properties. A prominent real-life example of this class of problems is the investigation of climate response to perturbations. In this respect, it is crucial to determine what the linear response of a system is to small perturbations as a quantification of sensitivity. Alongside previous work, here we use the transfer operator formalism to study the response and sensitivity of a dynamical system undergoing perturbations. By projecting the transfer operator onto a suitable finite dimensional vector space, one is able to obtain matrix representations which determine finite Markov processes. Further, using perturbation theory for Markov matrices, it is possible to determine the linear and nonlinear response of the system given a prescribed forcing. Here, we suggest a methodology which puts the scope on the evolution law of densities (the Liouville/Fokker-Planck equation), allowing to effectively calculate the sensitivity and response of two representative dynamical systems.


I. INTRODUCTION
Response theory is the scientific research area at the boundary between mathematics and physics dealing with the understanding of how complex systems react to perturbations affecting their dynamics. Even if it addresses is a very classical problem, the mathematical framework to develop such a theory is still a matter of research.
Although the construction of response theory can be approached by taking many different scientific point of views, it has been mostly driven by statistical mechanics 1 . In this context, by considering a complex system in a steady state (equilibrium or nonequilibrium) and applying some sort of forcing to the dynamics as, e.g., a change in the governing parameters, one is interested in analysing the resulting deviation from the steady state and, in some cases, its relaxation to the new one.
We can express these ideas formally as follows. Let {φ t } t≥0 be a dynamical system on a compact domain X ⊂ R d generated by the evolution equationẋ = F(x). We take here the continuous time point of view but one can equivalently formulate the problem for discrete dynamics. Furthermore, let J : X −→ R denote a generic observable. Assuming that the system is in a steady state, we want to analyse the how the values of the observable J change when the dynamics are subject to a perturbation of the form: where ε ∈ R is the perturbation parameter and G is the perturbation vector-field, which is assumed to generate (together with F) the perturbed dynamical system {φ t ε } t≥0 . Thus, the value of the observable J will change as a result of the perturbation in the following fashion: This equation describes how the observable changes with time, but only upon integration of the perturbed system can we know what its expected value is. The question we, thus, want to address is that of understanding and predicting the mean behaviour of quantities of interest upon modification of the governing dynamics. In a complex system evolving with time, the mathematical object that accounts for the statistical description of its asymptotic regime is the invariant measure 2 . This measure is called invariant because it does not change under the action of the system. Roughly speaking, it tells us how mass is distributed on phase-space in the far future. Thus, if a system is undergoing perturbations, its invariant measure will change and consequently, the expected value of the observables of interest. If the perturbation parameter ε is small enough, one can ask about the effect of the perturbation on the system at a given order of nonlinearity, this is, the response.
Let ρ and ρ ε denote the unperturbed and perturbed measures respectively. If we use the notation introduced earlier, we would like to compute the expectation value of the observable J in the perturbed steady state ρ ε , J in terms of its former expectation value ρ, J and suitable linear and nonlinear correction terms: If one has access to δ [J ] k , one could effectively predict what the expected value in the perturbed system and describe its sensitivity to a certain perturbation. In an isolated system in equilibrium governed by a Hamiltonian, R. Kubo 3 found a closed set of formulas for the response describing δ [J ] k and establishing the connection with the fluctuation-dissipation theorem (FDT). Such theorem can be seen as a dictionary allowing for predicting the response of a system from its free fluctuations. However, the nature of the problem tackled in Kubo's work did not allow to extend it to a more general case of nonequilibrium complex systems and the validity of the formulas was not fully addressed. In fact, deterministic dynamical systems featuring contraction of the phase-space posses invariant measures that are singular with respect to the Lebesgue measure. The absence of a regular density is a major barrier that makes impossible the straightforward application of the FDT, thus breaking the one-to-one connection between forced and free fluctuations of a system.
In the work by D. Ruelle 4,5 , such framework was clarified at a mathematical level, allowing to apply response theory in nonequilibrium systems. Ruelle's results are based on the use of Markov partitions and provide a rigorous response theory for Axiom A 6 systems. Essential ingredients for the theory are the fact that the unperturbed system are structurally stable and that one can split the response operator in two parts. One refers to the contribution coming from the unstable and central manifolds and can be framed as an FDT result. The second contribution comes from the stable manifold and gives an additional term that cannot be described using the free fluctuations of the system. Hence, a suitable notion of differentiability of singular measures was established allowing to consider perturbative expansions as in Eq. (3) for a general class of deterministic dynamical systems.
The Ruelle response theory has provided a key framework for constructing algorithms aimed at practically computing the response of nonequilibrium systems to perturbations, see e.g. Ref. 7, and, specifically, for performing successfully climate change predictions using simple, see e.g. Ref. 8, and comprehensive climate models, see e.g. Refs. 9 and 10.
Despite a good degree of success in the above mentioned studies, the presence of the two distinct contributions described above makes the construction of accurate response algorithms very challenging, see discussion in Ref. 8. A different point of view, based on the study of the evolution of probabilities rather than of individual trajectories, seems necessary; see below.

A. The Transfer Operator Approach
Indeed, one can understand response theory by means of studying the effect of perturbations on the evolution of mea-sures on phase-space as opposed to trajectories. Formulating response theory under this point of view uses essentially different machinery mostly hovering around the so called transfer operator 11 .
Let us suppose that F : X ⊆ R d −→ R d is a vector-field that generates the dynamical system or flow {φ t } t∈R , with φ t : X −→ X. Then, the transfer operator semigroup {L t } t≥0 can be defined as the solution of the Liouville equation 2 : so that ρ(x,t) = L t ρ 0 (x) for some initial condition ρ 0 ∈ L 1 (X). In the language of probability, the transfer operator is describing the pushforward of an integrable function under the action of the dynamical system after t time-units. It turns out that L t is a contraction and the set {L t } t≥0 is a C 0 -semigroup 12 .
Eq. (4) symbolises a process where mass is only advected along the flow. However, in many areas and applications, the governing dynamics are stochasticly perturbed introducing uncertainty to the problem. This sort of perturbations can be modelled by stochastic processes of the form: where we introduce here a standard d-dimensional Wiener process dW t and Σ(x) ∈ R d×d is the covariance matrix. In terms of measures, stochasticity can be translated into the fact that their evolution is not only driven by advection, but diffusion is present. The Liouville is thus transformed into the Fokker-Planck equation 13 : We have defined the differential operator A which can be shown to generate a C 0 -semigroup, just as the Liouville equation does. Notice that the differential operator on the righthand-side of Eq. (4) is the same as A if no noise is present.
The transfer operator is therefore a statistical tool rather than a dynamical one. It globally describes how densities evolve with time as opposed of giving a trajectory-wise description of the system. In fact, the spectral properties of the transfer operator carry information about the statistical properties of the system of interest. For instance, one observes that the fixed points of L t are nothing else than the measure that remain fixed under the action of the dynamics, namely, the invariant measure. In fact, the ergodicity and mixing charac-ter of a dynamical system can be characterised in terms of the nature of the leading eigenvalues of L t (e.g. Ref. 11).
Introducing perturbations on the system not only does it affect to the way trajectories evolve on phase-space but also to measures. This is reflected on the Fokker-Planck evolution equation where by considering a perturbation on the vectorfield of the kind F → F + ∑ k ε k G k we obtain a perturbed evolution law for the density: where B k = −∇ · (G k •). Under some conditions 12 the previous equation also generates a C 0 -semigroup with the same functional properties as the unperturbed one. One can also introduce perturbations on the covariance matrix, which would lead to a similar equation as the one above with different perturbation operators.

Projected Transfer Operators and Markov Chains
The transfer operator has been used to solve problems in different areas of science, see e.g. examples in geosciences Refs. 14-16. In these applications phase-space is not defined as a continuum but as finite collection of regions. As a result, each time the transfer operator is applied, a finite probability mass is shifted from one region to another one. For this reason, in applications, the transfer operator has to be understood in a finite dimensional setting.
To do this, we shall consider a finite subdivision of phasespace X into N regions or boxes {B i } N i=1 and define 1 B i as the characteristic function on box B i ⊂ X. Thus, we define the projection P N : where η indicates some notion of volume that can depend on the problem. It follows that the operator P N L t : U N −→ U N admits a matrix representation: which happens to be a Markov or stochastic matrix. The way we practically construct M t depends on the choice of measure η. Generally, if one wants to study the asymptotic properties of the system one will take η as the invariant measure. On the other hand, if the focus is put on the effects of the flow on the whole domain X, and especially if one is interested in problems of relaxation to the steady state, the correct choice would be to consider the Lebesgue measure instead 17 . It is worth highlighting that the properties of the dynamical system φ t are described by L t . Perturbations on the dynamics lead to perturbations on the transfer operator L t and, consequently on the matrix M t . This fact motivates the problem of reverting the question, i.e., by considering perturbations of M t , what can we say about the dynamics? This question has been tackled in previous work 18,19 by suitably constructing the response operators. Alongside the development of the probabilistic theory, it was possible to establish a connection between the perturbation theory of Markov chains and linear response theory for dynamical systems, applying it to a number of low dimensional stochastic and deterministic dynamical systems.
Constructing the operator that accounts for the linear response can be a difficult and costly task 7 . Linear response can be inferred by examining how the system of interest responds to small perturbations in the governing equations. However, this process can be an expensive procedure if one deals with high-dimensioinal models with physical relevance. Therefore, one desires to have predictive power. In this paper we demonstrate that it is possible to calculate, by using finite representations of the transfer operator, the response of a system by sampling its unperturbed dynamics and prior knowledge of the forcings applied to it. The overall goal is to provide practically usable tools for studying the response of complex nonequilibrium system, like the climate, to perturbations. In Section II we will present the perturbation theory for Markovian matrices, key to construct the response operator at a coarse-grained level. In Section III we investigate how to put into practise such theory when dealing with continuous in time dynamical systems and illustrate the applicability of the methodology with two dynamical systems of interest.

II. PERTURBATIONS OF FINITE MARKOV CHAINS
In this section, we consider a mixing (therefore, ergodic) Markov process with a finite number of states N ∈ N. Then, a positive vector u 0 ∈ R N with ∑ N i=1 (u 0 ) i = 1 would indicate an initial ensemble of states. The finite Markov process would be determined by a stochastic matrix M ∈ R N×N so that the sequence {M n u 0 } ∞ n=0 would be a realisation of it. The stochastic matrix M enjoys some properties worth highlighting. First, we note that M i, j is the probability of jumping into the ith state conditioned on being at the jth. Thus, it follows that for any j = 1, . . . , N. Therefore, all the entries of M are greater than or equal to zero. Further, since the process is mixing, it implies that there exists p ∈ N, so that the entries of the matrix M p are strictly greater than zero 20 . Matrices satisfying this condition are called irreducible and aperiodic 11 or primitive 20 , although we shall refer to them as mixing by analogy with the Markov process they determine. Consequently, the Perron-Frobenius theorem 20 holds for this matrix, meaning that there exists a leading eigenvalue λ 1 with positive eigenvector u. By virtue of the spectral properties of stochastic matrices, it turns out that λ 1 = 1 and u solves This means that u remains invariant under the action of M . If, in addition, u is normalised so that its entries sum up to one, we call u the invariant measure of the process and it follows that lim p→∞ M p u 0 = u, for any normalised non-zero vector u 0 ∈ R N . The next step now is to perturb the stochastic matrix and express the resulting perturbed invariant measure as a perturbative expansion. In what follows, we generalise what reported in Ref. 18. For that, we consider a perturbation of M of the form: where ε 1 , . . . , ε n ∈ R and m 1 , . . . , m n ∈ R N×N . The matrices m k are what we will call the perturbation matrices. Note that the perturbed matrix M + ∑ n k=1 ε k m k , must also be a stochastic matrix if we want it to describe a Markov process. This requirement corresponds to having for any k and j. This assures that the columns of M + ∑ n k=1 ε k m k add up to one. More, non-negativity must be preserved, so not all choices of ε k are valid. For this, we define and (15) Hence, to ensure non-negativity of the perturbed Markov process, we must have that max k ε k ∈ [ε − , ε + ]. To ensure that the latter interval is non-empty, we need to make sure that This would imply that ε − = ε + = 0. In this case, we have non-admissible perturbations.
Again, by virtue of the Perron-Frobenius theorem, such perturbed matrix will have a dominant eigenvalue, whose value is 1 and its associated eigenvector v = v(ε 1 , . . . , ε n ) is strictly positive. In other words, v solves: The goal is to express the perturbed invariant measure v in terms of M , m 1 , . . . , m n , ε 1 , . . . and ε n . Not only do we want to calculate v but also describe how the unperturbed measure u responds at a given power of ε k . Using multiindex notation, we suppose a formal expansion in powers of ε 1 , . . . , ε n : where Gathering the terms for |α| = 1 we get: for k = 1, . . . , n. The matrix 1 − M cannot be inverted since 1 is an eigenvalue of M ; we shall discuss this issue in the next section. At the moment, we directly apply the inverted matrix to find that We define Ψ k = (1 − M ) −1 m k as linear response operator. This matrix has also been named a differential matrix 21 . If we repeat the process for the second order terms (|α| = 2) we get: Thus, we inductively construct the whole expansion as: This formula provides a tool to predict the perturbed invariant measure for as long as it converges. Perturbative formulas like this were found in the probability literature 21 and later revisited when linking them to the study of the response of dynamical systems 18,19 .
Moreover, by simply considering the adjoint matrices, it is possible to develop a response theory for observables. Indeed, let J ∈ R N represent a generic coarse-grained observable. Then, its expectation value with respect to the perturbed stationary vector v satisfies: where ·, · denotes the pairing between measures and observables. The advantage of formulas (20) and (21) is that they allow us to identify the response to perturbations at an arbitrary order of nonlinearity including the linear case, which has a special relevance in the physical literature. As a consequence, if one is interested in calculating the sensitivity of the expectation value of some observable J with respect to changes caused by ε k m k , one has to truncate the series expansion in the first order term. However, these formulas are still purely formal. We need a deeper understanding of what we mean with (1 − M ) −1 and for what values of ε k they are useful.
A. Well-posedness and Invertibility of 1 − M In this section we will clarify the framework for which the response formulas presented above work. First of all, we will revisit the problem of the non-invertibility of 1 − M . Secondly, we will assess the convergence of the power expansion in Eq. (20) and include a comment on the stability of the leading eigenvalues of M .
The linear response operator is not well defined a priori as 1 is an eigenvalue of the matrix M and hence 1 − M is not invertible. However, we can define a more suitable normed space for which the norm of M is less than one, making 1 − M invertible. The idea relies on the fact that the vector space R N on which M is defined admits a splitting of the form: where V is the invariant subspace generated by the generalised eigenvectors of M associated with the eigenvalues distinct to 1 and Span (u) is the vector subspace spanned by u. This space can also be regarded as the kernel of the functional ι : R N −→ R given by ι (x) = ∑ i (x) i . Indeed, one observes that if u k is a generalised eigenvector of M , the identity (M − λ k ) p u k = 0 holds for some p ∈ N. Hence, This implies that ι(u k ) = 0, by virtue of the mixing hypothesis. Furhtermore, noting that ι (m k x) = 0 for any x ∈ R N , we can summarise the idea in the following statement: Proposition II.1 Let M ∈ R N×N be a mixing stochastic matrix with invariant measure u. Then, for any x ∈ R N , m k x ∈ V and 1 − M is invertible on V .
Recall that the linear response operator in Eq. (19) only requires the evaluation of (1 − M ) −1 after having calculated m k u, so writing the inverse explicitly is not that big an abuse. However, we must underline that, numerically, we cannot directly invert 1 − M . To overcome this problem, we must deflate the matrix, removing the dependence on the dominant eigenspace. Concretely, we have to consider the group inverse 22 Z of a Markov chain, which is defined as follows: where the matrix M ∞ is defined as lim p→∞ M p . This matrix can be shown to be equal to the matrix whose columns are all equal to the invariant measure u. Intuitively, M ∞ can be seen as a rank-one projector onto span (u). In fact, 1 is not The linear response operator is, thus, given by This way we find a practical way of computing the linear response operator.
To assess convergence, we want to be certain that the L 1 norm of the series ∑ ∞ k=1 (ε 1 Ψ 1 + . . . + ε n Ψ n ) k does not go to infinity. For this problem we introduce the matrix norm · 1 * which we define as the norm · 1 restricted to V . We are now in conditions of applying the ratio test in Eq. (20): Since we want that the ratio remains lower than 1 to ensure (absolute) convergence, we choose ε 1 , . . . , ε n so that By referring to the mixing character of M we conclude that ε max is finite and positive. Hence, ε max establishes a tolerance on the size of the perturbation. We would like to underline that the condition for these response formulas to work can be translated to the fact that one requires that the system has to mix mass sufficiently quickly.
The number M 1 * is known as the ergodicity coefficient 23 and can serve as a condition number for the Markov chain in the sense that this quantity serves to estimate the norm of the difference between a perturbed and the unperturbed invariant measures 24 . See Ref. 25 for a practical use of the ergodicity coefficient. However, the ergodicity coefficient does not tell us how perturbations affect the localisation of the eigenvalues of the matrix M , something crucial if one wants to control the spectral gap. This problem is tackled by means of analysing the stability of the leading non-unit eigenvalues and summarised in the following result found in Ref. 26: Proposition II.2 Let M ∈ R N×N be a diagonalisable, irreducible and aperiodic stochastic matrix. Let X ∈ R N×N be a non-singular matrix such that M = X −1 ΛX with Λ being the diagonal matrix containing the eigenvalues 1, λ 2 , . . . , . . . λ N , of M . Suppose that Then, the perturbed chain M + ∑ n k=1 ε k m k has a unique invariant measure.
The proof of this result relies on classical perturbation theory, in particular on the Bauer-Fike theorem 27 . With a bit more work, one can deduce a bound on the rate of convergence to equilibrium of the perturbed chain, also presented in the work cited above.
The condition shown in Eq. (26) can very restrictive if the process is governed by a highly non-normal Markov matrix, as in this case κ (X) can be very large. However, in order to preserve the spectral gap, the only thing needed is a good conditioning of the eigenvalues closest to the unit circle. Hence, in order to find a sharper stability bound like in Eq. (26) the eigenvalue condition number 28 might be the object to look at.

III. LINK TO CONTINUOUS TIME DYNAMICAL SYSTEMS
In this section, we will investigate how to obtain Markov chains from continuous time dynamical systems up to finite precision via focusing on the evolution of densities given by the Fokker-Planck equation. The ultimate target will be to apply the theory for Markov chains presented earlier to asses the response to perturbations of two simple dynamical systems featuring different characteristics.
In general, we proceed the following way. Let dt > 0, and let us express the Fokker-Planck equation (5) to first order as: The idea is to view the right hand side of the previous equation as the pushforward of the measure ρ(x,t) to ρ(x,t + dt), namely, L dt ρ(x,t). Therefore, when considering the projection of L dt we would be introducing Markov chains, as clarified earlier.
Furthermore, a perturbation of the driving vector-field F → F + ∑ k ε k G k would incur a modification on the Fokker-Planck equation. Again, to first order: It is natural then to regard B k as the perturbation operators. In what follows we will frame this perturbation problem in a finite-precision setting by means of the finite representation of the transfer operator.

A. Transition Matrices
We observed that matrices introduced in Eq. (9) can be understood as Markov processes describing the probability of mass transitioning between regions of phase-space. We also noted that the measure η employed in the projection of the transfer operator determines the algorithms used. Generally, in dynamical systems, one is interested in the properties of the system in its asymptotic regime. This means that the projection of the transfer operator in this case has to be done with respect to the invariant measure.
In order to sample the invariant measure, long integrations of the system are needed to explore the region of phase-space where the long-term dynamics occur. Due to finite precision, integrations can be translated into time-series with equal timestep dt creating a cloud of points on the domain. After transients have died, sample points will populate certain region on phase-space, possibly shadowing 29 the dynamics on an attractor. Such region can be subdivided into N boxes with Lebesgue-zero measure intersections. If S denotes all the sample points living in ∪ i B i , the matrix M in Eq. (9) is constructed as: where # is the counting measure. Essentially, this formula is counting the transitions from box to box that sample points of the time-series do after a lag of dt time-units. By construction, it immediately follows that M dt is a stochastic matrix. A matrix constructed this way is called a transition matrix. This representation of the transfer operator is what we are going to use to approximate the right-hand-side of Eq. (27) to first order. If instead one is considering the perturbed problem the Fokker-Planck equation is modified (see Eq. (7)), and a suitable matrix approximation of the perturbation operator B k = −∇ · (G k •) is needed. Given the discrete setting, we can use finite-volume schemes to approximate the differential operator B k . Of course, the way we do this depends on the particular problem. In the next sections we will put this methodology into practise with two examples.

B. A Stochastic Dynamical System
To illustrate the applicability of the methodology described above, we consider the Ornstein-Uhlenbeck (O-U) process as a test case. The O-U process in R d is a stochastic process {X(t)} t≥0 of the form: were A ∈ R d×d and models the linear and deterministic component of the process. The stochastic part of the process is given by ΣdW t , where Σ ∈ R d×d indicates the correlations in the d-dimensional Wiener process dW t . In order for this process to posses an invariant measure, the matrix A is required to have eigenvalues with strictly negative real part 30 . If this condition is met, the process will be stable and the statistics converge to a normal distribution.
The O-U process can also be viewed from the statistical scope by considering its associated Fokker-Planck equation: where ρ ∈ L 1 (X) is a density for each value of time t.
We consider the two dimensional O-U process given by: The variables in this process are totally uncorrelated and we shall investigate the response of the system when its mean is shifted and correlations are introduced in the noise. Thus, the process we are going to study is given by: where µ = 1 0 , and E = 0 1 1 0 , and ε 1 , ε 2 ∈ R and have to satisfy the conditions for the perturbed process to be well defined, namely, ΣΣ * + ε 2 E has to be symmetric positive definite. The modified Fokker-Planck equation therefore is, In a concise manner, it can be written as t), (34) where the differential operators B 1 and B 2 are defined by analogy.
As observed in the previous section, by differencing the time-variable with a time-step of dt > 0, we immediately obtain a discrete time evolution of densities to first order. Thus, the right-hand-side of the evolution equation is pushing forward the measures at time t to resulting measures at time t + dt. Recall that this is what the transfer operator L dt precisely does.

Numerics: transition and perturbation matrices
In order to construct a transition matrix, one has to have in hand a compact region of phase-space where all the longterm dynamics occur. Unfortunately, in the case of the O-U process, phase-space is unbounded because of the presence of white noise. However, one can practically restrict the phasespace to a compact rectangle that with high probability encloses the whole integration of the process.
Let {B i } 2 N i=1 be a collection of boxes covering four standard deviations of the process on each direction. In the experiments performed we have considered 2 N boxes by means of discretising each axis in 2 N/2 equally sized segments. We have examined the values N = 10, 12 and 14 trying to keep a balance between numerical tractability and precision. The box subdivision of phase-space is done using the MATLAB package GAIO 31 . We then construct the transition matrix M dt as in Eq. (29), where φ • is now replaced by the process X(•). To obtain the long time-series, we integrate it using an Euler-Maruyama scheme for 10 6 time-units with a time-step of dt = 10 −2 time-units. X (−dt) B j denotes the set of points on phase-space that will end up in B j after waiting dt timeunits.
Regarding the perturbation operators present in Eq. (34), we examine how two implement them so that they are compatible with the domain discretisation carried out for the transition matrices. Suppose that the box B i has center c i k,l so that c i k±1,l = c i k,l ± [δ x 1 , 0], where δ x 1 is the distance between consecutive boxes along the x 1 -direction (the same for the x 2direction). Then, the derivative of ρ with respect to x 1 at c i k,l is given by The same scheme is used for the x 2 -direction. For the second and cross derivatives, we implemented the usual second order discretisation: TABLE I. The first row refers to the values obtained from integrating the O-U process. The rest of the rows are values obtained via discretisation of the transfer operator. We defined Error1 as the L 2 norm of the difference between the coarse-grained perturbed invariant measure and the first order correction. Error2 is the same but with higher order correction terms. x This stencil is completed with the same schemes in the x 2direction. These numerical derivatives can be arranged into matrices so that their multiplication with vectors approximate their respective differentiation. Thus we obtain matrix representations m k of the operators B k . Of course, the schemes presented above only work for the interior boxes of the domain. We have not implemented explicit boundary conditions since the values of the invariant measure of the O-U process at the boundary of the compact domain are almost zero. In any case, boundary conditions should not inject/deplete mass so in other cases where the dynamics populate the boundaries, Neumann boundary conditions are the ones to choose 32 .
A good compromise was found provided that the resolution was high enough. On Fig. 1, we see that the predicted response was well approximated. This is checked on Table I where we show that the error of approximating the perturbed invariant measure using the response formulas in Eq. (21) presented earlier is small. The linear response (see Fig. 2) is precisely doing what one expects: mass is pumped to the right as a consequence of the mean being shifted and the introduction of correlations inflicts a rotation. Higher order correction terms (see Fig. 2) can also give an insight on how the measure is gradually modified. As a safety check, the sum of the components of the response are checked to add up to (almost) zero, meaning that mass is not introduced or depleted.

C. A Deterministic Dynamical System
A model of importance in dynamical systems and climatology is the Lorenz 63 system 33 . Such system is a lowdimensional model of atmospheric convention that, even if it is far from being a realistic model, it possesses chaotic behaviour and exhibits different regimes, just as in the atmo-sphere. This systems has been the toy-model to illustrate the difficulty of calculating the sensitivity and response in dynamical systems and in the climate system by extension (e.g. Refs. 34 and 35).
From the point of view of the transfer operator, the response of the Lorenz 63 system was computed in Ref. 18, by means of applying the perturbation theory for finite-state space Markov chains. However, the perturbed dynamics needed to be sampled. In case of low-dimensional models, this is not a major handicap, but in physically relevant systems this can be an expensive procedure. Therefore, we need more predictive skill. The Lorenz 63 system is a deterministic dynamical system that is given by the following set of ordinary differential equations:ẋ for the classical parameter values s = 10, b = 8/3 and control parameter r = 28. For such parameter values, the Lorenz 63 system displays chaotic dynamics in a singularly hyperbolic attractor that supports an SRB measure 36 . The lack of uniform hyperbolicity implies that the Lorenz 63 system is not Axiom A and therefore one cannot expect response theory to hold. Nevertheless, numerical evidence supports the idea of linear 37 and nonlinear 38 response to be valid in this system. The perturbation problem we tackle here is that of changing the value of the control parameter r → r + ε 1 for ε 1 ∈ R. This parameter is known as the Rayleigh number and it is proportional to the temperature difference between the convecting layers. The bifurcations associated to this parameter are numerically surveyed in Ref. 39. We would also study the additive perturbation on the z-variable by adding ε 2 ∈ R. These perturbations incur a modification on the vector field of the form:ẋ where G 1 (x) = [0, x, 0] and G 2 (x) = [0, 0, 1] . Consequently, the Liouville equation changes as in Eq. (4), with perturbation operators: and Introducing forcing provokes a change in the statistics of the system as shown on Fig. 3, where we considered the example observable z and computed its mean value for equispaced values of ε 1 ∈ [−5, 5] and ε 2 ∈ [−1, 1]. When ε 1 ≈ −4, a bifurcation is traversed producing a non-smooth change in the mean value of the observable. Far away from this bifurcation point, the statistics changes smoothly with respect to ε 1 and ε 2 . For the calculation of these means, long integrations were considered so that the outcome is uniquely determined. This is possible thanks to the existence of an SRB measure 34 .
Notice that differentiating Eq. (7) with respect to ε k (k = 1, 2) gives B k . So a way of obtaining a matrix representation of B k is Where M dt ε k is the transition matrix empirically constructed from the perturbed equations. This approach was followed previously 18 and served to calculate the linear response of the Lorenz 63 system, however, it involves two (or more) long integrations of the equations, something we want to avoid. We explain the methodology carried out here in the next section.

Numerics: transition and perturbation matrices
The invariant measure of the Lorenz 63 system is supported on attractor, which by definition is compact. Since we know that the Lorenz attractor is contained in an absorbing closed ellipsoid on phase-space 39 , we are certain that the square defined by the Cartesian product [−20, 20] × [−30, 30] × [0, 50] will contain the attractor. Then, to obtain the box partition, we divide into two each axis and repeat the procedure in each resulting segment. This way, a total of 2 N boxes {B i } 2 N i=1 were constructed for the values of N = 12, 15 and 18. As mentioned earlier, the box discretisation was carried out using GAIO 31 .
To obtain the sample points of the dynamics, we integrated the model for 10 5 time-units with a time-step of dt = 10 −3 time-units. After a spinup of ten percent of the points, we count the transitions between the boxes with a time-lag of dt time-units. This gives the transition matrix M dt . Thus, a coarse-grained estimation of the invariant measure can be readily obtained by solving the eigenvalue problem in Eq. (11) which is plotted on Fig. 4.
Regarding the perturbation operators, the same techniques as in the O-U process where employed. The difference here is that we are dealing with an invariant measure that is singular with respect to the Lebesgue measure and therefore is supported in a complicated set, as illustrated by Fig. 4. Hence, one needs to take care of the boundary boxes by simply considering forward/backward approximations of the derivatives at those boxes. Thus was estimated B 1 applied to the invariant measure, depicted on the right of Fig. 4.
The results of applying this methodology are presented in Fig. 5, where we show that Eq. (21) can indeed approximate the expectation values certain observables for a wide range of values of ε 1 . We underline that these plots demonstrate the validity of the formulae not only to compute the linear response but to predict the statistics of the system. As is natural, the formulas cannot be expected to work for large values of the perturbation parameter, let alone beyond the bifurcation point. When considering the simultaneous forcings ε 1 G 1 and ε 2 G 2 , we examine the efficiency of the formulas (Table II). We calculate the expectation value of certain observables and check that we can approximate their mean for small values of ε 1 and ε 2 . Also, the linear response is calculated. As we see in the last row, refining the resolution does not always improve the results. This is due to the fact that the length of the integration has to be severely extended in order to sample the boxes covering the domain. In general, the skill of the methodology is shown in Fig. 6 where the expectation values of some observables are predicted using the response formula Eq. (21). Of course, the validity was only tested within the convergence interval.

IV. DISCUSSION
In this work, we take the point of view of statistics to understand the effect of perturbations on dynamical systems. Under this setting, the transfer operator is the essential object of study. By means of projecting such operator onto a finite dimensional vector space, it is possible to obtain empirically constructed matrix representations of the transfer operator via Markov modelling. Further, the properties of these matrices allow us to, up to finite precision, describe the dynamical system of interest.
Exploiting the stochastic (or Markovian) structure of the projected transfer operator, we consider multiple perturbations and express the resulting perturbed invariant measure as a series expansion describing the response at all orders of nonlinearity. Alongside previous work 18,19,21 , it is possible to link the probabilistic concepts of finite Markov chains to dynamical systems. Namely, the mixing rate of the unperturbed chain indicated by its second eigenvalue or more generally its ergodicity coefficient determine the validity of the perturbative expansions. This agrees with more theoretical considerations with the so-called spectral gap condition 40 , key to establish linear response.
The linear component in the perturbative expansion is the so called linear response and it is of physical interest 1 . This quantity is accessed by constructing suitable response operators and indicates the sensitivity of a dynamical system to prescribed perturbations. Here we emphasize the need of gaining predictive skill: given a perturbation to the vector field (possibly induced by tuning several paramters), can we calculate the sensitivity of the system a priori? For such purpose, we examined the Fokker-Planck/Liouville equation to identify the operators that incur the perturbations on the evolution of densities. Then, by considering simple finite-difference methods, we were able to model (to finite-precision) matrix perturbations that allowed to exploit the perturbative formulas giving us access to the linear and nonlinear response of the systems. Notice that using this method, we only need one integration of the (unforced) model to determine the response and sensitivity.
The two numerical experiments performed were intrinsically different. The Ornstein-Uhlenbeck process is a stochastic dynamical system with an invariant measure with a density (with respect to the Lebesgue measure) which is smooth. Therefore, differential operators (discretised using finitedifferences) should work correctly. On the other hand, the Lorenz 63 model is a dissipative deterministic system with an attractor with Lebesgue-zero measure that is singularly hyperbolic. This translates into the fact that linear response theory is not theoretically proved 5 . Moreover, the invariant measure of such system is not smooth, so the usual notion of differentiation does not hold. In the experiments, we coarse-grained the invariant measure to obtain a vector estimate. The corresponding differential operators were applied giving a good compromise, suggesting that at a coarse-grained level, the usual methods for numerical differentiation should work. Previous investigations 32 illustrate that coarse-graining the domain inherently provokes the introduction numerical diffusion thus smoothening the invariant measure.
The partitioning of phase-space is an arbitrary decision of the modeller. In these numerical experiments, the partitioning is considered to be uniform: boxes are of equal size. This is not optimal, since there might be regions of phase-space that need more refining than others 41,42 . The reason for choosing equally sized boxes is that the invariant measures obtained gave a good compromise when approximating the expectation value of observables. In any case, comparing box discretisations is not the target of the paper.
The low dimensionality of the problems considered in this paper allows to perform the box subdivision on the whole phase-space. Unfortunately, many physically relevant models posses a domain which is high-dimensional not to say infinitedimensional. In these cases, it appears intractable to work on the complete domain. Because of this dimensionality barrier, reduced phase-spaces are considered. Results along this line support the applicability of the transfer operator methods in a climatological context, see, e.g., Refs. 14, 15, and 43. However, the inherent loss of Markovianity in the dimensionality reduction requires control on the memory effects introduced by the hidden variables (see, e.g., Ref. 44), something that complicates the study of the response, as pointed out in Ref. 45 where the robustness of reduced systems with the presence of forcing is assessed. Further research should be oriented on adapting this methodology based on the transfer operator in higher-dimensional systems with vistas to assessing and predicting the sensitivity of physically relevant systems.