1 Introduction

1.1 A Brief Summary of Response Theory

The development of methods for computing the response of a complex system to small perturbations affecting its dynamics is the subject of very active investigation in many fields of science and of technology. Statistical mechanics provides tools for approaching such a problem through so-called response theories, which allow for evaluating the change in the properties of a system through suitably defined operators that factor in the statistical properties of the unperturbed system and the specific nature of the perturbation one wants to study.

One can see a response theory as a virtual experimental setting where one has at hand a given system, various measurement instruments, and a knob controlling the value of a parameter, and knows how to relate the position of the knob with the reading of the instruments. In other terms, response theories provide the basis for understanding the outcome of experiments, and, not by chance, physical sciences have been at the forefront of the theoretical investigation in this direction. The monumental contribution by Kubo [1] provided the basis and the explicit formulas needed for studying the impact of very general perturbations to statistical mechanical systems at equilibrium, as described by the canonical ensemble. The Kubo formulas are extremely useful for studying a large class of problems in e.g. transport, optics, and acoustics. A cornerstone of Kubo’s theory is the fluctuation–dissipation relation, which connects, within the linear approximation, the free fluctuations of a system to its response to perturbations. This property is closely related to the celebrated diffusion law for Brownian motion and has recently been extended to the fully nonlinear case [2]. Despite its obvious relevance, Kubo’s approach has been criticized for several reasons:

  • it is not physically consistent in treating the transition from equilibrium to non-equilibrium dynamics, because it studies the impact on equilibrium systems of perturbations that drive them near (but out of) equilibrium, but does not clarify how a new stationary state is reached and maintained; additionally, it is not suited for studying the response to perturbations of non-equilibrium systems;

  • it lacks mathematical rigour, as it is not clear which are the systems for which the response formulas apply, and why it should apply at all.

In [3–5] it was clarified that it is possible to establish a rigorous response theory for Axiom A [6] continuous or discrete time dynamical systems. One obtains that the invariant SRB measure is smooth with respect to the parameter \(\epsilon \) that controls the strength of the perturbation changing the dynamics of the system from \(\dot{\mathbf {x}}=\mathbf {F}(\mathbf {x})\) to \(\dot{\mathbf {x}}=\mathbf {F}(\mathbf {x})+\epsilon \mathbf {X}(\mathbf {x})\), in the case of continuous time evolution, and from \(\mathbf {x}_{k+1}=\mathbf {F}(\mathbf {x}_k)\) to \(\mathbf {x}_{k+1}=\mathbf {F}(\mathbf {x}_k)+\epsilon \mathbf {X}(\mathbf {x}_k)\), in the discrete case. We continue our discussion taking into consideration the continuous case.

We can introduce the unperturbed evolution operator \(S_0^t=\exp (t \mathbf {F}\cdot )\), which moves forward in time any function of phase space \(O(\mathbf {x})\) by an interval t according to the unperturbed dynamics, so that \(O(\mathbf {x}(t))=S_0^t O(\mathbf {x}(0))\), and its perturbed counterpart \(S_\epsilon ^t=\exp (t (\mathbf {F} +\epsilon \mathbf {X})\cdot )\), which instead describes the evolution in the perturbed system.

We denote by \(\nu _0(\mathrm {d}\mathbf {x})\) and \(\nu _\epsilon (\mathrm {d}\mathbf {x})\) the invariant measures of the unperturbed and perturbed states, respectively. In particular, one obtains that the expectation value of sufficiently smooth observables \(O(\mathbf {x})\) in the perturbed state can be expressed in the form:

$$\begin{aligned}{}[O ]_\epsilon =[ O]_0 + \sum _{n=1}^\infty \epsilon ^n \delta [O]_n, \end{aligned}$$
(1)

where \([Q]_\epsilon =\int \nu _\epsilon (\mathrm {d}\mathbf {x}) Q(\mathbf {x})\) and \([Q]_0=\int \nu _0(\mathrm {d}\mathbf {x}) Q(\mathbf {x})\), while the various terms of the perturbative expansion can be written as:

$$\begin{aligned} \delta [O]_n = \int \nu _0(\mathrm {d}\mathbf {x}) \int _0^\infty \mathrm {d} t_1\cdots \int _0^\infty \mathrm {d} t_n \Lambda S_0^{t_1} \cdots S_0^{t_{n-1}} \Lambda S_0^{t_{n}} O(\mathbf {x}), \end{aligned}$$
(2)

where \(\Lambda (\bullet )=\mathbf {X} \cdot \nabla (\bullet )\). In particular, the linear term can be written as:

$$\begin{aligned} \delta [O]_1 = \int \nu _0(\mathrm {d}\mathbf {x}) \int _0^\infty \mathrm {d} t_1 \Lambda S_0^{t_1} O(\mathbf {x}), \end{aligned}$$
(3)

All terms \(\delta [O]_j\) can be written as an expectation value on the unperturbed measure of a new observable expressed as a functional of the background vector field \(\mathbf {F}\), of the perturbative vector field \(\mathbf {X}\), and of the observable O. The somewhat surprising conclusion we draw is that the invariant measure of the system, despite being supported on a strange geometrical set, is differentiable with respect to \(\epsilon \). Among the many merits of the Ruelle response theory, one can mention that a) it clarifies the mathematical framework needed for developing a response theory, whose main ingredient, roughly speaking, is the robustness deriving from having a uniformly hyperbolic dynamics on the attractor supporting an SRB measure; and b) it works seamlessly, in principle, in equilibrium and non-equilibrium statistical mechanical systems, reducing to Kubo’s formulas when considering the first scenario, if one assumes that statistical mechanical systems are Axiom A. Non-trivial implications of the nonequilibrium/equilibrium dichotomy regarding the validity of the fluctuation–dissipation relations are discussed in [2, 5, 7], while a physical interpretation of the first and second order terms occurring in Ruelle’s response formalism is provided in [8].

Of course, at this stage one needs to bridge the gap between mathematical formalism and physical meaningfulness. One manages to bring Ruelle’s formalism into the realm of applicability by adopting the chaotic hypothesis [9, 10], which basically says that a high-dimensional chaotic physical system can be treated for all practical purposes as if it were Axiom A if we focus on macroscopic observables. The chaotic hypothesis is the generalisation of the ergodic hypothesis, and provides a firm background for translating the mathematical properties of Axiom A systems into physically meaningful statements. Clearly, the chaotic hypothesis applies far from regimes of metastability and far from critical transitions, where entirely different phenomena appear. The chaotic hypothesis might also be practically problematic when one treats multiscale systems featuring many near-zero Lyapunov exponents; see the discussion in [11].

Taking the point of view of the chaotic hypothesis, one has that, after transients have died out, nonequilibrium systems reach a nonequilibrium steady state (NESS) where the phase space is on average contracting (with the rate of contraction corresponding, broadly speaking, to the entropy production of the system [12]), so that one can associate to the hyperbolic strange attractor supporting the invariant measure a Hausdorff dimension that is lower than the dimensionality of the phase space and, in general, not integer [6, 13].

The last piece of the puzzle one needs to lay in order to sort out the above-mentioned criticisms of Kubo’s theory relies on the physical interpretation of how a perturbed equilibrium system reaches a steady state. A convincing point of view on this relies on emphasizing the role of thermostats, which are large physical systems interacting with the system of interest in such a way as to extract the excess of heat generated as a result of the energy input due to the perturbation. Thermostats are also responsible for making it possible to establish stationarity in the case of forced and dissipative non-equilibrium systems. An extensive treatment of the role of thermostats in equilibrium and nonequilibrium systems in the context of the chaotic hypothesis is given in [14]. We will not elaborate further on this aspect here.

1.2 Transfer Operator Approach

One can point out that the formulas above describe the impact of the perturbations in terms of expectation values of a generic observable O, whereas one might like to derive directly results for the impact of the perturbations on the invariant measure.

In [3–5] one constructs the response of the system to perturbations by following the changes in the individual trajectories and summing over the possible initial configurations distributed according to the unperturbed invariant measure. A different point of view on response theory focuses on studying the properties of the unperturbed and perturbed transfer operators and of their generators (see [15] for an introduction to these mathematical objects). This requires constructing an appropriate framework of suitable (Banach) functional spaces where their actions are well defined, able to carefully treat the fundamental differences between the (smooth) unstable and (singular) stable manifolds of Axiom A systems [16–19].

The evolution of the measure driven by the system \(\dot{\mathbf {x}}=\mathbf {F}(\mathbf {x})\) up to time \(t\ge 0\) starting from an initial condition at time \(t=0\) is described by the Perron–Frobenius transfer operator \(\mathcal {L}^t\) (see, e.g., [15]), so that \(\rho (\mathbf {x},t)= \mathcal {L}^t \rho (\mathbf {x},0)\). We have that the family of \(\{\mathcal {L}^t\}_{t\ge 0}\) forms a one-parameter semigroup, such that \(\mathcal {L}^{t+s}=\mathcal {L}^t\mathcal {L}^s\) and \(\mathcal {L}^0=\mathbf {1}\). The Perron–Frobenius operator \(\mathcal {L}^t\) is the adjoint of the evolution operator \(S^t=\left( \mathcal {L}^t\right) ^\top \), so that \(\langle S^t O,\rho \rangle = \langle O,\mathcal {L}^t\rho \rangle \), where \(\langle f,g \rangle \) is the action (computation of the expectation value) of the linear functional g (the probability measure) on the test function f (the observable). We have that \(\mathcal {L}^t\nu _0=\nu _0\) \(\forall t\ge 0\), meaning that the invariant measure is an eigenvector of the Perron–Frobenius operator corresponding to the unit eigenvalue.
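The duality between the two operators can be illustrated in a finite state setting, where the transfer operator is replaced by a column-stochastic matrix and the evolution operator by its transpose. The following is a minimal sketch under that assumption; the random matrix, the vector names, and the use of NumPy are illustrative choices and are not taken from the text.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 4

# Hypothetical column-stochastic matrix standing in for the transfer operator:
# each column is normalised to sum to one.
M = rng.random((N, N))
M /= M.sum(axis=0, keepdims=True)

# An arbitrary probability vector (the measure) and an arbitrary observable.
rho = rng.random(N)
rho /= rho.sum()
O = rng.random(N)

# <S O, rho>: evolve the observable with the transpose, then average over rho.
lhs = (M.T @ O) @ rho
# <O, L rho>: push the measure forward, then average the observable.
rhs = O @ (M @ rho)

duality_gap = abs(lhs - rhs)
```

Both expressions agree up to round-off, which is the finite-dimensional counterpart of \(\langle S^t O,\rho \rangle = \langle O,\mathcal {L}^t\rho \rangle \).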

Assuming strong continuity and boundedness of the semigroup given by \(\{\mathcal {L}^t\}_{t\ge 0}\), we can introduce the unperturbed Liouvillian operator L, which is the generator of the unperturbed Perron–Frobenius operator \(\mathcal {L}^t=\exp (t L)\), and write the Liouville evolution equation for \(\rho (\mathbf {x},t)\) as follows [20]:

$$\begin{aligned} \partial _t \rho =-\nabla \cdot \left( \rho \mathbf {F}\right) = L \rho \end{aligned}$$
(4)

One immediately obtains that \(L\nu _0=0\). In general, the spectrum of L is complex and in a strip of finite width including and below the imaginary axis consists only of isolated eigenvalues of finite multiplicity corresponding to the Ruelle–Pollicott resonances, while below such a strip one finds the essential spectrum, which is responsible for the continuum of the power spectra of integrable observables. Furthermore, the presence of a unique SRB measure comes from the presence of a simple vanishing eigenvalue, while mixing properties result from the absence of any other eigenvalue along the imaginary axis. The relevance of these properties for constructing a response theory is discussed in great detail in [18, 19]. In [21] it is argued, using mathematical considerations and examples of geophysical relevance, that the presence of Ruelle–Pollicott resonances having real part close to zero may lead to rough parameter dependence, as the smoothness of the response is lost. Additionally, in [22], it is shown, along similar lines, that the crisis of a very high-dimensional chaotic attractor near a critical transition (namely, of a climate model in the vicinity of the tipping point responsible for the transition between warm and snowball climate [23–26]) can be detected and anticipated by looking at the spectrum of the transfer operator.

We then have that the presence of the \(\epsilon \) perturbation to the dynamics changes the Liouville equation as follows:

$$\begin{aligned} \partial _t \rho =-\nabla \cdot \left( \rho \mathbf {F}\right) -\epsilon \nabla \cdot \left( \rho \mathbf {X}\right) = L_\epsilon \rho , \end{aligned}$$
(5)

so that we can introduce the perturbed Perron–Frobenius operator \(\mathcal {L}^t_\epsilon =\exp (tL_\epsilon )\), which pushes forward in time the measure according to the perturbed dynamics: \(\rho (\mathbf {x},t)= \mathcal {L}_\epsilon ^t \rho (\mathbf {x},0)\). Clearly, \(\langle S_\epsilon ^t O,\rho \rangle = \langle O,\mathcal {L}_\epsilon ^t\rho \rangle \). Additionally, we have that \(\mathcal {L}_\epsilon ^t\nu _\epsilon =\nu _\epsilon \) \(\forall t\ge 0\) and \(L_\epsilon \nu _\epsilon =0\). While this approach is in some sense mathematically more problematic, because it is based on studying a partial differential equation instead of a finite dimensional dynamical system, it seems to provide a more comprehensive set of tools for studying the response of a system and relating it to its unperturbed fluctuations; see, e.g., [16], where Ruelle’s formulas are obtained along these lines. See also the comprehensive review given in [18], where the applicability of the response theory beyond the case of Axiom A systems is discussed in detail.

One needs to emphasise that the transfer operator approach is more natural in all the cases when our interest focuses on studying the properties of the response of an ensemble of trajectories (initialised according to the unperturbed invariant measure) rather than on individual orbits of a system.

Note that in some applications there is not an obvious separation between the two approaches. Let’s take the problem of constructing climate projections through the use of (extremely complex) numerical climate models, which is one of the core activities summarized in the IPCC reports [27]. Indeed, modelling centers are actively pursuing the preparation of multiple runs starting from an ensemble of initial conditions for a given scenario of forcing in order to estimate more accurately the uncertainties in the projections. Nonetheless, we will not experience an ensemble of realizations of the climatic evolution, but just one.

1.3 Computing the Response

The analysis of high-dimensional complex systems in terms of direct numerical simulation and of time series analysis suffers from the (almost) ubiquitous curse of dimensionality, which makes it hard to represent correctly the details of the dynamics because computational complexity explodes with the number of degrees of freedom. The construction of efficient and accurate algorithms for studying the response of a complex system to perturbations faces serious difficulties. Let’s focus now on the linear case. Some previous studies have emphasised the need for treating separately the contributions to the response coming from short and long time delays in Eq. 3, and have underlined the need for reducing the complexity of the invariant measure by adding in the background state some stochastic forcing, able to smooth out the singularity of the SRB measure [28, 29].

A promising way to deal with the actual computation of the scalar product in Eq. 3 is to use as time-dependent basis the covariant Lyapunov vectors [30, 31], which automatically separate the contributions to the response coming from the unstable, neutral, and stable directions. This clarifies that the convergence of the formula given in Eq. 3 comes from the two distinct facts that (a) perturbations along the stable directions naturally decay, and (b) perturbations along the unstable directions grow in size, but are dominated by the loss of correlation due to mixing.

Recently, algorithms based upon adjoint methods have shown a good degree of accuracy and seem promising, even if scaling them up to high-dimensional systems has not been attempted yet [32, 33]. A different approach to the problem has been proposed in [7, 34–36], where, instead of trying to compute the response given in Eq. 3 directly and ab initio, the authors construct it a posteriori, probing the system with some test forcings and using the formal properties of the theory to be able to predict the response for new patterns of forcings. One can say that by studying the differential response to similar yet differently modulated perturbations, it is possible to derive the overall response properties of the system.

1.4 This Paper

Any numerical representation of a continuum system builds upon the need to discretize the phase space and, in the case of time-continuous systems, time.

In this case, we partition the phase space of the system in say N states \(\phi _1,\ldots ,\phi _N\). In many cases, the states are constructed by discretizing the phase space in a grid of boxes, which provide a (Galerkin) basis of orthogonal functions. We then construct an initial ensemble as defined by the occupancy \(u_0^1,\ldots ,u_0^N\) of each of the \(\phi _i\)’s, \(i=1,\ldots ,N\), so that

$$\begin{aligned} u_0^i=\int \mathrm {d}\mathbf {x} \rho (\mathbf {x},0)\mathbf {1}(\phi _i), \end{aligned}$$

where \(\mathbf {1}(A)\) is the characteristic function of the set A, and we want to approximate how such occupancies change with time, considering discrete time steps \(\Delta t\), so that, to a good approximation, the occupancy at time \(k\Delta t\) is

$$\begin{aligned} u^i(k\Delta t)\sim \int \mathrm {d}\mathbf {x} \mathcal {L}^{k\Delta t} \rho (\mathbf {x},0)\mathbf {1}(\phi _i). \end{aligned}$$

Moreover, in such a discrete representation, we have that the value of an observable O in the state \(\phi _i\) is given by its average

$$\begin{aligned} O^i(k\Delta t)=\frac{\int \mathrm {d}\mathbf {x} \rho (\mathbf {x},k\Delta t )\mathbf {1}(\phi _i) O(\mathbf {x})}{\int \mathrm {d}\mathbf {x} \rho (\mathbf {x},k\Delta t)\mathbf {1}(\phi _i)}. \end{aligned}$$
(6)

Let’s emphasize that when analyzing virtually any sort of complex system, almost invariably one proposes a natural spatial and temporal cut-off, so that one is not in fact interested in computing the response of any possible observable defined at any possible spatial and temporal resolution; rather, meso- or macroscopic properties are relevant. Going again to the useful example of climate science, it is commonly regarded as a good and useful question to learn about the change in the surface temperature in response to climate forcing on a spatial scale corresponding to, say, a continent or a fraction thereof, and on a temporal scale of, say, one year. Nobody would find it useful or sensible to study the surface temperature response over extremely small temporal and spatial scales.

Empirically, using long numerical integrations and defining the set of finite states \(\phi _i\), \(i=1,\ldots ,N\), we can construct the stochastic matrix \(\mathcal {M}\), whose entry \(\mathcal {M}_{i,j}\) describes the probability of performing a transition from state \(\phi _j\) to state \(\phi _i\) in a period of time \(\Delta t\). The same operation can in principle be performed using experimental and observational data. A fundamental issue at the core of such a procedure is whether, for some dynamical systems, in the limit of finer and finer partitions covering the phase space (actually, the attractor of the system) with \(N\rightarrow \infty \), one reconstructs the actual invariant measure of the original system. See [37] for a comprehensive discussion of this issue, the so-called Ulam conjecture, and [38] for some extremely promising applications of finite state Markov processes for studying severely reduced representations of complex systems.
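The empirical construction of the transition matrix can be sketched as follows for a scalar time series and a partition into equal boxes. This is only an illustration under stated assumptions: the function name `ulam_matrix`, the choice of ten boxes, and the logistic map used as a test signal are hypothetical and do not come from the text. The matrix is built column-stochastic, so that entry (i, j) estimates the probability of moving from box j to box i in one time step.

```python
import numpy as np

def ulam_matrix(x, edges):
    """Estimate a column-stochastic transition matrix from a scalar time series."""
    n_boxes = len(edges) - 1
    # Index of the box visited at each time step (clipped at the upper edge).
    idx = np.clip(np.digitize(x, edges) - 1, 0, n_boxes - 1)
    counts = np.zeros((n_boxes, n_boxes))
    # counts[i, j] = number of observed transitions from box j to box i.
    np.add.at(counts, (idx[1:], idx[:-1]), 1.0)
    col_sums = counts.sum(axis=0, keepdims=True)
    # Columns of boxes that are never left are kept as zeros.
    return np.divide(counts, col_sums, out=np.zeros_like(counts),
                     where=col_sums > 0)

# Hypothetical test signal: the logistic map x -> 4 x (1 - x) on [0, 1].
x = np.empty(100_000)
x[0] = 0.2
for k in range(len(x) - 1):
    x[k + 1] = 4.0 * x[k] * (1.0 - x[k])

edges = np.linspace(0.0, 1.0, 11)  # N = 10 boxes of equal width
M = ulam_matrix(x, edges)
```

The quality of such an estimate depends on the length of the series and on the choice of the partition, so the result should be read as a finite-sample approximation rather than as the exact coarse-grained operator.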

Following the idea that performing the discretization of the phase space amounts to adding a stochastic perturbation to the original dynamical system, with intensity going to zero with the scale of the actual partitions, and exploiting the fact that the SRB measure can be constructed as zero-noise limit (with measure that is absolutely continuous with respect to Lebesgue) of the physical measure, in [39, 40] it has been proposed that the Ulam conjecture applies in the case of Axiom A systems, which are endowed with an SRB measure. The convergence in the case of Anosov diffeomorphisms has indeed been proved, provided one adds some noise of asymptotically vanishing intensity (though stronger than the noise induced by the partition itself) to the underlying dynamics [41]. This is not so surprising, because by adding noise one introduces a cutoff below which partitions do indeed work. At a practical level, these results suggest that in the case of Axiom A systems, finite state Markov processes constructed using Ulam partitions can simulate the true dynamics well, if one considers reasonably well-behaved, smooth observables as test functions. Nonetheless, one has to note that different choices for the partitions can lead to very different rates of convergence [37]. See also the discussion and the numerical examples presented in [42].

Apart from the Ulam method, one can follow a mathematically more elegant but practically much harder way to construct finer and finer partitions. As is well known, Axiom A systems possess Markov partitions, i.e. well-defined, metric independent, finite resolution representations of the phase space that refine themselves with the dynamics [6, 14]. Such Markov partitions can be used to construct in the limit the actual SRB measure of the system, and, additionally, following [43], they provide a natural way to build finite Markov chains whose properties converge in the limit to those of the Perron–Frobenius operator of the system.

Having response formulas in the finite case has direct relevance for finite Markov chains and for interpreting the results of reduced models. Another good reason to construct a response theory in a finite state space has to do with the fact that the response operators for Axiom A systems introduced by Ruelle can be written as expectation values of certain observables on the unperturbed SRB measure. Therefore, given what was said above, one can hope to have convergence of the finite state reconstructed response operators to the corresponding true response operator in the limit of infinitely fine partitions of the dynamics. Actually, providing explicit formulas for the response operator for a finite state partition of a system and taking the limit for (suitably defined) finer and finer partitions could be interpreted as a rigorous way of constructing the actual response on the asymptotic SRB measure. One needs to note (see the discussion in Sects. 2.1 and 3) that special attention has to be paid when studying the convergence of such operators.

In what follows, we present the derivation of the response formulas at all orders of perturbation (as well as the full nonlinear versions) for finite state spaces of arbitrary size N. All expressions are given in terms of the transition matrix of the unperturbed system, of its correction due to the perturbation, and of the parameter controlling the strength of the perturbation. The interest we see in the calculations we present below is mostly three-fold:

  • our results are obtained using basic linear algebra operations in finite dimensional spaces, which can be used to interpret more complex operators acting on infinite dimensional spaces. It is also possible to use the finite dimensional expressions to derive, e.g., the actual response operators for continuous time Axiom A dynamical systems;

  • we are able to derive an explicit expression for a lower bound to the radius of convergence of the perturbative theory, and relate it to the mixing properties of the unperturbed system. We also find a (very tentative) expression for such a lower bound in the case of continuous time Axiom A dynamical systems;

  • our formulas can be translated into one-line commands in now widely available software tools like R, Octave, or MATLAB \(^\circledR \). This might greatly facilitate the actual implementation of response operators. In particular, we can say that our results provide a direct translation of the response theory into readily implementable algorithms.

The paper is organised as follows. In Sect. 2, we introduce some notation and provide basic properties of ergodic finite state Markov chains, which can be taken as a mathematical model in its own right or as a finite precision representation of ergodic (in this case, Axiom A) systems. We also show how it is possible to find an exact expression for the impact of a perturbation on the invariant measure of the Markov process and we study the radius of convergence of the perturbative expansion. In Sect. 3 we rephrase our results in terms of observables, by constructing straightforward adjoint operators in finite dimensions. In Sect. 4 we show how our findings agree with the response theory for continuous time systems when we suitably translate the matrix operations into operators. In Sect. 5 we present a simple yet instructive investigation of the response of the Lorenz ’63 system [44] to perturbations using Ulam-like partitions and the formalism developed here. In Sect. 6 we recapitulate and discuss our results.

2 Response Operators for Finite-State Markov Processes

Let’s consider an ergodic Markov process with a finite number N of states, whose ensemble is described by the N-component vector \(\mathbf {u}\). We consider the infinite Markov chain generated as \(\mathbf {u_0}\), \(\mathcal {M} \mathbf {u_0}\), \(\ldots \), \(\mathcal {M}^n \mathbf {u_0}\), \(\ldots \), where \(\mathbf {u_0}\) is the initial ensemble of states, and \(\mathcal {M}\in \mathbb {R}^{N\times N} \) is the stochastic transition matrix, whose entry \(\mathcal {M}_{i,j}\) gives the probability of reaching the state i at step n if at step \(n-1\) we are in the state j. The process is taken to be stationary, so that \(\mathcal {M}\) does not change with n. We recall that \(\mathcal {M}\) is such that \(\sum _{i=1}^N \mathcal {M}_{i,j}=1\) and \(\mathcal {M}_{i,j}\ge 0\) \(\forall i,j=1,\ldots ,N\).

The invariant measure is obtained by solving the eigenvalue problem

$$\begin{aligned} \mathcal {M} \mathbf {u}= \lambda \mathbf {u}, \end{aligned}$$
(7)

and selecting the unique solution with eigenvalue \(\lambda =1\). The corresponding (column) eigenvector \({\mathbf {u_1}}\) is the invariant measure of the system. We also recall that

$$\begin{aligned} \lim _{n\rightarrow \infty }\mathcal {M}^n \mathbf {z} = \alpha _1 {\mathbf {u_1}}, \quad \forall \mathbf {z}, \end{aligned}$$
(8)

where \(\{\lambda _j,\mathbf {u_j}\}\) \(j=1,\ldots ,N\) are the pairs of eigenvalues and eigenvectors of \(\mathcal {M}\), where \(\lambda _1=1\), \(|\lambda _j|<1\) if \(j>1\), and \(\mathbf {z}\) can be expressed as \(\mathbf {z}=\sum _{j=1}^{N} \alpha _j \mathbf {u_j}\).
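The eigenvalue problem and the limit in Eq. 8 can be checked numerically on a small example. The sketch below assumes a hypothetical 3-state column-stochastic matrix chosen only for illustration; the helper name `invariant_measure` is likewise not from the text.

```python
import numpy as np

def invariant_measure(M):
    """Eigenvector of M for the eigenvalue closest to 1, normalised to unit sum."""
    eigvals, eigvecs = np.linalg.eig(M)
    k = np.argmin(np.abs(eigvals - 1.0))
    u1 = np.real(eigvecs[:, k])
    return u1 / u1.sum()

# Hypothetical 3-state transition matrix: every column sums to one.
M = np.array([[0.5, 0.2, 0.3],
              [0.3, 0.6, 0.1],
              [0.2, 0.2, 0.6]])

u1 = invariant_measure(M)

# Repeated application of M to an arbitrary initial ensemble, as in Eq. 8.
# The initial vector sums to one, so here alpha_1 = 1 with u1 normalised to unit sum.
z = np.array([1.0, 0.0, 0.0])
z_limit = np.linalg.matrix_power(M, 200) @ z
```

For this matrix the subdominant eigenvalues have modulus well below one, so 200 iterations are far more than needed for `z_limit` to agree with `u1` to machine precision.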

Our goal is to find a formula for expressing the change in the invariant measure resulting from perturbing the transition matrix \(\mathcal {M}\rightarrow \mathcal {M}+\epsilon m\).

We note that in order to preserve the Markov property of the system, m obeys the following constraint: \(\sum _{i=1}^N m_{i,j}=0\), so that \(\sum _{i=1}^N\left( \mathcal {M}_{i,j} +\epsilon m_{i,j}\right) =1\). Moreover, an additional constraint on \(\epsilon m\) comes from the fact that all elements of \(\mathcal {M} +\epsilon m\) have to be non-negative. We define

$$\begin{aligned} \epsilon _+=\max _\epsilon | \forall i,j \in \{1,\ldots ,N\}, \mathcal {M}_{i,j} +\epsilon m_{i,j}\ge 0, \end{aligned}$$
(9)

and

$$\begin{aligned} \epsilon _-=\min _\epsilon | \forall i,j \in \{1,\ldots ,N\}, \mathcal {M}_{i,j} +\epsilon m_{i,j}\ge 0; \end{aligned}$$
(10)

clearly, \(\epsilon _-\le 0\le \epsilon _+\), and the perturbed matrix is a stochastic matrix \(\forall \epsilon \in [ \epsilon _-,\epsilon _+]\). In order to have some room for studying the impacts of perturbations, we require that \(\epsilon _+-\epsilon _- >0\). Such conditions show that, for a given \(\mathcal {M}\), it makes sense to consider only a specific class of perturbation matrices m. Let’s provide an example of an ill-chosen m: if \(\mathcal {M}\) has two zero entries \(\mathcal {M}_{i_1,j_1}=\mathcal {M}_{i_2,j_2}=0\) and \(m_{i_1,j_1}m_{i_2,j_2}<0\), then we have \(\epsilon _-= 0= \epsilon _+\).
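The bounds \(\epsilon _-\) and \(\epsilon _+\) follow from elementwise inequalities and can be computed directly. The following is a minimal sketch; the function name `admissible_range` and the example matrices are hypothetical, and infinite bounds are returned when no entry of m constrains the corresponding side.

```python
import numpy as np

def admissible_range(M, m):
    """Return (eps_minus, eps_plus) such that M + eps * m has no negative entry."""
    pos = m > 0
    neg = m < 0
    # Entries with m_ij < 0 bound eps from above: eps <= M_ij / |m_ij|.
    eps_plus = np.min(M[neg] / -m[neg]) if neg.any() else np.inf
    # Entries with m_ij > 0 bound eps from below: eps >= -M_ij / m_ij.
    eps_minus = np.max(-M[pos] / m[pos]) if pos.any() else -np.inf
    return eps_minus, eps_plus

# Hypothetical unperturbed matrix (columns sum to one) ...
M = np.array([[0.5, 0.2, 0.3],
              [0.3, 0.6, 0.1],
              [0.2, 0.2, 0.6]])
# ... and hypothetical perturbation matrix (columns sum to zero).
m = np.array([[ 0.1,  0.0, -0.1],
              [-0.1,  0.1,  0.0],
              [ 0.0, -0.1,  0.1]])

eps_minus, eps_plus = admissible_range(M, m)
```

If \(\mathcal {M}\) has vanishing entries where m takes both signs, the same function returns a degenerate interval, which corresponds to the ill-chosen case described above.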

The new invariant measure is the unique solution to the eigenvalue problem:

$$\begin{aligned} (\mathcal {M}+\epsilon m) \mathbf {u}= \lambda \mathbf {u}, \end{aligned}$$
(11)

with unit eigenvalue. We define \({\mathbf {v_1}}\) as the invariant measure of the perturbed system. Our goal is to express it as a function of \(\mathcal {M}\), m, \(\epsilon \) and \( {\mathbf {u_1}}\). This amounts to constructing a response theory. We first present the results of the explicit calculation, and then discuss issues of well-posedness of the problem and convergence of the procedure in Sect. 2.1. Let’s express \({\mathbf {v_1}}={\mathbf {u_1}}+\sum _{n=1}^\infty \epsilon ^n {\mathbf {w^{n}}}\), so that we obtain:

$$\begin{aligned} (\mathcal {M}+\epsilon m) \left( {\mathbf {u_1}}+\sum _{n=1}^\infty \epsilon ^n {\mathbf {w^{n}}}\right) = {\mathbf {u_1}}+\sum _{n=1}^\infty \epsilon ^n {\mathbf {w^{n}}}. \end{aligned}$$
(12)

Note that the first eigenvalue is not changed by the perturbation \(\mathcal {M}\rightarrow \mathcal {M} +\epsilon m\), because also \(\mathcal {M} +\epsilon m\) is a stochastic matrix. Using the definition of \(\mathbf {u_1}\) we obtain a system of concatenated equations

$$\begin{aligned} (1-\mathcal {M}) {\mathbf {w^{1}}}&= m{\mathbf {u_1}}\end{aligned}$$
(13)
$$\begin{aligned} (1-\mathcal {M}) {\mathbf {w^{n}}}&= m{\mathbf {w^{n-1}}}, \quad \forall n\in \mathbb {N}, n>1. \end{aligned}$$
(14)
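As a sketch of the intermediate step, and assuming the series can be rearranged term by term, expanding the product on the left-hand side of Eq. 12 gives

$$\begin{aligned} \mathcal {M}{\mathbf {u_1}}+\sum _{n=1}^\infty \epsilon ^n \mathcal {M}{\mathbf {w^{n}}}+\epsilon m{\mathbf {u_1}}+\sum _{n=2}^\infty \epsilon ^{n} m{\mathbf {w^{n-1}}}= {\mathbf {u_1}}+\sum _{n=1}^\infty \epsilon ^n {\mathbf {w^{n}}}. \end{aligned}$$

The terms of order \(\epsilon ^0\) cancel because \(\mathcal {M}{\mathbf {u_1}}={\mathbf {u_1}}\). Matching the coefficients of \(\epsilon \) gives \(\mathcal {M}{\mathbf {w^{1}}}+m{\mathbf {u_1}}={\mathbf {w^{1}}}\), and matching those of \(\epsilon ^n\) with \(n>1\) gives \(\mathcal {M}{\mathbf {w^{n}}}+m{\mathbf {w^{n-1}}}={\mathbf {w^{n}}}\); moving the \(\mathcal {M}\) terms to the right-hand side yields the two equations above.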

We obtain

$$\begin{aligned} {\mathbf {w^{1}}}&= \Psi _1 {\mathbf {u_1}}= (1-\mathcal {M})^{-1} m {\mathbf {u_1}}\end{aligned}$$
(15)
$$\begin{aligned} {\mathbf {w^{n}}}&= \Psi _1 {\mathbf {w^{n-1}}}. \end{aligned}$$
(16)

Given the recursive structure, we immediately derive the general formula:

$$\begin{aligned} {\mathbf {w^{n}}}= \Psi _n {\mathbf {u_1}} =\Psi _1^n {\mathbf {u_1}} = \prod _{j=1}^n\left( \left( 1-\mathcal {M}\right) ^{-1} m\right) {\mathbf {u_1}}, \end{aligned}$$
(17)

where \(\Psi _n=\Psi _1^n\). Concluding, we have that:

$$\begin{aligned} {\mathbf {v_1}}= {\mathbf {u_1}}+\sum _{n=1}^\infty \epsilon ^n {\mathbf {w^{n}}} ={\mathbf {u_1}}+\sum _{n=1}^\infty \epsilon ^n \Psi _1^n {\mathbf {u_1}} = {\mathbf {u_1}} +\sum _{n=1}^\infty \epsilon ^n \prod _{j=1}^n\left( \left( 1-\mathcal {M}\right) ^{-1} m\right) {\mathbf {u_1}} \end{aligned}$$
(18)

which provides the formula we have been looking for. We note that the term responsible for the nth order of perturbation to the measure can be expressed as

$$\begin{aligned} \lim _{\epsilon \rightarrow 0 }\frac{1}{n!}\frac{\mathrm {d}^n}{\mathrm {d}\epsilon ^n} {\mathbf {v_1}}. \end{aligned}$$
(19)

Using the matrix identity \((1-\mathcal {N})^{-1}= \sum _{k=0}^\infty \mathcal {N}^k\) with \(\mathcal {N}= \epsilon \Psi _1=\epsilon \left( 1-\mathcal {M}\right) ^{-1} m\), we can also formally express the previous result as:

$$\begin{aligned} {\mathbf {v_1}}=(1-\epsilon \Psi _1)^{-1} {\mathbf {u_1}}=(1-\epsilon (1-\mathcal {M})^{-1}m)^{-1} {\mathbf {u_1}}. \end{aligned}$$
(20)

Using again the matrix identity \((1-\mathcal {M})^{-1}=\sum _{k=0}^\infty \mathcal {M}^k\), the previous expression can be rewritten as:

$$\begin{aligned} {\mathbf {v_1}}=(1-\epsilon \Psi _1)^{-1} {\mathbf {u_1}}=\left( 1-\epsilon \sum _{k=0}^\infty \mathcal {M}^k m\right) ^{-1} {\mathbf {u_1}}. \end{aligned}$$
(21)

or

$$\begin{aligned} {\mathbf {v_1}}= {\mathbf {u_1}}+\sum _{n=1}^\infty \epsilon ^n \prod _{j=1}^n \left( \sum _{k=0}^\infty \mathcal {M}^k m\right) {\mathbf {u_1}} \end{aligned}$$
(22)
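As an illustration, the following minimal Python sketch sums the series of Eq. 18 for a small chain and compares it with the invariant measure of \(\mathcal {M}+\epsilon m\) obtained directly by power iteration. The 3-state matrices, the value of \(\epsilon \), the truncation lengths, and the helper names (`matvec`, `stationary`, `apply_resolvent`, `perturbed_measure`) are assumptions made for this example and are not taken from the paper. The operator \((1-\mathcal {M})^{-1}\) is replaced by a truncated sum of powers of \(\mathcal {M}\) acting on zero-sum vectors; the well-posedness of this step is discussed in Sect. 2.1.

```python
def matvec(A, x):
    """Product of a matrix (list of rows) with a column vector."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def stationary(M, n_iter=500):
    """Invariant measure u_1 of a column-stochastic matrix, by power iteration."""
    u = [1.0 / len(M)] * len(M)
    for _ in range(n_iter):
        u = matvec(M, u)
    return u


def apply_resolvent(M, y, n_terms=500):
    """Truncated sum_{k>=0} M^k y, standing in for (1 - M)^{-1} y (y must sum to zero)."""
    total, term = list(y), list(y)
    for _ in range(n_terms):
        term = matvec(M, term)
        total = [t + s for t, s in zip(total, term)]
    return total


def perturbed_measure(M, m, eps, order=40):
    """Partial sum of Eq. 18: v_1 = u_1 + sum_n eps^n w^n, with w^n = Psi_1 w^{n-1}."""
    w = stationary(M)
    v = list(w)
    for n in range(1, order + 1):
        w = apply_resolvent(M, matvec(m, w))
        v = [vi + eps ** n * wi for vi, wi in zip(v, w)]
    return v


# Illustrative matrices (assumed): columns of M sum to 1, columns of m sum to 0.
M = [[0.5, 0.2, 0.3], [0.3, 0.6, 0.2], [0.2, 0.2, 0.5]]
m = [[-0.1, 0.0, 0.1], [0.1, -0.1, 0.0], [0.0, 0.1, -0.1]]
eps = 0.5

u1 = stationary(M)
v_series = perturbed_measure(M, m, eps)
M_eps = [[M[i][j] + eps * m[i][j] for j in range(3)] for i in range(3)]
v_direct = stationary(M_eps)
max_diff = max(abs(a - b) for a, b in zip(v_series, v_direct))
```

For this well-mixing example the two estimates should agree to rounding error; for slowly mixing chains both truncation lengths would need to be increased.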

2.1 Well-Posedness and Convergence

In the previous equations, we have used somewhat carelessly the expression \((1-\mathcal {M})^{-1}\). Unfortunately, the matrix \(1-\mathcal {M}\) is not invertible, because all of its columns sum up to zero, or, alternatively, because we know that 1 is an eigenvalue of \(\mathcal {M}\). Nonetheless, the expression makes sense if we apply it to a vector belonging to \({{\mathrm{span}}}\{\mathbf {u_2},\ldots ,\mathbf {u_N}\}\). We now want to prove that:

Lemma 1

If \(\mathcal {M}\) is a Markov transition matrix \(\mathbb {R}^N\rightarrow \mathbb {R}^N\) with eigenvectors \((\mathbf {u_1},\mathbf {u_2},\ldots ,\mathbf {u_N})\), and corresponding eigenvalues \((\lambda _1=1,\lambda _2,\ldots ,\lambda _N)\), \(1>|\lambda _2|\ge \dots \ge |\lambda _N|\), and m is a matrix \(\mathbb {R}^N\rightarrow \mathbb {R}^N\) such that \(\sum _{i=1}^N m_{i,j}=0\), then \(m\mathbf {z} \in {{\mathrm{span}}}\{\mathbf {u_2},\ldots ,\mathbf {u_N}\}\) \(\forall \mathbf {z}\in \mathbb {R}^N\).

Proof

Let’s consider the vector \(\mathbf {y}=m\mathbf {z}\). Its i th component can be written as \(y_i=\sum _{j=1}^N m_{i,j}z_j\). Since \(\sum _{i=1}^N m_{i,j}=0\), we have that \(\sum _{i=1}^N y_i= \sum _{i=1}^N \sum _{j=1}^N m_{i,j} z_j=0\).

Let’s now consider the k th eigenvector \(\mathbf {u}_k\) of \(\mathcal {M}\). We have \(\sum _{j=1}^N \mathcal {M}_{i,j} u_{k;j} = \lambda _k u_{k;i}\). Since \(\sum _{i=1}^N \mathcal {M}_{i,j}=1\), taking the sum over the i components of the previous expression, we obtain: \(\sum _{i=1}^N \sum _{j=1}^N \mathcal {M}_{i,j} u_{k;j} = \sum _{j=1}^N u_{k;j} = \lambda _k \sum _{j=1}^N u_{k;j}\). Therefore, either \(\lambda _k=1\), or \(\sum _{j=1}^N u_{k;j}=0\). We have that if \(k>1\), \(\sum _{j=1}^N u_{k;j}=0\).

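Since the eigenvectors \(\mathbf {u_2},\ldots ,\mathbf {u_N}\) are linearly independent and all have zero sum, they span the \((N-1)\)-dimensional subspace of zero-sum vectors, to which \(\mathbf {y}\) belongs. We conclude that \(\mathbf {y}=m\mathbf {z} \in {{\mathrm{span}}}\{\mathbf {u_2},\ldots ,\mathbf {u_N}\}\) \(\forall \mathbf {z}\in \mathbb {R}^N\). \(\square \)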
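As a concrete check of Lemma 1 and of Eq. 15, one can consider a two-state chain. The parameters a, b, c, d are introduced here purely for illustration, with \(0<a+b<2\) so that \(|\lambda _2|<1\):

$$\begin{aligned} \mathcal {M}=\begin{pmatrix} 1-a & b\\ a & 1-b \end{pmatrix}, \quad m=\begin{pmatrix} -c & d\\ c & -d \end{pmatrix}, \quad \mathbf {u_1}=\frac{1}{a+b}\begin{pmatrix} b\\ a \end{pmatrix}, \quad \mathbf {u_2}=\begin{pmatrix} 1\\ -1 \end{pmatrix}, \quad \lambda _2=1-a-b. \end{aligned}$$

For any \(\mathbf {z}=(z_1,z_2)^\top \) one finds \(m\mathbf {z}=(d z_2-c z_1)\mathbf {u_2}\), in agreement with the lemma. On \({{\mathrm{span}}}\{\mathbf {u_2}\}\) the operator \(1-\mathcal {M}\) acts as multiplication by \(a+b\), so that

$$\begin{aligned} {\mathbf {w^{1}}}=(1-\mathcal {M})^{-1}m{\mathbf {u_1}}=\frac{ad-bc}{(a+b)^2}\,\mathbf {u_2}, \end{aligned}$$

which coincides with the derivative at \(\epsilon =0\) of the exact perturbed measure \(\mathbf {v_1}=(b+\epsilon d,\; a+\epsilon c)^\top /(a+b+\epsilon (c+d))\).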

Remark

One should note that finite numerical precision might cause trouble, so that one should be careful to eliminate any component along \(\mathbf {u_1}\) at each step before applying \(\sum _{j=0}^\infty \mathcal {M}^j\). Note that in any code we must use the expression \(\sum _{j=0}^\infty \mathcal {M}^j\) in place of \((1-\mathcal {M})^{-1}\), because a direct numerical inversion of the singular matrix \(1-\mathcal {M}\) would otherwise typically return NaN entries or an error message.

Remark

We wish to underline another method for avoiding the \(\texttt {NaN}\) problem mentioned above. Following [45], we introduce the fundamental matrix of the Markov chain as \(\mathcal {Z}=(1-\mathcal {M}+\mathcal {M}^\infty )^{-1}\), where \(\mathcal {M}^\infty \) is the limit matrix whose columns are all equal to \(\mathbf {u_1}\). One can show that \(\mathcal {Z}\) exists, as the inverse is well defined given the spectral properties of \(\mathcal {M}-\mathcal {M}^\infty \) [39]. One can also show that \(\mathcal {M}^\infty m \mathbf {{z}}=0\) \(\forall \mathbf {z}\in \mathbb {R}^N\). Therefore, in all the previous Eqs. 16-22 we can substitute \((1-\mathcal {M} )^{-1}m=\sum _{j=0}^\infty \mathcal {M}^j m= \mathcal {Z}m=\sum _{j=0}^\infty (\mathcal {M}-\mathcal {M}^\infty )^j m\).
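The substitution described in this remark can be checked numerically. The sketch below, under the assumption of the same illustrative 3-state matrices used nowhere in the paper, builds \(\mathcal {Z}\) by explicit inversion of \(1-\mathcal {M}+\mathcal {M}^\infty \) and compares \(\mathcal {Z}m\) with a truncated sum \(\sum _j \mathcal {M}^j m\); the helper names `matmul` and `inverse` are hypothetical, and the small Gauss-Jordan routine is only meant for tiny dense matrices.

```python
def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def inverse(A):
    """Gauss-Jordan inverse with partial pivoting; adequate for small dense matrices."""
    n = len(A)
    aug = [[float(x) for x in A[i]] + [float(i == j) for j in range(n)]
           for i in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[p] = aug[p], aug[c]
        pivot = aug[c][c]
        aug[c] = [x / pivot for x in aug[c]]
        for r in range(n):
            if r != c:
                f = aug[r][c]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[c])]
    return [row[n:] for row in aug]


# Illustrative matrices (assumed): columns of M sum to 1, columns of m sum to 0.
M = [[0.5, 0.2, 0.3], [0.3, 0.6, 0.2], [0.2, 0.2, 0.5]]
m = [[-0.1, 0.0, 0.1], [0.1, -0.1, 0.0], [0.0, 0.1, -0.1]]
N = len(M)

# Invariant measure u_1 by power iteration on a one-column matrix.
u = [[1.0 / N] for _ in range(N)]
for _ in range(500):
    u = matmul(M, u)

M_inf = [[u[i][0]] * N for i in range(N)]       # every column equals u_1
A = [[float(i == j) - M[i][j] + M_inf[i][j] for j in range(N)] for i in range(N)]
Z = inverse(A)                                   # fundamental matrix
Zm = matmul(Z, m)

# Reference: truncated Neumann series sum_{j>=0} M^j m.
series = [row[:] for row in m]
term = [row[:] for row in m]
for _ in range(500):
    term = matmul(M, term)
    series = [[s + t for s, t in zip(rs, rt)] for rs, rt in zip(series, term)]

err_Zm = max(abs(Zm[i][j] - series[i][j]) for i in range(N) for j in range(N))
err_proj = max(abs(x) for row in matmul(M_inf, m) for x in row)   # M^inf m = 0
```

Unlike \(1-\mathcal {M}\), the matrix passed to `inverse` is non-singular, so no special handling of the eigenvalue 1 is needed.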

Let’s consider the problem of convergence of the expression in Eq. 18. We want to make sure that the \(L^1\) norm of \(\sum _{n=1}^\infty \epsilon ^n {\mathbf {w^{n}}}\) does not diverge, and use this to find a bound for the value of \(\epsilon \). A simple way to approach this problem is to study the ratio of the \(L^1\) norms of two consecutive terms in the previous series. Using Eqs. 15-16, we have:

$$\begin{aligned} \frac{\epsilon ^n||\mathbf {w^{n}}||_1}{\epsilon ^{n-1}||\mathbf {w^{n-1}}||_1}&= \epsilon \frac{|| (1-\mathcal {M})^{-1}m \mathbf {w^{n-1}}||_1}{||\mathbf {w^{n-1}}||_1} \le \epsilon \frac{|| (1-\mathcal {M})^{-1}||^*_1 || m \mathbf {w^{n-1}}||_1}{||\mathbf {w^{n-1}}||_1}\end{aligned}$$
(23)
$$\begin{aligned}&\le \epsilon || (1-\mathcal {M})^{-1}||^*_1 || m ||_1\end{aligned}$$
(24)
$$\begin{aligned}&\le \epsilon (1-||\mathcal {M}||^*_1)^{-1}|| m ||_1 \end{aligned}$$
(25)

where we use the submultiplicative property of the norm and we introduce a modified definition of the \(L^1\) norm taking into account that the vector \(m\mathbf {v} \in {{\mathrm{span}}}\{\mathbf {u_2},\ldots ,\mathbf {u_N}\} \forall \mathbf {v} \in \mathbb {R}^N\):

$$\begin{aligned} ||\mathcal {Q}||^{*}_1=\sup _{v\in {{\mathrm{span}}}\{\mathbf {u_2},\ldots ,\mathbf {u_N}\},||\mathbf {v}||_1=1}{\frac{||\mathcal {Q}\mathbf {v}||_1}{||\mathbf {v}||_1}}. \end{aligned}$$

Using expression 25, we have that the perturbative expression converges if

$$\begin{aligned} |\epsilon | (1-||\mathcal {M}||^*_1)^{-1}||m||_1<1 \rightarrow |\epsilon |< \epsilon _{max}=\frac{1-||\mathcal {M}||^*_1}{||m||_1}, \end{aligned}$$
(26)

The previous expression provides an explicit bound for our calculations. We note that \(\epsilon _{max}\) is strictly positive because of the restriction imposed in the definition of the norm \(|| \bullet ||^*\): without it, one would have \(||\mathcal {M}||_1=1\) and the bound would vanish. Such a bound also ensures the invertibility of \(1-\epsilon \Psi _1\). From the previous result, we find the following bound for the first order correction to the invariant measure:

$$\begin{aligned} ||\epsilon w_1 ||_1 \le \frac{\epsilon ||m||_1}{1-||\mathcal {M}||^*_1}, \end{aligned}$$

so that \(||m||_1/(1-||\mathcal {M}||^*_1)\) can be thought of as a bound to the first order sensitivity of the measure to perturbations.

Using expression 24, we can derive a more generous bound for \(\epsilon \):

$$\begin{aligned} |\epsilon |<\epsilon ^*_{max}=\frac{1}{||m||_1||(1-\mathcal {M})^{-1}||^*_1}\ge \epsilon _{max}. \end{aligned}$$
(27)

while \(||m||_1||(1-\mathcal {M})^{-1}||^*_1\) provides an additional (stricter) bound to the first order sensitivity. Note that in all the previous expressions we can substitute \(||(1-\mathcal {M})^{-1}||_1^*\) with \(||\mathcal {Z}||_1\).
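A minimal sketch of how the bound of Eq. 27 could be evaluated in practice is given below, using \(||\mathcal {Z}||_1\) in place of \(||(1-\mathcal {M})^{-1}||_1^*\) as suggested above. The matrices are illustrative assumptions, the names `matmul`, `norm1`, `sensitivity_bound`, and `eps_star_max` are hypothetical, and \(\mathcal {Z}\) is obtained from the truncated series \(\sum _j (\mathcal {M}-\mathcal {M}^\infty )^j\) rather than by inversion.

```python
def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def norm1(A):
    """Induced L^1 norm of a matrix: largest absolute column sum."""
    return max(sum(abs(row[j]) for row in A) for j in range(len(A[0])))


# Illustrative matrices (assumed): columns of M sum to 1, columns of m sum to 0.
M = [[0.5, 0.2, 0.3], [0.3, 0.6, 0.2], [0.2, 0.2, 0.5]]
m = [[-0.1, 0.0, 0.1], [0.1, -0.1, 0.0], [0.0, 0.1, -0.1]]
N = len(M)

# Invariant measure u_1 by power iteration on a one-column matrix.
u = [[1.0 / N] for _ in range(N)]
for _ in range(500):
    u = matmul(M, u)

# D = M - M^inf has spectral radius |lambda_2| < 1, so Z = sum_{j>=0} D^j converges.
D = [[M[i][j] - u[i][0] for j in range(N)] for i in range(N)]
Z = [[float(i == j) for j in range(N)] for i in range(N)]
term = [row[:] for row in Z]
for _ in range(500):
    term = matmul(D, term)
    Z = [[z + t for z, t in zip(rz, rt)] for rz, rt in zip(Z, term)]

sensitivity_bound = norm1(m) * norm1(Z)    # bound on the first order sensitivity
eps_star_max = 1.0 / sensitivity_bound     # Eq. 27 with ||Z||_1 substituted
# At eps = eps_star_max the norm of eps * Z m cannot exceed 1 (submultiplicativity).
ratio_at_bound = eps_star_max * norm1(matmul(Z, m))
```

Since \(\mathcal {Z}\mathbf {u_1}=\mathbf {u_1}\), one always has \(||\mathcal {Z}||_1\ge 1\), so this estimate of \(\epsilon ^*_{max}\) can never exceed \(1/||m||_1\).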

At this point, we wish to refer to previous results (see, e.g., [46]) providing bounds for the \(L^1\) norm of the difference between the perturbed and unperturbed invariant measure:

$$\begin{aligned} ||v_1-u_1||_1\le \frac{\epsilon ||m||_1}{1-\tau _\mathcal {M}(1)} \end{aligned}$$
(28)

where \(\tau _\mathcal {M}(1)\) is the so-called ergodicity coefficient [47] defined as:

$$\begin{aligned} \tau _\mathcal {M}(1)= \frac{1}{2}\sup _{i,j}||\mathcal {M}(\mathbf {e_i}-\mathbf {e_j})||_1 \end{aligned}$$

with \(\mathbf {e_i}\) indicating the unit vector having 1 at the ith entry and zero elsewhere. We recall that \(\tau _\mathcal {M}(1)\) is an upper bound for the modulus of any subdominant eigenvalue of \(\mathcal {M}\), and \(1/(1-\tau _\mathcal {M}(1))\) can be taken as a definition of the condition number of \(\mathcal {M}\) [48]. Clearly, if \(\tau _\mathcal {M}(1)\) is close to 1, the bound given in Eq. 28 diverges. Note that \(1/(1-\tau _\mathcal {M}(1))\) is the bound to the non-perturbative sensitivity, mirroring the bound to the perturbative, linearized sensitivity given previously as \(1/(1-||\mathcal {M}||_1^*)\). See also additional results presented in [49].
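The ergodicity coefficient is straightforward to evaluate, because \(\mathcal {M}\mathbf {e_i}\) is simply the ith column of \(\mathcal {M}\). The sketch below computes \(\tau _\mathcal {M}(1)\) and checks the bound of Eq. 28 on an illustrative 3-state example; the matrices, the value of \(\epsilon \), and the function names are assumptions made for this example only.

```python
from itertools import combinations


def matvec(A, x):
    """Product of a matrix (list of rows) with a column vector."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def stationary(M, n_iter=500):
    """Invariant measure of a column-stochastic matrix, by power iteration."""
    u = [1.0 / len(M)] * len(M)
    for _ in range(n_iter):
        u = matvec(M, u)
    return u


def ergodicity_coefficient(M):
    """tau_M(1) = 1/2 max_{i,j} ||M (e_i - e_j)||_1 for a column-stochastic M."""
    n = len(M)
    best = 0.0
    for i, j in combinations(range(n), 2):
        # M e_i is the i-th column of M, so compare columns i and j.
        dist = sum(abs(M[r][i] - M[r][j]) for r in range(n))
        best = max(best, 0.5 * dist)
    return best


# Illustrative matrices (assumed): columns of M sum to 1, columns of m sum to 0.
M = [[0.5, 0.2, 0.3], [0.3, 0.6, 0.2], [0.2, 0.2, 0.5]]
m = [[-0.1, 0.0, 0.1], [0.1, -0.1, 0.0], [0.0, 0.1, -0.1]]
eps = 0.5

tau = ergodicity_coefficient(M)
norm_m = max(sum(abs(row[j]) for row in m) for j in range(len(m)))   # ||m||_1
u1 = stationary(M)
v1 = stationary([[M[i][j] + eps * m[i][j] for j in range(3)] for i in range(3)])
lhs = sum(abs(a - b) for a, b in zip(v1, u1))    # ||v_1 - u_1||_1
rhs = eps * norm_m / (1.0 - tau)                 # right-hand side of Eq. 28
```

In this example \(\tau _\mathcal {M}(1)=0.4\), while the subdominant eigenvalue of the chosen matrix is 0.3, so the bound is far from saturated.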

The sensitivity of the unperturbed measure to perturbations given in Eq. 28 can also be cast in terms of \(\rho _\mathcal {M}\), the smallest possible value for the constant controlling the rate of convergence of the iterates \(\mathcal {M}\mathbf {e_i}\), \(\mathcal {M}^2 \mathbf {e_i}\), \(\ldots \), \(\mathcal {M}^n \mathbf {e_i}\) to \(\mathbf {u_1}\), so that \(\forall n\in \mathbb {N}_+, \forall i\in \{1, \ldots ,N\}\) we have that \(||\mathcal {M}^n \mathbf {e_i}-\mathbf {u_1}||_1\le C \rho _\mathcal {M}^n\), \(C\ge 1\) [46, 48]. The sensitivity diverges as \(\rho _\mathcal {M}\) approaches 1, i.e. when the iterates of the unperturbed matrix converge slowly.

While the quantities \(||\mathcal {M}||_1^*\), \(\tau _{\mathcal {M}}(1)\), and \(\rho _\mathcal {M}\) are indeed different, they all point to the fact that if the mixing rate of the unperturbed matrix \(\mathcal {M}\) is slow, so that such quantities are close to 1 (and hence \(||(1-\mathcal {M})^{-1}||_1^*\) and \(||\mathcal {Z}||_1\) are very large), then the sensitivity of the measure to perturbations is high. See [21] for a discussion of the link between slow mixing of a system and the presence of rough parameter dependence in its response to perturbations, with some examples of applications in a geophysical context.

Bringing together the results presented in Eqs. 9, 10 and in Eq. 27, we conclude that Eqs. 18-22 provide the exact expression for the invariant measure of the stochastic matrix \(\mathcal {M}+\epsilon m\) \(\forall \epsilon \in \{[-\epsilon _{max}^*,\epsilon _{max}^*] \cap [\epsilon _-,\epsilon _+]\}\).

3 Response Theory for Observables

Let’s now look at the problem in terms of impact of the perturbation m on the expectation value of observables. Observables live in the dual space of the densities, and, given our convention, they are row vectors. They are approximated as having a constant value within each cell of the chosen partition of the phase space. The expectation value of the observable \(\mathbf {\pi }\) with respect to a measure \(\mathbf {w}\) can be written as \(\langle \mathbf {\pi },\mathbf {w}\rangle \), where \(\langle \bullet , \bullet \rangle \) denotes the scalar product. By definition, we have that \(\langle \mathbf {\pi }, A \mathbf {w} \rangle =\langle A^\top \mathbf {\pi }, \mathbf {w}\rangle \), where \(A^\top \) indicates the transpose (and adjoint, because we are studying real functions) of A.

Let’s look at the change in the expectation value of the observable \(\pi \) as a result of \(\mathcal {M}\rightarrow \mathcal {M}+\epsilon m\). We can write:

$$\begin{aligned} \langle \pi , \mathbf {v_1} \rangle =[\pi ]_\epsilon&= [\pi ]_0 + \sum _{n=1}^\infty \epsilon ^n \delta [\pi ]_n \end{aligned}$$
(29)
$$\begin{aligned}&=\langle \pi ,\mathbf {u_1} \rangle + \sum _{n=1}^\infty \epsilon ^n \langle \Psi _n^\top \pi , {\mathbf {u_1}}\rangle \end{aligned}$$
(30)
$$\begin{aligned}&=\langle \pi ,\mathbf {u_1} \rangle + \sum _{n=1}^\infty \epsilon ^n \langle \left( \Psi _1^\top \right) ^n \pi , {\mathbf {u_1}}\rangle , \end{aligned}$$
(31)

where \([\pi ]_0=\langle \pi ,\mathbf {u_1} \rangle \) is the expectation value of \(\pi \) in the unperturbed system, \([\pi ]_\epsilon =\langle \pi ,\mathbf {v_1} \rangle \) is the expectation value of \(\pi \) in the perturbed system, \(\delta [\pi ]_n\) is the n th order perturbation, which can be expressed as

$$\begin{aligned} \delta [\pi ]_n=\lim _{\epsilon \rightarrow 0 }\frac{1}{n!}\frac{\mathrm {d}^n}{\mathrm {d}\epsilon ^n} \langle \pi , \mathbf {v_1} \rangle . \end{aligned}$$
(32)

Moreover, \(\Psi _n^\top \) is the n th order adjoint response operator, acting on the observables, which can be written as:

$$\begin{aligned} \Psi _n^\top =(\Psi _1^\top )^n= \prod _{j=1}^n m^\top \left( \sum _{k=0}^\infty \mathcal {M}^k\right) ^\top . \end{aligned}$$
(33)

We can also write Eq. 31 as:

$$\begin{aligned} \langle \pi , \mathbf {v_1} \rangle&= \langle (1- \epsilon \Psi _1^\top )^{-1} \pi , \mathbf {u_1} \rangle \end{aligned}$$
(34)
$$\begin{aligned}&= \left\langle \left( 1- \epsilon m^\top \left( \sum _{k=0}^\infty \mathcal {M}^k\right) ^\top \right) ^{-1} \pi , \mathbf {u_1} \right\rangle \end{aligned}$$
(35)
$$\begin{aligned}&= \langle (1- \epsilon m^\top (1- \mathcal {M}^\top )^{-1} )^{-1} \pi , \mathbf {u_1} \rangle . \end{aligned}$$
(36)

where the last two expressions provide the nonperturbative formulas.
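The adjoint formulation can be sketched as follows: the operator \(\Psi _1^\top \) is applied repeatedly to the observable, and the resulting series of Eq. 31 is compared with the expectation value computed directly on the perturbed measure. The matrices, the observable `obs`, and the function names are illustrative assumptions. Note that \(m^\top \) is applied to each term \((\mathcal {M}^\top )^k \pi \) before summing: the individual terms tend to a constant vector, which \(m^\top \) annihilates, so the sum converges even though \(\sum _k (\mathcal {M}^\top )^k \pi \) alone would not.

```python
def matvec(A, x):
    """Product of a matrix (list of rows) with a column vector."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def stationary(M, n_iter=500):
    """Invariant measure of a column-stochastic matrix, by power iteration."""
    u = [1.0 / len(M)] * len(M)
    for _ in range(n_iter):
        u = matvec(M, u)
    return u


def psi1_adjoint(M, m, obs, n_terms=500):
    """Truncated Psi_1^T obs = sum_{k>=0} m^T (M^T)^k obs."""
    MT, mT = transpose(M), transpose(m)
    term = list(obs)
    total = [0.0] * len(obs)
    for _ in range(n_terms):
        total = [t + s for t, s in zip(total, matvec(mT, term))]
        term = matvec(MT, term)
    return total


def expectation_series(M, m, obs, eps, order=40):
    """Partial sum of Eq. 31 for the perturbed expectation value of obs."""
    u1 = stationary(M)
    g = list(obs)
    value = dot(obs, u1)
    for n in range(1, order + 1):
        g = psi1_adjoint(M, m, g)          # g = (Psi_1^T)^n obs
        value += eps ** n * dot(g, u1)
    return value


# Illustrative matrices and observable (assumed, not from the paper).
M = [[0.5, 0.2, 0.3], [0.3, 0.6, 0.2], [0.2, 0.2, 0.5]]
m = [[-0.1, 0.0, 0.1], [0.1, -0.1, 0.0], [0.0, 0.1, -0.1]]
obs = [1.0, 4.0, 9.0]
eps = 0.5

series_value = expectation_series(M, m, obs, eps)
M_eps = [[M[i][j] + eps * m[i][j] for j in range(3)] for i in range(3)]
direct_value = dot(obs, stationary(M_eps))
```

The same numbers are obtained by working on the measure and taking the scalar product at the end; the adjoint route is convenient when many perturbations are tested against a single observable.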

Remark

Equations 22 and 31 provide at all orders the response formulas for the discrete Markov process studied here. If we are constructing empirically the discrete phase space, we expect that different choices of the partitions, corresponding to different approximate representations of the full dynamics, will deliver different results in terms of response. Hence, our results can be model dependent, which is reasonable, as we are starting from a subjective choice on the way we approximate the phase space. In fact, one can empirically test the robustness of the obtained results against a set of given criteria by comparing whether the perturbations to a certain set of relevant observables weakly depend on the specific partition used. We present a very preliminary (and encouraging) numerical study performed on the Lorenz ’63 model [44] later in Sect. 5.

Moreover, as discussed in Sect. 1.4, if we construct finer and finer partitions for studying the response of systems whose unperturbed dynamics features an SRB invariant measure (most notably in the case of Axiom A systems), and indeed if we follow the self-refining Markov partitions of the dynamics, our results should converge to the exact response theory built upon the true SRB measure.

One needs to note that Eq. 27 gives an estimate of the largest possible value of \(\epsilon \) for a given partition, but we are not sure whether the minimum of \(\epsilon _{max}^*\) over all the finer and finer partitions is positive. Requiring this corresponds to imposing a bound, uniform in N, on \(||(1-\mathcal {M})^{-1}||_1^*\) or \(||\mathcal {Z}||_1\).

In [39] it is shown that \(L^1\) convergence of the finite state measure constructed using the Ulam method to the actual SRB measure is realized when \(||\mathcal {Z}||_1\) grows asymptotically not faster than \(\log N\), where N is the number of states. The requirement we seem to have here for applying response theory is unavoidably stricter, because computing the response entails considering the expectation value of not necessarily well behaved observables, constructed through nontrivial operations of differentiation of the actual observables whose sensitivity to perturbations we want to study, see Eq. 2 and [35]. This essential difficulty is exactly what motivates the point of view discussed in [18, 50], where a delicate analysis of the relationship between the tangent space of the unperturbed dynamics, the perturbation flow, and the observable allows one to set up a robust framework for the response theory.

Similarly, in our case, making the response theory work at a practical level means having or choosing m and \(\mathbf {u}\) in such a way that \(||(1-\mathcal {M})^{-1}||_1^*\) or \(||\mathcal {Z}||_1\) grossly overestimates in terms of norm the effect of applying \((1-\mathcal {M})^{-1}\) or equivalently \(\mathcal {Z}\) in, e.g., Eq. 22. Additionally, a suitable choice of the observable \(\pi \) can help avoid potential singularities in Eq. 36. In other terms, response theory can work much more easily once we get rid of or cure pathological cases.

4 Towards Continuous Time Dynamical Systems

We want to rephrase the previous results in the context of continuous time dynamical systems and derive some formulas previously presented in the literature concerning Axiom A systems. We consider a continuous time dynamical system of the form \(\dot{\mathbf {x}}=\mathbf {F}(\mathbf {x})\) and study its response to the perturbation \(\mathbf {F}(\mathbf {x})\rightarrow \mathbf {F}(\mathbf {x})+\epsilon \mathbf {X}(\mathbf {x})\). Correspondingly, as a result of the perturbation, the original invariant measure \(\nu _0 (\mathrm {d}\mathbf {x})\) is changed into \(\nu _\epsilon (\mathrm {d}\mathbf {x})\). The Liouville equation describing the evolution of a given initial density of states \(\rho (\mathbf {x})\) for the unperturbed system can be written as

$$\begin{aligned} \partial _t \rho (\mathbf {x},t) =-\nabla \cdot \left( \mathbf {F}(\mathbf {x})\rho (\mathbf {x},t)\right) ; \end{aligned}$$
(37)

considering two instants of time separated by a small time interval \(\mathrm {d}t\), we have:

$$\begin{aligned}&\rho (\mathbf {x},t+\mathrm {d}t) =\rho (\mathbf {x},t) -\mathrm {d}t \nabla \cdot \left( \mathbf {F}(\mathbf {x})\rho (\mathbf {x},t)\right) = \mathcal {M} \rho (\mathbf {x},t)\nonumber \\&\mathcal {M}=1+\mathrm {d}t\mathcal {F} \quad \mathcal {F}= -\nabla \cdot \left( \mathbf {F}(\mathbf {x})\bullet \right) . \end{aligned}$$
(38)

We understand that \(\mathcal {M}\) is in this context the unperturbed Perron–Frobenius operator \(\mathcal {L}_\epsilon ^{\mathrm {d}t}\) pushing forward the measure \(\rho \) from t to \(t+\mathrm {d}t\). When looking at the perturbed flow we have:

$$\begin{aligned} \rho (\mathbf {x},t+\mathrm {d}t)&=\rho (\mathbf {x},t) -\mathrm {d}t \nabla \cdot \left( \mathbf {F}(\mathbf {x})\rho (\mathbf {x},t)\right) -\mathrm {d}t \epsilon \nabla \cdot \left( \mathbf {X}(\mathbf {x})\rho (\mathbf {x},t)\right) \nonumber \\&=(\mathcal {M}+\epsilon m) \rho (\mathbf {x},t), \end{aligned}$$
(39)

where

$$\begin{aligned} m=\mathrm {d}t\mathcal {X} \quad \mathcal {X} =- \nabla \cdot \left( \mathbf {X}(\mathbf {x})\bullet \right) \end{aligned}$$
(40)

In this case, starting from Eq. 23, and considering that no normalization is applied to the perturbation operator, it is possible to propose a definition of \(\epsilon ^*_{max}\) for the continuous time dynamics taking inspiration from Eq. 27:

$$\begin{aligned} \epsilon ^*_{max}=\frac{1}{||\mathcal {X}||_\mathcal {B}||\mathcal {F}^{-1}||^*_\mathcal {B}}, \end{aligned}$$
(41)

such that the perturbative expansion converges if \(\epsilon \le \epsilon ^*_{max}\), where \(|| \bullet ||_\mathcal {B}\) describes the norm of the operator in the appropriate Banach space \(\mathcal {B}\) it belongs to, while \(|| \bullet ||^*_\mathcal {B}\) is such that the computation of the norm excludes the SRB measure. Note that \(\epsilon ^*_{max}\) is strictly positive if both \(||\mathcal {X}||_\mathcal {B}\) and \(||\mathcal {F}^{-1}||^*_\mathcal {B}\) are finite. This expression is admittedly tentative. As mentioned before, the problem of selecting appropriate functional spaces for constructing the response theory for Axiom A systems along the lines of studying the perturbations to the transfer operator requires a careful construction of suitable Banach spaces and of the related metrics [16, 18, 19] and is beyond the scope of this paper.

4.1 Linear Response

We now want to derive the Ruelle response formulas for computing the linear correction to the invariant measure resulting from the perturbation. We write

$$\begin{aligned} \nu _\epsilon (\mathrm {d}\mathbf {x})=\nu _0(\mathrm {d}\mathbf {x})+\sum _{n=1}^\infty \epsilon ^n \nu _n(\mathrm {d}\mathbf {x}), \end{aligned}$$
(42)

where n indicates the order of perturbation. Let’s first go back to the first order term in Eq. 15:

$$\begin{aligned} \epsilon {\mathbf {w^{1}}}=\epsilon \Psi _1 \mathbf {u_1} = \epsilon \left( \sum _{k=0}^\infty \mathcal {M}^k \right) m {\mathbf {u_1}}. \end{aligned}$$
(43)

Each term of the form \(\mathcal {M}^k\) pushes forward up to time \(t_k=k\times \mathrm {d}t\) what is positioned to its right. Summing over k, in fact, amounts to looking forward in time. If we insert the definition of m given above, we get the integrating factor \(\mathrm {d}t\), so that we obtain the following expression:

$$\begin{aligned} \nu _1(\mathrm {d}\mathbf {x})= - \int _0^\infty \mathrm {d} t \nabla \cdot (\mathbf {X}(\mathbf {x}(t)) \nu _0 (\mathrm {d}\mathbf {x})), \end{aligned}$$
(44)

where the evolution takes place according to the unperturbed system, and we have used the invariance of \(\nu _0(\mathrm {d}\mathbf {x})\) with respect to such an evolution law.
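The passage from the discrete sum to the time integral can be made explicit with the following formal steps, which assume that the limit \(\mathrm {d}t\rightarrow 0\) can be taken term by term and use the definitions of \(\mathcal {M}\) and m given in Eqs. 38 and 40:

$$\begin{aligned} \sum _{k=0}^\infty \mathcal {M}^k m = \sum _{k=0}^\infty \left( 1+\mathrm {d}t\,\mathcal {F}\right) ^k \mathrm {d}t\,\mathcal {X} \approx \sum _{k=0}^\infty \mathrm {d}t\, \exp \left( t_k \mathcal {F}\right) \mathcal {X} \;\rightarrow \; \int _0^\infty \mathrm {d}t\, \exp \left( t \mathcal {F}\right) \mathcal {X}, \quad t_k=k\,\mathrm {d}t. \end{aligned}$$

Here \(\exp (t\mathcal {F})\) is the unperturbed transfer operator over a time t, and acting with \(\mathcal {X}\) on the invariant measure gives \(\mathcal {X}\nu _0=-\nabla \cdot (\mathbf {X}\nu _0)\), which is the integrand appearing in Eq. 44 before it is transported by the unperturbed flow.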

By going into the dual space of the observables, we have that the change in the value of an observable \(O(\mathbf {x})\) from time t to time \(t+\mathrm {d}t\) in the unperturbed system can be written as:

$$\begin{aligned} \frac{\mathrm {d}}{\mathrm {d}t} O(\mathbf {x}(t))= \mathbf {F}(\mathbf {x})\cdot \nabla O(\mathbf {x}(t)), \end{aligned}$$
(45)

so that

$$\begin{aligned} O(\mathbf {x}(t+\mathrm {d}t))=O(\mathbf {x}(t)) +\mathrm {d}t \mathbf {F}(\mathbf {x})\cdot \nabla O(\mathbf {x}(t))=\mathcal {M}^\top O(\mathbf {x}(t)). \end{aligned}$$
(46)

where the operator \(\mathcal {M}^\top =1+\mathrm {d}t \mathcal {F}^\top = \mathbf {1}+\mathrm {d}t \mathbf {F}(\mathbf {x})\cdot \nabla (\bullet )\). Along the same lines, one derives that the perturbation operator \(m^\top \) acting on the observable can be written as \(m^\top =\mathrm {d}t\mathcal {X}^\top =\mathrm {d}t\mathbf {X}(\mathbf {x})\cdot \nabla (\bullet )\). Furthermore, we introduce the following expansion for the expectation value of \(O(\mathbf {x})\):

$$\begin{aligned}{}[O]_\epsilon =[O]_0+\sum _{n=1}^\infty \epsilon ^ n\delta [O]_n, \end{aligned}$$
(47)

where \([O]_\epsilon \) is the expectation value in the perturbed system, \([O]_0\) is the unperturbed expectation value, and the corrections are included in the summation.

Applying this expression to the first order term in Eqs. 31-33:

$$\begin{aligned} \delta [\pi ]_1 = \epsilon \langle \Psi _1^\top \pi , \mathbf {u}_1\rangle = \epsilon \langle \left( m^\top \left( \sum _{k=0}^\infty \mathcal {M}^k\right) ^\top \right) \pi , \mathbf {u}_1\rangle . \end{aligned}$$
(48)

we get:

$$\begin{aligned} \delta [O]_1 = \epsilon \int \nu _0 (\mathrm {d}\mathbf {x}) \int _0^\infty \mathrm {d} t \mathbf {X}(\mathbf {x}) \cdot \nabla O(\mathbf {x}(t))= \epsilon \int \nu _0 (\mathrm {d}\mathbf {x}) \int _0^\infty \mathrm {d} t \Lambda S^t_0 O(\mathbf {x}) \end{aligned}$$
(49)

which is exactly the original version of Ruelle’s linear response formula given in Eq. 3.

One needs to note that what in Ruelle’s formulation is causality (time integration in the response starts from 0) comes, in the context of the Markov matrices formalism followed here, from the algebraic expansion of \((1-\mathcal {M})^{-1}\). The issues of convergence mentioned in the original paper by Ruelle can be translated into the rate of mixing of the system as determined by the properties of \(\mathcal {M}\) discussed in Sect. 2.1.

4.2 Higher Order Terms

We can repeat the same construction to derive the higher order perturbation terms in the case of the continuous time dynamical systems. Inserting in Eqs. 15, 16 the expression 38 for \(\mathcal {M}\) and expression 40 for m, we obtain for the second order the following expression for the perturbation to the invariant density:

$$\begin{aligned} \nu _2(\mathrm {d}\mathbf {x})=\epsilon ^2 \int _0^\infty \mathrm {d}t_1\int _0^\infty \mathrm {d}t_ 2 \nabla \cdot \left( \mathbf {X}(\mathbf {x}(t_1)) \nabla _{\mathbf {x}(t_1)} \cdot \left( \mathbf {X}(\mathbf {x}(t_1+t_2))\right) \right) \nu _0 (\mathrm {d}\mathbf {x}), \end{aligned}$$
(50)

while the expression for the n th order correction reads

$$\begin{aligned} \nu _n(\mathrm {d}\mathbf {x})= & {} (-1)^n\epsilon ^n \int _0^\infty \mathrm {d}t_1\ldots \int _0^\infty \mathrm {d}t_ n \nabla \cdot \left( \mathbf {X}(\mathbf {x}(t_1)) \ldots \nabla _{\mathbf {x}(t_1+\ldots t_{n-1})}\right. \nonumber \\&\left. \cdot \left( \mathbf {X}(\mathbf {x}(t_1+\ldots t_{n}))\right) \right) \nu _0 (\mathrm {d}\mathbf {x}). \end{aligned}$$
(51)

Considering the adjoint problem and computing the higher order corrections to the expectation value of the observable O, we derive the general response formula proposed by Ruelle

$$\begin{aligned} \delta [O]_n = \epsilon ^n \int \nu _0 (\mathrm {d}\mathbf {x}) \int _0^\infty \mathrm {d} t_1\ldots \int _0^\infty \mathrm {d} t_n \Lambda S^{t_1} \ldots S^{t_{n-1}} \Lambda S^{t_{n}} O(\mathbf {x}), \end{aligned}$$
(52)

as reported in Eq. 2.

5 A Very Basic Numerical Experiment

In order to make a (very) preliminary assessment of the potential of some of the ideas presented in this paper, we have focused on investigating some properties of the celebrated Lorenz ’63 system [44]:

$$\begin{aligned} \dot{x}=&\sigma (y-x)\nonumber \\ \dot{y}=&x(\rho -z)-y\\ \dot{z}=&xy-\beta z\nonumber \end{aligned}$$
(53)

where we have chosen the standard values for the parameters \(\sigma =10\), \(\rho =28\), and \(\beta =8/3\). We remark that such a system is not an Axiom A system, but instead a singular hyperbolic system [52], which possesses a chaotic attractor and an invariant SRB measure [53]. In a previous publication [34], we have performed an analysis of the linear and nonlinear response of the Lorenz ’63 system to perturbations, extending a previous investigation by Reick [54], which makes us confident that response theory can be safely applied for all practical purposes also in this case. We consider the special case of time-independent perturbations to the dynamics resulting from substituting \(\rho \rightarrow \rho +\epsilon \) in Eq. 53, so that the perturbation flow can be written as \(\epsilon \mathbf {X}(\mathbf {x})=[0\quad \epsilon x\quad 0]^\top \).

Fig. 1 Attractor of the Lorenz ’63 system with indication of the Cartesian grids used for constructing the partitions of its phase space. See text

We have then identified a 3-dimensional box \(\mathcal {B}\) containing the attractor, defined as \(\mathcal {B}=\{(x,y,z)\in \mathbb {R}^3 |x\in [-20,20],\quad y\in [-30,30],\quad z\in [0,50]\}\), and subdivided it, à la Ulam, into smaller boxes of identical size using a regularly spaced Cartesian grid. We have considered partitions obtained using small boxes with linear dimension given by \(dx=2 \times j\), \(dy=3\times j\), and \(dz=2.5\times j\), along the three directions, with \(j=1,2,4\), see Fig. 1. This amounts to partitioning \(\mathcal {B}\) into \(8000/j^3\) smaller boxes. Note that our construction delivers a much lower resolution than that used in, e.g., [55].
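The partition described above can be sketched as a function mapping a point of \(\mathcal {B}\) to the label of the box containing it. The function names `grid_shape` and `box_index`, the row-major labeling of the boxes, and the convention of assigning points on the upper faces of \(\mathcal {B}\) to the last box are assumptions of this example.

```python
LOWER = (-20.0, -30.0, 0.0)     # lower corner of the box B
UPPER = (20.0, 30.0, 50.0)      # upper corner of the box B
BASE_SIZE = (2.0, 3.0, 2.5)     # dx, dy, dz for j = 1


def grid_shape(j=1):
    """Number of small boxes along x, y, z for resolution index j."""
    return tuple(int(round((hi - lo) / (s * j)))
                 for lo, hi, s in zip(LOWER, UPPER, BASE_SIZE))


def box_index(point, j=1):
    """Label of the small box containing point, or None if point lies outside B."""
    shape = grid_shape(j)
    idx = []
    for c, lo, hi, s, n in zip(point, LOWER, UPPER, BASE_SIZE, shape):
        if not (lo <= c <= hi):
            return None
        # Points on the upper face are assigned to the last box along that axis.
        idx.append(min(int((c - lo) // (s * j)), n - 1))
    ix, iy, iz = idx
    return (ix * shape[1] + iy) * shape[2] + iz


n_boxes = {j: grid_shape(j)[0] * grid_shape(j)[1] * grid_shape(j)[2]
           for j in (1, 2, 4)}
```

Applying `box_index` to every point of a trajectory yields the sequence of discrete states from which occupancies and transition counts are derived.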

We run the model with the standard values of the parameters, choosing as initial condition \([1\quad 1 \quad 1]^\top \) (in fact, given the global attractivity and ergodicity of the Lorenz attractor, any initial condition can be chosen). After discarding a transient of 1000 time units, which brings us safely into the asymptotic regime, we run the model for 50,000 time units with a simple adaptive 4th order Runge–Kutta scheme and obtain the output with a time step of 0.001 time units. This takes less than 10 minutes on a present-day commercial laptop with standard specifications using MATLAB \(^\circledR \). We present results at such a low level of sophistication in order to clarify that the approach proposed here is rather robust and relatively simple to implement.
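A minimal sketch of such an integration is given below. It is not the adaptive MATLAB scheme used for the results reported here: it uses a fixed-step classical 4th order Runge-Kutta method, a much shorter run, and no transient removal, and all function names are hypothetical. The perturbed run is obtained simply by changing the value of \(\rho \).

```python
def lorenz63(state, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """Right-hand side of Eq. 53."""
    x, y, z = state
    return (sigma * (y - x), x * (rho - z) - y, x * y - beta * z)


def rk4_step(f, state, dt, **params):
    """One step of the classical fixed-step 4th order Runge-Kutta scheme."""
    k1 = f(state, **params)
    k2 = f(tuple(s + 0.5 * dt * k for s, k in zip(state, k1)), **params)
    k3 = f(tuple(s + 0.5 * dt * k for s, k in zip(state, k2)), **params)
    k4 = f(tuple(s + dt * k for s, k in zip(state, k3)), **params)
    return tuple(s + dt / 6.0 * (a + 2.0 * b + 2.0 * c + d)
                 for s, a, b, c, d in zip(state, k1, k2, k3, k4))


def integrate(state, dt, n_steps, **params):
    """Trajectory sampled every dt, including the initial condition."""
    trajectory = [state]
    for _ in range(n_steps):
        state = rk4_step(lorenz63, state, dt, **params)
        trajectory.append(state)
    return trajectory


# Short illustrative runs (5 time units each); the paper uses 50,000 time units.
unperturbed = integrate((1.0, 1.0, 1.0), dt=0.001, n_steps=5000)
perturbed = integrate((1.0, 1.0, 1.0), dt=0.001, n_steps=5000, rho=28.1)

# Sanity check: the vector field vanishes at the fixed point
# (sqrt(beta (rho - 1)), sqrt(beta (rho - 1)), rho - 1) of the unperturbed system.
fixed_point = (72.0 ** 0.5, 72.0 ** 0.5, 27.0)
residual = max(abs(c) for c in lorenz63(fixed_point))
```

In pure Python a run of the length quoted in the text would be slow; a compiled or vectorized implementation would be preferable for production use.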

As the box-counting dimension or capacity of the attractor of the model given in Eq. 53 is \(d_0\sim 2.05\), we expect that the number of boxes \(B^j_k\), \(k=1,\dots ,N_B^j\) needed to cover the attractor decreases as \(N_B^j\propto 1/j^{d_0}\). We obtain a slightly lower exponent \(\sim \)1.9, which is perfectly acceptable as we are far from the asymptotic regime where the scaling given by \(d_0\) is realized.

Table 1 Expectation value of the observables \(x^2\), \(y^2\), \(z^2\), and z and their linear response with respect to the perturbation \(\rho \rightarrow \rho +\epsilon \)

For each value of j, the boxes \(B^j_k\) define the discrete states \(\phi ^j_k\), \(k=1,\dots ,N_B^j\). By counting the number of times the trajectory is included in each state \(\phi ^j_k\) and normalizing, we derive experimentally the asymptotic normalized occupancies \(\bar{u}^j_k\). Instead, by tracking the transitions between the various discrete states, we construct the estimate of the stochastic transition matrix \(\mathcal {M}^j_{p,q}\) describing the probability that the state \(\phi ^j_q\) makes a transition to the state \(\phi ^j_p\) in one time step. By finding the eigenvector corresponding to the unique unit eigenvalue of \(\mathcal {M}^j_{p,q}\), we find the invariant measure, which agrees up to very high precision with the empirical occupancy rate \(\bar{u}^j_k\) computed from the trajectory. As a first step, we evaluate the expectation values of four meaningful observables given by \(x^2\), \(y^2\), \(z^2\), and z, as obtained from the time integration of the Lorenz model and from its discrete representation in terms of a Markov chain. Table 1 shows that the agreement is rather good even when an extremely coarse resolution is used.
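The counting procedure can be sketched as follows for a sequence of discrete state labels. The short label sequence is a toy example, the function names are hypothetical, and the treatment of a state that is visited only at the very last time step (it is given a self-loop so that its column still sums to 1) is an arbitrary convention of this sketch.

```python
from collections import Counter


def occupancy(labels):
    """Normalized occupancy of each visited state."""
    counts = Counter(labels)
    return {s: counts[s] / len(labels) for s in sorted(counts)}


def transition_matrix(labels):
    """Column-stochastic estimate: M[p][q] = P(next state is p | current state is q)."""
    states = sorted(set(labels))
    index = {s: k for k, s in enumerate(states)}
    n = len(states)
    counts = [[0] * n for _ in range(n)]
    for current, following in zip(labels[:-1], labels[1:]):
        counts[index[following]][index[current]] += 1
    M = [[0.0] * n for _ in range(n)]
    for q in range(n):
        total = sum(counts[p][q] for p in range(n))
        if total == 0:
            M[q][q] = 1.0      # state never left: self-loop (sketch convention)
            continue
        for p in range(n):
            M[p][q] = counts[p][q] / total
    return states, M


# Toy sequence of visited states (assumed, for illustration only).
labels = [0, 1, 1, 2, 0, 1, 2, 2, 0, 0, 1, 2, 0]
states, M = transition_matrix(labels)
occ = occupancy(labels)
column_sums = [sum(M[p][q] for p in range(len(states))) for q in range(len(states))]
```

Only states actually visited by the trajectory are retained, which is why the number of discrete states is smaller than the number of boxes in \(\mathcal {B}\).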

We then show how to compute the response of the system to the perturbation due to the introduction of the vector field \(\epsilon \mathbf {X}(\mathbf {x})\). We keep in mind that when continuous time dynamics is considered, there is a very simple linear relation between the perturbation flow and the corresponding perturbation to the Perron–Frobenius operator, see Eqs. 38-40.

Therefore, we repeat the steps described above for the \(\epsilon -\)perturbed flow (we choose \(\epsilon =0.1\) in order to be on the safe side in terms of convergence), compute the new stochastic transition matrices \(\mathcal {M}^{j,\epsilon }_{p,q}\), and derive the perturbation matrices \(\epsilon m^j_{p,q}= \mathcal {M}^{j,\epsilon }_{p,q}-\mathcal {M}^{j}_{p,q}\). Once \(m^j_{p,q}\) and \(\mathcal {M}^{j}_{p,q}\) are known, we can use them to compute the response of the systems at all orders of nonlinearity using Eqs. 22 and 36.
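The procedure just described can be sketched on a small example: given an unperturbed and a perturbed transition matrix, the perturbation matrix is obtained by difference, and the linear response of an observable follows from Eq. 15. The matrices and the observable below are illustrative assumptions (the perturbed matrix is built by hand rather than estimated from a trajectory), and the function names are hypothetical. Following the Remark in Sect. 2.1, any spurious component along \(\mathbf {u_1}\) is removed before the powers of \(\mathcal {M}\) are summed.

```python
def matvec(A, x):
    """Product of a matrix (list of rows) with a column vector."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def stationary(M, n_iter=500):
    """Invariant measure of a column-stochastic matrix, by power iteration."""
    u = [1.0 / len(M)] * len(M)
    for _ in range(n_iter):
        u = matvec(M, u)
    return u


def linear_response(M, M_eps, eps, obs, n_terms=200):
    """First order coefficient <obs, w^1>, with w^1 = sum_k M^k m u_1 (Eq. 15)."""
    n = len(M)
    m = [[(M_eps[i][j] - M[i][j]) / eps for j in range(n)] for i in range(n)]
    u1 = stationary(M)
    y = matvec(m, u1)
    # Remove the rounding-level component along u_1 so that y sums exactly to ~0.
    leak = sum(y)
    y = [yi - leak * ui for yi, ui in zip(y, u1)]
    w1, term = list(y), list(y)
    for _ in range(n_terms):
        term = matvec(M, term)
        w1 = [a + b for a, b in zip(w1, term)]
    return dot(obs, w1)


# Illustrative data (assumed): M, a hand-built perturbed matrix, and an observable.
M = [[0.5, 0.2, 0.3], [0.3, 0.6, 0.2], [0.2, 0.2, 0.5]]
m_true = [[-0.1, 0.0, 0.1], [0.1, -0.1, 0.0], [0.0, 0.1, -0.1]]
eps = 0.1
M_eps = [[M[i][j] + eps * m_true[i][j] for j in range(3)] for i in range(3)]
obs = [1.0, 4.0, 9.0]

response = linear_response(M, M_eps, eps, obs)
finite_difference = (dot(obs, stationary(M_eps)) - dot(obs, stationary(M))) / eps
```

For this example the linear coefficient is 202/2401, and the finite-difference estimate differs from it by a term of order \(\epsilon \), as expected from the second order contribution.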

One needs to note that because of the finite integration time considered, of the non-infinitesimal perturbation applied, and of the somewhat arbitrary choice of the boxes, it can happen that the original and perturbed flows are characterized by a different number of discrete states. We have observed such a difference only in the case \(j=1\), involving one single extra state for the perturbed flow, with very low normalized relative occupancy (\({\le }10^{-6}\)). This problem can be easily sorted out by imposing a cutoff and removing from the discrete description all states with very low occupancy.

As discussed above, one needs to test accurately the well-posedness and convergence of the expansion in order to be sure to obtain meaningful results. This is not our goal at this stage, given the preliminary nature of this numerical test. Therefore, we limit ourselves to the less ambitious yet interesting goal of computing the linear response defined in Eq. 32 for the observables indicated above, using Eq. 48. The results are reported in Table 1 and seem very encouraging. The estimates of the response are very stable with respect to changes in the resolution of the boxes, and agree to a high degree of precision with the results one obtains by empirically evaluating the sensitivity of the observables with respect to the introduction of the perturbation flow using two integrations, as well as, in the case of the z observable, with what is reported in [34]. We note that the results are virtually unchanged if, instead of the high resolution time series with time step of 0.001 time units, one uses sparser observations corresponding to, e.g., a time step of 0.01 time units. Obviously, using a time resolution lower by a factor of s with respect to what is considered here, one derives by tracking the transitions a stochastic transition matrix corresponding to the sth power of the one obtained at higher resolution. This does not affect the results as long as the sampling interval is much shorter than the characteristic time scale of the system, which can be approximated as \({\sim }1/\lambda _1\sim 1.1\) time units, where \(\lambda _1\) is the positive Lyapunov exponent of the system. On much longer time scales, instead, the stochastic matrix is quasi-degenerate, with all columns almost equal to the invariant measure.

6 Conclusions

Taking the point of view of finite state Markov systems, we have been able to construct a perturbation theory for studying the impact of small perturbations to the background dynamics. While previous approaches focus on constructing a theory able to account for the effect of adding small perturbations to the baseline flow, we focus on computing the change in the invariant measure and the change in the expectation values of general observables (one problem being the adjoint of the other) occurring when the Markov transition matrix is perturbed as \(\mathcal {M}\rightarrow \mathcal {M}+\epsilon m\).

The perturbation term \(\epsilon m\) has to be such that all the columns of the new stochastic matrix sum up to 1 and all entries are positive. All of our findings are obtained with rather simple linear algebra manipulations and using basic properties of the stochastic matrices. We can express the response as a perturbation series or, after suitable resummation, using compact exact formulas. We are also able to assess the convergence properties of the response theory by defining a value \(\epsilon ^*_{max}\) such that if \(|\epsilon |\le \epsilon ^*_{max}\) the perturbative expansion converges. We have that the stronger is the mixing of the unperturbed system, the larger is the value of \(\epsilon _{max}\). These findings match well with previous results providing upper bounds to the sensitivity of stochastic matrices to perturbations.

Our results provide a direct algorithmic method for studying the response to perturbations of finite state Markov processes. They have the advantage of allowing for an immediate and practical change of point of view between response theory seen in terms of changes of the invariant measure and response theory seen in terms of changes in the expectation values of observables, by simply computing the transpose of the resulting finite dimensional linear operators. Our findings give closed formulas for the linear and nonlinear response at all orders of perturbation through explicit matrix expressions that can be directly implemented in any programming language.
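The change of point of view can be sketched for the linear term as follows, assuming the column-stochastic convention \(\mathcal {M}\nu =\nu \), an observable represented by a vector \(\Phi \) on the finite state space, and the first-order series for the measure obtained by collecting the terms of order \(\epsilon \) in the stationarity condition. The symbols \(\nu \), \(\Phi \) and \(\delta ^{(1)}\) are notation assumed for this sketch and may differ from the one used in the main text.

\[
\delta ^{(1)}\langle \Phi \rangle
=\epsilon \,\Phi ^{T}\Big (\sum _{k=0}^{\infty }\mathcal {M}^{k}\,m\,\nu \Big )
=\epsilon \,\Big (\sum _{k=0}^{\infty }m^{T}\big (\mathcal {M}^{T}\big )^{k}\,\Phi \Big )^{T}\nu .
\]

In the first form the perturbation acts on the measure and the observable is held fixed; in the second, the transposed operators act on the observable and the expectation is taken with respect to the unperturbed measure. The second series converges for a mixing chain because \(\big (\mathcal {M}^{T}\big )^{k}\Phi \) tends to a constant vector, which is annihilated by \(m^{T}\) since the columns of \(m\) sum to zero.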

We can use our formulas to study the response to perturbations of finite state Markov processes constructed in order to have a simplified and tractable picture of a complex system. Given two different state spaces constructed using different finite partitions covering the attractor of the system, we cannot expect to obtain the same results for the change in the expectation value of a given observable. The results might indeed be model dependent, but this is the obvious price one has to pay because of the subjective choice of the reduced state space. An assessment of the robustness of the obtained results is key to applying our methods in the context of reduced models. Nonetheless, the extremely unsophisticated numerical study reported here on the Lorenz'63 model is quite encouraging in this regard, even if tests should be made on much higher dimensional models.

If the underlying dynamics is Axiom A (or Axiom A equivalent, as in the cases where the chaotic hypothesis applies), one can impose conditions such that the response operators constructed using finer and finer partitions converge to the actual corresponding response operators constructed on the SRB measure. Having in mind the Ulam method, the conditions are stricter than what is needed in order to have convergence of the unperturbed measure, the basic reason being that Ruelle response operators correspond to nontrivial observables. One expects better convergence if the self-refining Markov partitions of the system are considered when constructing the finite state approximations.
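A minimal sketch of how finite state approximations at different resolutions can be built is given below. It uses the simple trajectory-counting variant of the Ulam method on a uniform partition, and a hypothetical uniformly expanding map of the unit interval chosen only for illustration; it makes no claim about convergence rates or about the conditions mentioned above. All names are assumptions.

```python
import math


def circle_map(x):
    """Hypothetical uniformly expanding map of the unit interval (mod 1)."""
    return (2.0 * x + 0.1 * math.sin(2.0 * math.pi * x)) % 1.0


def transition_matrix(n_boxes, n_steps=100_000, x0=0.1234):
    """Column-stochastic matrix estimated by counting box-to-box transitions."""
    counts = [[0] * n_boxes for _ in range(n_boxes)]
    x = x0
    j = min(int(x * n_boxes), n_boxes - 1)
    for _ in range(n_steps):
        x = circle_map(x)
        i = min(int(x * n_boxes), n_boxes - 1)
        counts[i][j] += 1  # one observed transition from box j to box i
        j = i
    col_sums = [sum(counts[i][j] for i in range(n_boxes))
                for j in range(n_boxes)]
    # Normalise each visited column; unvisited boxes are left as zero columns.
    matrix = [[counts[i][j] / col_sums[j] if col_sums[j] else 0.0
               for j in range(n_boxes)] for i in range(n_boxes)]
    visits = [c / n_steps for c in col_sums]  # empirical measure of each box
    return matrix, visits


for n in (10, 20):
    P, nu = transition_matrix(n)
    print(n, "boxes: min/max box measure", min(nu), max(nu))
```

By construction the empirical box frequencies are almost exactly invariant under the estimated matrix, since row and column totals of the count matrix differ by at most one.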

Our results can be thought of as intermediate steps at finite precision leading to the correct response formulas in the limit. One needs to add as a caveat that going from finite state to functional spaces is far from trivial and requires a high degree of mathematical precision, which is beyond the scope of this paper. Nonetheless, the finite construction proposed here seems to indicate why some important mathematical issues emerge when the Perron–Frobenius operator formalism is considered in a continuum setting. In particular, the need for selecting suitable norms for vectors and linear operators in finite dimension points to the complex requirements in terms of functional spaces described in e.g. [51].

Interestingly, we can use the formulas obtained for finite state Markov processes to study the impact of perturbations to continuous time dynamical systems, after making a suitable identification between the considered transition matrices and the evolution operators for measures and observables. This operation is straightforward because there is a simple, exact linear relation between the perturbation in the vector flow of the dynamical system and the perturbation in the Perron–Frobenius operator when infinitesimal time intervals are considered. As a result, we are able to derive in a very simple way previous formulas obtained by studying the perturbations to the transfer operator, as well as the original expressions proposed by Ruelle for the linear and higher order perturbations in the expectation values of observables. Using the results obtained in the finite state case, we propose a formula for the radius of convergence of the perturbative expansion.

One can envision that, in the case where the underlying dynamics is discrete, there is no such one-to-one correspondence between perturbations to the vector field and perturbations to the Markov transition matrix. This can be easily checked when constructing the perturbed Perron–Frobenius operator resulting from adding an \(\epsilon \) correction to the vector field, which results in changes in the Perron–Frobenius operator at all orders in \(\epsilon \). Therefore, the perturbative expansion is different in the two cases. Agreement is instead found in the limit \(\epsilon \rightarrow 0\), or, more practically, when we retain only the linear terms in the perturbative expansion in \(\epsilon \), i.e. when aiming only at the linear response function.
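To see where the higher order terms come from, consider as an illustrative assumption a map \(x\mapsto f(x)\) perturbed to \(x\mapsto f(x)+\epsilon X(x)\); the symbols \(f\), \(X\), \(F\), \(\Phi \) and \(\mathcal {K}_\epsilon \) are introduced only for this sketch. The perturbed operator acting on a smooth observable \(\Phi \), i.e. the adjoint of the perturbed Perron–Frobenius operator, has the Taylor expansion

\[
(\mathcal {K}_\epsilon \Phi )(x)=\Phi \big (f(x)+\epsilon X(x)\big )
=\sum _{k=0}^{\infty }\frac {\epsilon ^{k}}{k!}\,\Big [\big (X(x)\cdot \nabla \big )^{k}\Phi \Big ]\big (f(x)\big ),
\]

where the derivatives act on the argument of \(\Phi \) with \(X(x)\) held fixed. Only the \(k=1\) term is linear in \(\epsilon \), so the perturbed operator is not of the form \(\mathcal {M}+\epsilon m\). For a flow \(\dot {x}=F(x)+\epsilon X(x)\) observed over an infinitesimal time \(dt\), one has instead

\[
\Phi \big (x+(F(x)+\epsilon X(x))\,dt\big )=\Phi (x)+dt\,\big (F(x)+\epsilon X(x)\big )\cdot \nabla \Phi (x)+O(dt^{2}),
\]

which is exactly linear in \(\epsilon \) at first order in \(dt\).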

Future investigations will try, on the one side, to have a sharper mathematical look at the problem of going from finite to infinitely small partitions of the phase space, and, on the other side, to delve into the numerical study of the effectiveness and efficiency of the proposed tools. Apart from testing the results on specific finite state Markov systems, we will test how robust the proposed methods are when studying finite state Markov processes that have been empirically constructed from time series of observations or of numerical simulations of high-dimensional complex systems. One may hope that it could be possible to have an accurate representation of the response of a high dimensional system to perturbations by constructing a smart finite state model well suited to studying specific observables of interest. Of course, in order to deal with the curse of dimensionality, one would like to be able to go beyond the Ulam method and deal with finite partitions of reduced phase spaces where projection is applied on many or even most dimensions.

Our formulas may address the now long-standing problem of constructing suitable algorithms for studying the response of chaotic systems to perturbations. It is extremely hard to construct an algorithm for computing the (linear) response directly on the flow, because serious problems emerge when considering the contributions coming from the unstable directions in the tangent space. This might have great relevance for problems, like climate dynamics, where a direct construction of the response operator is especially challenging and slightly indirect methods have to be used [35], and where a lot of effort has been devoted to defining the so-called atmospheric regimes and predicting their response to forcings [56].