Predictors and Predictands of Linear Response in Spatially Extended Systems

The goal of response theory, in each of its many statistical mechanical formulations, is to predict the perturbed response of a system from the knowledge of the unperturbed state and of the applied perturbation. A new recent angle on the problem focuses on providing a method to perform predictions of the change in one observable of the system by using the change in a second observable as a surrogate for the actual forcing. Such a viewpoint tries to address the very relevant problem of causal links within complex system when only incomplete information is available. We present here a method for quantifying and ranking the predictive ability of observables and use it to investigate the response of a paradigmatic spatially extended system, the Lorenz '96 model. We perturb locally the system and we then study to what extent a given local observable can predict the behaviour of a separate local observable. We show that this approach can reveal insights on the way a signal propagates inside the system. We also show that the procedure becomes more efficient if one considers multiple acting forcings and, correspondingly, multiple observables as predictors of the observable of interest.


Elements of Response Theory
Response Theory is an area of statistical physics that provides general methods for predicting the changes in the statistical properties of an observable of interest Ψ from the knowledge of the applied perturbation and of the statistical properties of the unperturbed system. The expectation value of Ψ in the perturbed system is expressed as a perturbative series where the zeroth-order term is the expectation value of Ψ in the unperturbed system. The higher-order terms are expressed in terms of response functions which contain information about the higher order statistics of the unperturbed system and the applied forcing A cornerstone in the development of response theory came with the work by Kubo [1, 2], who considered the case of weakly perturbed systems near thermodynamic equilibrium.
When a system is in this kind of steady state, it obeys detailed balance and features no net currents. In Kubo's theory, from the knowledge of the statistics of the unperturbed system, and in particular of the correlations describing its spontaneous fluctuations, it is possible to compute, in linear approximation, the response of the system to any (weak) perturbation. This is the key idea behind the fluctuation-dissipation theorem (FDT) [2], which establishes a link between the forced and free fluctuations in the perturbative regime [3]. A generalised version of the FDT valid for higher moments has been proposed by [4] in the spirit of Zubarev's generalization of Kubo's results [5]. We will discuss from now on the special and very relevant case of linear response theory (LRT) and of its applications.
The analysis of the response to perturbations of systems that are far from thermodynamics equilibrium requires a more general approach than what provided by Kubo's theory.
Indeed, nonequilibrium systems are ubiquitous and their investigations has both great theoretical relevance and allows one to study many important real-life examples in material science, physics, ecology, fluid dynamics, climate science, biology, among others. Often, one says that after transients have died out, in absence of time-dependent forcings, a nonequilibrium system is in the so-called nonequilibrium steady state, where detailed balance is not obeyed, and, in many cases, dissipative processes are associated with a contraction of the phase space and with the production of entropy [6]. Studying the response of NESS systems to perturbations is of great relevance both for theory and applications. At this regard, a rigorous and crucial development in the context of deterministic dynamics was provided by Ruelle [7][8][9][10], who rigorously derived a LRT for smooth observable of Axiom A dynamical systems. Ruelle's results have been been recast and extended by methods of functional analysis [11,12]. Roughly speaking, Axiom A systems provide an extremely useful example of chaos characterized by the presence of a clear separation between the expanding and contracting directions in the tangent space, formalized by the concept of uniform hyperbolicity. Crucially, Axiom A systems possess a special ergodic invariant measure -the so-called Sinai-Ruelle-Bowen measure [7] -that makes them excellent candidates for describing non-equilibrium physical systems. While Axiom A systems are mathematically quite special, their practical relevance is clarified by the chaotic hypothesis proposed by Gallavotti and Cohen [13,14], which can be see an extension of the ergodic hypothesis to the non-equilibrium case.
The chaotic hypothesis is supportive of the fact that LRT should de facto work in a large class of chaotic dynamical systems. As opposed to the equilibrium case, general nonequilibrium dynamical systems obeying Axiom A dynamics possess an invariant measure that is singular with respect to Lebesgue and is supported on a strange attractor. Ruelle proved that the response operator is given by the sum of two contributions. One is related to the dynamics on the unstable and central manifolds and can be framed as a FDT result, while the second one is related to the dynamics occurring on the stable manifold, and cannot be framed as a FDT result because of the non-smoothness of the measure [10]. In other words, the natural fluctuations are not equivalent to the forced perturbations along the stable directions [15][16][17]. The direct computation of these two contributions is far from trivial [18][19][20], but the use of adjoint and shadowing methods has recently led to promising results in this direction [21][22][23].
Under rather general hypotheses, adding noise makes the invariant measure absolutely continuous with respect to Lebesgue, so that the FDT fully holds [27]. In this setting, the obtained formula is called the Kubo-Agarwal formula, which reduces to the Kubo formula for equilibrium systems. The addition of a noise term has to be justified by the nature of the considered problem. This stochastic perspective becomes relevant in many complex systems, where the focus is on coarse-grained dynamics, which is effectively stochastic as a result of the presence of microscopic degrees of freedom. Note that the coarse-grained dynamics is in general non-markovian, with memory effect becoming negligible in the limit of infinite time-scale separation between the fast and slow variables [28][29][30][31][32].
The relationship between response theory in deterministic and stochastic systems has been thoroughly discussed in [33], where the authors also propose a very well-developed justification, not based on the chaotic hypothesis, for the broad pragmatic applicability of LRT in a vast class of deterministic chaotic systems.
Nowadays, LRT has an important role in the investigations of a vast range of systems see e.g. [20,[34][35][36][37][38][39][40]. As an example, in the case of climate science, LRT makes it possible to perform climate change projections using climate models of different levels of complexity. This amounts to computing the time-dependent measure supported on pullback attractor of the climate [41] by constructing response operators for a suitably defined reference climatic state [39]. This viewpoint allows one to investigate ways to control the future pathways of climate change [42] and to make an assessment of potential and pitfalls of geoengineering strategies [43].
Recently, LRT has been extended in such a way that explicit formulas are given for describing how adding a forcing to a system changes its n−point correlations [44]. Another recent application of LRT has focused on detecting and characterising phase transitions in a network of coupled identical agents undergoing a stochastic evolution [45]. A recently published special issue showcases several emerging areas of applications for LRT [46]. Majda and Qi have recently shown the existence of a coherent thread connecting LRT, sensitivity analysis, model reduction techniques, uncertainty quantification, and control of high-dimensional chaotic and stochastic systems [47,48]. The theme of studying the response using incomplete information on the system is the main topic of the present contribution .

Predictors and Predictands
A different angle on the problem of defining the response of a system to perturbations proposes a fairly general method that allows one to relate the response of different observables of a system undergoing a perturbation. This method could be useful in situations where we need to interpret experimental data and we do not have necessarily perfect knowledge of the properties of the system -e.g. we do not know the specific features of the applied forcing, have access to only a limited number of degrees of freedom of the system, or ignore altogether the evolution equation of the system itself -but can measure instead multiple observables at the same time [49]. This is closely related to finding solutions to nonlinear rational least squares problems [50,51].
The goal is to understand to what extent we can use perturbed observables as surrogates of the perturbation to reconstruct the time behaviour of other observables. It turns out that, if we know the time behaviour of one observable Ψ 1 , we can always reconstruct diagnostically, within linear approximation, the time behaviour of another observable Ψ 2 through a surrogate response function.
Instead, if the goal is to actually predict the future state of the observable Ψ 2 , by means of a prognostic relation, it turns out that not all choices of predictor are equally successful to perform the prognosis of a given predictand, because the surrogate response function might be, as opposed to the standard response (Green) function, not causal, i.e. its support is not limited to non-negative times.
The analysis performed on the Lorenz '96 model [52][53][54] in [49] explicitly showed that the response of an observable Ψ 1 to a given perturbations could be predicted by the response of a certain observable Ψ 2 but could not be predicted by the response of a third one Ψ 3 , because in the latter case the surrogate response function has a non-causal component.

This Paper
Following the seminal contribution by Granger [55], more and more attention has been recently paid to finding rigorous ways for defining and detecting causal links within complex systems [56], and climate science has been a very successful fertile of application of such ideas [57][58][59][60].
In this paper, we address this class of problems using a different viewpoint. We want here to make a step forward compared to [49] with the goal to better quantify the skill of different observables in predicting the response of a given observable. Here, we propose a way to evaluate how much their corresponding surrogate response functions are not causal. This problem can emerge in a variety of situations where we have more non-predictive surrogate response functions and we want to choose between them the one which provides the best prediction. For example, we could have a set of observables {Ψ 1 , ..., Ψ m } and we could aim at finding out which one(s) can be used to predict most accurately the response of another observable Ψ a . Any observable whose surrogate response function is predictive is equally good at this regard. If only one among the m observables features a predictive surrogate response function, the choice is obvious. The choice becomes less straightforward when all the observables {Ψ 1 , ..., Ψ m } are not predictive. We would like to have a quantitative method that allows us to choose the most predictive one among them. Another setting where such a method could be helpful is when we want to understand whether an observable Ψ a predicts better Ψ b or viceversa, as a better predictive power can be linked to a causal link or a flow of information with a definite verse from an observable to the other.
Specifically, we investigate causal links in the context of a spatially extended system undergoing chaotic dynamics, namely, the paradigmatic Lorenz 96 (L96) model [52][53][54]. We investigate the response of the system to localised forcings, and we observe the system by looking at its local properties in different regions. We assess whether the response of local observables can be used to predict the response of other local observables of interest. One of the goals of this analysis is to see whether we can detect a clear indication of the way signals propagate inside the system. We also explore the possibility suggested in [49] that the predictive skill of the surrogate approach proposed here could dramatically improve if one perturbs the system with multiple forcings and uses multiple observable as surrogate predictors.
The rest of the paper is structured as follows. In Section II, after briefly reviewing the surrogate LRT [49], we present the predictability index (PI), which aims at quantifying how much a surrogate response function is non-predictive. We will remark that the presence of such an index provides the opportunity to build an hierarchy of observables in terms of their predictive power of other observables. In Section III we apply the surrogate LRT on the L96 model, considering local perturbations and local observables. We will also explore the impact of adding information gathered from global observables to predict the local forced response.
In Section IV we summarise the main findings of this study and present perspectives for future research. A set of Appendices provides supplementary material of possible interest for the reader. In Appendix A we provide an illustrative example to better explain the meaning of the PI. In Appendix B we provide evidence of the fact that in our experiments and data analysis we are in a linear regime of response. In Appendix C we clarify some asymptotic properties of the response of the Lorenz '96 model, while in Appendix D some specific properties of the surrogate response functions are discussed.

II. SURROGATE RESPONSE THEORY
Following Ruelle [8][9][10], we recapitulate very informally some basic elements of LRT by studying the effect of perturbing an Axiom A system of the form˙ x = F (x), where x ∈ M, a smooth compact manifold of dimension D. We introduce the following complex pattern of forcing, consisting in N independent perturbations: where G 1 (x), . . . , G N (x) are D−dimensional smooth vector fields while e 1 (t), . . . e N (t) are time patterns. In linear approximation, which is relevant in the case the applied forcings are small, it is possible to write the change in the expectation value of any smooth observable Ψ as follows: where we have introduced the linear (Green) response functions Γ Ψ,G j , which mediate the effect of the time pattern of the perturbation e j at time τ < t on the observable Ψ at time t. The response function can be seen as the expectation value in the unperturbed system of a complex observable that depends on the applied forcing and on Ψ: where Θ is the Heaviside distribution, which determines the causality of the response functions, ρ 0 is the invariant measure of the unperturbed system and Ψ(y(t)) = S t Ψ(y) = exp(L 0 t)Ψ(y) is the value of the observable Ψ at time t following the evolution in time according to the dynamics of the unperturbed system, with initial condition y. We have that S t is the (unperturbed) Koopman operator and L 0 = F · ∇ is its generator. By applying the Fourier transform to Eq. 2, we obtain the following identitys: where χ Ψ,G j (ω), j = 1, . . . , N are the so-called susceptibilities. Since the response functions Γ Ψ,G j (t), j = 1, . . . , N are causal, under standard conditions of integrability the corresponding susceptibilities are analytic in the upper complex ω−plane [61,62].

A. Surrogate response functions
A different angle of the problem in LRT has been introduced in [49] where response relations between perturbed observables are built. These relations can be useful in a large variety of contexts where the knowledge of the forcing is just partial, and we want to use perturbed observables to diagnose the state of other perturbed observables. We consider N + 1 independent observables Ψ 1 (x), . . . , Ψ N (x). In [49] it is shown that it is possible to express the linear change of the expectation value of an observable Ψ N +1 as a function of the ones of the other N observables; where we take the surrogate of the N forcings using the other N observables through the surrogate susceptibilities J N +1,l , l = 1, . . . , N . In [49] explicit expressions for the surrogate susceptibilities J N +1,l (ω) are derived: Once we have obtained the surrogate response functions, we plug them into Eq. 5, obtaining the following expression: where Note that the previous relation does not involve the time patterns of the acting forcings.
This has important practical consequences. If we are able to derive the functions H N +1,l (t) or J N +1,l (ω), l = 1, . . . , N from a probe experiment performed with a (broadband) time pattern of forcing, we can use Eq. 7 to reconstruct the time evolution of δ Ψ N +1 (t) for any time pattern of the forcing using as input the time evolution of δ Ψ j (t), l = 1, . . . , N .
This will investigated in the next Sect. III B. This inverse problem can also be approached by considering monochromatic forcings and scanning across frequencies, in the spirit of the algorithms proposed in [50,51].
can be employed to diagnose, at the linear level in the perturbation, the time behaviour of Ψ N +1 by means of other N observables. On the other hand, it is not always possible to perform the prognosis of Ψ N +1 using the same observables, which can be useful in prediction problems. This requires that the response functions H N +1,l ∀ l = 1, . . . , N are causal, i.e. their support is in the non-negative domain.
Let's expand more on the case N = 1, to better clarify some issues associated with the surrogate LRT. Equations 6-7 become: The surrogate response function H 2,1 is predictive -i.e., it has support only on the nonnegative domaion -if and only if its Fourier transform J 2,1 has no poles in the upper complex ω−plane. Since the numerator χ Ψ 1 ,G (ω) is analytic in the upper complex ω plane, loss of predictability is realised only if the response function χ Ψ 2 ,G (ω) at the denominator has complex zeros in the upper complex ω plane, see discussion in [49].
Indeed, we expect that for a given forcing not all choices of predictors and predictands are equally successful in terms of predictive power. For instance, if there is a causal relation in a feedback or a flow of information linking observable Ψ 1 and Ψ 2 , one expects the presence of an asymmetry in the mutual predictive power.
Lastly, we remark that the surrogate response function could have a singular component in 0, as it is noted in [49]. There is a close link between the short-time behaviour of Γ Ψ 1 ,G and the high-frequency behaviour of its Fourier transform: As a consequence, it is possible to obtain the asymptotic behaviour of the Fourier transform of the surrogate response function J 2,1 using the asymptotic behaviour of the response functions. In particular, if for large values of ω we have that χ Ψ 1 ,G ≈ 1/ω α 1 +1 and χ Ψ 2 ,G ≈ 1/ω α 2 +1 , we derive: If α 1 < α 2 J 2,1 (ω) diverges for large values of ω, while it converges to a non-vanishing constant for α 1 = α 2 . Hence, in these cases the surrogate response function H 2,1 (t) will have a singular component S 2,1 (t) in t = 0 because the Fourier transform of (−iω) j is δ j (t), i.e.
Note that the exponent α describing the short time behaviour of the response function controls how long it takes for the observable to feel the effect of the forcing. The higher the exponent, the slower is the build-up of the effect of the forcing on the observable. Hence, it makes sense to use Ψ 1 to predict Ψ 2 only if α 1 ≤ α 2 , i.e. if Ψ 1 feels the applied perturbation before or approximately at the same time as Ψ 2 . From now on we then exclude the case The part of the surrogate response function H 2,1 that is practically usable for predictions has support restricted to the non-negative domain and can then be expressed as follows: In practice, since the outputs of numerical simulations have finite precision, H c 2,1 (t) can be reconstructed from data as follows: where the ε− regularization has been introduced to include in the definition of H c 2,1 possible singular components of H 2,1 (t) at t = 0.

B. Quantifying the Ability to Predict
The presence of the non-causal component in the surrogate response function H 2,1 hinders the prediction of Ψ 2 at time t using just the time behaviour of Ψ 1 up to time t. An interesting problem is to actually quantify the non-causal component of the surrogate response function. This quantification would allow to build an hierarchy of observables in terms of their predictive power of other observables.
From the discussion above, we then have that the surrogate response functions are of the form: where the constant s 2,1 ∈ R can also be zero. We propose to quantify the ability of the observable 1 to predict the observable 2 with the predictability index (PI), which is defined as follows: where: hence K c 2,1 is the causal part of the non-singular component of the surrogate response function, while K nc 2,1 is its non-causal part. Additionally, • 1 is the standard L 1 norm. The PI depends on the system under investigation, the space pattern of the forcing G(x) and on the observables Ψ 1 and Ψ 2 .
The PI is non-negative and vanishes if and only if the surrogate response function is predictive, because its support includes only the non-negative domain. A large value for the PI indicates that the response of the observable 1 is a poor predictor of the response of the observable 2. Moreover, since this method revolves around the surrogate response function, it does not depend on the chosen time pattern. We will actually see the effectiveness of this indicator in the L96 model in Section III. A few pedagogical examples can also be found in Appendix A.
We can generalize the indicator given in Eq. 15 to the case when we use Ψ l , l = 1, . . . , N observables as predictors of the observable Ψ N +1 , as in Eq. 5. For each surrogate response function H N +1,l (t), with l ∈ {1, ..., N }, we define its singular part S N +1,l (t) and its nonsingular part K N +1,l (t), which, in turn, can be split into the non-causal component K nc N +1,l and the causal component K c N +1,l . We assume that all these surrogate response functions are of the type Eq. 14. We then define: III. THE LORENZ '96 MODEL

A. Model Formulation
The L96 model [52][53][54] is a paradigmatic model that provides a metaphor of some essential properties of the dynamics of the atmosphere. It is defined on a lattice of N grid points and the evolution of the variables x i , i = 1, . . . , N is given by the following system of ordinary differential equations:ẋ where F is a constant controlling the external forcing, γ (usually set to a unitary value, as done also here) modulates the strength of the dissipation, and the quadratic term on the right hand side describes a non-standard advection process. The system obeys periodic boundary conditions, so that Recently, two extensions of the L96 model have been proposed, one able to accommodate for a complex interplay between dynamics and thermodynamics [64], and one featuring multiple competing attractors [67].

Linear response functions
We will now apply the formalism of the surrogate LRT to the L96 dynamical system Eq. 18. We will set ourselves in the chaotic regime by choosing N = 36 and F = 8.
In [49], the focus was on global perturbations, which impact directly all the variables x i , i = 1, . . . N , and on global observables. We want now to take a different route, focusing on local perturbations and local observables, which had initially been proposed in [17]. In particular, we choose the following spatial pattern of applied forcing: where δ ik is the Kronecker delta, which has unitary value if the two indices are identical and vanishes otherwise, and is a real number which measures the magnitude of the perturbation.
Equation 19 implies that we apply an extra forcing only at k th grid point. We will consider as observables of interest the dynamical variables x j . With these choices of the perturbation and the observables, the problem we are addressing amounts to asking to what extent a perturbed variable at location i can predict the future state of another perturbed variable at location j after the system has been perturbed locally at location k. Given the discrete symmetry of the system, we expect that these properties will depend only on the relative position of i, j, and k, but, since the propagation of the information in this system is directional, we expect that they will not depend only on the distance between these locations.
where δx i (t) is the difference between the value of the variable x i at time t in the perturbed and unperturbed run, both having the same initial conditionx (k) . We then repeat the same procedure by switching the sign of the forcing: → − , obtaining Γ − i,k (t). Our estimate of the response function is then given by The last step allows to remove the second order correction to the linear response and con-  siderably increases the precision of the estimate [68]. The linearity of the responses has been tested to hold very well up to = 1, see Appendix B. We choose = 1 for our numerical studies in order to have a good signal-to-noise ratio while being within the regime of linear response. We remark that the procedure above ensures that the response function is causal.
The response functions Γ i,k for i = {k − 2, k − 1, k, k + 1, k + 2} obtained for M = 2 · 10 6 and = 1 are shown in Fig. 1. We focus on the dynamical variables nearby the perturbation site because the intensity of the response decreases dramatically as we move further away, as discussed below.

Propagation of the perturbation signal
We now look at the leading order term of short-time behaviour of Γ i,k for positive times (Γ i,k vanishes for negative times): where the coefficient C (q) i 0 is the expectation value of the function C (q) i taken over the stationary measure ρ 0 , see Appendix C. In particular, we notice that at t = 0 the response function Γ i,k is equal to 1 if i = k and it is vanishing for i = k. This is intuitive: at t = 0 the perturbation is felt in all its intensity just at the grid point that has been directly perturbed. As t increases, the perturbation propagates also to the other grid points x i , with a time scale determined by the leading order term given in Eq. 22. Note that the perturbation propagates more efficiently towards the right (i > k) than towards left (i < k), since for each dynamical variable x k−q at the left of x k with leading term t q there are two dynamical variables x k+2q−3 and x k+2q (for q > 1) at its right with the same leading term.
We can have a clearer intuition of the actual propagation of the signal in space and in time by looking at the cartoon in Fig. 2. It is remarkable that the site k + 3 or k + 5 are affected by the forcing later than the sites k + 4 or k + 6, respectively. This further indicates that the advection taking place in the L96 system is a non-standard one.
The high-frequency asymptotic behaviour of the susceptibilities corresponding to the response functions given in Eq. 22 can be derived using Eq. 9.

Hierarchy of the dynamical variables
We want now to build a hierarchy of the dynamical variables in terms of their predictive power of the other dynamical variables. In particular, this hierarchy is closely related to how the perturbation signal propagates in the system: the sooner a dynamical variables feels the perturbation and the more predictive it will be. We construct the surrogate response function by using Eq. 8 and by then applying the cutoff introduced in Eq. 12. We define loses its intensity as time goes by. We can notice that there are more red-coloured variables above the site x k than below. This is consistent with the fact that the information propagates from the dynamical variables x k+j with j < 0 to the dynamical variables x k+j with j > 0, with a velocity given by the group velocity v g , shown with a green arrow in the figure. as H i,j the surrogate response function that allows one to reconstruct the variable i using the variable j as surrogate forcing. We indicate with K i,j (S i,j ) its non-singular (singular) surrogate response functions shown in Fig. 3 is entirely negligible. The second most predictive variables are x k−1 and x k+2 . A few surrogate response functions employing them as predictors are shown in Fig. 4 and Fig. 5. We can observe that these surrogate response functions show a rather small non-causal component. Notice that just the non-singular components Eq. 11 are shown in the figures, for clarity purposes. The third most predictive variables between the ones we have considered are x k−2 and x k+1 . In Fig. 6 we show the non-singular parts of the related surrogate response functions K k−2,k+1 and K k+1,k−2 . We can notice that their non-causal components are more relevant than the ones considered before, because the information retained by the predictors is degraded.
We can better quantify the importance of the non causal component of the surrogate response functions through the PI introduced in Eq. 15. The results are presented in Table   I for the surrogate response functions between the dynamical variables x k−2 , x k−1 , x k+1 and x k+2 . Looking at the values provided by the PI, we observe that the weight of the non causal part is bigger when the predictors are x k−2 and x k+1 . This is in agreement with the hierarchy described above.
Another aspect we wish to analyse is whether one can define a preferential direction for performing the prediction. If we take two variables x i and x j , we can ask ourselves whether it is better to use x i to predict x j or the other way around. Of course, this issue makes sense in the case the two variables x i and x j have the same rank (otherwise we would just use the higher ranked variable as a predictor). This is the case of x k−1 and x k+2 and of x k+2 0.0068 0.0033 0.014 · above. We perturb the L96 system with the vector field having spatial pattern G(x) = δ i,k as above and having the following time pattern: with τ = 5. This time pattern seems relevant because corresponds to the act of switching abruptly on and off the perturbation, and keeping it active for a time scale that is longer than the inverse of the first Lyapunov exponent of the system (about 0.6 time units). Therefore, it allows to appreciate both transient and longer term features of the response of the system.
We then test the skill of the variable x k in predicting x j , with j ∈ {k − 2, k − 1, k + 1, k + 2}: where we use only the causal component of the surrogate response function. The predictions are shown in Fig. 7, where we can notice that the agreement between the actual response and the prediction made through the surrogate response functions where x k is the predictor is very good in all cases.
We now consider the second most predictive dynamical variables x k−1 and x k+2 . The predictions are shown in Fig. 8. We can notice that these predictors work quite well: despite not being directly perturbed by the forcing, they retain a lot of information. We also remark that some discrepancy between prediction and the actual response emerges in terms of mismatch of the oscillations taking place during the plateau of the forcing.
Lastly, we take into account the predictions made using the variables x k−2 and x k+1 .
As we can see in Fig. 9, the predictions are clearly less successful than in previous cases, even though they show a qualitative agreement with the actual responses. This is explained by the greater relevance of the non-causal components of the related surrogate response functions H k−2,k+1 and H k+1,k−2 , as it can be seen in Fig. 6; see also Tab. I. Remarkably, be comparing the two panels of Fig. 9, we can clearly see the asymmetry between the mutual predictive power of x k−2 (better) and x k+1 (worse) discussed above.

C. Making predictions with more observables
We have shown above that some local variables cannot well predict other local variables, as a result of how the perturbation signal propagates across the system. Following the discussion in Sect. II, we test whether this can be improved by applying two independent forcings and, correspondingly, using two predictors. The idea is that by adding a forcing and a predictor we are able to learn more about the properties of the system and of its response. We then add, on top of the perturbation described by Eq. 19 with time pattern given by Eq. 23, a second forcing that impacts the viscosity of the system acting at the k th grid point : The forcing is applied using the temporal pattern where τ 2 = 3.
The corresponding response functions Γ i,k for i = {k − 2, k − 1, k, k + 1, k + 2} obtained for M = 2 · 10 6 and = 0.1 are shown in Fig. 10. We have tested the linearity of the Comparison between the response of the perturbed L96 system to perturbation with spatial pattern Eq. 19 and time pattern Eq. 23 with τ = 5 and the predictions made using the surrogate response functions H i,k for i ∈ {k − 2, k − 1, k + 1, k + 2}. response of the system for values of 2 ranging from 0.01 up to 2 and found that nonlinear corrections are negligible for 2 ≤ 0.25; see Appendix B. Note that here we need to consider smaller values of 2 compared to the case of the forcing G (1) because of the presence of the factor x i (|x i | is typically larger than one in the unperturbed runs). We then perform the data analysis for the case of the combined forcings G (1) and G (2) using 2 = 0.1.
As shown in Eq. 5 for N = 2, it is then necessary to choose a second observable as predictor. We choose the following global observable: Note that Ψ 1 is usually interpreted as the mean momentum of the system [16]. The corresponding response function Γ Ψ 1 ,k (t) (Γ Ψ 1 ,k (t)) to the forcing defined in Eq. 19 (Eq. 25) is portrayed in Fig. 1 (Fig. 10). By symmetry, Γ Ψ 1 ,k (t) and Γ (2) Ψ 1 ,k (t) do not depend on k. This choice is motivated by the fact that we wish to simulate the situation where a local observer, e.g. x k+1 , uses information on its own local state and on global properties of the system to predict the state of another local observer, e.g. x k−2 . We perform the prediction using the following formula: These surrogate response functions have in general different poles than the ones previously defined in the case of just one forcing, see discussion in Appendix D. In Fig. 11   i,k for i = {k − 2, k − 1, k, k + 1, k + 2} and = 0.1. We also portray Γ (2) Ψ 1 ,k , the response function for the mean momentum Ψ 1 (Eq. 26). See also Fig. 15. tude of the surrogate response function associated to x k−2 is greatly reduced, implying that most of the information on x k−2 is drawn from Ψ 1 in the case analysed here. The improvement in our ability to predict x k−2 is summarised in Table II. By comparing the dashed lines in Fig. 12 and in Fig. 9, one notices that adding the second forcing given in Eq. 25 has little effect on the actual response of x k−2 ; indeed, the contribution to the response is smaller by a factor O(10 −2 ) (not shown) with respect to what coming from the forcing given in Eq. 19. But, instead, our ability to predict using surrogate response functions increases substantially when we add Ψ 1 as predictor, even if such observable has little information on the local properties of the system. This is due to the fact that adding a second perturbation to the system and a second observable as predictor regularises our problem. Indeed, a greater predictive skill is obtained even if we perform simulations with smaller values of 2 than what reported above (not shown).  and PI for the surrogate response functions H k−2,Ψ 1 and H k−2,Ψ 1 when two forcings G (1) and G (2) are applied.

IV. CONCLUSIONS
In this paper we have explored the possibility of using, in the linear regime, an observable of a perturbed system as predictor of the response of another observable of the same system perturbed by a forcing. While specific conditions need to be met to be able to perform an actual prediction, it is always possible to reconstruct a posteriori the desired signal. The procedure requires gathering first some information on the relationship between the response of the two observables . Such a knowledge can be obtained by looking at the linear response of the two observables to the forcing undergoing a broadband temporal modulation (e.g. a kick in the form of a Dirac's delta). Then, the approach allows to use the response of one observable to reconstruct and, when possible, predict, the response of the second observable for any temporal pattern of modulation of the forcing. Hence, the first observable is used as a surrogate for the external forcing. The theory clarifies that the ability of two observable to predict each other is not the same, and allows for the treatment of N independent forcings to the system.
where K nc (t) 1 = |B|/b, K c (t) 1 = |A|/a, and S(t) 1 = |C|. Clearly, its value depends on the interplay of the time scales of the causal (1/a) and non-causal (1/b) components and of the magnitude of the three components A, B, and C; see Fig. 13 for a graphical representation of the contributions of the various terms. When performing an analysis based on LRT, it is clearly essential to make sure that we are indeed studying the system in a regime where higher order terms in the perturbative expansion accounting for the full response are negligible. Figure 14 portrays the estimates for various response functions Γ i,j (t) obtained using a range of values for the intensity of the perturbation applied according to Eq. 19. In all cases we have followed the procedure described in Sect. III B. The = 1 curves are also reported in Fig. 1. Figure 15 provides the corresponding information on the range of linearity of the response of the system to the perturbation applied according to Eq. 25.

Appendix C: Asymptotic Properties of the Response in the Lorenz '96 model
It is possible to obtain the behaviour of the response functions Γ i,k for t → 0 + in the L96 system as follows. Let's consider the response function Γ i,k in our case, with space pattern G(x) = δ ik and perturbed observable x i : where ∂ k stands for the derivation with respect to x k (0) and ρ 0 is the steady-state distribution over the initial condition x(0) from which the trajectory x(t) starts. We now perform a Taylor expansion of x i (t) at t = 0 + and we exploit the evolution equation Eq. 18: where C (a) p is the coefficient related to the dynamical variable p of the term of order a in the expansion above (we have that C (0) k = δ ik ). The coefficients for the linear terms (k = 1) give: Considering that ρ 0 (dx) x i = ρ 0 (dx) x j ∀i, j = 1, . . . , N because of the discrete symmetry of the system in the unperturbed state, we have the linear term in the expansion of Γ k+1,k vanishes. . We can repeat the same derivation for all the grid points of the system, taking into account that the leading term of Γ k+2q−1,k is t q+1 instead of t q . We obtain: where the averages are over the stationary measure ρ 0 .
Appendix D: Singular components of the Surrogate Response

Case of one forcing
We consider the perturbation with spatial pattern given by Eq. 19 and time pattern Eq.
We then consider the surrogate response functions H k−2,k+1 and H k+1,k−2 . We proceed as above. We look at the asymptotic behaviour of the corresponding Fourier transforms: where C (2) k−2 0 and C (2) k+1 0 can be computed directly from short-time behaviour of the response functions Γ k−2,k and Γ k+1,k : We derive the non-singular components K k−2,k+1 by performing the inverse Fourier transform of J k−2,k+1 (ω) − C