The maximum entropy principle applied to a dynamical system proposed by Lorenz

Lorenz has proposed a dynamical system in two versions (I and II) that have both proved very useful as benchmark systems in geophysical fluid dynamics. In version I of the system, used in predictability and data-assimilation studies, the system’s state vector is a periodic array of large-scale variables that represents an atmospheric field on a latitude circle. The system is driven by a constant forcing, is linearly damped and has a simple form of advection that causes the system to behave chaotically if the forcing is large enough. The present paper sets out to obtain the statistical properties of version I of Lorenz’ system by applying the principle of maximum entropy. The principle of maximum entropy asserts that the system’s probability density function should have maximal information entropy, constrained by information on the system’s dynamics such as its average energy. Assuming that the system is in a statistically stationary state, the entropy is maximized using the system’s average energy and zero averages of the first- and higher-order time-derivatives of the energy as constraints. It will be shown that the combination of the energy and its first-order time-derivative leads to a rather accurate description of the marginal probability density function of individual variables. If the average second-order time-derivative of the energy is used as well, the correlations between the variables are reproduced too. By leaving out the constraint on the average energy – so that no information is used other than statistical stationarity – it is shown that the principle of maximum entropy still yields acceptable results for moderate values of the forcing.


Introduction
The principle of maximum entropy guarantees that the probability density function of a system reproduces all the information given on the system but is maximally noncommittal otherwise. The principle has been put forward by Jaynes in a long series of publications, beginning with two studies on statistical mechanics ( [1,2], reprinted in [3]). Although Jaynes has emphasized that the principle of maximum entropy extends to systems out of statistical equilibrium, the number of studies that have explored this possibility is relatively small. It is the purpose of the present paper to investigate the principle of maximum entropy in the context of a forced-dissipative dynamical system, in particular version I of a system proposed by Lorenz [4]. This system has proved to be very useful in predictability and data-assimilation studies and is generally considered as a benchmark system in geophysical fluid dynamics. As in statistical mechanics, we wish to predict the statistics of the system's variables (the 'microscopic' variables) on the basis of a few global averages that characterize the overall state of the system (the corresponding 'macroscopic' variables).
It will be assumed that the system has been left to itself long enough that the statistics has become stationary. Ideally, the statistics is then given by a probability density function that is a stationary solution of the Liouville equation, i.e., an invariant measure. For systems whose flow in phase space is nondivergent, such as the unforced and undamped systems of Hamiltonian mechanics, any nonnegative, normalized and sufficiently smooth function of the system's conserved quantities is an invariant measure. As shown in reference [1], if the measure is required to reproduce a given value of the system's average energy, the principle of maximum entropy selects a unique member out of this infinity of possibilities: Gibbs' canonical ensemble. For systems that are forced and damped the flow in phase space is generally divergent, so that invariant measures are much more difficult to obtain. Closest to the ideal invariant measures are the Sinai-Ruelle-Bowen (SRB) measures discussed by Ruelle [5]. However, constructing these measures requires sophisticated mathematical techniques and may not be possible for every dynamical system. In the present paper we propose to deal with the invariance of the probability density function in a limited but pragmatic way: by using zero average values of the first and possibly higher-order time-derivatives of the energy as additional constraints in the maximization of entropy. The probability density functions thus obtained are at best approximately invariant but, as will be shown, give quite satisfactory descriptions of the system's statistics.
An approach that can be regarded as an early application of the principle of maximum entropy – in the pragmatic spirit outlined above – has been pioneered by Burgers [6]. In a series of publications, reprinted in Nieuwstadt and Steketee [7], Burgers endeavoured to apply the techniques of statistical mechanics to the problem of turbulent flow. He expanded the flow field in terms of Fourier modes, introduced a phase space spanned by the corresponding Fourier components, divided the phase space into small compartments and searched for a statistical configuration that can be realized microscopically in the largest number of ways. Instead of prescribing the average energy – which would have led to the canonical ensemble of equilibrium statistical mechanics – he required that the forcing and dissipation should balance on average. Assuming that the damping is linear, so that the dissipation of the energy is quadratic in the variables, he obtained the remarkable result that not the energy but the dissipation of the energy is equipartitioned among the variables.
An important problem with Burgers' method is that the phase space of a fluid dynamical system is infinite dimensional. Depending on whether energy or a balance between forcing and dissipation is used as a constraint, as a result of equipartition either the energy or the dissipation diverges. Burgers proposed several solutions to this problem but none of these completely satisfied him and at some stage he abandoned the subject. Later, Onsager [8] also took up the challenge of applying statistical mechanics to turbulence; a detailed review of the scientific context in which this took place is given by Eyink and Sreenivasan [9]. Concerning two-dimensional turbulence, Onsager decided to approximate the vorticity field by a finite set of point vortices. This leads to a finite-dimensional Hamiltonian system of ordinary differential equations to which the formalism of equilibrium statistical mechanics can be applied without qualifications. It has led to many interesting results such as the formation of large vortices in two-dimensional flows, see the review by Kraichnan and Montgomery [10]. A more recent review of the statistical mechanics of two-dimensional flows, including many new developments since 1980, is given by Bouchet and Venaille [11]. In the latter review due attention is given to the work of Miller [12], Miller et al. [13] and Robert and Sommeria ([14,15]) who developed statistical mechanical methods to deal with the infinite dimensionality of fluid systems.
All these approaches assume, however, that forcing and dissipation can be ignored as a first approximation. The question whether this is a valid procedure in studying the long-time behaviour of even slightly dissipative fluid systems is still a fundamental one (see Frisch [16], p. 245). The advantage of Burgers' approach is that forcing and dissipation are incorporated from the start. In fact, they are essential ingredients of the formulation, a property that it shares with the inertial range theory of turbulence developed by Kraichnan [17] and Batchelor [18]. The study of Verkley and Lynch [19], in which Jaynes' principle of maximum entropy is applied to a two-dimensional forced-dissipative model of the atmosphere, can be seen as a revisit of Burgers' approach. Representing the flow fields in terms of basis functions (spherical harmonics) and then truncating the representation, a finite-dimensional model was formulated. By staying within this finite-dimensional model the problem of infinite dissipation was avoided. As constraints in the maximization of entropy the energy and enstrophy as well as their decay rates (assumed zero in the statistically stationary state) were used. Spectra and mean flows were obtained from the resulting probability density function. When the two types of constraints were combined, the agreement with spectra and mean flows from numerical simulations was reasonably good.
The principle of maximum entropy was also used by Verkley [20] to formulate a parametrization of the average effect of the small-scale variables on the large-scale variables in version II of the model devised by Lorenz [4]. The large-scale variables in this version represent a meteorological field at equidistant points on a latitude circle. They are subject to a simple constant forcing, a linear damping and nonlinear advection. Coupled to each large-scale variable is a group of small-scale variables that obey similar dynamics and represent the effects of motions beyond the resolution of the large-scale variables. The coupling goes in both directions; the small-scale variables are being forced by the large-scale variables and the large-scale variables are being forced by the small-scale variables. To parametrize the latter effect, the maximum entropy principle was applied to the statistics of the small-scale variables, assuming that their statistics is stationary on the time-scale of the large-scale variables. The small-scale entropy was maximized under the constraint that the average time derivative of the small-scale variables' energy is zero. The resulting probability density function was then used to calculate the average effect of the small-scale variables on the large-scale variables, yielding a simple linear damping not unlike the damping that would have resulted from an empirical approach.
In all cases mentioned, the probability density function is a product of independent normal distributions in the different variables. For an individual variable this result compares reasonably well with numerically obtained probability density functions, both in terms of averages and in terms of variances. The predicted independence of the variables is, however, usually not in accord with numerical simulations. Now, it can be understood easily that the principle of maximum entropy will not lead to covariances between the variables if the mathematical expressions of the constraints do not involve cross-terms between the variables. Cross-terms are usually present in higher order time-derivatives of the energy and covariances will therefore result from the maximization of entropy if these are used as constraints. In the context of version I of the model devised by Lorenz [4], i.e., the version without small-scale variables, we will study the consequences of using the second-order time-derivative of the energy as an additional constraint in the maximization of entropy. The present work aims at showing that this leads to quite realistic covariances between the variables.
In the next section we give a short summary of the maximum entropy principle, in a setting that is valid for any finite-dimensional dynamical system. We then, in Section 3, introduce version I of the system proposed by Lorenz [4]. As an introduction to the dynamics of this system, we discuss the stability properties of the steady state in which all variables are equal to the forcing. We then show the results of a numerical simulation in a case for which the system behaves chaotically. This case will be used as a numerical reference in Section 4. In that section we apply the maximum entropy principle, first using as a constraint a given value of the average energy, then adding a zero average first-order time-derivative of the energy and, finally, adding a zero average second-order time-derivative of the energy. In the latter two cases we also study the consequences of deleting the energy constraint. The analytical and numerical results are placed in the perspective of smaller and larger values of the forcing in Section 5. The conclusions and an outlook are given in Section 6. Some of the mathematical detail is deferred to the Appendices.

The maximum entropy principle
We consider a dynamical system described by a point X = (X_1, X_2, . . . , X_K) in a K-dimensional phase space. If we are uncertain as to the exact state of the system, then it is appropriate to represent the system by means of a probability density function P(X). The amount of uncertainty is quantified by the information entropy S_I, defined as:

S_I = −∫ P(X) ln [P(X)/M(X)] dX.

Here the integral is over the whole phase space, dX denoting a K-dimensional volume element dX_1 dX_2 . . . dX_K. The measure M(X) embodies any a priori information that we have on the system. Its presence implies that the argument of the natural logarithm is dimensionless and ensures furthermore, as emphasized by Jaynes [21], that the information entropy S_I is independent of a coordinate transformation of X. Such a coordinate transformation could result from a deterministic time-evolution of the dynamical system, implying that the information entropy S_I is independent of time in that case. The principle of maximum entropy – see Jaynes [22] and Pressé et al. [23] for general introductions – states that the information entropy should be maximal under the constraints of normalization and any other information that is available. The normalization constraint reads:

∫ P(X) dX = 1.

The other information will be assumed to have the form of given averages

⟨K_l⟩ = ∫ K_l(X) P(X) dX = K_l^0,  l = 1, 2, . . . , L,   (1)

where K_l(X) is a function of the state vector X, the number of averages being L. The most important example is the energy E(X), but in this paper we will pay particular attention to dE(X)/dt and d²E(X)/dt². We note that the constraints generally leave so much room for the probability density function that maximization of the information entropy S_I selects a member from a set that is usually infinitely large. The member that is selected, however, is special in that it reproduces the information on the system's dynamics and, at the same time, is as noncommittal as possible, i.e., has no more structure than the information justifies.
Using a set of Lagrange multipliers λ = (λ_1, λ_2, . . . , λ_L) to take into account the constraints above, the maximization of S_I leads to (see Verkley and Lynch [19], their Eq. (12))

P(X) = M(X) Z(λ)^{−1} exp[−Σ_{l=1}^L λ_l K_l(X)],   (2)

where the partition function Z(λ) is the normalization factor of the probability density function, given by:

Z(λ) = ∫ M(X) exp[−Σ_{l=1}^L λ_l K_l(X)] dX.   (3)

It can be seen from the expression above that the maximum entropy probability density function reduces to the a priori measure if there are no additional constraints of the form (1); in this case all Lagrange multipliers are zero. The Lagrange multipliers must be obtained from the values of the averages, i.e., from K_l^0, l = 1, 2, . . . , L. An important relation that can be used to relate the average values of K_l to the Lagrange multipliers λ_l is the following:

⟨K_l⟩ = −∂ ln Z(λ)/∂λ_l.   (4)

This identity can be derived straightforwardly from (3).
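As a small self-contained illustration (not taken from the paper), the identity relating the constrained averages to derivatives of ln Z can be checked numerically for a one-dimensional example with M(x) = 1 and a single constraint function K(x) = x²/2; the grid, its boundaries and the value of the multiplier below are arbitrary choices:

```python
import numpy as np

# One-dimensional check of <K> = -d ln Z / d lambda for K(x) = x^2 / 2,
# on a grid wide enough that the tails of exp(-lambda K) are negligible.
x = np.linspace(-20.0, 20.0, 200001)
dx = x[1] - x[0]
K = 0.5 * x**2

def integrate(y):
    # simple quadrature on the uniform grid (endpoint values are ~0 here)
    return np.sum(y) * dx

def lnZ(lam):
    # partition function Z(lambda) = integral of exp(-lambda K(x)) dx
    return np.log(integrate(np.exp(-lam * K)))

lam = 0.7
w = np.exp(-lam * K)
K_avg = integrate(K * w) / integrate(w)            # constrained average <K>
h = 1e-5
dlnZ = (lnZ(lam + h) - lnZ(lam - h)) / (2.0 * h)   # centered finite difference
print(K_avg, -dlnZ)  # both equal 1/(2 lambda), i.e. ~0.7143 for lambda = 0.7
```

For this Gaussian example the exact value is ⟨K⟩ = 1/(2λ), so the quadrature and the finite difference of ln Z must agree with each other and with 1/(2λ).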

The system
The system to be studied is version I of a model that Lorenz [4] devised to investigate problems in atmospheric predictability, data-assimilation and parametrization. For example, Lorenz and Emanuel [24] used this model to study observation strategies in data-assimilation. In the present paper we will use the same model to study the formalism of maximum entropy as a means of obtaining the system's statistics without full recourse to numerical integrations. The model state X = (X_1, X_2, . . . , X_K) of the system represents a certain atmospheric field at K equidistant points on a latitude circle. The system is assumed to be periodic so that X_{k+K} = X_{k−K} = X_k for all k, see the schematic in Figure 1. The equations that govern the system are:

dX_k/dt = (X_{k+1} − X_{k−2}) X_{k−1} − X_k + F,   (5)

where k = 1, . . . , K and K ≥ 4. The quadratic term in this equation represents advection, the linear term represents damping and the term F, assumed to be independent of k, is the forcing. In order to allow for chaotic behaviour, the dimension of the state space should be at least 4. The variables have been nondimensionalized in such a way that the coefficients in front of the advection and linear terms are 1; a nondimensional unit of time corresponds to 5 days. The most important global quantity that characterizes the state of the system is the energy E, defined by:

E = (1/2) Σ_{k=1}^K X_k².   (6)

It can be checked that, due to the periodicity conditions, the quadratic term in the equations drops out if we calculate the first-order time-derivative of E. We thus have:

dE/dt = Σ_k X_k dX_k/dt = Σ_k X_k (F − X_k) = −2E + F Σ_k X_k.   (7)

For the second-order time-derivative of the energy we find, in a similar fashion,

d²E/dt² = Σ_k (F − 2X_k) dX_k/dt
        = F Σ_k (X_{k+1} − X_{k−2}) X_{k−1} − 2 Σ_k X_k (X_{k+1} − X_{k−2}) X_{k−1} + 2 Σ_k X_k² − 3F Σ_k X_k + KF²
        = F Σ_k X_k (X_{k+2} − X_{k+1}) + 4E − 3F Σ_k X_k + KF².   (8)

Here we used the periodicity conditions to advance from the second to the third line in the expression above. At this point it might be useful to note that neither Lorenz [4] nor Lorenz and Emanuel [24] attribute a specific physical meaning to the variables X_k. Instead, they remark in the latter reference (p.
400) that the model is formulated as "one of the simplest possible systems that treats all variables alike and shares certain properties with many atmospheric models, namely, (1) the nonlinear terms, intended to simulate advection, are quadratic and together conserve the total energy [. . . ]; (2) the linear terms, representing mechanical or thermal dissipation, decrease the total energy; (3) the constant terms, representing external forcing, prevent the total energy from decaying to zero". Be that as it may, it is to the credit of the system that, without forcing and damping, it conserves energy, is time-reversal invariant and is characterized by a nondivergent flow in phase space. As a result, the statistics of the unforced undamped version of the system is well described by equilibrium statistical mechanics, as shown at the end of Section 4.1.
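The two cancellations just described are easy to verify numerically. The sketch below (assuming numpy; the random state is an arbitrary choice) implements the model's right-hand side with periodic indexing via np.roll and checks, at a random point of phase space, that the quadratic advection term drops out of dE/dt and that the cubic terms likewise cancel in d²E/dt²:

```python
import numpy as np

def tendency(X, F):
    # dX_k/dt = (X_{k+1} - X_{k-2}) X_{k-1} - X_k + F; np.roll implements the
    # periodic indexing: np.roll(X, -1)[k] = X_{k+1}, np.roll(X, 2)[k] = X_{k-2}
    return (np.roll(X, -1) - np.roll(X, 2)) * np.roll(X, 1) - X + F

rng = np.random.default_rng(0)
K, F = 36, 10.0
X = rng.normal(size=K)
E = 0.5 * np.sum(X**2)

# dE/dt from the tendencies equals -2E + F sum_k X_k: advection drops out
dEdt = np.sum(X * tendency(X, F))
print(dEdt, -2.0 * E + F * np.sum(X))

# d2E/dt2 = sum_k (F - 2 X_k) dX_k/dt; the cubic terms cancel by periodicity,
# leaving F sum_k X_k (X_{k+2} - X_{k+1}) + 4E - 3F sum_k X_k + K F^2
d2Edt2 = np.sum((F - 2.0 * X) * tendency(X, F))
rhs = (F * np.sum(X * (np.roll(X, -2) - np.roll(X, -1)))
       + 4.0 * E - 3.0 * F * np.sum(X) + K * F**2)
print(d2Edt2, rhs)
```

Both pairs of numbers agree to machine precision, for any state vector and any forcing.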
A rather detailed analysis of the system's dynamics in dependence of F, for K = 4, K = 8 and K = 40, is presented by Orrell and Smith [25] using a special type of bifurcation diagram. We will discuss in some detail the starting point of any analysis of such kind, namely the state X_s of which all components are equal to F, i.e., X_{sk} = F, k = 1, . . . , K. This is a stationary solution (steady state) of the system (5). If we denote a small perturbation of X_{sk} by x_k and linearize the system (5) around this state, we see that the dynamics of these perturbations is governed by (see Lorenz and Emanuel [24], their Eq. (4))

dx_k/dt = F (x_{k+1} − x_{k−2}) − x_k.   (9)

Introducing a matrix A with matrix elements A_kl, this can be written as:

dx/dt = A x,  A_kl = F (δ_{l,k+1} − δ_{l,k−2}) − δ_{kl},   (10)

where it is to be noted that the system is cyclic so that the indices k + K and k − K denote the same variable as the index k. If all eigenvalues of the matrix A have a real part that is negative, then all solutions of the linearized system decay exponentially in time and the steady state is linearly stable. If one or more eigenvalues of the matrix A have a positive real part, then the steady state is linearly unstable. The special case that all eigenvalues have a negative real part, except for one or more of which the real part is zero, is characterized as neutrally stable. If we write out the matrix A, we see that each row has the same elements as the row above it but shifted one position to the right, elements disappearing on the right reappearing on the left due to the periodicity conditions. Such matrices are called circulant; the theory of these matrices can be found in Davis [26], pp. 72-73. The matrix W of normalized eigenvectors, the columns of which denote the different eigenvectors, has elements given by:

W_lm = K^{−1/2} exp(i φ_lm),  where  φ_lm = 2π (l − 1)(m − 1)/K.   (11)

Note that the structure of these eigenvectors is independent of the details of the matrix A but only depends on the number of variables K.
The eigenvalues χ_m do depend on the details of the matrix A and are given by:

χ_m = Σ_{l=1}^K A_{1l} exp(i φ_lm).   (12)

Using (9) and the periodicity conditions we find:

χ_m = −(1 + F z_m) + i F q_m,  with  z_m = cos(2θ_m) − cos(θ_m),  q_m = sin(2θ_m) + sin(θ_m),  θ_m = 2π(m − 1)/K.   (13)

From these expressions it may be concluded that the steady state is stable if, for all values of m, the quantity 1 + F z_m is positive. Otherwise, the steady state is either unstable or neutrally stable. It can be checked straightforwardly that W_{l,K+2−m} = W*_{lm} and χ_{K+2−m} = χ*_m. This implies that real-valued normal modes can be constructed by linearly combining eigenvectors with indices m and K + 2 − m.
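The circulant eigenvalue formula can be checked directly against numpy's general eigenvalue routine; K and F below are arbitrary choices, and the comparison matches each predicted eigenvalue to its nearest computed one:

```python
import numpy as np

K, F = 36, 10.0
I = np.eye(K)
# A_kl = F (delta_{l,k+1} - delta_{l,k-2}) - delta_{kl}: rolling the columns
# of the identity matrix produces the required circulant shift matrices
A = F * (np.roll(I, 1, axis=1) - np.roll(I, -2, axis=1)) - I

# predicted spectrum: chi_m = -(1 + F z_m) + i F q_m
theta = 2.0 * np.pi * np.arange(K) / K
z = np.cos(2.0 * theta) - np.cos(theta)
q = np.sin(2.0 * theta) + np.sin(theta)
chi = -(1.0 + F * z) + 1j * F * q

eig = np.linalg.eigvals(A)
err = max(np.min(np.abs(eig - c)) for c in chi)
print(err)  # at the level of machine precision
```

Because A is circulant (and hence normal), its eigenvalues are well conditioned and the agreement is essentially exact.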
Plots of z_m and q_m, for K = 4 (squares) and K = 36 (circles), are shown in the upper and lower panels of Figure 2, respectively. The solid curves are the functions z(x) = cos(2x) − cos(x) (upper panel) and q(x) = sin(2x) + sin(x) (lower panel), with x expressed in degrees. The function z(x) has a minimum of −9/8 and a maximum of 2, as can be shown easily by differentiation. The minimum value of z_m is thus close to −9/8, while the maximum of z_m is close to 2; the larger the number of nodes K, the closer the minimum is to −9/8 and the maximum to 2.
Denoting the minimum (negative) value by z_min and the maximum (positive) value by z_max, it follows that the steady state is stable as long as the forcing F satisfies

−1/z_max < F < −1/z_min.   (14)

When the forcing F is outside this range, the steady state is either unstable or neutrally stable. For values of F slightly exceeding the maximum allowed value the flow is time-dependent and dominated by a westward traveling wave whose structure closely resembles the first growing mode, i.e., the eigenvector whose wavenumber is given by m − 1, with m the value for which z_m = z_min. If F increases further the amplitude of the wave becomes larger and the dominant wavenumber decreases. However, with increasing F the dominance of waves gives way to increasingly chaotic behaviour, see also Orrell and Smith [25]. Analogous statements are valid for negative values of F. In order to explore the nature of the system's dynamics outside the range of forcing values for which the steady state is stable, we integrated the system (5) numerically for the values F = 2.5, 5, 10 and 20 for a period of 20 000 time units. We used a fourth-order Runge-Kutta time integration scheme with a time step of 0.001 and sampled the model's state every 0.1 time unit. We took K = 36, as in Lorenz [4], and started the integration with all variables X_k equal to 0, except for X_1 which was set to 0.1. The evolution of the energy (6) of the system over the first 1000 time units is shown in Figure 3. The graphs show that in all four cases the energy quickly reaches a well-defined equilibrium value over this period of time.

Table 1. Overview of the results of the numerical integrations (first set of four rows, collectively denoted by 'Num. sim.') and the results of the maximum entropy analysis (next five sets of four rows, collectively denoted by 'Maxent', with the equation numbers referring to the constraints used).
The different columns give the values of the forcing F, the average energy E_0 = ⟨E⟩, the average μ_k = γ and standard deviation C_kk^{1/2} of an individual variable X_k, as well as the parameters σ and ϕ. Values that are underlined have been used as input of the maximum entropy analysis.
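The stability range can be probed with a simple time integrator. The sketch below (assuming numpy) uses a classical fourth-order Runge-Kutta scheme as in the text, but with a coarser time step of 0.01 rather than 0.001 to keep the run short; with z_max = 2 and z_min close to −9/8, a forcing F = 0.5 lies inside the stability range while F = 2.5 lies outside it:

```python
import numpy as np

def tendency(X, F):
    return (np.roll(X, -1) - np.roll(X, 2)) * np.roll(X, 1) - X + F

def rk4_step(X, F, dt):
    # one step of the classical fourth-order Runge-Kutta scheme
    k1 = tendency(X, F)
    k2 = tendency(X + 0.5 * dt * k1, F)
    k3 = tendency(X + 0.5 * dt * k2, F)
    k4 = tendency(X + dt * k3, F)
    return X + (dt / 6.0) * (k1 + 2.0 * k2 + 2.0 * k3 + k4)

def deviation_after(F, T=50.0, dt=0.01, K=36):
    # perturb the steady state X_k = F and measure the deviation at time T
    X = np.full(K, F)
    X[0] += 0.1
    for _ in range(int(T / dt)):
        X = rk4_step(X, F, dt)
    return np.max(np.abs(X - F))

dev_stable = deviation_after(0.5)    # F inside the range: perturbation decays
dev_unstable = deviation_after(2.5)  # F outside the range: it does not
print(dev_stable, dev_unstable)
```

For F = 0.5 the slowest eigenvalue has real part about −0.44, so after 50 time units the perturbation has decayed to a negligible level; for F = 2.5 it grows and saturates at a finite amplitude.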
We are interested in the statistics of an individual variable X_k as well as in the correlations C_kl between the variable X_k and any of the other variables X_l. The statistics of an individual variable can be described by giving its full marginal probability density function or, more compactly but less completely, by giving its mean μ_k and variance C_kk. We note that, due to the system's symmetry, the mean μ_k is independent of the index k and the covariance C_kl depends only on the difference between k and l. All numerical averages are calculated straightforwardly by adding the sampled values of the relevant quantity and dividing by the total number of samples, i.e., by 200 001. Marginal probability density functions are obtained from histograms of X_k: an appropriate interval of X_k is divided into 100 equal-sized bins, the number of samples within each bin is counted and the result is divided by the total number of samples and the bin size. The numerical results corresponding to the four values of F are given in the first set of rows of Table 1. We will discuss in more detail the case F = 10, for which the numerical average energy E_0 is 465.2, or 12.92 for E_0/K, i.e., for the energy per degree of freedom. The outcomes of this numerical integration will be used in the next section as an illustration of the results to be discussed there. In the first graph (panel a) of Figure 4 we show a snapshot of the system at t = 1000, i.e., at the end of the integration window of Figure 3. Due to the fact that the forcing F is positive, the average value of an individual variable X_k is positive as well. We see from this snapshot that a wave pattern is present, with a wavenumber around 10. This wave behaves very erratically, although a moderate westward motion can be discerned. The second panel of Figure 4 shows the numerical marginal probability density function, based on a histogram over the interval [−30, 30], of the values of X_k, k = 1.
We see that, for F = 10, the form is not unlike a normal distribution and is characterized by an average of 2.586 and a standard deviation of 4.385, see Table 1.
As the presence of a moving wave pattern already indicates, the variables X_k are correlated. This is confirmed by the last graph (panel c) of Figure 4 in which we show the covariance C_kl for k = 19 as a function of l. The choice of k = 19 is motivated by the wish to have the maximum (equal to the variance or squared standard deviation) in the middle of the figure but is otherwise arbitrary due to the system's symmetry. The covariance can be described as a damped oscillation away from its reference point k = 19. In the next section we will investigate to what extent the statistics, as revealed by the marginal probability density function of X_k and the covariance function C_kl, can be reproduced from the maximum entropy principle.
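The statistics just described can be reproduced at a reduced scale without the full 20 000-unit run. The sketch below (assuming numpy) uses a coarser fourth-order Runge-Kutta step of 0.05 instead of 0.001, a 100-unit spin-up and 1000 time units of sampling; the mean and standard deviation come out close to the values 2.586 and 4.385 of Table 1, and the covariance as a function of the lag shows the damped oscillation of panel c:

```python
import numpy as np

def tendency(X, F):
    # dX_k/dt = (X_{k+1} - X_{k-2}) X_{k-1} - X_k + F, periodic in k
    return (np.roll(X, -1) - np.roll(X, 2)) * np.roll(X, 1) - X + F

def rk4_step(X, F, dt):
    k1 = tendency(X, F)
    k2 = tendency(X + 0.5 * dt * k1, F)
    k3 = tendency(X + 0.5 * dt * k2, F)
    k4 = tendency(X + dt * k3, F)
    return X + (dt / 6.0) * (k1 + 2.0 * k2 + 2.0 * k3 + k4)

K, F, dt = 36, 10.0, 0.05
X = np.zeros(K)
X[0] = 0.1
for _ in range(2000):            # spin-up of 100 time units
    X = rk4_step(X, F, dt)

samples = []
for _ in range(10000):           # sample every 0.1 time unit for 1000 units
    X = rk4_step(X, F, dt)
    X = rk4_step(X, F, dt)
    samples.append(X)
S = np.array(samples)

mu, sd = S.mean(), S.std()       # compare with 2.586 and 4.385 of Table 1
cov = np.array([np.mean((S - mu) * (np.roll(S, -l, axis=1) - mu))
                for l in range(K)])
print(mu, sd)                    # cov[l]: damped oscillation in the lag l
```

Because the system is chaotic, the exact trajectory differs from the paper's run, but the climatological statistics are robust to the initial condition and the coarser time step.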

Applying the maximum entropy principle
In this paper we are interested in the statistical properties of (5). We will assume that the system has been integrated numerically for a sufficiently long time so that the statistics can be considered stationary. The results discussed in the previous section support this assumption, at least for the cases considered.
The principle of maximum entropy, as outlined in Section 2, will be applied to calculate the system's probability density function P(X). We will make no a priori assumptions about the probability density function and thus use:

M(X) = 1.

In Section 4.1 we will derive an expression for P(X), using as constraints the normalization condition, written as:

∫ P(X) dX = 1,   (15)

and the condition that the average energy has a given value:

⟨E⟩ = E_0.   (16)

This is essentially a recapitulation of the approach of equilibrium statistical mechanics and leads to Gibbs' canonical ensemble. It will be investigated to what extent the statistics of the system is reproduced if the energy E_0 is taken from the corresponding numerical simulation. In Section 4.2, we will discuss the consequences of adding the constraint

⟨dE/dt⟩ = 0.   (17)

This is based on the assumption that the statistics of the system is stationary. Requiring the average time rate of change of the energy to be zero is an attempt to bring the probability density function closer to a stationary solution of the Liouville equation. To what extent this is the case is not investigated but we will see that, by adding the latter constraint, a marked improvement is achieved in the description of the system's statistics. In Section 4.3 we will study the consequences of adding the constraint

⟨d²E/dt²⟩ = 0.   (18)

This may also be justified on the basis of the system's assumed statistical stationarity and can be seen as a step further in the direction of a stationary solution of the Liouville equation. The use of this constraint will lead to additional, quite realistic, structure in the statistics in the form of correlations between the variables. In all cases considered we compare the analytical results with the numerical results for F = 10 given in the previous section. The results for the other values of the forcing are summarized in Table 1, discussed in Section 4.4.

The case ⟨E⟩ = E_0
Computing the maximum entropy probability density function P(X) for the system (5) using only the normalization condition (15) and condition (16) as constraints is straightforward. From (2) and (16) it follows immediately that:

P(X) = Z(λ_0)^{−1} exp[−λ_0 E(X)],   (19)

where λ_0 is the Lagrange multiplier associated with the constraint ⟨E⟩ = E_0 and the partition function Z(λ_0) is given by:

Z(λ_0) = ∫ exp[−λ_0 E(X)] dX.   (20)

By writing:

λ_0 E(X) = (1/(2σ²)) Σ_k X_k²,   (21)

in which we have defined (see (A.1) and (A.2) of Appendix A)

σ² = 1/λ_0,

the probability density function assumes the form:

P(X) = Z(λ_0)^{−1} exp[−(1/(2σ²)) Σ_k X_k²].

By introducing the matrix B with matrix elements B_kl = δ_kl the probability density function can be written as:

P(X) = Z(λ_0)^{−1} exp[−(1/(2σ²)) X^T B X].   (22)

Now, a general multivariate normal distribution is defined by (see [27], their Eq. (6.19.1))

N(X; μ, C) = (2π)^{−K/2} |C|^{−1/2} exp[−(1/2)(X − μ)^T C^{−1} (X − μ)],   (23)

where μ is a vector of means (μ_1, . . . , μ_K) and C is the covariance matrix with matrix elements C_kl and determinant |C|. By definition, a multivariate normal distribution has the following properties:

∫ N(X; μ, C) dX = 1,   (24a)
∫ X_k N(X; μ, C) dX = μ_k,   (24b)
∫ (X_k − μ_k)(X_l − μ_l) N(X; μ, C) dX = C_kl,   (24c)

and its entropy is given by (see [27], their Eq. (17.2.2))

S = (1/2) ln[(2πe)^K |C|].

It can thus be concluded that the sought-for probability density function is simply given by:

P(X) = N(X; μ, C),   (25)

with

μ_k = 0,  k = 1, . . . , K,   (26a)

and

C = σ² B^{−1} = σ² I.   (26b)

For the maximum entropy, denoted by S_M, we thus have:

S_M = (K/2) ln(2πe σ²),   (27)

whereas for the partition function we obtain:

Z(λ_0) = (2πσ²)^{K/2}.   (28)

The latter result follows by equating expressions (22) and (23) of the probability density function.
In the resulting probability density function (25) the value of σ, related to the Lagrange multiplier λ_0, is still unknown. A relation for σ² can be obtained by applying (4) to (28), but we prefer to follow a different route. Using (24b), (24c) and (26), one obtains:

⟨E⟩ = (1/2) Σ_k ⟨X_k²⟩ = (K/2) σ².

Combining the constraint (16) with (6) gives:

⟨E⟩ = E_0.

It thus follows that we have:

(K/2) σ² = E_0,

so that

σ² = 2E_0/K.   (29)

This completes the determination of the probability density function. If we substitute (29) into expression (27) for the entropy S_M we may calculate the entropy per degree of freedom, S_M/K = (1/2) ln(4πe E_0/K), as a function of the energy per degree of freedom E_0/K. A grey shading plot of S_M/K as a function of F and E_0/K is given in the upper panel of Figure 5. The inclusion of F is in anticipation of future plots of a similar nature, but at the moment the plot's independence of F may remind us that in the analytic treatment the forcing has not entered the stage yet. The numerical integration discussed in the previous section has F = 10 and E_0/K = 12.92 and is represented by the dot in the graph, the corresponding value of S_M/K being 3.04.
When we use the numerical average value 12.92 of E_0/K, we may calculate σ² and thus the probability density function P(X). This probability density function involves all variables X_k. The marginal probability density function of any subset of variables may be found by retaining only the means and covariances of this subset. If there is only a single variable X_k in this subset, we have:

P(X_k) = (2πσ²)^{−1/2} exp[−X_k²/(2σ²)],

i.e., a univariate normal distribution with mean μ_k = 0 and variance

C_kk = σ² = 2E_0/K.

A plot of this marginal probability density function, together with the numerically obtained probability density function of the previous section, is shown in the second panel of Figure 5. The analytical marginal probability density function has its maximum at the origin and not at a finite positive value, but the width is only mildly overestimated, see Table 1. From (26b) we conclude that the analytical covariance matrix is proportional to the identity matrix. The correlations between the variables, as exemplified by the last panel of Figure 4, are thus not reproduced.
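A quick Monte Carlo check (assuming numpy; the sample size is an arbitrary choice) confirms that drawing from this product of univariate normals with σ² = 2E_0/K returns the prescribed average energy, and reproduces the entropy per degree of freedom of 3.04 quoted above:

```python
import numpy as np

K, E0 = 36, 465.2                 # E0 as obtained from the run with F = 10
sigma2 = 2.0 * E0 / K             # variance of each variable, sigma^2 = 2 E0 / K
rng = np.random.default_rng(1)

# draws from Gibbs' canonical ensemble: independent N(0, sigma^2) variables
X = rng.normal(0.0, np.sqrt(sigma2), size=(100000, K))
E_mean = np.mean(0.5 * np.sum(X**2, axis=1))

# entropy per degree of freedom of the multivariate normal, (1/2) ln(2 pi e sigma^2)
S_per_dof = 0.5 * np.log(2.0 * np.pi * np.e * sigma2)
print(E_mean, S_per_dof)          # E_mean close to 465.2, entropy ~3.04
```

The relative sampling error of the mean energy is of order sqrt(2/K)/sqrt(N), well below a percent for this sample size.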
As remarked above, the probability density function (25) is Gibbs' canonical ensemble. This probability density function is expected to give a proper description of the statistics of the unforced and undamped system. To demonstrate that this is indeed the case, we continued the time integration discussed in the previous section for another 20 000 time units but with the forcing and damping put to zero and sampling the results over the latter period. When comparing the flow patterns of this integration with the flow patterns of the forced and damped integration, it was noticed that waves remain present and continue to behave erratically but without overall wave propagation. This suggests that correlations between the different variables will be small and, indeed, they are: not larger in amplitude than 0.2, in accord with the absence of correlations in (25). The marginal probability density function for a single variable is shown in the third panel of Figure 5, together with the analytical distribution of panel b. The match is nearly perfect. Similar results apply when this is repeated for the other integrations.

The case E = E_0, dE/dt = 0
If we maximize the entropy under the normalization condition (15) and the constraints (16) and (17), we obtain, according to (2), a probability density function in which λ_0 and λ_1 are the Lagrange multipliers associated with the constraints (16) and (17), respectively, and in which the partition function Z(λ_0, λ_1) appears as normalization. The argument of the exponential can be written in terms of σ² and γ, which are given by (A.1) and (A.2) of Appendix A. Using the matrix B introduced in the previous subsection (with matrix elements B_kl = δ_kl), the probability density function can be written compactly; if normalized, this distribution is seen to be a multivariate normal distribution of the form (23). The entropy is given by (27). The resulting probability density function is thus the multivariate normal distribution (25) with a diagonal covariance matrix given by (35b) and nonzero means given by (35a).
In obtaining the values of σ² and γ we might use (4), but we prefer to reason as follows. Under the multivariate normal distribution, the constraints (16) and (17) take a simple form. From the first condition we obtain an expression for the energy and, if we substitute this in the second condition, we get an expression (36) of γ in terms of the energy E_0 and the forcing F. The value of γ now being known, the same condition can be used to obtain σ² from (37). From σ² and γ thus obtained we may deduce the values of the Lagrange multipliers λ_0 and λ_1. These follow from (34) by dividing the expression of γ by the expression of σ², giving λ_1, and then using the expression of σ² to obtain λ_0. This completes the determination of the probability density function. Substituting expression (36) of γ into (37) we obtain (39). This result is quite remarkable, as it shows that not all values of the energy E_0 are allowed: only values of E_0 satisfying (40) lead to an acceptable solution, i.e., a solution for which σ² is nonnegative. If the expression for σ² is substituted in expression (27) of the entropy, we see that the entropy depends on the energy as well as on the forcing, and that the entropy is well-defined as long as (40) is satisfied.
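The chain (36)–(40) can be checked with a few lines of arithmetic. The closed forms used below, γ = 2E_0/(KF) for (36) and σ² = Fγ − γ² for (37), are assumptions (the equations themselves are not reproduced here), but they return the mean value 2.584 quoted in the text and respect the bound E_0/K ≤ F²/2 of (40):

```python
# Numerical check of the chain (36)-(40) for the F = 10 integration, assuming
# gamma = 2*E0/(K*F) for (36) and sigma^2 = F*gamma - gamma^2 for (37).
F = 10.0
K = 36
E0_over_K = 12.92                 # input from the numerical simulation

gamma = 2.0 * E0_over_K / F       # mean mu_k of each variable: 2.584
sigma2 = F * gamma - gamma**2     # variance C_kk of each variable

# (40): an acceptable solution (sigma^2 >= 0) requires E0/K <= F^2/2
assert sigma2 >= 0.0 and E0_over_K <= F**2 / 2.0
print(gamma, sigma2)
```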
In panel a of Figure 6 the entropy is plotted as a function of the forcing and the energy. The upper solid parabola, at which the left-hand side and the right-hand side of (40) are equal, bounds from above the region of allowed values of the energy. The lower dot in this panel, at which S_M/K = 2.90, denotes the values F = 10 and E_0/K = 12.92 and refers to the numerical integration discussed in the previous section. The marginal probability density function is (30) with μ_k = γ and C_kk = σ². This probability density function is plotted as the smooth solid curve in the second panel of Figure 6, together with the numerically obtained probability density function discussed in the previous section. The agreement between the analytical and the numerical curves is now much better, especially because the analytical mean of 2.584 (based on the numerically given value of the energy) is very close to the numerical value of 2.586, see Table 1. Another improvement is that the analytical probability density function has become somewhat narrower and has acquired a somewhat higher maximum value. The covariance matrix is diagonal, see (35b), so that the covariances as seen in panel c of Figure 4 are not reproduced. For each value of the forcing F there is a value of the energy E_0 at which the entropy is largest. This energy value would be the result of our analysis, had the energy not been used to constrain the maximization of entropy. For any value of the forcing F, we may obtain this maximum by repeating the analysis above but leaving out the constraint (16). The result can be obtained straightforwardly by putting the Lagrange multiplier λ_0 equal to zero in (38a). It follows that E_0/K = F²/4 and γ = F/2. The points in panel a of Figure 6 for which the first of these relations holds are represented by the parabola of intermediate thickness.
It is a curve halfway between zero and the maximum allowed value of E_0/K and connects the entropy maxima that pertain to given values of the forcing F. For F = 10 the value of E_0/K on this curve is 25.00 and the corresponding point is denoted by the upper dot in panel a of Figure 6. The isoline of S_M/K that passes through this point, with a value of 3.03, is displayed in the form of a thin solid curve. The marginal probability density function that corresponds to these values is shown in the third panel of Figure 6, along with the same numerical distribution as in panel b. In accordance with the fact that the energy is higher than that of the corresponding numerical simulation, the mean is higher than the numerical mean. Also the width is somewhat larger. Although the latter result is less accurate than the former, it is about as accurate as the result in the previous subsection. To the credit of the result shown in panel c of Figure 6 is the fact that no information from the numerical simulation is used; it is based exclusively on the assumption that the system has reached a statistically stationary state.
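With the same assumed closed forms as before (γ = 2E_0/(KF) and σ² = Fγ − γ²), the effect of setting λ_0 = 0 can be carried out explicitly; the resulting energy F²/4 lies halfway to the upper bound F²/2, as stated:

```python
# With the constraint on the average energy left out (lambda_0 = 0), the
# entropy maximum selects its own energy.  Assumed maximizing values:
F = 10.0
gamma = F / 2.0                # mean of each variable
E0_over_K = F**2 / 4.0         # 25.00 for F = 10, the value quoted in the text
sigma2 = F * gamma - gamma**2  # = F^2/4, same width formula as before

# Consistency: E0/K = (gamma^2 + sigma^2)/2 under a diagonal covariance
print(gamma, E0_over_K, sigma2)
```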

The case E = E_0, dE/dt = 0, d²E/dt² = 0
In this subsection we consider the case in which, apart from the normalization condition, the conditions E = E_0, dE/dt = 0 and d²E/dt² = 0 are used as constraints in the maximization of entropy. According to (2), the probability density function can be written with λ_0, λ_1 and λ_2 the Lagrange multipliers associated with the constraints and Z(λ_0, λ_1, λ_2) the partition function. The argument of the exponential can be written in terms of σ², γ and ϕ, which are given by (A.1) and (A.2) of Appendix A.
By introducing a matrix D with appropriate matrix elements, the probability density function (44) can be written compactly. After normalization, this distribution becomes a multivariate normal distribution of the form (23), with means and covariance matrix expressed in σ², γ and ϕ. The entropy and the partition function follow accordingly. This is the same general form as obtained in the cases with one and two constraints, but with a covariance matrix C that has more structure, as it is proportional to the inverse of a matrix D that is not diagonal. The matrix D is circulant, like the matrix A of the linear stability problem of the steady state. Therefore, the matrix of eigenvectors, with elements denoted by W_lm, is given by (10) and (11). For the eigenvalues, denoted here by ψ_m, the expression is (12) with matrix elements D_1n instead of A_1n. This yields ψ_m = 1 + ϕz_m, where z_m is given by (13a). As the determinant of a matrix is the product of its eigenvalues, the determinant of D is the product of the ψ_m. The entropy can thus be written in terms of the ψ_m and, from (49), so can the partition function.
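The determinant identity used here, det D equal to the product of the eigenvalues ψ_m, is the standard property of circulant matrices, whose eigenvalues are the discrete Fourier transform of the first row. A quick numerical check with an arbitrary real circulant (not the specific D of the text):

```python
import numpy as np

# det(circulant) = product of the DFT of its first row: generic check
K = 36
rng = np.random.default_rng(0)
c = rng.normal(size=K)                      # arbitrary real first row

# Circulant matrix: D[k, l] = c[(l - k) mod K]
D = np.array([[c[(l - k) % K] for l in range(K)] for k in range(K)])

psi = np.fft.fft(c)                         # eigenvalues of D
det_direct = np.linalg.det(D)
det_product = np.prod(psi).real             # product of eigenvalues (real)

print(det_direct, det_product)              # the two agree
```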
The partition function contains square roots of ψ_m = 1 + ϕz_m. To ensure that the partition function is real and positive (as it should be, being the normalization factor of the probability density function), we require that ϕ be such that 1 + ϕz_m is positive for all values of m. We have seen in Section 3 that the minimum z_min of z_m is negative (close to −9/8) and that the maximum z_max of z_m is positive (close to 2), so that all values of 1 + ϕz_m are positive as long as ϕ satisfies (51). This limits the range of ϕ. The values of σ², ϕ and γ, and therewith the Lagrange multipliers λ_0, λ_1 and λ_2, are to be determined from the constraints (16), (17) and (18). Again, we might make use of (4) to obtain the necessary relations, but we will follow a route which, we believe, is more illuminating. An important ingredient in the evaluation of the constraints is the inverse of the matrix D. In Appendix B it is shown that the matrix elements of D⁻¹, denoted by D⁻¹_kl, are given by (52). The elements C_kl of the covariance matrix C then follow from (53). We recall the expressions (6) and (7) for the energy and its first time-derivative. Using the properties (24) of a multivariate normal distribution, together with (47) and (52), the averages appearing in the constraints can be evaluated explicitly. From (8) we deduce the corresponding expression for the second time-derivative, whose average follows in the same way from (24), (47) and (52), with the definition (13a) of z_m used in the final step. The constraints on the energy and on the first- and second-order time-derivatives of the energy are thus seen to lead to the three conditions (54) on σ², γ and ϕ. Once we have obtained these parameters, the Lagrange multipliers λ_0, λ_1 and λ_2 can be derived from (46): the expression for λ_2 follows by dividing (46c) by (46a); by dividing (46b) by (46a) and using the result for λ_2, one finds λ_1; and expression (46a) can be used to obtain λ_0.
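The role of the inverse D⁻¹ can be illustrated spectrally: a circulant D with eigenvalues ψ_m = 1 + ϕz_m has a circulant inverse with eigenvalues 1/ψ_m, which is the structure exploited in Appendix B. In the sketch below, the explicit form of z_m is an assumption (chosen to match the K = 4 values quoted later in the text), and ϕ = 0.1 is an illustrative value within the limits (51):

```python
import numpy as np

# Spectral inversion of a circulant D with eigenvalues psi_m = 1 + phi*z_m
K = 36
phi = 0.1                                        # illustrative value inside (51)

m = np.arange(K)
z = np.cos(4*np.pi*m/K) - np.cos(2*np.pi*m/K)    # hypothetical form of z_m
psi = 1.0 + phi * z                              # eigenvalues of D
assert np.all(psi > 0.0)                         # phi respects the limits (51)

# Reconstruct D and D^{-1} from their shared circulant structure
row = np.fft.ifft(psi).real
inv_row = np.fft.ifft(1.0 / psi).real
D = np.array([[row[(l - k) % K] for l in range(K)] for k in range(K)])
D_inv = np.array([[inv_row[(l - k) % K] for l in range(K)] for k in range(K)])

print(np.max(np.abs(D @ D_inv - np.eye(K))))     # close to machine precision
```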
By subtracting (54b) from (54a) we find the same expression (36) for γ as we obtained in the previous subsection. Substituting this value in (54a) gives an expression (56) for σ². This expression resembles (39) and, as 1 + ϕz_m should be positive for all m, again implies the condition (57) in order for σ² to be nonnegative. By multiplying (54c) by γ and adding the result to (54b) we obtain, excluding σ² = 0, equation (58). This equation is generally nonlinear in ϕ and has to be solved numerically. (For K = 4, however, the equation is quadratic and can be solved analytically.) After solving (58), any of equations (54) can be used to obtain σ² from ϕ, by which the determination of the probability density function is completed. We noted above that, in order for 1 + ϕz_m to be positive for all m, the parameter ϕ should lie between −1/z_max and −1/z_min, see (51). Now, for a sum of fractions (58) with positive denominators to add up to zero, at least one of the numerators has to be negative. This implies that, if ϕ is to have a value within the limits (51), then γ needs to have a value outside these limits. The immediate consequence is that either γ ≤ −1/z_max or γ ≥ −1/z_min. The first case is relevant for positive F, for which it implies the lower bound (59) on E_0/K; the second case is relevant for negative F, in which case it implies the lower bound (60). Given a value of F and a value of E_0/K such that (57) and either (59) or (60) are satisfied, we may solve (58) for ϕ and thus acquire all necessary parameters to determine the probability density function. However, when the forcing F lies between the limits −1/z_max and −1/z_min, values of E_0/K such that (57) and either (59) or (60) are satisfied do not exist, because the conditions (59) and (60) are then incompatible with (57). We have seen in the previous section that for these values of the forcing the steady state solution of the Lorenz system is stable.
Equation (58) cannot be solved in this case, although the system (54) admits the special solution σ² = 0, γ = F (for which the value of ϕ is not relevant), implying that there is only one acceptable value of the energy, E_0/K = F²/2. This is the situation in which the probability density function is infinitely sharply peaked at the stable stationary state X_k = X_sk = F. It is quite remarkable that the added constraint d²E/dt² = 0 identifies this special case. As in the previous subsections, the marginal probability density function may be found by retaining only the means and covariances of the subset of variables in which we are interested. For a single variable X_k, this gives (30) with μ_k = γ and with C_kk following from (53); in the second equality one uses (54b). The covariances between the variables X_k are given by (53). Note that the resulting marginal probability density function is identical to the one we obtained in the previous subsection. An important difference, however, is that the covariance matrix (53) is no longer diagonal.
In the upper panel of Figure 7 we show, in the form of a grey shading plot, the entropy per degree of freedom (S_M/K) as a function of the forcing (F) and the energy per degree of freedom (E_0/K). The plot is similar to the plot shown in Figure 6, except for the lower limits of the allowed energy. These limits, at which the left-hand sides and the right-hand sides of (59) and (60) are equal, bound the region of allowed energies from below. Also here we note that for each value of the forcing F there is a value of E_0 at which the entropy S_M is largest. Plotted as a function of F and represented by a curve of intermediate thickness, these values are included in panel a of Figure 7. The value of E_0/K at F = 10 for which the entropy is largest is 25.55 and is marked by the upper dot. The entropy S_M/K at this point is 3.02; the corresponding isoline is shown in the form of a thin curve passing through this point. The value of E_0/K at this point is somewhat higher than the energy of the corresponding point in Figure 6, where it is 25.00, so that the associated marginal probability density function is slightly shifted towards higher values. The difference between this graph and the graph shown in panel c of Figure 6 is too small to warrant a separate plot. Instead, we show the analytical correlation function in panel c. In accord with the higher value of the energy, its peak is higher than that of the corresponding plot in the second panel of Figure 7.
The curve which marks the value of the energy at which the entropy is maximal is obtained by repeating the maximum entropy calculation without the constraint on the energy, i.e., without (16). The results can be obtained rather simply by setting the Lagrange multiplier λ_0 equal to zero. If we do this in the first equation of (55), we see that the mean γ is given by (62) directly in terms of the forcing F and the parameter ϕ. The calculations proceed as before, except that (54a) is not to be considered as a constraint but as one of the expressions from which E_0 might be calculated once γ, σ² and ϕ are known; an alternative is (56). The value of ϕ is obtained from (58) by substituting (62) for γ and solving directly for ϕ in terms of F. After substitution of (62) into (58), the equation becomes (63). This equation will not yield acceptable values of ϕ for all values of the forcing F: the forcing has to lie outside the range from −1/z_max to −1/z_min, i.e., outside the range for which the steady state X_k = X_sk = F is stable. This can be seen as follows. In order to have a real and positive partition function, all denominators in (63) should be positive. But if F were to lie between −1/z_max and −1/z_min, then all numerators would be positive as well, so that the first term on the left-hand side of (63) cannot cancel the second term on the left-hand side. The only option to satisfy (54) is then to have σ² = 0 and γ = F, so that E_0/K = F²/2, in accord with what we have seen above. If F lies outside the range from −1/z_max to −1/z_min, then an acceptable value of ϕ can be found from (63), either analytically (if K = 4) or numerically by a root finding procedure. From the resulting value one may calculate γ using (62) and σ² from either the second or the third equation in (54). The energy per degree of freedom E_0/K can then be calculated by using e.g. (56).
To give some insight into the corresponding graphs for small values of F , we reproduce an enlarged portion of panel a of Figure 7 in Figure 8.
Of central importance in the use of d²E/dt² = 0 as a constraint in the maximization of entropy is (58) or, if we leave out E = E_0 as a constraint, its counterpart (63). If γ or F lies outside the interval in which the steady state is stable, then both (58) and (63) can be solved to give acceptable values of ϕ. As remarked above, these relations are generally nonlinear in ϕ and need to be solved numerically. However, experience has taught us that the equations are very well behaved, the functions involved being monotonic with a single zero if either γ or F lies in the region of allowed values (if this is not so, the functions have singularities and multiple roots). If K = 4, then z_1 = 0, z_2 = −1, z_3 = 2 and z_4 = −1, see Figure 2, and both functions are quadratic. Relation (58) is then equivalent to a quadratic equation with explicit solutions; likewise, relation (63) reduces in the case K = 4 to a quadratic equation with explicit solutions. With these exact results, the maximum entropy analysis for the Lorenz system with K = 4 is fully analytical; this may be useful in further studies. In panel a of Figure 9 we show the solutions of (58), i.e., ϕ as a function of γ.
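The quoted K = 4 values of z_m, and the extrema of z_m mentioned in Section 3, can be reproduced by an explicit trigonometric formula. Since (13a) itself is not reproduced here, the form below is an assumption, chosen because it matches those quoted values; a quick check:

```python
import numpy as np

def z_values(K):
    # Hypothetical explicit form of (13a); it reproduces the K = 4 values
    # {0, -1, 2, -1} quoted in the text and the extrema close to -9/8 and 2
    # mentioned in Section 3 (indexing m from 0 here).
    m = np.arange(K)
    return np.cos(4*np.pi*m/K) - np.cos(2*np.pi*m/K)

print(z_values(4))          # approximately [0, -1, 2, -1]
z = z_values(36)
print(z.min(), z.max())     # close to -9/8 = -1.125, and 2
```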

Summary
For F = 10, the case which was used as a reference in the previous subsections, as well as for F = 2.5, F = 5 and F = 20, the results of the numerical simulations and the maximum entropy analysis are given in Table 1. We recall that the time evolution of the energy in all four cases is shown in Figure 3. The first four rows of the table (Num. sim.) contain the results of the numerical integration for the different values of the forcing F; the values of F are given in the first column of the table. The average energy E_0 is given in the second column, the mean value μ_k = γ of an individual variable X_k in the third column, and the standard deviation C_kk^(1/2) in the fourth column. The fifth and sixth columns are reserved for the analytical variables σ and ϕ.
The second set of rows shows the analytical results obtained in Section 4.1, in which the entropy is maximized with (16) as a constraint, i.e., under the condition that E = E_0. The values of E_0 are taken from the numerical simulation and are repeated in the second column. The values are underlined to emphasize that they are input to the maximum entropy (maxent) analysis. The third set of rows refers to Section 4.2, in which the entropy is maximized with (16) and (17) as constraints, i.e., with E = E_0 and dE/dt = 0. Also here the average energy is input to the analysis. This is different in the fourth set of rows, in which we give the analytical results in case only (17), i.e., dE/dt = 0, is used as a constraint. Here the average energy is output of the analysis. The last two sets of rows contain the results of Section 4.3, in which (18), i.e., d²E/dt² = 0, is added as a constraint.

Discussion
In this section we place the results for F = 10, on which we focused in the previous section, in the perspective of higher and lower values of F. In Figure 10 we show the numerically obtained marginal probability density functions, obtained by dividing the range of sampled values in 100 equal-sized intervals and counting the number of samples of the variable X_k (k = 1) within each interval. We also show, using solid lines, the analytical marginal probability density functions that result from the maximum entropy analysis with (16) and (17) as constraints, i.e., using the conditions E = E_0 and dE/dt = 0. The parameters γ and C_kk^(1/2) that characterize these distributions are given in the third set of rows of Table 1. By means of the dashed curves we show the marginal probability density functions in the case that only (17), i.e., dE/dt = 0, is used as a constraint. The corresponding parameters are given in the fourth set of rows in Table 1. The two different sets of theoretical curves illustrate the differences between the cases with and without the constraint E = E_0. Note that the results shown in panel c have been shown before in panels b and c of Figure 6. They are included here to place them in the context of similar graphs for lower and higher values of the forcing F. We recall that all plots refer to K = 36.
Looking at the numerical probability density functions, it becomes evident that their resemblance to a normal distribution decreases as the forcing decreases. At F = 2.5 this is quite pronounced. Of the three distinct maxima that can be discerned in the marginal probability density function, the outer two are related to the fact, mentioned in Section 3, that the behaviour of the system at small values of the forcing is dominated by a (westward) propagating wave. This implies that the behaviour of an individual variable X_k is dominated by an oscillation, meaning that a relatively large amount of time is spent at the upper and lower extremes of X_k. Generally speaking, the solid curves demonstrate that the marginal probability density functions are described accurately over the whole range of forcing values shown when both E = E_0 and dE/dt = 0 are taken into account in the maximization of entropy, although, as we have just seen, the form of the numerical marginal probability density function can be quite unlike a normal distribution for smaller values of F.
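The time-fraction argument above can be made concrete with a caricature: a pure sinusoid sampled uniformly in time follows the bimodal arcsine law, which peaks at the extremes. This is an illustrative sketch only (the amplitude and sampling are arbitrary choices, not taken from the Lorenz system):

```python
import numpy as np

# A variable dominated by an oscillation, X(t) = A*sin(t), sampled uniformly
# in time, follows the arcsine density p(x) = 1/(pi*sqrt(A^2 - x^2)):
# most of the time is spent near the extremes +-A, giving a bimodal histogram.
A = 3.0
t = np.linspace(0.0, 1000.0, 200001)
x = A * np.sin(t)

counts, edges = np.histogram(x, bins=50, range=(-A, A), density=True)
centre = counts[25]                    # density near x = 0
print(counts[0], centre, counts[-1])   # edge bins far exceed the central bin
```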
If the constraint E = E_0 is left out, the average energy is overestimated. This was already noted in the previous section for F = 10, but Table 1 (fourth and sixth set of rows) shows it to be true for the other values of F as well. This is reflected in analytical marginal probability density functions (dashed curves) with mean values that are higher than the numerically obtained ones. We also see, and this is confirmed by a closer look at Table 1, that the overestimation, expressed in relative terms, is larger for larger values of the forcing F. For moderate values of F, say for values not larger than 10, the difference between the results with and without the constraint E = E_0 is small enough for the results to be useful as an estimate of the marginal probability density function of a variable X_k. And this is quite remarkable, as no information from a numerical integration is needed to produce this estimate. This is different from the case in which E = E_0 is used as the single constraint in the maximization of entropy. For values of F smaller than about 10 this case yields marginal probability density functions that are about as accurate as those obtained by using dE/dt = 0, but they rely on information from a numerical integration in the form of a value of E_0. For the same series of forcing values, i.e., for F = 2.5, 5, 10 and 20, the covariance matrix C_kl as a function of l with k = 19 is shown in Figure 11. The numerically obtained covariance matrix is displayed as in panels b and c of Figure 7. The figure illustrates that the resemblance between numerical and analytical results is generally quite good but decreases with decreasing values of F. Indeed, the numerical results display correlations that extend over larger differences between l and k than the analytical results, the more so for smaller values of F. Note that the discrepancy is least serious when the constraint E = E_0 is taken into account.

Conclusions
We have used the principle of maximum entropy to obtain the probability density function of a dynamical system proposed by Lorenz [4]. The state vector X of this system has K components X_k that represent an atmospheric field on a latitude circle and satisfy an equation that incorporates forcing, damping and advection. In the four numerical integrations that we have studied, in which the forcing F was varied, the system quickly reaches a statistically stationary state in which the energy E fluctuates around a well-defined and constant average E_0, see Figure 3. An important constraint in the maximization of the entropy is thus ⟨E⟩ = E_0, where the angle brackets denote the average defined in terms of the probability density function of the system. The fact that statistical stationarity quickly sets in also justifies the constraint ⟨dE/dt⟩ = 0, a condition used previously by Burgers [6] in the context of turbulence. Statistical stationarity furthermore justifies constraints of the type ⟨d^nE/dt^n⟩ = 0, with n ≥ 2. The constraints on the average time-derivatives of the energy enforce approximate time-invariance of the probability density function and generate additional structure in the statistics. In the present paper we concentrate on ⟨E⟩ = E_0, ⟨dE/dt⟩ = 0 and ⟨d²E/dt²⟩ = 0.
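Version I of Lorenz' system is commonly written, for cyclic indices k = 1, …, K, as dX_k/dt = (X_{k+1} − X_{k−2})X_{k−1} − X_k + F. A minimal integration sketch of this standard form (the time step, spinup length and random seed are illustrative choices), showing the quick approach to a statistically stationary energy; for F = 10 and K = 36 the sampled average E_0/K should be comparable to the value 12.92 quoted in the text:

```python
import numpy as np

K, F = 36, 10.0   # as in the reference integration

def rhs(x):
    # Advection (energy-conserving), linear damping and constant forcing:
    # dX_k/dt = (X_{k+1} - X_{k-2}) * X_{k-1} - X_k + F, indices cyclic
    return (np.roll(x, -1) - np.roll(x, 2)) * np.roll(x, 1) - x + F

def rk4_step(x, dt):
    k1 = rhs(x)
    k2 = rhs(x + 0.5 * dt * k1)
    k3 = rhs(x + 0.5 * dt * k2)
    k4 = rhs(x + dt * k3)
    return x + dt * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0

rng = np.random.default_rng(1)
x = F + 0.01 * rng.normal(size=K)     # perturbed (unstable) steady state
dt = 0.05

samples = []
for n in range(40000):                # 2000 time units in total
    x = rk4_step(x, dt)
    if n >= 20000:                    # discard the spinup
        samples.append(0.5 * np.sum(x * x) / K)

print(np.mean(samples))               # E0/K, fluctuating around its mean
```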
Using the single constraint E = E 0 , the maximum entropy principle yields a probability density function in the form of a multivariate normal distribution P(X) = N (μ, C, X), in which the means μ k are zero and the covariance matrix C kl is proportional to the identity matrix, see (25) and (26). The proportionality factor σ 2 is given in terms of the energy E 0 and the number of variables K by (29). The variables are therefore uncorrelated and each variable satisfies a marginal probability density function in the form of a univariate normal distribution with mean μ k = 0 and variance C kk = σ 2 . The forcing has not been taken into account yet so that the entropy diagram of Figure 5a, where the entropy per degree of freedom (S M /K) is plotted as a function of the forcing F and the energy per degree of freedom (E 0 /K), is independent of F . For the case F = 10, where E 0 /K is 12.92, the analytically obtained marginal probability density function is somewhat too broad as compared with the numerically obtained distribution and has a zero instead of a nonzero mean, see Figure 5b and Table 1.
Augmenting the constraint E = E 0 with dE/dt = 0 leads to the same multivariate normal distribution (25), but with nonzero means μ k = γ that are given by (36) and a diagonal covariance matrix C kl of which the factor σ 2 is given in terms of F and γ by (37). From (36) and (37) it follows that the energy E 0 cannot assume all values but should satisfy E 0 /K ≤ F 2 /2. This has rather important consequences for the entropy diagram in Figure 6a, which is now bounded from above by the parabola E 0 /K = F 2 /2. The analytical marginal probability density function of an individual variable X k , for F = 10 and its corresponding numerical value of E 0 /K, is shown in Figure 6b, together with the numerical distribution. The match between the two distributions is now much better than in the previous case. In Figure 6a the curve of intermediate thickness is the parabola E 0 /K = F 2 /4, which gives the energy that would result if E = E 0 is skipped as a constraint. For F = 10 and the corresponding value of the energy the analytical marginal probability density function is shown in Figure 6c. In accordance with an energy that is overestimated, the probability distribution has a mean that is too large compared with the numerical distribution.
The use of d 2 E/dt 2 = 0 as a constraint in the maximization of entropy leads to a further limitation of the allowed values of E 0 /K. For positive F , the value of E 0 /K should be larger than or equal to (F/2)(−1/z min ) and for negative F , the value of E 0 /K should be larger than or equal to (F/2)(−1/z max ). Here z min and z max are the minimum and maximum values of the quantity z m , defined in (13a). For F -values between −1/z max and −1/z min these conditions are incompatible with the condition that E 0 /K be smaller than or equal to F 2 /2. This is exactly the range of F -values for which the steady state X k = X sk = F is stable. In this range the probability density function collapses into a delta function at the stable steady state. The entropy diagram for this case is shown in Figure 7a, with an enlargement of the region around the steady state in Figure 8.
On the condition that F lies outside the region for which the steady state is stable, the constraint d²E/dt² = 0 turns out not to influence the marginal probability density function of an individual variable X_k. It results, however, in a covariance matrix C_kl that has nonzero off-diagonal elements. For F = 10 and E_0/K = 12.92 the function C_kl is plotted as a function of l for k = 19 in Figure 7b (open circles), together with the numerically obtained covariance matrix (solid dots). The resemblance between the two is quite good, the analytical covariances decreasing in amplitude somewhat faster with increasing |l − k| than the numerical covariances. Leaving out the constraint E = E_0 leads to energy values that are determined directly in terms of the forcing. These values are represented by the curve of intermediate thickness in Figure 7a. For F = 10 and the corresponding value of E_0/K, the associated covariance matrix is plotted in Figure 7c, using open squares. Also here the energy is overestimated, which is reflected in a maximum of the analytical C_kl that is larger than its numerical counterpart. Figures 10 and 11 place the results of the maximum entropy analysis in the perspective of smaller and larger values of the forcing F. Both figures show that the marginal probability density functions and the covariance matrices match the numerical results best if the constraint E = E_0 is included in the analysis. Skipping this constraint leads to an overestimation of the energy. As a consequence, the marginal probability density functions have too large mean values and the covariance matrices too high maxima. We note, however, that if the forcing F is moderate, say less than about 10, leaving out the constraint E = E_0 gives results that are still useful as first estimates of the corresponding numerical integration. They deserve credit in relying only on the assumption that the system is in a statistically stationary state.
We recall from Lorenz and Emanuel [24] that the variables of the system proposed by Lorenz are not to be associated with a particular atmospheric field. Instead, the system is meant to be representative of general atmospheric models by incorporating the most basic atmospheric processes in an elementary way. Due to its simplicity, the system is well suited to illustrate the main message of this article: that the fundamental principles of statistical mechanics can be extended to systems out of equilibrium by using the principle of maximum entropy in combination with constraints on the average time-derivatives of the energy (or of other relevant global quantities). Future research on Lorenz' system might concern the use of time-derivatives higher than second order, as these are expected to introduce additional detail into the probability density function. Mathematically this could pose a challenge, as it will lead to non-normal (non-Gaussian) statistics. Of much interest is the application of the method to concrete physical problems, such as atmospheric convection, cloud physics or the climate itself, see Marston [29]. Research on these subjects might yield parametrizations of sub-grid scale processes that are less dependent on ad hoc assumptions and tuning. In any case, we believe that the method holds sufficient promise to warrant testing on a variety of physical problems.