A maximum caliber approach for continuum path ensembles

The maximum caliber approach implements the maximum entropy principle for trajectories by maximizing a path entropy under external constraints, and can be applied to a diverse set of equilibrium and non-equilibrium problems concerning the properties of trajectories connecting different states of a system. In this review, we recapitulate the basic concepts of the maximum entropy principle and of its maximum caliber implementation for path ensembles, and survey recent applications of this approach. In particular, we describe how we recently used this approach to introduce a framework, called here the continuum path ensemble maximum caliber (CoPE-MaxCal) method, to impose kinetic constraints in molecular simulations, for instance to include experimental information about transition rates. Such incorporation of dynamical information can ameliorate inaccuracies of empirical force fields, and lead to improved mechanistic insight. We conclude by offering an outlook for future research.


Introduction
The maximum entropy principle is a general variational principle to maximize the entropy of a system under given constraints [1,2]. This principle provides an effective route to a statistical mechanical description of complex molecular systems. The maximum caliber approach represents an implementation of the maximum entropy principle for path ensembles [3,4]. This approach is increasingly being used to deal with the equilibrium and non-equilibrium statistical mechanics of trajectories [5-11].
The trajectories of a molecular system can be produced using computer simulation methods, such as molecular dynamics. Such simulations have become a standard tool for predicting equilibrium properties of molecular systems in structural biology, physics, chemistry, and materials science, yielding stable and metastable structures and their populations, as given by the free energy differences between states. While molecular dynamics provides such information by sampling the Boltzmann distribution, the timescales that such simulations can access are often insufficient to observe the reactive processes of interest [12-15]. This is known as the sampling problem, and is a well-known source of statistical and systematic errors [16]. Therefore, enhanced sampling is often needed to overcome the high free energy barriers that underlie the long timescales [13,17,18]. A second source of error arises from the approximations used in the definition of the force field employed in the simulations. In the last decade, it has therefore become common to correct for force field inaccuracies in a system-dependent manner by incorporating experimental data as restraints in the simulations. To implement this strategy, powerful tools have been developed based on the maximum entropy principle [19-30], leading to numerous applications, for example in protein folding and assembly [31-36], as well as in cases where force fields are less accurate [37], such as for intrinsically disordered proteins (IDPs) and nucleic acids [28,30,38-44].

a e-mail: p.g.bolhuis@uva.nl (corresponding author)
The maximum entropy (MaxEnt) principle can also be applied to accurately determine dynamical processes in complex systems undergoing rare transitions, such as biomolecular isomerization, association, self-assembly or nucleation phenomena. In these approaches, experimental rate constants or time-dependent data are incorporated in discrete-time, biased molecular dynamics simulations using the maximum caliber (MaxCal) approach [9,45] (for an explanation of the terminology see Table 1). We recently explored a new avenue by developing a method based on the MaxCal approach, which utilizes unbiased atomistic classical molecular dynamics in combination with trajectory-based rare event methods. This framework can be used to ameliorate the effects of force field errors on the kinetic properties of complex molecular systems [46]. Dill and coworkers [5,9,11] have reviewed the broad scope of applications of the MaxCal approach in the area of non-equilibrium dynamics, including Green-Kubo relations and Onsager's reciprocal relations, as well as applications in equilibrium situations, including the optimization of Markov state models [47]. The latter approach aims to infer the interconversion rate matrix between metastable states from limited information on the populations of the states themselves. Using MaxCal optimization, a classical-mechanics connection between trajectories and Langevin/Fokker-Planck parameters was recently derived [8]. Furthermore, a way to construct such Markov state models for non-equilibrium steady state processes was also reported [48,49].
These approaches concern discrete systems [9], in the sense that continuous molecular dynamics trajectories are discretized into matrices of transition probabilities between metastable states, which can then be analyzed with MaxCal approaches. In many cases, however, one would like to keep the microscopic atomistic information of the pathways, and use the concept of altering the path probabilities, also known as trajectory reweighting. This concept is well known in the literature, e.g., in large deviation theory [50,51] and path reweighting [48,52,53]. However, the application of the MaxCal approach to continuous space trajectories has proven difficult.
An important feature of the CoPE-MaxCal method is that it can be applied a posteriori, i.e., by postprocessing an existing prior path ensemble. Importantly, such an approach addresses simultaneously the sampling and the force field problems. Thus, the CoPE-MaxCal method enables a system-dependent characterization when experimental data are available, without solving the extremely challenging problem of optimizing the force field itself in a system-independent, transferable manner, even though this might be preferred from a fundamental point of view.
In this review, we provide a perspective on the CoPE-MaxCal method by explaining the basic MaxEnt and MaxCal concepts, and how they lead to the CoPE-MaxCal reweighting scheme. Our aim is to recapitulate, and partly recast, this method, put it in perspective, and provide an outlook on future research directions. The remainder of the paper is organized as follows. In Sects. 2 and 3, we introduce the MaxEnt principle and the MaxCal approach, and describe how they apply to continuous time and coordinate spaces. Section 4 reviews the specific application to the incorporation of rate constants. Section 5 discusses several extensions, and ties up some loose ends. We end with a conclusion and an outlook.

The maximum entropy principle
According to the MaxEnt principle [1,2], the condition of equilibrium for a system is achieved by a maximization of the Shannon entropy

S = -\sum_i p_i \ln p_i,

where p_i is the probability distribution for state i. Note that this entropy is related to the Gibbs entropy by a factor k_B. For a closed and isolated system with only the constraint of an overall normalization, obtaining the probability distribution that optimizes the entropy can efficiently be done by the method of Lagrange multipliers. The Lagrange function is

L = -\sum_i p_i \ln p_i - \nu \left( \sum_i p_i - 1 \right),

where \nu is the Lagrange multiplier, which is used to impose the normalization condition. The extremum of this function follows from setting to zero the derivative with respect to p_i,

\frac{\partial L}{\partial p_i} = -\ln p_i - 1 - \nu = 0.

Solving this for p_i gives

p_i = e^{-(1+\nu)},

which is a constant factor. Applying the normalization condition leads to

\sum_i p_i = W e^{-(1+\nu)} = 1,

where W is the number of accessible states, so that the Lagrange multiplier is given by e^{\nu+1} = W, leading to the probability p_i = 1/W. This condition corresponds to the standard assumption in statistical mechanics that in a closed and isolated system each microstate has the same probability, which corresponds to the Boltzmann identity S = \ln W, in units of the Boltzmann constant k_B. Often the MaxEnt principle is concerned with the relative entropy

S = -\sum_i p_i \ln \frac{p_i}{p_i^0},

where p_i^0 denotes the prior distribution, e.g., the constant probability of the microcanonical ensemble. The relative entropy is the negative of the Kullback-Leibler (KL) divergence, and reaches its maximum of zero only when the prior and posterior distributions are identical. The MaxEnt principle can be applied subject to additional constraints, for instance that the average of a fluctuating observable s_i is fixed to a certain value,

\sum_i p_i s_i = s^{\mathrm{exp}}.

The MaxEnt principle states that in equilibrium, the entropy should be maximized subject to this constraint.
Using the method of Lagrange multipliers, the optimization function becomes

L = -\sum_i p_i \ln \frac{p_i}{p_i^0} - \eta \left( \sum_i p_i s_i - s^{\mathrm{exp}} \right) - \nu \left( \sum_i p_i - 1 \right),

which again can be optimized by setting the derivative to zero,

\frac{\partial L}{\partial p_i} = -\ln \frac{p_i}{p_i^0} - 1 - \eta s_i - \nu = 0.

Rearranging leads to

p_i \propto p_i^0 e^{-\eta s_i},

where we ignored the normalization constant set by \nu.
Assuming a constant prior probability p_i^0, and normalizing, gives the distribution

p_i = \frac{e^{-\eta s_i}}{Z},

where the partition function is

Z = \sum_i e^{-\eta s_i}.

As an example, we can take the energy E_i of the system as the observable s, and identify \eta with \beta. This leads to

p_i = \frac{e^{-\beta E_i}}{Z},

which is the Boltzmann distribution, with \beta = 1/k_B T the inverse temperature of the surrounding reservoir with which the system can exchange heat. In addition, the derivative of \ln Z with respect to the variable \eta gives (minus) the average of the observable, \langle s \rangle = -\partial \ln Z / \partial \eta. In summary, the above outline illustrates how applying the MaxEnt principle to systems with a large number of degrees of freedom leads to standard statistical mechanics. For further details, we refer to Refs. [1,2,9].
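As a minimal numerical sketch of this procedure, one can determine the Lagrange multiplier \eta by root-finding on the constraint equation; the three energy levels and the target average below are invented for illustration only.

```python
import numpy as np
from scipy.optimize import brentq

# Illustrative three-level system; the energies and target average are invented.
E = np.array([0.0, 1.0, 2.0])
s_exp = 0.6                      # target ensemble average <E>

def avg_energy(eta):
    # Posterior p_i = exp(-eta * E_i) / Z for a uniform prior p_i^0
    w = np.exp(-eta * E)
    return float(np.sum(w * E) / np.sum(w))

# Solve <E>(eta) = s_exp for the multiplier eta, which plays the role of beta
eta = brentq(lambda e: avg_energy(e) - s_exp, -20.0, 20.0)
p = np.exp(-eta * E)
p /= p.sum()                     # Boltzmann-like distribution meeting the constraint
```

The same root-finding strategy applies to any single scalar constraint, since the average is a monotonic function of the multiplier.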

MaxEnt in configuration space
In this section, we discuss the application of the MaxEnt principle in continuous space [28,29]. Consider a continuous state space x, where x denotes the positions and velocities of all the particles in a system. The maximum relative entropy principle states that, given a prior probability distribution (density) \rho_0(x), the optimal distribution \rho(x) maximally compatible with a set of given experimental constraints is the one that maximizes the entropy, or the negative of the KL divergence,

S[\rho||\rho_0] = -\int dx\, \rho(x) \ln \frac{\rho(x)}{\rho_0(x)}.

The entropy should, thus, be maximized subject to M experimental constraints

\int dx\, \rho(x) s_i(x) = s_i^{\mathrm{exp}}, \qquad 1 \le i \le M,

\int dx\, \rho(x) = 1,

where the s_i denote the experimental observables, and s_i^{exp} is the value of the experimental constraint. The last line is the normalization condition. Again, the relative entropy is always non-positive, and zero only if the two distributions are identical. In Fig. 1 we illustrate the application of the MaxEnt principle to the reweighting of the prior conformational distribution, thereby identifying a posterior distribution which meets given experimental constraints s_i^{exp}. The maximization can be carried out using the method of Lagrange multipliers, by looking for the stationary point of the Lagrange function

L = S[\rho||\rho_0] - \sum_{i=1}^{M} \mu_i \left( \int dx\, \rho(x) s_i(x) - s_i^{\mathrm{exp}} \right) - \nu \left( \int dx\, \rho(x) - 1 \right),

where the \mu_i and \nu are the multipliers related to each of the M experimental constraints and to the normalization condition, respectively. A functional derivative with respect to \rho(x) gives

\frac{\delta L}{\delta \rho(x)} = -\ln \frac{\rho(x)}{\rho_0(x)} - 1 - \sum_i \mu_i s_i(x) - \nu.

Setting the derivative to zero and solving for \rho(x) leads to

\rho(x) = \frac{1}{Z} \rho_0(x)\, e^{-\sum_i \mu_i s_i(x)},

where Z is a normalization factor akin to the partition function,

Z = \int dx\, \rho_0(x)\, e^{-\sum_i \mu_i s_i(x)}.

The posterior MaxEnt density is, thus, a reweighting of the prior distribution with an exponential factor involving the experimental observables. The unknown multipliers can be determined by noting that the solution of the Lagrange function is equivalent to minimizing the following function with respect to \mu, as shown by Cesari and coworkers [28]:

\Gamma(\boldsymbol{\mu}) = \ln Z(\boldsymbol{\mu}) + \boldsymbol{\mu} \cdot \mathbf{s}^{\mathrm{exp}},

where bold symbols denote vectors of the M multipliers and observables.
This can be seen as a Legendre transformation. Minimization of \Gamma(\boldsymbol{\mu}) with respect to each multiplier yields a set of M equations for the ensemble averages in the posterior ensemble,

\langle s_i \rangle = \int dx\, \rho(x) s_i(x) = s_i^{\mathrm{exp}},

which can be solved to give the sought multipliers \mu_i. For further details on the MaxEnt approach, we refer the interested reader to Ref. [28].
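The minimization of \Gamma(\mu) can be sketched numerically over a set of prior samples. The sketch below is illustrative only: the prior (a standard normal), the two observables, and the target values are all invented for the example.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import logsumexp

rng = np.random.default_rng(0)
x = rng.normal(size=5000)        # samples from an illustrative prior rho_0
S = np.stack([x, x**2])          # M = 2 observables s_i(x)
s_exp = np.array([0.3, 1.2])     # invented "experimental" target averages

def gamma(mu):
    # Gamma(mu) = ln Z(mu) + mu . s_exp, with Z estimated over the prior samples
    logw = -S.T @ mu
    return logsumexp(logw) - np.log(len(x)) + mu @ s_exp

res = minimize(gamma, np.zeros(2))   # Gamma is convex in mu
logw = -S.T @ res.x
w = np.exp(logw - logsumexp(logw))   # normalized posterior weights
post = S @ w                         # posterior averages <s_i>
```

At the minimum, the gradient of \Gamma vanishes, so the reweighted averages `post` reproduce `s_exp`, which is precisely the constraint equation above.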

MaxEnt approach for equilibrium constants
In this review, we are concerned with the application of the MaxEnt principle to incorporate kinetic information, for example rate constants, in the characterization of a system. While the phase space distributions in the MaxEnt approach preclude the description of time-dependent properties, one can use MaxEnt to impose the ratio of rate constants, which is related to the equilibrium constant.
In what follows, we focus on systems that exhibit two-state kinetics between two well-defined stable states, A and B, for which there is a separation between the molecular timescale and the reaction time [12,13]. This guarantees that well-defined rate constants, k_AB and k_BA, exist for the interconversion from A to B, and vice versa. Using the detailed balance condition k_AB \pi_A = k_BA \pi_B, with \pi_{A,B} the equilibrium populations of the two states, it follows that

K_{eq} = \frac{k_{AB}}{k_{BA}} = \frac{\pi_B}{\pi_A}.

This equilibrium constant is related to the equilibrium fraction K = \pi_B/(\pi_A + \pi_B) = K_{eq}/(1 + K_{eq}), which reduces to K = \pi_B for normalized populations. This is an equilibrium quantity that can be treated with the MaxEnt procedure described above by setting M = 1 and s_1^{exp} = \pi_B. Next, we should find a microscopic function s_1(x) that can measure the equilibrium fraction K. In the following, we denote this microscopic function g(x). Substituting these variables in Eq. 22 gives

\int dx\, \rho(x) g(x) = \pi_B,

which states that the ensemble average of g(x) should be equal to \pi_B, the fraction of configurations in state B. Therefore, g(x) should be a function that determines whether or not a configuration is part of B. A possible, quite natural, definition is the (configurational) committor function p_B(x), which gives the probability of ending in state B when a trajectory is initiated in x with random velocities [56,76,121] (see Fig. 2). While this is not an easy function to compute, it is well defined and extensively studied [56,122,123]. Note that the committor is also sometimes called Onsager's splitting probability [124], or p-fold [125] in the context of protein folding. Here, following the transition path theory convention [126], we denote p_B(x) as the committor function. Using the committor function p_B(x), it is possible to compute the density in the basin of attraction of state A, and that of state B, respectively, as

\rho_A(x) = \rho(x)\, p_A(x), \qquad \rho_B(x) = \rho(x)\, p_B(x),

where p_A(x) = 1 - p_B(x).
Setting g(x) = p_B(x), we can now apply the MaxEnt principle to obtain the reweighted posterior distributions

\rho^{ME}_A(x) = \frac{1}{Z} \rho_0(x)\, p^0_A(x)\, e^{-\mu_A}, \qquad \rho^{ME}_B(x) = \frac{1}{Z} \rho_0(x)\, p^0_B(x)\, e^{-\mu_B}.

The posterior densities will alter the committor function as [127]

p^{ME}_B(x) = \frac{\rho^{ME}_B(x)}{\rho^{ME}_A(x) + \rho^{ME}_B(x)}

with respect to the original prior committor, or, by substituting the densities,

p^{ME}_B(x) = \frac{p^0_B(x)\, e^{-\mu_B}}{p^0_A(x)\, e^{-\mu_A} + p^0_B(x)\, e^{-\mu_B}}.

As we need g(x) to be correct in the posterior ensemble average, it is clear that g(x) \equiv p_B(x) is not equal to p^0_B, but instead follows from numerically solving Eq. 30.
The multipliers \mu_A and \mu_B derive from the ensemble average conditions, and are related to the ratio of the posterior and prior populations,

e^{\mu_B - \mu_A} = \frac{\pi^{\mathrm{exp}}_A/\pi^{\mathrm{exp}}_B}{\pi^0_A/\pi^0_B}.

Using the definitions in terms of rates, this is equal to

e^{\mu_B - \mu_A} = \frac{k^{\mathrm{exp}}_{BA}/k^{\mathrm{exp}}_{AB}}{k^0_{BA}/k^0_{AB}}.

We can then identify the multipliers \mu_A and \mu_B as

\mu_A = \ln \frac{k^{\mathrm{exp}}_{AB}}{k^0_{AB}}, \qquad \mu_B = \ln \frac{k^{\mathrm{exp}}_{BA}}{k^0_{BA}}.

We will later see that this is correct by applying the MaxCal approach.
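A simplified numerical sketch of this committor-based population constraint is given below. It is a first-order treatment: the prior committor values are kept fixed during the reweighting, whereas the full procedure updates p_B self-consistently through the implicit relation of Eq. 30; the Beta-distributed committor values and the target population are invented for illustration.

```python
import numpy as np
from scipy.optimize import brentq

rng = np.random.default_rng(7)
# Invented prior committor values p_B(x) over sampled configurations: most
# configurations are committed to A or B, few are intermediate.
pB = rng.beta(0.3, 0.7, size=50000)
piB_exp = 0.55                   # target equilibrium fraction K = pi_B

def frac_B(mu):
    # Reweight configurations with exp(-mu * p_B); only the difference
    # mu = mu_B - mu_A enters, since p_A + p_B = 1.
    w = np.exp(-mu * pB)
    return float(w @ pB / w.sum())

# Keep the prior committor fixed and solve <p_B> = pi_B^exp for mu
mu = brentq(lambda m: frac_B(m) - piB_exp, -50.0, 50.0)
```

Since the prior population here is below the target, the solver returns a negative mu, i.e., configurations committed to B are upweighted.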

The maximum caliber approach
In 1980, Jaynes proposed an extension of the maximum entropy principle to trajectories [3]. Following the notation of Dill and collaborators [5,9,11], the path entropy is defined as

S = -\sum_{\Gamma} p_{\Gamma} \ln \frac{p_{\Gamma}}{q_{\Gamma}},

where \Gamma = \{X_0, X_1, \ldots X_L\} denotes a sequence of discrete state variables X at different time indices, p_\Gamma is the probability of such a sequence, and q_\Gamma denotes a prior or reference distribution. The summation is over all possible sequences of states, or trajectories. Consider a dynamical feature s(\Gamma) of the system that can be computed over the trajectory ensemble,

\langle s \rangle = \sum_{\Gamma} p_{\Gamma}\, s(\Gamma).

According to the MaxCal approach, imposing this average as a constraint on a system can be carried out according to the maximum entropy principle, by optimizing the distribution p_\Gamma such that it maximizes the path entropy in Eq. 34. To do so, we again use the method of Lagrange multipliers. The optimization function, or caliber, is

C = -\sum_{\Gamma} p_{\Gamma} \ln \frac{p_{\Gamma}}{q_{\Gamma}} - \eta \left( \sum_{\Gamma} p_{\Gamma}\, s(\Gamma) - s^{\mathrm{exp}} \right) - \nu \left( \sum_{\Gamma} p_{\Gamma} - 1 \right).

This constrained caliber can, in analogy with the MaxEnt approach, be optimized by setting the derivative with respect to p_\Gamma to zero, and solving for p_\Gamma,

p_{\Gamma} = \frac{1}{Z} q_{\Gamma}\, e^{-\eta s(\Gamma)},

with the partition function analog

Z = \sum_{\Gamma} q_{\Gamma}\, e^{-\eta s(\Gamma)}.

The path ensemble average of s is recovered by taking the derivative of \ln Z with respect to \eta,

\langle s \rangle = -\frac{\partial \ln Z}{\partial \eta}.

Solving for \eta gives the value of the Lagrange multiplier. Dill and collaborators reviewed the MaxCal approach extensively, and illustrated how it can be invoked to obtain general principles for non-equilibrium systems, including the Onsager and Green-Kubo relations, Prigogine's entropy production, and the diffusion and Fokker-Planck equations. In addition, the MaxCal approach can be used for the optimization of transition rate matrices in Markov state models, for equilibrium and non-equilibrium steady states [48,49]. The MaxCal approach has also been used for reaction coordinate optimization [128,129].
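This discrete-state optimization can be sketched on a toy two-state Markov chain, taking the number of state changes along a trajectory as the path observable s(\Gamma); the transition matrix, path length, and imposed target are all invented for the example.

```python
import numpy as np
from scipy.optimize import brentq

rng = np.random.default_rng(1)
T = np.array([[0.9, 0.1],        # prior transition matrix q of a toy
              [0.2, 0.8]])       # two-state Markov chain (invented numbers)
L = 20                           # number of steps per trajectory

def sample_path():
    X = [0]
    for _ in range(L):
        X.append(rng.choice(2, p=T[X[-1]]))
    return X

paths = [sample_path() for _ in range(4000)]
# Path observable s(Gamma): number of state changes along each trajectory
s = np.array([sum(a != b for a, b in zip(X, X[1:])) for X in paths])

def avg_s(eta):
    # Posterior path average under p_Gamma ~ q_Gamma * exp(-eta * s(Gamma))
    w = np.exp(-eta * s)
    return float(w @ s / w.sum())

target = 1.5 * avg_s(0.0)        # impose 50% more switching than the prior
eta = brentq(lambda e: avg_s(e) - target, -5.0, 5.0)
```

The sampled prior trajectories are not regenerated; only their weights change, exactly as in the configurational MaxEnt case.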
These studies define the state space as a discrete space, which makes the approach suitable for Markov state models, but less so for molecular dynamics trajectories in continuous space. However, as we are interested in atomistic microscopic trajectories, our purpose is to recast the MaxCal approach in continuous space.

Continuum path ensemble (CoPE) MaxCal
In this section, we discuss how the MaxCal approach can be extended to the space of continuous trajectories. Such a space can be mapped using molecular dynamics simulations, Langevin dynamics or Brownian dynamics. A trajectory can be denoted as an ordered sequence of frames x = \{x_0, x_1, \ldots x_L\}, where x is a continuous space configuration as in Sect. 2.2, and the subscripts denote the time index. Each frame is separated from the next by a short time interval \Delta t, so that the total duration of a trajectory is T = L\Delta t. The probability distribution of such a path is

P[x] = \rho(x_0) \prod_{i=0}^{L-1} p(x_i \to x_{i+1}),

where \rho(x_0) is the initial distribution, and p(x_i \to x_{i+1}) is the short time Markovian propagator that describes the transition from state x_i to the next x_{i+1} in the time interval.

Table 2 Definitions of variables and functions used in this work

\rho(x) : A configurational probability distribution, with x denoting the configuration
\rho_0(x) : The prior configurational probability distribution
\rho^{ME}(x) : The maximum entropy posterior configurational probability distribution, after applying the constraints
s_i(x) : Collective variable/observable i as a function of the microscopic configuration
\sigma_i : Level of confidence in the data i
\sigma_{k_AB} : Level of confidence in the rate constant data k_AB
\mu_i : Lagrange multiplier associated with experimental data i
\mu_{A,B} : Lagrange multiplier associated with experimental rate constant data for A (k_AB), or B (k_BA)
p(x_i \to x_{i+1}) : Short time Markovian probability representing the evolution of the probability density
P[x] : A full path distribution, with x denoting the trajectory
P^0[x] : The prior full path distribution
P^{MC}[x] : The maximum caliber posterior path distribution, after applying the constraints
h_{A,B}(x) : Indicator function identifying whether a configuration is in state A (B)
P_{A,B}[x] : A partial path distribution of trajectories starting in A, or B
P^0_{A,B}[x] : The prior partial path distribution of trajectories starting in A, B
P^{MC}_A[x] : The MaxCal posterior partial path distribution of trajectories starting in A, after applying the constraints related to state A, i.e., k_AB
P^{MC}_B[x] : The MaxCal posterior partial path distribution of trajectories starting in B, after applying the constraints related to state B, i.e., k_BA
\lambda : Collective variable used to parameterize the ordered set of interfaces between states A and B
\lambda(x) : Value of the collective variable \lambda evaluated at configuration x
\lambda_{max}[x] : Maximum value of \lambda along an individual path x
\lambda_{min}[x] : Minimum value of \lambda along an individual path x
P_A(\lambda_n|\lambda_i) : Probability that a trajectory originating from A and crossing interface i reaches interface n (i.e., B)
P_A(\lambda_i|\lambda_0) : Probability that a trajectory originating from A and crossing the first interface (\lambda_0) reaches interface i
P_A(\lambda|\lambda_0) : Probability that a trajectory originating from A and crossing the first interface reaches a value \lambda
\phi_0 : Flux through the innermost interface \lambda_0 for trajectories starting in A
f_A(\lambda[x]) : MaxCal biasing/reweighting function for pathways starting in A
f_B(\lambda[x]) : MaxCal biasing/reweighting function for pathways starting in B
P^0_A(\lambda|\lambda_0) : Probability that a trajectory of the prior path ensemble originating from A and crossing the first interface reaches a value \lambda
P^{MC}_A(\lambda|\lambda_0) : Probability that a trajectory of the MaxCal posterior path ensemble originating from A and crossing the first interface reaches a value \lambda
g(\lambda) : MaxEnt biasing function of the configurational densities that ensures the experimentally determined equilibrium constant
g^{exp} : Experimentally determined value of the configurational density observable, i.e., the equilibrium constant
\rho(\lambda) : Configurational density projected onto \lambda
\rho^0(\lambda) : The prior configurational density projected onto \lambda
\rho^{MC}(\lambda) : The MaxCal configurational density projected onto \lambda
\rho_{A,B}(\lambda) : A configurational density projected onto \lambda for trajectories originating in A, B
\rho^0_{A,B}(\lambda) : The prior configurational density projected onto \lambda for trajectories originating in A, B
\rho^{ME}_{A,B}(\lambda) : The MaxEnt configurational density projected onto \lambda for trajectories originating in A, B
\rho^{MC}_{A,B}(\lambda) : The MaxCal configurational density projected onto \lambda for pathways originating in A, B
\hat{\rho}(\lambda|x) : Instantaneous density operator projecting the path x onto \lambda
R^0_A(\lambda|\lambda_0) : The reaching histogram for pathways originating from state A, crossing the innermost interface \lambda_0 and just reaching the value \lambda
R^0_B(\lambda|\lambda_n) : The reaching histogram for pathways originating from state B, crossing the innermost interface \lambda_n and just reaching the value \lambda
K_{eq} : The equilibrium constant
K : The equilibrium fraction
K^{exp}_{eq} : The experimental equilibrium constant
K^0_{eq} : The prior equilibrium constant
p_B(x) : The probability that a trajectory initiated at configuration x (with random momenta) ends up in B, a.k.a. the committor
p_B(\lambda) : The probability to end up in B, a.k.a. the committor, projected onto the collective variable \lambda
S(P||P^0) : Relative path entropy, or caliber, equivalent to the negative KL divergence between path distributions P and P^0
In molecular dynamics, the initial condition is often the Boltzmann distribution \rho(x) \propto \exp(-\beta H(x)), with H(x) the Hamiltonian of the system. The short-time Markovian probability p(x_i \to x_{i+1}) depends on the dynamics used. For molecular dynamics in the microcanonical ensemble this probability is a \delta function given by the molecular dynamics integrator or propagator \phi,

p(x_i \to x_{i+1}) = \delta(x_{i+1} - \phi(x_i)).

For stochastic dynamics, such as Brownian dynamics, the propagator is a Gaussian,

p(x_i \to x_{i+1}) \propto \exp\left( -\frac{\delta R^2}{2\sigma^2} \right),

where \delta R denotes the random displacement in the stochastic (Wiener) process, and \sigma^2 is the variance of that displacement [54]. One can also define a Metropolis-Hastings Monte Carlo-based dynamics,

p(x_i \to x_{i+1}) = \min\left[1, e^{-\beta (H(x_{i+1}) - H(x_i))}\right],

where the min function returns the lower of its arguments [54]. Besides enabling the estimate of equilibrium properties, molecular dynamics trajectories can also offer information on dynamical and non-equilibrium properties. Since this type of study is affected by the approximations involved in the definition of the force field, here we are concerned with improving such a description by incorporating in the simulations dynamical information in the form of constraints on dynamical properties, such as the rate constants.
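As an illustration of such short-time propagators, the sketch below generates a one-dimensional overdamped (Brownian) trajectory and evaluates its log path probability; the harmonic force and all parameter values are invented for the example.

```python
import numpy as np

# One-dimensional overdamped Langevin dynamics with an illustrative
# harmonic force F(x) = -k x; all parameters in arbitrary units.
k, D, beta, dt = 1.0, 1.0, 1.0, 1e-3
sigma2 = 2.0 * D * dt            # variance of the Wiener displacement

def force(x):
    return -k * x

def propagate(x, rng):
    # x_{i+1} = x_i + beta D F(x_i) dt + delta_R
    return x + beta * D * force(x) * dt + rng.normal(0.0, np.sqrt(sigma2))

def log_prop(x_new, x_old):
    # log p(x_i -> x_{i+1}) for the Gaussian short-time propagator
    dR = x_new - x_old - beta * D * force(x_old) * dt
    return -dR**2 / (2 * sigma2) - 0.5 * np.log(2 * np.pi * sigma2)

rng = np.random.default_rng(2)
x = np.empty(1001)
x[0] = 0.0
for i in range(1000):
    x[i + 1] = propagate(x[i], rng)

# Log path probability (up to the initial distribution): sum of log-propagators
logP = sum(log_prop(x[i + 1], x[i]) for i in range(1000))
```

The product of short-time propagators in the path probability becomes a sum of Gaussian log-terms, which is the quantity manipulated in path reweighting.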
The path entropy is now defined as

S[P||P^0] = -\int \mathcal{D}x\, P[x] \ln \frac{P[x]}{P^0[x]},

where \int \mathcal{D}x indicates an integral over all trajectories or paths x. Following MaxEnt for continuous space, the MaxCal approach states that the optimal distribution P[x] is given by the maximization of this path entropy subject to the constraints

\int \mathcal{D}x\, P[x]\, s_i[x] = s_i^{\mathrm{exp}}, \qquad \int \mathcal{D}x\, P[x] = 1.

Again, the s_i^{exp} are experimental constraints encoded in the microscopic observables s_i[x]. The ensemble average of s_i[x] refers to either static/thermodynamic or dynamic/kinetic observables. Note that s[x] is now a function of the entire path, which includes autocorrelation functions.
Following the same procedure as for the MaxEnt approach, we can perform this optimization by applying the method of Lagrange multipliers, and finding the stationary point of the Lagrange function, or caliber,

C = S[P||P^0] - \sum_i \mu_i \left( \int \mathcal{D}x\, P[x]\, s_i[x] - s_i^{\mathrm{exp}} \right) - \nu \left( \int \mathcal{D}x\, P[x] - 1 \right),

by taking the functional derivative with respect to the path distribution. This yields

P^{MC}[x] = \frac{1}{Z} P^0[x]\, e^{-\sum_i \mu_i s_i[x]}, \qquad (44)

which gives the posterior distribution as a reweighting of a given prior distribution P^0[x]. In Fig. 1 we illustrate the effect of the CoPE-MaxCal procedure in reweighting the prior path distribution, thereby identifying a posterior distribution which meets a given experimental constraint s_i^{exp}, which can be for instance a rate constant. It is important to realize that the trajectories themselves do not change, but only the weights with which they occur in the ensemble. This is analogous to the fact that the configurations do not change in the MaxEnt procedure.

CoPE-MaxCal for dynamical constraints
In this section, we discuss the application of the CoPE-MaxCal approach to incorporate dynamical information as constraints in the description of a system. This can be achieved for instance using a correlation function c(t) defined as

c(t) = \langle s_i(0)\, s_j(t) \rangle = \int \mathcal{D}x\, P[x]\, s_i(x_0)\, s_j(x_\tau),

where the observable s_i at time t = 0 is correlated with observable s_j at a later time t. As i and j can be identical, this definition includes autocorrelation functions. In the second equality, we have used the explicit dependence s_{i,j}(x_\tau) as a function of the configuration x_\tau, with \tau = t/\Delta t corresponding to the frame index for time t.
Experimental information about the dynamics of a system in the form of correlation data c^{exp}(t) can be imposed as a constraint on the path ensemble distribution, giving the Lagrange function

C = S[P||P^0] - \mu \left( \int \mathcal{D}x\, P[x]\, s_i(x_0)\, s_j(x_\tau) - c^{\mathrm{exp}}(t) \right) - \nu \left( \int \mathcal{D}x\, P[x] - 1 \right).

Following the optimization procedure gives the posterior distribution

P^{MC}[x] = \frac{1}{Z} P^0[x]\, e^{-\mu\, s_i(x_0) s_j(x_\tau)}.

As an example, consider the mobility K_\tau[x], measuring the mean square displacement at a particular time \tau with respect to time \tau = 0. As this correlation only has to be constrained at \tau, the posterior is

P^{MC}[x] = \frac{1}{Z} P^0[x]\, e^{-\mu K_\tau[x]}.

Note that this is also the expression for the s-ensemble (with \mu = s, where s should not be confused with the observable function s[x]). In the s-ensemble, path ensembles are biased according to a time correlation function [51,105-110]. The s-ensemble is usually presented in the context of large deviation theory, but clearly also follows from the MaxCal approach. The s-ensemble biases all paths with a field s conjugate to the function K. In the MaxCal approach, the Lagrange multiplier \mu follows from the imposed constraint. Thus, the s-ensemble might also be interpreted as the field that imposes a certain constraint.
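The s-ensemble-type bias on the mobility can be sketched on a toy ensemble of free random walks, with K_\tau[x] taken as the squared displacement at the final time; the walk length, sample size, and imposed target are invented for illustration.

```python
import numpy as np
from scipy.optimize import brentq

rng = np.random.default_rng(3)
# Toy prior path ensemble: free random walks of 50 steps (invented parameters);
# the mobility K_tau[x] is the squared displacement at the final time tau.
steps = rng.normal(size=(5000, 50))
K = steps.sum(axis=1) ** 2

def msd(mu):
    # Posterior average of K under s-ensemble weights exp(-mu * K)
    w = np.exp(-mu * K)
    return float(w @ K / w.sum())

target = 0.5 * msd(0.0)          # constrain the mobility to half the prior value
mu = brentq(lambda m: msd(m) - target, 0.0, 10.0)
```

A positive field mu suppresses mobile paths, lowering the constrained mean square displacement, in line with the conjugate-field interpretation above.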

MaxCal for thermodynamic constraints
Since equilibrium properties are not time dependent, they can be computed as time averages over path ensemble distributions,

\langle s \rangle = \int \mathcal{D}x\, P[x]\, \frac{1}{L} \sum_{t=0}^{L} s(x_t),

with L being the average path length, and x_t the coordinates at each time step of the path. Constraining an equilibrium property to s^{exp} then leads to a posterior distribution

P^{MC}[x] = \frac{1}{Z} P^0[x] \exp\left( -\mu\, \frac{1}{L} \sum_{t=0}^{L} s(x_t) \right).

An alternative way of constraining equilibrium properties is to first reduce the path space back to a configurational density \rho(x) by

\rho(x') = \int \mathcal{D}x\, P[x]\, \frac{1}{L} \sum_{t=0}^{L} \delta(x_t - x').

The average then becomes

\langle s \rangle = \int dx'\, \rho(x')\, s(x').

Indeed, substitution of Eqs. 52 and 53 in Eq. 54 yields the same result as Eq. 51.

Independence of partial path distributions
In the above sections, we did not specify the path ensembles precisely. When considering systems that show two-state kinetics between two stable states, A and B, with forward and backward rate constants k_AB and k_BA, respectively, we can think of partial path ensembles P_A[x] and P_B[x], consisting, respectively, of all paths that start in A and all paths that start in B. We can define these partial ensembles as

P_A[x] = h_A(x_0)\, P[x], \qquad P_B[x] = h_B(x_0)\, P[x],

where h_A(x) and h_B(x) are the indicator functions, which are unity when the configuration x is in state A (B), and zero otherwise. Restricting all paths to start and end in one of the stable states, the total path distribution is simply the sum of the non-normalized partial path distributions,

P[x] = P_A[x] + P_B[x].

Applying different constraints s_A and s_B to the two partial ensembles, we arrive at the Lagrange function

C = S[P||P^0] - \mu_A \left( \int \mathcal{D}x\, h_A(x_0) P[x]\, s_A[x] - s_A^{\mathrm{exp}} \right) - \mu_B \left( \int \mathcal{D}x\, h_B(x_0) P[x]\, s_B[x] - s_B^{\mathrm{exp}} \right) - \nu \left( \int \mathcal{D}x\, P[x] - 1 \right),

where we used the definition of the partial ensembles. Maximization of the caliber yields the posterior

P^{MC}[x] = \frac{1}{Z} P^0[x]\, e^{-\mu_A h_A(x_0) s_A[x] - \mu_B h_B(x_0) s_B[x]}, \qquad (56)

or, expressing it in partial ensembles,

P^{MC}[x] = \frac{1}{Z} \left( P^0_A[x]\, e^{-\mu_A s_A[x]} + P^0_B[x]\, e^{-\mu_B s_B[x]} \right).

For paths belonging to partial ensemble A, h_A(x_0) = 1 and thus h_B(x_0) = 0, leading to

P^{MC}_A[x] = \frac{1}{Z_A} P^0_A[x]\, e^{-\mu_A s_A[x]},

while for paths from partial ensemble B, h_A(x_0) = 0 and h_B(x_0) = 1, it holds that

P^{MC}_B[x] = \frac{1}{Z_B} P^0_B[x]\, e^{-\mu_B s_B[x]}.

Thus, both partial ensembles can be optimized and normalized independently.

Imposing a rate constraint

We now impose the experimental rate constants as constraints on the partial path ensembles, s_A^{exp} \equiv k^{exp}_{AB} and s_B^{exp} \equiv k^{exp}_{BA}. However, we also need a microscopic dynamical correlation function for this constraint,

C_{AB}(t) = \frac{\langle h_A(x_0)\, h_B(x_t) \rangle}{\langle h_A \rangle}.

The observed rate constant is the time derivative of this function, measured at the plateau value [12],

k_{AB} = \frac{d C_{AB}(t)}{dt} \bigg|_{\mathrm{plateau}}.

The rate is, thus, the flux through the boundary of state B, given that the trajectory started in state A. At this point, we switch to the language of transition path sampling (TPS) [54-56], and in particular transition interface sampling (TIS) [57], as this allows us to write the rate constant in terms of the path probability P[x]. In the TIS method, the configuration space is foliated by a set of non-intersecting interfaces parametrized with a collective variable \lambda(x). This set of n + 1 interfaces is denoted \{\lambda_0, \lambda_1, \ldots \lambda_n\}.
Taking \lambda_0 and \lambda_n as the boundaries of A and B, respectively, the rate is expressed as the effective positive flux \phi through these interfaces,

k_{AB} = \phi_n = \phi_0\, P_A(\lambda_n | \lambda_0),

where \phi_i denotes the effective positive flux through interface i. In the second equality, we have defined the crossing probability P_A(\lambda_n|\lambda_0) for a trajectory to reach interface \lambda_n, under the condition that it has already crossed \lambda_0 = \lambda_A. Note that \phi_0 is usually large, and easy to obtain from a molecular dynamics simulation in state A, while P_A(\lambda_n|\lambda_0) is usually very small, and difficult to compute. While it is in principle (and sometimes even in practice) possible to obtain this also from a brute force molecular dynamics simulation, it is much more efficient to compute it via TPS [17,54,56,130,131], TIS [57-59], Forward Flux Sampling [111,112], or other trajectory-based methods. In these sampling approaches, we can construct a path ensemble by reweighting the paths, yielding the so-called reweighted path ensemble [132], which constitutes a collection of paths with an associated weight, representing the probability to observe that path in an unbiased simulation, i.e., P_A[x]. This reweighted path ensemble is related to the crossing probability by

P_A(\lambda|\lambda_0) = \frac{\int \mathcal{D}x\, P_A[x]\, \theta(\lambda_{\max}[x] - \lambda)}{\int \mathcal{D}x\, P_A[x]},

where \lambda_{\max}[x] returns the maximum value of \lambda(x) along the trajectory, and the Heaviside step function \theta returns unity for the paths that cross the \lambda interface, assuming that \lambda monotonically increases when going from A to B. This function can take the place of the microscopic correlation function in the Lagrange function, giving

C = S[P_A||P^0_A] - \mu_A \left( \int \mathcal{D}x\, P_A[x]\, \theta(\lambda_{\max}[x] - \lambda_n) - s_A^{\mathrm{exp}} \right) - \nu \left( \int \mathcal{D}x\, P_A[x] - 1 \right).

Optimizing this function as before leads to the posterior

P^{MC}_A[x] = \frac{1}{Z} P^0_A[x]\, e^{-\mu_A \theta(\lambda_{\max}[x] - \lambda_n)}.

We can now solve for the Lagrange multiplier by constructing a log partition function as in Eq. 21, taking the derivative with respect to \mu_A and setting it to zero,

\int \mathcal{D}x\, P^0_A[x]\, e^{-\mu_A \theta(\lambda_{\max}[x] - \lambda_n)}\, \theta(\lambda_{\max}[x] - \lambda_n) = s_A^{\mathrm{exp}} \int \mathcal{D}x\, P^0_A[x].

Since only the paths that reach the final interface contribute to the \theta function, this changes into

e^{-\mu_A} \int \mathcal{D}x\, P^0_A[x]\, \theta(\lambda_{\max}[x] - \lambda_n) = s_A^{\mathrm{exp}} \int \mathcal{D}x\, P^0_A[x],

where we replaced P with P^0 in the normalization, which hardly affects it.
The exponent on the left-hand side can be taken out of the integral, yielding

e^{-\mu_A}\, P^0_A(\lambda_n | \lambda_0) = s_A^{\mathrm{exp}}.

The exponential factor needs to be larger than unity for an increase in the rate, which is then associated with a negative Lagrange multiplier \mu_A. As we prefer to associate a positive variable with an increase in rate, we redefine the sign of the multiplier, so that the posterior becomes

P^{MC}_A[x] = \frac{1}{Z} P^0_A[x]\, e^{\mu_A \theta(\lambda_{\max}[x] - \lambda_n)}, \qquad \mu_A = \ln \frac{s_A^{\mathrm{exp}}}{P^0_A(\lambda_n|\lambda_0)}.

Indeed, a positive \mu_A now increases the rate.
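The single-interface reweighting can be illustrated on a toy prior ensemble in which the \lambda_{\max} values are exponentially distributed (an invented choice). Here the multiplier is obtained by exact inversion of the reweighted crossing probability, keeping the full normalization rather than the P \to P^0 replacement used in the derivation.

```python
import numpy as np

rng = np.random.default_rng(4)
# Toy prior A-path ensemble: lam_max values with an exponential tail (invented),
# so that only a small fraction of paths reaches the product interface lam_n.
lam_max = rng.exponential(scale=0.5, size=200000)
lam_n = 2.0
reactive = lam_max >= lam_n
p0 = reactive.mean()             # prior crossing probability P0_A(lam_n|lam_0)

p_target = 3.0 * p0              # impose a crossing probability (rate) 3x larger
# Reweighting only the reactive paths by exp(mu_A) gives
#   P_MC = e^mu p0 / (1 - p0 + e^mu p0);  inverting for mu_A:
mu_A = np.log(p_target * (1.0 - p0) / (p0 * (1.0 - p_target)))

w = np.where(reactive, np.exp(mu_A), 1.0)
w /= w.sum()
p_post = w[reactive].sum()       # posterior crossing probability
```

When the crossing probability is small, the exact mu_A reduces to ln(p_target/p0), i.e., the logarithm of the ratio of posterior and prior rates.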

Imposing a rate constraint for all λ
In the above procedure, only the reactive AB paths in the prior ensemble are reweighted to yield a given rate constant. However, while this imposes the experimental rate constant, it does lead to an undesirable discontinuity at the boundary \lambda_B. Moreover, the rate cannot be strongly dependent on where precisely this boundary is located. More precisely, the rate can be expressed as the product of two crossing probabilities [57],

k_{AB} = \phi_0\, P_A(\lambda_i | \lambda_0)\, P_A(\lambda_n | \lambda_i).

We can now interpret \lambda_i as the interface on which we put the kinetic constraint. Since this location is arbitrary for 0 < i < n, we impose constraints on all these \lambda values simultaneously. This leads to the following Lagrange, or caliber, function

C = S[P_A||P^0_A] - \sum_{i=0}^{n} \mu_i \left( \int \mathcal{D}x\, P_A[x]\, \theta(\lambda_{\max}[x] - \lambda_i)\, P_A(\lambda_n|\lambda_i) - s_A^{\mathrm{exp}} \right) - \nu \left( \int \mathcal{D}x\, P_A[x] - 1 \right),

where the sum runs over the n + 1 interfaces. In each of the constraints in the summation, the paths that cross \lambda_i are counted, and P_A(\lambda_n|\lambda_i) is the crossing probability to reach B from the \lambda_i interface. Optimization leads to the MaxCal posterior path distribution

P^{MC}_A[x] = \frac{1}{Z} P^0_A[x] \exp\left( \sum_{i=0}^{n} \mu_i\, \theta(\lambda_{\max}[x] - \lambda_i) \right).

Note that the P_A(\lambda_n|\lambda_i) in this equation is dependent on the posterior distribution itself, and hence this is an implicit definition. The sum in the exponent only depends on \lambda_{\max}[x] (and of course on P_A(\lambda_n|\lambda)), but for a given system P_A(\lambda_n|\lambda) is a function of \lambda, so the sum can be written as

\sum_{i=0}^{n} \mu_i\, \theta(\lambda_{\max}[x] - \lambda_i) = f_A(\lambda_{\max}[x]),

where the P_A(\lambda_n|\lambda) dependence is implicit in the function f_A. The interpretation is that the weight of each trajectory in the posterior path ensemble depends entirely on \lambda_{\max}[x]. The n + 1 constraints imposed are given, for k = 0 \ldots n, by

P^{MC}_A(\lambda_k|\lambda_0)\, P^{MC}_A(\lambda_n|\lambda_k) = s_A^{\mathrm{exp}}.

In Ref. [46] it is shown that these constraints can be fulfilled for any reasonably continuous function f_A(\lambda), as long as f_A(\lambda_B) = \mu_A and f_A(\lambda_0) = 0. We can simplify the function by considering all paths that reach a maximum at \lambda_j. Due to the \theta function in Eq. 74, only the interfaces with \lambda_i \le \lambda_j contribute to the sum, leading to

f_A(\lambda_j) = \sum_{i \le j} \mu_i.

The function f_A(\lambda) will be determined in the next section.
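The effect of a continuous bias f_A(\lambda_{\max}) can be sketched on the same kind of toy ensemble; a linear ramp between \lambda_0 and \lambda_n is used here purely as one admissible choice of f_A, since the actual function is only fixed in the next section.

```python
import numpy as np

rng = np.random.default_rng(5)
lam_max = rng.exponential(scale=0.5, size=200000)  # toy prior lam_max values
lam0, lam_n, mu_A = 0.0, 2.0, 1.0

def f_A(lam):
    # Continuous bias with f_A(lam0) = 0 and f_A(lam_n) = mu_A; a linear ramp
    # is one admissible choice (the bias stays constant beyond lam_n).
    return mu_A * np.clip((lam - lam0) / (lam_n - lam0), 0.0, 1.0)

# The posterior weight of each path depends only on lam_max[x]
w = np.exp(f_A(lam_max))
w /= w.sum()

interfaces = np.linspace(lam0, lam_n, 9)
P_prior = np.array([(lam_max >= l).mean() for l in interfaces])
P_post = np.array([w[lam_max >= l].sum() for l in interfaces])
```

Because the bias increases smoothly with \lambda_{\max}, the posterior crossing probability is enhanced at every intermediate interface, avoiding the discontinuity of the single-interface scheme.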
Figure 3 shows schematically how the TIS, reweighted path ensemble and CoPE-MaxCal approaches are related in terms of interface ensembles based on λ.
The prior path distributions can be projected on the collective variable (CV) or order parameter λ. In particular, the crossing probability P_A^0(λ|λ_0) and configurational density ρ_A^0(λ) are expressed via the instantaneous density projection operator ρ̂, which projects a single path x on the CV λ. These projections can also be done for the posterior distribution, leading to the configurational density ρ_A^MC(λ) and the crossing probability, where we defined the prior probability R_A^0(λ|λ_0) for a path to just reach the interface λ.
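The relation between the reaching histogram and the crossing probability can be sketched numerically (a minimal illustration with an assumed λ_max distribution): a path crosses interface λ_i exactly when its maximum lies at λ_i or beyond, so the crossing probability is the reverse cumulative sum of the reaching histogram.

```python
import numpy as np

def reaching_histogram(lam_max, weights, edges):
    """R_A(lambda|lambda_0): weighted histogram of lambda_max over the paths."""
    R, _ = np.histogram(lam_max, bins=edges, weights=weights)
    return R

def crossing_from_reaching(R):
    """P_A(lambda|lambda_0): a path crosses bin i iff its maximum lies in bin i
    or any bin beyond, so the crossing probability is a reverse cumulative sum."""
    P = np.cumsum(R[::-1])[::-1]
    return P / P[0]

# Assumed synthetic ensemble and binning, for illustration only.
rng = np.random.default_rng(2)
lam_max = rng.exponential(scale=0.3, size=50_000)
edges = np.linspace(0.0, 2.0, 21)
R = reaching_histogram(lam_max, np.ones_like(lam_max), edges)
P = crossing_from_reaching(R)
```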

The MaxCal bias function f A (λ) follows from MaxEnt for the density
The function f_A(λ) above is not completely specified. However, we can make use of the density obtained by a MaxEnt reweighting of the same system. In particular, we can set the constraint on the configurational density projected on the λ parameter. As in Sect. 2.3, we can use the committor to impose the constraint, g(λ) = p_B(λ), where p_B(λ) is now the projected committor. This committor follows from the implicit equation Eq. 30. This procedure leads to a condition for the densities, which should be solved numerically. We can also use the full (configurational) committor p_B(x), in which case p_B^max[x], the maximum committor value along the trajectory x, appears in the reweighting. Clearly, this might be very difficult to obtain in practice.

Application of CoPE-MaxCal to protein folding
The CoPE-MaxCal approach enables the investigation of mechanistic properties, for instance the location of the transition state, by imposing experimental transition rates. In this section we briefly highlight an application of the MaxCal approach to the folding of a small protein, which we presented in Ref. [46]. Taking information from a long molecular dynamics simulation [133], we constructed the prior path distributions and predicted rates [46]. Since the force field did not reproduce the folding rate correctly at the expected melting temperature, we imposed the correct experimental folding rate using the CoPE-MaxCal procedure [46]. The reweighted path ensembles led to a reweighted free energy and committor landscape. Importantly, the imposed rate caused a shift in the predicted transition state (see Fig. 4), which provides a qualitative change in mechanistic insight. We refer to Ref. [46] for more details and examples.

Interpretation of the CoPE-MaxCal method
The CoPE-MaxCal method takes as input an unbiased ensemble of paths from molecular dynamics simulations, or a reweighted path ensemble [132] from TIS [57] or Virtual Interface Exchange TPS [134], and reweights each trajectory in this ensemble according to how far it progresses along a predefined collective variable (see a rendering in Fig. 5). This includes the paths that cross the barrier and reach the final state, so the rate constants are automatically constrained to the correct value via the functions f_{A,B}(λ). The more involved part of the framework is to ensure that the thermodynamic properties are correct, in particular the equilibrium constant. This requires a specific bias function g(λ) based on the committor function, which produces the least perturbed path ensemble while still obeying the constraints (see Sect. 2.3). So, imposing g(λ) can be viewed as responsible for constraining the equilibrium conditions, whereas f_{A,B}(λ) takes care of the dynamical corrections. The interpretation of the reweighting procedure is that trajectories are artificially made more, or less, probable in the path ensembles. This is analogous to changing the weight of each conformation in the Boltzmann distribution using the MaxEnt approach. Note that the trajectories themselves do not change in the reweighting procedure. Rather, the distribution of initial conditions is adjusted. This is analogous, but not identical, to how microcanonical trajectories can be reweighted to give a canonically distributed path ensemble (see, e.g., Ref. [135]).

Multiple state descriptions
The MaxCal method can be extended to multiple states. For multiple states, the kinetics is described by a rate matrix K with entries k_ij for each pair of states i and j. Incorporating multiple rate constants in the ensembles adds additional constraints to the caliber function, which ensure that the fluxes through the final interfaces are correct, and yields a path reweighting in which the function f_i is now defined for each state, as a function of the maximum value along an order parameter that is possibly different for each state. Note that by changing the weight of state i all rates out of i are changed in the same way. Thus k_ij = c_i k⁰_ij, i.e., all the rates from state i change proportionally by the same factor. This leads to a consistent description.
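A minimal sketch of this proportional scaling (the rate matrix and the factors c_i below are hypothetical, chosen only for illustration):

```python
import numpy as np

# Sketch: reweighting state i multiplies all rates out of i by the same factor,
# k_ij = c_i * k0_ij, leaving the branching ratios k_ij / k_il unchanged.
def scale_outgoing_rates(K0, c):
    """K0: prior rate matrix (off-diagonal k0_ij >= 0); c: per-state factors."""
    K = c[:, None] * K0
    np.fill_diagonal(K, 0.0)
    np.fill_diagonal(K, -K.sum(axis=1))  # rows sum to zero for a rate matrix
    return K

# Assumed three-state prior rate matrix (arbitrary units).
K0 = np.array([[-0.30, 0.20, 0.10],
               [0.05, -0.15, 0.10],
               [0.02, 0.08, -0.10]])
c = np.array([2.0, 0.5, 1.0])  # assumed per-state reweighting factors
K = scale_outgoing_rates(K0, c)
```

Because every rate out of a state is scaled by the same factor, the branching ratios between the different product states are preserved, which is what makes the multi-state description consistent.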
The other ingredient, the MaxEnt part, is based on the committor. Applying a correction to the imposed equilibrium constant K_ij = π_j/π_i = k_ij/k_ji means that the densities for each basin of attraction ρ_i(x) are reweighted using the committor p_i to state i, with Σ_i p_i = 1 [136]. The resulting implicit equation for p_i(x) can be solved numerically for a given ρ⁰_i(x). This gives rise to the MaxEnt reweighting by setting g_i(x) = p_i(x), which in turn can be related to the MaxCal condition in which the minimum of the committor p_i to state i is searched for along the path. Note that we have to use the minimum along the pathway now, since the committor starts high, being close to the initial state, and then slowly decreases to zero as the reaction progresses. It is also possible to project the committor on an order parameter λ_i connected to each state. A numerical solution should then lead to a set of committor-based bias functions p_i(λ_i). We leave the specific details for future work.

An alternative formulation of the MaxCal approach
In this section, we discuss whether one could establish a MaxCal approach by including the equilibrium density constraint directly in the MaxCal procedure. This would give an additional constraint to fulfill. One could include this constraint in the MaxCal approach, which would lead to an additional weighting factor, with the ζ multiplier determined by solving the corresponding normalization condition; here one should use the non-normalized distributions to obtain the correct relative path ensemble weights. This could in principle be done, but has a serious downside, namely that the paths are weighted with the instantaneous path length L[x], which is sensitive to the A and B definitions, and hence seems arbitrary. Moreover, this is not what we do in the CoPE-MaxCal approach. Instead, we state that the MaxCal method should reproduce the MaxEnt density ρ_A^ME(x) at each configuration x. This means that, in fact, we are introducing many constraints, one for each configuration point x, yielding an additional set of constraints, where g(x) is again the MaxEnt biasing function defined in Sect. 2.3. This set of constraints ensures that the densities match at all configurations x. In the resulting path weight, the ζ_A(x) multiplier is determined by solving for the root of the derivative of the log partition function with respect to ζ_A(x). Indeed, imposing this equation is at the heart of the CoPE-MaxCal method. Note that this implies that the equilibrium constant is also correctly reproduced. Note also that the constraint is stronger than what is needed from MaxCal alone. However, we stress that for an equilibrium system the MaxEnt approach should give a distribution that is consistent with the MaxCal one.

Effects of the errors in the measurements
The MaxEnt principle can be used to model uncertainties in the experimental data [24,[28][29][30]. This is done by adding to the constraint average s_i the expected error e_i due to the perturbed distribution. When the error is Gaussian distributed with a standard deviation σ_i, this average expected error follows directly, and σ_i can be interpreted as the level of confidence in the experimental measurements or data. One can include this in the Lagrange function; optimizing this function and solving for the Lagrange multipliers μ_i then takes the level of confidence into account. Setting σ_i = 0, the level of confidence is so high that the condition is actually a constraint, and leads to the original Eq. 21. When σ_i is set large, the Lagrange multiplier μ_i becomes smaller, and the optimized distribution will hardly differ from the original distribution. Thus, this procedure changes the constraint into a restraint, tuned by the level of confidence in the data. The MaxCal equation can also be adjusted to incorporate the errors in the data, leading to the analog of Eq. 102, where σ_{k_AB} represents the level of confidence in the rate constant data. Thus, one can turn the constraint condition into a restraint condition.
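The turn from a constraint into a restraint can be sketched numerically. The toy discrete example below is assumed (uniform prior over three states, a scalar observable s); one common sign convention is used, in which the posterior is p_i ∝ p0_i exp(-μ s_i) and the stationarity condition reads ⟨s⟩_μ = s_exp + μσ²:

```python
import numpy as np

def solve_restraint(p0, s, s_exp, sigma, lo=-60.0, hi=60.0, iters=100):
    """Bisect for mu in h(mu) = <s>_mu - s_exp - mu*sigma^2 = 0.

    h is monotonically decreasing in mu, so simple bisection suffices.
    sigma = 0 enforces the constraint exactly; large sigma gives a weak
    restraint that barely perturbs the prior."""
    def h(mu):
        w = p0 * np.exp(-mu * (s - s.mean()))  # centered for numerical stability
        p = w / w.sum()
        return float(p @ s) - s_exp - mu * sigma**2
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if h(mid) > 0 else (lo, mid)
    mu = 0.5 * (lo + hi)
    w = p0 * np.exp(-mu * (s - s.mean()))
    return mu, w / w.sum()

s = np.array([0.0, 1.0, 2.0])      # assumed observable values in three states
p0 = np.full(3, 1.0 / 3.0)         # uniform prior
mu_hard, p_hard = solve_restraint(p0, s, s_exp=0.2, sigma=0.0)   # constraint
mu_soft, p_soft = solve_restraint(p0, s, s_exp=0.2, sigma=10.0)  # weak restraint
```

With σ = 0 the posterior average matches s_exp exactly; with a large σ the multiplier shrinks toward zero and the posterior stays close to the prior, as described in the text.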

CoPE-MaxCal and machine learning
The MaxCal method aims at incorporating experimental data in molecular simulations via a path reweighting function f_A(λ). Since the computation of f_A(λ) from Eq. 86 might not always be trivial, one might use advanced regression procedures, such as those provided by machine learning methods. In addition, when considering the more general form of the function, f_A(p_{B,max}[x]), machine learning methods might be invoked to optimize both the f_A function and the committor description in terms of CVs. This direction of research is left for future exploration.
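A minimal illustration of such a regression (the smooth target function and the noisy "binned estimate" below are synthetic assumptions; a real application might use kernel methods or neural networks instead of the simple polynomial fit shown here):

```python
import numpy as np

rng = np.random.default_rng(0)
lam = np.linspace(0.0, 1.0, 200)
f_target = 2.0 * lam**2                                  # assumed smooth "true" bias
f_binned = f_target + 0.05 * rng.normal(size=lam.size)   # noisy binned estimate

# Fit a low-order polynomial as a simple stand-in for an ML regressor;
# the fit averages out the bin noise and yields a smooth f_A(lambda).
coeffs = np.polyfit(lam, f_binned, deg=3)
f_smooth = np.polyval(coeffs, lam)
rms = np.sqrt(np.mean((f_smooth - f_target) ** 2))
```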

Optimization of a collective variable with CoPE-MaxCal
By inspecting the CoPE-MaxCal approach, one realizes that the path reweighting depends on the choice of the collective variable λ. For a different choice of λ, one would get a different weight function, and hence a different caliber. So, it should be possible to optimize the caliber function over all possible CVs. In principle, one can vary the collective variable and maximize the entropy and caliber as a function of this CV. The optimal collective variable is then the one that leads to the least perturbed path distribution. To do so, we can substitute the optimized MaxCal distributions P^MC, with C_A, C_B the normalization constants for these distributions, into the path entropy expression. Here, we can make use of the reaching histograms R⁰_A(λ|λ_0) and R⁰_B(λ|λ_n), so that the normalization C_A = ∫ dλ R⁰_A(λ|λ_0) e^{f_A(λ)} is expressed in terms of the reaching histograms. While all sub-distributions P⁰_A, P_A, P_B, P⁰_B above are supposed to be normalized, we require the normalized total path distributions P and P⁰ when computing the total entropy. In terms of S[P_A||P⁰_A] and S[P_B||P⁰_B], defined above, the total path entropy follows. It is now possible to compare combinations of different CVs with each other and select the best combination that optimizes the caliber, i.e., perturbs the original distribution as little as possible. This thus gives an additional variational principle. Going back to the original caliber maximization over the distribution, we can add a second level of optimization with f(λ_i), the instantaneous measures for the rates. The additional maximization over the CV space thus gives the best set of CVs. Such a procedure is very similar to reaction coordinate optimization [128,[137][138][139][140][141].

CoPE-MaxCal as a variational principle?
In the previous section, we discussed how varying the collective variables until the path entropy is maximized yields the best set of CVs, i.e., the reaction coordinate. It seems, therefore, natural to identify the committor p_B as the best collective variable, or reaction coordinate, since it predicts for each configuration the probability to end in B. But why should the MaxCal procedure give the same reaction coordinate? In other words, why would the path entropy be optimal when the committor is chosen as a CV?
This question can be addressed by showing that when deviating from the optimal CV, i.e., the committor function, one always increases the KL divergence, i.e., lowers the path entropy. This would be the case even for very small perturbations, i.e., in the limit μ → 0. Such a finding would amount to establishing a variational principle.
To justify such a principle, we can take a closer look at the path entropy and its behavior for a given path distribution. We consider first an illustrative example in Fig. 6.
For a path ensemble in 2D space on a 3×3 grid, we consider two cases, a diagonal and a horizontal CV. From the symmetry of the problem, the left panel should represent the best CV, with the maximum path entropy S, or the lowest KL divergence, the negative of S, which is always positive (or zero). To show this, we reconsider the expressions for the negative entropy, and change from an integral expression to a discrete version for the sake of this illustration. The KL divergence is then written in terms of R_i, the discrete version of the reaching histogram R(λ|λ_0), and f_i, the discrete version of f(λ), together with the corresponding normalization. Substitution of these expressions into the above equation gives an expression that we can simplify by taking a factor f_0 out of the first fraction and a factor e^{f_0} out of the second, and canceling out several factors, after which the f_0 terms cancel.

Fig. 6 CoPE-MaxCal as a variational principle. We show a path ensemble in 2D space on a 3×3 grid emanating from the bottom left cell, which is part of state A. In both panels, the same prior path ensemble is considered. The coloring of the paths shows the prior weight of the trajectories: blue indicates a higher weight, while green and red indicate lower weights. The CV foliation is shown as shades of yellow cells (from dark to light: bins 0, 1 and 2). Also shown is an indication of the location of the barrier in the projection on the CV, i.e., the best estimate of the location of the barrier in each CV.

Up to now, we made no assumptions. Next, we consider the small perturbation limit μ ≪ 1, which means that |f| ≪ 1. The logarithm can then be expanded, and combining the two fractions gives a simpler expression. The second assumption we can make is that R_0 gives the largest contribution to the path ensemble, representing the paths that make only small excursions from the stable state. In fact, R_0 can be made exponentially larger than the next terms in the summation if the barrier is high enough. This means that we can approximate D_KL by keeping only the j = 0 term.
The next step is to realize that the bias f_0 → 0, which simplifies the expression further. Finally, since all biases are small, due to the fact that we choose μ ≪ 1, we can expand the exponential. The above reasoning shows that in the limit of small μ, the negative path entropy, i.e., the KL divergence, scales with the square of the bias f-function (and is thus always positive, as expected), and is linear in the value of the reaching histogram, the number of paths reaching a certain bin. This derivation shows that the KL divergence should be minimal for the optimal CV.
The f-function changes with the choice of the CV, but not strongly, as we assume |f| ≪ 1. This means that the most important factor is R_i. This then establishes a variational principle: either f goes to zero (i.e., no perturbation), which is a trivial solution, or the CV is chosen such that the path reaching histogram R_i becomes minimal. This path density is minimal for a CV that is aligned with the committor, for the following reason. Reactive paths have to go through a bottleneck where the path density R_i is low. When the CV is the committor, this R_i is the density of reaching paths that is measured. Any deviation from this optimal CV will include paths that are not part of the bottleneck, with a higher path density. This will then increase the KL divergence, or lower the path entropy. This is illustrated in Fig. 6, where only three bins are considered: R_0, R_1 and R_2 (indicated by the yellow shades in the cells). The diagonal CV has only 3 paths in R_1 (the green paths), but the horizontal CV also picks up the blue paths in the lower middle cells, which have a much higher weight (as they do not reach the barrier). This increase in R_i is much more influential than the marginal change in f_i due to the change in variable. In fact, for this example the function f only changes by 1% between the CVs.
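The discrete KL divergence of this illustration can be evaluated directly. In the sketch below the reaching-histogram values and the per-bin biases are assumed, chosen to mimic the diagonal versus horizontal CV of Fig. 6:

```python
import numpy as np

def path_kl(R, f):
    """Discrete KL divergence between posterior and prior path ensembles,
    with prior bin probabilities r_i ~ R_i and posterior q_i ~ R_i exp(f_i)."""
    r = np.asarray(R, float) / np.sum(R)
    u = r * np.exp(f)
    C = u.sum()              # discrete normalization, sum_j r_j exp(f_j)
    q = u / C
    return float(np.sum(q * (f - np.log(C))))

f = np.array([0.0, 0.3, 0.6])            # assumed small bias per bin
R_diag = np.array([100.0, 3.0, 1.0])     # diagonal CV: bottleneck bin nearly empty
R_horiz = np.array([100.0, 20.0, 1.0])   # horizontal CV picks up extra high-weight paths

kl_diag, kl_horiz = path_kl(R_diag, f), path_kl(R_horiz, f)
```

Both divergences are non-negative, and the CV with the smaller reaching histogram in the biased bins (the diagonal one) indeed perturbs the ensemble less. In the small-bias limit the divergence reduces to half the prior variance of f over the bins, which makes the quadratic dependence on f and the linear dependence on R_i explicit.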
While this is just an example, it is possible that this variational principle holds more generally. In fact, the concept of the lowest path density R_i for the best CV is reminiscent of variational transition state theory [142], which states that the best CV is the one that has the highest barrier, or lowest rate. In our treatment, the rate constant does not change, since the reactive pathways do not change. Still, the configurational density for both the A and B ensembles projected on the CV directly gives the free energy, as we have seen above. Thus, optimizing the CV using MaxCal is consistent with optimizing the free energy barrier in the sense of variational transition state theory.

Summary
In this review, we have described the MaxEnt and MaxCal approaches and their extensions to continuum path ensembles through the CoPE-MaxCal method. This strategy enables the definition of the least perturbed path ensemble consistent with given external information, i.e., the trajectory ensemble of maximum relative path entropy under given constraints. We have focused in particular on imposing kinetic rate constants on the continuous path ensembles obtained by molecular dynamics simulations of rare events. We have shown how this can be done by a reweighting function depending on the maximum value of a collective variable along the trajectory. To obtain this function, we made use of the MaxEnt approach to impose the equilibrium constant.
The CoPE-MaxCal optimized trajectory ensemble can yield new information on other observables. One example is the shift in the transition state, which we discussed in the previous section. Other observables might include residual dipolar couplings in NMR spectroscopy.
Besides reviewing the CoPE-MaxCal method, we have discussed several additional directions of investigation, including an extension to multiple states and the relationship of the MaxCal collective variable optimization with reaction coordinate optimization methods.
To provide an outlook, we present here some open questions and future directions.

Incorporation of time-resolved experimental measurements
A potential extension and application of the CoPE-MaxCal framework involves the addition of constraints on the evolution of time-dependent variables. In this way, one goes beyond adding a constraint on the final reactive flux into a certain state (the rate constant) for trajectories starting in a different initial state. This extension can be obtained for example by starting from the correlation function in Eq. 46. This would result in a reweighting of the existing trajectories based on constraints added at each time, as derived in Eq. 49. Such time-dependent experimental data could be provided for example by time-resolved small-angle X-ray scattering (SAXS) [33,41,42,143] and time-resolved solution cryo-electron microscopy [144].

Rational design of mutations and lead compounds in drug discovery
We envision that a combination of our CoPE-MaxCal framework (providing more accurate path ensembles) with perturbative interaction models can result in kinetics-mutation landscapes and/or kinetics-lead-compound-optimization landscapes. This would constitute a relatively inexpensive route to the non-random design of biomolecular mutations and lead compound derivatives that modify the barrier in the desired direction, as opposed to a random design of mutations that have to be tested experimentally, or in expensive additional simulations.

Biased molecular dynamics simulations
In its initial implementation, the CoPE-MaxCal method utilizes unbiased molecular dynamics simulations, possibly with the help of enhanced path sampling such as TPS, thereby preserving the correct kinetics up to the accuracy of the force field. Yet, advances in computational structural biology enable incorporating experimental data in the form of restraints or constraints [28][29][30][31][32][34][35][36][38][39][43] by biasing the Hamiltonian, while providing free energies that maximally meet the experimental restraints or constraints. These observations raise the question of whether the CoPE-MaxCal approach can be combined with such biasing methods, even if they cannot reproduce the natural kinetics.
Obtaining a free energy landscape using enhanced sampling, and then performing unbiased molecular dynamics, might be a viable way to reconstruct unbiased kinetics. For instance, trajectories from infrequent metadynamics or frequency adaptive metadynamics [145][146][147][148] can subsequently be reweighted to meet given kinetic constraints.
Finally, the Filizola and Keller groups recently explored the use of biased molecular dynamics simulations and the subsequent application of constraints on configurational ensemble averages, resulting in the alteration of a transition-probability matrix to reconstruct the unbiased kinetics [149,150].

Force field optimization for molecular kinetics
Molecular dynamics is in principle capable of providing quantitative predictions of the thermodynamics and kinetics of a system at the atomistic level. However, the accuracy of the calculations is limited by the quality of the force fields, which are typically parameterized using quantum mechanical calculations and experimental measurements of equilibrium properties [14]. There are several methods using MaxEnt, Bayesian inference and force-matching techniques that pursue the development of such force fields [28,[151][152][153][154][155]. However, there is still a need for further methodological advances in this area for the accurate reproduction of time-dependent properties, such as rate constants. These advances would have a large impact on the growing field of integrative structural biology, since, for example, they would potentially improve the prediction of populations of conformations that lie in the transition states [46]. Such a force field optimization for molecular kinetics is currently not possible within the CoPE-MaxCal approach, as the prior dynamical trajectories, and hence the force field, are not affected by the reweighting procedure. Yet, it might be made possible by introducing novel forward models for the rate as a function of the force field parameters. By computing the derivative of the rate constant with respect to these parameters, one could minimize an error function to meet target kinetics. This is an avenue that we are currently actively pursuing.
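As a toy illustration of this idea (not the authors' method: a single assumed "force-field" parameter, the barrier height E_b in an Arrhenius rate model k(E_b) = A exp(-E_b/k_BT), is tuned by gradient descent on a squared log-rate error; all numbers are assumptions):

```python
import math

A_pref, kT = 1.0e12, 2.5      # assumed prefactor (1/s) and k_B*T (kJ/mol)
k_target = 1.0e3              # assumed target (experimental) rate in 1/s
Eb = 40.0                     # initial barrier height (kJ/mol)

for _ in range(200):
    # Error in log-rate for the current barrier height.
    log_err = (math.log(A_pref) - Eb / kT) - math.log(k_target)
    # Analytic derivative of the squared log-error with respect to Eb.
    grad = 2.0 * log_err * (-1.0 / kT)
    Eb -= 5.0 * grad          # fixed learning rate

k_model = A_pref * math.exp(-Eb / kT)
```

In this toy setting the iteration converges to E_b = k_BT ln(A/k_target), i.e., the barrier that exactly reproduces the target rate; in a real force field the rate derivative would have to come from a forward model rather than a closed-form expression.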

Open questions in statistical mechanics
The extension of the MaxEnt principle to path ensembles that we have discussed in this review prompts a series of questions of general nature, some of which are listed in the following.
(1) We have presented the CoPE-MaxCal method as a reweighting scheme. This method affects only the initial conditions (see Sect. 4.6), but not the trajectories themselves. It would be interesting to investigate the analogy between this type of reweighting and the coupling to a heat bath, as is done in the derivation of the canonical ensemble from the microcanonical ensemble. One can also ask whether the MaxCal approach can be used to modify the trajectories themselves. This might be used in an on-the-fly optimization such as proposed in Ref. [156].
(2) In the CoPE-MaxCal application to rate constants we have used the Transition Path Sampling and Transition Interface Sampling frameworks. While these methods are efficient tools to obtain and manipulate path ensembles, one should be able to formulate the methodology independently.
(3) The investigation of the relationship between the CoPE-MaxCal approach and other path reweighting methodologies, such as the s-ensemble, could provide further insight for establishing more general methods.
(4) The CoPE-MaxCal framework may be extended to non-equilibrium steady states, e.g., for the driven dynamics observed in active systems. For that situation the MaxEnt approach cannot describe the correct steady-state density, and one has to use the MaxCal approach to construct the reweighting function.

Final thoughts
We anticipate that the application of the MaxCal approach to ensembles of continuum trajectories, as for example provided by molecular dynamics simulations, will offer new opportunities to address equilibrium and non-equilibrium problems that involve the trajectory space. We undoubtedly will see more research in this area in the coming years.

Author contributions
ZFB and PGB designed research; ZFB and PGB performed research; and ZFB, MV, and PGB wrote the paper. All authors have read and approved the final manuscript.

Data Availability Statement
This manuscript has no associated data or the data will not be deposited. [Authors' comment: This review has no associated data as all data discussed has been published elsewhere.]

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.