Foundations of Physics, Volume 44, Issue 9, pp 923–931

Newtonian Dynamics from the Principle of Maximum Caliber

  • Diego González
  • Sergio Davis
  • Gonzalo Gutiérrez

DOI: 10.1007/s10701-014-9819-8

Cite this article as:
González, D., Davis, S. & Gutiérrez, G. Found Phys (2014) 44: 923. doi:10.1007/s10701-014-9819-8


Abstract

The foundations of Statistical Mechanics can be recovered almost in their entirety from the principle of maximum entropy. In this work we show that its non-equilibrium generalization, the principle of maximum caliber (Jaynes, Phys Rev 106:620–630, 1957), when applied to the unknown trajectory followed by a particle, leads to Newton’s second law under two quite intuitive assumptions (both the expected square displacement in one step and the spatial probability distribution of the particle are known at all times). Our derivation explicitly highlights the role of mass as an emergent measure of the fluctuations in velocity (inertia) and the origin of potential energy as a manifestation of spatial correlations. According to our findings, the application of Newton’s equations is not limited to mechanical systems, and therefore could be used in modelling ecological, financial and biological systems, among others.


Keywords: Newtonian mechanics · Maximum caliber

1 Introduction

Newtonian mechanics is a cornerstone of our civilization. From its simplest formulation, \(F=ma\), to Lagrangian and Hamiltonian mechanics, an elegant mathematical apparatus has evolved around it. However, we still have to assume the principle of minimum action as a postulate (that is, unless we regard Newtonian mechanics as a limiting consequence of quantum mechanics).

There have been attempts to derive Newtonian dynamics from different principles. Particularly related to this work, it has been previously derived from information-geometric arguments leading to the idea of entropic dynamics [1]. This idea is based on the assumption of an irreducible uncertainty in the position of a particle, implying an information metric for space from which Newton’s second law naturally emerges. Caticha and Cafaro’s derivation is founded on Jaynes’ maximum entropy principle [2] (MaxEnt for short), a generic rule for the construction of probabilistic models.

In 1957, Edwin T. Jaynes [2] postulated that Statistical Mechanics has to be understood not as a physical theory on the same footing as, say, classical mechanics or electromagnetism, but as an application of statistical inference to a system with a macroscopically large number of degrees of freedom. The question was reversed from “given the microscopic evolution of the system, what is the probability distribution for the macroscopic quantities?” to “given a few known macroscopic properties, what are the possible microstates compatible with said knowledge?”. The answer, as initially proposed by Gibbs, was the probability distribution with maximum entropy \(S=-\sum _i P_i\ln P_i\) subject to constraints reflecting the known macroscopic properties. Jaynes, after the work of Shannon in information theory, realized that this procedure (maximization of \(S\) constrained only by the known information) is not limited to Statistical Mechanics but is a valid principle in any problem of statistical inference. This is the principle of Maximum Entropy. Owing to the uniqueness of Shannon’s entropy in characterizing uncertainty, it is the most unbiased procedure for the construction of statistical models. It has since been derived axiomatically [3, 4] from requirements of internal consistency, so its validity is independent of any meaning assigned to the quantity \(S\).
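To make the procedure concrete, here is a minimal numerical sketch (our own illustration, not from the paper): a four-state toy system with a single mean-value constraint, whose multiplier is found by bisection, since the constrained mean is a monotonically decreasing function of \(\lambda\). The state values and the target mean are arbitrary assumptions.

```python
import numpy as np

def maxent_distribution(values, target_mean, lo=-50.0, hi=50.0, iters=200):
    """MaxEnt distribution P_i ∝ exp(-lam * v_i) over `values` with fixed mean.

    The multiplier lam is found by bisection: the constrained mean
    decreases monotonically as lam increases.
    """
    v = np.asarray(values, dtype=float)

    def distribution(lam):
        logits = -lam * v
        logits -= logits.max()          # shift for numerical stability
        p = np.exp(logits)
        return p / p.sum()

    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if float(distribution(mid) @ v) > target_mean:
            lo = mid                    # mean too high -> need larger lam
        else:
            hi = mid
    lam = 0.5 * (lo + hi)
    return lam, distribution(lam)

# Toy system: four states with values 0..3, constrained to <v> = 1.2
lam, p = maxent_distribution([0.0, 1.0, 2.0, 3.0], target_mean=1.2)
```

Since the target mean (1.2) lies below the unconstrained mean (1.5), the resulting multiplier is positive and the probabilities decay geometrically with the state value, as the exponential form requires.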

A full understanding of the maximum entropy principle requires the adoption of the Bayesian viewpoint on probability: an “absolute” probability distribution \(P(x)\) does not exist, only conditional probability distributions \(P(x|H)\), where \(H\) represents a given state of knowledge. For a given \(H\), MaxEnt gives us the most unbiased model we can propose with our (possibly limited) information; therefore, we cannot guarantee beforehand that it is the correct model. As we acquire more information, \(P(x|H)\) becomes increasingly independent of \(H\), and for all practical purposes the distribution becomes objective (independent of the observer).

The principle of Maximum Caliber [5, 6] generalizes this idea to dynamical systems, including time explicitly. It has been applied recently to discrete dynamics [7] and earlier to derive the Fokker–Planck equations [8] and the Markov process formalism [9].

In this work we show that if we use the maximum caliber principle to find the unknown trajectory of a particle, there are two general conditions that automatically lead to Newton’s second law, namely (a) that the expected square displacement per step is known at all times, and (b) that the time-independent probability of finding the particle at any coordinate is also known. Knowledge of both (a) and (b) leads to Newton’s second law in expectation over trajectories, and, perhaps more interestingly, any dynamical system that does not follow Newton’s second law must violate at least one of these assumptions.

2 The Maximum Entropy and Maximum Caliber Formalism

Consider a system with \(N\) degrees of freedom, whose states are denoted by vectors \(\vec {x}=(x_1,\ldots , x_N)\). Suppose the expectation values of \(M\) functions \(f_i(\vec {x})\) are known. This represents a Bayesian state of knowledge, which we will denote by \(H\). Maximization of the Shannon entropy
$$\begin{aligned} S = -\int d\vec x P(\vec x|H)\ln P(\vec x|H) \end{aligned}$$
under the constraints \(\big <f_i(\vec x)\big >_H = F_i\) leads to the MaxEnt model
$$\begin{aligned} P(\vec {x}|H) = \frac{1}{Z(\vec {\lambda })}\exp \left( -\sum _{k=1}^M \lambda _k f_k(\vec {x})\right) , \end{aligned}$$
where the value of the Lagrange multipliers \(\lambda _k\) needed to impose the \(M\) constraints can be determined from
$$\begin{aligned} -\frac{\partial }{\partial \lambda _k} \ln Z(\vec {\lambda }) = F_k. \end{aligned}$$
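Equation 3 can be checked numerically in a hypothetical discrete setting (our own illustration, not the paper’s): five states with constraint functions \(f_1(x)=x\) and \(f_2(x)=x^2\) and arbitrarily chosen multipliers. Central finite differences of \(-\ln Z\) should reproduce the model moments \(F_k\).

```python
import numpy as np

# Hypothetical discrete model: five states x = 0..4, constraint functions
# f1(x) = x and f2(x) = x**2, multipliers chosen arbitrarily.
x = np.arange(5, dtype=float)
f = np.stack([x, x**2])                 # shape (M, number of states)
lam = np.array([0.3, 0.1])

def logZ(lam):
    """ln Z(lambda) for the discrete MaxEnt model P(x) ∝ exp(-lam . f(x))."""
    return float(np.log(np.exp(-lam @ f).sum()))

def moments(lam):
    """Model expectations <f_k> under P(x) ∝ exp(-lam . f(x))."""
    p = np.exp(-lam @ f)
    p /= p.sum()
    return f @ p

# Central finite differences of -ln Z recover the moments F_k
eps = 1e-6
diffs = []
for k in range(2):
    d = np.zeros(2)
    d[k] = eps
    diffs.append(-(logZ(lam + d) - logZ(lam - d)) / (2 * eps))
```

In practice one must invert these relations, solving for \(\vec \lambda \) given the data \(F_k\), which is the nonlinear problem the text describes as impractical.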
Equation 3, being nonlinear, is usually impractical to solve; moreover, it requires the partition function explicitly. However, it has been recently shown [10] that for the Lagrange multipliers the equality
$$\begin{aligned} \Big <\nabla \cdot \vec {v}\Big > = \sum _{k=1}^M \lambda _k \Big <\vec {v}\cdot \nabla f_k\Big > \end{aligned}$$
holds, with \(\vec {v}\) an arbitrary differentiable vector field, and this provides a linear system of equations for \(\vec \lambda \). The most probable microstate \(\vec x_0\) is such that
$$\begin{aligned} \nabla P(\vec x|H)|_{\vec x=\vec x_0}=0, \end{aligned}$$
therefore the function
$$\begin{aligned} G(\vec x)=\sum _{k=1}^M \lambda _k f_k(\vec {x}) \end{aligned}$$
which is a combination of all the constraints imposed, is also an extremum at \(\vec x_0\).
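As a quick numerical illustration of the identity in Eq. 4 (our own sketch, with arbitrary choices throughout): take the one-dimensional MaxEnt model \(P(x)\propto e^{-\lambda x^2}\), i.e. a single constraint \(f(x)=x^2\), and evaluate both sides by grid quadrature for two test fields \(v\).

```python
import numpy as np

# 1-D check of <dv/dx> = lam * <v * f'(x)> for P(x) ∝ exp(-lam * x**2).
# The multiplier value and the test fields v are arbitrary choices.
lam = 0.7
xs = np.linspace(-12.0, 12.0, 200001)
dx = xs[1] - xs[0]
p = np.exp(-lam * xs**2)
p /= p.sum() * dx                       # normalize on the grid

def expect(g):
    """Riemann-sum expectation of g(x) under p(x)."""
    return float((p * g).sum() * dx)

checks = []
for v, dv in [(xs, np.ones_like(xs)), (xs**3, 3 * xs**2)]:
    lhs = expect(dv)                    # <dv/dx>
    rhs = lam * expect(v * 2 * xs)      # lam * <v * df/dx>,  f = x^2
    checks.append((lhs, rhs))
```

For \(v=x\) the identity reduces to \(2\lambda \langle x^2\rangle = 1\), the familiar Gaussian second moment; the point is that any differentiable \(v\) yields a linear relation among the multipliers.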
In the case of dynamical systems, we now ask for the possible microscopic trajectories compatible with some known information \(H\) (expressed as expectation values). The result is the probability distribution of trajectories \(P[x(t)|H]\) which maximizes the Shannon entropy
$$\begin{aligned} \mathcal {S} = -\int \mathcal {D}x(t) P[x(t)|H] \ln P[x(t)|H] \end{aligned}$$
given \(H\). Note that \(\mathcal {S}\) is now defined as a functional integral, as indicated by the notation \(\mathcal {D}x(t)\). This conveys the idea of integration over a space of trajectories \(x(t)\), as commonly denoted in Feynman’s path-integral formulation of quantum mechanics [11].
The probability distribution functional for the different possible trajectories is
$$\begin{aligned} P[x(t)|H] = \frac{1}{Z[\lambda (t)]}e^{-\int dt \lambda (t) f[x(t); t]}, \end{aligned}$$
where, similarly to Eq. 3, the Lagrange multiplier function can be obtained from
$$\begin{aligned} -\frac{\delta }{\delta \lambda (t)} \ln Z[\lambda (t)] = \big <f[x(t); t]\big >. \end{aligned}$$
Here the most probable trajectory extremizes a particular functional (analogous to an “action” in classical mechanics)
$$\begin{aligned} A[x(t)] = \int dt \lambda (t) f[x(t); t] \end{aligned}$$
depending on the constraining function. This leads to the question: without introducing the Lagrangian of classical mechanics explicitly, could it emerge naturally from simpler constraints in a Maximum Caliber problem?

3 Derivation of Newton’s Second Law

Consider a single particle following an unknown trajectory \(x(t)\) in one spatial dimension. This can be easily generalized to many particles in arbitrary dimensions, at the cost of overcomplicated notation.

If we discretize time, so that the continuous variable \(t\) is sampled at intervals \(\Delta t\) and becomes \(t_k=k\Delta t\) (with \(k=0,\ldots ,n-1\)), the positions \(x(t)\) are replaced with \(x_k=x(t_k)\). The trajectory \(x(t)\) itself becomes a vector \(\vec {x}=(x_0, \ldots , x_{n-1})\), and the Lagrange multiplier function \(\lambda (t)\) also becomes a vector \(\vec \lambda =(\lambda _0, \ldots , \lambda _{n-1})\). This is the method of finite differences [12]. Using this discretized version we recover Eqs. 2 and 3, and therefore we can also employ the identity given by Eq. 4 for this kind of discretized Maximum Caliber problem.

We now impose the following constraints (expectations are to be interpreted over all possible trajectories)
$$\begin{aligned} \Big <(x_i - x_{i-1})^2\Big > = (\Delta t)^2{d_i}^2 \end{aligned}$$
$$\begin{aligned} \Big <\delta (x_i-X)\Big > = P(x_i=X|H), \end{aligned}$$
for all values of \(i\) and \(X\). The first constraint recognizes the fact that the expected square displacement in one (possibly infinitesimal) step is known at all times, and is equal to an arbitrary function \({d_i}^2\) times the squared time step \((\Delta t)^2\). We express it in this form so that \(d_i\) can remain finite when taking the limit \(\Delta t \rightarrow 0\). The second constraint imposes that the static, time-independent probability distribution for the coordinate \(x\) is also known.

Here we must remark that, as we are considering the Bayesian view of probability, the last constraint can always be fulfilled: \(P(x_i=X|H)\) is a model comprising our knowledge of \(x_i\) under the state \(H\), not a physical property of the phenomenon. Different states of information \(H\) may assign different probability distributions \(P(x_i=X|H)\), and the “quality” of this information will be reflected in the potential energy function emerging from this formalism (as will be seen later).

The probability distribution function for \(\vec {x}\) is
$$\begin{aligned} P(\vec x|H) = \frac{1}{Z(\vec \lambda )}\exp \Big (-\sum _{i=0}^{n-1}\frac{\lambda _i}{(\Delta t)^2} (x_i-x_{i-1})^2-\sum _{i=0}^{n-1}\int dX \mu (X)\delta (x_i-X)\Big ), \end{aligned}$$
where \(\lambda _i\) and \(\mu (X)\) are Lagrange multipliers associated to the constraints in Eqs. 9 and 10, respectively.
After integrating the Dirac delta function, Eq. 11 becomes
$$\begin{aligned} P(\vec x|H) = \frac{1}{Z(\vec \lambda )}\exp \Big (-\sum _{i=0}^{n-1}\frac{\lambda _i}{(\Delta t)^2} (x_i-x_{i-1})^2-\sum _{i=0}^{n-1}\mu (x_i)\Big ). \end{aligned}$$
This is the probability of the particle taking a well-defined discretized trajectory \(\vec x\), and is precisely the solution of a Maximum Entropy problem with \(n\) degrees of freedom and \(n\) Lagrange multipliers \(\lambda _i\) (plus the function \(\mu \)).
The most probable trajectory for the particle follows a minimum action principle. Indeed, if we define \(m_k = 2\lambda _k\) and \(\Phi (x) = -\mu (x)\), we recover, from Eq. 12,
$$\begin{aligned} P(\vec {x}|H) = \frac{1}{Z}\exp \Big (-A(\vec x)\Big ) \end{aligned}$$
where a discretized version of the classical action
$$\begin{aligned} A(\vec x) = \sum _{i=0}^{n-1}\Big [\frac{1}{2}m_i{\dot{x}_i}^2 - \Phi (x_i)\Big ] \end{aligned}$$
appears in the exponential. Clearly, Eqs. 13 and 14 in the continuum limit become
$$\begin{aligned} P[x(t)|H] = \frac{1}{Z}\exp \Big (-A[x(t)]\Big ) \\ \nonumber A[x(t)] = \int dt \mathcal {L}(x(t), \dot{x}(t); t), \end{aligned}$$
and it follows that the most probable continuous trajectory \(x(t)\) is the one that extremizes the classical action \(A[x(t)]\) with Lagrangian
$$\begin{aligned} \mathcal {L}(x, \dot{x}; t) = \frac{1}{2}m(t)\dot{x}^2-\Phi (x) \end{aligned}$$
and associated Hamiltonian
$$\begin{aligned} \mathcal {H}(x, p; t)=\frac{p^2}{2m(t)}+\Phi (x). \end{aligned}$$
This is the formalism of Classical Mechanics for a particle with time-dependent mass \(m(t)\) subjected to a potential energy \(\Phi (x)\). Both \(m\) and \(\Phi \) have emerged from the Lagrange multipliers associated with the constraints on the expected square of the step and the probability distribution of the coordinate, respectively. Thus we can say the following: whenever the information about the expected square of the step is important, the particle acquires mass, and whenever the information about which regions are more probable in space becomes important, the particle is subjected to a potential energy.
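This correspondence can be sketched numerically (an illustration under assumed parameters, with the harmonic choice \(\Phi (x)=\frac{1}{2}kx^2\)): a trajectory generated by the discrete Newton recursion makes the discretized action of Eq. 14 stationary, as a finite-difference gradient check confirms.

```python
import numpy as np

# Assumed illustrative parameters: unit mass, harmonic Phi(x) = 0.5*k*x**2.
m, k, dt, n = 1.0, 1.0, 0.01, 200

def action(x):
    """Discretized action  A = sum_i [ (1/2) m xdot_i^2 - Phi(x_i) ]."""
    xdot = (x[1:] - x[:-1]) / dt
    return float(np.sum(0.5 * m * xdot**2 - 0.5 * k * x[1:]**2))

# Discrete Newton (Verlet): x_{i+1} = 2 x_i - x_{i-1} - dt^2 * Phi'(x_i)/m
x = np.empty(n)
x[0] = 1.0
x[1] = x[0] - 0.5 * dt**2 * k * x[0] / m       # start at rest
for i in range(1, n - 1):
    x[i + 1] = 2 * x[i] - x[i - 1] - dt**2 * k * x[i] / m

# Central finite-difference gradient of A at a few interior points: all ~ 0
eps = 1e-5
grads = []
for j in (50, 100, 150):
    xp, xm = x.copy(), x.copy()
    xp[j] += eps
    xm[j] -= eps
    grads.append((action(xp) - action(xm)) / (2 * eps))
```

The stationary trajectory is an extremum, not necessarily a minimum, of \(A\); the gradient check is indifferent to this distinction.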
As Eq. 12 is a MaxEnt solution, Eq. 4 holds as
$$\begin{aligned} \Big <\nabla \cdot \vec {v}(\vec {x})\Big > = \sum _{i=0}^{n-1}\frac{\lambda _i}{(\Delta t)^2}\Big <\vec {v}(\vec {x})\cdot \nabla (x_i-x_{i-1})^2\Big > + \sum _{i=0}^{n-1}\Big <\vec {v}(\vec {x})\cdot \nabla \mu (x_i)\Big >, \end{aligned}$$
with \(\vec {v}\) an arbitrary differentiable vector field, of our choosing. If we choose \(\vec {v}\) such that it has a single component \(k\), i.e. \(v_i=\delta _{i,k}\omega (\vec {x})\) with \(\omega \) an arbitrary scalar field, we obtain
$$\begin{aligned} \begin{aligned} \Big <\frac{\partial \omega }{\partial x_k}\Big > = \sum _{i=0}^{n-1}\frac{\lambda _i}{(\Delta t)^2}\Big <\omega (\vec {x})\cdot 2(x_i-x_{i-1})(\delta _{i,k}-\delta _{i-1,k})\Big > + \sum _{i=0}^{n-1}\Big <\omega (\vec {x})\cdot \mu '(x_i)\delta _{i,k}\Big > \\ =\frac{1}{(\Delta t)^2}\Big <2\omega (\vec {x})\Big [\lambda _k(x_k-x_{k-1})-\lambda _{k+1}(x_{k+1}-x_k)\Big ]\Big > + \Big <\omega (\vec {x})\mu '(x_k)\Big >. \end{aligned} \end{aligned}$$
But recalling that the discrete forward derivative is
$$\begin{aligned} \dot{a}_i \approx \frac{a_{i+1}-a_i}{\Delta t}, \end{aligned}$$
we can re-write Eq. 19 as
$$\begin{aligned} \Big <\frac{\partial \omega }{\partial x_k}\Big > = -\Big <\omega \Big (\dot{p}_k + \Phi '(x_k)\Big )\Big >, \end{aligned}$$
where \(p_k=m_k\dot{x}_k\).
Considering \(\omega =1\) we finally obtain
$$\begin{aligned} \Big <\dot{p}_k\Big > = -\Big <\Phi '(x_k)\Big >, \end{aligned}$$
which, in the continuous limit (\(\Delta t \rightarrow 0\)), becomes
$$\begin{aligned} \Big <\dot{p}(t)\Big > = -\Big <\Phi '(x(t))\Big >, \end{aligned}$$
the expectation value of Newton’s second law with momentum \(p=m\dot{x}\) and potential energy \(\Phi (x)\).
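For a Gaussian trajectory ensemble the expectations can be computed in closed form, so the integration-by-parts identity behind Eqs. 20–22 can be checked exactly. The sketch below assumes (our choice, not the paper’s) constant \(m\) and the quadratic multiplier \(\mu (x)=\frac{1}{2}\kappa x^2\), so \(\Phi '(x)=-\kappa x\); with \(\omega =x_k\) the identity reads \(1=-\langle x_k(\dot p_k+\Phi '(x_k))\rangle \), evaluated here from the covariance matrix of the distribution in Eq. 12.

```python
import numpy as np

# Assumed setup: constant mass m, time step dt, mu(x) = 0.5*kappa*x**2,
# so Phi(x) = -mu(x) and Phi'(x) = -kappa*x.
m, dt, kappa, n = 1.0, 0.1, 2.0, 12

# The exponent of Eq. 12 is (1/2) x^T A x, with A built from the
# open-path graph Laplacian of the discretized trajectory.
L = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
L[0, 0] = L[-1, -1] = 1.0                      # open-chain boundary terms
A = (m / dt**2) * L + kappa * np.eye(n)
C = np.linalg.inv(A)                           # covariances <x_i x_j>

kk = n // 2                                    # an interior time index
# <x_k * pdot_k>, with pdot_k = m (x_{k+1} - 2 x_k + x_{k-1}) / dt^2
x_pdot = m * (C[kk, kk + 1] - 2.0 * C[kk, kk] + C[kk, kk - 1]) / dt**2
# Identity with omega = x_k:  1 = -<x_k (pdot_k + Phi'(x_k))>
check = -(x_pdot - kappa * C[kk, kk])
```

With \(\omega =1\) both sides of Eq. 22 vanish by symmetry in this ensemble, so the \(\omega =x_k\) instance is the non-trivial check.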

In Appendix A we explore the validity of some aspects of the canonical formalism, namely the Poisson bracket, for expectations over trajectories.

4 Concluding Remarks

We have found that two simple constraints are sufficient to recover Newton’s second law in expectation for the probable trajectories of a particle. The first constraint, on the step size as a function of time, leads to the existence of an inertial mass \(m(t)\) proportional to the Lagrange multiplier \(\lambda (t)\). To understand the meaning of this, remember that for any variational problem solved using Lagrange multipliers, the larger the value of the multiplier, the more restrictive (and therefore more relevant) the constraint. An irrelevant constraint always has a vanishing multiplier. As Jaynes [13] (p. 945) clearly states, “The Lagrange multipliers \(\lambda _k\) in the MAXENT formalism have therefore a deep meaning: \(\lambda _k\) is the ’potential’ of the datum \(R'_k\), that measures how important a constraint it represents.”

Now we motivate the following principle: constraints related to conserved quantities are always more relevant. For instance, this explains why the canonical ensemble of equilibrium statistical mechanics is correctly derived from a single constraint, the expectation of the Hamiltonian, which is an integral of motion. Another illustration is the following: suppose we are trying to recover the trajectory of a particle from information about its distance to a particular point. If this distance is constant, this is enough to confine the trajectory to a unique curve, a circle. If we only know that the distance varies between \(r_1\) and \(r_2\), the number of compatible trajectories increases with \(\Delta r=r_2-r_1\), and the strength of the constraint correspondingly decreases with increasing \(\Delta r\).

Given the earlier discussion, the closer \(d_i^2\) is to being a conserved quantity, the more relevant the first constraint is. In this case \(\lambda (t)\) is large and therefore \(m(t)\) is also large. Conversely, if the value of \(m\) is small, then \(\lambda (t)\) is small and \(d_i^2\) has larger fluctuations. In the continuous limit it is the instantaneous speed that fluctuates (there is a non-zero acceleration). This embodies the idea of inertia, and is reminiscent of the ideas of Smolin [14] and Nelson [15] about inertia being inversely proportional to the size of quantum fluctuations.


Acknowledgments

GG and SD thank Jorge Zanelli for useful conversations at the beginning of this work. DG gratefully acknowledges the access to resources provided by Grupo de Nano Materiales (Departamento de Física, Facultad de Ciencias, Universidad de Chile). SD acknowledges funding from FONDECYT 1140514.

Copyright information

© Springer Science+Business Media New York 2014

Authors and Affiliations

  • Diego González (1)
  • Sergio Davis (1)
  • Gonzalo Gutiérrez (1)

  1. Grupo de Nanomateriales, Departamento de Física, Facultad de Ciencias, Universidad de Chile, Santiago, Chile
