Abstract
We show that the formulations of non-relativistic quantum mechanics can be derived from an extended least action principle. The principle can be considered as an extension of the least action principle from classical mechanics by factoring in two assumptions. First, the Planck constant defines the minimal amount of action a physical system needs to exhibit during its dynamics in order to be observable. Second, there is constant vacuum fluctuation along a classical trajectory. A novel method is introduced to define the information metrics that measure the additional observability due to vacuum fluctuations, which is then converted to an additional action through the first assumption. Applying the variational principle to minimize the total action allows us to recover the basic quantum formulations, including the uncertainty relation and the Schrödinger equation in the position representation. In the momentum representation, the same method can be applied to obtain the Schrödinger equation for a free particle, while further investigation is still needed for a particle in an external potential. Furthermore, the principle brings in new results on two fronts. At the conceptual level, we find that the information metrics for vacuum fluctuations are responsible for the origin of the Bohm quantum potential. Even though the Bohm potential for a bipartite system is inseparable, the underlying vacuum fluctuations are local. Thus, inseparability of the Bohm potential does not justify a non-local causal relation between the two subsystems. At the mathematical level, quantifying the information metrics for vacuum fluctuations using more general definitions of relative entropy results in a generalized Schrödinger equation that depends on the order of relative entropy. The extended least action principle is a new mathematical tool. It can be applied to derive other quantum formalisms such as quantum scalar field theory.
1 Introduction
Although quantum mechanics has been extensively verified experimentally, it still struggles to answer many fundamental questions. For instance, is the probability amplitude, or wavefunction, just a mathematical tool, or is it associated with an ontic physical property? What is the meaning of wavefunction collapse during measurement? Does quantum entanglement imply a non-local causal connection among entangled systems? The last question has been the source of contention in understanding the EPR thought experiment [2] and the Bell inequality [3]. These questions motivate the next level of reformulation of quantum mechanics. With the advancements of quantum information and quantum computing [4, 5] in recent decades, physicists are searching for new foundational principles from the information perspective [6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37]. Reformulating quantum mechanics based on information principles appears promising, and we will briefly review here some of the interesting results relevant to this paper.
Zeilinger [7, 8] suggests that a foundational principle for quantum mechanics is that an elementary system carries 1 bit of information. Such a principle brings novel insight into entanglement for, say, a bipartite system: if the 2 bits of information are exhausted in specifying joint properties of the two subsystems, then nothing can be specified for the individual subsystems. However, the question of whether entanglement is due to a non-local causal effect remains unanswered. Another result that has gained considerable popularity is the interpretation of the role of the wavefunction in quantum mechanics. In the information based interpretations of quantum mechanics, such as Relational Quantum Mechanics [6] and QBism [11], the wavefunction in the Schrödinger equation is just a mathematical tool to hold the state of knowledge about the quantum system. There is no ontological reality associated with the wavefunction itself. This view can resolve certain paradoxes such as the EPR experiment [38].
At the mathematical formulation level, a number of theories have been proposed to derive the Schrödinger equation from information based principles. There are two categories of such reformulations. The first category of reformulation is based on pure information-theoretic principles. A recent example is provided by Höhn [30, 31], where a concrete quantum theory for a single qubit and for N qubits is successfully constructed from elementary rules on an observer’s information acquisition. The limitation of such a reconstruction is that the connection to classical mechanics is not clearly shown. It only shows that a unitary time evolution operator governs the Schrödinger equation. The concrete form of the Hamiltonian in the Schrödinger equation cannot be derived. The second category starts from classical mechanics and then adds information based variables to the reformulation. Reginatto first shows that by adding a term related to Fisher information in the least action principle, the Schrödinger equation can be obtained [37]. Later the Fisher information term is derived based on a postulate of an exact uncertainty relation [43]. Various approaches based on entropy extremization have also been proposed to derive quantum mechanics. The entropic dynamics [34, 35] attempts to extract quantum mechanics as an application of the methods of inference from maximizing Shannon entropy. Another variational approach based on relative entropy is constructed to recover stochastic mechanics, which in turn can lead to the Schrödinger equation [50]. The limitation of the entropy extremization approaches in [34, 35] and [50] is their dependency on stochastic mechanics as the underlying physical model [47], which suffers from concerns about hidden variables such as the osmotic velocity, and from its difficulty in explaining the non-local behavior of multi-particle systems [48].
We are more interested in the second category of reformulation because of its advantage of providing a clear connection between classical mechanics and quantum mechanics. This allows one to understand where quantumness originates from an information perspective. The purpose of the present work is to continue such reformulations but at a more fundamental level in order to avoid the limitations described above. At the center of our investigation effort is the extended least action principle. We assume a quantum system experiences vacuum fluctuations constantly. If we want to apply the least action principle, the challenge is how to calculate the additional action due to the vacuum fluctuations besides the action for a classical trajectory. To solve the problem, we further assume that a quantum system must manifest a minimal amount of action effort that is determined by the Planck constant in order to be observable. The challenge is then converted into finding the proper information metrics to measure the observable information due to vacuum fluctuation. As the main contribution of this paper, a novel method is introduced to calculate this information metric, which enables the extension of the least action principle for a quantum system. The detailed physical motivations of the extended least action principle and its underlying assumptions are described in Sect. 2.
From the extended least action principle, a series of results are obtained. First, by recursively applying the extended least action principle in an infinitesimal time interval and an accumulated time interval, the uncertainty relation and the Schrödinger equation are recovered. Although similar results have been obtained in other research works [34, 35, 37, 43, 44], what is novel here is the simplicity and cleanness. There are no arbitrary constants or Lagrange multipliers introduced, and no additional postulates needed. The same method can be applied in the momentum representation to obtain the Schrödinger equation in the momentum representation for a free particle. Imposing a no preferred representation assumption results in the transformation theory between position and momentum representations. Second, we will show that variation of the information metrics for vacuum fluctuations gives the Bohm quantum potential. The vacuum fluctuations are assumed to be local, so that for a bipartite system the vacuum fluctuations for the two subsystems are independent of each other. However, the corresponding information metrics, and consequently the Bohm quantum potential, for the two subsystems are inseparable in general. This suggests that the inseparability of the Bohm quantum potential does not necessarily justify a non-local underlying mechanism. Third, we will demonstrate that the extended least action principle can be a mathematical tool to produce new results that were not reported in other research literature. By quantifying the information metrics for vacuum fluctuations using more general definitions of relative entropy, such as the Rényi or Tsallis divergence, we obtain a generalized Schrödinger equation. The applicability of the generalized Schrödinger equation needs further investigation, but the equation is legitimate from the information-theoretic perspective.
Extending the least action principle in classical mechanics to quantum mechanics not only shows clearly how classical mechanics becomes quantum mechanics, but also opens up a new mathematical toolbox. Indeed, the quantum scalar field theory can be obtained as well from the principle [55].
The rest of the article is organized as follows. First we describe in detail how the extended least action principle is constructed and what the underlying assumptions are. Then we show how the basic quantum theory is recovered. This is followed by the derivation of a generalized Schrödinger equation not reported in earlier research literature. Next, we analyze the locality of vacuum fluctuations and its implications for the Bohm quantum potential. We then conclude the article after comprehensive discussions and comparisons to previous relevant research works.
2 The Extended Least Action Principle
The first assumption to make here is that there are vacuum fluctuations that a quantum system will be constantly experiencing. It is not our intention here to investigate the origin, or establish a physical model, of such vacuum fluctuations. Instead, we make a minimal number of assumptions on the underlying physical model, only enough so that we can apply the variation principle. The advantage of this approach is to avoid keeping track of physical details that are irrelevant for predicting future measurement results. It also avoids the potential need of introducing hidden variables such as the osmotic velocity in stochastic mechanics. The vacuum fluctuation is assumed to be local. This means that for a composite system, the fluctuations of the subsystems are independent of each other. The vacuum fluctuation is also assumed to be completely random, such that the mean of the fluctuations is zero but the variance is non-zero. We state the assumption as follows:
Assumption 1 – A quantum system experiences vacuum fluctuations constantly. The fluctuations are completely random, and local.
Now consider a particle with mass m moving from position A to B. The motion of the particle is a combination of two independent components: the classical trajectory due to the external potential, and the random vacuum fluctuations around any given position along the classical path. Due to the vacuum fluctuations, there is no definite trajectory. How can we construct a principle, based on information related metrics, from which the laws of dynamics for this physical scenario can be derived?
In classical mechanics, the path trajectory follows the laws of dynamics derived through the least action principle. Thus, it is natural to consider recasting the least action principle to be based on information related metrics such that it can be extended to derive quantum mechanics. The action for the classical trajectory can be calculated as usual. The challenge here is to calculate the additional action due to vacuum fluctuations, since the physical details of the vacuum fluctuations are unknown. We wish to find another way to calculate this additional action. The second assumption introduced next will help this attempt. We assume that the physical object must exhibit a minimal amount of action during its dynamical motion in order to be observable or distinguishable (relative to a reference frame), and this amount of action effort is determined by the Planck constant \(\hbar\). As such, the Planck constant is a discrete unit of action for measuring the observable information. Using this understanding of the Planck constant in the reverse direction provides us with a new way to calculate the additional action due to vacuum fluctuations. That is, even though we do not know the physical details of vacuum fluctuations, the vacuum fluctuations manifest themselves via a discrete action unit determined by the Planck constant as an observable information unit. If we are able to define an information metric that quantifies the amount of observable information manifested by vacuum fluctuations, we can then multiply the metric by the Planck constant to obtain the action associated with vacuum fluctuations.
The existence of the constant \(\hbar\) and its interpretation cannot be deduced from classical mechanics, but has to be a fundamental assumption itself, or be derived from another fundamental postulate. The existence of the Planck constant implies a fundamental physical limitation that is not recognized in classical mechanics. Indeed, Rovelli has pointed out [6] that his postulate on limited information for a quantum system implies the existence of Planck constant. This implies that the Planck constant plays a role to connect physical variables to certain information metrics. But it is unclear how \(\hbar\) is used to measure the amount of information in the subsequent reconstruction effort of quantum theory in [6]. In this paper, instead of introducing a postulate of limited information for a quantum system, we assume there is a non-zero discrete action unit to measure the degree of observability exhibited from a path trajectory with action effort S, and this unit is called the Planck constant \(\hbar\). Conceptually, our assumption is more intuitive. What we assume is that there is a lower limit to the amount of action effort that a system needs to exhibit in order to be observable or distinguishable, and such a unit of action effort is defined by the Planck constant. Formally, the assumption can be stated as,
Assumption 2 – There is a lower limit to the amount of action that a physical system needs to exhibit in order to be observable. This basic discrete unit of action effort is given by \(\hbar /2\) where \(\hbar\) is the Planck constant.
The word exhibit implies that the observability is uncovered by the movement of the physical system itself, instead of an actual measurement.
With Assumption 2, the challenge of calculating the additional action due to vacuum fluctuations is converted into defining a proper new information metric \(I_f\), which measures the additional distinguishable, hence observable, information exhibited due to vacuum fluctuations. Even though we do not know the physical details of vacuum fluctuations (except that, as Assumption 1 states, these vacuum fluctuations are completely random and local), the problem becomes less challenging since there are information-theoretic tools available. The first step is to assign a transition probability distribution due to vacuum fluctuation for an infinitesimal time step at each position along the classical trajectory. The distinguishability can then be defined as the information distance between the transition probability distribution and a uniform probability distribution. The uniform probability distribution is chosen here as the reference to reflect the complete randomness of vacuum fluctuations. In information theory, the common information metric to measure the information distance between two probability distributions is relative entropy. Relative entropy is more fundamental than Shannon entropy, since the latter is just a special case of relative entropy when the reference probability distribution is a uniform distribution. But there is a more important reason to use relative entropy. As shown in a later section, when we consider the dynamics of the system for an accumulated time period, we assume the initial position is unknown but is given by a probability distribution. This probability distribution can be defined along the position of the classical trajectory without vacuum fluctuations, or with vacuum fluctuations. The information distance between the two probability distributions gives the additional distinguishability due to vacuum fluctuations. It is again measured by a relative entropy.
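The statement that Shannon entropy is a special case of relative entropy with a uniform reference can be checked numerically. The following is a minimal sketch, assuming a discrete distribution over \(n\) outcomes (the four-outcome distribution `p` and all function names are hypothetical, chosen only for illustration). It verifies the identity \(D_{KL}(p\,\|\,u)=\ln n-H(p)\) for a uniform reference \(u\).

```python
import math

def shannon_entropy(p):
    """Shannon entropy H(p) in nats; outcomes with p_i = 0 contribute zero."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0.0)

def kl_divergence(p, q):
    """Kullback-Leibler divergence D_KL(p || q) in nats."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)

# Hypothetical 4-outcome distribution, chosen only for illustration.
p = [0.5, 0.25, 0.125, 0.125]
n = len(p)
uniform = [1.0 / n] * n

d_kl = kl_divergence(p, uniform)
h = shannon_entropy(p)

# With a uniform reference, D_KL(p || u) = ln(n) - H(p).
identity_gap = abs(d_kl - (math.log(n) - h))
print(d_kl, h, identity_gap)
```

Up to the constant \(\ln n\) and a sign, the divergence from a uniform reference carries the same information as the Shannon entropy, which is the sense in which the latter is a special case.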
Thus, relative entropy is a powerful tool allowing us to extract meaningful information about the dynamic effects of vacuum fluctuations. The concrete form of \(I_f\) will be defined later as a functional of the Kullback–Leibler divergence \(D_{KL}\), \(I_f:=f(D_{KL})\), where \(D_{KL}\) measures the information distances of different probability distributions caused by vacuum fluctuations. Thus, the total action due to both the classical trajectory and vacuum fluctuation is
\[ S_t = S_c + \frac{\hbar }{2} I_f, \tag{1} \]
where \(S_c\) is the classical action. Quantum theory can be derived through a variation approach to minimize such a functional quantity, \(\delta S_t=0\). When \(\hbar \to 0\), \(S_t=S_c\). Minimizing \(S_t\) is then equivalent to minimizing \(S_c\), resulting in the dynamics laws of classical mechanics. However, in quantum mechanics, \(\hbar \ne 0\), the contribution from \(I_f\) must be included when minimizing the total action. We can see \(I_f\) is where the quantum behaviors of a system come from. These ideas can be condensed as the
Extended Principle of Least Action – The law of physical dynamics for a quantum system tends to exhibit as little as possible the action functional defined in (1).
Alternatively, we can interpret the extended least action principle more from an information perspective by rewriting (1) as
\[ I_t = \frac{2}{\hbar } S_c + I_f, \tag{1A} \]
where \(I_t=2S_t/\hbar\). Denote \(I_p=2S_c/\hbar\), which measures the amount of \(S_c\) using the discrete unit \(\hbar/2\). \(I_p\) is not a conventional information metric but can be considered as carrying meaningful physical information. To see this connection, recall that the classical action is defined as an integral of the Lagrangian over a period of time along a path trajectory of a classical object. There are two aspects to understanding the action functional. In classical mechanics, the path trajectory can be traced, measured, or observed. Given two fixed end points, the longer the path trajectory, the larger the value of the action. A larger action indicates (1) more dynamic effort exhibited by the system; and (2) a path that is easier to trace and an object that is easier to distinguish from the background reference frame, or in other words, more physical information available for potential observation. Thus, the action \(S_c\) not only quantifies the dynamic effort of the system, but also is associated with the detectability, or observability, of the physical object during the dynamics along the path. In classical mechanics, we focus on the first aspect via the least action principle, and derive the law of dynamics from minimizing the action effort. The second aspect is not useful since we cannot quantify the intuition that \(S\) is associated with the observability of the physical object. One reason is that there is no natural unit of action to convert \(S\) into an information related metric. The introduction of the Planck constant in Assumption 2 helps to quantify this intuition. We call \(I_p\) the observability of the classical trajectory. Similarly, \(I_f\) measures the distinguishable information of the probability distributions with and without vacuum fluctuations. Thus, \(I_t\) is the total observable information. With the expression above for \(I_t\), the extended least action principle can be re-stated as
Principle of Least Observability – The law of physical dynamics for a quantum system tends to exhibit as little as possible the observability defined in (1A).
Mathematically, there is no difference between (1) and (1A) when applying the variation principle to derive the laws of dynamics. The form of \(S_t\) in terms of action is more accessible to the physics community. However, the form of \(I_t\) in terms of observability seems conceptually more generic. We will leave the exact interpretations of the principle aside and use the two interpretations interchangeably in this paper. The key point to remember is that the Planck constant connects the physical action to metrics related to observable information in either interpretation.
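The equivalence of the two formulations can be illustrated with a toy calculation. The sketch below assumes the forms \(S_t=S_c+(\hbar/2)I_f\) and \(I_t=2S_c/\hbar+I_f\), which follow from the definitions \(I_t=2S_t/\hbar\) and \(I_p=2S_c/\hbar\). The candidate pairs \((S_c, I_f)\) are hypothetical numbers in units where \(\hbar=1\), not values from this paper. Since \(I_t\) is \(S_t\) rescaled by the positive constant \(2/\hbar\), both select the same minimizer, and for very small \(\hbar\) the selection reduces to minimizing \(S_c\) alone.

```python
def total_action(s_c, i_f, hbar):
    """Total action, assuming S_t = S_c + (hbar/2) * I_f."""
    return s_c + 0.5 * hbar * i_f

def total_observability(s_c, i_f, hbar):
    """Total observability, assuming I_t = 2*S_c/hbar + I_f."""
    return 2.0 * s_c / hbar + i_f

def argmin(values):
    """Index of the smallest entry in a list."""
    return min(range(len(values)), key=values.__getitem__)

# Hypothetical candidate dynamics, each summarised by a pair (S_c, I_f).
trials = [(1.0, 3.0), (1.2, 1.0), (2.0, 0.2)]
hbar = 1.0

actions = [total_action(s, i, hbar) for s, i in trials]
infos = [total_observability(s, i, hbar) for s, i in trials]

best_by_action = argmin(actions)   # minimise S_t
best_by_info = argmin(infos)       # minimise I_t

# Classical limit: with a very small hbar the I_f term no longer matters.
classical_best = argmin([total_action(s, i, 1e-9) for s, i in trials])

print(best_by_action, best_by_info, classical_best)  # prints: 1 1 0
```

The second candidate wins under both formulations, while the first candidate, which has the smallest \(S_c\), wins once the \(I_f\) contribution is suppressed.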
Independent from the least observability principle, we need another assumption similar to the no preference of reference frame postulate in special relativity. The observable information of the physical dynamics can be expressed in different representations. Loosely speaking, a representation is characterized by a set of variables with their values acting like coordinates to describe the properties of the system [41]. For instance, the position representation uses position variables to describe the physical properties of the system. Similarly, the momentum representation uses momentum variables to describe the physical properties of the system. We assume that the total observable information extracted in a representation is a complete description of the dynamics of the system. The physical laws derived in other representations do not offer additional power of predictions for future measurement results. Consequently, the physical laws for the dynamics of the system derived from different representations must be equivalent. As shown later, from the same least observability principle, we can derive the Schrödinger equation independently in both position and momentum representations. But we demand the results must be equivalent. In summary, we have
Assumption 3 – There is no preferred representation for the law of physics derived in each representation.
Assumption 3 will lead to the transformation formulation between position and momentum representations.
With the least observability principle and the underlying assumptions explained, we now proceed to describe the results from applying this principle.
3 Basic Quantum Formulation
3.1 Dynamics of Vacuum fluctuations and The Uncertainty Relation
First we consider the dynamics of a system over an infinitesimal time interval \(\Delta t\). Suppose we choose a reference frame such that the dynamics of the system under study is only due to the random vacuum fluctuations. That is, if we ignore vacuum fluctuations, the system is at rest relative to such a reference frame. This also means the external potential is neglected for the time being. Define the probability for the system to transition from a 3-dimensional space position \({\textbf{x}}\) to another position \({\textbf{x}}+{\textbf{w}}\), where \({\textbf{w}}=\Delta {\textbf{x}}\) is the displacement in 3-dimensional space due to fluctuations, as \(\wp ({\textbf{x}}+{\textbf{w}}|{\textbf{x}})d^3{\textbf{w}}\). The expectation value of the classical action is \(S_c=\int \wp ({\textbf{x}}+{\textbf{w}}|{\textbf{x}})Ld^3{\textbf{w}}dt\). Since we only consider the vacuum fluctuations, the Lagrangian L only contains the kinetic energy, \(L=\frac{1}{2}m{\textbf{v}}\cdot {\textbf{v}}\). For an infinitesimal time interval \(\Delta t\), one can approximate the velocity as \({\textbf{v}}={\textbf{w}}/\Delta t\). This gives
\[ S_c = \frac{m}{2\Delta t}\int \wp ({\textbf{x}}+{\textbf{w}}|{\textbf{x}})\,{\textbf{w}}\cdot {\textbf{w}}\,d^3{\textbf{w}}. \tag{2} \]
The information metric \(I_f\) is supposed to capture the additional revelation of information due to vacuum fluctuations. Thus, it is naturally defined as a relative entropy, or more specifically, the Kullback-Leibler divergence, to measure the information distance between \(\wp ({\textbf{x}}+{\textbf{w}}|{\textbf{x}})\) and some prior probability distribution. Since the vacuum fluctuations are completely random, it is intuitive to assume the prior distribution with maximal ignorance [35, 42]. That is, the prior probability distribution is a uniform distribution \(\mu\),
\[ I_f := D_{KL}\big(\wp ({\textbf{x}}+{\textbf{w}}|{\textbf{x}})\,\|\,\mu \big) = \int \wp ({\textbf{x}}+{\textbf{w}}|{\textbf{x}})\ln \frac{\wp ({\textbf{x}}+{\textbf{w}}|{\textbf{x}})}{\mu }\,d^3{\textbf{w}}. \]
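Combined with (2), the total amount of information defined in (1A) is
\[ I = \frac{m}{\hbar \Delta t}\int \wp ({\textbf{x}}+{\textbf{w}}|{\textbf{x}})\,{\textbf{w}}\cdot {\textbf{w}}\,d^3{\textbf{w}} + \int \wp ({\textbf{x}}+{\textbf{w}}|{\textbf{x}})\ln \frac{\wp ({\textbf{x}}+{\textbf{w}}|{\textbf{x}})}{\mu }\,d^3{\textbf{w}}. \]
Taking the variation \(\delta I = 0\) with respect to \(\wp\) gives
\[ \delta I = \int \left( \frac{m}{\hbar \Delta t}\,{\textbf{w}}\cdot {\textbf{w}} + \ln \frac{\wp }{\mu } + 1\right) \delta \wp \,d^3{\textbf{w}} = 0. \]
Since \(\delta \wp\) is arbitrary, one must have
\[ \frac{m}{\hbar \Delta t}\,{\textbf{w}}\cdot {\textbf{w}} + \ln \frac{\wp }{\mu } + 1 = 0. \]
The solution for \(\wp\) is
\[ \wp ({\textbf{x}}+{\textbf{w}}|{\textbf{x}}) = \mu \, e^{-\frac{m}{\hbar \Delta t}{\textbf{w}}\cdot {\textbf{w}}-1} = \frac{1}{Z}\, e^{-\frac{m}{\hbar \Delta t}{\textbf{w}}\cdot {\textbf{w}}}, \tag{4} \]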
where Z is a normalization factor that absorbs factor \(\mu e^{-1}\). Equation (4) shows that the transition probability density is a Gaussian distribution. The variance \(\langle w_i^2\rangle = \hbar \Delta t/2m\), where \(i\in \{1, 2, 3\}\) denotes the spatial index. Recalling that \(w_i/\Delta t = v_i\) is the approximation of velocity due to the vacuum fluctuations, we denote \(p_i^f=mv_i=mw_i/\Delta t\). Since \(\langle p_i^f\rangle \propto \langle w_i\rangle = 0\), then \(\langle (p_i+p_i^f)^2-p_i^2\rangle = \langle (p_i^f)^2\rangle\), and \(p_i^f\) can be considered as the fluctuations of momentum on top of the classical momentum. That is, \(\Delta p_i = p_i^f= mw_i/\Delta t\). Rearranging \(\langle w_i^2\rangle = \hbar \Delta t/2\,m=\langle (\Delta x_i)^2\rangle\) gives
\[ \langle \Delta x_i\,\Delta p_i\rangle = \frac{\hbar }{2}. \tag{5} \]
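This relation was first proposed by Hall and Reginatto as an exact uncertainty relation [43, 44], where it is postulated with mathematical arguments. Here we derive it from the variation principle of minimizing the amount of information due to vacuum fluctuations. Now squaring both sides of (5) and applying the Cauchy-Schwarz inequality gives
\[ \langle (\Delta x_i)^2\rangle \langle (\Delta p_i)^2\rangle \ge \langle \Delta x_i\,\Delta p_i\rangle ^2 = \frac{\hbar ^2}{4}. \]
Taking the square root of both sides results in
\[ \sqrt{\langle (\Delta x_i)^2\rangle }\,\sqrt{\langle (\Delta p_i)^2\rangle } \ge \frac{\hbar }{2}. \]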
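The chain from the Gaussian transition density to the uncertainty relation can be checked numerically. The sketch below is a one-dimensional reduction (an assumption made for brevity) with hypothetical parameter values in natural units. It integrates the Gaussian of variance \(\hbar \Delta t/2m\) by the trapezoidal rule and evaluates \(\langle (\Delta x)^2\rangle\), \(\langle \Delta x\,\Delta p\rangle\) and the product of standard deviations, using \(\Delta p = m w/\Delta t\) as defined above.

```python
import math

def transition_density(w, m, hbar, dt):
    """1D Gaussian exp(-m w^2 / (hbar dt)) / Z, i.e. variance hbar*dt/(2m)."""
    z = math.sqrt(math.pi * hbar * dt / m)  # closed-form normalisation factor
    return math.exp(-m * w * w / (hbar * dt)) / z

def expectation(f, m, hbar, dt, half_width=12.0, n=20001):
    """Trapezoidal estimate of <f(w)> on [-half_width*sigma, +half_width*sigma]."""
    sigma = math.sqrt(hbar * dt / (2.0 * m))
    a = -half_width * sigma
    h = 2.0 * half_width * sigma / (n - 1)
    total = 0.0
    for k in range(n):
        w = a + k * h
        weight = 0.5 if k in (0, n - 1) else 1.0
        total += weight * f(w) * transition_density(w, m, hbar, dt)
    return total * h

m, hbar, dt = 1.0, 1.0, 0.01   # hypothetical values in natural units

norm = expectation(lambda w: 1.0, m, hbar, dt)
var_x = expectation(lambda w: w * w, m, hbar, dt)               # <(dx)^2>
var_p = expectation(lambda w: (m * w / dt) ** 2, m, hbar, dt)   # <(dp)^2>, dp = m*w/dt
mean_xp = expectation(lambda w: w * (m * w / dt), m, hbar, dt)  # <dx dp>
product = math.sqrt(var_x) * math.sqrt(var_p)

print(norm, var_x, mean_xp, product)
```

In this construction \(\Delta p\) is proportional to \(\Delta x\), so the Cauchy-Schwarz bound is saturated and the product of standard deviations equals \(\hbar /2\) up to numerical error.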
3.2 Derivation of The Schrödinger Equation
We now turn to the dynamics for a cumulative period from \(t_A\rightarrow t_B\). Suppose a typical reference frame is chosen such that if the vacuum fluctuations are ignored, the system moves along a classical path trajectory. The external potential is taken into account in such a reference frame. In classical mechanics, the equation of motion is described by the Hamilton-Jacobi equation,
\[ \frac{\partial S}{\partial t} + \frac{1}{2m}\nabla S\cdot \nabla S + V = 0. \]
Suppose the initial condition is unknown, and define \(\rho ({\textbf{x}}, t)\) as the probability density for finding a particle in a given volume of the configuration space. The probability density must satisfy the normalization condition \(\int \rho ({\textbf{x}}, t) d^3{\textbf{x}} = 1\), and the continuity equation
\[ \frac{\partial \rho }{\partial t} + \frac{1}{m}\nabla \cdot (\rho \nabla S) = 0. \]
The pair \((S, \rho )\) completely determines the motion of the classical ensemble. As pointed out by Hall and Reginatto [43, 44], the Hamilton-Jacobi equation and the continuity equation can be derived from the classical action
\[ S_c = \int \rho \left\{ \frac{\partial S}{\partial t} + \frac{1}{2m}\nabla S\cdot \nabla S + V\right\} d^3{\textbf{x}}\,dt \tag{8} \]
through fixed point variation with respect to \(\rho\) and S, respectively. Appendix A gives a more rigorous proof of (8) using the extended canonical transformation method. Note that \(S_c\) and S are different physical variables. As shown in Appendix A, \(S_c\) can be considered as the ensemble average of the classical action, while S is a variable introduced in a canonical transformation that satisfies \({\textbf{p}}=\nabla S\). The degree of observability for the motion of this ensemble between the two fixed points is \(I_p = 2S_c/\hbar\) according to Assumption 2.
To define the information metric for the vacuum fluctuations, \(I_f\), we slice the time duration \(t_A\rightarrow t_B\) into N short time steps \(t_0=t_A, \ldots , t_j, \ldots , t_{N-1}=t_B\), and each step is an infinitesimal period \(\Delta t\). In an infinitesimal time period at time \(t_j\), the particle not only moves according to the Hamilton-Jacobi equation but also experiences random fluctuations. The probability density \(\rho ({\textbf{x}}, t_j)\) alone is insufficient to encode all the observable information. Instead, we need to consider \(\rho ({\textbf{x}}+{\textbf{w}}, t_j)\) for all possible \({\textbf{w}}\). Such additional revelation of distinguishability is due to the vacuum fluctuations on top of the classical trajectory. The proper measure of this distinction is the information distance between \(\rho ({\textbf{x}}, t_j)\) and \(\rho ({\textbf{x}}+{\textbf{w}}, t_j)\). A natural choice of such an information measure is \(D_{KL}(\rho ({\textbf{x}}, t_j) || \rho ({\textbf{x}}+{\textbf{w}}, t_j))\). We then take the average of \(D_{KL}\) over \({\textbf{w}}\). Denoting by \(\langle \cdot \rangle _w\) the expectation value over \({\textbf{w}}\), and summing this quantity over all infinitesimal time intervals, leads to the definition
\[ I_f := \sum _{j=0}^{N-1}\big\langle D_{KL}(\rho ({\textbf{x}}, t_j)\,||\,\rho ({\textbf{x}}+{\textbf{w}}, t_j))\big\rangle _w = \sum _{j=0}^{N-1}\int d^3{\textbf{w}}\,\wp ({\textbf{x}}+{\textbf{w}}|{\textbf{x}})\int d^3{\textbf{x}}\,\rho ({\textbf{x}}, t_j)\ln \frac{\rho ({\textbf{x}}, t_j)}{\rho ({\textbf{x}}+{\textbf{w}}, t_j)}. \]
Notice that \(\wp ({\textbf{x}}+{\textbf{w}}| {\textbf{x}})\) is a Gaussian distribution given in (4). When \(\Delta t\) is small, only small \({\textbf{w}}\) will contribute to \(I_f\). As shown in Appendix B, when \(\Delta t\rightarrow 0\), \(I_f\) turns out to be
\[ I_f = \frac{\hbar }{4m}\int d^3{\textbf{x}}\,dt\,\frac{1}{\rho }\nabla \rho \cdot \nabla \rho . \tag{11} \]
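The link between the averaged Kullback-Leibler divergence and the Fisher information term can be checked numerically for a single time step. The sketch below is one-dimensional and uses a Gaussian \(\rho\) with standard deviation `s`. Both choices are assumptions made for illustration, as are the parameter values. It compares \(\langle D_{KL}(\rho (x)\,||\,\rho (x+w))\rangle _w\), averaged over fluctuations of variance \(\hbar \Delta t/2m\), with \((\hbar \Delta t/4m)\int (\rho ')^2/\rho \,dx\), the per-step coefficient obtained from a second-order expansion in \(w\).

```python
import math

def log_rho(x, s):
    """Log of a Gaussian probability density with standard deviation s."""
    return -0.5 * (x / s) ** 2 - math.log(s * math.sqrt(2.0 * math.pi))

def trapz(f, a, b, n):
    """Composite trapezoidal rule with n sub-intervals."""
    h = (b - a) / n
    total = 0.5 * (f(a) + f(b))
    for k in range(1, n):
        total += f(a + k * h)
    return total * h

def kl_shifted(w, s, n=800):
    """D_KL( rho(x) || rho(x + w) ) by quadrature over x."""
    def integrand(x):
        lr = log_rho(x, s)
        return math.exp(lr) * (lr - log_rho(x + w, s))
    return trapz(integrand, -10.0 * s, 10.0 * s, n)

def fisher_information(s, n=800):
    """Integral of (rho')^2 / rho, using rho' = -(x / s^2) * rho for a Gaussian."""
    return trapz(lambda x: (x / s ** 2) ** 2 * math.exp(log_rho(x, s)),
                 -10.0 * s, 10.0 * s, n)

m, hbar, dt, s = 1.0, 1.0, 1.0e-3, 1.0      # hypothetical natural units
sigma_w = math.sqrt(hbar * dt / (2.0 * m))  # std of the fluctuation w

def weighted_kl(w):
    """Fluctuation density times the divergence for displacement w."""
    p_w = math.exp(-0.5 * (w / sigma_w) ** 2) / (sigma_w * math.sqrt(2.0 * math.pi))
    return p_w * kl_shifted(w, s)

avg_kl = trapz(weighted_kl, -8.0 * sigma_w, 8.0 * sigma_w, 200)
fisher_term = hbar * dt / (4.0 * m) * fisher_information(s)
print(avg_kl, fisher_term)
```

For a Gaussian \(\rho\) the two quantities agree exactly, since \(D_{KL}\) is then quadratic in \(w\). For a general density the agreement is expected only to leading order in \(\Delta t\).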
Eq. (11) contains the term related to Fisher information for the probability density [39]. Some literature directly adds Fisher information in the variation method as a postulate to derive the Schrödinger equation [37]. But (11) bears much more physical significance than Fisher information. First, it shows that \(I_f\) is proportional to \(\hbar\). This is not trivial because it avoids introducing additional arbitrary constants for the subsequent derivation of the Schrödinger equation. More importantly, defining \(I_f\) using the relative entropy opens up new results that cannot be obtained if \(I_f\) is defined using Fisher information, because there are other generic forms of relative entropy such as Rényi divergence or Tsallis divergence. As will be seen later, by replacing the Kullback–Leibler divergence with Rényi divergence, one will obtain a generalized Schrödinger equation. Other authors also derive (11) using mathematical arguments [43, 44], while our approach is based on intuitive information metrics. With (11), the total degree of observability is
\[ I = \frac{2}{\hbar }\int d^3{\textbf{x}}\,dt\,\rho \left\{ \frac{\partial S}{\partial t} + \frac{1}{2m}\nabla S\cdot \nabla S + V\right\} + \frac{\hbar }{4m}\int d^3{\textbf{x}}\,dt\,\frac{1}{\rho }\nabla \rho \cdot \nabla \rho . \]
Variation of I with respect to S gives the continuity equation, while variation with respect to \(\rho\) leads to
\[ \frac{\partial S}{\partial t} + \frac{1}{2m}\nabla S\cdot \nabla S + V - \frac{\hbar ^2}{2m}\frac{\nabla ^2\sqrt{\rho }}{\sqrt{\rho }} = 0. \tag{13} \]
The last term is the Bohm quantum potential [45]. The Bohm potential is considered responsible for the non-locality phenomenon in quantum mechanics [46]. Historically, its origin is mysterious. Here we show that it originates from the information metrics related to relative entropy, \(I_f\). The physical implications of this result will be discussed later. Defining a complex function \(\Psi =\sqrt{\rho }e^{iS/\hbar }\), the continuity equation and the extended Hamilton-Jacobi equation (13) can be combined into a single differential equation,
\[ i\hbar \frac{\partial \Psi }{\partial t} = \left[ -\frac{\hbar ^2}{2m}\nabla ^2 + V\right] \Psi , \]
which is the Schrödinger Equation.
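The extended Hamilton-Jacobi equation with the Bohm term can be sanity-checked on a textbook stationary state. The sketch below uses the one-dimensional harmonic oscillator ground state, an example chosen here for illustration and not taken from this paper, with \(\sqrt{\rho }\propto e^{-m\omega x^2/2\hbar }\), \(S=-Et\), \(E=\hbar \omega /2\) and \(V=m\omega ^2x^2/2\). It evaluates the standard Bohm potential \(Q=-(\hbar ^2/2m)(\sqrt{\rho })''/\sqrt{\rho }\) by a central finite difference and checks that \(\partial S/\partial t + (\nabla S)^2/2m + V + Q\) vanishes. The function names and parameter values are hypothetical.

```python
import math

def sqrt_rho(x, m, omega, hbar):
    """Square root of the ground-state density (normalisation cancels in Q)."""
    return math.exp(-m * omega * x * x / (2.0 * hbar))

def bohm_potential(x, m, omega, hbar, h=1e-3):
    """Q = -(hbar^2 / 2m) * (sqrt(rho))'' / sqrt(rho), via a central difference."""
    r0 = sqrt_rho(x, m, omega, hbar)
    second = (sqrt_rho(x + h, m, omega, hbar) - 2.0 * r0
              + sqrt_rho(x - h, m, omega, hbar)) / (h * h)
    return -(hbar ** 2) / (2.0 * m) * second / r0

def hj_residual(x, m, omega, hbar):
    """Left-hand side of the extended Hamilton-Jacobi equation at position x."""
    ds_dt = -0.5 * hbar * omega   # S = -E t with E = hbar*omega/2
    grad_s = 0.0                  # stationary state: no spatial phase gradient
    potential = 0.5 * m * omega ** 2 * x * x
    return (ds_dt + grad_s ** 2 / (2.0 * m) + potential
            + bohm_potential(x, m, omega, hbar))

m, omega, hbar = 1.0, 1.0, 1.0                      # hypothetical natural units
points = [-2.0 + 0.25 * k for k in range(17)]       # sample positions in [-2, 2]
max_residual = max(abs(hj_residual(x, m, omega, hbar)) for x in points)
print(max_residual)
```

Here the Bohm term exactly cancels the position dependence of \(V\), leaving the constant \(\hbar \omega /2\) that balances \(\partial S/\partial t\). The residual is limited only by the finite-difference step.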
In summary, by recursively applying the same least observability principle in two steps, we recover the uncertainty relation and the Schrödinger equation. The first step is for a short time period to obtain the transitional probability density due to vacuum fluctuations; the second step is for a cumulative time period to obtain the dynamics law for \(\rho\) and S. The applicability of the same variation principle shows the consistency and simplicity of the theory, although the form of the Lagrangian is different in each step. In the first step, the Lagrangian only contains the kinetic energy \(L=m{\textbf{v}}\cdot {\textbf{v}}/2\), which is in the form of \(L=\dot{{\textbf{x}}}\cdot {\textbf{p}} - H\) where H is the classical Hamiltonian. In the second step, we use a different form of the classical Lagrangian, \(L^\prime = \partial S/\partial t + H\). As shown in Appendix A, L and \(L^\prime\) are related through an extended canonical transformation. The choice of Lagrangian L or \(L^\prime\) does not affect the form of Lagrange’s equations. Here we choose \(L^\prime = \partial S/\partial t + H\) as the classical Lagrangian in the second step in order to use the pair of variables \((\rho , S)\) in the subsequent variation procedure.
To demonstrate the simplicity of the least observability principle, in Appendix C, we apply the principle to derive the Schrödinger equation in an external electromagnetic field. The interesting point here in this example is that the external electromagnetic field has no influence on the vacuum fluctuations. This reconfirms that the information metrics \(I_f\) is independent of the external potential.
3.3 Transformation Between Position and Momentum Representations
The classical action \(S_c\) and information metrics \(I_f\) in (1) are so far defined in the position representation, i.e., using position \({\textbf{x}}\) as the variable. However, other observable quantities can serve as representation variables, and momentum is one such variable. We can find the proper expressions for \(S_c\) and \(I_f\) in the momentum representation, and follow the same variational principle to derive the quantum theory. By Assumption 3, one would expect the law of dynamics in the momentum representation to be equivalent to that in the position representation derived earlier. First, consider the effect of fluctuations in a short time step \(\Delta t\). The vacuum fluctuations occur not only in position space, but also in momentum space. Denote the transition probability density for the vacuum fluctuations as \(\tilde{\wp }({\textbf{p}}+\mathbf {\omega }|{\textbf{p}})\) where \(\mathbf {\omega }=\Delta {\textbf{p}}\) is due to the momentum fluctuations. The classical Lagrangian without considering external potential is \(L=({\textbf{p}}+\mathbf {\omega })\cdot ({\textbf{p}}+\mathbf {\omega })/2m\), and the average classical action is
Since \(\langle \mathbf {\omega } \rangle =0\), the only term that contributes to the variation with respect to \(\tilde{\wp }\) is the one with \(\langle \mathbf {\omega }\cdot \mathbf {\omega } \rangle\). Similar to the definition of \(I_f\) in the position representation, here we define \(I_f=:D_{KL}(\tilde{\wp }({\textbf{p}}+\mathbf {\omega }|{\textbf{p}})||{\tilde{\mu }})\) where \({\tilde{\mu }}\) is a uniform probability density in the momentum space. Plugging all these expressions into (1) and setting \(\delta S_t=0\) with respect to \(\tilde{\wp }\), one obtains
and \(Z'\) is the normalization factor. The variance is \(\langle \omega _i^2 \rangle =\langle (\Delta p_i)^2\rangle = m\hbar /2\Delta t\), where i is the spatial index. This is also a Gaussian distribution but with a significant difference from (4) in the position representation. That is, when \(\Delta t\rightarrow 0\), \(\langle (\Delta p_i)^2\rangle \rightarrow \infty\) while \(\langle (\Delta x_i)^2\rangle \rightarrow 0\). This implies that when \(\Delta t\rightarrow 0\), the Gaussian distribution \(\tilde{\wp }\) becomes a uniform distribution. Noting that \(\Delta p_i\Delta t=m\Delta x_i\), rearranging \(\langle (\Delta p_i)^2\rangle = m\hbar /2\Delta t\) gives the same uncertainty relation as in (5).
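As a quick numerical sanity check of the relations in this paragraph, the following Python sketch evaluates the momentum variance \(m\hbar /2\Delta t\), converts it to a position variance through \(\Delta p_i\Delta t=m\Delta x_i\), and confirms that the product of the two root-mean-square spreads stays at \(\hbar /2\) for every \(\Delta t\). The helper name `fluctuation_spreads` and the sample values of the mass and time step are illustrative assumptions and are not part of the derivation.

```python
import math

HBAR = 1.054571817e-34  # reduced Planck constant in J*s (CODATA value)


def fluctuation_spreads(mass, dt, hbar=HBAR):
    """Return (rms_dx, rms_dp) for one spatial component over a time step dt.

    Hypothetical helper, written only to illustrate the relations in the text.
    """
    var_p = mass * hbar / (2.0 * dt)   # <(dp_i)^2> = m * hbar / (2 * dt)
    var_x = var_p * (dt / mass) ** 2   # uses dp_i * dt = m * dx_i
    return math.sqrt(var_x), math.sqrt(var_p)


if __name__ == "__main__":
    electron_mass = 9.1093837015e-31  # kg, used only as a sample value
    for dt in (1e-12, 1e-15, 1e-18):
        dx, dp = fluctuation_spreads(electron_mass, dt)
        ratio = dx * dp / HBAR
        print(f"dt={dt:.0e} s  dx={dx:.3e} m  dp={dp:.3e} kg*m/s  dx*dp/hbar={ratio:.3f}")
```

As \(\Delta t\) shrinks, the momentum spread grows and the position spread shrinks, while their product does not change.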
For illustration purposes, we will only derive the momentum representation of the Schrödinger equation for a free particle. Let \(\varrho ({\textbf{p}}, t)\) be the probability density in the momentum representation. The classical action is
\(I_f\) is defined similarly to (9) as
However, when \(\Delta t\rightarrow 0\), \(\tilde{\wp }({\textbf{p}}+\mathbf {\omega }|{\textbf{p}})\) becomes a uniform distribution and \(I_f \rightarrow \infty\) independently of \(\varrho\), as shown in Appendix E. This implies that \(I_f\) does not contribute when taking the variation with respect to \(\varrho\). Thus,
Variation with respect to \(\varrho\) gives
and variation with respect to S gives \(\partial \varrho /\partial t = 0\). Defining \(\psi =\sqrt{\varrho }e^{i(S/\hbar )}\), the two differential equations are combined into a single differential equation,
\[i\hbar \frac{\partial \psi }{\partial t} = \frac{{\textbf{p}}\cdot {\textbf{p}}}{2m}\psi ,\]
which is the Schrödinger equation for a free particle in the momentum representation. Recall that in the position representation, the Schrödinger equation for a free particle is \(i\hbar \partial \Psi /\partial t = [-(\hbar ^2/2m)\nabla ^2]\Psi\). The two equations are derived independently from the variation of dynamics information defined in (1). Assumption 3 demands that the two equations must be equivalent. To meet this requirement, one sufficient condition is that the two wavefunctions are transformed through
This transformation justifies the introduction of operator \({\hat{p}}_i=:-i\hbar \partial /\partial x_i\) to represent momentum in the position representation, because using (18), one can verify that the expectation value of momentum \(\langle \psi ({\textbf{p}}, t)|p_i|\psi ({\textbf{p}}, t)\rangle\) can be computed as \(\langle \Psi ({\textbf{x}}, t) |{\hat{p}}_i|\Psi ({\textbf{x}}, t)\rangle\). Introduction of the momentum operator \({\hat{p}}_i=:-i\hbar \partial /\partial x_i\) leads to the commutation relation \([{\hat{x}}_i, {\hat{p}}_i]=i\hbar\).
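The equivalence of the two expectation values can be checked numerically. The Python sketch below assumes that the transformation between the two wavefunctions is the standard Fourier transform with kernel \(e^{-i{\textbf{p}}\cdot {\textbf{x}}/\hbar }\) (its constant prefactor cancels after normalization), works in one dimension with \(\hbar =1\), and uses a Gaussian wave packet with mean momentum `P0` as sample data. All names and parameter values are illustrative assumptions.

```python
import cmath
import math

HBAR = 1.0   # natural units (assumption for this demo)
SIGMA = 1.0  # width of the sample wave packet
P0 = 1.5     # mean momentum of the sample wave packet


def psi_x(x):
    """Unnormalized Gaussian wave packet Psi(x) carrying mean momentum P0."""
    return math.exp(-x * x / (4.0 * SIGMA ** 2)) * cmath.exp(1j * P0 * x / HBAR)


def mean_p_position(xs, dx, h=1e-4):
    """<Psi| -i*hbar d/dx |Psi> / <Psi|Psi>, with a central-difference derivative."""
    num = 0.0 + 0.0j
    norm = 0.0
    for x in xs:
        dpsi = (psi_x(x + h) - psi_x(x - h)) / (2.0 * h)
        num += psi_x(x).conjugate() * (-1j * HBAR) * dpsi * dx
        norm += abs(psi_x(x)) ** 2 * dx
    return (num / norm).real


def mean_p_momentum(xs, dx, ps):
    """<p> weighted by |psi(p)|^2, where psi(p) is a Riemann-sum Fourier transform."""
    vals = [psi_x(x) for x in xs]
    num = 0.0
    norm = 0.0
    for p in ps:
        amp = sum(v * cmath.exp(-1j * p * x / HBAR) for v, x in zip(vals, xs)) * dx
        weight = abs(amp) ** 2  # uniform p-grid, so the dp factor cancels in the ratio
        num += p * weight
        norm += weight
    return num / norm


if __name__ == "__main__":
    step = 0.05
    xs = [-10.0 + step * i for i in range(401)]
    ps = [-5.0 + step * i for i in range(261)]
    print("position representation:", mean_p_position(xs, step))
    print("momentum representation:", mean_p_momentum(xs, step, ps))
```

Both routines should return the same mean momentum, which is the property that motivates representing momentum by \(-i\hbar \partial /\partial x_i\) in the position representation.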
Suppose in the momentum representation there is a different action unit \(\hbar _p \ne \hbar\). Repeating the same variation procedure gives a Schrödinger equation for a free particle
To satisfy Assumption 3, the transformation function (18) needs to be modified as
where \(\beta = \hbar _p/\hbar\). Consequently, \([{\hat{x}}_i, {\hat{p}}_i]=i\hbar \sqrt{\beta }\). It is clear that the assumption of having a different constant \(\hbar _p \ne \hbar\) in momentum representation is incompatible with the well established Dirac commutation relation \([{\hat{x}}_i, {\hat{p}}_i]=i\hbar\). By accepting \([{\hat{x}}_i, {\hat{p}}_i]=i\hbar\), one must reject \(\hbar _p \ne \hbar\).
Deriving the Schrödinger equation in the momentum representation from the least observability principle with an external potential \(V({\textbf{x}})\ne 0\) is a much more complicated task. However, the theory for a free particle is sufficient to demonstrate why the Planck constant must be the same in both position and momentum representations.
4 The Generalized Schrödinger Equation
The term \(I_f\) is supposed to capture the additional distinguishability exhibited by the vacuum fluctuations, and is defined in (9) as the summation of the expectation values of Kullback–Leibler divergence between \(\rho ({\textbf{x}},t)\) and \(\rho ({\textbf{x}}+{\textbf{w}},t)\). However, there are more generic definitions of relative entropy, such as the Rényi divergence [51, 53]. From an information theoretic point of view, there is no reason to exclude alternative definitions of relative entropy. Suppose we define \(I_f\) based on Rényi divergence,
Parameter \(\alpha \in (0,1)\cup (1, \infty )\) is called the order of Rényi divergence. When \(\alpha \rightarrow 1\), \(I_f^{\alpha }\) converges to \(I_f\) as defined in (9). In Appendix D, we show that using \(I_f^{\alpha }\) and following the same variational principle, we arrive at an extended Hamilton-Jacobi equation similar to (13),
with an additional coefficient \(\alpha\) appearing in the Bohm quantum potential term. Defining \(\Psi ^\prime =\sqrt{\rho }e^{iS/\sqrt{\alpha }\hbar }\), the continuity equation and the extended Hamilton-Jacobi equation (21) can be combined into an equation similar to the Schrödinger equation, see Appendix D,
When \(\alpha =1\), the regular Schrödinger equation is recovered as expected. Equation (22) gives a family of linear equations, one for each order of Rényi divergence.
Interestingly, if we define \(\hbar _{\alpha }= \sqrt{\alpha }\hbar\), then \(\Psi ^\prime =\sqrt{\rho }e^{iS/\hbar _{\alpha }}\), and (22) takes the same form as the regular Schrödinger equation with \(\hbar\) replaced by \(\hbar _{\alpha }\). It is as if there is an intrinsic relation between the order of Rényi divergence and the Planck constant. This remains to be investigated further. On the other hand, if the wavefunction is defined as usual without the factor \(\sqrt{\alpha }\), \(\Psi ^\prime =\sqrt{\rho }e^{iS/\hbar }\), it will result in a nonlinear Schrödinger equation. This implies that the linearity of the Schrödinger equation depends on how the wavefunction is defined from the pair of real variables \((\rho , S)\).
We also want to point out that \(I_f^{\alpha }\) can be defined using Tsallis divergence [52, 54] as well, instead of using the Rényi divergence,
When \(\Delta t\rightarrow 0\), it can be shown that the \(I_f^\alpha\) defined above converges to the same form as (D3). Hence it results in the same generalized Schrödinger equation (22).
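The relation between these divergences can be illustrated numerically. The Python sketch below uses the standard textbook definitions of the Kullback–Leibler, Rényi, and Tsallis divergences for discrete distributions and shows that the latter two approach the former as \(\alpha \rightarrow 1\). The discrete setting, the function names, and the sample distributions are assumptions made for illustration; the definitions used in the main text are integrals over continuous densities.

```python
import math


def kl_divergence(p, q):
    """Kullback-Leibler divergence D_KL(p || q) for discrete distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)


def renyi_divergence(p, q, alpha):
    """Renyi divergence of order alpha (alpha > 0, alpha != 1)."""
    s = sum(pi ** alpha * qi ** (1.0 - alpha) for pi, qi in zip(p, q))
    return math.log(s) / (alpha - 1.0)


def tsallis_divergence(p, q, alpha):
    """Tsallis divergence of order alpha (alpha > 0, alpha != 1)."""
    s = sum(pi ** alpha * qi ** (1.0 - alpha) for pi, qi in zip(p, q))
    return (s - 1.0) / (alpha - 1.0)


if __name__ == "__main__":
    p = [0.5, 0.3, 0.2]  # sample distributions, chosen arbitrarily
    q = [0.2, 0.5, 0.3]
    print("KL:", kl_divergence(p, q))
    for alpha in (0.5, 0.9, 0.999, 1.001, 1.1, 2.0):
        print(alpha, renyi_divergence(p, q, alpha), tsallis_divergence(p, q, alpha))
```

Away from \(\alpha =1\) the two generalized divergences differ from each other, which is consistent with the statement that they only agree in the limiting form used for the derivation.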
5 Locality of Vacuum Fluctuations
Now we apply the least observability principle to a bipartite system. The ensemble average of classical action for the bipartite system is given by
In addition, we need to consider the information metric \(I_f\) for the bipartite system due to fluctuations. One of the key points in Assumption 1 is the locality of the vacuum fluctuations. The fluctuations experienced by particle A are completely independent of the fluctuations experienced by particle B. Formally, the locality of vacuum fluctuations can be defined by the separability of the joint transition probability of the bipartite system,
Extend the definition of \(I_f\) in (9) to the bipartite system:
Using (25), we show in Appendix F that when \(\Delta t \rightarrow 0\),
Variation of \(I_f\) with respect to \(\rho\) gives the Bohm quantum potential for the bipartite system, as shown in (F6) of Appendix F,
The interesting finding here is that even though the vacuum fluctuations for the two subsystems are independent of each other, \(I_f\) and the Bohm potential are inseparable in general. The inseparability depends on the inseparability of the initial condition \(\rho ({\textbf{x}}_a, {\textbf{x}}_b, 0)\). This suggests that there is no need for a non-local mechanism underlying the inseparability of the Bohm quantum potential.
The Schrödinger equation of the bipartite system is derived in Appendix F as
where \(\Psi ({\textbf{x}}_a, {\textbf{x}}_b, t) = \sqrt{\rho ({\textbf{x}}_a, {\textbf{x}}_b, t)}e^{iS({\textbf{x}}_a, {\textbf{x}}_b, t)/\hbar }\). Suppose there is no interaction between the two subsystems after \(t=0\) but the initial joint probability density at \(t=0\) is inseparable. Then \(\Psi ({\textbf{x}}_a, {\textbf{x}}_b, t)\) is an entangled state for \(t>0\). Such an entangled state can be maintained and manifested even though the two non-interacting subsystems move away from each other. Similar to the inseparability of the Bohm potential, the inseparable correlation is maintained through \(I_f\), but the underlying vacuum fluctuations are local for the two subsystems. This suggests that an inseparable correlation can be propagated through a mechanism that is local. The implication of locality of vacuum fluctuations on entanglement deserves further analysis and discussion.
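The dependence of the inseparability on the joint density can be made concrete with a small numerical experiment. The Python sketch below assumes the standard form of the Bohm potential, \(Q=-(\hbar ^2/2m)\nabla ^2\sqrt{\rho }/\sqrt{\rho }\), with one spatial dimension per particle, equal masses, and \(\hbar =m=1\). It uses an unnormalized two-particle Gaussian density with correlation coefficient `corr` as sample data. These choices, and all function names, are illustrative assumptions and do not reproduce the expressions of Appendix F. The mixed second difference of Q vanishes exactly when Q splits into a sum \(Q_a(x_a)+Q_b(x_b)\).

```python
import math

HBAR = 1.0  # natural units (assumption)
MASS = 1.0  # both particles share the same sample mass (assumption)


def make_sqrt_rho(corr, sigma=1.0):
    """Square root of an unnormalized two-particle Gaussian density."""
    k = 1.0 / (4.0 * sigma ** 2 * (1.0 - corr ** 2))

    def sqrt_rho(xa, xb):
        return math.exp(-k * (xa * xa + xb * xb - 2.0 * corr * xa * xb))

    return sqrt_rho


def bohm_potential(sqrt_rho, xa, xb, h=1e-3):
    """Q = -(hbar^2 / 2m) * Laplacian(sqrt(rho)) / sqrt(rho), via central differences."""
    r0 = sqrt_rho(xa, xb)
    lap = (sqrt_rho(xa + h, xb) + sqrt_rho(xa - h, xb)
           + sqrt_rho(xa, xb + h) + sqrt_rho(xa, xb - h) - 4.0 * r0) / (h * h)
    return -(HBAR ** 2 / (2.0 * MASS)) * lap / r0


def cross_coupling(sqrt_rho, xa=0.3, xb=-0.2, d=0.1):
    """Mixed second difference of Q; zero when Q = Q_a(xa) + Q_b(xb)."""
    def q(a, b):
        return bohm_potential(sqrt_rho, a, b)

    return (q(xa + d, xb + d) - q(xa + d, xb - d)
            - q(xa - d, xb + d) + q(xa - d, xb - d)) / (4.0 * d * d)


if __name__ == "__main__":
    for corr in (0.0, 0.5):
        print(f"corr={corr}: cross coupling of Q = {cross_coupling(make_sqrt_rho(corr)):.6f}")
```

For a product density the cross coupling vanishes, while a correlated density gives a nonzero value. The experiment only shows that the inseparability of Q follows from the inseparability of \(\rho\); it does not model the fluctuations themselves.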
6 Discussion and Conclusions
6.1 Implications of Assumption 2
The interpretation of the Planck constant as the discrete action unit for the degree of observability reflects a fundamental physical limit. That is, there is a lower limit to the action effort needed to exhibit observable information of the dynamical behavior of a physical system. A smaller action effort will not be observable: information exhibited by an action smaller than \(\hbar /2\) is indistinguishable and unobservable. In other words, the Planck constant determines the resolution (in terms of action) of the observable information for the dynamical behavior of a physical system. Historically the Planck constant was first introduced to show that the energy of radiation from a black body is discrete. One can consider the discrete energy unit as the smallest unit to be distinguished, or detected, in the black body radiation phenomenon. Here, we just interpret the Planck constant from an information acquisition point of view. Interestingly, the postulate in special relativity that the speed of light in vacuum is constant in all inertial reference frames reflects another limit for information propagation. As pointed out by Landau [40], the constant speed of light actually is a consequence of a fundamental physical limit that there is a limit of speed in any interaction between two systems. The speed limit also implies that propagation of physical information is not instant because information is propagated through physical media such as light. Thus, the Planck constant and the speed of light each manifest a physical limit from an information processing point of view, but from a different angle.
As mentioned in the introduction section, the definition of \(I_p=2S_c/\hbar\) in Assumption 2 should not be associated with the phase of the probability amplitude for a trajectory of a particle in Feynman’s path integral [1]. Fundamentally, the path integral theory does not interpret the Planck constant as the quantum of action effort to exhibit observable information. The difference of the factor 2 has a purely mathematical reason: in the path integral \(S/\hbar\) is associated with the probability amplitude, whereas in our formulation we deal with the probability density, which is the squared modulus of the probability amplitude. Nevertheless, both the path integral and our formulation based on the least observability principle give the same Schrödinger equation. This is because both theories start with the contribution of the classical path, then add the additional contributions due to vacuum fluctuations, but in different ways. In the path integral, the summation of \(e^{iS/\hbar }\) from all possible paths for the probability amplitude effectively collects the contributions due to vacuum fluctuations. On the other hand, in the least observability principle, the effect of vacuum fluctuations is manifested through the summation of the Kullback–Leibler divergence as defined in (9).
6.2 Alternative Formulations of the Least Observability Principle
Alternatively, we can interpret the least observability principle based on Eq. (1) as minimizing \(I_f\) with the constraint of \(S_c\) being a constant, and \(\hbar /2\) simply being a Lagrange multiplier for such a constraint. Again, mathematically, it is an equivalent formulation. In that case, Assumption 2 is not needed. Instead it will be replaced by the assumption that the average action \(S_c\) is a constant with respect to variations over \(\rho\) and S. Which assumption to use depends on which choice is more physically intuitive. We believe that the least observability principle based on Assumption 2, where the Planck constant defines the discrete unit of action effort to exhibit observable information, gives a more intuitive physical meaning to the mathematical formalism.
6.3 Comparisons with Relevant Research Works
In the original paper for Relational Quantum Mechanics (RQM) [6], Rovelli proposes two postulates from an information perspective. The first postulate, that there is a maximum amount of relevant information that can be extracted from a system, is in the same spirit as Assumption 2. Rovelli has pointed out that his first postulate implies the existence of the Planck constant. But the reconstruction effort of quantum theory in [6] does not define the meaning of information and how \(\hbar\) is used to compute the amount of information. Here we reverse the logic of the argument in [6]. We make explicit mathematical connections between \(\hbar\) and the degree of observability in (1), leading to the least observability principle to reconstruct quantum mechanics. Conceptually, we clarify the connection between the Planck constant and the discreteness of action effort to exhibit observable information. The second postulate in [6], that it is always possible to acquire new information about a system, is motivated to explain the complementarity in quantum theory [30, 31]. This postulate appears quite counterintuitive. It is not needed in our theory in terms of explaining complementarity. Instead, we assume there is no preferred representation for physical laws, which is more intuitive. The no preferred representation assumption allows us to derive the transformation formulation between position and momentum representations, and consequently the commutation relation \([{\hat{x}}_i, {\hat{p}}_i]=i\hbar\). Other authors proposed postulates similar to the no preferred representation assumption, such as no preferred measurement [33] and no preferred reference frame [32], but in very different contexts.
The entropic dynamics approach to quantum mechanics [34, 35] bears some similarity with the theory presented in this work. For instance, the formulations are carried out with two steps, an infinitesimal time step and a cumulative time period. It also aims to derive the physical dynamics by extremizing entropy. However, the entropic dynamics approach relies on another postulate on energy conservation to complete the derivation of the Schrödinger equation. The theory presented in this paper has the advantage of simplicity since it recursively applies the same least observability principle in both an infinitesimal time step and a cumulative time period. The entropic dynamics approach also requires several seemingly arbitrary constants in the formulation, while we only need the Planck constant \(\hbar\) and its meaning is clearly given in Assumption 2.
The derivation of the Schrödinger equation in Sect. 3.2 starts from (8), which is due to Hall and Reginatto [43, 44]. Mathematically, we arrive at the same extended Hamilton-Jacobi equation (13) as that in [43, 44]. However, the underlying physical foundation is very different. Hall and Reginatto assume an exact uncertainty relation (5), while in our theory (5) is derived from the least observability principle in an infinitesimal time step. We clearly show the information origin of the Bohm potential, while Hall and Reginatto derive it by assuming the random fluctuations in momentum space and the exact uncertainty relation. We also use the general definition of relative entropy for information metrics \(I_f\) and obtain the generalized Schrödinger equation, which is not possible using the methods presented in [43, 44].
6.4 Limitations
Assumption 1 makes minimal assumptions on the vacuum fluctuations, but does not provide a more concrete physical model for the vacuum fluctuations. The underlying physics for the vacuum fluctuations is expected to be complex but crucial for a deeper understanding of quantum mechanics. It is beyond the scope of this paper. The intention here is to minimize the assumptions that are needed to derive the basic formulation of quantum mechanics, so that future research can just focus on these assumptions.
Another limitation is that the Schrödinger equation in the momentum representation is only derived for a free particle. When an external potential exists, the derivation becomes complicated, and we leave it for future research. Thus, Assumption 3 is only applied in the case of a free particle. It remains to be confirmed whether it is applicable for generic cases with an external potential. However, for the purpose of demonstrating why the Planck constant must be the same in both position and momentum representations, we only need the special case of a free particle.
6.5 Conclusions
We propose an extended least action principle, or least observability principle, to demonstrate how classical mechanics becomes quantum mechanics from the information perspective. The least observability principle extends the least action principle by factoring in two assumptions. Assumption 2 states that the Planck constant defines the lower limit to the amount of action that a physical system needs to exhibit in order to be observable. Classical mechanics corresponds to a physical theory when such a lower limit of action effort is approximated as zero. The existence of the Planck constant allows us to quantify the physical intuition that the action effort is also associated with the observability of the system dynamics. New information metrics for the additional degree of distinguishability exhibited from vacuum fluctuations are introduced. These metrics are defined in terms of relative entropy to measure the information distances of different probability distributions caused by local vacuum fluctuations. To derive quantum theory, the least observability principle seeks to minimize the degree of observability from both classical trajectory and vacuum fluctuations. Nature appears to behave so as to be as indistinguishable as possible for future observation. This principle allows us to elegantly derive the uncertainty relation between position and momentum, and the Schrödinger equations in both position and momentum representations. Adding the no preferred representation assumption, we obtain the transformation formulation between position and momentum representations. The Planck constant must be the same in different representations in order to be compatible with the Dirac commutation relation between position and momentum.
The information metric \(I_f\) is responsible for the origin of the Bohm quantum potential. The Bohm potential is widely considered as non-local for a bipartite system. We have shown that such non-locality just reflects the inseparability of the information metric \(I_f\) for the bipartite system. Interestingly, the inseparability of \(I_f\) is preserved and manifested through a local mechanism, namely the vacuum fluctuations. Thus, even though the Bohm potential is inseparable for a bipartite system, there is no non-local causal relation between the two subsystems.
Utilizing Rényi divergence in the least observability principle leads to a generalized Schrödinger equation (22) that depends on the order of Rényi divergence. Given the extensive experimental confirmations of the normal Schrödinger equation, it is inconceivable that one will find physical scenarios for which the generalized Schrödinger equation with \(\alpha \ne 1\) is applicable. However, the generalized Schrödinger equation is legitimate from an information perspective. It confirms that the least observability principle can produce new results.
Extending the least action principle in classical mechanics to the least observability principle in quantum mechanics not only illustrates clearly how classical mechanics becomes quantum mechanics, but also opens up a new mathematical toolbox. It can be applied to field theory to obtain the Schrödinger functional equation for a massive scalar field [55]. We expect that other advanced quantum formulations, such as the Pauli equation for an electron with spin, can be obtained from it. Lastly, the principle has interesting implications for the interpretation of quantum mechanics, including new insights on quantum entanglement, which will be reported separately. The view of the speed of light and the Planck constant from an information acquisition perspective is also intriguing. That is, the speed of light is the upper limit for propagating information, while the Planck constant is considered as the lower limit of action effort to exhibit observable information of a physical system.
Change history
05 July 2024
A Correction to this paper has been published: https://doi.org/10.1007/s10701-024-00782-6
References
Feynman, R.: Space–time approach to non-relativistic quantum mechanics. Rev. Mod. Phys. 20, 367 (1948)
Einstein, A., Podolsky, B., Rosen, N.: Can quantum-mechanical description of physical reality be considered complete? Phys. Rev. 47, 777–780 (1935)
Bell, J.: On the Einstein Podolsky Rosen paradox. Phys. Phys. Fizika 1, 195 (1964)
Nielsen, M.A., Chuang, I.L.: Quantum Computation and Quantum Information. Cambridge University Press, Cambridge (2000)
Hayashi, M., Ishizaka, S., Kawachi, A., Kimura, G., Ogawa, T.: Introduction to Quantum Information Science (pp. 90, 150, 152, 197). Springer, Berlin (2015)
Rovelli, C.: Relational quantum mechanics. Int. J. Theor. Phys. 35, 1637–1678 (1996). https://doi.org/10.1007/BF02302261
Zeilinger, A.: A foundational principle for quantum mechanics. Found. Phys. 29(4), 631–643 (1999)
Brukner, C., Zeilinger, A.: Information and fundamental elements of the structure of quantum theory. In: Castell, L., Ischebeck, O. (eds.) Time, Quantum Information. Springer, Berlin (2003)
Brukner, C., Zeilinger, A.: Operationally invariant information in quantum measurements. Phys. Rev. Lett. 83, 3354–3357 (1999). arXiv:quant-ph/0005084
Brukner, C., Zeilinger, A.: Young’s experiment and the finiteness of information. Phil. Trans. R. Soc. Lond. A 360, 1061 (2002). arXiv:quant-ph/0201026
Fuchs, C.A.: Quantum Mechanics as Quantum Information (and only a little more). arXiv:quant-ph/0205039 (2002)
Brukner, Č, Zeilinger, A.: Information invariance and quantum probabilities. Found. Phys. 39(7), 677–689 (2009)
Brukner, C., Zukowski, M., Zeilinger, A.: The essence of entanglement. arXiv:quant-ph/0106119
Spekkens, R.W.: Evidence for the epistemic view of quantum states: A toy theory. Phys. Rev. A 75(3), 032110 (2007)
Spekkens, R.W.: Quasi-quantization: classical statistical theories with an epistemic restriction. arXiv:1409.5041
Paterek, T., Dakic, B., Brukner, C.: Theories of systems with limited information content. New J. Phys. 12, 053037 (2010). arXiv:0804.1423
Görnitz, T., Ischebeck, O.: An Introduction to Carl Friedrich von Weizsäcker’s Program for a Reconstruction of Quantum Theory. Time, Quantum and Information. Springer, Berlin (2003)
Lyre, H.: Quantum theory of ur-objects as a theory of information. Int. J. Theor. Phys. 34(8), 1541–1552 (1995)
Hardy, L.: Quantum theory from five reasonable axioms. arXiv:quant-ph/0101012
Dakic, B., Brukner, C.: Quantum theory and beyond: is entanglement special? In: Halvorson, H. (ed.) Deep Beauty: Understanding the Quantum World through Mathematical Innovation, pp. 365–392. Cambridge University Press, Cambridge (2011)
Masanes, L., Müller, M.P.: A derivation of quantum theory from physical requirements. New J. Phys. 13(6), 063001 (2011)
Müller, M.P., Masanes, L.: Information-theoretic postulates for quantum theory. arXiv:1203.4516 [quant-ph]
Masanes, L., Müller, M.P., Augusiak, R., Perez-Garcia, D.: Existence of an information unit as a postulate of quantum theory. PNAS 110(41), 16373 (2013). arXiv:1208.0493
Chiribella, G., D’Ariano, G.M., Perinotti, P.: Informational derivation of quantum theory. Phys. Rev. A 84(1), 012311 (2011)
Müller, M.P., Masanes, L.: Three-dimensionality of space and the quantum bit: how to derive both from information-theoretic postulates. New J. Phys. 15, 053040 (2013). https://doi.org/10.1088/1367-2630/15/5/053040. arXiv:1206.0630 [quant-ph]
Hardy, L.: Reconstructing quantum theory. arXiv:1303.1538
Kochen, S.: A reconstruction of quantum mechanics, arXiv preprint arXiv:1306.3951 (2013)
Goyal, P.: From Information Geometry to Quantum Theory. New J. Phys. 12, 023012 (2010). arXiv:0805.2770
Reginatto, M., Hall, M.J.W.: Information geometry, dynamics and discrete quantum mechanics. AIP Conf. Proc. 1553, 246 (2013). arXiv:1207.6718
Höhn, P.A.: Toolbox for reconstructing quantum theory from rules on information acquisition. Quantum 1, 38 (2017). arXiv:1412.8323 [quant-ph]
Höhn, P.A.: Quantum theory from questions. Phys. Rev. A 95, 012102 (2017). arXiv:1511.01130 [quant-ph]
Stuckey, W., McDevitt, T., Silberstein, M.: No preferred reference frame at the foundation of quantum mechanics. Entropy 24, 12 (2022)
Mehrafarin, M.: Quantum mechanics from two physical postulates. Int. J. Theor. Phys. 44, 429 (2005). arXiv:quant-ph/0402153
Caticha, A.: Entropic Dynamics, Time, and Quantum Theory. J. Phys. A: Math. Theor. 44, 225303 (2011). arXiv:1005.2357
Caticha, A.: The Entropic Dynamics approach to Quantum Mechanics. Entropy 21, 943 (2019). arXiv:1908.04693
Frieden, B.R.: Fisher Information as the Basis for the Schrödinger Wave Equation. American J. Phys. 57, 1004 (1989)
Reginatto, M.: Derivation of the equations of nonrelativistic quantum mechanics using the principle of minimum Fisher information. Phys. Rev. A 58, 1775 (1998)
Smerlak, M., Rovelli, C.: Relational EPR. Found. of Phys. 37, 427–445 (2007)
Frieden, B.R.: Physics from Fisher Information. Cambridge University Press, Cambridge (1999)
Landau, L.D., Lifshitz, E.M.: The Classical Theory of Fields: Course of Theoretical Physics (Chap. 1), vol. 2, 4th edn. Butterworth-Heinemann, Oxford (1980)
Dirac, P.A.M.: The Principles of Quantum Mechanics, 4th edn. Clarendon, Oxford (1958)
Jaynes, E.T.: Prior information. IEEE Transactions on Systems Science and Cybernetics 4(3), 227–241 (1968)
Hall, M.J.W., Reginatto, M.: Schrödinger equation from an exact uncertainty principle. J. Phys. A: Math. Gen. 35, 3289 (2002)
Hall, M.J.W., Reginatto, M.: Quantum mechanics from a Heisenberg-type equality. Fortschr. Phys. 50, 646–651 (2002)
Bohm, D.: A suggested interpretation of the quantum theory in terms of hidden variables, I and II. Phys. Rev. 85, 166–180 (1952)
Stanford Encyclopedia of Philosophy: Bohmian Mechanics (2021). https://plato.stanford.edu/entries/qm-bohm/
Nelson, E.: Derivation of the Schrödinger Equation from Newtonian Mechanics. Phy. Rev. 150, 1079 (1966)
Nelson, E.: Quantum Fluctuations. Princeton University Press, Princeton (1985)
Feynman, R.P.: Lectures on Physics, vol. II. Addison-Wesley, London (1964)
Yang, J.M.: Variational principle for stochastic mechanics based on information measures. J. Math. Phys. 62, 102104 (2021). arXiv:2102.00392 [quant-ph]
Rényi, A.: On measures of entropy and information. In: Neyman, J. (ed.) Proceedings of the 4th Berkeley Symposium on Mathematics, Statistics and Probability, pp. 547–561. University of California Press, Berkeley (1961)
Tsallis, C.: Possible generalization of Boltzmann-Gibbs statistics. J. Stat. Phys. 52, 479–487 (1988)
van Erven, T., Harremoës, P.: Rényi divergence and Kullback-Leibler divergence. IEEE Trans. Inf. Theory 60(7), 3797–3820 (2014)
Nielsen, F., Nock, R.: On Rényi and Tsallis entropies and divergences for exponential families. J. Phys. A: Math. and Theo. 45, 3 (2012)
Yang, J.M.: Quantum scalar field theory based on an extended least action principle. Int. J. Theor. Phys. 63, 15 (2024). https://doi.org/10.1007/s10773-023-05540-4
Acknowledgements
The author would like to thank the anonymous referees for their valuable comments, which help to strengthen the rationale behind the least observability principle and improve the clarity of the presentation of this paper.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendices
Appendix A: Extended Canonical Transformation
In classical mechanics, the canonical transformation is a change of canonical coordinates \(({\textbf{x}}, {\textbf{p}}, t)\) to generalized canonical coordinates \(({\textbf{X}}, {\textbf{P}}, t)\) that preserves the form of Hamilton’s equations. Denote the Lagrangians for the two sets of canonical coordinates as \(L_{xp}={\textbf{p}}\cdot \dot{{\textbf{x}}}-H({\textbf{x}},{\textbf{p}},t)\) and \(L'_{XP}={\textbf{P}}\cdot \dot{{\textbf{X}}}-K({\textbf{X}},{\textbf{P}},t)\), respectively, where K is the new form of the Hamiltonian in the generalized coordinates. To ensure the form of Hamilton’s equations is preserved from the least action principle, one must have
One way to meet such conditions is that the Lagrangians in both integrals satisfy the following relation
where G is a generating function, and \(\lambda\) is a constant. When \(\lambda \ne 1\), the transformation is called an extended canonical transformation. Here we choose \(\lambda =-1\). Rearranging (A3), we have
Choose a generating function \(G={\textbf{P}}\cdot \dot{{\textbf{X}}} + S({\textbf{x}}, {\textbf{P}}, t)\), that is, a type 2 generating function. Its total time derivative is
The operator \(\nabla _P\) denotes the gradient with respect to the generalized momenta \({\textbf{P}}\). Comparing (A4) and (A5) results in
From (A6), \(K= - (\partial S/\partial t + H)\). Thus, \(L'_{XP} = {\textbf{P}}\cdot \dot{{\textbf{X}}} + (\partial S/\partial t + H)\). We can choose a generating function S such that \({\textbf{X}}\) does not explicitly depend on t during motion. For instance, suppose \(S({\textbf{x}}, {\textbf{P}}, t)=F({\textbf{x}}, {\textbf{P}}) + f({\textbf{x}}, t)\); one has \({\textbf{X}}=-\nabla _P F({\textbf{x}}, {\textbf{P}})\), so that \(\dot{{\textbf{X}}}=0\) and \(L'_{XP} = \partial S/\partial t + H({\textbf{x}}, {\textbf{p}}, t)\). Then the action integral in the generalized canonical coordinates becomes
For the ensemble system with probability density \(\rho ({\textbf{x}}, t)\), the Lagrangian density \({\mathcal {L}}=\rho L'_{XP}\), and the average value of the classical action is,
which is Eq.(8). If one further imposes constraint on the generation function S such that the generalized Hamiltonian \(K=0\), Eq. (A6) becomes the Hamilton-Jacobi equation \(\partial S/\partial t + H = 0\). It is a special solution for the least action principle based on \(A_c\) when the generalized canonical coordinators and momenta are \(({\textbf{X}}, {\textbf{P}})\). It is also a solution for the least action principle based on \(S_c\) when the generalized canonical coordinators and momenta are \((\rho , S)\) [43, 44]. In either case, it is legitimate to interpret \(A_c\) or \(S_c\) as the corresponding classical action integral.
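The chain of steps in this appendix can be written out explicitly. The following is a sketch reconstructed from the relations quoted in the text (\(\lambda =-1\), \(K=-(\partial S/\partial t+H)\), \({\textbf{X}}=-\nabla _P F\)). The sign convention for \(dG/dt\), the form \(G={\textbf{P}}\cdot {\textbf{X}}+S\), and the integration limits \(t_A, t_B\) are assumptions chosen so that those quoted results follow; equation labels are not reproduced.

```latex
% Both action integrals must be stationary (t_A, t_B are assumed fixed endpoints)
\delta \int_{t_A}^{t_B} L_{xp}\, dt = 0, \qquad
\delta \int_{t_A}^{t_B} L'_{XP}\, dt = 0 .

% Sufficient condition (assumed sign convention), then \lambda = -1
L'_{XP} = \lambda L_{xp} + \frac{dG}{dt}
\quad\Longrightarrow\quad
\frac{dG}{dt} = \mathbf{P}\cdot\dot{\mathbf{X}} - K + \mathbf{p}\cdot\dot{\mathbf{x}} - H .

% Total time derivative of G = P.X + S(x, P, t)
\frac{dG}{dt} = \mathbf{P}\cdot\dot{\mathbf{X}} + \mathbf{X}\cdot\dot{\mathbf{P}}
  + \frac{\partial S}{\partial t} + \nabla S\cdot\dot{\mathbf{x}} + \nabla_P S\cdot\dot{\mathbf{P}} .

% Matching the coefficients of \dot{x}, \dot{P} and the remaining terms
\mathbf{p} = \nabla S, \qquad
\mathbf{X} = -\nabla_P S, \qquad
K = -\left(\frac{\partial S}{\partial t} + H\right).

% With \dot{X} = 0, the single-trajectory and ensemble-averaged actions
A_c = \int L'_{XP}\, dt = \int \left\{ \frac{\partial S}{\partial t} + H(\mathbf{x},\mathbf{p},t) \right\} dt ,
\qquad
S_c = \int \rho \left\{ \frac{\partial S}{\partial t} + H(\mathbf{x},\nabla S,t) \right\} d^3\mathbf{x}\, dt .
```

Setting \(K=0\) in the third line gives the Hamilton-Jacobi equation \(\partial S/\partial t + H = 0\) directly.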
Appendix B: The Derivation of Schrödinger Equation
The key step in deriving the Schrödinger equation is to prove (11) from (9). To do this, one first takes the Taylor expansion of \(\rho ({\textbf{x}}+{\textbf{w}}, t)\) around \({\textbf{x}}\)
where \(\partial _i=\partial /\partial x_i\) and \(\partial _i^2=\partial ^2/\partial x^2_i\). The expansion is legitimate because (4) shows that the variance of the fluctuation displacement \({\textbf{w}}\) is proportional to \(\Delta t\). As \(\Delta t \rightarrow 0\), only very small \({\textbf{w}}\) is possible. Then
In the second step, the Taylor expansion \(\ln (1+y)=y - y^2/2 + O(y^3)\) is used. Substitute the above expansion into (9) and note that \(\ln (\rho ({\textbf{x}}+{\textbf{w}}, t_j)/\rho ({\textbf{x}}, t_j)) = - \ln (\rho ({\textbf{x}}, t_j)/\rho ({\textbf{x}}+{\textbf{w}}, t_j))\),
The second and last steps use the fact that \(\langle w_i\rangle =0\). Integrating the last term and assuming \(\rho\) is a smooth function such that its spatial gradient approaches zero when \(|x_i|\rightarrow \infty\), we have
Substitute \(\langle w_i^2\rangle = \hbar \Delta t/2m\) into (B6) and then into (9),
which is Eq. (11). The next step is to derive (13). Variation of I given in (12) with respect to \(\rho\) gives
Integrating by parts the term with \(\delta \nabla \rho\), we have
Taking \(\delta I = 0\) for arbitrary \(\delta \rho\), we must have
One can verify that \(\left[ \frac{\nabla \rho \cdot \nabla \rho }{\rho ^2} - 2\frac{\nabla ^2\rho }{\rho }\right] =-4\frac{\nabla ^2\sqrt{\rho }}{\sqrt{\rho }}\). Substituting it back into (B13) gives the desired result in (13).
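Written out, the two stages of this appendix take roughly the following form. This is a sketch reconstructed from the steps described above, not a reproduction of the numbered equations. It assumes independent fluctuation components, \(\langle w_i w_j\rangle = \delta _{ij}\langle w_i^2\rangle\), a Hamiltonian \(H=(\nabla S)^2/2m+V\), and that (12) is the total \(I=I_p+I_f\) with \(I_p=2S_c/\hbar\).

```latex
% Stage 1: expected KL divergence at one time step, to second order in w
E_w\!\left[ D_{KL}\big(\rho(\mathbf{x},t_j)\,\|\,\rho(\mathbf{x}+\mathbf{w},t_j)\big) \right]
 = \sum_i \frac{\langle w_i^2\rangle}{2} \int \left[ \frac{(\partial_i\rho)^2}{\rho} - \partial_i^2\rho \right] d^3\mathbf{x}
 = \sum_i \frac{\langle w_i^2\rangle}{2} \int \frac{(\partial_i\rho)^2}{\rho}\, d^3\mathbf{x} .

% The \partial_i^2\rho term integrates to a boundary gradient, which vanishes.
% With \langle w_i^2\rangle = \hbar\Delta t/2m, summing over time steps and letting \Delta t \to 0:
I_f = \int \frac{\hbar}{4m}\, \frac{\nabla\rho\cdot\nabla\rho}{\rho}\; d^3\mathbf{x}\, dt .

% Stage 2: variation of I = I_p + I_f with respect to \rho, after integrating by parts
\frac{2}{\hbar}\left[ \frac{\partial S}{\partial t} + \frac{(\nabla S)^2}{2m} + V \right]
 + \frac{\hbar}{4m}\left[ \frac{\nabla\rho\cdot\nabla\rho}{\rho^2} - 2\,\frac{\nabla^2\rho}{\rho} \right] = 0 .

% Identity check: \nabla^2\sqrt{\rho} = \frac{\nabla^2\rho}{2\sqrt{\rho}} - \frac{\nabla\rho\cdot\nabla\rho}{4\rho^{3/2}}
% so the bracket equals -4\,\nabla^2\sqrt{\rho}/\sqrt{\rho}, and multiplying by \hbar/2 gives
\frac{\partial S}{\partial t} + \frac{(\nabla S)^2}{2m} + V
 - \frac{\hbar^2}{2m}\frac{\nabla^2\sqrt{\rho}}{\sqrt{\rho}} = 0 .
```

The last term is the Bohm quantum potential, which here arises entirely from \(I_f\).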
Appendix C: Charged Particle in an External Electromagnetic Field
Suppose a particle of charge q and mass m is placed in an electromagnetic field with vector potential \({{\textbf {A}}}\) and scalar potential \(\phi\). Without random fluctuations, the particle moves along a classical trajectory determined by the classical Hamilton-Jacobi equation:
Compared to (7), a generalized momentum term \((\nabla S - q{{\textbf {A}}})\) replaces the original momentum \(\nabla S\) [47, 49]. Similarly, the continuity equation becomes
These two equations can be derived through fixed point variation on the average classical action
Thus, observable information from the classical trajectory can be defined as \(I_p=2S_c/\hbar\). In addition, the particle also experiences constant fluctuations around the classical trajectory. We assume the external electromagnetic field has no influence on the vacuum fluctuations. This means \(I_f\) defined in (9) is applicable here. Variation of the total observable information \(I_p+I_f\) with respect to \(\rho\) gives the extended Hamilton-Jacobi equation
Defining \(\Psi =\sqrt{\rho }e^{iS/\hbar }\), the continuity equation and the extended Hamilton-Jacobi equation (C4) are combined into a single differential equation,
which is the Schrödinger equation in an external electromagnetic field on the condition \(\nabla \cdot {{\textbf {A}}} =0\).
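For orientation, the equations of this appendix can be sketched as follows. The forms below are the standard minimal-coupling expressions built from the generalized momentum \((\nabla S - q{{\textbf {A}}})\) named in the text; they are assumed to match the numbered equations up to notation, and the potential energy is taken to be \(q\phi\).

```latex
% Classical Hamilton-Jacobi and continuity equations with \nabla S \to \nabla S - q\mathbf{A}
\frac{\partial S}{\partial t} + \frac{(\nabla S - q\mathbf{A})^2}{2m} + q\phi = 0 ,
\qquad
\frac{\partial \rho}{\partial t} + \frac{1}{m}\nabla\cdot\big[\rho\,(\nabla S - q\mathbf{A})\big] = 0 .

% Average classical action whose variation yields both equations
S_c = \int \rho \left\{ \frac{\partial S}{\partial t} + \frac{(\nabla S - q\mathbf{A})^2}{2m} + q\phi \right\} d^3\mathbf{x}\, dt .

% Adding I_f (unchanged by the field) gives the extended Hamilton-Jacobi equation
\frac{\partial S}{\partial t} + \frac{(\nabla S - q\mathbf{A})^2}{2m} + q\phi
 - \frac{\hbar^2}{2m}\frac{\nabla^2\sqrt{\rho}}{\sqrt{\rho}} = 0 .

% With \Psi = \sqrt{\rho}\, e^{iS/\hbar}, the two equations combine into
i\hbar\frac{\partial \Psi}{\partial t}
 = \left[ \frac{1}{2m}\big(-i\hbar\nabla - q\mathbf{A}\big)^2 + q\phi \right]\Psi .

% Expanding the square; the (\nabla\cdot\mathbf{A}) term drops out when \nabla\cdot\mathbf{A} = 0
\big(-i\hbar\nabla - q\mathbf{A}\big)^2\Psi
 = -\hbar^2\nabla^2\Psi + i\hbar q\,(\nabla\cdot\mathbf{A})\,\Psi + 2 i\hbar q\,\mathbf{A}\cdot\nabla\Psi + q^2\mathbf{A}^2\Psi .
```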
Appendix D: Rényi Divergence and the Generalized Schrödinger Equation
Based on the definition of \(I_f^{\alpha }\) in (19), and starting from (B1), we have
Given the normalization condition \(\int \rho d^3{\textbf{x}} = 1\), and the regularity assumption of \(\rho\), \(\int \nabla \rho d^3{\textbf{x}} = 0\), we have
Thus, \(I_f^{\alpha }\) is simplified as
Compared to (B9), the only difference from \(I_f\) is an additional coefficient \(\alpha\), i.e., \(I_f^{\alpha } = \alpha I_f\). Equation (21) can be derived by repeating the calculation in Appendix B. To obtain the generalized Schrödinger equation, we define \(\Psi ^\prime =\sqrt{\rho }e^{iS/\sqrt{\alpha }\hbar }\), then take the partial derivative over time,
Multiplying both sides by \(i\sqrt{\alpha }\hbar /\Psi ^\prime\), and applying the continuity equation and the extended Hamilton-Jacobi equation (21), we get
Taking the gradient of \(\Psi ^\prime =\sqrt{\rho }e^{iS/\sqrt{\alpha }\hbar }\), and using \(\rho = \Psi ^\prime \Psi ^{\prime *}\), one can obtain the following identities
Substitute these identities into (D4),
Multiplying both sides by \(\Psi ^\prime\), we arrive at the generalized Schrödinger equation (22).
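The origin of the factor \(\alpha\) can be sketched in a few lines. The block below uses the standard definition of the Rényi divergence of order \(\alpha\), which is assumed to be the one underlying (19), together with the same second-order expansion and independence assumption \(\langle w_i w_j\rangle = \delta _{ij}\langle w_i^2\rangle\) as in Appendix B. The final two equations are written in the form implied by \(I_f^{\alpha }=\alpha I_f\) and the definition of \(\Psi ^\prime\).

```latex
% Standard Renyi divergence of order \alpha between \rho(x) and \rho(x+w)
D_R^{\alpha} = \frac{1}{\alpha-1}\,\ln \int \rho^{\alpha}(\mathbf{x})\,\rho^{1-\alpha}(\mathbf{x}+\mathbf{w})\, d^3\mathbf{x} .

% Expand (\rho(x+w)/\rho(x))^{1-\alpha} to second order in w;
% terms linear in \partial_i\rho and \partial_i^2\rho integrate to zero by regularity
\int \rho^{\alpha}(\mathbf{x})\,\rho^{1-\alpha}(\mathbf{x}+\mathbf{w})\, d^3\mathbf{x}
 \simeq 1 - \frac{\alpha(1-\alpha)}{2} \int \frac{1}{\rho}\Big(\sum_i \partial_i\rho\, w_i\Big)^2 d^3\mathbf{x} .

% Using \ln(1+y) \simeq y and averaging over w
E_w\!\left[ D_R^{\alpha} \right]
 \simeq \alpha \sum_i \frac{\langle w_i^2\rangle}{2} \int \frac{(\partial_i\rho)^2}{\rho}\, d^3\mathbf{x}
 \quad\Longrightarrow\quad I_f^{\alpha} = \alpha\, I_f .

% Extended Hamilton-Jacobi equation with the rescaled quantum potential
\frac{\partial S}{\partial t} + \frac{(\nabla S)^2}{2m} + V
 - \frac{\alpha\hbar^2}{2m}\frac{\nabla^2\sqrt{\rho}}{\sqrt{\rho}} = 0 .

% With \Psi' = \sqrt{\rho}\, e^{iS/\sqrt{\alpha}\hbar}
i\sqrt{\alpha}\,\hbar\,\frac{\partial \Psi'}{\partial t}
 = \left[ -\frac{\alpha\hbar^2}{2m}\nabla^2 + V \right]\Psi' .
```

Formally this is the ordinary Schrödinger equation with \(\hbar\) replaced by \(\sqrt{\alpha }\hbar\), and it reduces to the ordinary case at \(\alpha =1\).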
Appendix E: Schrödinger Equation for a Free Particle in Momentum Representation
In deriving the Schrödinger equation for a free particle in momentum representation, we need to prove that \(I_f\), defined in (15), does not contribute in the variation procedure with respect to \(\varrho ({\textbf{p}}, t)\), as long as \(\varrho ({\textbf{p}}, t)\) is a regular smooth function. We provide an intuitive proof here that is sufficiently convincing. A more mathematically rigorous proof is desirable in future research. First, we note that the Kullback–Leibler divergence is a special case of Rényi divergence \(D^{\alpha }_R\) when the order \(\alpha = 1\). Second, we make use of the fact that the Rényi divergence is non-decreasing as a function of its order \(\alpha\) [53]. Thus,
Given the non-negativity of divergence [4], the expectation values of \(D_{KL}\) and \(D^{\frac{1}{2}}_R\) with respect to the transition probability density \(\tilde{\wp }({\textbf{p}}+\mathbf {\omega }|{\textbf{p}})\) also satisfy the inequality,
As shown in the main text, as \(\Delta t\rightarrow 0\), the variance \(\langle \omega _i^2 \rangle \rightarrow \infty\), and \(\tilde{\wp }({\textbf{p}}+\mathbf {\omega }|{\textbf{p}})\) becomes a uniform function with respect to \(\mathbf {\omega }\). This means that every value of \(\mathbf {\omega }\) contributes equally in calculating the expectation value of the divergence \(D^{\frac{1}{2}}_R\). However, the integral inside the logarithm function essentially depends on the overlap between the functions \(\varrho ({\textbf{p}}, t_j)\) and \(\varrho ({\textbf{p}}+\mathbf {\omega }, t_j)\). We ignore the case where \(\varrho ({\textbf{p}}, t_j)\) is a constant because in that case \(D_{KL}(\varrho ({\textbf{p}}, t_j) || \varrho ({\textbf{p}}+\mathbf {\omega }, t_j))=0\). Assume \(\varrho ({\textbf{p}}, t_j)\) is a smooth function. For a free particle with finite energy, the momentum is also finite. Combining this fact with the normalization condition \(\int \varrho ({\textbf{p}}, t_j)d^3{\textbf{p}} = 1\), we must have \(\lim _{|{\textbf{p}}|\rightarrow \infty }\varrho ({\textbf{p}}, t_j)= 0\). Thus, the overlap between the functions \(\varrho ({\textbf{p}})\) and \(\varrho ({\textbf{p}}+\mathbf {\omega })\) will be sufficiently small when \(|\mathbf {\omega }|\) is sufficiently large,
This implies
Given the non-negativity of \(D^{\frac{1}{2}}_R\) for any \(|\mathbf {\omega }|\), and given that the probability distribution over \(\mathbf {\omega }\) becomes uniform, the value of the right hand side of (E3) is dominated by large \(\mathbf {\omega }\), and the result approaches positive infinity. Hence, the left hand side of (E3) also approaches positive infinity. This result is independent of the specific functional form of \(\varrho ({\textbf{p}}, t_j)\), assuming that \(\varrho ({\textbf{p}}, t_j)\) is a smooth function. Consequently, variation of \(E_{\mathbf {\omega }}[D_{KL}]\) with respect to \(\varrho ({\textbf{p}}, t_j)\) does not give any constraint on \(\varrho ({\textbf{p}}, t_j)\),
Since this is true at every time \(t_j\), from the definition of \(I_f\) in (15), we conclude that \(\delta I_f /\delta \varrho = 0\). Note that if \(I_f\) were defined using Fisher information, instead of the Kullback–Leibler divergence \(D_{KL}\), as the information metric, one would not reach the conclusion that \(I_f\) is an infinite number independent of \(\varrho\).
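The argument can be made concrete with a simple illustration. The block below states the order-\(\frac{1}{2}\) Rényi divergence in its standard form and then evaluates both divergences for a one-dimensional Gaussian momentum density of width \(\sigma\). The Gaussian, the one-dimensional setting, and the symbol \(\sigma\) are assumptions made purely for illustration and do not appear in the main text.

```latex
% Order-1/2 Renyi divergence: minus twice the log of the overlap integral
D_R^{1/2}\big(\varrho(p)\,\|\,\varrho(p+\omega)\big)
 = -2 \ln \int \sqrt{\varrho(p)\,\varrho(p+\omega)}\; dp ,
\qquad
D_{KL} \;\ge\; D_R^{1/2} \;\ge\; 0 .

% As the overlap vanishes, the divergence grows without bound
\int \sqrt{\varrho(p)\,\varrho(p+\omega)}\; dp \to 0
\quad\Longrightarrow\quad
D_R^{1/2} \to +\infty \qquad (|\omega| \to \infty).

% Illustration (assumed): \varrho(p) = (2\pi\sigma^2)^{-1/2} \exp(-p^2/2\sigma^2)
\int \sqrt{\varrho(p)\,\varrho(p+\omega)}\; dp = \exp\!\left(-\frac{\omega^2}{8\sigma^2}\right),
\qquad
D_R^{1/2} = \frac{\omega^2}{4\sigma^2},
\qquad
D_{KL} = \frac{\omega^2}{2\sigma^2} .
```

In this example both divergences grow quadratically in \(\omega\), so their average over an increasingly flat distribution of \(\omega\) diverges, in line with the argument above.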
Appendix F: The Schrödinger Equation for a Bipartite System
From the definition of \(I_f\) in (9) for the bipartite system, we have
Expanding the logarithm function similarly to (B2),
Using the locality of vacuum fluctuations defined in (25), we have
The last step uses the fact that \(\langle w_{ia}\rangle = \langle w_{ib}\rangle = 0\). Taking the same assumption as (B8) that \(\rho\) is a regular function and its gradient with respect to \({\textbf{x}}_a\) or \({\textbf{x}}_b\) approaches zero when \(|{\textbf{x}}_a|, |{\textbf{x}}_b|\rightarrow \infty\), we have
Substituting \(\langle w_{ia}^2\rangle = \hbar \Delta t/2m_a\) and \(\langle w_{ib}^2\rangle = \hbar \Delta t/2m_b\), and taking \(\Delta t\rightarrow 0\), we get
Combined with (24), the total amount of information for the bipartite system is
Variations with respect to S and \(\rho\), respectively, give two equations,
Defining \(\Psi ({\textbf{x}}_a, {\textbf{x}}_b, t) = \sqrt{\rho ({\textbf{x}}_a, {\textbf{x}}_b, t)}e^{iS/\hbar }\), it can be verified that the two equations above are equivalent to the Schrödinger equation in (29).
Equation (F3) shows that \(I_f\) is inseparable since, in general, \(\rho ({\textbf{x}}_a, {\textbf{x}}_b, t) \ne \rho _a({\textbf{x}}_a, t)\rho _b({\textbf{x}}_b, t)\). On the other hand, suppose \(\rho ({\textbf{x}}_a, {\textbf{x}}_b, t) = \rho _a({\textbf{x}}_a, t)\rho _b({\textbf{x}}_b, t)\). Then \(\nabla _a \rho = \rho _b\nabla _a\rho _a\) and, similarly, \(\nabla _b \rho = \rho _a\nabla _b\rho _b\), so that
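The bipartite derivation follows the single-particle pattern of Appendix B and can be sketched as below. The block assumes that locality means the fluctuations of the two subsystems are uncorrelated, \(\langle w_{ia} w_{jb}\rangle = 0\), and that the classical part of the action contains a potential \(V({\textbf{x}}_a, {\textbf{x}}_b)\); the exact form of the numbered equations may differ in notation.

```latex
% Information metric for vacuum fluctuations of the bipartite system
I_f = \int \left\{ \frac{\hbar}{4m_a}\frac{\nabla_a\rho\cdot\nabla_a\rho}{\rho}
 + \frac{\hbar}{4m_b}\frac{\nabla_b\rho\cdot\nabla_b\rho}{\rho} \right\}
 d^3\mathbf{x}_a\, d^3\mathbf{x}_b\, dt .

% Variation of the total information with respect to S: continuity equation
\frac{\partial\rho}{\partial t}
 + \frac{1}{m_a}\nabla_a\cdot(\rho\nabla_a S)
 + \frac{1}{m_b}\nabla_b\cdot(\rho\nabla_b S) = 0 .

% Variation with respect to \rho: extended Hamilton-Jacobi equation
\frac{\partial S}{\partial t}
 + \frac{(\nabla_a S)^2}{2m_a} + \frac{(\nabla_b S)^2}{2m_b} + V
 - \frac{\hbar^2}{2m_a}\frac{\nabla_a^2\sqrt{\rho}}{\sqrt{\rho}}
 - \frac{\hbar^2}{2m_b}\frac{\nabla_b^2\sqrt{\rho}}{\sqrt{\rho}} = 0 .

% With \Psi = \sqrt{\rho}\, e^{iS/\hbar}, these combine into
i\hbar\frac{\partial\Psi}{\partial t}
 = \left[ -\frac{\hbar^2}{2m_a}\nabla_a^2 - \frac{\hbar^2}{2m_b}\nabla_b^2 + V \right]\Psi .
```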
and is clearly separable into two independent terms, where
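For a product density the separation can be checked directly. The following sketch assumes that \(\rho _a\) and \(\rho _b\) are each normalized to one, and uses the labels \(I_f^{a}\) and \(I_f^{b}\) for the two subsystem terms; these labels are introduced here for illustration.

```latex
% For \rho = \rho_a \rho_b the integrand factorizes
\frac{\nabla_a\rho\cdot\nabla_a\rho}{\rho} = \rho_b\,\frac{\nabla_a\rho_a\cdot\nabla_a\rho_a}{\rho_a},
\qquad
\frac{\nabla_b\rho\cdot\nabla_b\rho}{\rho} = \rho_a\,\frac{\nabla_b\rho_b\cdot\nabla_b\rho_b}{\rho_b} .

% Integrating out the other subsystem with \int\rho_a\, d^3\mathbf{x}_a = \int\rho_b\, d^3\mathbf{x}_b = 1
I_f = I_f^{a} + I_f^{b},
\qquad
I_f^{a} = \int \frac{\hbar}{4m_a}\frac{\nabla_a\rho_a\cdot\nabla_a\rho_a}{\rho_a}\, d^3\mathbf{x}_a\, dt ,
\qquad
I_f^{b} = \int \frac{\hbar}{4m_b}\frac{\nabla_b\rho_b\cdot\nabla_b\rho_b}{\rho_b}\, d^3\mathbf{x}_b\, dt .
```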
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Yang, J.M. Quantum Mechanics Based on an Extended Least Action Principle and Information Metrics of Vacuum Fluctuations. Found Phys 54, 32 (2024). https://doi.org/10.1007/s10701-024-00757-7